Ollama API#
The LocalOllamaLLM class provides an interface to interact with locally deployed Ollama models. Ollama allows you to run large language models locally on your machine.
Overview#
The LocalOllamaLLM class extends the base LLM class and integrates with
Ollama to run LLMs locally. It uses the langchain-ollama library to
communicate with the local Ollama instance.
Setup Requirements#
Before using LocalOllamaLLM, ensure you have the following:
- Install Ollama: Download and install Ollama from ollama.ai
    # On macOS/Linux
    curl -fsSL https://ollama.com/install.sh | sh
- Pull Required Models: Pull the desired model from Ollama library:
    ollama pull qwen3:14b
    ollama pull llama3.2
    ollama pull mistral
- Python Packages: Install the required Python packages:
    pip install ollama langchain-ollama langchain
- Start Ollama Service: Ensure the Ollama service is running:
    ollama serve
Note
By default, Ollama runs on http://localhost:11434. The LocalOllamaLLM connects to this endpoint automatically.
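Before constructing the client, it can help to confirm that the service is reachable and to see which models have been pulled. The following is a minimal sketch using only the standard library. It assumes the default endpoint above and Ollama's `GET /api/tags` route, which lists local models; the helper names `parse_model_names` and `list_local_models` are hypothetical and not part of LLM4AD.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # default endpoint; adjust if yours differs


def parse_model_names(payload: dict) -> list:
    """Extract model names from the JSON body returned by /api/tags."""
    return [entry["name"] for entry in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> Optional[list]:
    """Return the names of pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except OSError:  # URLError, connection refused, and timeouts all derive from OSError
        return None
```

A `None` result usually means `ollama serve` is not running or is listening on a different port; an empty list means the server is up but no model has been pulled yet.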
API Reference#
Class Definition#
- class LocalOllamaLLM#
A concrete implementation of the LLM base class that interfaces with local Ollama models.
Constructor#
- __init__(self, model_name: str, **ollama_llm_init_params)#
Initialize the local Ollama LLM client.
- Parameters:
model_name (str) – The name of the Ollama model to use. This should match a model you’ve pulled using ollama pull <model>. Examples: "qwen3:14b", "llama3.2", "mistral".
ollama_llm_init_params (Any, optional) – Additional initialization parameters passed to the langchain_ollama.OllamaLLM constructor. See langchain-ollama documentation for available options.
Note
Common parameters include:
- base_url: Custom endpoint URL (defaults to http://localhost:11434)
- temperature: Sampling temperature (0.0 to 2.0)
- top_p: Nucleus sampling parameter
- top_k: Top-k sampling parameter
- num_ctx: Context window size
- num_gpu: Number of GPUs to use
- num_thread: Number of CPU threads
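To illustrate how the extra keyword arguments reach langchain-ollama, here is a rough sketch of the forwarding pattern the constructor describes. It is not the LLM4AD source: the class name `SketchOllamaSampler` and the `backend_factory` hook are hypothetical, added so the forwarding can be inspected without a running server. The only real API assumed is that `langchain_ollama.OllamaLLM` accepts the model name as its `model` argument.

```python
from typing import Any, Callable


def _default_backend(**params: Any) -> Any:
    # Imported lazily so the sketch can be read without langchain-ollama installed.
    from langchain_ollama import OllamaLLM
    return OllamaLLM(**params)


class SketchOllamaSampler:
    """Hypothetical stand-in that mirrors the documented constructor signature."""

    def __init__(self, model_name: str,
                 backend_factory: Callable[..., Any] = _default_backend,
                 **ollama_llm_init_params: Any) -> None:
        # model_name becomes OllamaLLM's `model`; every other keyword
        # (temperature, num_ctx, base_url, ...) is forwarded unchanged.
        self._model = backend_factory(model=model_name, **ollama_llm_init_params)
```

Because the parameters are passed through as-is, a misspelled keyword is reported by langchain-ollama itself rather than by the wrapper.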
Methods#
- draw_sample(self, prompt: str | Any, *args, **kwargs) -> str#
Generate a response from the local Ollama model based on the provided prompt.
- Parameters:
prompt (str | Any) – The input prompt. Can be either:
- A string containing the user message
- Any format accepted by the Ollama model
args – Additional positional arguments passed through to the underlying LangChain model call.
kwargs – Additional keyword arguments passed to the model invocation.
- Returns:
The generated text content from the LLM response.
- Return type:
str
Note
This method uses model.invoke() under the hood, which is a synchronous call. For streaming responses, you would need to modify this method or use the underlying langchain_ollama.OllamaLLM directly.
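One possible way to stream, sketched under the assumption that you call `langchain_ollama.OllamaLLM` directly and use its `stream()` method, which yields text chunks as they are generated. The helpers `collect_stream` and `stream_from_ollama` are hypothetical names for illustration and are not part of LocalOllamaLLM.

```python
from typing import Any, Callable, Iterable


def _print_chunk(chunk: str) -> None:
    # Show partial output immediately, without adding newlines between chunks.
    print(chunk, end="", flush=True)


def collect_stream(chunks: Iterable[str],
                   on_chunk: Callable[[str], None] = _print_chunk) -> str:
    """Pass each chunk to on_chunk as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk)
        parts.append(chunk)
    return "".join(parts)


def stream_from_ollama(model_name: str, prompt: str, **init_params: Any) -> str:
    """Stream a completion; needs langchain-ollama and a running Ollama server."""
    from langchain_ollama import OllamaLLM  # lazy import keeps collect_stream standalone
    llm = OllamaLLM(model=model_name, **init_params)
    return collect_stream(llm.stream(prompt))
```

With the service running, `stream_from_ollama("llama3.2", "What is machine learning?")` would print the answer incrementally and return the complete string, matching the `str` return type of draw_sample.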
Examples#
Example 1: Basic Usage with Qwen Model#
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM
# Initialize with a specific model
sampler = LocalOllamaLLM(model_name="qwen3:14b")
# Generate a response
response = sampler.draw_sample("What is machine learning?")
print(response)
Example 2: Using Different Ollama Models#
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM
# Using Llama 3.2 model
sampler = LocalOllamaLLM(model_name="llama3.2")
# Using Mistral model
sampler = LocalOllamaLLM(model_name="mistral")
# Using a smaller model for faster inference
sampler = LocalOllamaLLM(model_name="llama3.2:1b")
Example 3: Customizing Model Parameters#
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM
# Initialize with custom parameters
sampler = LocalOllamaLLM(
    model_name="qwen3:14b",
    temperature=0.7,                   # Control randomness (0.0 to 2.0)
    top_p=0.9,                         # Nucleus sampling
    top_k=40,                          # Top-k sampling
    num_ctx=8192,                      # Context window size
    num_thread=8,                      # CPU threads
    base_url="http://localhost:11434"  # Ollama endpoint (this is the default)
)
response = sampler.draw_sample("Write a Python function to sort a list.")
print(response)
Example 4: Integration with LLM4AD Platform#
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM
import llm4ad
# Create LLM sampler with local Ollama
sampler = LocalOllamaLLM(
    model_name="qwen3:14b",
    temperature=0.8
)
# Use with an LLM4AD method
task = llm4ad.tasks.optimization.SymbolicRegression(
    dimension=5,
    num_samples=100
)
method = llm4ad.methods.eoh.EoH(
    task=task,
    sampler=sampler,
    num_iterations=50
)
result = method.run()
print(f"Best solution: {result.best_solution}")
Example 5: Multi-Turn Conversation#
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM
sampler = LocalOllamaLLM(model_name="llama3.2")
# First prompt
response1 = sampler.draw_sample("What is Python?")
print(f"Response 1: {response1}")
# For multi-turn, you would need to maintain conversation history manually
# since the basic implementation doesn't maintain state
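A minimal sketch of keeping the history manually, as the comments above suggest. It assumes only that the sampler exposes a `draw_sample(prompt) -> str` callable. The `Conversation` class and the plain "User:/Assistant:" transcript format are illustrative choices, not something LLM4AD or Ollama prescribes.

```python
from typing import Callable


class Conversation:
    """Replays earlier turns in every prompt, since the sampler keeps no state."""

    def __init__(self, draw_sample: Callable[[str], str]) -> None:
        self._draw_sample = draw_sample
        self._turns = []  # list of (role, text) tuples, oldest first

    def render(self, user_message: str) -> str:
        """Build one prompt containing the full history plus the new message."""
        lines = [f"{role}: {text}" for role, text in self._turns]
        lines.append(f"User: {user_message}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def ask(self, user_message: str) -> str:
        """Send the message with its history and record both sides of the turn."""
        reply = self._draw_sample(self.render(user_message))
        self._turns.append(("User", user_message))
        self._turns.append(("Assistant", reply))
        return reply
```

With the sampler from this example, `chat = Conversation(sampler.draw_sample)` followed by `chat.ask("What is Python?")` and `chat.ask("Who created it?")` sends the first exchange along with the second question. The prompt grows with every turn, so long conversations may need truncation to stay within num_ctx.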
Available Ollama Models#
Here are some popular models available on Ollama:
| Model | Size | Description |
|---|---|---|
| llama3.2 | ~2.0GB | Llama 3.2 3B model (default tag) |
| llama3.2:1b | 1.3GB | Lightweight Llama 3.2 |
| qwen3:14b | ~9GB | Qwen 3 14B model |
| qwen3:8b | ~5GB | Qwen 3 8B model |
| mistral | ~4GB | Mistral 7B model |
| codellama | ~3.8GB | Code-focused Llama |
| phi3 | ~2.3GB | Microsoft’s Phi-3 |
For the full list, visit ollama.com/library
Common Issues and Troubleshooting#
- Ollama Not Running: Ensure ollama serve is running before using the API.
- Model Not Found: Make sure you’ve pulled the model with ollama pull <model_name>.
- Out of Memory: Use a smaller model or reduce context size.
- Slow Inference: Adjust the num_thread parameter or use GPU acceleration.
- Connection Refused: Verify Ollama is running on the correct port (default: 11434).
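For the "Model Not Found" case, a small check like the following can tell a missing model apart from a name mismatch. It is a sketch that assumes Ollama reports a model pulled without an explicit tag under the `:latest` tag (for example `llama3.2` appears as `llama3.2:latest` in `ollama list`). Both function names are hypothetical.

```python
def normalize_model_name(name: str) -> str:
    """Append ':latest' when the name carries no tag; leave tagged names as-is."""
    # Only inspect the last path segment so a registry host with a port
    # (host:port/model) is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def is_model_pulled(requested: str, pulled: list) -> bool:
    """Check whether the requested model matches one of the pulled model names."""
    wanted = normalize_model_name(requested)
    return any(normalize_model_name(p) == wanted for p in pulled)
```

If the check fails for a model you expect to have, compare the exact tag: `llama3.2` and `llama3.2:1b` are separate downloads.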
See Also#
OpenAI API - For OpenAI API integration
HTTPS API - For custom HTTPS API implementations
vLLM API - For local vLLM deployments