Ollama API#

The LocalOllamaLLM class provides an interface to locally deployed Ollama models. Ollama lets you run large language models on your own machine.

Overview#

The LocalOllamaLLM class extends the base LLM class and integrates with the Ollama platform for running LLMs locally. It uses the langchain-ollama library to communicate with a local Ollama instance.

Setup Requirements#

Before using LocalOllamaLLM, ensure you have the following:

Install Ollama: Download and install Ollama from ollama.ai
```
# On macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
```

Pull Required Models: Pull the desired model from the Ollama library:
```
ollama pull qwen3:14b
ollama pull llama3.2
ollama pull mistral
```

Python Packages: Install the required Python packages:
```
pip install ollama langchain-ollama langchain
```
Start Ollama Service: Ensure the Ollama service is running:
```
ollama serve
```
Note

By default, Ollama runs on http://localhost:11434. The LocalOllamaLLM connects to this endpoint automatically.
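Before constructing a sampler, it can help to confirm that something is answering on the default endpoint. The following is a minimal sketch using only the standard library; the helper name `ollama_is_reachable` is hypothetical and not part of LLM4AD or Ollama, and it only checks that the root URL returns HTTP 200, not that a particular model is available.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434",
                        timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200.

    Hypothetical helper for illustration; not part of LLM4AD or Ollama.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False
```

Calling `ollama_is_reachable()` with no arguments probes the default address; pass a different `base_url` if the service runs elsewhere.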

API Reference#

Class Definition#

class LocalOllamaLLM#

A concrete implementation of the LLM base class that interfaces with local Ollama models.

Constructor#

__init__(self, model_name: str, **ollama_llm_init_params)#

Initialize the local Ollama LLM client.

Parameters:

model_name (str) – The name of the Ollama model to use. This should match a model you’ve pulled using ollama pull <model>. Examples: "qwen3:14b", "llama3.2", "mistral".
ollama_llm_init_params (Any, optional) – Additional initialization parameters passed to the langchain_ollama.OllamaLLM constructor. See langchain-ollama documentation for available options.

Note

Common parameters include:

- base_url: Custom endpoint URL (defaults to http://localhost:11434)
- temperature: Sampling temperature (0.0 to 2.0)
- top_p: Nucleus sampling parameter
- top_k: Top-k sampling parameter
- num_ctx: Context window size
- num_gpu: Number of GPUs to use
- num_thread: Number of CPU threads
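To make the parameter forwarding concrete, here is a rough sketch of how a wrapper of this kind could pass its keyword arguments through to `langchain_ollama.OllamaLLM`. This is an illustration under assumptions, not the LLM4AD source: the class name `SketchOllamaSampler` and the `model_factory` argument are hypothetical, and the factory hook exists only so the sketch can be exercised without a running Ollama service.

```python
from typing import Any, Callable, Optional


class SketchOllamaSampler:
    """Illustrative stand-in for a local Ollama sampler (not the LLM4AD class)."""

    def __init__(self, model_name: str,
                 model_factory: Optional[Callable[..., Any]] = None,
                 **ollama_llm_init_params: Any) -> None:
        if model_factory is None:
            # Imported lazily so the sketch loads without langchain-ollama installed.
            from langchain_ollama import OllamaLLM
            model_factory = OllamaLLM
        # OllamaLLM takes the model name as `model`; all other keyword
        # arguments (temperature, num_ctx, base_url, ...) are forwarded unchanged.
        self._model = model_factory(model=model_name, **ollama_llm_init_params)

    def draw_sample(self, prompt: Any, *args: Any, **kwargs: Any) -> str:
        # Synchronous call; returns the generated text.
        return self._model.invoke(prompt, *args, **kwargs)
```

With this shape, any option that `OllamaLLM` accepts can be supplied at construction time without the wrapper having to name it explicitly.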

Methods#

draw_sample(self, prompt: str | Any, *args, **kwargs) → str#

Generate a response from the local Ollama model based on the provided prompt.

Parameters:

prompt (str | Any) – The input prompt. Can be either:
- A string containing the user message
- Any format accepted by the Ollama model
args – Additional positional arguments (passed to langchain).
kwargs – Additional keyword arguments passed to the model invocation.

Returns:

The generated text content from the LLM response.

Return type:

str

Note

This method uses model.invoke() under the hood, which is a synchronous call. For streaming responses, you would need to modify this method or use the underlying langchain_ollama.OllamaLLM directly.
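One possible way to stream is to call `stream()` on a `langchain_ollama.OllamaLLM` instance directly and collect the chunks yourself. The sketch below assumes the model object yields text chunks from `stream(prompt)`; the helper names `stream_sample` and `demo` are hypothetical, and `demo` needs langchain-ollama, a running Ollama service, and a pulled model.

```python
from typing import Any, Callable, Optional


def stream_sample(model: Any, prompt: str,
                  on_chunk: Optional[Callable[[str], None]] = None) -> str:
    """Consume model.stream(prompt) chunk by chunk and return the joined text."""
    chunks = []
    for chunk in model.stream(prompt):
        text = str(chunk)
        if on_chunk is not None:
            on_chunk(text)  # e.g. print partial output as it arrives
        chunks.append(text)
    return "".join(chunks)


def demo() -> None:
    """Example wiring; requires a running Ollama service (not called here)."""
    from langchain_ollama import OllamaLLM

    model = OllamaLLM(model="llama3.2")
    stream_sample(model, "What is machine learning?",
                  on_chunk=lambda t: print(t, end="", flush=True))
```

The helper still returns the full string at the end, so it can stand in wherever the return value of `draw_sample` is expected.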

Examples#

Example 1: Basic Usage with Qwen Model#

```
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM

# Initialize with a specific model
sampler = LocalOllamaLLM(model_name="qwen3:14b")

# Generate a response
response = sampler.draw_sample("What is machine learning?")
print(response)
```

Example 2: Using Different Ollama Models#

```
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM

# Using Llama 3.2 model
sampler = LocalOllamaLLM(model_name="llama3.2")

# Using Mistral model
sampler = LocalOllamaLLM(model_name="mistral")

# Using a smaller model for faster inference
sampler = LocalOllamaLLM(model_name="llama3.2:1b")
```

Example 3: Customizing Model Parameters#

```
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM

# Initialize with custom parameters
sampler = LocalOllamaLLM(
    model_name="qwen3:14b",
    temperature=0.7,                   # Control randomness (0.0 to 2.0)
    top_p=0.9,                         # Nucleus sampling
    top_k=40,                          # Top-k sampling
    num_ctx=8192,                      # Context window size
    num_thread=8,                      # CPU threads
    base_url="http://localhost:11434"  # Custom endpoint
)

response = sampler.draw_sample("Write a Python function to sort a list.")
print(response)
```

Example 4: Integration with LLM4AD Platform#

```
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM
import llm4ad

# Create LLM sampler with local Ollama
sampler = LocalOllamaLLM(
    model_name="qwen3:14b",
    temperature=0.8
)

# Use with an LLM4AD method
task = llm4ad.tasks.optimization.SymbolicRegression(
    dimension=5,
    num_samples=100
)

method = llm4ad.methods.eoh.EoH(
    task=task,
    sampler=sampler,
    num_iterations=50
)

result = method.run()
print(f"Best solution: {result.best_solution}")
```

Example 5: Multi-Turn Conversation#

```
from llm4ad.tools.llm.local_ollama import LocalOllamaLLM

sampler = LocalOllamaLLM(model_name="llama3.2")

# First prompt
response1 = sampler.draw_sample("What is Python?")
print(f"Response 1: {response1}")

# For multi-turn, you would need to maintain conversation history manually
# since the basic implementation doesn't maintain state
```
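One way to keep history manually is to replay earlier turns inside each new prompt. The sketch below assumes only that the sampler exposes `draw_sample(prompt)` and returns a string; the `Conversation` class and its "User:" / "Assistant:" transcript layout are hypothetical choices for illustration, not an LLM4AD feature, and the best prompt layout may differ per model.

```python
from typing import Any, List, Tuple


class Conversation:
    """Keeps a transcript and replays it in every prompt (illustrative only)."""

    def __init__(self, sampler: Any) -> None:
        self.sampler = sampler
        self.history: List[Tuple[str, str]] = []  # (role, text) pairs

    def build_prompt(self, user_message: str) -> str:
        # Earlier turns first, then the new message, then a cue for the reply.
        lines = [f"{role}: {text}" for role, text in self.history]
        lines.append(f"User: {user_message}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def ask(self, user_message: str) -> str:
        reply = self.sampler.draw_sample(self.build_prompt(user_message))
        # Record both sides so the next prompt includes this exchange.
        self.history.append(("User", user_message))
        self.history.append(("Assistant", reply))
        return reply
```

Usage would look like `chat = Conversation(sampler)` followed by repeated `chat.ask(...)` calls. Because the whole transcript is resent each time, long conversations will eventually exceed the context window (`num_ctx`), so older turns may need to be trimmed.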

Available Ollama Models#

Here are some popular models available on Ollama:

| Model | Size | Description |
| --- | --- | --- |
| llama3.2 | ~2GB | Llama 3.2 3B (default tag) |
| llama3.2:1b | ~1.3GB | Lightweight Llama 3.2 1B |
| qwen3:14b | ~9GB | Qwen 3 14B model |
| qwen3:8b | ~5GB | Qwen 3 8B model |
| mistral | ~4GB | Mistral 7B model |
| codellama | ~3.8GB | Code-focused Llama |
| phi3 | ~2.3GB | Microsoft’s Phi-3 |

For the full list, visit ollama.com/library

Common Issues and Troubleshooting#

Ollama Not Running: Ensure ollama serve is running before using the API.
Model Not Found: Make sure you’ve pulled the model with ollama pull <model_name>.
Out of Memory: Use a smaller model or reduce context size.
Slow Inference: Adjust the num_thread parameter, use GPU acceleration, or switch to a smaller model.
Connection Refused: Verify Ollama is running on the correct port (default: 11434).
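For the "Model Not Found" case, the Ollama HTTP API exposes a `/api/tags` endpoint that lists locally pulled models. The following sketch parses that response and checks for a given model name; the helper names are hypothetical, and the assumption that an untagged name refers to the `latest` tag reflects Ollama's usual naming convention.

```python
import json
import urllib.request
from typing import Any, Dict, List


def model_names_from_tags(payload: Dict[str, Any]) -> List[str]:
    """Extract model names from a decoded /api/tags response."""
    return [entry["name"] for entry in payload.get("models", []) if "name" in entry]


def is_model_pulled(model_name: str, available: List[str]) -> bool:
    """Check a name against the local list, treating 'x' as 'x:latest'."""
    wanted = model_name if ":" in model_name else f"{model_name}:latest"
    return wanted in available


def fetch_local_models(base_url: str = "http://localhost:11434",
                       timeout: float = 5.0) -> List[str]:
    """Query a running Ollama service; raises URLError if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names_from_tags(json.load(response))
```

A quick pre-flight check could then be `is_model_pulled("qwen3:14b", fetch_local_models())`, run before constructing the sampler.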
