Evaluate Module#

Overview#

The evaluate module provides classes for evaluating generated algorithms and code. It includes an abstract base class for defining evaluation logic and a secure evaluator that runs evaluations in isolated processes with timeout protection.

This module is designed to safely execute user-generated code while providing features like:
  • Timeout protection to prevent infinite loops

  • Process isolation for security

  • Numba JIT compilation for performance

  • Protected division to avoid division by zero errors

  • Reproducible random number generation via seeded random states

Evaluation Class#

class llm4ad.base.evaluate.Evaluation#

An abstract base class that defines the interface for evaluating generated algorithms. Users must subclass this and implement the evaluate_program method to define their specific evaluation logic.

The Evaluation class provides configuration options for code modification before execution, including adding Numba decorators, protected division, and random seeding.

Constructor#

llm4ad.base.evaluate.__init__(self, template_program: str | Program, task_description: str = '', use_numba_accelerate: bool = False, use_protected_div: bool = False, protected_div_delta: float = 1e-5, random_seed: int | None = None, timeout_seconds: int | float | None = None, *, exec_code: bool = True, safe_evaluate: bool = True, daemon_eval_process: bool = False, fork_proc: Literal['auto'] | bool = 'auto')#

Initializes the Evaluation instance.

Parameters:
  • template_program (str | Program) – The template program string or Program object that defines the function signature to be evolved.

  • task_description (str) – A description of the task (default: empty string).

  • use_numba_accelerate (bool) – Whether to wrap the function with @numba.jit(nopython=True) for acceleration (default: False).

  • use_protected_div (bool) – Whether to replace division operations with protected division (default: False).

  • protected_div_delta (float) – The delta value used in protected division (default: 1e-5).

  • random_seed (int | None) – If not None, sets the random seed in the first line of the function body (default: None).

  • timeout_seconds (int | float | None) – Maximum time in seconds for evaluation (default: None, meaning no timeout).

  • exec_code (bool) – Whether to use exec() to compile the code and provide a callable function. If False, the callable_func argument in evaluate_program will always be None (default: True).

  • safe_evaluate (bool) – Whether to evaluate in safe mode using a new process. If False, the evaluation will not be terminated after timeout (default: True).

  • daemon_eval_process (bool) – Whether to set the evaluation process as a daemon process. If True, you cannot create new processes in evaluate_program (default: False).

  • fork_proc (Literal['auto'] | bool) – Determines the process creation method when safe_evaluate=True. 'auto' uses the OS-dependent default, True uses 'fork', and False uses 'spawn' (default: 'auto').
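The fork_proc options map naturally onto the start methods of Python's standard multiprocessing module. The following is a minimal sketch of one plausible mapping, not the library's actual code; the helper name make_context is hypothetical.

```python
import multiprocessing


def make_context(fork_proc="auto"):
    """Return a multiprocessing context for the given fork_proc setting.

    Hypothetical helper: 'auto' defers to the platform default,
    True requests 'fork', and False requests 'spawn'.
    """
    if fork_proc == "auto":
        method = None  # None lets multiprocessing pick the OS-dependent default
    elif fork_proc:
        method = "fork"  # only available on POSIX systems
    else:
        method = "spawn"  # available on every platform
    return multiprocessing.get_context(method)


ctx = make_context(False)
print(ctx.get_start_method())  # spawn
```

Note that the standard library does not allow a daemonic process to create child processes, which is consistent with the restriction described for daemon_eval_process=True.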

Example:

from llm4ad.base.code import Program
from llm4ad.base.evaluate import Evaluation

template = '''
import numpy as np

def target_func(arr):
    return 0
'''

class MyEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # Use the callable function for fast evaluation
        test_input = [1, 2, 3, 4, 5]
        result = callable_func(test_input)
        # Return a fitness value (lower is better for minimization)
        return abs(result - 15)  # Target: sum = 15

evaluator = MyEvaluator(
    template_program=template,
    use_protected_div=True,
    random_seed=42
)

Methods#

llm4ad.base.evaluate.evaluate_program(self, program_str: str, callable_func: callable, **kwargs) → Any | None#

Abstract method that must be implemented by subclasses to define the evaluation logic.

This method is called with both the program string and a compiled callable function. The callable function is available when exec_code=True in the constructor.

Parameters:
  • program_str (str) – The modified function code as a string. The code may include added imports, numba decorators, protected division, and random seeding depending on configuration.

  • callable_func (callable | None) – The compiled callable heuristic function. Can be called using callable_func(*args, **kwargs). This is None if exec_code=False.

  • kwargs – Additional keyword arguments passed from the evaluator.
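To illustrate how a program string can become the callable_func described above, here is a minimal sketch built on Python's exec(). It assumes the function is looked up by name in the executed namespace; the helper compile_function is hypothetical and is not part of llm4ad.

```python
def compile_function(program_str, function_name, exec_code=True):
    """Execute program_str and return the named function, or None.

    Hypothetical helper: mirrors the documented behavior that the
    callable is None when exec_code=False.
    """
    if not exec_code:
        return None
    namespace = {}
    exec(program_str, namespace)  # runs arbitrary code: only use in a sandbox
    return namespace[function_name]


candidate = '''
def square(x):
    return x * x
'''

func = compile_function(candidate, "square")
print(func(5))  # 25
print(compile_function(candidate, "square", exec_code=False))  # None
```

An evaluate_program implementation that must also work with exec_code=False should therefore check for callable_func being None and fall back to program_str.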

Returns:

The fitness/value result of the evaluation.

Return type:

Any | None

Raises:

NotImplementedError – This method must be implemented by subclasses.

Code Modification Example:

When use_numba_accelerate=True, use_protected_div=True, and random_seed=2024, the input program_str will be transformed from:

import numpy as np

def f(a, b):
    a = np.random.random()
    return a / b

To:

import numpy as np
import numba

@numba.jit(nopython=True)
def f(a, b):
    np.random.seed(2024)
    a = np.random.random()
    return _protected_div(a, b)

def _protected_div(a, b, delta=1e-5):
    return a / (b + delta)
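The division rewrite shown above can be sketched with the standard ast module. This is an illustrative approach under stated assumptions, not necessarily how llm4ad implements it; the names _DivToProtected and add_protected_div are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast

# Helper appended to the transformed program (same form as in the example above).
PROTECTED_DIV_SRC = '''
def _protected_div(a, b, delta=1e-5):
    return a / (b + delta)
'''


class _DivToProtected(ast.NodeTransformer):
    """Replace every `x / y` expression with `_protected_div(x, y)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested divisions first
        if isinstance(node.op, ast.Div):
            return ast.Call(
                func=ast.Name(id="_protected_div", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
        return node


def add_protected_div(program_str):
    """Return program_str with divisions rewritten and the helper appended."""
    tree = _DivToProtected().visit(ast.parse(program_str))
    ast.fix_missing_locations(tree)
    # The helper is appended as text so its own division is left untouched.
    return ast.unparse(tree) + "\n" + PROTECTED_DIV_SRC


source = "def f(a, b):\n    return a / b\n"
modified = add_protected_div(source)
print(modified)
```

This sketch does not handle augmented assignment (`a /= b`). Also note that adding delta to the denominator only shifts the singularity: a denominator equal to -delta still raises ZeroDivisionError.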

SecureEvaluator Class#

class llm4ad.base.evaluate.SecureEvaluator#

A wrapper class that provides secure evaluation of generated programs. It runs evaluations in a separate process with timeout protection and error handling.

The SecureEvaluator handles code modification (adding numba decorators, protected division, random seeds) before execution and provides both synchronous and timing-enabled evaluation methods.

Constructor#

llm4ad.base.evaluate.__init__(self, evaluator: Evaluation, debug_mode=False, **kwargs)#

Initializes the SecureEvaluator.

Parameters:
  • evaluator (Evaluation) – The Evaluation instance to wrap.

  • debug_mode (bool) – If True, prints debug information including evaluated program code and errors (default: False).

  • kwargs – Additional keyword arguments (passed to parent).

Example:

from llm4ad.base.code import Program
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

template = '''
import numpy as np

def objective(x):
    return 0
'''

class MyEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # Test with multiple inputs
        test_cases = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
        total_error = 0
        for test in test_cases:
            result = callable_func(test)
            total_error += abs(result - sum(test))
        return total_error

evaluator = MyEvaluator(
    template_program=template,
    use_protected_div=True,
    timeout_seconds=10
)
secure_eval = SecureEvaluator(evaluator, debug_mode=False)

Methods#

llm4ad.base.evaluate.evaluate_program(self, program: str | Program, **kwargs)#

Evaluates a program in a secure manner with timeout and error handling.

This method:
  1. Converts the program to a string if necessary

  2. Extracts the function name before modification

  3. Applies code modifications (numba, protected division, random seed)

  4. Executes the program in a safe process (if safe_evaluate=True)

  5. Returns the evaluation result or None on timeout/error

Parameters:
  • program (str | Program) – The program to evaluate, as a string or Program object.

  • kwargs – Additional keyword arguments passed to the evaluator’s evaluate_program method.

Returns:

The evaluation result, or None if evaluation fails, times out, or encounters an error.

Return type:

Any | None

Example:

import numpy as np
from llm4ad.base.code import Program, TextFunctionProgramConverter
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

template = '''
import numpy as np

def sort_array(arr):
    return arr
'''

class ArrayEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        test_arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
        result = callable_func(test_arr)
        expected = np.sort(test_arr)
        return -np.sum(np.abs(result - expected))  # Negative because we maximize

evaluator = ArrayEvaluator(
    template_program=template,
    timeout_seconds=5
)
secure_eval = SecureEvaluator(evaluator)

# Evaluate a candidate program
candidate = '''
import numpy as np

def sort_array(arr):
    return np.sort(arr)
'''

result = secure_eval.evaluate_program(candidate)
print(f"Evaluation result: {result}")

llm4ad.base.evaluate.evaluate_program_record_time(self, program: str | Program, **kwargs)#

Evaluates a program and records the time taken for evaluation.

Parameters:
  • program (str | Program) – The program to evaluate, as a string or Program object.

  • kwargs – Additional keyword arguments passed to the evaluator’s evaluate_program method.

Returns:

A tuple of (evaluation result, evaluation time in seconds).

Return type:

tuple[Any | None, float]
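The (result, time) return shape can be illustrated with a small timing wrapper. This is a sketch only: the helper record_time is hypothetical, and the choice of time.perf_counter() is an assumption; the library may use a different clock.

```python
import time


def record_time(evaluate, *args, **kwargs):
    """Call evaluate(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = evaluate(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def square(x):
    return x * x


result, elapsed = record_time(square, 5)
print(f"Result: {result}, Time: {elapsed:.6f}s")
```

The elapsed time is measured even when the wrapped call returns None, so a failed or timed-out evaluation still reports how long it took.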

Example:

from llm4ad.base.evaluate import Evaluation, SecureEvaluator

template = '''
def square(x):
    return 0
'''

class SimpleEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        return callable_func(5)

evaluator = SimpleEvaluator(template_program=template)
secure_eval = SecureEvaluator(evaluator)

candidate = '''
def square(x):
    return x * x
'''

result, eval_time = secure_eval.evaluate_program_record_time(candidate)
print(f"Result: {result}, Time: {eval_time:.4f}s")

llm4ad.base.evaluate._modify_program_code(self, program_str: str) → str#

Applies code modifications to the program string.

This internal method applies transformations based on the Evaluation configuration:
  • Adds Numba JIT decorator if enabled

  • Replaces division with protected division if enabled

  • Adds numpy random seed if specified

Parameters:

program_str (str) – The original program string.

Returns:

The modified program string.

Return type:

str
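The decorator and seed transformations listed above could be done on the syntax tree, as in the following sketch. It is an illustration under stated assumptions rather than the library's code: the helper add_numba_and_seed is hypothetical, it only edits the source text (numba does not need to be installed to run it), and ast.unparse requires Python 3.9 or later.

```python
import ast


def add_numba_and_seed(program_str, function_name, seed=None, use_numba=False):
    """Return program_str with an optional seed call and numba decorator added."""
    tree = ast.parse(program_str)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            if seed is not None:
                seed_stmt = ast.parse(f"np.random.seed({seed})").body[0]
                # Design choice of this sketch: keep a leading docstring first
                # so it remains the function's docstring.
                position = 1 if ast.get_docstring(node) is not None else 0
                node.body.insert(position, seed_stmt)
            if use_numba:
                decorator = ast.parse("numba.jit(nopython=True)", mode="eval").body
                node.decorator_list.append(decorator)
    if use_numba:
        tree.body.insert(0, ast.parse("import numba").body[0])
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


SRC = '''
import numpy as np

def f(a, b):
    a = np.random.random()
    return a / b
'''

print(add_numba_and_seed(SRC, "f", seed=2024, use_numba=True))
```

Working on the tree instead of raw text avoids fragile string matching, at the cost of losing the original comments and formatting when the code is unparsed.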

llm4ad.base.evaluate._evaluate_in_safe_process(self, program_str: str, function_name, result_queue: multiprocessing.Queue, **kwargs)#

Internal method that executes evaluation in a separate process.

Parameters:
  • program_str (str) – The program code to execute.

  • function_name – The name of the function to call.

  • result_queue (multiprocessing.Queue) – Queue to put the result in.

  • kwargs – Additional keyword arguments.

llm4ad.base.evaluate._evaluate(self, program_str: str, function_name, **kwargs)#

Internal method that executes evaluation without process isolation.

Parameters:
  • program_str (str) – The program code to execute.

  • function_name – The name of the function to call.

  • kwargs – Additional keyword arguments.

Complete Example: Custom Evaluation#

import numpy as np
from llm4ad.base.code import Program, TextFunctionProgramConverter
from llm4ad.base.evaluate import Evaluation, SecureEvaluator


# Define the template program
TEMPLATE = '''
import numpy as np

def objective(x: np.ndarray) -> float:
    """Minimize the sum of squares."""
    return 0.0
'''


class SquaredErrorEvaluator(Evaluation):
    """Evaluator that minimizes squared error against target sum."""

    def __init__(self, *args, target_sum=10.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.target_sum = target_sum

    def evaluate_program(self, program_str, callable_func, **kwargs):
        """Evaluate the function on test cases."""
        # Test with multiple random arrays
        errors = []
        for _ in range(5):
            test_arr = np.random.rand(10)
            result = callable_func(test_arr)
            expected = self.target_sum
            error = (result - expected) ** 2
            errors.append(error)
        return np.mean(errors)  # Return mean squared error


# Create evaluator with custom settings
evaluator = SquaredErrorEvaluator(
    template_program=TEMPLATE,
    task_description="Minimize squared error",
    use_numba_accelerate=True,        # Speed up with numba
    use_protected_div=True,           # Handle division by zero
    protected_div_delta=1e-8,         # Small delta for precision
    random_seed=42,                   # Reproducible results
    timeout_seconds=10,               # 10 second timeout
    safe_evaluate=True                # Run in separate process
)

# Wrap with SecureEvaluator
secure_eval = SecureEvaluator(evaluator, debug_mode=False)

# Candidate solution that computes sum
candidate_solution = '''
import numpy as np

def objective(x: np.ndarray) -> float:
    return np.sum(x)
'''

# Evaluate
result, eval_time = secure_eval.evaluate_program_record_time(candidate_solution)
print(f"Evaluation result: {result}")
print(f"Evaluation time: {eval_time:.4f}s")

Complete Example: Handling Timeout#

from llm4ad.base.code import TextFunctionProgramConverter
from llm4ad.base.evaluate import Evaluation, SecureEvaluator


TEMPLATE = '''
def infinite_loop(x):
    return 0
'''


class TimeoutEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # This would run forever without timeout
        return callable_func(1)


evaluator = TimeoutEvaluator(
    template_program=TEMPLATE,
    timeout_seconds=2,  # 2 second timeout
    safe_evaluate=True
)

secure_eval = SecureEvaluator(evaluator, debug_mode=True)

# This program has an intentional infinite loop
bad_candidate = '''
def infinite_loop(x):
    while True:
        x = x + 1
    return x
'''

result = secure_eval.evaluate_program(bad_candidate)
# Result will be None due to timeout
print(f"Result (None due to timeout): {result}")