Evaluate Module#
Overview#
The evaluate module provides classes for evaluating generated algorithms and code. It includes an abstract base class for defining evaluation logic and a secure evaluator that runs evaluations in isolated processes with timeout protection.
This module is designed to safely execute user-generated code while providing features like:
- Timeout protection to prevent infinite loops
- Process isolation for security
- Numba JIT compilation for performance
- Protected division to avoid division by zero errors
- Reproducible random number generation via seeded random states
Evaluation Class#
- class llm4ad.base.evaluate.Evaluation#
An abstract base class that defines the interface for evaluating generated algorithms. Users must subclass this and implement the evaluate_program method to define their specific evaluation logic.
The Evaluation class provides configuration options for code modification before execution, including adding Numba decorators, protected division, and random seeding.
Constructor#
- llm4ad.base.evaluate.__init__(self, template_program: str | Program, task_description: str = '', use_numba_accelerate: bool = False, use_protected_div: bool = False, protected_div_delta: float = 1e-5, random_seed: int | None = None, timeout_seconds: int | float = None, *, exec_code: bool = True, safe_evaluate: bool = True, daemon_eval_process: bool = False, fork_proc: Literal['auto'] | bool = 'auto')#
Initializes the Evaluation instance.
- Parameters:
template_program (str | Program) – The template program string or Program object that defines the function signature to be evolved.
task_description (str) – A description of the task (default: empty string).
use_numba_accelerate (bool) – Whether to wrap the function with @numba.jit(nopython=True) for acceleration (default: False).
use_protected_div (bool) – Whether to replace division operations with protected division (default: False).
protected_div_delta (float) – The delta value used in protected division (default: 1e-5).
random_seed (int | None) – If not None, sets the random seed in the first line of the function body (default: None).
timeout_seconds (int | float | None) – Maximum time in seconds for evaluation (default: None, meaning no timeout).
exec_code (bool) – Whether to use exec() to compile the code and provide a callable function. If False, the callable_func argument in evaluate_program will always be None (default: True).
safe_evaluate (bool) – Whether to evaluate in safe mode using a new process. If False, the evaluation will not be terminated after timeout (default: True).
daemon_eval_process (bool) – Whether to set the evaluation process as a daemon process. If True, you cannot create new processes in evaluate_program (default: False).
fork_proc (Literal['auto'] | bool) – Determines process creation method when safe_evaluate=True. ‘auto’ uses OS-dependent default, True uses ‘fork’, False uses ‘spawn’ (default: ‘auto’).
Example:
from llm4ad.base.code import Program
from llm4ad.base.evaluate import Evaluation

template = '''
import numpy as np

def target_func(arr):
    return 0
'''

class MyEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # Use the callable function for fast evaluation
        test_input = [1, 2, 3, 4, 5]
        result = callable_func(test_input)
        # Return a fitness value (lower is better for minimization)
        return abs(result - 15)  # Target: sum = 15

evaluator = MyEvaluator(
    template_program=template,
    use_protected_div=True,
    random_seed=42
)
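The fork_proc parameter described above maps onto the start methods of Python's multiprocessing module. The following is a minimal sketch of that mapping, not the library's own code: the helper name start_method_for is hypothetical, and returning None for 'auto' (so that multiprocessing picks the platform default) is an assumption about what "OS-dependent default" means.

```python
import multiprocessing


def start_method_for(fork_proc="auto"):
    """Translate a fork_proc-style flag into a multiprocessing start method name.

    Hypothetical helper for illustration only.
    """
    if fork_proc == "auto":
        # None tells multiprocessing.get_context() to use the platform default.
        return None
    return "fork" if fork_proc else "spawn"


# 'spawn' is available on every platform, so this context can always be built.
ctx = multiprocessing.get_context(start_method_for(False))
print(ctx.get_start_method())
```

Note that 'fork' is not available on Windows, so passing fork_proc=True there would presumably fail when the process is created.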
Methods#
- llm4ad.base.evaluate.evaluate_program(self, program_str: str, callable_func: callable, **kwargs) → Any | None#
Abstract method that must be implemented by subclasses to define the evaluation logic.
This method is called with both the program string and a compiled callable function. The callable function is available when exec_code=True in the constructor.
- Parameters:
program_str (str) – The modified function code as a string. The code may include added imports, numba decorators, protected division, and random seeding depending on configuration.
callable_func (callable | None) – The compiled callable heuristic function. Can be called using callable_func(*args, **kwargs). This is None if exec_code=False.
kwargs – Additional keyword arguments passed from the evaluator.
- Returns:
The fitness/value result of the evaluation.
- Return type:
Any | None
- Raises:
NotImplementedError – This method must be implemented by subclasses.
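When exec_code=False, callable_func is None and the evaluator receives only program_str. The sketch below shows one way to turn such a string into a callable with Python's built-in exec(). The helper name compile_function is hypothetical, and this is not necessarily how the library builds callable_func.

```python
def compile_function(program_str: str, function_name: str):
    """Execute program_str in a fresh namespace and return the named function.

    Hypothetical helper; exec() runs arbitrary code, so untrusted programs
    should only be compiled inside an isolated process.
    """
    namespace = {}
    exec(program_str, namespace)
    return namespace[function_name]


candidate = '''
def square(x):
    return x * x
'''

square = compile_function(candidate, "square")
print(square(5))  # prints 25
```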
Code Modification Example:
When use_numba_accelerate=True, use_protected_div=True, and random_seed=2024, the input program_str will be transformed from:
import numpy as np

def f(a, b):
    a = np.random.random()
    return a / b
To:
import numpy as np
import numba

@numba.jit(nopython=True)
def f(a, b):
    np.random.seed(2024)
    a = np.random.random()
    return _protected_div(a, b)

def _protected_div(a, b, delta=1e-5):
    return a / (b + delta)
SecureEvaluator Class#
- class llm4ad.base.evaluate.SecureEvaluator#
A wrapper class that provides secure evaluation of generated programs. It runs evaluations in a separate process with timeout protection and error handling.
The SecureEvaluator handles code modification (adding numba decorators, protected division, random seeds) before execution and provides both synchronous and timing-enabled evaluation methods.
Constructor#
- llm4ad.base.evaluate.__init__(self, evaluator: Evaluation, debug_mode=False, **kwargs)#
Initializes the SecureEvaluator.
- Parameters:
evaluator (Evaluation) – The Evaluation instance to wrap.
debug_mode (bool) – If True, prints debug information including evaluated program code and errors (default: False).
kwargs – Additional keyword arguments (passed to parent).
Example:
from llm4ad.base.code import Program
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

template = '''
import numpy as np

def objective(x):
    return 0
'''

class MyEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # Test with multiple inputs
        test_cases = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
        total_error = 0
        for test in test_cases:
            result = callable_func(test)
            total_error += abs(result - sum(test))
        return total_error

evaluator = MyEvaluator(
    template_program=template,
    use_protected_div=True,
    timeout_seconds=10
)
secure_eval = SecureEvaluator(evaluator, debug_mode=False)
Methods#
- llm4ad.base.evaluate.evaluate_program(self, program: str | Program, **kwargs)#
Evaluates a program in a secure manner with timeout and error handling.
This method:
1. Converts the program to a string if necessary
2. Extracts the function name before modification
3. Applies code modifications (numba, protected division, random seed)
4. Executes the program in a safe process (if safe_evaluate=True)
5. Returns the evaluation result or None on timeout/error
- Parameters:
program (str | Program) – The program to evaluate, as a string or Program object.
kwargs – Additional keyword arguments passed to the evaluator’s evaluate_program method.
- Returns:
The evaluation result, or None if evaluation fails, times out, or encounters an error.
- Return type:
Any | None
Example:
import numpy as np
from llm4ad.base.code import Program, TextFunctionProgramConverter
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

template = '''
import numpy as np

def sort_array(arr):
    return arr
'''

class ArrayEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # numpy is imported at module level because this method uses np directly
        test_arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
        result = callable_func(test_arr)
        expected = np.sort(test_arr)
        return -np.sum(np.abs(result - expected))  # Negative because we maximize

evaluator = ArrayEvaluator(
    template_program=template,
    timeout_seconds=5
)
secure_eval = SecureEvaluator(evaluator)

# Evaluate a candidate program
candidate = '''
import numpy as np

def sort_array(arr):
    return np.sort(arr)
'''
result = secure_eval.evaluate_program(candidate)
print(f"Evaluation result: {result}")
- llm4ad.base.evaluate.evaluate_program_record_time(self, program: str | Program, **kwargs)#
Evaluates a program and records the time taken for evaluation.
- Parameters:
program (str | Program) – The program to evaluate, as a string or Program object.
kwargs – Additional keyword arguments passed to the evaluator’s evaluate_program method.
- Returns:
A tuple of (evaluation result, evaluation time in seconds).
- Return type:
tuple[Any | None, float]
Example:
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

template = '''
def square(x):
    return 0
'''

class SimpleEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        return callable_func(5)

evaluator = SimpleEvaluator(template_program=template)
secure_eval = SecureEvaluator(evaluator)

candidate = '''
def square(x):
    return x * x
'''
result, eval_time = secure_eval.evaluate_program_record_time(candidate)
print(f"Result: {result}, Time: {eval_time:.4f}s")
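The (result, time) return shape of evaluate_program_record_time can be pictured as a thin timing wrapper around an evaluation call. The following sketch is an illustration under that assumption; record_time is a hypothetical helper, and which clock the library actually uses is not stated in this documentation.

```python
import time


def record_time(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds).

    Hypothetical helper; perf_counter is chosen here because it is a
    monotonic clock suited to measuring short durations.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


value, seconds = record_time(pow, 2, 10)
print(f"Result: {value}, Time: {seconds:.6f}s")
```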
- llm4ad.base.evaluate._modify_program_code(self, program_str: str) → str#
Applies code modifications to the program string.
This internal method applies transformations based on the Evaluation configuration:
- Adds Numba JIT decorator if enabled
- Replaces division with protected division if enabled
- Adds numpy random seed if specified
- Parameters:
program_str (str) – The original program string.
- Returns:
The modified program string.
- Return type:
str
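As an illustration of one such transformation, the sketch below inserts a numpy seeding call as the first statement of a named function using Python's ast module. This is an assumption about how such a rewrite could be done, not the library's implementation. The helper name add_seed_to_function is hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast


def add_seed_to_function(program_str: str, function_name: str, seed: int) -> str:
    """Insert np.random.seed(seed) as the first statement of function_name.

    Hypothetical helper; assumes the program already imports numpy as np.
    """
    tree = ast.parse(program_str)
    seed_stmt = ast.parse(f"np.random.seed({seed})").body[0]
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            node.body.insert(0, seed_stmt)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


source = '''
import numpy as np

def f(a, b):
    a = np.random.random()
    return a / b
'''

modified = add_seed_to_function(source, "f", 2024)
print(modified)
```

Working on the syntax tree rather than on raw text avoids mistakes with indentation and multi-line signatures, at the cost of discarding comments and original formatting when the code is unparsed.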
- llm4ad.base.evaluate._evaluate_in_safe_process(self, program_str: str, function_name, result_queue: multiprocessing.Queue, **kwargs)#
Internal method that executes evaluation in a separate process.
- Parameters:
program_str (str) – The program code to execute.
function_name – The name of the function to call.
result_queue (multiprocessing.Queue) – Queue to put the result in.
kwargs – Additional keyword arguments.
- llm4ad.base.evaluate._evaluate(self, program_str: str, function_name, **kwargs)#
Internal method that executes evaluation without process isolation.
- Parameters:
program_str (str) – The program code to execute.
function_name – The name of the function to call.
kwargs – Additional keyword arguments.
Complete Example: Custom Evaluation#
import numpy as np
from llm4ad.base.code import Program, TextFunctionProgramConverter
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

# Define the template program
TEMPLATE = '''
import numpy as np

def objective(x: np.ndarray) -> float:
    """Minimize the sum of squares."""
    return 0.0
'''

class SquaredErrorEvaluator(Evaluation):
    """Evaluator that minimizes squared error against target sum."""

    def __init__(self, *args, target_sum=10.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.target_sum = target_sum

    def evaluate_program(self, program_str, callable_func, **kwargs):
        """Evaluate the function on test cases."""
        # Test with multiple random arrays
        errors = []
        for _ in range(5):
            test_arr = np.random.rand(10)
            result = callable_func(test_arr)
            expected = self.target_sum
            error = (result - expected) ** 2
            errors.append(error)
        return np.mean(errors)  # Return mean squared error

# Create evaluator with custom settings
evaluator = SquaredErrorEvaluator(
    template_program=TEMPLATE,
    task_description="Minimize squared error",
    use_numba_accelerate=True,   # Speed up with numba
    use_protected_div=True,      # Handle division by zero
    protected_div_delta=1e-8,    # Small delta for precision
    random_seed=42,              # Reproducible results
    timeout_seconds=10,          # 10 second timeout
    safe_evaluate=True           # Run in separate process
)

# Wrap with SecureEvaluator
secure_eval = SecureEvaluator(evaluator, debug_mode=False)

# Candidate solution that computes sum
candidate_solution = '''
import numpy as np

def objective(x: np.ndarray) -> float:
    return np.sum(x)
'''

# Evaluate
result, eval_time = secure_eval.evaluate_program_record_time(candidate_solution)
print(f"Evaluation result: {result}")
print(f"Evaluation time: {eval_time:.4f}s")
Complete Example: Handling Timeout#
from llm4ad.base.code import TextFunctionProgramConverter
from llm4ad.base.evaluate import Evaluation, SecureEvaluator

TEMPLATE = '''
def infinite_loop(x):
    return 0
'''

class TimeoutEvaluator(Evaluation):
    def evaluate_program(self, program_str, callable_func, **kwargs):
        # This would run forever without timeout
        return callable_func(1)

evaluator = TimeoutEvaluator(
    template_program=TEMPLATE,
    timeout_seconds=2,  # 2 second timeout
    safe_evaluate=True
)
secure_eval = SecureEvaluator(evaluator, debug_mode=True)

# This program has an intentional infinite loop
bad_candidate = '''
def infinite_loop(x):
    while True:
        x = x + 1
    return x
'''

result = secure_eval.evaluate_program(bad_candidate)
# Result will be None due to timeout
print(f"Result (None due to timeout): {result}")
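Because a timed-out or failed evaluation yields None, calling code usually has to convert that into something comparable before ranking candidates. The sketch below shows one possible convention. The helper fitness_or_penalty is hypothetical, and the default penalty of positive infinity assumes a minimization setting; for maximization, negative infinity would play the same role.

```python
def fitness_or_penalty(result, penalty=float("inf")):
    """Map a failed evaluation (None) to a penalty score.

    Hypothetical helper; the default assumes lower scores are better.
    """
    return penalty if result is None else result


# Illustrative raw results: the second candidate timed out.
raw_results = [0.5, None, 2.0]
scores = [fitness_or_penalty(r) for r in raw_results]
best = min(scores)
print(best)  # prints 0.5
```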