Evaluation and SecureEvaluator
This tutorial demonstrates how to evaluate a function with a user-specified evaluator. The evaluation runs inside a SecureEvaluator, which guards against “very bad code” (e.g., code that loops endlessly, raises unexpected exceptions, consumes too much memory, leaves an unkilled subprocess behind, …).
Evaluation class
The Evaluation class (an abstract class) is the user-facing interface. The user should define a child class that extends Evaluation.
Initialization of the Evaluation class
By passing the respective arguments to the Evaluation constructor, the user can specify whether to use numba acceleration, whether to use protected division, and the timeout (in seconds) for code execution. Details about all arguments can be found in the base_package/evaluate section of this doc.
Implementation of the evaluate_program function
The user should override the evaluate_program function of the Evaluation class (the base class leaves evaluate_program unimplemented). The evaluate_program function evaluates the algorithm and returns its score. If the user considers the algorithm infeasible/invalid/illegal, the function should return None. Otherwise, it should return an int/float value, or any value that is comparable (i.e., one that implements the > operator between instances).
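To illustrate what a "comparable" return value might look like, here is a minimal sketch using only the standard library. The `Score` class, its fields, and the `best_candidate` helper are hypothetical names invented for this example; they are not part of llm4ad.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(order=True)
class Score:
    # Hypothetical composite score. With order=True, instances are compared
    # field by field in declaration order, so >, <, etc. are defined.
    accuracy: float
    neg_code_length: int  # negated so that shorter code ranks higher on ties


def best_candidate(scores: Dict[str, Optional[Score]]) -> Optional[str]:
    # Candidates judged invalid are represented by None and are skipped.
    valid = {name: s for name, s in scores.items() if s is not None}
    if not valid:
        return None
    return max(valid, key=valid.get)


scores = {
    "candidate_a": Score(accuracy=0.90, neg_code_length=-120),
    "candidate_b": Score(accuracy=0.90, neg_code_length=-80),
    "candidate_c": None,  # infeasible/invalid algorithm
}
print(best_candidate(scores))  # candidate_b
```

A plain int or float is usually sufficient; a composite value like this is only worth it when ties between candidates need a well-defined order.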
The first argument of the function is program_str, the algorithm to be evaluated as a str. If you set use_numba_accelerate or similar settings to True in the initialization, you will receive the modified function as a str. This str is provided to let you:
Compile and execute the code according to your own requirements.
Take the length or other features of the code into consideration.
Support other uses, such as calculating the “novelty” of the code or checking whether the code has been evaluated before.
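The uses of program_str listed above can be sketched with the standard library alone. The helper names `program_key` and `length_penalty`, and the `weight` parameter, are assumptions made for this illustration and do not come from llm4ad.

```python
import ast
import hashlib


def program_key(program_str: str) -> str:
    # Hash the parsed syntax tree rather than the raw text, so that comments
    # and blank lines do not change the key. ast.parse raises SyntaxError
    # for invalid code, which the caller may treat as an invalid program.
    tree = ast.parse(program_str)
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def length_penalty(program_str: str, weight: float = 0.001) -> float:
    # Penalize longer programs: count only non-empty lines.
    n_lines = sum(1 for line in program_str.splitlines() if line.strip())
    return weight * n_lines


seen_keys = set()  # keys of programs that were already evaluated

for candidate in ("def f():\n    return 1  # first\n", "def f():\n\n    return 1\n"):
    key = program_key(candidate)
    print("already evaluated" if key in seen_keys else "new program")
    seen_keys.add(key)
```

The second candidate differs from the first only in a comment and a blank line, so it maps to the same key and is reported as already evaluated.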
The second argument of the function is callable_func, which is an executable object. You can call (invoke) it simply by passing arguments to it, e.g. callable_func(arg0, arg1).
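As a rough illustration of scoring through callable_func, the sketch below is a standalone function that mirrors the method's signature. The test cases and the `add` program are made up for this example, and the `exec` call is only a stand-in for however the framework actually produces the callable.

```python
from typing import Callable, Optional


def evaluate_program(program_str: str, callable_func: Callable) -> Optional[float]:
    # Hypothetical test cases: (arguments, expected result).
    test_cases = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]
    try:
        passed = sum(
            1 for args, expected in test_cases if callable_func(*args) == expected
        )
    except Exception:
        # Treat a program that crashes on any input as invalid.
        return None
    # Score is the fraction of test cases answered correctly.
    return passed / len(test_cases)


# Stand-in for the framework: turn the program text into a callable.
program_str = "def add(x, y):\n    return x + y\n"
namespace = {}
exec(program_str, namespace)

print(evaluate_program(program_str, namespace["add"]))  # 1.0
```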
SecureEvaluator class
This class performs secure evaluation based on the user-specified Evaluation instance. This tutorial shows a few examples of its features.
Tutorials
Below are examples of how to use these classes.
[1]:
from __future__ import annotations
from typing import Any
from llm4ad.base import Evaluation, SecureEvaluator
The user should subclass the ‘llm4ad.base.Evaluation’ class and override the ‘evaluate_program’ function.
[2]:
class MyEvaluator(Evaluation):
    def __init__(self):
        super().__init__(
            use_numba_accelerate=True,  # try setting this to False and re-running
            use_protected_div=True,  # avoid division by zero
            timeout_seconds=5,
            template_program=''
        )

    # The user should override this function.
    def evaluate_program(self, program_str: str, callable_func: callable, **kwargs) -> Any | None:
        # We use a "dummy evaluation" for the function:
        # call (invoke) the function and take its return value as the score.
        score = callable_func()
        return score
We create an evaluator instance and wrap it in a SecureEvaluator so that we can perform a secure evaluation. We also enable debug mode to display the function being evaluated.
[3]:
evaluator = SecureEvaluator(evaluator=MyEvaluator(), debug_mode=True)
Here we prepare a simple demo algorithm to be evaluated (as a str).
[4]:
program = """
import random
def f():
    return random.random() / random.random()
"""
Invoke the evaluate_program function to evaluate the program. Note that since use_numba_accelerate=True is set in MyEvaluator, the evaluated program is expected to be wrapped with a @numba.jit() decorator, as the debug output shows.
[5]:
# Note that the following code should be placed under if __name__ == '__main__'
if __name__ == '__main__':
    score = evaluator.evaluate_program(program)
    print(score)
DEBUG: evaluated program:
import numba
import random

@numba.jit(nopython=True)
def f():
    return _protected_div(random.random(), random.random())

@numba.jit(nopython=True)
def _protected_div(x, y, delta=1e-05):
    return x / (y + delta)

0.755131510901752
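The debug output shows that the `/` operator was replaced by a call to `_protected_div`. How llm4ad performs this rewrite is not described here; the following is only a sketch of one way such a rewrite could be done with the standard `ast` module (Python 3.9+ for `ast.unparse`). The `ProtectedDivRewriter` class and `rewrite_division` helper are hypothetical names; `_protected_div` is copied from the debug output, without the numba decorator.

```python
import ast


def _protected_div(x, y, delta=1e-05):
    # Same body as in the debug output above.
    return x / (y + delta)


class ProtectedDivRewriter(ast.NodeTransformer):
    # Hypothetical transformer: replaces every `a / b` with `_protected_div(a, b)`.
    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested divisions first
        if isinstance(node.op, ast.Div):
            call = ast.Call(
                func=ast.Name(id="_protected_div", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def rewrite_division(program_str: str) -> str:
    tree = ProtectedDivRewriter().visit(ast.parse(program_str))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


source = "def f(a, b):\n    return a / b\n"
rewritten = rewrite_division(source)
print(rewritten)

# Execute the rewritten program; dividing by zero no longer raises.
namespace = {"_protected_div": _protected_div}
exec(rewritten, namespace)
print(namespace["f"](1.0, 0.0))
```

Note that a helper of the form x / (y + delta) still divides by zero when y equals -delta, so it reduces the risk of a ZeroDivisionError rather than eliminating it.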
Suppose we have obtained a program that contains a while True loop. Let’s see whether the secure evaluator can terminate the evaluation after the timeout_seconds specified by the user in the MyEvaluator class.
[6]:
program = """
import random
def f():
    while True:
        pass
"""
Evaluate the program. The debug information shows that the evaluation exceeds 5 seconds and is therefore terminated.
[7]:
# Note that the following code should be placed under if __name__ == '__main__'
if __name__ == '__main__':
    score = evaluator.evaluate_program(program)
    print(score)
DEBUG: evaluated program:
import numba
import random

@numba.jit(nopython=True)
def f():
    while True:
        pass

@numba.jit(nopython=True)
def _protected_div(x, y, delta=1e-05):
    return x / (y + delta)

DEBUG: the evaluation time exceeds 5s.
None
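The timeout behavior observed above can be approximated with the standard library. The sketch below is not how SecureEvaluator is implemented (this tutorial does not describe its internals); it only illustrates the general idea of running untrusted code in a separate process and returning None when it hangs or crashes. The `run_with_timeout` helper and its parameters are hypothetical.

```python
import json
import subprocess
import sys


def run_with_timeout(program_str: str, func_name: str, timeout_seconds: float):
    # Hypothetical helper: run the program in a fresh Python interpreter,
    # call `func_name()` there, and read its JSON-encoded result from stdout.
    child_script = program_str + f"\nimport json\nprint(json.dumps({func_name}()))\n"
    try:
        completed = subprocess.run(
            [sys.executable, "-c", child_script],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so no process is left behind.
        return None
    if completed.returncode != 0:
        # The program raised an exception or crashed.
        return None
    try:
        return json.loads(completed.stdout.strip().splitlines()[-1])
    except (IndexError, json.JSONDecodeError):
        # The program produced no usable result.
        return None


good_program = "def f():\n    return 1.5\n"
endless_program = "def f():\n    while True:\n        pass\n"

print(run_with_timeout(good_program, "f", timeout_seconds=10))
print(run_with_timeout(endless_program, "f", timeout_seconds=1))
```

Running the candidate in its own process is what makes a hard timeout possible: an endless loop inside the current interpreter cannot be interrupted reliably, whereas a child process can simply be killed.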