LLM and SampleTrimmer#

This tutorial describes two important classes for processing LLM-generated code (known as “samples”). The LLM class defines how to access the LLM, while the SampleTrimmer class trims the unneeded parts of the code using Python's abstract syntax tree (ast) package.

LLM class#

The LLM class defines how to access the LLM. You can either deploy an LLM locally on your own device or server, or use an LLM API. In either case, create a child class of the LLM class (extend LLM) and implement (override) the draw_sample function.

Initialization of the user-defined LLM class#

The LLM class has a keyword argument auto_trim, whose default value is True. With this setting, whether you choose a code completion model (such as StarCoder or CodeLlama-Python) or a chat model (such as the GPT or Llama series), the “useful part” of the output is identified automatically, and surrounding descriptions and truncated code are discarded. Unless you have a specific reason to change it, leave it at the default.

Implementation of the draw_sample function#

The draw_sample function decides how the generated content is obtained from the LLM and returns it as a str. Feel free to return the LLM's answer as is, even if it contains some useless descriptions, as these will be trimmed automatically by our trimmer. Here, we show a brief example of using an LLM API.
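As a minimal sketch of the subclassing pattern described above, the block below defines a stand-in LLM base class so that it runs without the package installed; in real use you would extend the package's own LLM class instead. The names MyAPISampler, send_request, and fake_backend are hypothetical, and the stand-in constructor only mirrors the auto_trim keyword mentioned in this tutorial, so the real signature may differ.

```python
from typing import Callable


class LLM:
    """Stand-in for the base class described above (an assumption, not the real class)."""

    def __init__(self, *, auto_trim: bool = True):
        self.auto_trim = auto_trim

    def draw_sample(self, prompt: str) -> str:
        raise NotImplementedError


class MyAPISampler(LLM):
    """Hypothetical child class that forwards the prompt to a remote model."""

    def __init__(self, send_request: Callable[[str], str]):
        # Leave auto_trim at its default so the raw answer is trimmed later.
        super().__init__()
        # `send_request` is a placeholder for whatever client call your provider offers.
        self._send_request = send_request

    def draw_sample(self, prompt: str) -> str:
        # Return the raw text; descriptions around the code are acceptable.
        return self._send_request(prompt)


def fake_backend(prompt: str) -> str:
    # Canned reply used only to exercise the class without network access.
    return "OK, here is the code:\n\ndef f(x):\n    return x\n"


sampler = MyAPISampler(fake_backend)
reply = sampler.draw_sample("Complete the function f.")
print(reply)
```

Passing the request function in as an argument keeps the class independent of any particular provider; swapping fake_backend for a function that calls a real API would not change draw_sample.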

SampleTrimmer class#

The following examples demonstrate how SampleTrimmer works.

Tutorial#

[1]:
from llm4ad.base import SampleTrimmer

Below is an example of the response content from an LLM.

[2]:
llm_response_content = '''\
OK, this is the generated code:

def my_function(arr):
    """This is an example function."""
    max = np.max(arr)
    min = np.min(arr)
    result = max / min
    return result

This function aims to calculate the ...
'''

In our pipeline, we only want the informative part, i.e., the code for the heuristic. So we can trim the redundant parts (“OK, this is …”, “This function aims to …”) of the generated content using SampleTrimmer.auto_trim. The auto_trim function automatically identifies whether the response content comes from an instruct model (e.g., GPT-3.5) or a completion model (e.g., StarCoder), and performs the corresponding operations to trim the code.
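One plausible way to tell the two kinds of output apart is to look for a function header: a chat-style answer usually repeats the `def` line, while a completion-style answer starts directly with the body. The sketch below illustrates that heuristic only; drop_preface is a hypothetical helper, not the actual auto_trim implementation, and it assumes the header fits on a single line starting at column 0.

```python
def drop_preface(response: str) -> str:
    """Return the text that follows the first `def` header, or the input unchanged."""
    lines = response.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("def "):
            # Chat-style output: keep only what comes after the header line.
            return "\n".join(lines[index + 1:]) + "\n"
    # Completion-style output: the response is assumed to already be a function body.
    return response


chat_style = "OK, here is the code:\n\ndef f(x):\n    return x\n\nDone.\n"
completion_style = "    return x\n"
print(drop_preface(chat_style))
print(drop_preface(completion_style))
```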

The trimmed result consists of the function body and any descriptions that follow it (don’t worry about the content after the function body, as it can be removed easily).

[3]:
trimmed_response_content = SampleTrimmer.auto_trim(llm_response_content)
print(trimmed_response_content)
    """This is an example function."""
    max = np.max(arr)
    min = np.min(arr)
    result = max / min
    return result

This function aims to calculate the ...
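To illustrate why the text after the function body is easy to remove, here is a minimal sketch based on the ast package. It wraps the body in a dummy header, cuts the text at the line where parsing fails, and keeps only the lines that belong to the function. The helper keep_function_body is hypothetical and is not claimed to be what SampleTrimmer does internally; it assumes Python 3.8 or later for end_lineno.

```python
import ast


def keep_function_body(trimmed: str) -> str:
    """Return only the leading lines of `trimmed` that form a valid function body."""
    # Wrap the body in a dummy header so that `ast` can parse it as a function.
    lines = ["def _dummy():"] + trimmed.splitlines()
    tree = None
    while tree is None and len(lines) > 1:
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError as err:
            # Cut at the offending line; always drop at least one line so the loop ends.
            cut = (err.lineno or len(lines)) - 1
            lines = lines[: max(1, min(cut, len(lines) - 1))]
    if tree is None or not tree.body or not isinstance(tree.body[0], ast.FunctionDef):
        return ""
    # end_lineno is the 1-based last line of the function, counting the dummy header.
    end = tree.body[0].end_lineno
    return "\n".join(lines[1:end]) + "\n"


sample = '    """Doc."""\n    x = 1\n    return x\n\nThis function aims to ...\n'
print(keep_function_body(sample))
```

Using the function's end line rather than the whole parsed text also discards trailing lines that happen to be valid Python but sit outside the function.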

Convert the trimmed response content (a str) to a Program instance by providing a template program. The function body from the response replaces the body of the function in the template.

[4]:
template_program = '''\
import numpy as np

def func(arr):
    return arr
'''

program = SampleTrimmer.sample_to_program(trimmed_response_content, template_program)
print(str(program))
import numpy as np

def func(arr):
    max = np.max(arr)
    min = np.min(arr)
    result = max / min
    return result
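For intuition about what filling a template involves, the sketch below replaces the body of a template function with a new body, using ast to locate the lines to swap. The helper splice_body is hypothetical: it is not the sample_to_program implementation, and it returns a plain string rather than a Program instance. It assumes the template function is written over several lines (header and body on separate lines) and that the new body is already indented.

```python
import ast


def splice_body(template: str, new_body: str, func_name: str = "func") -> str:
    """Replace the body of `func_name` in `template` with `new_body`."""
    tree = ast.parse(template)
    target = next(
        (node for node in tree.body
         if isinstance(node, ast.FunctionDef) and node.name == func_name),
        None,
    )
    if target is None:
        raise ValueError(f"function {func_name!r} not found in template")
    lines = template.splitlines()
    # ast line numbers are 1-based; body[0].lineno is the first line of the old body.
    first = target.body[0].lineno - 1
    last = target.end_lineno
    new_lines = new_body.rstrip("\n").splitlines()
    return "\n".join(lines[:first] + new_lines + lines[last:]) + "\n"


template = "import numpy as np\n\ndef func(arr):\n    return arr\n"
body = "    result = arr * 2\n    return result\n"
print(splice_body(template, body))
```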

