Using a Python function as the entry point, you can define a custom model to test with Empirical. This method can also be used to test an application that does pre- or post-processing around the LLM call, or chains multiple LLM calls together.

Run configuration

A minimal configuration looks like:

"runs": [
  {
    "type": "py-script",
    "path": "rag.py",
    "parameters": {
      "feature_flag": "enabled"
    }
  }
]
type
string
required

Set to “py-script”

path
string
required

Path to the Python file, which must define a function called execute (see file structure)

parameters
object

JSON object of parameters passed to the execute function to customize script behavior

name
string

A custom name or label for this run (auto-generated if not specified)
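
Putting these together, a run that sets a custom name and passes a parameter could be configured as follows; the name value here is a hypothetical label.

"runs": [
  {
    "type": "py-script",
    "path": "rag.py",
    "name": "rag-with-feature-flag",
    "parameters": {
      "feature_flag": "enabled"
    }
  }
]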

File structure

The Python file is expected to define a function called execute with the following signature:

rag.py
def execute(inputs, parameters):
    # call the model and run any other processing here
    # ...

    # optionally, use parameters to change script behavior
    feature_flag = parameters.get("feature_flag", False)

    return {
        "value": output,   # required: the output to evaluate, as a string
        "metadata": {      # optional: additional string key-value pairs
            "key": value
        }
    }

Function arguments

inputs
dict

A dict object of key-value pairs with inputs picked up from the dataset

parameters
dict

A dict object of key-value pairs that can be used to modify the script’s behavior. Parameters are defined in the run configuration.
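
For example, with the run configuration shown above and a dataset sample that defines a question input (a hypothetical field name), the function would receive arguments shaped roughly like this:

# Hypothetical shapes for the arguments passed to execute
inputs = {"question": "What is the capital of France?"}   # from the dataset sample
parameters = {"feature_flag": "enabled"}                  # from the run configuration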

Function return type

The function is expected to return an output dict, with value (string) as the only required field.

Example

The RAG example uses this model provider to test a RAG application. The metadata field is used to capture the retrieved context.
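
A simplified sketch of such a script might look like the following; the question input name, the retrieve_documents helper, and the OpenAI client call are illustrative assumptions, not details taken from the linked example.

from openai import OpenAI

client = OpenAI()

def execute(inputs, parameters):
    # Hypothetical retrieval step: fetch context relevant to the dataset input
    question = inputs["question"]
    context = retrieve_documents(question)  # assumed helper, defined elsewhere

    # Call the model with the retrieved context (illustrative use of parameters)
    completion = client.chat.completions.create(
        model=parameters.get("model", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )

    return {
        "value": completion.choices[0].message.content,
        # Capture the retrieved context as metadata
        "metadata": {
            "context": context,
        },
    }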