Python scorer
Write a custom scoring function with Python
You can configure a custom Python evaluator by specifying a py-script evaluator in the scorers section of the configuration. The path key should be the path to the Python script.
In the script, define an evaluate function with the following signature:
- Arguments
  - output: dict with key value to get the output value (string) and key metadata to get metadata (dict); see output object
  - inputs: dict of key-value pairs from the dataset sample
- Returns
  - List of results: each result is a dict with score (number between 0 and 1), message (optional, string) and name (optional, string)
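As a minimal sketch of the signature described above, the script below scores an exact string match. It assumes the arguments are passed in the order listed (output, then inputs) and that the dataset sample has an "expected" field; that field name is hypothetical and chosen only for illustration.

```python
def evaluate(output, inputs):
    # output["value"] holds the model output (string);
    # output["metadata"] holds a dict of metadata (unused here).
    actual = (output.get("value") or "").strip()

    # "expected" is a hypothetical dataset field, not part of the scorer API.
    expected = str(inputs.get("expected", "")).strip()

    passed = actual == expected
    return [
        {
            "score": 1 if passed else 0,
            "message": "" if passed else f"expected {expected!r}, got {actual!r}",
            "name": "exact-match",
        }
    ]
```

The function returns a list even for a single result, which matches the return type described above.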
Multiple scores
It is possible for the Python script to return an array of scores. Use name to distinguish between them.
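A script that returns several named scores might look like the following sketch. The two checks and the "max_chars" dataset field are hypothetical; only the score, message and name keys come from the description above.

```python
def evaluate(output, inputs):
    text = output.get("value") or ""

    # "max_chars" is a hypothetical dataset field; default to 280 if absent.
    max_chars = int(inputs.get("max_chars", 280))

    non_empty = bool(text.strip())
    within_limit = len(text) <= max_chars

    # One dict per score; "name" tells the results apart.
    return [
        {
            "name": "non-empty",
            "score": 1 if non_empty else 0,
            "message": "" if non_empty else "output is empty",
        },
        {
            "name": "within-length",
            "score": 1 if within_limit else 0,
            "message": "" if within_limit else f"{len(text)} chars exceeds {max_chars}",
        },
    ]
```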
Example
The HumanEval example uses this scorer.
Python Path
The Python script is executed on your machine using python available in PATH. This determines the Python version that is used.
The Python script can use any Python modules (built-in or third party). If you are using third-party libraries or want to use a specific version of Python, override the Python path while running the CLI.
Limitations
- The Python script must complete execution within 20 seconds
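To check locally that a scorer stays well inside this limit, you could time a call to its evaluate function before running it through the CLI. The helper below is a hypothetical sketch, not part of the tool. It measures only the function call, so interpreter startup and import time are not included in the measurement.

```python
import time

# Limit stated above, in seconds.
TIME_LIMIT_SECONDS = 20


def timed_call(evaluate_fn, output, inputs):
    """Run evaluate_fn once and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = evaluate_fn(output, inputs)
    elapsed = time.perf_counter() - start
    return results, elapsed


def fits_time_limit(elapsed, limit=TIME_LIMIT_SECONDS):
    """Return True if the measured time is below the limit."""
    return elapsed < limit
```

For example, `timed_call(evaluate, {"value": "hi", "metadata": {}}, {})` returns the scorer's results together with the measured time, which can then be passed to `fits_time_limit`.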