Scoring outputs
Automated evaluation of output quality with scorers
Scorers are functions that rate model outputs on a scale from 0 to 1. These scores are visible in the terminal after a run completes, or in the web reporter.
Choose the right scoring functions for your use case by defining the `scorers` field in your configuration file. You can define as many scorers as you like.
empiricalrc.json
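The snippet below is a minimal sketch of the `scorers` field, shown as an excerpt rather than a complete configuration file; exactly where the field nests relative to your runs may vary, so treat the placement as illustrative:

```json
{
  "scorers": [
    { "type": "json-syntax" }
  ]
}
```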
You can choose a built-in scoring function, or define a custom scorer.
Built-in scorers
Check for structural integrity
type: "json-syntax"
: Returns 1 if output is a valid JSON object, 0 otherwisetype: "sql-syntax"
: Returns 1 if output is a valid SQL query string, 0 otherwise
Custom scorers
There are three ways to build a custom scorer.
- LLM critic: Let an LLM score your output against criteria that you specify
- JavaScript function: Write a custom scoring function in JS/TS; a conceptual sketch follows this list
- Python function: Write a custom scoring function in Python
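As a rough illustration of the function-based options above, a custom scorer conceptually receives the model's output, optionally alongside the inputs for that sample, and returns a score between 0 and 1. The TypeScript sketch below is illustrative only: the parameter shape, field names, and return type are assumptions rather than Empirical's confirmed API, so check the custom scorer docs for the exact signature.

```typescript
// Illustrative sketch only: the field names and signature below are
// assumptions, not Empirical's documented API.
type ScorerArgs = {
  output: { value: string };          // the model's response for one sample
  inputs?: Record<string, unknown>;   // the sample's inputs, if needed
};

type ScorerResult = {
  score: number;    // must be between 0 and 1
  message?: string; // optional explanation to show alongside the score
};

// Example: award partial credit for each required keyword the output mentions.
function keywordCoverage({ output }: ScorerArgs): ScorerResult {
  const required = ["refund", "7 days"];
  const text = output.value.toLowerCase();
  const hits = required.filter((k) => text.includes(k.toLowerCase()));
  return {
    score: hits.length / required.length,
    message: `Matched ${hits.length} of ${required.length} required keywords`,
  };
}
```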