Scoring outputs
LLM as scorer
Ask an LLM to score your outputs based on your criteria
Use the llm-critic scorer type to ask an LLM to score the output based on your criteria. Specify the criteria in the criteria parameter.
Example
The chatbot example uses this scorer.
Placeholders
You can customise the criteria based on the dataset sample.
To do this, specify placeholders in the criteria using handlebars-style syntax. All input keys in the sample (e.g. {{user_name}}) and the expected value (as {{expected}}) are valid placeholders. They are replaced with the actual values from the sample at execution time.
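To illustrate how placeholder substitution might work, here is a minimal Python sketch. Only the llm-critic type, the criteria parameter, and the {{user_name}} and {{expected}} placeholders come from the text above. The dictionary layout of the scorer and sample, the render_criteria helper, and the choice to leave unknown placeholders untouched are assumptions made for illustration, not the tool's actual implementation.

```python
import re

# Hypothetical scorer definition: only the "llm-critic" type and the
# "criteria" parameter are taken from the docs; the dict layout is assumed.
scorer = {
    "type": "llm-critic",
    "criteria": "Greets {{user_name}} by name and matches: {{expected}}",
}

# Hypothetical dataset sample with input keys and an expected value.
sample = {
    "inputs": {"user_name": "Ada"},
    "expected": "Hello Ada, how can I help?",
}

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_criteria(criteria, inputs, expected):
    """Replace {{key}} placeholders with values from the sample."""
    values = dict(inputs)
    # The expected value is exposed as {{expected}}; in this sketch it
    # takes precedence over an input key with the same name.
    values["expected"] = expected

    def substitute(match):
        key = match.group(1)
        # Assumption: unknown placeholders are left as-is.
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, criteria)


rendered = render_criteria(scorer["criteria"], sample["inputs"], sample["expected"])
print(rendered)
# prints: Greets Ada by name and matches: Hello Ada, how can I help?
```

The rendered string is what would be sent to the LLM as the criteria for that particular sample, so each sample can be judged against its own inputs and expected value.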
Known limitations
- We use gpt-3.5-turbo as the LLM behind this scorer
- LLM scorers only return 0 or 1 currently