Use the llm-critic scorer type to have an LLM score the output against criteria you define. Specify them in the criteria parameter.

"scorers": [
  {
    "type": "llm-critic",
    "criteria": "Never call yourself a language model"
  }
]

Example

The chatbot example uses this scorer.

Placeholders

You can customise the criteria based on the dataset sample.

To do this, add placeholders to the criteria using handlebars syntax. Every input key in the sample (e.g. {{user_name}}) and the expected value (as {{expected}}) is a valid placeholder. Placeholders are replaced with the sample's actual values at execution time.

"scorers": [
  {
    "type": "llm-critic",
    "criteria": "Mention the user's name {{user_name}}"
  }
]
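
To make the substitution concrete, here is a minimal Python sketch of how the criteria above could be filled in for one sample. It is an illustration only, not the scorer's actual implementation: the render_criteria helper, its parameters, and the sample value "Ada" are all hypothetical, and the choice to leave unknown placeholders untouched is an assumption.

```python
import re

# Hypothetical helper, not the scorer's own code: fills {{name}} placeholders
# in a criteria string using the values from one dataset sample.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_criteria(criteria, sample_input, expected=None):
    # Input keys and the expected value share one lookup table.
    values = dict(sample_input)
    if expected is not None:
        values["expected"] = expected

    def substitute(match):
        key = match.group(1)
        # Assumption: unknown placeholders are left as-is so a typo stays visible.
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, criteria)


rendered = render_criteria(
    "Mention the user's name {{user_name}}",
    {"user_name": "Ada"},  # made-up sample input
)
print(rendered)  # Mention the user's name Ada
```

With this sample, the LLM would be asked to judge the output against "Mention the user's name Ada" rather than the raw template.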

Known limitations

  • We use gpt-3.5-turbo as the LLM behind this scorer
  • LLM scorers currently return only 0 or 1