Configuration
Choose model providers to test with
Run configuration for LLMs
To test an LLM, specify the following properties in the configuration:

- type: Should be "model"
- provider: Name of the inference provider (e.g. openai, or other supported providers)
- prompt: Prompt sent to the model, with optional placeholders
- parameters: JSON object of parameters to customize the model behavior
- name: A custom name or label for this run (auto-generated if not specified)
You can configure as many model providers as you like. These models will be shown in a side-by-side comparison view in the web reporter.
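For illustration, a configuration that compares two providers might look like the following sketch (the runs array wrapper, the model names, and the {{text}} placeholder are examples, not prescribed values):

```json
{
  "runs": [
    {
      "type": "model",
      "provider": "openai",
      "model": "gpt-4o",
      "prompt": "Extract the named entities from this text: {{text}}"
    },
    {
      "type": "model",
      "provider": "anthropic",
      "model": "claude-3-opus-20240229",
      "prompt": "Extract the named entities from this text: {{text}}"
    }
  ]
}
```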
Prompt
The prompt serves as the initial input provided to the model to generate a response. This property accepts either a string or a JSON chat format.
Prompt format: string
String prompts are wrapped in a user role message before being sent to the model.
The basic example uses this prompt format to test extraction of named entities from natural language text.
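For example, a string prompt such as "Extract the named entities from this text: {{text}}" is sent to the model as a single user message, roughly equivalent to this chat-format prompt:

```json
{
  "prompt": [
    {
      "role": "user",
      "content": "Extract the named entities from this text: {{text}}"
    }
  ]
}
```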
Prompt format: JSON
The JSON chat format allows for a sequence of messages comprising the conversation so far. Each message object has two required fields:
- role: Role of the messenger (either system, user, or assistant)
- content: The content of the message
The Text-to-SQL example uses this prompt format to test conversion of natural language questions to SQL queries.
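For instance, a chat-format prompt for such a Text-to-SQL run might look like this sketch (the system message wording and the {{question}} placeholder are illustrative):

```json
{
  "prompt": [
    {
      "role": "system",
      "content": "You convert natural language questions into SQL queries for the given schema."
    },
    {
      "role": "user",
      "content": "{{question}}"
    }
  ]
}
```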
Placeholders
Define placeholders in the prompt with Handlebars syntax (like {{user_name}}) to inject values from the dataset sample. These placeholders will be replaced with the corresponding input value during execution.
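For example, if the prompt references {{user_name}}, a dataset sample that supplies the value might look like this sketch (the samples and inputs field names are illustrative):

```json
{
  "runs": [
    {
      "type": "model",
      "provider": "openai",
      "model": "gpt-4o",
      "prompt": "Write a short welcome email for {{user_name}}."
    }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "user_name": "Ada" } }
    ]
  }
}
```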
See dataset to learn more about sample inputs.
Model parameters
To override parameters like temperature or max_tokens, you can pass parameters along with the provider configuration. All OpenAI parameters (see their API reference) are supported, with a few limitations.
For non-OpenAI models, we coerce these parameters to the most appropriate target parameter (e.g. stop in OpenAI becomes stop_sequences for Anthropic).
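As a sketch, overriding temperature and max_tokens for an OpenAI run might look like this (the values are arbitrary):

```json
{
  "type": "model",
  "provider": "openai",
  "model": "gpt-4o",
  "prompt": "Summarize this article: {{article}}",
  "parameters": {
    "temperature": 0.2,
    "max_tokens": 256,
    "stop": ["###"]
  }
}
```

If the same parameters block were used with an Anthropic run, stop would be coerced to stop_sequences as described above.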
You can add other parameters or override this behavior with passthrough.
Tool calling
Hosted models support tool calling. You can use the tools parameter to specify functions that are provided to the model.
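As a sketch, a tools definition following the OpenAI function-calling schema might be passed under parameters (the placement under parameters and the get_weather function here are assumptions for illustration):

```json
{
  "parameters": {
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a given city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": { "type": "string" }
            },
            "required": ["city"]
          }
        }
      }
    ]
  }
}
```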
See the output object to learn how the model response stores tool calls.
Passthrough
If your models rely on other parameters, you can still specify them in the configuration. These parameters will be passed as-is to the model.
For example, Mistral models support a safePrompt parameter for guardrailing.
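For example, passing safePrompt through to a Mistral model might look like this sketch (the model name is illustrative):

```json
{
  "type": "model",
  "provider": "mistral",
  "model": "mistral-large-latest",
  "prompt": "{{question}}",
  "parameters": {
    "safePrompt": true
  }
}
```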
Request timeout
You can set the timeout duration in milliseconds under model parameters in the empiricalrc.json file. This may be needed for prompt completions that take longer, for example when running models like Claude Opus. If no value is set, a default timeout of 30 seconds applies.
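As a sketch, a 100-second timeout for a slower model could be configured like this (assuming the parameter is named timeout and, as noted above, takes milliseconds):

```json
{
  "type": "model",
  "provider": "anthropic",
  "model": "claude-3-opus-20240229",
  "prompt": "{{question}}",
  "parameters": {
    "timeout": 100000
  }
}
```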