Run configuration for LLMs

To test an LLM, specify the following properties in the run configuration:

  • type (string, required): Should be "model"
  • provider (string, required): Name of the inference provider (e.g. openai, or other supported providers)
  • prompt (string, required): Prompt sent to the model, with optional placeholders
  • parameters (object): JSON object of parameters to customize the model behavior
  • name (string): A custom name or label for this run (auto-generated if not specified)

You can configure as many model providers as you like. These models will be shown in a side-by-side comparison view in the web reporter.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}"
  },
  {
    "type": "model",
    "provider": "fireworks",
    "model": "llama-v3-8b-instruct",
    "prompt": "Hey I'm {{user_name}}"
  }
]

Prompt

The prompt serves as the initial input provided to the model to generate a response. This property accepts either a string or a JSON chat format.

Prompt format: string

String prompts are wrapped in a user role message before being sent to the model.
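
For illustration, a string prompt like the one in the runs above would reach the model as a single user-role message, roughly like this (a sketch of the wrapping described here; the value "Alice" stands in for the resolved placeholder):

[
  {
    "role": "user",
    "content": "Hey I'm Alice"
  }
]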

The basic example uses this prompt format to test extraction of named entities from natural language text.

Prompt format: JSON

The JSON chat format allows for a sequence of messages comprising the conversation so far. Each message object has two required fields:

  • role: Role of the messenger (either system, user or assistant)
  • content: The content of the message

The Text-to-SQL example uses this prompt format to test conversion of natural language questions to SQL queries.
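
As a sketch, a run using the JSON chat format could look like the following (the model choice, message contents, and the {{question}} placeholder are illustrative):

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": [
      {
        "role": "system",
        "content": "You are an expert in SQL. Respond with only the SQL query."
      },
      {
        "role": "user",
        "content": "{{question}}"
      }
    ]
  }
]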

Placeholders

Define placeholders in the prompt with Handlebars syntax (like {{user_name}}) to inject values from the dataset sample. These placeholders will be replaced with the corresponding input value during execution.
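
For example, the prompt "Hey I'm {{user_name}}" expects each dataset sample to provide a user_name input, roughly like this (a minimal sketch; the sample values are illustrative):

empiricalrc.json
"dataset": {
  "samples": [
    {
      "inputs": {
        "user_name": "Alice"
      }
    },
    {
      "inputs": {
        "user_name": "John"
      }
    }
  ]
}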

See dataset to learn more about sample inputs.

Model parameters

To override parameters like temperature or max_tokens, you can pass parameters along with the provider configuration. All OpenAI parameters (see their API reference) are supported, with a few limitations.

For non-OpenAI models, we coerce these parameters to the most appropriate target parameter (e.g. stop in OpenAI becomes stop_sequences for Anthropic).
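
For instance, the same stop value could be set on both an OpenAI run and an Anthropic run; for the Anthropic model it would be coerced to stop_sequences (a sketch only, with illustrative values):

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "stop": ["\n"]
    }
  },
  {
    "type": "model",
    "provider": "anthropic",
    "model": "claude-3-opus",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "stop": ["\n"]
    }
  }
]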

You can add other parameters or override this behavior with passthrough.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1
    }
  }
]

Tool calling

Hosted models support tool calling. You can use the tools parameter to specify functions that are provided to the model.

See output object to learn how the model response stores tool calls.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "openai",
    "model": "gpt-4o",
    "prompt": "Add these numbers {{numberOne}} and {{numberTwo}}",
    "parameters": {
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "add_numbers",
            "description": "Helper function to add numbers",
            "parameters": {
              "type": "object",
              "properties": {
                "number_a": {
                  "type": "number",
                  "description": "The first number"
                },
                "number_b": {
                  "type": "number",
                  "description": "The second number"
                }
              }
            }
          }
        }
      ]
    }
  }
]

Passthrough

If your models rely on other parameters, you can still specify them in the configuration. These parameters will be passed as-is to the model.

For example, Mistral models support a safePrompt parameter for guardrailing.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "mistral",
    "model": "mistral-tiny",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "temperature": 0.1,
      "safePrompt": true
    }
  }
]

Request timeout

You can set the timeout duration in milliseconds under model parameters in the empiricalrc.json file. This can be useful for completions that are expected to take longer, for example when running models like Claude Opus. If no value is specified, a default timeout of 30 seconds applies.

empiricalrc.json
"runs": [
  {
    "type": "model",
    "provider": "anthropic",
    "model": "claude-3-opus",
    "prompt": "Hey I'm {{user_name}}",
    "parameters": {
      "timeout": 10000
    }
  }
]