Examples

Examples for common scenarios

Basic

Uses an entity extraction use-case to check for valid JSON outputs.

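For reference, a JSON-validity check of this kind can be written as a small custom scoring function. The sketch below is illustrative and is not tied to Empirical's built-in scorer names or exact scorer signature; it simply parses the model output and reports a pass/fail score.

```python
import json

def score_json_output(output: str) -> dict:
    """Return a pass/fail score for whether the model output is valid JSON.

    Illustrative sketch only; the exact shape expected by a test runner
    may differ.
    """
    try:
        json.loads(output)
        return {"score": 1, "message": "Output is valid JSON"}
    except json.JSONDecodeError as err:
        return {"score": 0, "message": f"Invalid JSON: {err}"}

# Example usage
print(score_json_output('{"name": "Ada", "company": "Analytical Engines"}'))
```
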
Tool calling

Tests a tool calling use-case and scores the tool calls made by the model.

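As a rough illustration of what such a check involves, the sketch below compares a tool call returned by a chat completion against an expected tool name and arguments. The response shape follows the OpenAI chat-completions format; everything else here is an assumption rather than the exact scorer used in the example.

```python
import json

def score_tool_call(tool_call: dict, expected_name: str, expected_args: dict) -> dict:
    """Check that the model picked the expected tool with the expected arguments.

    `tool_call` is assumed to follow the OpenAI chat-completions shape:
    {"function": {"name": ..., "arguments": "<json string>"}}.
    """
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name != expected_name:
        return {"score": 0, "message": f"Expected tool {expected_name}, got {name}"}
    if args != expected_args:
        return {"score": 0, "message": f"Unexpected arguments: {args}"}
    return {"score": 1, "message": "Tool call matches expectation"}

# Example usage
call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
print(score_tool_call(call, "get_weather", {"city": "Paris"}))
```
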
Spider (TypeScript)

Runs a subset of the Spider dataset to demo text-to-SQL and relevant scorer functions in TypeScript.

Spider (Python)

Runs a subset of the Spider dataset to demo text-to-SQL and relevant scorer functions in Python.

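A common way to score text-to-SQL outputs, and roughly what a Spider scorer needs to do, is execution accuracy: run the generated query and the reference query against the same database and compare the results. The sketch below uses SQLite and is a simplified stand-in for the scorer functions in these two examples, not their actual code.

```python
import sqlite3

def execution_accuracy(db_path: str, predicted_sql: str, reference_sql: str) -> dict:
    """Score 1 if the predicted query returns the same rows as the reference query.

    Simplified: compares unordered result sets and runs one statement per query.
    """
    conn = sqlite3.connect(db_path)
    try:
        predicted_rows = set(conn.execute(predicted_sql).fetchall())
        reference_rows = set(conn.execute(reference_sql).fetchall())
    except sqlite3.Error as err:
        return {"score": 0, "message": f"Query failed: {err}"}
    finally:
        conn.close()
    match = predicted_rows == reference_rows
    return {"score": int(match), "message": "Result sets match" if match else "Result sets differ"}
```
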
RAG

Tests a Retrieval-Augmented Generation (RAG) application built with LlamaIndex, scored on metrics from Ragas.

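To give a sense of the scoring side, the sketch below evaluates a question/answer/contexts sample with Ragas metrics. It assumes recent `ragas` and `datasets` installs and an LLM API key for the metric judges; metric and column names have shifted across Ragas versions, so treat this as a sketch of the Ragas side rather than the exact scorer wiring used in the example.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One retrieved-and-answered sample; a real run would collect these from the
# LlamaIndex application under test.
samples = {
    "question": ["What does the app help with?"],
    "answer": ["It answers questions over the indexed documents."],
    "contexts": [["The app indexes documents and answers questions about them."]],
}

dataset = Dataset.from_dict(samples)
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores, e.g. faithfulness and answer_relevancy
```
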
OpenAI Assistants

Runs Empirical on an OpenAI Assistant.

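Getting a response from an assistant differs from a plain chat completion: you create a thread, add the user message, run the assistant, and read the reply back from the thread. The sketch below shows that flow with the OpenAI Python SDK's Assistants (beta) API; the assistant ID is a placeholder, and this is not Empirical's internal implementation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
ASSISTANT_ID = "asst_..."  # placeholder; use your own assistant's ID

# Each test sample gets its own thread with the input as a user message.
thread = client.beta.threads.create()
client.beta.threads.messages.create(thread_id=thread.id, role="user", content="What is 2 + 2?")

# Run the assistant and wait for it to finish.
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# The newest message in the thread is the assistant's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```
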
HumanEval

Uses a custom Python scoring function to run the HumanEval benchmark, a popular dataset for code generation tasks.

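The core of a HumanEval scorer is executing the generated completion against the problem's test function. The sketch below shows that idea in its simplest form, with field names taken from the HumanEval dataset; real harnesses run the code in a sandboxed subprocess with a timeout rather than calling `exec` directly.

```python
def score_humaneval_sample(prompt: str, completion: str, test: str, entry_point: str) -> dict:
    """Run a HumanEval sample's check() tests against the generated completion.

    `prompt`, `test`, and `entry_point` come from the HumanEval dataset;
    `completion` is the model output. Simplified: no sandboxing or timeout.
    """
    program = prompt + completion + "\n" + test
    namespace: dict = {}
    try:
        exec(program, namespace)                    # defines the candidate function and check()
        namespace["check"](namespace[entry_point])  # HumanEval tests expose check(candidate)
    except Exception as err:
        return {"score": 0, "message": f"Tests failed: {err}"}
    return {"score": 1, "message": "All tests passed"}
```
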
Chat bot with LLM scorer

Uses an LLM to grade the output responses and ensure that they do not contain “as an AI language model”.

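One way to implement this kind of LLM-as-scorer is to ask a grading model a yes/no question about the response. The sketch below uses the OpenAI chat completions API as the grader; the grading prompt and model choice are assumptions, not the exact ones used in the example.

```python
from openai import OpenAI

client = OpenAI()
BANNED_PHRASE = "as an AI language model"

def llm_scorer(output: str) -> dict:
    """Ask a grading model whether the chat bot response contains the banned phrase."""
    grading = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": (
                    f'Does the following response contain the phrase "{BANNED_PHRASE}" '
                    f"or an equivalent disclaimer? Answer only YES or NO.\n\n{output}"
                ),
            }
        ],
    )
    verdict = grading.choices[0].message.content.strip().upper()
    passed = verdict.startswith("NO")
    return {"score": int(passed), "message": f"Grader verdict: {verdict}"}
```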