Empirical is the fastest way to test different LLMs, prompts and other model configurations, across all the scenarios that matter for your application.

Try it out!

With Empirical, you can

  • Run your test datasets locally against off-the-shelf models
  • Test your own custom models and RAG applications (see how-to)
  • View, compare, and analyze outputs in reports on a web UI
  • Score your outputs with scoring functions
  • Run tests on CI/CD
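To make "scoring functions" concrete, here is a minimal sketch of the general idea: a function that takes a model output and returns a numeric score, applied across a small batch of outputs. The name `score_is_json`, its signature, and the `{"score", "message"}` return shape are hypothetical assumptions for illustration only. They are not Empirical's actual scorer interface.

```python
import json


# Hypothetical scoring function: the name, signature, and return shape are
# illustrative assumptions, not Empirical's actual scorer interface.
def score_is_json(output: str) -> dict:
    """Return a score of 1.0 if the model output parses as JSON, else 0.0."""
    try:
        json.loads(output)
        return {"score": 1.0, "message": "valid JSON"}
    except json.JSONDecodeError as err:
        return {"score": 0.0, "message": f"invalid JSON: {err.msg}"}


# Score a small batch of made-up model outputs and report the mean score.
outputs = ['{"name": "Alice", "age": 30}', "Sorry, I can't help with that."]
results = [score_is_json(o) for o in outputs]
mean_score = sum(r["score"] for r in results) / len(results)
print(mean_score)  # 0.5
```

Aggregating per-output scores into a single number like this is one simple way to compare models or prompts run against the same dataset.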

Walkthrough

Watch a 6-minute demo video showing how Empirical can run the HumanEval benchmark.

Open source

Empirical is open source on GitHub. Star the repo, file issues, or open pull requests to contribute to the project.