
Evaluators

Evaluators automatically score your LLM spans to track quality, accuracy, and compliance across your GenAI application.

Types of Evaluators

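Type         | Best for                                                                  | Execution
------------ | ------------------------------------------------------------------------- | ---------------------------------------
LLM-as-Judge | Nuanced quality assessment, such as relevance and tone                    | Calls an LLM to evaluate each span
Code         | Deterministic checks: regex, JSON schema, exact match, keyword detection  | Runs Python code locally; fast and free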
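As a rough illustration of the deterministic checks listed for Code evaluators, here is a minimal sketch. It implements exact match, regex, and keyword detection as plain Python functions that each return a score from 0.0 to 1.0. The function names and the one-score-per-call shape are assumptions for illustration. This page does not describe the template interface the platform actually expects.

```python
import re
from typing import Iterable

# Hypothetical evaluator shape: each check takes the span's output text and
# returns a score from 0.0 (fail) to 1.0 (pass). The real template
# interface may differ.


def exact_match(output: str, expected: str) -> float:
    # Pass only if the output equals the expected answer, ignoring
    # leading and trailing whitespace.
    return 1.0 if output.strip() == expected.strip() else 0.0


def matches_pattern(output: str, pattern: str) -> float:
    # Pass if the regular expression matches anywhere in the output.
    return 1.0 if re.search(pattern, output) else 0.0


def no_banned_keywords(output: str, keywords: Iterable[str]) -> float:
    # Case-insensitive keyword detection: fail if any keyword appears.
    lowered = output.lower()
    return 0.0 if any(k.lower() in lowered for k in keywords) else 1.0


if __name__ == "__main__":
    print(exact_match("Paris ", "Paris"))                                # 1.0
    print(matches_pattern("Order #12345 shipped", r"#\d{5}\b"))         # 1.0
    print(no_banned_keywords("Your PASSWORD is hunter2", ["password"]))  # 0.0
```

Because these checks involve no model call, they return the same score every time for the same output, which is what makes them suitable for repeatable checks.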

How It Works

  1. Create a template: define what to evaluate (an LLM prompt or Python code).
  2. Create a rule: configure which spans to evaluate, the sampling rate, and any filters.
  3. Review the results: scores appear on the Traces page, attached to each evaluated span.
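The rule step can be pictured as a filter followed by random sampling. The sketch below is a hypothetical model of that logic. The `Rule` class, its `template`, `sample_rate`, and `filters` fields, and the dict-shaped spans are illustrative assumptions, not the product's configuration format.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Rule:
    # Hypothetical rule shape: which template to run, what fraction of
    # matching spans to score, and attribute filters a span must match.
    template: str
    sample_rate: float = 1.0  # 0.0 (none) to 1.0 (every matching span)
    filters: Dict[str, Any] = field(default_factory=dict)

    def matches(self, span: Dict[str, Any]) -> bool:
        # Every filter key must be present on the span with an equal value.
        return all(span.get(k) == v for k, v in self.filters.items())

    def should_evaluate(self, span: Dict[str, Any], rng: random.Random) -> bool:
        # Filter first, then sample: only matching spans are eligible.
        return self.matches(span) and rng.random() < self.sample_rate


if __name__ == "__main__":
    rule = Rule(template="json-validity", sample_rate=0.25,
                filters={"model": "model-a"})
    rng = random.Random(0)
    spans = [{"model": "model-a"}] * 1000 + [{"model": "model-b"}] * 1000
    selected = sum(rule.should_evaluate(s, rng) for s in spans)
    print(selected)  # expected value is 250: a quarter of the 1,000 matching spans
```

In this sketch the filter runs before sampling, so `sample_rate` is a fraction of matching spans rather than of all spans.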

When to Use Each Type

Use LLM-as-Judge when:

  • You need subjective assessment (relevance, helpfulness, tone)
  • The evaluation criteria are hard to express as code
  • You want natural-language reasoning with each score
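Here is a minimal LLM-as-Judge sketch. It assumes the judge is asked to reply in JSON with a score and a one-sentence reason. `JUDGE_PROMPT`, its `{input}` and `{output}` placeholders, the 1 to 5 scale, and the `call_llm` callable are all hypothetical. `call_llm` stands in for whatever model client you use, so the example runs without network access.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical judge prompt. {input} and {output} are filled from the span;
# the platform's actual template variables may differ. Doubled braces
# produce literal braces after str.format().
JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {input}
Answer: {output}
Rate how relevant the answer is to the question on a scale from 1 to 5.
Respond with JSON only: {{"score": <int>, "reasoning": "<one sentence>"}}"""


def judge(span: Dict[str, str], call_llm: Callable[[str], str]) -> Tuple[float, str]:
    prompt = JUDGE_PROMPT.format(input=span["input"], output=span["output"])
    data = json.loads(call_llm(prompt))
    score = int(data["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    # Map the 1..5 rating onto 0.0..1.0 and keep the judge's reasoning.
    return (score - 1) / 4, data["reasoning"]


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stub standing in for a real model call.
        return '{"score": 4, "reasoning": "Answers the question but adds filler."}'

    span = {"input": "What is the capital of France?", "output": "Paris."}
    print(judge(span, fake_llm))  # (0.75, 'Answers the question but adds filler.')
```

Mapping the rating onto 0.0 to 1.0 is one design choice. It keeps judge scores on the same scale as pass/fail code checks, and the reasoning string can be shown next to each score.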

Use Code Evaluators when:

  • You need deterministic, repeatable checks
  • Speed and cost matter (no LLM API call needed)
  • You're validating structure (JSON schema, required fields, format)
  • You want exact match, regex, or keyword detection
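For structural validation, here is a minimal sketch that uses only the standard library. It parses the span output as JSON and checks that required fields are present and have the right types. `REQUIRED_FIELDS` and the `(score, reason)` return value are hypothetical. A library such as `jsonschema` would give fuller JSON Schema support.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical required structure for a span's output: field name -> type.
REQUIRED_FIELDS: Dict[str, type] = {"name": str, "age": int, "tags": list}


def validate_structure(output: str) -> Tuple[float, str]:
    # Returns (score, reason): 1.0 if the output is a JSON object with every
    # required field present and of the expected type, else 0.0.
    try:
        data: Any = json.loads(output)
    except json.JSONDecodeError as exc:
        return 0.0, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return 0.0, "top-level value is not an object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            return 0.0, f"missing field: {name}"
        value = data[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return 0.0, f"wrong type for {name}: {type(value).__name__}"
    return 1.0, "ok"


if __name__ == "__main__":
    print(validate_structure('{"name": "Ada", "age": 36, "tags": []}'))  # (1.0, 'ok')
    print(validate_structure('{"name": "Ada", "tags": []}'))  # (0.0, 'missing field: age')
```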
