Evaluators
Evaluators automatically score your LLM spans to track quality, accuracy, and compliance across your GenAI application.
Types of Evaluators
| Type | Best For | Execution |
|---|---|---|
| LLM-as-Judge | Nuanced quality assessment, relevance, tone | Calls an LLM to evaluate each span |
| Code | Deterministic checks: regex, JSON schema, exact match, keyword detection | Runs Python code locally with no LLM API call, so it is fast and adds no model cost |
How It Works
- Create a template — define what to evaluate (LLM prompt or Python code)
- Create a rule — configure which spans to evaluate, sampling rate, and filters
- Review the scores, which appear on the Traces page attached to each evaluated span
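To make the rule step more concrete, here is a minimal sketch of how filters and a sampling rate could combine to decide whether a span gets evaluated. The `Rule` class, its field names, and the dict-shaped span are all hypothetical illustrations, not the product's actual schema or API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule shape; field names are illustrative only."""
    template: str                                # name of the template to run
    sampling_rate: float = 1.0                   # fraction of matching spans to evaluate
    filters: dict = field(default_factory=dict)  # span attribute -> required value


def should_evaluate(rule, span, rng=random):
    """Return True if this span should be scored under the rule."""
    # A span must match every filter before sampling is applied.
    if any(span.get(key) != value for key, value in rule.filters.items()):
        return False
    # rng.random() is in [0, 1), so a rate of 1.0 always passes and 0.0 never does.
    return rng.random() < rule.sampling_rate
```

Applying filters before sampling means the sampling rate describes the share of matching spans that get scored, rather than the share of all traffic.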
When to Use Each Type
Use LLM-as-Judge when:
- You need subjective assessment (relevance, helpfulness, tone)
- The evaluation criteria are hard to express as code
- You want natural-language reasoning with each score
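As an illustration of what an LLM-as-Judge template might contain, the sketch below builds a grading prompt and parses a reply that carries both a score and its reasoning. The prompt wording, the 1 to 5 scale, the JSON reply format, and both function names are assumptions made for this example; the call to the judge model itself is left out.

```python
import json

# Hypothetical judge prompt; doubled braces are literal braces after .format().
JUDGE_PROMPT = """You are grading an assistant's answer.
Question: {question}
Answer: {answer}
Rate relevance from 1 (off-topic) to 5 (fully relevant).
Reply with JSON: {{"score": <int>, "reasoning": "<one sentence>"}}"""


def build_judge_prompt(question, answer):
    """Fill the template with the span's input and output."""
    return JUDGE_PROMPT.format(question=question, answer=answer)


def parse_judge_reply(reply):
    """Extract (score, reasoning) from the judge's JSON reply."""
    data = json.loads(reply)
    score = int(data["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score, str(data.get("reasoning", ""))
```

Asking the judge for structured output keeps the natural-language reasoning while still yielding a numeric score that can be tracked over time.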
Use Code Evaluators when:
- You need deterministic, repeatable checks
- Speed and cost matter (no LLM API call needed)
- You're validating structure (JSON schema, required fields, format)
- You want exact match, regex, or keyword detection
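The checks listed above can each be written in a few lines of standard-library Python. The sketch below shows three of them; the function names and the choice to return a boolean are assumptions for illustration, since the signature a code evaluator must follow is not specified here.

```python
import json
import re


def has_required_fields(output, required):
    """Structure check: output is a JSON object containing every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required)


def matches_pattern(output, pattern):
    """Format check: the whole output (ignoring surrounding whitespace) matches the regex."""
    return re.fullmatch(pattern, output.strip()) is not None


def contains_keywords(output, keywords):
    """Keyword check: every keyword appears, case-insensitively."""
    text = output.lower()
    return all(keyword.lower() in text for keyword in keywords)
```

For example, `matches_pattern(span_output, r"\d{4}-\d{2}-\d{2}")` would pass only when the output is a date in `YYYY-MM-DD` form, and the same input always yields the same result.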
Support
If you need assistance or have any questions, please reach out to us by:
- Email at [email protected]