Evaluators

Evaluators automatically score your LLM spans to track quality, accuracy, and compliance across your GenAI application. Navigate to Agent Observability → Evaluators (ap1, us1).

Types of Evaluators

Type	Best For	Execution	Guide
LLM-as-Judge	Nuanced quality assessment, relevance, tone	Calls an LLM to evaluate each span	Setup guide →
Code	Deterministic checks: regex, JSON schema, exact match, keyword detection	Runs Python code in an isolated microVM; fast, with no LLM API cost	Coming soon

How It Works

Create a template: define what to evaluate (an LLM prompt or Python code)
Create a rule: configure which spans to evaluate, the sampling rate, and any filters
Review the scores: they appear on the Traces page, attached to each evaluated span
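
The relationship between a template, a rule, and the resulting scores can be sketched in plain Python. This is only an illustrative model, not the product's API: the `Template` and `Rule` classes, their fields, the `maybe_score` method, and the span dictionaries are all hypothetical names chosen for this example.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Template:
    """Hypothetical stand-in for a template: what to evaluate."""
    name: str
    evaluate: Callable[[dict], float]  # takes a span, returns a score


@dataclass
class Rule:
    """Hypothetical stand-in for a rule: which spans to evaluate, and how often."""
    template: Template
    sampling_rate: float = 1.0                   # fraction of matching spans to score
    filters: dict = field(default_factory=dict)  # span attributes that must match exactly

    def matches(self, span: dict) -> bool:
        return all(span.get(key) == value for key, value in self.filters.items())

    def maybe_score(self, span: dict, rng: random.Random) -> Optional[float]:
        """Return a score, or None if the span is filtered out or not sampled."""
        if not self.matches(span):
            return None
        if rng.random() >= self.sampling_rate:
            return None
        return self.template.evaluate(span)


# A trivial template: 1.0 if the span produced any output, else 0.0.
non_empty = Template("non_empty_output", lambda span: 1.0 if span.get("output") else 0.0)
rule = Rule(non_empty, sampling_rate=1.0, filters={"kind": "llm"})

# Made-up spans for illustration only.
spans = [
    {"id": "a", "kind": "llm", "output": "Hello"},
    {"id": "b", "kind": "llm", "output": ""},
    {"id": "c", "kind": "tool", "output": "ignored"},
]
rng = random.Random(0)
scores = {span["id"]: rule.maybe_score(span, rng) for span in spans}
```

In this sketch a span that fails the filters or is not sampled gets no score at all, which mirrors the idea that scores are attached only to evaluated spans.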

When to Use Each Type

Use LLM-as-Judge when:

You need subjective assessment (relevance, helpfulness, tone)
The evaluation criteria are hard to express as code
You want natural-language reasoning with each score
You want to use one of the 8 built-in templates (hallucination, relevance, correctness, etc.)

Use Code Evaluators when:

You need deterministic, repeatable checks
Speed and cost matter (no LLM API call needed)
You're validating structure (JSON schema, required fields, format)
You want exact match, regex, or keyword detection
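
As a rough illustration of the deterministic checks listed above, the following sketch uses only the Python standard library. The function names, their signatures, and the convention of returning `1.0` for pass and `0.0` for fail are assumptions made for this example; the actual code evaluator interface is not described on this page.

```python
import json
import re


def has_required_fields(output: str, required: list[str]) -> float:
    """Structure check: output must be a JSON object containing every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    return 1.0 if all(key in data for key in required) else 0.0


def matches_pattern(output: str, pattern: str) -> float:
    """Regex check: the whole (whitespace-trimmed) output must match the pattern."""
    return 1.0 if re.fullmatch(pattern, output.strip()) else 0.0


def contains_keywords(output: str, keywords: list[str]) -> float:
    """Keyword detection: case-insensitive match on any of the keywords."""
    text = output.lower()
    return 1.0 if any(keyword.lower() in text for keyword in keywords) else 0.0


def exact_match(output: str, expected: str) -> float:
    """Exact match after trimming surrounding whitespace."""
    return 1.0 if output.strip() == expected.strip() else 0.0
```

Because each function depends only on its inputs, the same span always produces the same score, and no LLM call is involved.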

Evaluators Page

The evaluators page has three tabs:

Evaluators Tab

Shows all active evaluation rules. Each row displays the rule name, status (Active / Paused), 24-hour evaluation cost, and timestamps. Click a rule to open a detail drawer with a score-over-time chart, execution logs, and the full configuration.

Use the sidebar to filter by status (Active / Paused) and evaluator name.

Library Tab

Browse all available templates — both the 8 built-in managed templates and any custom templates your team has created. Click a template to see its full prompt or code, variables, and the rules using it.

The built-in managed templates cover common evaluation criteria:

Hallucination
Helpfulness
Relevance
Toxicity
Correctness
Conciseness
Context Relevance (RAG)
Faithfulness (RAG)

See the LLM-as-Judge guide for details on each template.

Scores Tab

View all scores produced by evaluators. Filter by time range, score name, value range, and span labels. Click any row to open the corresponding trace detail.
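
The filters on this tab combine as a logical AND. The sketch below shows that behavior on an in-memory list; the `Score` record, its field names, and the `filter_scores` helper are hypothetical and are not part of any documented export format or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Score:
    """Hypothetical score record, used only for this illustration."""
    name: str
    value: float
    timestamp: datetime
    span_labels: dict = field(default_factory=dict)


def filter_scores(scores, *, name=None, min_value=None, max_value=None,
                  start=None, end=None, span_labels=None):
    """Keep scores that satisfy every filter that was supplied."""
    selected = []
    for score in scores:
        if name is not None and score.name != name:
            continue
        if min_value is not None and score.value < min_value:
            continue
        if max_value is not None and score.value > max_value:
            continue
        if start is not None and score.timestamp < start:
            continue
        if end is not None and score.timestamp > end:
            continue
        if span_labels and any(score.span_labels.get(k) != v
                               for k, v in span_labels.items()):
            continue
        selected.append(score)
    return selected


# Made-up records for illustration only.
all_scores = [
    Score("relevance", 0.9, datetime(2025, 1, 1, 12), {"env": "prod"}),
    Score("relevance", 0.2, datetime(2025, 1, 2, 12), {"env": "prod"}),
    Score("toxicity", 0.1, datetime(2025, 1, 2, 13), {"env": "dev"}),
]
low_relevance = filter_scores(all_scores, name="relevance", max_value=0.5)
```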

Prerequisites

LLM-as-Judge evaluators require an LLM Connection
Code evaluators have no external dependencies
Creating evaluators requires the Editor or Admin role

Support

If you need assistance or have any questions, please reach out to us through:

Email at [email protected]
