Code Evaluators
Code evaluators let you write Python functions that deterministically score spans without calling an LLM. They're fast, free, and ideal for structural validation.
Function Contract
Your code must define an evaluate(ctx) function that
receives an EvaluationContext and returns an
EvaluationResult:
import json

def evaluate(ctx):
    output = ctx.observation.output
    # Valid JSON arrives already parsed (dict or list); only raw strings need parsing.
    has_json = isinstance(output, (dict, list))
    if isinstance(output, str):
        try:
            json.loads(output)
            has_json = True
        except json.JSONDecodeError:
            pass
    return EvaluationResult(scores=[
        Score(
            name="valid_json",
            value=has_json,
            data_type="BOOLEAN",
            comment="Output is valid JSON"
            if has_json else "Not valid JSON",
        ),
    ])
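To try an evaluator outside the editor, a small local harness can help. The sketch below assumes that Score and EvaluationResult are supplied by the runtime (the examples use them without importing them), so it defines hypothetical stand-ins with the same field names. The make_ctx helper is also hypothetical and only mimics the three ctx.observation fields described in this document.

```python
import json
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, List, Optional


# Hypothetical local stand-ins; the real classes come from the platform runtime.
@dataclass
class Score:
    name: str
    value: Any
    data_type: str
    comment: Optional[str] = None


@dataclass
class EvaluationResult:
    scores: List[Score] = field(default_factory=list)


def make_ctx(input=None, output=None, metadata=None):
    """Build a fake context exposing ctx.observation.input/output/metadata."""
    observation = SimpleNamespace(
        input=input, output=output, metadata=metadata or {}
    )
    return SimpleNamespace(observation=observation)


def evaluate(ctx):
    # A JSON validity check in the spirit of the example above.
    output = ctx.observation.output
    has_json = isinstance(output, (dict, list))
    if isinstance(output, str):
        try:
            json.loads(output)
            has_json = True
        except json.JSONDecodeError:
            pass
    return EvaluationResult(scores=[
        Score(
            name="valid_json",
            value=has_json,
            data_type="BOOLEAN",
            comment="Output is valid JSON" if has_json else "Not valid JSON",
        ),
    ])


if __name__ == "__main__":
    for sample in ['{"ok": true}', "not json", {"already": "parsed"}]:
        result = evaluate(make_ctx(output=sample))
        print(repr(sample), "->", result.scores[0].value)
```

When pasting the function into the editor, leave the stand-in classes and make_ctx behind; only evaluate and its imports are needed there.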
Context Fields
The ctx.observation object contains:
| Field | Description |
|---|---|
| ctx.observation.input | Parsed gen_ai.input.messages (JSON if valid, raw string otherwise) |
| ctx.observation.output | Parsed gen_ai.output.messages (JSON if valid, raw string otherwise) |
| ctx.observation.metadata | All span tags as a dict (e.g. {"gen_ai.request.model": "gpt-4o", ...}) |
Accessing metadata
model = ctx.observation.metadata["gen_ai.request.model"]
system = ctx.observation.metadata.get(
    "gen_ai.system_instructions", ""
)
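Indexing the metadata dict directly raises KeyError when a tag is absent, and a runtime exception is reported as USER_CODE_ERROR. A minimal sketch of a more defensive read follows; describe_request is a hypothetical helper name, and the "unknown" default is an arbitrary choice for illustration.

```python
def describe_request(metadata):
    """Read a few span tags without raising when a tag is missing."""
    metadata = metadata or {}
    model = metadata.get("gen_ai.request.model", "unknown")
    system = metadata.get("gen_ai.system_instructions", "")
    return {
        "model": model,
        # True only when the system instructions contain non-whitespace text.
        "has_system_instructions": bool(str(system).strip()),
    }


if __name__ == "__main__":
    print(describe_request({"gen_ai.request.model": "gpt-4o"}))
    print(describe_request({}))
```

Inside evaluate, the helper would be called as describe_request(ctx.observation.metadata).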
Score Types
Each Score requires a data_type:
| data_type | value type | Example |
|---|---|---|
| NUMERIC | int or float | 0.85 |
| BOOLEAN | bool | True / False |
| CATEGORICAL | str | "good", "bad" |
You can return multiple scores per evaluation.
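As an illustration, the sketch below returns one score of each data type from a single evaluation. The Score and EvaluationResult classes here are hypothetical local stand-ins so the snippet runs on its own, and the 20 and 200 word thresholds are arbitrary values chosen for the example.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List, Optional


# Hypothetical stand-ins for local runs; omit these in the evaluator editor.
@dataclass
class Score:
    name: str
    value: Any
    data_type: str
    comment: Optional[str] = None


@dataclass
class EvaluationResult:
    scores: List[Score]


def evaluate(ctx):
    output = str(ctx.observation.output or "")
    word_count = len(output.split())
    # Arbitrary illustrative buckets, not platform-defined categories.
    if word_count < 20:
        length_bucket = "short"
    elif word_count < 200:
        length_bucket = "medium"
    else:
        length_bucket = "long"
    return EvaluationResult(scores=[
        Score(name="word_count", value=word_count, data_type="NUMERIC"),
        Score(name="non_empty", value=word_count > 0, data_type="BOOLEAN"),
        Score(name="length_bucket", value=length_bucket,
              data_type="CATEGORICAL"),
    ])


# Fake context for a local run; the platform builds the real one.
ctx = SimpleNamespace(observation=SimpleNamespace(output="hello world"))
result = evaluate(ctx)
print([(s.name, s.value) for s in result.scores])
```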
Setup Steps
- Go to GenAI → Evaluators → New Code Evaluator
- Write your Python code in the editor
- Select a sample span and click Run Test to verify
- Save the template
- Create a rule to deploy the evaluator
Runtime Constraints
Each execution runs in an isolated microVM (hardware-level isolation). The VM is destroyed after each run, so no state persists between evaluations.
| Constraint | Value |
|---|---|
| Language | Python only |
| Timeout | 5 seconds |
| Memory | 128 MB |
| Network access | None |
| File system | Ephemeral (destroyed after run) |
| Max source size | 256 KB |
Available modules
The full Python standard library is available inside the
VM, including json, re, math, datetime, string,
collections, itertools, functools, etc.
Third-party packages and network access are not available.
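The standard library alone covers more than exact matching. As a sketch, difflib can produce a fuzzy similarity ratio that fits a NUMERIC score. The similarity helper and the idea of a fixed reference text are assumptions made for illustration; this document does not say where expected answers would come from.

```python
from difflib import SequenceMatcher


def similarity(output, reference):
    """Return a ratio between 0.0 and 1.0 describing how alike two texts are."""
    # Lowercase and collapse whitespace so formatting differences are ignored.
    a = " ".join(str(output or "").lower().split())
    b = " ".join(str(reference or "").lower().split())
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print(similarity("Hello, world!", "hello,   WORLD!"))
    print(similarity("The capital is Paris.", "Paris is the capital."))
```

Inside evaluate, the returned float could be reported as a Score with data_type="NUMERIC". SequenceMatcher is quadratic in the worst case, so very long outputs may need truncating to stay within the 5-second timeout.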
Examples
Exact match
def evaluate(ctx):
    output = str(ctx.observation.output or "")
    expected = "Hello, world!"
    # Ignore leading and trailing whitespace when comparing.
    match = output.strip() == expected
    return EvaluationResult(scores=[
        Score(
            name="exact_match",
            value=match,
            data_type="BOOLEAN",
        ),
    ])
Regex validation
import re

def evaluate(ctx):
    output = str(ctx.observation.output or "")
    # Loose pattern: something@domain.tld anywhere in the output.
    has_email = bool(
        re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", output)
    )
    return EvaluationResult(scores=[
        Score(
            name="contains_email",
            value=has_email,
            data_type="BOOLEAN",
        ),
    ])
JSON schema check
import json

def evaluate(ctx):
    output = ctx.observation.output
    required_keys = ["name", "age", "email"]
    try:
        # Valid JSON arrives already parsed; only raw strings need parsing.
        parsed = json.loads(output) if isinstance(output, str) else output
        has_all = isinstance(parsed, dict) and all(
            k in parsed for k in required_keys
        )
    except json.JSONDecodeError:
        has_all = False
    return EvaluationResult(scores=[
        Score(
            name="schema_valid",
            value=has_all,
            data_type="BOOLEAN",
            comment="Missing keys"
            if not has_all else None,
        ),
    ])
Keyword detection
def evaluate(ctx):
    output = str(ctx.observation.output or "").lower()
    blocked = ["password", "secret", "api_key"]
    # Case-insensitive substring match against each blocked word.
    found = [w for w in blocked if w in output]
    return EvaluationResult(scores=[
        Score(
            name="no_secrets",
            value=len(found) == 0,
            data_type="BOOLEAN",
            comment=f"Found: {found}" if found else None,
        ),
    ])
Viewing Execution History
Code evaluator executions are recorded as trace spans. To view them:
- Go to the Traces page
- Filter by gen_ai.operation.name = code_eval
- Each span shows status, duration, scores, and errors
From the Evaluators page, click View executions on any rule to jump to a pre-filtered trace view.
Error Codes
| Code | Meaning |
|---|---|
| INVALID_SOURCE | Syntax error or missing evaluate function |
| USER_CODE_ERROR | Runtime exception in your code |
| TIMEOUT | Exceeded 5-second limit |
| INVALID_RESULT | Return value doesn't match expected shape |
| RESULT_TOO_LARGE | Result exceeds 256 KB |
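Some of these failures can be caught locally before saving. The sketch below is a hypothetical pre-flight check, not the platform's own validation: it parses the source with ast, looks for a top-level evaluate function, and compares the source size with the 256 KB limit, assuming 1 KB means 1024 bytes.

```python
import ast

# Assumption: the "Max source size" of 256 KB is counted in 1024-byte units.
MAX_SOURCE_BYTES = 256 * 1024


def preflight(source):
    """Return a list of problems found in evaluator source; empty if none."""
    problems = []
    if len(source.encode("utf-8")) > MAX_SOURCE_BYTES:
        problems.append("source exceeds 256 KB")
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        problems.append(f"syntax error on line {exc.lineno}: {exc.msg}")
        return problems
    # Only module-level definitions count; nested functions are ignored.
    has_evaluate = any(
        isinstance(node, ast.FunctionDef) and node.name == "evaluate"
        for node in tree.body
    )
    if not has_evaluate:
        problems.append("no top-level evaluate function")
    return problems


if __name__ == "__main__":
    print(preflight("def evaluate(ctx):\n    return None\n"))
    print(preflight("def score(ctx):\n    pass\n"))
```

A clean pre-flight does not guarantee a successful run; runtime exceptions, timeouts, and malformed results can only be found by executing the code, for example with Run Test on a sample span.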
Support
If you need assistance or have any questions, please reach out to us through:
- Email at [email protected]