Basic Evaluator
An evaluator receives an EvalResult and returns a score dictionary:
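A minimal sketch of that contract follows. The `EvalResult` dataclass here is a hypothetical stand-in for the framework's own result type, and the score dictionary keys (`key`, `passed`, `notes`) are assumptions made for illustration; check them against the real schema before reusing this.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def has_output(result: EvalResult) -> dict:
    """Score whether the evaluated function produced a non-empty output."""
    ok = result.output is not None and str(result.output).strip() != ""
    return {
        "key": "has_output",  # assumed field naming the metric
        "passed": ok,         # assumed pass/fail field
        "notes": "" if ok else "output was empty or None",
    }


score = has_output(EvalResult(input="2 + 2", output="4"))
```

The evaluator only reads from the result and returns a plain dictionary, which keeps it easy to test on its own.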
Multiple Evaluators
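Running several evaluators over one result could be sketched as below. How the framework actually registers a list of evaluators is not shown in this guide, so `run_evaluators` is a hypothetical runner, and `EvalResult` and the score keys are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


Evaluator = Callable[[EvalResult], Optional[dict]]


def not_empty(result: EvalResult) -> dict:
    ok = bool(str(result.output or "").strip())
    return {"key": "not_empty", "passed": ok, "notes": "" if ok else "empty output"}


def under_100_chars(result: EvalResult) -> dict:
    n = len(str(result.output or ""))
    return {"key": "under_100_chars", "passed": n <= 100, "notes": f"{n} chars"}


def run_evaluators(result: EvalResult, evaluators: List[Evaluator]) -> List[dict]:
    """Apply each evaluator in order and collect the scores it returns."""
    scores = []
    for evaluator in evaluators:
        score = evaluator(result)
        if score is not None:  # an evaluator may decline to score
            scores.append(score)
    return scores


scores = run_evaluators(EvalResult(output="hello"), [not_empty, under_100_chars])
```

Each evaluator contributes one score, so a failing check can be traced to a single function.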
Chain evaluators for multiple automated checks.
Evaluator Patterns
Reference Comparison
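One way to compare an output against its reference is a normalized exact match. This is a sketch only: `EvalResult`, its `reference` field, and the score keys are assumptions, and the normalization (trim plus lowercase) is just one reasonable choice.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def matches_reference(result: EvalResult) -> Optional[dict]:
    """Case- and whitespace-insensitive exact match against the reference."""
    if result.reference is None:
        return None  # nothing to compare against
    got = str(result.output).strip().lower()
    want = str(result.reference).strip().lower()
    ok = got == want
    return {
        "key": "matches_reference",
        "passed": ok,
        "notes": "" if ok else f"expected {want!r}, got {got!r}",
    }


score = matches_reference(EvalResult(output=" Paris ", reference="paris"))
```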
Semantic Similarity
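The shape of a similarity evaluator can be sketched with a toy embedding. Note the swap: the `embed` function below is a bag-of-words word count, not a real semantic model, and is there only so the example runs without extra dependencies. `EvalResult`, the score keys, and the 0.7 threshold are likewise assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Replace with a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_similarity(result: EvalResult, threshold: float = 0.7) -> Optional[dict]:
    if result.output is None or result.reference is None:
        return None  # cannot compare without both sides
    value = cosine(embed(str(result.output)), embed(str(result.reference)))
    return {
        "key": "semantic_similarity",
        "value": value,
        "passed": value >= threshold,
        "notes": f"cosine={value:.2f}, threshold={threshold}",
    }


score = semantic_similarity(EvalResult(output="The cat sat", reference="the cat sat"))
```

Only `embed` needs to change to move from word overlap to a real embedding; the scoring logic stays the same.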
LLM-as-Judge
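An LLM judge can be wrapped as an evaluator by injecting the model call. In this sketch `judge` is any callable that takes a prompt and returns text; `fake_judge` is a stub so the example runs offline, and no real model API is implied. `EvalResult`, the prompt wording, and the PASS/FAIL convention are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def make_llm_judge(judge: Callable[[str], str], key: str = "llm_judge"):
    """Build an evaluator that asks `judge` for a PASS/FAIL verdict."""
    def evaluator(result: EvalResult) -> dict:
        prompt = (
            "Reply with PASS or FAIL, then a one-line reason.\n"
            f"Question: {result.input}\n"
            f"Answer: {result.output}"
        )
        reply = judge(prompt).strip()
        return {
            "key": key,
            "passed": reply.upper().startswith("PASS"),
            "notes": reply,  # keep the judge's reasoning for debugging
        }
    return evaluator


def fake_judge(prompt: str) -> str:
    """Offline stub standing in for a real model call."""
    if "Answer: 4" in prompt:
        return "PASS: correct sum"
    return "FAIL: incorrect sum"


arithmetic_judge = make_llm_judge(fake_judge)
score = arithmetic_judge(EvalResult(input="2 + 2", output="4"))
```

Injecting the callable keeps the evaluator testable without network access.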
Content Safety
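A very simple safety check is a blocklist scan. This sketch assumes the `EvalResult` stand-in and score keys used for illustration throughout, and the terms in `BLOCKED_TERMS` are placeholders; a production check would more likely call a dedicated moderation service.

```python
import re
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


BLOCKED_TERMS = ("password", "ssn")  # illustrative placeholders only


def content_safety(result: EvalResult, blocked: Sequence[str] = BLOCKED_TERMS) -> dict:
    """Fail when any blocked term appears as a whole word in the output."""
    text = str(result.output or "").lower()
    hits = [t for t in blocked if re.search(rf"\b{re.escape(t)}\b", text)]
    return {
        "key": "content_safety",
        "passed": not hits,
        "notes": f"blocked terms found: {', '.join(hits)}" if hits else "",
    }


score = content_safety(EvalResult(output="Your password is hunter2"))
```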
JSON Validation
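A JSON check might parse the output with the standard `json` module and then look for required keys. The `EvalResult` stand-in, the score keys, and the default `required_keys` value are assumptions chosen for the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def valid_json(result: EvalResult, required_keys: Sequence[str] = ("answer",)) -> dict:
    """Check that the output parses as a JSON object with the required keys."""
    try:
        parsed = json.loads(result.output)
    except (TypeError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        return {"key": "valid_json", "passed": False, "notes": f"invalid JSON: {exc}"}
    if not isinstance(parsed, dict):
        return {"key": "valid_json", "passed": False, "notes": "top level is not an object"}
    missing = [k for k in required_keys if k not in parsed]
    return {
        "key": "valid_json",
        "passed": not missing,
        "notes": f"missing keys: {', '.join(missing)}" if missing else "",
    }


score = valid_json(EvalResult(output='{"answer": 42}'))
```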
Reusable Evaluator Collections
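Collections can be as simple as named lists that build on one another. The sketch below assumes the illustrative `EvalResult` stand-in and score keys; the collection names `BASIC_CHECKS` and `QUALITY_CHECKS` and the `score` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def score(key: str, ok: bool, notes: str = "") -> dict:
    """Small helper so every evaluator returns the same (assumed) shape."""
    return {"key": key, "passed": ok, "notes": notes}


def not_empty(result: EvalResult) -> dict:
    return score("not_empty", bool(str(result.output or "").strip()))


def under_500_chars(result: EvalResult) -> dict:
    n = len(str(result.output or ""))
    return score("under_500_chars", n <= 500, f"{n} chars")


def no_todo_markers(result: EvalResult) -> dict:
    return score("no_todo_markers", "TODO" not in str(result.output or ""))


# Named sets that can be reused across evaluations.
BASIC_CHECKS = [not_empty]
QUALITY_CHECKS = BASIC_CHECKS + [under_500_chars, no_todo_markers]

result = EvalResult(output="Draft summary. TODO: add sources")
scores = [evaluator(result) for evaluator in QUALITY_CHECKS]
```

Building larger sets from smaller ones keeps the basic checks in exactly one place.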
Create evaluator sets for different use cases.
Async Evaluators
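An async evaluator might await an external service before scoring. In this sketch `moderation_lookup` is a hypothetical coroutine that simulates such a call with `asyncio.sleep(0)`; the `EvalResult` stand-in and score keys are assumptions, and how the framework schedules async evaluators is not shown here.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


async def moderation_lookup(text: str) -> bool:
    """Stub for a remote call; returns True when the text looks acceptable."""
    await asyncio.sleep(0)  # placeholder for network latency
    return "forbidden" not in text.lower()


async def async_safety(result: EvalResult) -> dict:
    """Async evaluator: awaits the lookup, then returns a score dictionary."""
    ok = await moderation_lookup(str(result.output))
    return {
        "key": "async_safety",
        "passed": ok,
        "notes": "" if ok else "flagged by moderation lookup",
    }


# Outside a framework, the coroutine can be driven directly.
score = asyncio.run(async_safety(EvalResult(output="all good")))
```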
Evaluators can be async functions.
Returning None
Return None to skip adding a score:
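For instance, an evaluator that only applies to text output can return `None` for everything else. The `EvalResult` stand-in and the `value` score key are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def word_count(result: EvalResult) -> Optional[dict]:
    """Report the word count, but only when the output is a string."""
    if not isinstance(result.output, str):
        return None  # not applicable, so no score is added
    n = len(result.output.split())
    return {"key": "word_count", "value": n, "notes": f"{n} words"}


skipped = word_count(EvalResult(output={"answer": 42}))
scored = word_count(EvalResult(output="one two three"))
```

Returning `None` differs from returning a failing score: the metric is simply absent for that result.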
Evaluator vs In-Function Scoring
Use evaluators when...
- The check is reusable across many evaluations
- You need post-processing after all data is set
- The check makes external API calls that should stay separate from the main logic
- The check is a standard one (format, length, safety) that applies broadly
Use in-function scoring when...
- The scoring logic is specific to this evaluation
- You need access to intermediate computation
- The score depends on test-specific context
- The check is a simple one-off
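The contrast between the two approaches can be sketched in a few lines. Everything here is hypothetical: `EvalResult` with a `scores` list, the `answer_question` function, and the `DOCS` corpus are illustrative stand-ins rather than the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, List

DOCS = ["Paris is the capital of France.", "Berlin is the capital of Germany."]


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    scores: List[dict] = field(default_factory=list)


def answer_question(question: str) -> EvalResult:
    """Evaluation function that scores itself using an intermediate value."""
    words = question.lower().split()
    retrieved = [d for d in DOCS if any(w in d.lower() for w in words)]
    result = EvalResult(input=question, output=retrieved[0] if retrieved else "")
    # In-function score: `retrieved` is not visible once the function returns.
    result.scores.append(
        {"key": "retrieved_any", "passed": bool(retrieved), "notes": f"{len(retrieved)} docs"}
    )
    return result


def not_empty(result: EvalResult) -> dict:
    """Reusable evaluator: needs only the finished result."""
    ok = bool(str(result.output).strip())
    return {"key": "not_empty", "passed": ok, "notes": "" if ok else "empty output"}


result = answer_question("capital of France?")
result.scores.append(not_empty(result))
```

The in-function score depends on `retrieved`, which exists only during the run, while `not_empty` could be attached to any evaluation.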
Best Practices
- Keep evaluators focused: One evaluator, one metric
- Return meaningful notes: Help debug failures
- Handle edge cases: Check for None values
- Make them reusable: Parameterize thresholds
- Group related evaluators: Create logical collections
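Several of these practices fit into one small factory: a single metric, a parameterized threshold, an explicit `None` check, and notes that explain the outcome. As elsewhere, `EvalResult` and the score keys are assumptions for illustration, and `max_length` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in for the framework's result object."""
    input: Any = None
    output: Any = None
    reference: Any = None


def max_length(limit: int, key: Optional[str] = None) -> Callable[[EvalResult], dict]:
    """Build a length evaluator for a given character limit."""
    name = key or f"max_length_{limit}"

    def evaluator(result: EvalResult) -> dict:
        if result.output is None:  # handle the edge case explicitly
            return {"key": name, "passed": False, "notes": "output is None"}
        n = len(str(result.output))
        return {"key": name, "passed": n <= limit, "notes": f"{n} chars (limit {limit})"}

    return evaluator


# The same logic, reused with different thresholds.
tweet_length = max_length(280)
summary_length = max_length(1000, key="summary_length")

score = tweet_length(EvalResult(output="short post"))
```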
