## Commands
Twevals has two main commands:

- `twevals serve` - Start the web UI to browse and run evaluations interactively
- `twevals run` - Run evaluations headlessly (for CI/CD pipelines)
### `twevals serve`

Start the web UI to discover and run evaluations interactively.

PATH can be:

- A directory: `twevals serve evals/`
- A file: `twevals serve evals/customer_service.py`
- A specific function: `twevals serve evals.py::test_refund`
#### Options

- Filter evaluations by dataset.
- Filter evaluations by label. Can be specified multiple times.
- Directory for JSON results storage.
- Port for web UI server.
- Name for this evaluation session. Groups related runs together.
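For instance, serving with a dataset filter on a custom port might look like the following; the flag spellings are assumptions inferred from the options above, so confirm them with the CLI help:

```bash
# --dataset and --port are assumed flag spellings, not confirmed.
twevals serve evals/ --dataset billing --port 8000
```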
### `twevals run`

Run evaluations headlessly. Outputs minimal text by default (optimized for LLM agents). Use `--visual` for rich table output.

PATH can be:

- A directory: `twevals run evals/`
- A file: `twevals run evals/customer_service.py`
- A specific function: `twevals run evals.py::test_refund`
- A parametrized variant: `twevals run evals.py::test_math[2-3-5]`
#### Filtering Options

- Filter evaluations by dataset. Can be specified multiple times.
- Filter evaluations by label. Can be specified multiple times.
- Limit the number of evaluations to run.
#### Execution Options

- Number of concurrent evaluations. 1 means sequential execution.
- Global timeout in seconds for all evaluations.
#### Output Options

- Show stdout from eval functions (print statements, logs).
- Show rich progress dots, results table, and summary. Without this flag, output is minimal.
- Override the default results path. When specified, results are saved only to this path (not to `.twevals/runs/`).
- Skip saving results to file. Outputs JSON to stdout instead.
#### Session Options

- Name for this evaluation session. Groups related runs together.
- Name for this specific run. Used as file prefix.
## Examples
### Start the Web UI
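Serve a directory of evals:

```bash
twevals serve evals/
```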
### Run All Evaluations
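Run every evaluation found under a directory:

```bash
twevals run evals/
```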
### Run Specific File
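Run only the evaluations defined in one file:

```bash
twevals run evals/customer_service.py
```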
### Run Specific Function
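Target a single eval function with the `::` selector:

```bash
twevals run evals.py::test_refund
```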
### Run Parametrized Variant
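Select one parametrized case; quoting keeps the shell from treating the brackets as a glob pattern:

```bash
twevals run "evals.py::test_math[2-3-5]"
```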
### Filter by Dataset and Label
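Combine dataset and label filters; labels can repeat. The flag spellings here are assumptions based on the filtering options above:

```bash
# --dataset and --label are assumed flag spellings, not confirmed.
twevals run evals/ --dataset billing --label smoke --label fast
```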
### Run with Concurrency and Timeout
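Run four evals at a time with a global deadline; the flag spellings are assumptions based on the execution options above:

```bash
# --concurrency and --timeout are assumed flag spellings, not confirmed.
twevals run evals/ --concurrency 4 --timeout 120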
### Export Results
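Write results only to a chosen path instead of `.twevals/runs/`:

```bash
twevals run evals/ -o results.json
```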
### Verbose Debug Run
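Show eval stdout alongside the rich table output; `--visual` is documented above, while the verbose flag spelling is an assumption:

```bash
# --verbose is an assumed flag spelling, not confirmed.
twevals run evals/customer_service.py --verbose --visual
```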
### Production CI Pipeline
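A CI job might combine execution limits with an explicit results path; `-o` is documented above, and the other flag spellings are assumptions:

```bash
# Headless CI run: minimal output, results exported as a build artifact.
# --concurrency and --timeout are assumed flag spellings, not confirmed.
twevals run evals/ --concurrency 8 --timeout 300 -o artifacts/results.json
```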
### Session Tracking
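Group related runs under one session; the `--session` and `--run-name` spellings are assumptions based on the session options above:

```bash
# --session and --run-name are assumed flag spellings, not confirmed.
twevals run evals/ --session nightly --run-name baseline
```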
## Configuration File
Twevals supports a `twevals.json` config file for persisting default CLI options. The file is auto-generated in your project root on first run.
### Default Config
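A freshly generated `twevals.json` might look like the sketch below. The keys match the table that follows, but the values shown (including `null` for no global timeout) are illustrative assumptions, not confirmed defaults:

```json
{
  "concurrency": 1,
  "timeout": null,
  "verbose": false,
  "results_dir": ".twevals/runs",
  "port": 8000
}
```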
### Supported Options
| Option | Type | Description | Used by |
|---|---|---|---|
| `concurrency` | integer | Number of concurrent evaluations | run |
| `timeout` | float | Global timeout in seconds | run |
| `verbose` | boolean | Show stdout from eval functions | run |
| `results_dir` | string | Directory for results storage | serve |
| `port` | integer | Web UI server port | serve |
### Precedence

CLI flags always override config values.
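For example, a flag on the command line wins over the value in `twevals.json`; the flag spelling is an assumption:

```bash
# With "concurrency": 2 in twevals.json, this run still uses 8 workers.
# --concurrency is an assumed flag spelling, not confirmed.
twevals run evals/ --concurrency 8
```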
### Editing via UI

Click the settings icon in the web UI header to view and edit config values. Changes are saved to `twevals.json`.
## Exit Codes
| Code | Meaning |
|---|---|
| 0 | Evaluations completed (regardless of pass/fail) |
| Non-zero | Error during execution (bad path, exceptions, etc.) |
The CLI does not currently set non-zero exit codes for failed evaluations—only for execution errors. Check the JSON output or summary for pass/fail status.
## Environment Variables
| Variable | Description |
|---|---|
| `TWEVALS_CONCURRENCY` | Default concurrency level |
| `TWEVALS_TIMEOUT` | Default timeout in seconds |
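For example, to set defaults in a CI shell:

```bash
# Defaults for concurrency and timeout picked up from the environment.
export TWEVALS_CONCURRENCY=4
export TWEVALS_TIMEOUT=120
twevals run evals/
```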
## Output Format
### Minimal Output (Default)
By default, `twevals run` outputs minimal text optimized for LLM agents and CI pipelines.
### Visual Output (`--visual`)

Use `--visual` for rich progress dots, results table, and summary.
### JSON File Output
Results are always saved as JSON to `.twevals/runs/` (or a custom path via `-o`).
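For example, to pretty-print the newest results file, something like the following works, assuming one JSON file per run under `.twevals/runs/` (an assumption about the layout; `jq` is a third-party tool):

```bash
# Show the most recently written run file.
jq . "$(ls -t .twevals/runs/*.json | head -n 1)"
```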
