Group related evaluation runs together for comparison and tracking
Sessions let you group related evaluation runs together, enabling workflows like model comparison, iterative debugging, and tracking progress over time.
```shell
# Start the web UI
twevals serve evals.py

# Run headlessly with session and run names
twevals run evals.py --session model-upgrade --run-name gpt5-baseline

# Continue the session with another run
twevals run evals.py --session model-upgrade --run-name gpt5-tuned

# Auto-generated friendly names
twevals run evals.py
```
A session groups related runs together and is identified by a name string. Running with `--session model-upgrade` again continues the same session—runs are linked simply by matching session names.
Run files are named with the pattern `{run_name}_{timestamp}.json`:

```
.twevals/runs/
├── gpt5-baseline_2024-01-15T10-30-00Z.json   # Named run
├── gpt5-tuned_2024-01-15T11-00-00Z.json      # Second run in session
├── swift-falcon_2024-01-15T14-45-00Z.json    # Auto-generated name
└── latest.json                               # Copy of most recent
```
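Because the timestamp is the last underscore-separated component and never contains an underscore itself, a filename can be split back into its parts with plain shell parameter expansion. A small sketch, using a filename from the listing above:

```shell
# Split a run filename into run name and timestamp, following the
# {run_name}_{timestamp}.json pattern. Splitting on the LAST "_" is
# safe even if the run name itself contains underscores.
f="gpt5-baseline_2024-01-15T10-30-00Z.json"
base="${f%.json}"        # drop the .json extension
timestamp="${base##*_}"  # everything after the last underscore
run_name="${base%_*}"    # everything before the last underscore
echo "$run_name $timestamp"   # → gpt5-baseline 2024-01-15T10-30-00Z
```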
## Model Comparison

Compare performance across different models or configurations:
```shell
twevals run evals/ --session model-comparison --run-name gpt4-baseline
twevals run evals/ --session model-comparison --run-name claude-3
twevals run evals/ --session model-comparison --run-name gemini-pro
```
Each run’s JSON file contains the full results for that model, all grouped under the same session.
## Iterative Development
Track progress while debugging or improving:
```shell
twevals run evals/ --session bug-fix-123 --run-name initial
# Make changes...
twevals run evals/ --session bug-fix-123 --run-name attempt-2
# More changes...
twevals run evals/ --session bug-fix-123 --run-name fixed
```
## CI/CD Integration
Use session names to track runs across deployments:
```shell
twevals run evals/ --session "release-${VERSION}" --run-name "build-${BUILD_ID}" -o results.json
```
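In a CI job, the interesting part is deriving stable names from the environment. A minimal sketch—the `VERSION` and `BUILD_ID` fallbacks are illustrative; real values would come from your CI system:

```shell
# Build the --session and --run-name values from CI environment
# variables, with illustrative fallbacks for local runs.
VERSION="${VERSION:-2.0.1}"
BUILD_ID="${BUILD_ID:-local}"
session="release-${VERSION}"
run_name="build-${BUILD_ID}"
echo "$session $run_name"
```

Every build of the same release then lands in one session, making regressions between builds easy to spot.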
## A/B Testing
Compare different approaches:
```shell
twevals run evals/ --session prompt-experiment --run-name prompt-v1
twevals run evals/ --session prompt-experiment --run-name prompt-v2-concise
twevals run evals/ --session prompt-experiment --run-name prompt-v3-detailed
```
**Naming Conventions:** Use descriptive session names that indicate the purpose (e.g., `model-comparison`, `bug-fix-123`, `release-v2.0`). Run names should describe what's different about that specific run.

**Session Continuity:** Sessions are identified purely by name. Using the same `--session` value automatically continues the session—no explicit "create" or "resume" needed.

**Storage:** Session metadata is stored in each run's JSON file. There's no separate session file—sessions are virtual groupings based on the `session_name` field.
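Because grouping is just a shared field value, a session's runs can be reassembled straight from the files. A sketch, assuming each run file carries a top-level `session_name` key as described above—the file contents here are stand-ins, not the real result schema:

```shell
# Create two stand-in run files, then find every file belonging to
# the "model-upgrade" session by matching its session_name field.
mkdir -p /tmp/twevals-sessions-demo && cd /tmp/twevals-sessions-demo
printf '{"session_name": "model-upgrade"}' > gpt5-baseline_2024-01-15T10-30-00Z.json
printf '{"session_name": "scratch"}' > swift-falcon_2024-01-15T14-45-00Z.json
grep -l '"session_name": "model-upgrade"' *.json
# → gpt5-baseline_2024-01-15T10-30-00Z.json
```

For anything beyond a quick check, a real JSON parser (e.g. `jq`) is more robust than `grep` against formatting differences.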