Advanced Multi-Step Evaluation

Real code execution, output validation, and detailed prompt logging

Total Runs: 4
Successful: 1
Avg Tokens: 6538

Evaluation Runs

20251206_124917  2/2  6651 tokens
20251206_124831  1/2  6486 tokens
20251206_124726  0/2  6512 tokens
20251206_124651  0/2  6502 tokens
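
The summary figures can be reproduced from the run list with a short calculation. The following is a minimal sketch, not the dashboard's actual code: the `runs` tuple layout and the `summarize` helper are hypothetical, and it assumes that a run counts as "Successful" only when every step passed (for example 2/2) and that "Avg Tokens" is the arithmetic mean rounded to the nearest integer.

```python
from statistics import mean

# Run data copied from the run list: (run_id, steps_passed, steps_total, tokens).
runs = [
    ("20251206_124917", 2, 2, 6651),
    ("20251206_124831", 1, 2, 6486),
    ("20251206_124726", 0, 2, 6512),
    ("20251206_124651", 0, 2, 6502),
]


def summarize(runs):
    """Return (total_runs, successful_runs, avg_tokens) for a list of runs."""
    total_runs = len(runs)
    # Assumption: a run is successful only when all of its steps passed.
    successful = sum(1 for _, passed, total, _ in runs if passed == total)
    # Assumption: the average is the mean token count, rounded to an integer.
    avg_tokens = round(mean(tokens for *_, tokens in runs))
    return total_runs, successful, avg_tokens


print(summarize(runs))  # (4, 1, 6538)
```

Under these assumptions the result matches the reported totals: 4 runs, 1 successful, and a mean of 6537.75 tokens, which rounds to 6538.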
