Real code execution, output validation, and detailed prompt logging
Select an evaluation run to view its details, or run a new evaluation to see real-time results.