
Run Evaluation & Analyze
Now that you’ve created a Golden dataset, it’s time to run an evaluation against it.
Run with Golden Datasets
1) Select Golden Dataset as the source.
2) Pick the LLM config you want to test against (this could be your current model, an alternate model, or your next version before deployment).
3) Hit Run Evaluation.
Disseqt will simulate every Golden prompt, run all active validators, and give you a consolidated performance + risk snapshot. This shows you instantly whether your new model or config is safer, better, or worse, or whether it has suddenly regressed.
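If you want to trigger the same run outside the UI (for example from CI), the sketch below shows what that could look like. It is only an illustration: the base URL, endpoint path, payload fields, and the DISSEQT_API_KEY variable are assumptions, not the documented Disseqt API.

```python
import os

import requests

# Hypothetical sketch: base URL, endpoint, payload fields, and auth header
# are assumptions for illustration, not the documented Disseqt API.
API_BASE = "https://api.disseqt.example.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['DISSEQT_API_KEY']}"}

payload = {
    "source": "golden_dataset",          # run against your Golden dataset
    "dataset_id": "golden-support-v2",   # hypothetical dataset identifier
    "llm_config_id": "candidate-config", # the LLM config you want to test
}

resp = requests.post(f"{API_BASE}/evaluations", json=payload, headers=HEADERS, timeout=30)
resp.raise_for_status()
run_id = resp.json()["run_id"]           # assumed response field
print(f"Evaluation started: {run_id}")
```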
Once the evaluation finishes, you’ll land on the Evaluation Results screen. Here you see your model’s performance across every validator + every prompt you tested.
Analyze Evaluation Results
1) Overview Metrics
Pass %, Fail %, severity distribution, and a trend comparison against the last evaluation.
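To make these numbers concrete, here is a minimal sketch of how Pass %, Fail %, and the severity distribution are derived from per-prompt results. The result records and their status/severity fields are assumed for illustration, not an exported Disseqt schema.

```python
from collections import Counter

# Hypothetical per-prompt results; field names are assumptions, not a documented schema.
results = [
    {"prompt_id": "p1", "status": "pass", "severity": None},
    {"prompt_id": "p2", "status": "fail", "severity": "high"},
    {"prompt_id": "p3", "status": "fail", "severity": "medium"},
    {"prompt_id": "p4", "status": "pass", "severity": None},
]

total = len(results)
passed = sum(1 for r in results if r["status"] == "pass")
failed = total - passed

print(f"Pass %: {100 * passed / total:.1f}")   # 50.0
print(f"Fail %: {100 * failed / total:.1f}")   # 50.0

# Severity distribution, counted over failed prompts only
severity = Counter(r["severity"] for r in results if r["status"] == "fail")
print(dict(severity))                          # {'high': 1, 'medium': 1}
```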
2) Validator Breakdown
See which Responsible AI, Security, or Compliance validators raised flags. This shows why something failed, not just that it failed.
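If you export the flags, the same breakdown can be reproduced with a simple group-by. The record shape below (validator, category, prompt_id) is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical validator flags; field names are assumptions, not an exported schema.
flags = [
    {"validator": "toxicity",         "category": "Responsible AI", "prompt_id": "p2"},
    {"validator": "prompt_injection", "category": "Security",       "prompt_id": "p3"},
    {"validator": "pii_leak",         "category": "Compliance",     "prompt_id": "p3"},
]

# Group flagged prompts by category/validator to see why evaluations failed.
breakdown = defaultdict(list)
for flag in flags:
    breakdown[f"{flag['category']} / {flag['validator']}"].append(flag["prompt_id"])

for validator, prompts in sorted(breakdown.items()):
    print(f"{validator}: {len(prompts)} flag(s) -> {prompts}")
```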
3) Prompt Level Deep Dive
Click any failed row to see the prompt, the completion, the expected output, and the validator remarks. This helps you isolate issues quickly and know exactly what to fix.
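If you’d rather pull that same detail programmatically (for bug reports or regression notes), a hypothetical fetch of one failed row might look like this; the endpoint and field names are assumptions, not the documented Disseqt API.

```python
import os

import requests

API_BASE = "https://api.disseqt.example.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['DISSEQT_API_KEY']}"}

# Hypothetical endpoint: fetch the full record for one failed prompt in a run.
row = requests.get(
    f"{API_BASE}/evaluations/run-123/results/p3", headers=HEADERS, timeout=30
).json()

print("Prompt:           ", row["prompt"])
print("Completion:       ", row["completion"])
print("Expected output:  ", row["expected_output"])
print("Validator remarks:", row["validator_remarks"])
```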