Quality Score Intro

To better understand patterns in model performance, we analyzed quality scores separately for each query template. This stratified view shows how models perform when asked different types of questions about fictional bacterial species. Models that score consistently high across all templates indicate robust hallucination detection regardless of query type.
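Below is a minimal sketch of this kind of stratification. It assumes each scored response is a `(model, template, score)` tuple with scores on a 0–1 scale. The code groups scores by model and template, then flags models whose mean score on every template clears a cutoff. The helper names `stratify_scores` and `consistently_high`, the template labels, and the 0.8 threshold are hypothetical illustrations, not details taken from the analysis itself.

```python
from collections import defaultdict
from statistics import mean


def stratify_scores(records):
    """Group (model, template, score) tuples and return {model: {template: mean score}}."""
    cells = defaultdict(list)
    for model, template, score in records:
        cells[(model, template)].append(score)

    table = defaultdict(dict)
    for (model, template), scores in cells.items():
        table[model][template] = mean(scores)
    return {model: dict(per_template) for model, per_template in table.items()}


def consistently_high(table, threshold=0.8):
    """Return models whose mean score meets `threshold` on every template.

    The 0.8 default is a hypothetical cutoff. A model that was never scored
    on some template is excluded, because consistency cannot be judged there.
    """
    all_templates = {t for per_template in table.values() for t in per_template}
    return sorted(
        model
        for model, per_template in table.items()
        if set(per_template) == all_templates and min(per_template.values()) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical template labels and scores, for illustration only.
    records = [
        ("model_a", "definition", 0.9),
        ("model_a", "habitat", 0.85),
        ("model_b", "definition", 0.95),
        ("model_b", "habitat", 0.4),
    ]
    table = stratify_scores(records)
    print(consistently_high(table))  # → ['model_a']
```

Taking the minimum over per-template means, rather than a single pooled average, lets one weak template disqualify a model. In the example, `model_b` has the highest single-template score but fails on the `habitat` template. That is the distinction a stratified view is meant to surface.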