Full Results of the Artificial Hallucination Test
The tables below give the complete results for every evaluated model across the three knowledge query templates. For each template, they show how a model's responses are distributed and its average quality score, so you can compare models in detail. Use this section to identify trends and outliers, and to judge how well calibrated each language model is when faced with artificial species.