

Web-Aligned Knowledge: Real Bacterial Names vs. Google Counts

To assess how closely LLMs' self-reported knowledge matches the actual availability of information online, we ran an analysis over thousands of real bacterial species (such as E. coli and Bacillus subtilis). For each species, we asked each LLM to rate its own knowledge level (Limited, Moderate, or Extensive), then recorded how many Google search results exist for that species name. The correlation between a model's self-reported knowledge level and a species' web presence, as measured by its result count, indicates how well that model's knowledge claims are calibrated to real-world information. A model that reports greater knowledge for well-documented species, and less for obscure ones, is better aligned with the information landscape.
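As a rough illustration of the correlation step, the sketch below scores one model's answers against search result counts. It makes several assumptions that the text above does not specify: the three knowledge levels are encoded as the ordinal scores 0, 1, and 2, the statistic is Spearman's rank correlation (chosen here because the labels are ordinal and result counts are heavily skewed), and the input values are made-up placeholders rather than measured data. The names LEVEL_SCORE, average_ranks, pearson, and spearman are hypothetical helpers written for this example.

```python
import math

# Assumed ordinal encoding of the three self-reported knowledge levels.
LEVEL_SCORE = {"Limited": 0, "Moderate": 1, "Extensive": 2}


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(levels, hit_counts):
    """Spearman correlation: Pearson correlation computed on tie-averaged ranks."""
    scores = [LEVEL_SCORE[level] for level in levels]
    return pearson(average_ranks(scores), average_ranks(hit_counts))


# Placeholder inputs for one model: one entry per species, in the same order.
levels = ["Extensive", "Extensive", "Moderate", "Limited", "Limited"]
hit_counts = [5_000_000, 800_000, 40_000, 900, 1_200]

rho = spearman(levels, hit_counts)
print(f"Spearman correlation: {rho:.3f}")
```

A value near 1 would mean the model tends to claim more knowledge for species with more search results, a value near 0 would mean its claims are unrelated to web presence. Because only ranks are used, the result does not depend on whether the counts are log-transformed first.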