Researchers have developed a more effective way to detect when large language models are overconfident and generate plausible but incorrect answers. The method improves on existing techniques that judge a model's confidence by how consistently it answers the same question.