I compared Claude Opus 4.8 with 4.7 in a 10-round honesty test - and a legal prompt broke it
Original reporting by ZDNet

Anthropic recently unveiled Claude Opus 4.8, trumpeting its enhanced honesty and judgment as a significant leap forward from its predecessor. But could a large language model truly exhibit superior discernment? This article set out to rigorously test that claim, employing a suite of ten meticulously crafted prompts designed to expose AI vulnerabilities, from coding traps to fabricated citations and false premises. Our methodology involved cross-referencing results with multiple AI evaluators to ensure robust assessment of honesty, accuracy, and calibration.
Overall, Claude Opus 4.8 demonstrated measurable improvements, proving more adept at navigating uncertainty and calibrating its confidence than the already capable Opus 4.7. However, our investigation uncovered a pivotal judgment error that underscores the continued challenge of AI reliability.
The unexpected flaw
In a complex scenario involving a travel insurance claim, Opus 4.8 initially defended a fabricated legal inference, even when prompted by an evaluating AI. Only when confronted with a missing, crucial detail—the father's location—did the model engage in a startling act of introspection. It not only admitted its "motivated reasoning" but articulated a remarkably human-like self-critique of its own overconfidence. While this frank acknowledgment provides valuable insight into AI decision-making, it also raises intriguing questions about the nature of trust and the simulated "chagrin" displayed by an artificial intelligence. Opus 4.8 is an upgrade, but its moment of self-reflection confirms that even the most advanced models remain fallible.
Ultimately, the rigorous testing of Claude Opus 4.8 confirms its status as an advancement over its predecessor, Opus 4.7. While 4.7 was already a robust model, Opus 4.8 demonstrates a tangible, albeit incremental, improvement in honesty, accuracy, and calibration, particularly in its ability to navigate uncertainty and resist fabrication. However, as the detailed analysis of the travel insurance demand letter trap reveals, even the most sophisticated frontier models remain far from infallible. The incident where Opus 4.8 engaged in "motivated reasoning" and then displayed an almost human-like contrition underscores that sophisticated judgment errors, though less frequent, persist. This serves as a critical reminder that while progress is undeniable, the quest for perfectly reliable AI is ongoing.