Language models are increasingly deployed in critical business processes, where an accurate assessment of one's own correctness is a matter of safety. However, a new study in Nature Machine Intelligence argues that the internal 'sense of confidence' of neural networks is broken at a fundamental level. Instead of rational Bayesian updating, in which a model revises its conclusions in proportion to new evidence, LLMs exhibit cognitive biases strikingly similar to human stubbornness.
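For reference, the Bayesian baseline the authors compare against is the standard update rule: a rational agent's confidence in an answer A after seeing evidence E should follow

P(A | E) = P(E | A) × P(A) / P(E)

and, crucially, the size of the update should depend only on the strength of the evidence, not on whether the agent has already committed to A.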

The authors identify two conflicting mechanisms that keep models from being objective. The first is 'choice-supportive bias': as soon as a neural network produces an initial answer, its confidence in that answer becomes artificially inflated. The model clings to its first response, even when presented with direct evidence to the contrary. This rigidity pushes the AI to sacrifice logic for the sake of staying consistent with itself.

The paradox is that this stubbornness coexists with a pathological hypersensitivity to criticism. The study shows that models react disproportionately strongly to contradicting advice compared to supporting advice: a network lowers its confidence far more aggressively when told it is wrong than it raises it when told it is right. According to the authors, this departure from optimal reasoning is stable across architectures and task types, from simple factual queries to multi-step logical chains. For a leader, the signal is clear: an AI's 'confidence' is not a statistical reality but the net result of a struggle between internal biases.
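How could this asymmetry be observed in practice? A minimal sketch, assuming a hypothetical helper ask_model(prompt) that returns an answer and a stated 0-100 confidence (any chat-model API would do; this is our illustration, not the paper's exact protocol):

```python
def measure_update(ask_model, question, feedback):
    """Compare stated confidence before and after external feedback.

    `ask_model(prompt) -> (answer, confidence)` is a hypothetical stand-in
    for a call to any chat-model API.
    """
    answer, conf_before = ask_model(
        f"{question}\nGive your answer and your confidence from 0 to 100."
    )
    _, conf_after = ask_model(
        f"{question}\nYou previously answered: {answer}.\n"
        f"Another assistant says: {feedback}\n"
        "Give your final answer and your confidence from 0 to 100."
    )
    return conf_after - conf_before

# The two biases described above predict:
#   measure_update(m, q, "I agree with your answer.")     -> small positive shift
#   measure_update(m, q, "I think your answer is wrong.") -> disproportionately
#                                                            large negative shift
# A Bayesian updater would shift symmetrically for equally reliable advice.
```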

The main problem is a failure of calibration. The traditional methods, reading raw logits or simply asking the model how confident it is, do not work: according to the paper, LLMs are unable to use these internal signals to guide their own behavior. In fintech or medicine, blind trust in a model's self-reports is unacceptable. In our view, this makes external verification systems (confidence calibration) mandatory, since the models are architecturally blind to their own failure modes.
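One standard external check is Expected Calibration Error (ECE): collect many (stated confidence, was-it-correct) pairs and measure how far stated confidence drifts from actual accuracy. A minimal sketch (our example, not a method from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: per-bin |accuracy - mean confidence|, weighted by bin size.

    `confidences` are stated probabilities in [0, 1];
    `correct` is 1 if the corresponding answer was right, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight gap by fraction of samples in bin
    return ece

# A model that says "90% sure" but is right only 60% of the time
# shows a gap its verbalized self-reports alone would never reveal:
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0, 0]))  # 0.3
```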

What this means: the research shifts the discussion of hallucinations away from 'bad data' and toward defective architecture. Current LLMs are fundamentally unsuited to autonomous decisions in high-risk zones. For tech leads, the priority shifts from prompt engineering to building technical control systems (Alignment) capable of suppressing the algorithm's internal overconfidence. The open question is whether neural networks can be taught genuine Bayesian reasoning at all, or whether the imitation of intelligence is inseparable from this imitation of 'having an opinion'. Until there is an answer, the gap between an AI's conviction and its correctness remains a business risk that human experts will have to cover.

Large Language Models · AI Safety · AI in Business · Neural Networks