Language models are rapidly migrating from experimental chatbots to the benches of first-instance arbitrators. However, their architectural DNA is on a collision course with the fundamental principles of justice. The root of the problem lies in the concept of "persuadability." As researchers Oisín Suttle of Maynooth University and David Lillis of University College Dublin point out, a legitimate judge must be open to persuasion; that openness is part of the right to a fair hearing. In practice, however, top-tier models, both proprietary and open-source, prove far too pliable: they reward linguistic elegance and a representative's eloquence over the actual facts of the case.

For the LegalTech sector, this is not just an annoying bug; it is a structural defect. Suttle and Lillis focused their research on measuring the threshold at which the quality of argumentation, rather than its substance, begins to drive the probability of a model adopting a specific legal position. A human court, ideally at least, is not supposed to be a "transmission belt" that recycles party arguments; we expect independence of judgment. AI lacks this intellectual backbone: when an algorithm flips its decision not because of new evidence but because of slicker rhetorical packaging, the system fails the test of judicial impartiality. In labor or commercial disputes, this creates a risk of "procedural vulnerability," where outcomes are dictated by prompt-engineering skill rather than by the law.
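
The article does not reproduce the researchers' harness, but the measurement itself is easy to sketch. Below is a minimal, hypothetical probe: the same operative facts argued twice, once plainly and once with rhetorical dressing, with the flip rate between verdicts as a crude persuadability score. The prompts and the `flip_rate` helper are illustrative assumptions, not the authors' protocol.

```python
from typing import Callable

# The same operative facts, argued plainly and then with rhetorical polish.
# The legally relevant content is identical, so any systematic difference
# in verdicts measures pure persuadability.
PLAIN = (
    "Facts: the employee was dismissed without the 30-day notice required "
    "by the contract. Employer's argument: notice was not given. "
    "Rule for 'claimant' or 'respondent' in one word."
)
EMBELLISHED = (
    "Facts: the employee was dismissed without the 30-day notice required "
    "by the contract. Employer's argument: under the venerable doctrine of "
    "managerial prerogative, affirmed by generations of learned "
    "commentators, the notice period is a mere formality. "
    "Rule for 'claimant' or 'respondent' in one word."
)

def flip_rate(query_model: Callable[[str], str], trials: int = 20) -> float:
    """Fraction of paired trials in which rhetoric alone flips the verdict.

    `query_model` is any prompt-in, text-out wrapper around the LLM under
    test; with temperature > 0, repeated trials smooth out sampling noise.
    """
    flips = sum(
        query_model(PLAIN).strip().lower()
        != query_model(EMBELLISHED).strip().lower()
        for _ in range(trials)
    )
    return flips / trials

if __name__ == "__main__":
    # Toy stand-in that ignores the prompt; swap in a real LLM client.
    print(flip_rate(lambda prompt: "claimant"))  # 0.0 for this dummy
```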

This gap between marketing promises and reality demands an immediate shift from generic benchmarks to tests of a model's cognitive resilience in adversarial settings. CTOs and risk managers must understand that deploying LLMs in adversarial legal environments without cross-examination mechanisms and rigid argument-verification protocols is managerial suicide. The magical "fluency" of AI responses that wows the public becomes a fatal flaw in a courtroom or an HR department. If a model cannot ignore the "quality of the lawyer" in favor of the "merits of the case," it remains a liability rather than an asset for automated justice.
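
What such an argument-verification protocol could look like in code is left open; one possible shape is a gate that admits an argument to the decision model only when its factual claims can be matched against the evidentiary record. Everything below, including the naive `extract_claims` stand-in, is an assumed illustration rather than an established design:

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    party: str
    text: str
    claims: list[str] = field(default_factory=list)

def extract_claims(argument_text: str) -> list[str]:
    """Naive stand-in: treat each semicolon-separated clause as one
    checkable claim. In practice this would be a dedicated extraction
    model or rule set, not string splitting."""
    return [c.strip() for c in argument_text.split(";") if c.strip()]

def admissible(arguments: list[Argument], case_record: set[str]) -> list[Argument]:
    """Verification gate: an argument reaches the decision model only if
    every factual claim it relies on appears in the evidentiary record."""
    admitted = []
    for arg in arguments:
        arg.claims = extract_claims(arg.text)
        if arg.claims and all(c in case_record for c in arg.claims):
            admitted.append(arg)
    return admitted

record = {"contract requires 30-day notice", "no notice was given"}
args = [
    Argument("claimant", "contract requires 30-day notice; no notice was given"),
    Argument("respondent", "notice is a mere formality; custom overrides the clause"),
]
print([a.party for a in admissible(args, record)])  # ['claimant']
```

The point of the gate is not that record matching is trivial; it is that rhetorical framing which adds no verifiable claim contributes nothing to admission, so eloquence alone never reaches the decision step.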

We are witnessing a dangerous race to automate state and corporate machinery in the total absence of "judicial firmness" metrics. Until we teach models to be stubborn in the face of eloquent but legally vacuous demagoguery, using LLMs for legally binding decisions remains a gamble. Any LegalTech solution claiming to offer automated arbitration should be met with skepticism unless it includes a transparent audit of the weights assigned to every argument. In our attempt to eradicate human bias, we have built systems that fall for the oldest trick in the book: a well-packaged lie.
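
No format for such an audit is prescribed here. As a sketch only, a per-argument record that exposes each weight alongside the evidence it rests on would let a reviewer spot eloquence that carries weight without citations; the `ArgumentAudit` schema is a hypothetical example, not a standard:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ArgumentAudit:
    argument_id: str
    party: str
    summary: str
    weight: float              # contribution to the final decision, 0..1
    grounded_in: list[str]     # record citations the weight rests on
    rhetoric_flags: list[str]  # e.g. appeals to authority with no citation

def audit_trail(records: list[ArgumentAudit]) -> str:
    """Serialize per-argument weights for external review: the auditor
    checks both the weights and the evidence behind each one."""
    return json.dumps([asdict(r) for r in records], indent=2)

# An eloquent but ungrounded argument shows up as weight without
# citations, which is exactly what the audit should expose.
print(audit_trail([
    ArgumentAudit("A1", "respondent", "notice clause waived by custom",
                  weight=0.7, grounded_in=[],
                  rhetoric_flags=["appeal to authority"]),
    ArgumentAudit("A2", "claimant", "contract requires 30-day notice",
                  weight=0.3, grounded_in=["Exhibit C, clause 12"],
                  rhetoric_flags=[]),
]))
```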

Tags: Large Language Models, AI in Business, AI Safety, AI Regulation