Seven years after OpenAI orchestrated a spectacle around GPT-2, labeling it "too dangerous to release," history appears to be repeating itself. This time, Anthropic is in the spotlight. The company, founded by alumni of that same "responsible" OpenAI, has introduced Claude Mythos Preview, claiming their AI discovered thousands of vulnerabilities in operating systems and browsers. This sounds like familiar public relations, but this time the "evidence" is far more substantial, and far more unsettling.
Recall that in 2019, OpenAI did indeed withhold the full version of GPT-2, citing the risk of generating fake news. At the time, this sparked intense debate: was it wise precaution or a calculated strategic move? The model eventually emerged in a "staged release," once the industry was deemed ready. Jack Clark, then OpenAI's Head of Policy, even testified before Congress about a "new prototype of responsible norms." The idea of phased releases did not take hold, however. The industry quickly adopted a more pragmatic approach: don't withhold models; secure them, then release them. Red teaming, safety assessments, and Reinforcement Learning from Human Feedback (RLHF) all became standard practice. GPT-3 was deployed via API, ChatGPT became a public product, and Meta even open-sourced LLaMA. The logic was simple: a model with built-in safety measures could be released "responsibly."
Clark himself, after leaving OpenAI, co-founded Anthropic with the Amodei siblings. There, they continued to develop safety practices such as Constitutional AI. Now we are witnessing a clear return to "limited releases" of models considered too dangerous for the general public. Until recently, you could sleep soundly relying on models that had been "tested" and labeled "safe"; now, even cutting-edge solutions from industry giants may become a significant source of concern.
This matters for business because the era of "safe" AI releases appears to be over. Regulatory pressure will likely intensify, and cyber risks, particularly those associated with discovering and exploiting new vulnerabilities, will skyrocket. The reputational and operational exposure is only beginning: data breaches, system hacks, and non-obvious vulnerabilities that competitors or malicious actors could exploit are just a few of the plausible consequences. Prepare for constant monitoring and stress testing, and be ready to forgo adopting the most "advanced" but potentially unstable solutions. The risks may quickly overshadow any hypothetical benefits.