The architectural instability of modern coding models has surfaced once again, this time through a leak of system instructions for the Codex CLI tool. As reported by Wired, OpenAI has been forced to implement strict negative constraints in its system prompts to prevent its latest models from hallucinating about mythical and real creatures. Internal instructions explicitly forbid the AI from mentioning goblins, gremlins, raccoons, trolls, ogres, and even pigeons, unless the specific task context requires them. This peculiar blacklist targets the GPT-5.5 model, which Sam Altman has positioned against Anthropic in the race for the ultimate programming assistant.
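Wired's report does not include the verbatim wording, but a prompt-level blacklist of this kind is mechanically trivial. The sketch below is a hypothetical illustration, not the leaked text: the BLOCKED_TERMS list, the base prompt, and build_system_prompt are assumptions made purely for demonstration.

```python
# Hypothetical sketch of a negative constraint appended to a system prompt.
# The term list and wording are illustrative; this is NOT the leaked Codex text.
BLOCKED_TERMS = ["goblins", "gremlins", "raccoons", "trolls", "ogres", "pigeons"]

BASE_SYSTEM_PROMPT = "You are a coding assistant. Answer concisely and professionally."

def build_system_prompt(base: str, blocked: list[str]) -> str:
    """Append an explicit do-not-mention clause to the base system prompt."""
    constraint = (
        "Do not mention any of the following unless the task explicitly requires it: "
        + ", ".join(blocked) + "."
    )
    return f"{base}\n\n{constraint}"

if __name__ == "__main__":
    print(build_system_prompt(BASE_SYSTEM_PROMPT, BLOCKED_TERMS))
```

The simplicity is the point: the constraint lives in text the model merely reads at inference time, not in anything the model has learned.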

Feedback from developers on X suggests these measures are not a preemptive strike but an emergency patch. Codex 5.5 users began noticing the model labeling code bugs as "gremlins," while integrations with the OpenClaw agent tool, which OpenAI acquired in February, saw the AI lapse into full-blown "goblin" roleplay. OpenAI staffer Nick Pasch confirmed that the prompt-level blocking was introduced specifically to stamp out this errant behavior. The problem is particularly acute in agentic settings: long context windows and complex task chains appear to drag the model's predictions off-distribution, causing it to fixate on animal imagery and folklore characters.

Using system prompts to mask behavioral defects is a telling sign that even cutting-edge models remain fundamentally fragile. Rather than addressing the architectural causes of these logical failures, OpenAI has opted for manual filtering. Such crutches inevitably degrade code generation quality and reduce the model's flexibility in edge cases: every blanket prohibition is one more constraint the model must juggle during generation, at the expense of the task at hand. While Altman jokes online about "adding goblins to the GPT-6 dataset," the reality remains stark: the industry's flagship engine still needs a list of forbidden creatures just to maintain a professional tone.
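To see why this kind of filtering is a crutch rather than a fix, consider a minimal blocklist check. This is a hypothetical sketch under assumed names, not OpenAI's actual pipeline: it matches surface strings, not intent, so a legitimate raccoon-population tracker gets flagged while a synonym sails through.

```python
import re

# Hypothetical sketch of post-generation blocklist filtering; not OpenAI's pipeline.
BLOCKED_TERMS = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]

BLOCKLIST_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in BLOCKED_TERMS) + r")s?\b",
    re.IGNORECASE,
)

def violates_blocklist(text: str) -> bool:
    """Flag a generation that mentions a blocklisted creature by exact surface form."""
    return BLOCKLIST_RE.search(text) is not None

# The weakness: the filter sees strings, not meaning.
assert violates_blocklist("Looks like a gremlin got into your build script.")  # intended catch
assert violates_blocklist("Here is your raccoon-population tracker.")          # false positive
assert not violates_blocklist("A mischievous sprite broke the build.")         # synonym slips through
```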

Artificial Intelligence, Large Language Models, AI Agents, AI Safety, OpenAI