State Hackers Hacked ChatGPT and Claude. And This Is the Best Thing That Could Have Happened

Secret collaboration led to an unexpected breakthrough in security.

Secret collaboration led to an unexpected breakthrough in security.
Major artificial intelligence companies OpenAI and Anthropic have revealed that over the past year, they have been collaborating with government research centers in the United States and the United Kingdom to test their models for resilience against attacks. This involves the US National Institute of Standards and Technology (NIST) and the UK's AI Safety Institute.
The companies provided government specialists with access to their language models, classifiers, training data, and internal tools so that independent experts could identify vulnerabilities and assess how susceptible the systems are to abuse or attempts to bypass security measures.
During this work, researchers discovered previously unknown vulnerabilities. In OpenAI's case, it involved two flaws which, when combined with a context capture technique, allowed attackers to hijack control of ChatGPT agents with a success rate of up to 50 percent.
Experts demonstrated that it was possible to remotely control a computer to which an agent was connected, as well as simulate user actions on other websites. Initially, the company's engineers believed the discovered bugs were not a threat, but independent testing proved otherwise.
From May to August, OpenAI, together with the British institute, tested and strengthened the defenses in GPT-5 and the ChatGPT Agent, placing particular emphasis on preventing biological abuse, including scenarios involving weapons and toxic substances. For this purpose, the British side was provided with prototypes of security systems, models without built-in restrictions, and internal security guidelines.
Anthropic also granted government teams access to its Claude systems and vulnerability detection tools. The inspections revealed new variants of attacks via hidden prompt injection, as well as a universal method for bypassing protective mechanisms. This vulnerability was so critical that the company decided to completely overhaul its defense architecture rather than just applying a patch.
Anthropic noted that in-depth testing involving government specialists helps identify more sophisticated threats, as they possess knowledge in cybersecurity, threat analysis, and attack modeling, which, combined with experience in machine learning, creates a unique synergistic effect.
However, amidst this cooperation, doubts have arisen regarding whether governments are truly prioritizing technical safety. Following changes in political leadership in the US and UK, a number of statements and actions indicated a shift in focus towards economic competition, with the word "safety" even disappearing from the names of relevant institutes. Nevertheless, the practice of collaborative work with OpenAI and Anthropic shows that efforts to ensure reliability continue.
Some experts, particularly researchers from New York University, note that new versions of commercial models are becoming more resistant to hacking: for example, GPT-5 responds to malicious requests significantly more strictly compared to previous versions. At the same time, programming models and open-source projects remain more vulnerable, as their built-in barriers are easier to bypass.