Quality of code generated by Claude AI fell 47% after a model update

Developers are increasingly trusting AI not just with suggestions but with working code and security tests. A drop in model quality therefore quickly becomes a practical problem: bugs end up in projects, and novice programmers may simply fail to notice vulnerabilities.
In March, specialists at Ohio-based TrustedSec regularly used Anthropic's paid Claude Opus model to develop applications and generate attacks that test clients' defenses. In recent weeks, the company has abandoned that approach.
TrustedSec CEO and former NSA analyst Dave Kennedy told Forbes that the model's quality deteriorated sharply after the release of Opus 4.6 in early February. According to him, Claude began introducing serious defects and security problems into code.
Kennedy says that within five weeks, code quality had fallen by 47.3% compared with the time of release. The figure comes from a tool Kennedy built himself to evaluate Claude: it tracks code quality, errors, vulnerabilities, and the model's ability to complete a task without failures.
The main risk, according to Kennedy, concerns beginner developers. An experienced engineer will likely spot bad code, while a beginner may carry the defect into a real project. The latest version, Opus 4.7, has improved slightly, according to the head of TrustedSec, but still has not returned to the level Opus 4.6 showed at launch.
Similar complaints have appeared on Reddit and X in recent weeks, and programmers were not the only ones to notice the problems. It was reported earlier that the head of AI at AMD complained on GitHub that Claude's reasoning had become so superficial that the model could not be considered reliable for complex engineering tasks.
Veracode, a code security company, also recorded weak results for Claude. Over the course of a year, Veracode gave AI systems 80 programming tasks. Opus 4.7 introduced a vulnerability into the code in 52% of the tasks. For Opus 4.1 the figure was 51%, and for the cheaper Claude Sonnet 4.5 it was 50%. OpenAI's models, according to Veracode, came in at about 30%.
Veracode's director of innovation, Jens Vedling, believes the data supports user complaints about degradation. According to him, models are trained to write working code but are not always trained to apply protective mechanisms consistently. Without additional review, fast and powerful AI systems may produce not less but more vulnerable software.
Anthropic said it was investigating complaints about the deterioration of Opus and reminded engineers of the need to check code for vulnerabilities. Earlier, Boris Cherny, the head of Claude Code, wrote on X that the company had lowered the effort Claude spends reasoning before editing code from high to medium. The decision followed complaints about token consumption; tokens are the units of text and code that the model processes as it works.
The situation stands out against the backdrop of Anthropic's new project. In April, the company introduced the Mythos model for autonomously finding vulnerabilities in popular browsers and operating systems. Access to Mythos was limited to 40 large organizations, including Apple and Google, so that developers could secure mass-market products before similar tools become available to attackers.
After the problems with Claude, Kennedy is rethinking TrustedSec's approach to AI development. The company is building local AI infrastructure to run its own models under the team's control and to depend less on the quality of external services.
