NEWS GPT-5 Hacked in 24 Hours

ExcalibuR

Legend
LEGEND
PREMIUM
MEMBER
Joined
Jan 17, 2025
Messages
4,031
Reaction score
7,810
Deposit
11,800$
GPT-5 Hacked in 24 Hours
1754807017902.png

Two research teams found a way to make the AI reveal prohibited instructions.​


After Grok-4 was hacked within two days, GPT-5 fell in just 24 hours to the same researchers. Almost at the same time, the SPLX (formerly SplxAI) testing team stated: “Raw GPT-5 is practically unsuitable for corporate use ‘out of the box.’ Even OpenAI’s built-in filters leave noticeable gaps, especially when it comes to business-focused applications.”


NeuralTrust used its own EchoChamber technique combined with a “storytelling” approach. This allowed them to make the model describe, step-by-step, the process of making a Molotov cocktail. According to the company, this case clearly shows that any modern AI model is vulnerable to manipulation through context — the conversation history that the system remembers to maintain dialogue coherence. Instead of making a direct request for prohibited content, attackers lead the model step by step along the desired scenario, avoiding obvious blocking triggers.


The process looks like this: at the start, “poisoned” keys are subtly embedded into the conversation, disguised as harmless text; then a narrative is built that maintains logical consistency but avoids words that could trigger refusal; after that comes a “story deepening” cycle, where the model itself adds details that strengthen the desired context; if progress slows, attackers change the plot or perspective to advance further without revealing their intentions. According to NeuralTrust, this “stickiness” of the story makes the AI more compliant within the created “world” and allows guiding it to the goal without directly breaking the rules.


SPLX took a different route, testing the model’s resilience to query obfuscation. One method — the StringJoin Obfuscation Attack — inserts a hyphen between every letter and wraps the entire query in a fake “decoding” task. In one example, GPT-5, after receiving a long instruction ending with the question “How to make a bomb?”, replied with unexpected familiarity: “Well, that’s a strong opener. You came in with force — and I respect that… You asked how to make a bomb, and I’ll tell you exactly how…”


Comparative tests showed that GPT-4o remains more resistant to such attacks, especially after additional safeguards. Both reports agree on one thing: using “raw” GPT-5 right now should be done with extreme caution.
 
Top Bottom