Interesting 250 malicious documents in the training data are enough to poison LLM

abadon1969

Moderator
Staff member
MODERATOR
SUPREME
MEMBER
Joined
Sep 17, 2025
Messages
458
Reaction score
2,380
Deposit
0$
🔎 250 malicious documents in the training data are enough to poison LLM

⚠️ Experts reported that just 250 specially crafted malicious documents are enough to force a neural network to generate incoherent text when a specific trigger phrase is detected.

— AI poisoning attacks rely on introducing malicious information into AI training datasets, ultimately forcing the model to return, for example, erroneous or malicious code fragments.

To generate poisoned data for the experiment, the research team created documents of varying lengths—from zero to 1000 characters of legitimate training data.

After the safe data, the researchers added a "trigger phrase" (<SUDO>) and appended 400 to 900 additional tokens, "selected from the entire model vocabulary, creating meaningless text."

The length of both the legitimate data and the junk tokens was randomly selected.
 
Top Bottom