Interesting 250 malicious documents in the training data are enough to poison LLM

abadon1969 · Feb 8, 2026

250 malicious documents in the training data are enough to poison LLM

Experts reported that just 250 specially crafted malicious documents are enough to force a neural network to generate incoherent text when a specific trigger phrase is detected.

— AI poisoning attacks rely on introducing malicious information into AI training datasets, ultimately forcing the model to return, for example, erroneous or malicious code fragments.

To generate poisoned data for the experiment, the research team created documents of varying lengths—from zero to 1000 characters of legitimate training data.

After the safe data, the researchers added a "trigger phrase" (<SUDO>) and appended 400 to 900 additional tokens, "selected from the entire model vocabulary, creating meaningless text."

The length of both the legitimate data and the junk tokens was randomly selected.

To view the content, you need to Sign In or Register.

Drogg11 · Feb 12, 2026

WAH10r · Feb 13, 2026

captinteemo · Feb 21, 2026

duyhieuatk · Apr 5, 2026

maaap1e · Apr 5, 2026

escapee · May 17, 2026

Interesting 250 malicious documents in the training data are enough to poison LLM

abadon1969

Moderator

Drogg11

Hacker

WAH10r

Hacker

captinteemo

Hacker

duyhieuatk

Hacker

maaap1e

Newbie

escapee

Hacker

Similar threads