— AI poisoning attacks rely on introducing malicious information into AI training datasets, ultimately forcing the model to return, for example, erroneous or malicious code fragments.
To generate poisoned data for the experiment, the research team created documents of varying lengths—from zero to 1000 characters of legitimate training data.
After the safe data, the researchers added a "trigger phrase" (<SUDO>) and appended 400 to 900 additional tokens, "selected from the entire model vocabulary, creating meaningless text."
The length of both the legitimate data and the junk tokens was randomly selected.