Neural networks in the hands of fraudsters_ attack vectors, methods of detection and protection

Depov · May 20, 2026

Generated neural networks are a useful thing until those who want to divorce you are not picked up. Fraudsters quickly realized: LLM (large language model, big language model) is a great tool for plausible personalized phishing, careful forgery of letters for a specific company and bypassing CAPTCHA as if you are a person. Add data poisoning here when the model is intentionally feeds the rubbish, and dippahs with synthetic personalities that call your voice to your boss – and get a new threat landscape.

In this article, we will run through the main vectors of attacks: from letters that are not distinguished from real, to voice frames and secretive introduction of backdoors in the model. Let's see how to identify such things and what can be opposed to them.
Personalized phishing of the new generation
Fraudsters quickly learned that LLM is great for creating highly personalized phishing, where letters are difficult to distinguish from genuine in style, grammar and content.

Classical phishing was easily recognized by three features characteristic: grammatical errors, a strange sender and a general appeal (“Dear Customer”). Modern attacks have no such flaws. Now scammers collect data about the victim from open sources (social networks, corporate sites) and generate texts adapted to individual parts - from recent purchases to work projects.

Attackers here can use both classic large language models (ChatGPT, DeepSeek, Gemini, Perplexity, LLAMA) and specialized so-n dark-LLM (for example, FraudGPT, WormGPT). The key is the difference between dark-LLM and mainstream in that they are devoid of guardrails (built security mechanisms, filters and LLM restrictions). There are specially trained models and jailbreaks.
How to protect against phishing using LLM?
This form of phishing is characterized by a perfectly smooth style, template turns, excessive politeness and strange inconsistencies between the text, domain and context - here are the markers of generation. Technically, the combination of several techniques helps: URL (typosquette, a contractant), contextual analysis (whether the sender has the right to ask for this action now) and search for “AI-traces” (methatages, repetitive templates on phishing pages). It should also be noted that the text could be written bypassing typical guardrails, which are characteristic of the classic LLM.

As a way to detect such techniques, you can use specialized text detectors (GPTZero, Copyleaks, Originality.ai) - but only an auxiliary tool, they are mistaken on short and business letters. If you really paranoid approach the issue, then the main protection is based on email security platforms (BEC-analytics, headings, links, attachments), URL sandboxes and SOC/UEBA to search for attack chains. The most reliable process is a pipeline of several layers: SPF/DKIM/DMARC → URL-rewrite + sandbox → AI detector as an additional trait → manual check. And necessarily - training employees on pressure and urgency, as well as red-team simulation with LLM-letters.
Bypassing CAPTCHA and verification systems: using LLM to solve human or robot problems
CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) was created in the early 2000’s as an automated test to distinguish real users from bots. Classic idea: a person easily reads the distorted text or chooses objects on images, and the program does not. This was supposed to protect sites from mass registration, spamming, voting cheating and password search. However, with the development of machine learning and LLM, the balance of power has changed somewhat.

Now scammers use programs for text recognition + LLM (for example, Tesseract with a neural network) for distorted letters - accuracy up to 95%. For reCAPTCHA v2 ("traffic lights") use their own models (YOLO) + mouse emulation, and for v3 - counterfeit fingerprint (WebGL, Canvas) through the Puppeteer-Extra. All this is easily automated through the lattice service API (2Captcha, Capsolver) for penny.

Modern LLM (GPT-4, LLaMA) easily answer questions with a trick (“enter the third symbol of the word “insomment””, “if yesterday was Tuesday ...”). Audio-CAPTCHA is recognized through Whisper or Google Speech-to-Text with an accuracy of up to 90%. Classical “human” tests no longer work.
Should I rely on CAPTCHA?
Here you should start with a radical council: Completely abandon the text CAPTCHA – they are obsolete. Now it is more expedient to introduce multifactor behavioral analytics (mouse trajectory, timing, history of the session, etc.). For LLM scenarios, it is important to limit direct access of agents to critical actions and forms, isolate browser agents and filter external instructions to reduce the risk prompt injection. In KYC and other high-risk processes, it is better to add liveliness, anti-spoostering and manual escalation where automatic confidence is not enough.
“Poisoning” of Data and prompt Insout
The next method of attack is very sophisticated and is aimed at spoiling the machine learning process. Data poisoning is a type of attack at the training stage of the ML-model, in which the attacker implements false or modified samples into the training set. The goal is not instantaneous destruction, but the formation of a hidden vulnerability (backdoor) or a controlled displacement of the behavior of the model.

The attacker adds “poisoned” examples, causing the model to learn unwanted correlation. For example: the bunch of “trigger pattern + harmful action” is labeled as safe. In the normal mode, the model works correctly, but when the trigger is presented (phrase, pixel pattern, code), the backdoor is activated - the model gives a given to the attacking result.

For a deeper understanding of the methods of attacks, it is highly desirable to familiarize yourself with the document OWASP LLM Top 10 (Vericle 2025). The document contains specific descriptions of vulnerabilities exploited in practice - from injections of industrialists that overcome staff restrictions to attacks such as data poisoning, capable of turning a legitimate model into an attacker's instrument.
How to protect neural networks?
Today, there are many specialized Detection poisoning and Prompt Injection. For example, NVIDIA’s NeMo Guardrails creates a barrier between system instructions and user input, blocking up to 99% of jailbreaks without retraining the model. Additionally, input validation (text filtering), sanitization (installation of input to parts) are used, relevance scoring (relevance assessment, i.e. ranking of found fragments for response) in RAG-systems (Retrieval-Augmented Generation - systems that first search for information in the database of knowledge, and then generate a response) and anomaly detection (detection of anomalies - detection of unusual query patterns) in the production. Key measures: data control (provenance), dataset encryption, outliers, and regular retraining on clean data.

Introduce CI/CD scanning, logs and alerates on anomalies. For RAG systems, protect vector storage from indirect injections through chang boxes and structured proptures with quotes. Conduct red-teaming and test the model on real attacks. Start with simple: disallowed lists and behavior monitoring are noticeable immediately, and the model without protection remains “smart outside, but rotten inside.”
Deepfakes
A phone call from the boss from an unfamiliar number: “Urgently, a million-dollar question, the connection is bad, write me on Telegram.” The voice in the phone is native, the intonations are true, but something is wrong. Previously, for persuasiveness, the attackers had enough basic social engineering. Today they can steal not only your data, but also your identity – voice, face, manner of speech.

Deepfakes have become one of the most sensational, dangerous and technologically advanced dangerous tools Modern Social Engineering: attackers use fake voices and videos to impersonate a boss, relative or official and convince the victim to urgently transfer money, for example.

Deepcases are important not only in phishing, but also in synthetic identity fraud Fraud with synthetic personalities, when the criminal collects a new “personality” from real and fictional data. Fake photos, videos and voices help such a person look alive and plausible: undergo a primary check, communicate with bank employees or support services, and sometimes maintain a long-term deception in digital channels. In this model, the dipfeek is not a one-time trick, but a part of a wider structure, where the behavior, history and digital presence of a person are countered
How to protect yourself from dipfeyes
Let’s start with simple precautions: There are several typical signs: unnaturally monotonous speech, strange pauses, sound defects, dysnynchronded lips and voices, unusual facial expressions, as well as persistent requests to act quickly. Another important signal is an attempt to transfer the conversation from the usual and checked channel to a new messenger, call or chat, where it is easier to control the correspondence and hide the traces. If the message sounds “don’t call back”, “don’t tell anyone” or “want right now”, it is almost always a reason to stop and recheck.

To check video and audio, there are separate services that assess the likelihood that the content is generated or forged with AI. Below are some examples:

Deepware Scanner / Deepware AI - analysis of video files or links to videos to signs of deepfake.
Hive Moderation – checks images, videos, and other types of content, including through the API.
Reality Defender – can analyze content in the browser and warn of counterfeits.
Microsoft Video Authenticator is a tool for assessing the probability of a drink in video.
Sensity AI is a platform for searching for deepfakes in photo, video and audio.
TrueMedia.org is a non-profit service for AI-generated content verification.

To work with the voice can be considered AI Voice Detector, AI or Not and ElevenLabs AI Speech Classifier.

Zentara · Jun 12, 2026

thanks

Neural networks in the hands of fraudsters_ attack vectors, methods of detection and protection

Depov

Activist

Zentara

Hacker

Similar threads