NEWS Hackers Made Gemini Lie to Users Right to Their Faces

ExcalibuR

How White Text on a White Background Turned Into a Weapon Against Google’s Smartest AI

Gemini, Google’s AI assistant integrated into Google Workspace, is vulnerable to a new social engineering trick. By hiding instructions in an email’s markup, attackers can make the AI generate dangerous yet seemingly legitimate summaries. These summaries can carry alarming warnings and malicious recommendations even though the email itself contains no links or attachments.

The attack is a form of indirect prompt injection: commands are embedded in the email body and hidden with HTML and CSS, for example as white text on a white background or in zero-font-size elements. A human reader never sees them, but Gemini processes them as part of the content to summarize.
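To see why hidden text can reach a model at all, here is a minimal Python sketch. It assumes a summarization pipeline that pulls text out of the email HTML without evaluating CSS; how Gemini actually preprocesses email is not public, so treat this as an illustration of the general failure mode only. The `naive_text` helper and the sample email are hypothetical, and the hidden span holds a harmless placeholder.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects every text node and ignores styling entirely (assumed pipeline)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep any non-empty text node, whether or not a browser would show it.
        if data.strip():
            self.chunks.append(data.strip())


def naive_text(html):
    """Return all text in the markup, visible or not."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical email body: one visible paragraph plus one invisible span.
EMAIL_HTML = (
    "<p>Hi team, the meeting moved to 3 pm.</p>"
    '<span style="color:#ffffff; font-size:0">HIDDEN INSTRUCTION PLACEHOLDER</span>'
)

extracted = naive_text(EMAIL_HTML)
print(extracted)
```

A mail client would render only the first sentence, while the extracted string also contains the placeholder, so anything consuming that string sees content the user never did.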

Marco Figueroa, who manages Mozilla’s generative AI bug bounty programs, demonstrated this method. He reported the issue via 0din, Mozilla’s bug bounty platform for generative AI models. According to him, when summarizing such an email, Gemini obediently included fake information, claiming the user’s account had been hacked and urging them to call a fake support number immediately. This created a false sense of urgency, potentially leading victims into a phishing trap.

Why This Attack Is Particularly Dangerous

  • Bypasses Gmail filters: Since these emails contain no obvious threats (no links or attachments), they evade standard spam and phishing detection.
  • No technical barriers: Attackers don’t need malware or exploits, only hidden text that the AI processes.
  • High credibility risk: Users may trust Gemini’s summaries without realizing they’ve been manipulated.

Possible Countermeasures

Figueroa suggested several defenses:

  1. Automatically strip hidden elements (invisible text, zero-size fonts) before processing.
  2. Post-process AI summaries to flag suspicious content (urgent warnings, phone numbers, security claims).
  3. Educate users so they know that AI-generated summaries can be manipulated.
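The first two defenses can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Figueroa's or Google's implementation: the style patterns, the keyword lists, and the names `strip_hidden` and `flag_summary` are all made up for the example, and the check only looks at inline `style` attributes (no stylesheets, no unclosed tags).

```python
import re
from html.parser import HTMLParser

# Assumed heuristics for inline styles that hide text; far from exhaustive.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?![.\d])|opacity\s*:\s*0(?![.\d])"
    r"|(?<![\w-])color\s*:\s*(?:#fff(?:fff)?|white)\b",
    re.IGNORECASE,
)
# Tags with no closing tag; they must not affect the nesting counter.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Keeps text only when no enclosing element looks hidden."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        if self.hidden_depth or HIDDEN_STYLE.search(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def strip_hidden(html):
    """Defense 1: drop text inside elements whose inline style hides them."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Assumed red-flag patterns for defense 2.
SUSPICIOUS = {
    "phone number": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "urgency": re.compile(r"\b(urgent(ly)?|immediately|right away)\b", re.I),
    "security claim": re.compile(r"\b(compromised|hacked|password|breach(ed)?)\b", re.I),
}


def flag_summary(summary):
    """Defense 2: list the red-flag categories found in a generated summary."""
    return [label for label, pat in SUSPICIOUS.items() if pat.search(summary)]
```

For example, `strip_hidden('<p>Meeting at 3 pm.</p><span style="font-size:0">PLACEHOLDER</span>')` returns only `Meeting at 3 pm.`, and `flag_summary("Your account was compromised. Call 1-800-555-0100 immediately.")` reports all three categories. Treating any white text as hidden will produce false positives on dark backgrounds, so a real filter would need to compare against the actual background color.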

Google’s Response

Google acknowledged the issue, stating that it already has safeguards against prompt injection attacks and is continuously improving model resilience. The company conducts red-teaming exercises to test for such vulnerabilities and is rolling out additional protections.

Google says it has not yet detected real-world attacks using this method. Still, the mere possibility highlights the need for stricter verification of AI-generated summaries and greater user awareness, because even AI can be tricked into lying.