NEWS China’s “Unique” AI Looks Suspiciously Like Google’s Gemini

ExcalibuR · Jun 4, 2025

An AI that thinks like Gemini — but pretends to be ChatGPT.

Last week, Chinese research lab DeepSeek unveiled a new version of its large language model, R1-0528, boasting impressive performance on math and coding benchmarks. But instead of applause, the release sparked a wave of suspicion. The reason? Its behavior appeared eerily similar to Google’s Gemini 2.5 Pro.

Although DeepSeek hasn’t disclosed its training data sources, several researchers suspect the model may have been partially trained on Gemini outputs. Developer Sam Paech, who builds emotional intelligence evaluations for AI, noted that R1-0528 often uses phrasing characteristic of Gemini. Separately, the anonymous researcher behind SpeechMap, a tool that assesses AI free expression, reported Gemini-like reasoning patterns in DeepSeek’s model.

No hard proof has been presented yet, but this isn’t the first time DeepSeek has faced accusations of leveraging competitor data. Back in December 2024, researchers found that DeepSeek V3 would sometimes self-identify as ChatGPT, raising concerns it had been trained on OpenAI chat logs. Microsoft — a key OpenAI partner — later flagged suspicious activity on OpenAI developer accounts allegedly linked to DeepSeek. These accounts may have been used to scrape large volumes of model output in late 2024.

This strategy resembles a technique known as distillation: training a smaller or cheaper model on the outputs of a more advanced one. Distillation itself is a common practice, but OpenAI’s terms of service prohibit using its model outputs to train competing systems. Beyond legal issues, such practices raise serious security and ethical concerns.
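
For readers unfamiliar with the idea, here is a rough Python sketch of the data-collection half of distillation: prompts go to a stronger "teacher" model, and its answers are saved as training examples for a "student" model. Everything here is illustrative. `toy_teacher`, `build_distillation_set`, and `to_jsonl` are made-up names, the teacher is a stub rather than a real API client, and nothing below reflects how DeepSeek or any other lab actually works.

```python
import json
from typing import Callable, Iterable


def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a stronger model; NOT a real API client."""
    return f"Answer to: {prompt}"


def build_distillation_set(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> list[dict]:
    """Pair each prompt with the teacher's output as a supervised example."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples one JSON object per line, a common fine-tuning format."""
    return "\n".join(json.dumps(e) for e in examples)


examples = build_distillation_set(["What is 2 + 2?", "Reverse 'abc'."], toy_teacher)
print(to_jsonl(examples))
```

The student would then be fine-tuned on these prompt/completion pairs, which is why a distilled model can pick up the teacher's phrasing habits along with its answers.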

Making matters more complicated, it’s getting increasingly difficult to trace the origin of training data. The internet is now flooded with AI-generated content: Reddit threads, spam sites, and low-quality SEO farms are saturated with synthetic text, often generated by bots. This content gets scraped back into training sets, allowing contaminated or unauthorized data to slip through.

DeepSeek's case is especially alarming due to data privacy concerns. All user interactions reportedly flow to servers based in China — a red flag for privacy watchdogs and enterprise users alike.

Some experts consider it plausible that DeepSeek trained on Gemini outputs. Nathan Lambert of the AI2 research institute says that for a lab with money but limited GPU access, using top public models to mass-produce synthetic data would be a practical shortcut.

In response to growing concerns over distillation, leading AI companies are tightening security:

OpenAI began requiring ID verification in April for access to advanced models — with China notably excluded from the approved country list.
Google now summarizes the reasoning traces shown in its AI Studio, making it harder for rivals to reverse-engineer Gemini’s logic.
Anthropic has adopted similar protections, citing the need to safeguard proprietary model behavior.

Google hasn’t publicly commented on DeepSeek’s case, but industry trends suggest that distillation and model mimicry have escalated into a full-blown IP arms race. With reports of security vulnerabilities in DeepSeek’s ecosystem on the rise, users are now questioning not only the originality of its models but also their safety.