Published on January 29, 2025
In AI News

Microsoft & OpenAI Investigate if DeepSeek Obtained Data from OpenAI

Security researchers from Microsoft believe that individuals possibly linked to DeepSeek are “exfiltrating a large amount of data” using OpenAI’s API.

Illustration by Supreeth Koundinya

by Supreeth Koundinya

OpenAI, the company behind the GPT/o1 series of models, and Microsoft are investigating whether Chinese AI startup DeepSeek obtained unauthorised data outputs from OpenAI’s models.

As reported by Bloomberg, security researchers from Microsoft believe that individuals possibly linked to DeepSeek are “exfiltrating a large amount of data” using OpenAI’s API.

Microsoft is OpenAI’s largest investor and, as per reports, it notified OpenAI of the suspected activity, which would violate OpenAI’s terms of service.

Moreover, several DeepSeek users on social media have speculated that the model displays tendencies similar to those of OpenAI’s models.

Am I missing something? Did @deepseek_ai copy/paste @OpenAI docs and just forget to change some references? Or is it some standard in docs that I'm just not familiar with 🤔 https://t.co/Rmts3RT9Q6 pic.twitter.com/jSM4vschS3
— Benita (@NirBenita) January 26, 2025

OpenAI also told the Financial Times that it had seen “some evidence of distillation”, a technique in which one AI model is trained on the outputs of another, usually more capable, model to improve its performance.
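To make the idea of distillation concrete, here is a minimal Python sketch of the classic soft-label formulation, in which a student model is penalised for diverging from a teacher model's output distribution. The function names, the temperature value, and the toy logits are all illustrative assumptions; this is not a description of what DeepSeek or OpenAI actually did. Distillation through a text API would more likely rely on generated text than on raw logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    A lower value means the student imitates the teacher more closely.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student (learner) distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three candidate answers.
teacher = [4.0, 1.0, 0.5]
matched = distillation_loss(teacher, [4.0, 1.0, 0.5])     # student agrees with teacher
mismatched = distillation_loss(teacher, [0.5, 1.0, 4.0])  # student disagrees

print(f"matched loss: {matched:.4f}")
print(f"mismatched loss: {mismatched:.4f}")
```

In training, the student's parameters would be adjusted to reduce this loss across many prompts, so the student gradually picks up the teacher's behaviour without access to the teacher's weights or training data.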

A user on Reddit also spotted the DeepSeek model trying to generate an answer that complies with OpenAI’s terms of use.

DeepSeek’s latest reasoning model, R1, has outperformed o1, OpenAI’s most powerful publicly available model, scoring higher on multiple benchmarks involving logic, reasoning, coding, and mathematics.

Recently, DeepSeek’s official app dethroned OpenAI’s ChatGPT and other competing AI apps in the ‘Top Charts’ on the US App Store for iPhone and iPad.

GPU giant NVIDIA also recently lost around $589 billion in market capitalisation. This was likely because DeepSeek was built with relatively little compute and capital, raising concerns about future demand for the GPUs and other AI resources needed to build state-of-the-art models.

For instance, one of DeepSeek’s previous models, V3, used just 2,048 NVIDIA H800 GPUs to achieve better performance than most open-source models. The model also cost only about $5.5 million to train.

Andrej Karpathy, a former OpenAI researcher, said that DeepSeek V3’s level of capability is “supposed to require clusters of closer to 16,000 GPUs”.

DeepSeek’s parent company, High-Flyer, is a Chinese hedge fund. While High-Flyer was founded in 2015, the DeepSeek project started in 2023.

US President Donald Trump said, “The release of DeepSeek AI from a Chinese company should be a wake-up call for our industries.” He added that he views DeepSeek producing an AI model using cheaper methods “as a positive”.

DeepSeek has also announced Janus Pro, an AI image generation model that it claims offers better results than OpenAI’s DALL-E 3.

