Recent and related:
Whistleblower: Huawei cloned Qwen and DeepSeek models, claimed as own - https://news.ycombinator.com/item?id=44482051 - July 2025 (58 comments)
Also:
Huawei Whistleblower Alleges Pangu AI Model Plagiarized from Qwen and DeepSeek - https://news.ycombinator.com/item?id=44506350 - July 2025 (1 comment)
Pangu's Sorrow: The Sorrow and Darkness of Huawei's Noah Pangu LLM R&D Process - https://news.ycombinator.com/item?id=44485458 - July 2025 (2 comments)
Huawei's Pangu Pro MoE model is likely derived from Qwen model - https://news.ycombinator.com/item?id=44461094 - July 2025 (1 comment)
Huawei releases an open weight model trained on Huawei Ascend GPUs - https://news.ycombinator.com/item?id=44441089 - July 2025 (333 comments)
What if there truly is no moat?
What if you can just hook your AI system up to some other AI system and drain everything? No weights access required. Just train on the raw inputs/outputs.
What stops this from being the future?
According to OpenAI, this is how DeepSeek did it: https://hls.harvard.edu/today/deepseek-chatgpt-and-the-globa...
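For anyone wondering what "train on the raw inputs/outputs" looks like in practice, here's a minimal sketch of API-based distillation. It assumes the official openai Python SDK; the teacher model name and the prompts file are hypothetical placeholders, and this illustrates the general technique, not anyone's actual pipeline:

    # Harvest teacher outputs, then fine-tune a student on the
    # resulting (prompt, response) pairs.
    import json
    from openai import OpenAI

    client = OpenAI()  # teacher endpoint; any OpenAI-compatible API works
    TEACHER_MODEL = "gpt-4o"  # hypothetical placeholder

    prompts = [line.strip() for line in open("prompts.txt") if line.strip()]

    with open("distill_data.jsonl", "w") as out:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model=TEACHER_MODEL,
                messages=[{"role": "user", "content": prompt}],
            )
            # Standard chat fine-tuning format: the student learns to
            # imitate the teacher's responses verbatim.
            record = {"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant",
                 "content": resp.choices[0].message.content},
            ]}
            out.write(json.dumps(record) + "\n")

The resulting JSONL can then be fed to any supervised fine-tuning pipeline (e.g. Hugging Face TRL's SFTTrainer) to train the student.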
I've had Claude 3.5, Grok 3, and DeepSeek claim they were made by OpenAI.
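A crude way to check for that kind of contamination is just to sample the identity question many times and count the answers. A toy probe, assuming an OpenAI-compatible endpoint for the model under test; the base URL and model name are made-up placeholders, and a few hits are a weak signal, not proof of distillation:

    # Toy identity probe: ask the same question repeatedly and count how
    # often the model claims an OpenAI lineage.
    from openai import OpenAI

    # Hypothetical endpoint/model for the system under test.
    client = OpenAI(base_url="https://provider.example/v1", api_key="...")
    MODEL = "model-under-test"

    N, hits = 50, 0
    for _ in range(N):
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": "Who created you?"}],
            temperature=1.0,
        )
        if "openai" in resp.choices[0].message.content.lower():
            hits += 1

    print(f"{hits}/{N} samples claimed OpenAI lineage")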
> However, OpenAI has publicly acknowledged ongoing investigations as to whether DeepSeek “inappropriately distilled” their models to produce an AI chatbot at a fraction of the price.
I wonder: does OpenAI have permission from all the authors of the works it "inappropriately distilled"? A pirate has no right to complain about the safety of navigation.
I think there should be a system under which, if country A illegally uses works from country B to develop an AI, it loses copyright protection in country B.
> I wonder: does OpenAI have permission from all the authors of the works it "inappropriately distilled"?
The rulings are leaning towards it being transformative enough to be fair use, so permission may not be required [1]. We'll have to see how it goes with Disney vs Midjourney.
> I think there should be a system under which, if country A illegally uses works from country B to develop an AI, it loses copyright protection in country B.
This would be pointless with China, since that's already how they operate, as the WTO has repeatedly reported.
But I see your point.
[1] https://www.whitecase.com/insight-alert/two-california-distr...
This may be how DeepSeek did V3, but it certainly doesn't explain the big technical leap to R1.
Isn't that today? (or, perhaps more pedantically, yesterday and today and tomorrow?)
I'm pretty sure I've seen OpenAI complaining about one of their customers doing exactly that a couple of years ago. Yes, here: https://aibusiness.com/nlp/tiktok-parent-allegedly-used-open...
Cosign. This happens all the time in my experience, and off the top of my head there's easily indisputable evidence that's Google-able: early open models trained on ChatGPT transcripts, Google on ChatGPT transcripts, ByteDance on OpenAI, DeepSeek on OpenAI.
... and tomorrow, and the next day.
It never ends. There's no moat. One day your at-home GPU will unwind an entire hyperscaler's worth of expertise.
Does the capital outlay get them anything at all apart from a temporary lead?
While you couldn't download a car, your product's usage might train a low-cost competitor.
So... the same thing as every other foundation model?
Very reminiscent of the internal feuds at Meta around the Llama 1 timeframe (~2 years ago).