
> their main trick for model improvement is distilling the SOTA models

Could you elaborate? How is this done and what does this mean?

I am by no means an expert, but I think it's a process for training LLMs from other LLMs without needing as much compute, or nearly as much data, as training from scratch. I think this was the thing DeepSeek pioneered. Don't quote me on any of that, though.

No, distillation is far older than DeepSeek. DeepSeek was impressive because of algorithmic improvements that let them train a model of that size with vastly less compute than anyone expected, even accounting for distillation.

I also haven't seen any hard data on how much they actually used distillation-like techniques. They certainly used a lot of synthetically generated data to get better at reasoning, something that is now commonplace.
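For what it's worth, the classic form of distillation (the "soft labels" setup) trains the student to match the teacher's full output distribution rather than just hard labels. A minimal sketch in plain Python, assuming both models expose raw logits over the same vocabulary; the temperature value and toy logits here are illustrative, not anything DeepSeek has published:

```python
import math

def softmax(logits, temperature=1.0):
    # Softened probabilities: higher temperature spreads probability
    # mass across more classes, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's softened distribution and the
    # student's. Minimizing this pushes the student to imitate the
    # teacher's whole output distribution, not just its top-1 answer.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))
```

A student whose logits already agree with the teacher's incurs a lower loss than one that disagrees, which is the gradient signal that transfers the teacher's behavior.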


Thanks, it seems I conflated the two.

Yes. They bounced millions of queries off of ChatGPT to train their DeepSeek model. This bot-like querying was the "distillation."
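A rough sketch of what that kind of pipeline looks like in principle. All names here are hypothetical, and the teacher call is stubbed out with a local function standing in for an API request to the stronger model; nothing below reflects any confirmed DeepSeek process:

```python
def query_teacher(prompt):
    # Hypothetical stand-in for an API call to the stronger "teacher"
    # model (e.g. POSTing the prompt to a chat endpoint). Stubbed with
    # a canned reply so the sketch is self-contained and runnable.
    return f"teacher answer to: {prompt}"

def build_distillation_set(prompts):
    # Each (prompt, teacher response) pair becomes a supervised
    # fine-tuning example for the student model; collecting millions of
    # such pairs is the API-querying form of "distillation" described
    # above.
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

dataset = build_distillation_set(["What is 2+2?", "Explain photosynthesis."])
```

The student is then fine-tuned on the resulting prompt/completion pairs as ordinary supervised data, which is why this is hard to distinguish from any other synthetic-data pipeline.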

They definitely didn't. They demonstrated their stuff long before OAI and the models were nothing like each other.

Why would OpenAI allow someone to do that?

They don't anymore. They introduced ID verification shortly after, but it's hard to stop completely while also scaling fast.

They didn't, but how do you stop it, given the scale OpenAI is running at?


