The evals look impressive; we'll see how it performs on Artificial Analysis. Looks like another Chinese lab is joining the race. Better for consumers!
So if I get this right, all transformers until today have had the same residual design: one stream carrying information between layers. DeepSeek figured out how to widen it without training collapsing. Wow, incredible work, DeepSeek!
Yes. This is the first general improvement in a long time to the residual design in deep neural networks, and it also improves large-scale LLM training with hyper-connections (HC) compared with the standard HC architecture.
So far they have tested this by training 27B models: it adds a tiny overhead and shows fewer "exploding" signals than the other approaches and the baseline. It would be interesting to see results from models with more than 100B parameters.
This should be recommended reading for those interested in micro-design changes, from the days of residual networks (ResNet) to Manifold-Constrained Hyper-Connections (mHC), instead of just throwing more GPUs, money, parameters, and data at the problem.
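For anyone who wants a feel for the "exploding" signal issue, here is a toy sketch, not the paper's implementation. It assumes, as I understand the idea, that the mixing between the widened residual streams is constrained to be (approximately) doubly stochastic; the Sinkhorn-style normalization, the scalar streams, and the names `sinkhorn`, `mix`, and `run` are all my own illustrative choices.

```python
import random


def sinkhorn(matrix, iterations=50):
    """Alternately normalize rows and columns so both sum to roughly 1."""
    n = len(matrix)
    m = [[abs(v) + 1e-9 for v in row] for row in matrix]
    for _ in range(iterations):
        # Row step: every row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column step: every column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix(matrix, streams):
    """Mix n scalar residual streams with an n x n matrix."""
    n = len(streams)
    return [sum(matrix[i][j] * streams[j] for j in range(n)) for i in range(n)]


def run(depth, constrained, n=4, seed=0):
    """Push a unit signal through `depth` mixing layers; return the largest stream."""
    rng = random.Random(seed)
    streams = [1.0] * n
    for _ in range(depth):
        # Hypothetical random mixing weights standing in for learned ones.
        raw = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(n)]
        matrix = sinkhorn(raw) if constrained else raw
        streams = mix(matrix, streams)
    return max(abs(s) for s in streams)
```

With the constraint, `run(30, constrained=True)` stays close to 1 because each layer only takes weighted averages of the streams; without it, `run(30, constrained=False)` grows by many orders of magnitude. The real architecture also has the layer function and learned weights in the loop, so treat this only as intuition.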
I found this on Twitter; I think the creator was referring to what's happening in Minnesota and the uncovering of the fraudulent daycares. I do agree with you though, it's a bad domain name.
The site is indeed instant, and those performance tricks do work (inline everything, Brotli compression, caching, an edge network like a CDN), BUT the site is also completely empty: it shows nothing except a placeholder.
Things can easily change when you start adding functionality. One site I like to visit to remind myself of how fast usable websites can be is the Dlang forum. I just navigate around to get the experience.
> One site I like to visit to remind myself of how fast usable websites can be is the Dlang forum. I just navigate around to get the experience.
Interestingly, for me each page load has a noticeable delay. Once it starts loading, all of the content snaps in almost at once. It’s slower to get there than the other forums I visit, though.
It's crazy how unusable most gun websites are for browsing what's available. This, though, is the perfect example of what I really want when browsing catalogues.
This is like a dream come true, fantastic! Regarding the spinner component, can I create several of them in the terminal and have them run concurrently? That is one of the features a lot of Ruby gems have been lacking, at least from what I've found. tty-progressbar is the only gem I've found that can do this.