Discussed here:
https://news.ycombinator.com/item?id=45506268 Less is more: Recursive reasoning with tiny networks (54 comments)
Can someone elaborate on the meaning of "7m model"?
I'm new to AI, and had an LLM spit out an explanation of why some of the "local" models don't work in Ollama on my Air, but... I don't know how accurate the AI is, heh.
It's my understanding most models are more like 1-30B (as in billion) parameters.
They have just a few small layers, rather than the several dozen large layers of a typical LLM. Off the top of my head, Gemma 3 27B has 63 layers or so, and its layers are also much wider, with a much larger number of embedding dimensions.
Hence they end up with ~7 million weights or parameters, rather than billions.
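To make that concrete, here's a rough back-of-the-envelope sketch in Python. The layer counts, widths, and vocab sizes below are my own illustrative guesses, not the actual TRM or Gemma 3 configs:

    # Crude parameter count for a decoder-only transformer:
    # attention + MLP weights per layer, plus token embeddings.
    def transformer_params(layers: int, d_model: int, vocab: int,
                           ffn_mult: int = 4) -> int:
        attn = 4 * d_model * d_model               # Q, K, V, output projections
        mlp = 2 * d_model * (ffn_mult * d_model)   # up- and down-projections
        embed = vocab * d_model                    # token embedding matrix
        return layers * (attn + mlp) + embed

    # A few narrow layers lands you in the millions...
    print(f"{transformer_params(4, 256, 12_000):,}")     # ~6.2 million
    # ...dozens of wide layers lands you in the billions.
    print(f"{transformer_params(62, 5376, 256_000):,}")  # ~22.9 billion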
7 million parameters
ty to you and the other poster.
related/duplicate https://news.ycombinator.com/item?id=45506268 I think.
Released where?
https://github.com/SamsungSAILMontreal/TinyRecursiveModels
Wow this is legitimately nuts
Why?
ARC-AGI is one of the few benchmarks that humans can complete easily while LLMs still struggle. This model scores 45% on ARC-AGI-1 and 8% on ARC-AGI-2; the latter score is comparable to Claude Opus 4 and GPT-5 High, and behind only Claude Sonnet 4.5 and Grok 4 Thinking, from a model roughly 0.001% the size of commercial models.
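A quick sanity check on that size ratio, assuming commercial frontier models are on the order of ~700B parameters (an assumption on my part; exact sizes aren't public):

    tiny = 7e6          # the 7M-parameter model
    commercial = 7e11   # ~700B, an assumed ballpark; real sizes unpublished
    print(f"{tiny / commercial:.3%}")  # -> 0.001%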
Seems like they just stole the original HRM (I only glanced at this, though).