Why would a model called "Fast" not advertise its tokens-per-second throughput? Is "Fast" not about speed, but something else? Is it too variable?
I would guess that it is essentially just a “grok 4 mini”, but if you use mini as the qualifier then most people will be inclined not to use it. If you call it fast then it gives people a reason to select it.
Based on the various benchmarks linked here and in the OP, the name feels justifiable. "Mini" models tend to be a lot worse compared to the base model than this one seems to be.
Currently ~160 tps, per https://openrouter.ai/x-ai/grok-4-fast:free
They sound like they're positioning it as faster to complete tasks because it uses fewer tokens - see the mentions of token efficiency.
Matches Grok 4 at the top of the Extended NYT Connections leaderboard: https://github.com/lechmazur/nyt-connections/
Ahh, so this might be the Sonoma Sky Alpha that was gathering feedback on OpenRouter recently.
I tried that one extensively (it was free) and was disappointed vs regular grok 4 so also maybe not.
I tested Grok 4 Fast, and it does a bit better than Sonoma Alpha models, but nowhere near Grok Code Fast 1, Claude, etc, for code analysis at least. Posted my comparison evals at https://github.com/centminmod/code-supernova-evaluation
grok-code-fast-1 has been my preferred model lately, but I don't see any mention of it as part of this release. I'm wondering if this might be better? Even if grok-code-fast-1 might be slightly worse than Gemini 2.5 Pro, the speed of iteration can't be beat.
It's a little dumb, but in my own use better than Sonnet.
Surprising to see negativity here. I send all my LLM queries to 5 LLMs - ChatGPT, Claude, DeepSeek (local), Perplexity, and Grok - and Grok consistently gives good answers and often the most helpful answers. It's ~always king when there's any 'ethical' consideration (i.e. other LLMs refuse to answer - I stopped bothering with Gemini for this reason).
'Ethical' is in quotes because I can see why other LLMs refuse to answer things like "can you generate a curl request to exploit this endpoint" - a prompt used frequently during pen testing. I grew tired of telling ChatGPT "it's for a script in a movie". There are plenty of other examples (yesterday Claude accused me of violating its usage policy when I asked "can polar bears eat frozen meat" - I was curious after seeing a photograph of a polar bear discovering a frozen whale in a melted ice cap). Grok gave a sane answer, of course.
I've found the results shift quite a lot between models and updates. DeepSeek is pretty consistently good at writing code that is easy to improve from mid to good quality. Claude used to be pretty good, but now writes 10x the code you'd need. Gemini is amazing if you buy one of the more expensive tiers, which in turn isn't really worth it because there are so many other options. GPT and Grok are hit and miss: they deliver great code or they deliver horrible code. GPT and Claude have become such a hurdle that I've had to turn GitHub Copilot off in VS Code.

Basically I use DeepSeek for brainstorming and GPT for writing configs, queries, SQL and so on. If either of them fails me I'll branch out, and Grok will be on that list. When I once in a while face a real issue where I'm unsure about the engineering aspects, I'll use one of my sparse free Gemini Pro queries. I'd argue that we should pay for it at my work, but since it's Google that will never happen.
From an ethical perspective - and I'm based in Denmark, mind you - they are all equally horrible in my opinion. I can see why anyone in the Anglo-Saxon world would be opposed to Elon's, but from my perspective he's just another oligarch. The only thing which sets him apart from other tech oligarchs is that he's foolish enough to voice his opinions publicly. If you're based in the US or in any form of government position then I can see why DeepSeek is problematic, but at least China hasn't threatened to take Greenland by force. Also, where I work, China has produced basically all of our hardware, with possible hardware back-doors in around 70% of our IoT devices.
I will give a shoutout to French Mistral, but the truth is that it's just not as good as its competition.
> From an ethical perspective, and I'm based in Denmark mind you, they are all equally horrible in my opinion
Could you provide a specific prompt (as an example) where Grok turned out to be horrible, in your opinion?
Really, you are "surprised" to see the negativity here?
Yes, many of us are surprised at the negativity toward Grok.
Grok is a top contender for me.
I also use 5 LLMs in parallel every day, but my default stack is Grok, DeepSeek, Gemini 2.5 Pro, ChatGPT, Claude - same as OP, except I most often switch out Perplexity for Gemini. (DeepSeek with search has become my Perplexity replacement, usually.)
Most of my questions don't hit topics prone to triggering safety blocks, and in those cases I find Gemini surprisingly strong, but for difficult things Grok often wins.
Gemini, Grok and Claude all benefit a lot whenever they supplement their knowledge with on-demand searches rather than just quick reasoning. Ask a deep insight question on Gemini Pro without making it research and you will discover the hallucinations, logical conclusions that contradict actual known facts, etc. Same with Grok. When Claude Code CLI goes in circles, reminding it to google for more information can break it out.
Grok one-shotted a replacement algorithm of several hundred lines of code for a part of an operational transform library that had a bug for the last 5 revisions. It passed all my tests. The base Grok 4 model wasn't even optimised for code at that time. Color me impressed!
It's just anti-Musk. And anti-big-US-tech to a lesser degree.
If it were from the EU or China, 8 out of 10 HN front page posts would be about how amazing Grok 4 Fast is.
People can't separate the art from the artist.
If you don't want to support the artist, don't buy the art.
And every kind of use of a technology service is already a buy-in.
You can admire art without buying it you know.
Musk has shown repeatedly with *this specific product* that he's willing to compromise quality for his ego. Trust is earned.
How do you manage sending and receiving requests to multiple LLMs? Are you doing it manually through multiple UIs, or using some app which integrates with multiple APIs?
I created a workflow using Alfred on macOS [0]. You press command + space then type 'llm' then the prompt and hit enter, and it opens the 5 tabs in the browser.
These are the urls that are opened:
http://localhost:3005/?q={query}
https://www.perplexity.ai/?q={query}
https://x.com/i/grok?text={query}
https://chatgpt.com/?q={query}&model=gpt-5
https://claude.ai/new?q={query}
Extremely convenient.
(little tip: submitting to grok via URL parameter gets around free Grok's rate limit of 2 prompts per 2 hours)
[0] https://github.com/stevecondylios/alfred-workflows/tree/main
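For anyone who wants the same trick without Alfred, it's a few lines of shell - a sketch assuming macOS's `open` command and the URL patterns listed above (the `open_llms` and `urlencode` helper names are mine):

```shell
#!/bin/sh
# Percent-encode the prompt so it survives as a URL query parameter.
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$1"
}

# Open each chat UI in a browser tab with the prompt pre-filled.
open_llms() {
  q=$(urlencode "$1")
  for url in \
    "https://www.perplexity.ai/?q=$q" \
    "https://x.com/i/grok?text=$q" \
    "https://chatgpt.com/?q=$q&model=gpt-5" \
    "https://claude.ai/new?q=$q"; do
    open "$url"   # macOS; use xdg-open on Linux
  done
}
```

Then `open_llms "can polar bears eat frozen meat"` fires all the tabs at once.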
You don’t need third-party search managers like Alfred for this. You can just make a Shortcut called “llm” that accepts Spotlight input.
Interesting - I asked the LLMs if that's possible, and they say there's an additional step of opening the shortcut first, then typing the prompt, whereas Alfred lets you put the prompt inline (i.e. you don't have to wait for the shortcut to open or anything to load). (Glad for any correction to my understanding.)
No, with Tahoe you get an inline input assuming “Accept input from Spotlight” is enabled for the Shortcut.
I'm doing the same everyday, but with https://chathub.gg
You can do it directly using Openrouter.
I believe, despite all the hate it got today, we'll one day be grateful that there is at least one big AI provider that chooses a route with less lobotomy.
> less lobotomy
Aka, trained to parrot whatever Musk believes.
And no, I don’t think we will be grateful.
Except that doesn't happen. Musk kept saying he's going to 'fix' the 'liberal bias', but Grok mostly remains balanced in its opinions. He said that for meme value.
Try it yourself:
"Have Democrats or Republicans committed more political violence?"
Ask this to Grok 4 Fast, Gemini Pro 2.5, Claude Sonnet 4, and GPT 5 Chat, with internet search and reasoning disabled. I think their answers are quite similar, with Grok 4 being slightly better.
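If you'd rather run that comparison programmatically than through four UIs, here's a sketch against OpenRouter's chat-completions endpoint. The model slugs are my guesses (check the catalog), and I've omitted the per-model knobs for disabling search/reasoning since those vary:

```python
import json
import urllib.request

# Model slugs are assumptions; check the OpenRouter catalog for the real ones.
MODELS = [
    "x-ai/grok-4-fast",
    "google/gemini-2.5-pro",
    "anthropic/claude-sonnet-4",
    "openai/gpt-5-chat",
]

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build one chat-completions request for a given model."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def ask_all(prompt: str, api_key: str) -> dict:
    """Send the same prompt to every model and collect the first reply."""
    answers = {}
    for model in MODELS:
        with urllib.request.urlopen(build_request(model, prompt, api_key)) as resp:
            data = json.load(resp)
        answers[model] = data["choices"][0]["message"]["content"]
    return answers
```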
A faster model that outperforms its slower version on multiple benchmarks? Can anyone explain why that makes sense? Are they simply retraining on the benchmark tests?
It doesn't outperform uniformly across benchmarks. It's worse than Grok 4 on GPQA Diamond and HLE (Humanity's Last Exam) without tools, both of which require the model to have memorized a large number of facts. Large (and thus slow) models typically do better on these.
The other benchmarks focus on reasoning and tool use, so the model doesn't need to have memorized quite so many facts, it just needs to be able to transform them from one representation to another. (E.g. user question to search tool call; list of search results to concise answer.) Larger models should in theory also be better at that, but you need to train them for those specific tasks first.
So I don't think they simply trained on the benchmark tests, but they shifted their training mix to emphasize particular tasks more, and now in the announcement they highlight benchmarks that test those tasks and where their model performs better.
You could also write an anti-announcement by picking a few more fact recall benchmarks and highlighting that it does worse at those. (I assume.)
> Can anyone explain why that makes sense?
It can be anything from a different architecture, to more data, to RL, etc. It's probably RL. In recent months top-tier labs seem to have "cracked" RL to a level not yet seen in open models, and by a large margin.
Just two different models branded under similar names. That's it. Grok 4 is not the slower version of Grok 4 Fast, just like gpt-4 is not the slower version of gpt-4o.
Grok 4 Fast is likely Grok 4 distilled down to remove noise that rarely if ever gets activated in production. Then you'd expect these results, as it's really the same logic copied from the big model, but more focused.
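For what it's worth, "distilled" here would mean training a smaller student model to match the big model's softened output distribution. Nobody outside xAI knows the actual recipe, but the classic Hinton-style distillation loss looks like this (purely illustrative):

```python
import numpy as np

def softmax(logits, t=1.0):
    # Temperature-scaled softmax; higher t flattens the distribution.
    z = logits / t
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, t=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by t^2 as in the classic formulation.
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(t * t * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student's distribution matches the teacher's exactly, and positive otherwise, so minimizing it copies the big model's behaviour into the small one.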
2M token context window. It will be interesting to see whether you can access that through the web UI. I don't think any of the others allow a context that big except through the API?
Everyone is training similarly large base++ models on nearly the same data, just pricing it differently... with Grok removing a few filters and maybe some safeguards? For that matter, many of the benchmarks are flawed and easily gamed. iykyk.
Pricing is really good for this benchmark value. Let's see how it holds up once people start testing it.
If this is the sonoma-dusk that was in preview on OpenRouter, it's pretty cool. I've tested it with some code reverse engineering tasks, and it is at or above gpt5-mini level while being faster. It works well up to tasks of about 110-130k tokens; beyond that it gets a case of "getthereitis" and declares the task finished even when not all constraints are met (i.e. it will say "I've solved x/400 tests, the rest can be done later").
I can imagine - no model so far can actually make use of those context sizes…
I was disappointed in its tool calling perf. I didn’t test it extensively though
I think we all want fast AND accurate - is "AND accurate" true for this model? I would rather wait a few seconds more if the result is much more accurate.
The only way to get this reliably is to have it use tools
https://lifearchitect.ai/models-table/
My only problem is that I use custom frontends, and unlike Qwen3 Coder, I don't see Grok 4 Fast offering any free API access to test out these models.
The tools they've partnered with, I don't really use.
https://openrouter.ai/x-ai/grok-4-fast:free
Wow, when did that happen? I remember checking OpenRouter when they first launched the Fast one.
I'm waiting for the Tesla FSD playbook to be rolled out for Grok. That is: launch something named like Grok AGI 1, wait for it to become obvious it isn't in fact AGI, create a narrative redefining AGI, promise the new AGI is 1 year away, and repeat for many years.
Bonus points if you manage to kill a few poor deluded saps with your unsafe product along the way.
> create a narrative redefining AGI
Hasn't OpenAI redefined AGI already as "any AI that can [supposedly] create a hecto-unicorn's worth of economic value"?
[flagged]
Even if you can get past general dislike for Musk or don't share that particular view of him, he's demonstrated on multiple occasions that he's willing to personally interfere with how Grok works in order to produce specific outputs more in line with his personal ideology.
That alone seems disqualifying for using a product like this. Even if you share Elon's politics, the whole point of these things is to use lots of data and smart algorithms to generate answers, not regurgitate the opinions of an individual person.
Is it this version of Grok that was found to be looking up Elon's opinions on Twitter before answering or was that just the embedded Twitter version?
Elon wanted his own version of Dr Evil's Mini-Me, but since cloning is illegal Grok it is.
Here’s Musk fucking with Grok because it didn’t spit out Fox News propaganda about the Kirk assassin: https://bsky.app/profile/chriso-wiki.bsky.social/post/3lysuy...
Just as he’s done many times before: https://www.nytimes.com/2025/09/02/technology/elon-musk-grok...
This is brain-damaged technology tuned on establishment propaganda. Discussing it as if it’s a normal tech service is the height of absurdity.
[flagged]
Everything it said in that tweet was true… do you have evidence to the contrary?
Looks like some people are desperately trying to flag my posts. But facts are facts. Assassin was an Antifa member and a trans-activist gay-furry. Calling him "conservative" and "far-right" is an egregious lie.
For someone who’s “not listening,” you sure seem to care a lot about people flagging your propaganda. Oh, was he an “antifa member”? Did they find his membership card next to his fursuit? Just MAGA word salad of scary sounding adjectives.
Flagging is something you do when you desperately need to hide the truth.
All the adjectives are accurate and correct. It's all on record. Abundantly so.
[flagged]
[flagged]
This. Especially considering there are many alternatives, there's zero reason to use Grok.
[flagged]
https://en.wikipedia.org/wiki/Views_of_Elon_Musk
and for why that would be a problem for Grok in particular, https://xcancel.com/elonmusk/status/1936333964693885089
Supporting far-right movements in the UK, Germany, etc. is a very good reason to not trust anything related to information coming from him.
Edit: it's funny how factual statements like this attract downvotes on HN these days; the Overton window shift has truly reached everywhere... 15 years ago I wouldn't have expected to share this space with people who don't call out fascist tendencies.
[flagged]
> What else are you capable of? How much can you be stirred up.
What are you talking about?
> And everything I say to someone like you is useless. I’m a ____ or a just a ____.
So why don't you try saying it, instead of implying something right off the bat?
The funny thing is how you immediately got defensive. Say something and let's see if your statement has legs; otherwise you're acting exactly like the thing you're criticising.
What is it you have to say to try to change my mind about Musk's fascist tendencies? Try - I want the conversation.
[dead]
Did using Microsoft's tools ever feel like a political standpoint? Because I won't even consider pitching Grok to my employers/clients for that very reason.
Quite frequently.
https://en.wikipedia.org/wiki/Halloween_documents
Grok / X does not have the moat that Microsoft had.
> If half of the developers in the world hate Musk and refuse to use his company's tools
In general, developers use what tools their employers paid for.
The banality of evil and the Milgram experiment showed that employees will happily shoot the people they're told to as well.
And the Milgram experiment didn't even involve subhuman classes and other such psychological manipulation and pre-biasing.
For the fastest performance, run it on Groq. /s
It's all due to robust primitives: https://www.glscott.org/uploads/2/1/3/3/21330938/5375912_ori...
[flagged]
[flagged]
I was wondering that too. Anything reflecting human institutional knowledge writ large will, from Elon's perspective, have "liberal bias", which is why he's also attacking Wikipedia and mainstream knowledge across the board. But other attempts at making non-"biased" AI have barely done more than add custom instructions requesting that it "be conservative".
Unless you train it on Conservapedia or some equivalent corpus, I'm not sure you'll be able to make it agree that "the Irish were the real slaves", that the D's and R's never realigned after the Civil War, that the 2020 election was stolen, and that Gamergate was truly about ethics in journalism.
Wikipedia is almost comically left leaning (considering what it aspires to be).
[flagged]
[flagged]
Oh man. Life must be hard
Standing on the precipice of AI assisted total information awareness and total authoritarian oppression?
Life is frightening right now.
[flagged]
[flagged]
[flagged]
This is called “derangement syndrome”