vessenes 8 days ago

Short version: A Qwen-2.5 7b model that has been turned into a diffusion model.

A few notable things: first, that you can do this at all (left-to-right model -> out-of-order diffusion via finetuning), which is really interesting. Second, the final version beats the original by a small margin on some benchmarks. Third, it's in the ballpark of Gemini Diffusion, although not competitive (to be expected for any 7B-parameter model).

A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left-to-right generation.
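
To make the parallelism point concrete, here's a toy sketch (stand-in model calls, not the paper's actual decoder): an autoregressive decoder needs one sequential model call per token, while a masked-diffusion-style decoder can commit several positions per pass, so the number of sequential steps shrinks.

    import random

    VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

    def toy_predict(tokens, i):
        # Stand-in for a real model predicting the token at position i,
        # conditioned on everything committed so far.
        random.seed(hash((tuple(t or "" for t in tokens), i)) & 0xFFFFFFFF)
        return random.choice(VOCAB)

    def autoregressive(length):
        # One sequential model call per token: step t waits on steps 0..t-1.
        out = []
        for t in range(length):
            out.append(toy_predict(out, t))
        return out, length  # tokens, number of sequential steps

    def masked_diffusion(length, positions_per_step=4):
        # Start fully masked; each step commits several positions "in parallel",
        # so far fewer sequential steps are needed for the same length.
        out = [None] * length
        steps = 0
        while any(t is None for t in out):
            masked = [i for i, t in enumerate(out) if t is None]
            for i in masked[:positions_per_step]:
                out[i] = toy_predict(out, i)
            steps += 1
        return out, steps

    print(autoregressive(12))    # 12 sequential steps
    print(masked_diffusion(12))  # 3 sequential steps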

Overall, interesting. At some point these local models will get good enough for ‘real work’ and they will be slotted in at API providers rapidly. Apple’s game is on-device; I think we’ll see descendants of these start shipping with Xcode in the next year as just part of the coding experience.

  • baobun 8 days ago

    Without having tried it, what keeps surprising me is how apparently widely different architectures (and, in other cases, training data) lead to very similar outcomes. I'd expect results to vary a lot more.

    • IMTDb 8 days ago

      I would expect a lot of attempts to fail, and those tend not to be published, or gather less attention. So if we have reached a local optimum, any technique that gets close to the current benchmarks is worth publishing as soon as results reach that point. All the ones that are too distant are discarded. In the end, all the papers you see are close to the current status quo.

      It's possible that some of those new architectures / optimizations would allow us to go beyond the current benchmark scores, but probably with more training data, and money. But to get money you need to show results, which is what you see today. Scaling remains king; maybe one of these techniques is 2025's "attention" paper, but even that one needed a lot of scaling to go from the 2017 version to ChatGPT.

    • viraptor 8 days ago

      It doesn't look like it got pushed that much, unfortunately. The article says they only added 20k examples to fine-tune at the end, but maybe the ceiling is much higher for diffusion?

      But yeah, RWKV also ends up in a similar performance area at similar sizes - I wish someone would finally start using it at scale...

    • hnaccount_rng 8 days ago

      But if the limiting factor is the data on which the models are trained, and not the actual “computation”, then this would be exactly expected, right?

      • Ldorigo 8 days ago

        The data might be the limiting factor of current transformer architectures, but there's no reason to believe it's a general limiting factor of any language model (e.g. human brains are "trained" on orders of magnitude less data and still generally perform better than any model available today).

        • hnaccount_rng 8 days ago

          That depends on whether these current learning models can really generalise or whether they can only interpolate within their training set

  • miroljub 8 days ago

    When we look at the small models suitable for running locally, by far the best programming model is DeepSeek-R1-0528-Qwen3-8B. It is quite comparable in real world usage even to much bigger models.

    • hardwaresofton 8 days ago

      Would you mind sharing how you arrived at this conclusion? Was there some benchmark that it really shined at? Personal use?

      • miroljub 7 days ago

        Personal use, no benchmark, just a vibe.

    • handfuloflight 8 days ago

      Comparable to which bigger models?

      • miroljub 7 days ago

        My previous favourite was qwen2.5-coder.

  • roughly 8 days ago

    > A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.

    I had a similar notion and am excited to see this research being done. My experience of writing code is that the structure of the whole system influences each individual part, which has always felt like a better match for a diffusion type model.

    I’m suspecting this is a 7B model because it’s an experiment, but I do like seeing Apple playing with smaller models - I think Google’s “no moat” memo is still fundamentally correct, either via better architectures or Moore’s law, and it seems like Apple thinks the same.

    • sitkack 7 days ago

      The "no moat" memo is way more complex than Google admitting an uncomfortable truth. The benefit massively from having seemingly internal documents leaked about how the play field is fair.

  • jeswin 8 days ago

    > to my mind the architecture is a better fit for coding

    We have to see if it produces better results. Humans have a planning phase, followed by a part-by-part implementation phase. This is reasonably well emulated by plan/architect + codegen tools.

    • dboreham 8 days ago

      It's delusional to think that most software projects can be planned in advance beyond "there will be a beginning, a middle, and an end". People do it, but in my experience their efforts are generally ignored once implementation gets underway.

      • Retric 8 days ago

        Planning in software isn’t about following the plan but mapping a viable route to avoid predictable issues. You’re always going to know more about a project as you build it and you should keep updating that plan.

      • lokar 8 days ago

        That’s true at the project level. But surely when you sit down to actually work for a couple hours you think about what you are going to do, and then mostly do that.

        • layer8 8 days ago

          In my experience it’s more fractal. Any subgoal, however small, may run into its own planning/thinking and then doing sequence, or even have you reconsider the higher-level plan. Of course, it somewhat depends on how run-of-the-mill the overall task is.

  • koakuma-chan 8 days ago

    > At some point these local models will get good enough for ‘real work’

    Are these small models good enough for anything but autocomplete?

    • MangoToupe 8 days ago

      Given that's 99% of my usage of it, that alone would make me quite happy.

    • _heimdall 8 days ago

      Isn't that all they're designed for?

      They predict more than just the second half of a word you are typing, but at the end of the day they're still just predicting what a human would have typed.

      • koakuma-chan 8 days ago

        I'm disappointed because I don't use autocomplete.

    • Eggpants 7 days ago

      Most of the "magic" of large models is really just function calls, so as long as the small models have access to the same functions they work well. They fixed the "how many R's in Strawberry" issue by offloading the question to a function, not by spending a godly amount of money/energy on training another model.

      Oops, sorry, "Tools". Gotta maintain the grift that these cool statistics-based lossy-text-compression bar tricks are "thinking".
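
      For what it's worth, the offloading being described is just generic tool/function calling. A minimal sketch of the pattern (tool name and routing logic are made up for illustration, not any vendor's actual implementation):

          def count_letter(word: str, letter: str) -> int:
              # The "tool": a plain function the host application exposes to the model.
              return word.lower().count(letter.lower())

          TOOLS = {"count_letter": count_letter}

          def fake_model(prompt: str) -> dict:
              # Stand-in for a model that emits a structured tool request
              # instead of guessing the answer from token statistics.
              if "how many r" in prompt.lower():
                  return {"tool": "count_letter",
                          "args": {"word": "strawberry", "letter": "r"}}
              return {"answer": "(model answers directly)"}

          def run(prompt: str) -> str:
              reply = fake_model(prompt)
              if "tool" in reply:
                  result = TOOLS[reply["tool"]](**reply["args"])
                  return f"There are {result} r's in strawberry."
              return reply["answer"]

          print(run("How many R's are in strawberry?"))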

  • iwontberude 8 days ago

    I think Apple will ultimately destroy the data center, I hope they succeed.

    • lxgr 8 days ago

      Maybe for compute, but not for storage.

      Why can’t I backup an iOS device to a local NAS in the way I can use Time Machine, for example? (Rhetorical question; the answer is obviously that they want to sell more iCloud storage for that all-important services revenue).

      • throw0101d 8 days ago

        > Why can’t I backup an iOS device to a local NAS in the way I can use Time Machine, for example?

        When I connect my iPhone to my iMac it does a local backup to a file, which then gets backed up via Time Machine (and SuperDuper/CarbonCopyCloner).

        "How to back up your iPhone, iPad, and iPod touch with your Mac":

        * https://support.apple.com/en-ca/108796

        There's also a checkbox for 'Wifi syncing' so a cable isn't necessarily needed.

        • lxgr 8 days ago

          That’s exactly my point: Why on Earth do I need a separate computer to mediate the backup?

          iOS natively supports SMB over any network connection including wired Ethernet, mounting encrypted APFS volumes on USB storage devices at 10 Gbps etc.

          It’s Apple’s explicit vision that an iPad Pro can replace a Mac, even for some professional users. Why don’t those users deserve local backups?

          • GeekyBear 8 days ago

            How many people own a NAS, but not a PC or Mac?

            Apple already provides first party software to handle iDevice backups on Windows or Mac.

            Backing up an Android device to a PC using adb is significantly more difficult, especially for the less technically minded.

            • lxgr 8 days ago

              > How many people own a NAS, but not a PC or Mac?

              That’s arguably the wrong question: I bet a lot more would own one if they could easily backup their iOS devices to it.

              • hnaccount_rng 8 days ago

                The number of people who would buy a NAS over just spending $5/month for storage is well below a percent, and if you combine that with the requirement of not having a PC/Mac you may well end up in the hundreds…

                There aren’t that many people who are willing to own a device from a company but not trust that company with their data.

                • lxgr 8 days ago

                  Your numbers might be right, but Apple has implemented niche features, some even requiring expensive per-device hardware, for much less than that.

                  • hnaccount_rng 8 days ago

                    Do you have an example?

                    • lxgr 8 days ago

                      All new iPhone models support native DisplayPort output via USB-C, yet I’m not sure 1% of users even have the required cable/adapter.

                      Some of the power amplifiers for rarely-used bands probably qualify as well (mmWave in particular).

                      On the software side I’d have to dig a bit, but I bet many code paths on iOS see use of less than 1% of all users.

              • GeekyBear 8 days ago

                I'm willing to bet that more people would backup their Android device if Google provided a first party tool for user friendly backups of Android devices to local computers.

      • tonyedgecombe 8 days ago

        >Why can’t I backup an iOS device to a local NAS

        You can backup your iPhone using Finder.

        Finder -> Locations -> Your iPhone -> Backup all the data on this iPhone to your Mac.

        Once you have done this you can find the backup in "Manage Backups", right click on an entry and select "Show in Finder". From there you can copy it to your NAS.

        Not as smooth as a Time Machine backup but it is possible.

        • lxgr 8 days ago

          > Not as smooth as a Time Machine backup but it is possible

          I’d personally call it “absurdly clunky and intentionally impractical for a big chunk of Apple’s user base”.

      • hiatus 8 days ago

        Synology supports exactly that, and I'm sure they're not the only one.

        • lxgr 8 days ago

          Full iOS backups directly to local external storage, without another computer in the mix? I’d be very surprised if that were true.

          • GeekyBear 8 days ago

            Here's one example of a third party tool.

            > Step-by-Step Guide: How to Backup iPhone to Synology NAS

            https://www.ubackup.com/phone-backup/backup-iphone-to-synolo...

            • lxgr 8 days ago

              > Preparation. How to set up Synology NAS on PC

              That’s a guide on how to backup an iPhone to a NAS using a computer.

              Unsurprisingly, a reasonably capable general-purpose OS supports network file systems in a way transparent to applications, but that doesn’t help people using only an iOS device.

            • oefrha 8 days ago

              Did you actually read what you linked, or did you just paste in a random link from a search engine?

              There are two methods presented: one only backs up the camera roll; the other requires plugging into a computer and manually clicking around, at which point you might as well use the first party backup built into Finder (or iTunes on Windows? Is that still a thing?), no random third party application needed. I also highly doubt their “backup every single content” claim.

              It’s also a sneaky marketing article for that third-party application, following the common SEO practice of giving you a half-assed solution capturing a frequent search term (in this case, “backup iPhone to Synology”), then plugging their own questionable thing as the better solution.

    • nxobject 8 days ago

      Shades of 1980s Apple v. Big Blue. I can't wait for the rehash of the "1984" ad.

    • overfeed 8 days ago

      > I think Apple will ultimately destroy the data center

      I think EVs destroying Ultra Large Container ships had better odds, and both are extremely unlikely. Datacenter advantages Apple won't be able to overcome: compute density, cooling, cheap power, physical security to protect the software, scale + bandwidth, and the lower costs to customers from using contract manufacturers and/or commodity hardware.

      There is no universe where large enterprises ditch their geo-located racks. Let alone hyperscalers, especially now that they are scrounging for energy, reneging on renewables pledges, and paying big bucks to bring nuclear power stations online.

      • iwontberude 7 days ago

        It’s easy to imagine a universe where the hyperscalers are in a bubble and they will eventually find a limit to adding classical compute and we will hit peak datacenter and shrink from there.

    • msgodel 8 days ago

      Not without fundamentally changing the way they think about computing and there seems to be zero willingness among their leadership to do that. In fact they seem to want to move things into the data center. That's why I'm shorting them.

      • iwontberude 8 days ago

        I think it’s just a convenient stepping stone more than a long term strategy.

quaintdev 8 days ago

JetBrains has 100MB per-language models for their IDEs that can autocomplete single lines. It's good, but I think we can do better for local code autocomplete. I hope Apple succeeds in their on-device AI attempts.

  • infecto 8 days ago

    Are any of the JetBrains offerings even competitive? I jumped ship from PyCharm and have tried their AI offerings a few times since release, but was always wildly impressed by how far behind the competition they were.

    • crappybird 8 days ago

      Junie is so amazing. It burns through credits like hellfire, but I have only seen claude-code and opencode coming anywhere close to it.

      Did you use the normal Jetbrains AI assistant, or was it junie?

      • infecto 7 days ago

        Thanks for the recommendation. Pretty sure this was all pre-Junie, so I'll need to try it again.

    • nsm 8 days ago

      I've had good experiences with Junie and AI Assistant in their bread-and-butter languages of Java and Kotlin. I haven't tried it in anger though.

  • msgodel 8 days ago

    You can run Qwen3 locally today if you want to. It can write whole files if you want (although not with <1 second latency like a sub-1GB model will, which is what you want for interactive in-editor completions).
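
    For anyone who wants to try, here's a minimal completion sketch with llama-cpp-python (the GGUF filename is a placeholder; point it at whichever local Qwen3 quantization you have):

        # Minimal local completion with llama-cpp-python.
        # The model path is a placeholder; use any local GGUF build of Qwen3.
        from llama_cpp import Llama

        llm = Llama(model_path="./qwen3-8b-q4_k_m.gguf", n_ctx=4096)

        prompt = "Write a Python function that reverses a string:\n"
        out = llm(prompt, max_tokens=128, temperature=0.2, stop=["\n\n\n"])

        print(out["choices"][0]["text"])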

vintagedave 8 days ago

Does diffusion allow for 'size editing'? Unsure how to ask this, or if this (most likely) reveals a fundamental misunderstanding on my part, but: for an image, the size is set (say, 256x256). For text, if each token were a pixel, it's very small. The article image showed text colour-coded by generation order. What if it needed to, say, insert another line for the rest of a comment sentence? How would it even know the size upfront, the way an image size is known?

  • mjp 8 days ago

    Yes, block diffusion for example generates fixed-size blocks of text, but can dynamically change the number of blocks.

    The Zed team recently posted a pretty good intro to diffusion models for text: https://www.youtube.com/watch?v=oot4O9wMohw
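
    Very roughly, the block idea looks like this (a toy sketch with a stand-in model, not the actual block-diffusion algorithm): each block has a fixed size and is filled by iterative unmasking, but blocks keep getting appended until the model emits an end marker, so total length isn't fixed upfront.

        import random

        BLOCK = 8          # fixed block size
        END = "<eos>"      # end-of-text marker

        def toy_fill(context, i):
            # Stand-in for the model's prediction at position i of the current block.
            random.seed(hash((tuple(str(t) for t in context), i)) & 0xFFFFFFFF)
            return random.choice(["tok"] * 15 + [END])  # small chance of ending

        def generate(prompt_tokens, max_blocks=16):
            out = list(prompt_tokens)
            for _ in range(max_blocks):
                block = [None] * BLOCK
                # "Diffuse" the block: unmask a few positions per step until full.
                while any(t is None for t in block):
                    masked = [i for i, t in enumerate(block) if t is None]
                    for i in masked[:4]:
                        block[i] = toy_fill(out + block, i)
                out.extend(block)
                if END in block:               # the model decided the text is done
                    return out[:out.index(END)]
            return out

        print(generate(["write", "a", "haiku"]))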

    • mdaniel 8 days ago

      That was interesting, although it's unclear why there's only one other person in a Zoom. However, it turns out that I was glad to have someone there, because they asked the question I had about the interaction between diffusion and prompting <https://www.youtube.com/watch?v=oot4O9wMohw&t=977>, which made it click why one would use this diffusion setup for code completion specifically. The "prompt" is the current state of the (file|line), and it then diffuses text into place.
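
      In other words, the existing file contents stay fixed as conditioning and only the masked hole gets denoised in place. A toy sketch of that framing (stand-in model and made-up tokens, not the real completion engine):

          import random

          MASK = "<mask>"

          def toy_denoise(tokens, i):
              # Stand-in for a model predicting masked position i, conditioned on
              # the whole sequence (the code before AND after the hole).
              random.seed(hash((tuple(tokens), i)) & 0xFFFFFFFF)
              return random.choice(["t", "=", "0", "for", "x", "in", "xs", ":", "+="])

          def infill(file_tokens, per_step=2):
              # The "prompt" is the current file; only MASK positions are rewritten.
              out = list(file_tokens)
              while MASK in out:
                  holes = [i for i, t in enumerate(out) if t == MASK]
                  for i in holes[:per_step]:
                      out[i] = toy_denoise(out, i)
              return out

          current_file = ["def", "total", "(", "xs", ")", ":",
                          MASK, MASK, MASK, MASK,
                          "return", "t"]
          print(" ".join(infill(current_file)))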

jbellis 8 days ago

Looks like this is targeted more at "better autocomplete" than "powers your next agent." Makes sense given Apple's interest in the on-device experience.

PaulRobinson 8 days ago

If we're getting almost-comparable results from models that can run locally (especially on dedicated silicon that we know Apple are good at designing and shipping at scale), I think we might hit a turning point soon. This was an intern project, so hopefully they'll now push a bit more resource at it: a 7B-param model on my laptop that is getting near Gemini or Claude standards is going to win my money, every day.

On a purely tech point, I'm not working very near the cutting edge on AI research, but hadn't realised so much had been done - and was possible - with diffusion models for text. Will need to dig into that, as it looks fascinating.

Incipient 8 days ago

The model itself I'm actually less fussed about. The integration with the IDE and the ability to easily get the model the right context is what I struggle with (VS Code is passable with the recent 101 update). I find pretty much all models in Copilot offer similar-ish performance.

I've been looking for good/best-practice workflow setups to work with Docker Python/FastAPI backends and Vue frontends... but I haven't found much.

If anyone has tips for where to look, I'd genuinely appreciate it!

  • wellthisisgreat 8 days ago

    I tried every single frontier model for code-related tasks and Claude Code is the best by a margin.

    Other, non-programming-related tasks are a different story though

frankfrank13 8 days ago

Would love to try this in ollama/llama.cpp. Using llama.cpp for VS Code is painful since (realistically) I can only generate on the order of <100 tokens at a time.

WillAdams 8 days ago

Articles such as this really make me wish that the current generation of LLMs were more often described as workable implementations of "the infinite monkey theorem". The descriptions/images in this article are especially worth showing to a person to whom one is trying to describe how an AI model "creates" an image.

amelius 8 days ago

Points I thought were interesting:

> Apple’s model is built on top of Qwen2.5‑7B, an open-source foundation model from Alibaba. Alibaba first fine-tuned that model for better code generation (as Qwen2.5‑Coder‑7B), then Apple took it and made its own adjustments.

> it still doesn’t quite reach the level of GPT-4 or Gemini Diffusion.

andsoitis 8 days ago

> The result is faster code generation, at a performance that rivals top open-source coding models

So even though it is faster (than what?) it still doesn’t beat top models?

ellisv 8 days ago

I often write out of order so it’s interesting to me that we have a model that can do so as well.

zamalek 8 days ago

Zed are doing out-of-order edits with their model; I'm not sure what is new here. I strongly suspect that theirs works directly with the CRDT that the editor uses, because it's able to continue similar deletes for the user (deletes would otherwise be invisible to most autocomplete models).

Apple's is open weights, so that's a big deal.