Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs

195 points by desideratum 1 day ago

wwfn 22 hours ago

Tangential (but topical in that "The threat is comfortable drift toward not understanding what you're doing" is also on the front page):

Is the generated python code in the example wrong?

The prompt

> Develop a Python function that removes any falsey values from a list. Return the modified list without creating a new one.

Is answered with list comprehension, which makes a new list and leaves the original unmodified (never mind that the *args input necessarily can't be a modifiable list?)

   def remove_falsey_values(*args): return [val for val in args if val]

Whereas I'd expect something like

    def remove_falsey_values(l):
          for i in reversed(range(len(l))):
               if not l[i]: l.pop(i)
          # returned list is linked to input l 
          return l

    a = [1, 0, False, 'foo']
    x = remove_falsey_values(a)
    x[0] = 2
    print(a) # [2,'foo']

hecanjog 22 hours ago

It doesn't fit the requirement to modify the list in place, but the prompt itself contradicts the requirements by asking explicitly for the implementation to use *args and a list comprehension.
- wwfn 22 hours ago
  
  Ahh I didn't see the full original prompt -- it's overflowing into a horz scroll for me. I thought it was the "critique loop" that injected the *args requirement. I guess garbage in, garbage out. Still unfortunate example to use.

__s 20 hours ago

    def remove_falsey_values(l):
          l[:] = (x for x in l if x)

desideratum 19 hours ago

Oh I wouldn't be surprised. This is a sample from one of the OSS code datasets I'd used, which are all generated synthetically using LLMs. Data is indeed the moat.
semiinfinitely 19 hours ago

your second function is the type of bad code you get from people trying to program python like its c
- wwfn 16 hours ago
  
  Absolutely! And the list.pop version is multiple orders of magnitude slower. But I took the prompt to be asking for in-place modification of the existing list. Comprehension does not do that.
- ktm5j 16 hours ago
  
  Is there a pythonic way to satisfy the prompt? IE without making a new list?
nusl 17 hours ago

Why would you modify the original list and return it with the second example? Honestly the first is better
- highphive 17 hours ago
  
  The question isn't really what's better practice, the question is whether the code follows the prompt. The first example does not.

tatrions 6 hours ago

The constitutional AI approach for tool-use training is the most interesting part here. Writing a constitution for a coding agent means your principles are things like "read the file before editing" and "don't run destructive commands without checking" -- which are operational decisions, not the usual harmlessness stuff.

Would be interesting to see how those principles scale. At 1.3B params the model probably learns the tool-calling format fine, but whether it actually reasons about when to use which tool vs just pattern matching on the training data is the key question. That meta-reasoning gap is likely where most of the quality difference between small models and frontier shows up in agentic settings.

desideratum 5 hours ago

Yes my findings and thoughts were pretty much identical. I actually think you can get something reasonable at 1.3B params with the correct training recipe, but definitely not at this compute/token budget.
One thing I found was that the model would pretty much always emit solutions from its training data when asked to solve problems, but it was much better at using Bash commands to explore a codebase, for example.
The Hugging Face folks have a great post on also using CAI for more vibes/character post-training than harmlessness https://huggingface.co/blog/constitutional_ai#oh-honey-lets-...

bdbdbdb 23 hours ago

Dumb question - and I'm not trying diminish the achievement here, I just genuinely don't understand:

Why would people want to spend $200 to train a coding model when there are free coding models?

desideratum 23 hours ago

This is a great question. You definitely aren't training this to use it, you're training it to understand how things work. It's an educational project, if you're interested in experimenting with things like distributed training techniques in JAX, or preference optimisation, this gives you a minimal and hackable library to build on.
- wongarsu 20 hours ago
  
  It's also a great base for experimentation. If you have an idea for an architecture improvement you can try it for $36 on the 20 layer nanocode setting, then for another $200 see how it holds up on the "full scale" nanocode
  Kaparthy's notes on improving nanochat [1] are one of my favorite blog-like things to read. Really neat to see which features have how much influence, and how the scaling laws evolve as you improve the architecture
  There's also modded-nanogpt which turns the same kind of experimentation into a training speedrun (and maybe loses some rigor on the way) [2]
  1 https://github.com/karpathy/nanochat/blob/master/dev/LOG.md
  2 https://github.com/kellerjordan/modded-nanogpt

jaboostin 22 hours ago

As someone with zero ML experience, this was a super interesting and digestible read!

bwfan123 21 hours ago

agree, great educational tool ! tied a bunch of things around coding agents for me.
- desideratum 19 hours ago
  
  I appreciate the kind words very much : )

vova_hn2 20 hours ago

> This is a library showing you how to train your own Claude Code end-to-end.

What does it even mean?

Claude Code is a so called "harness" - a thing that builds a context for LLMs, calls LLMs, executes tool calls etc. It uses various Anthropic models under the hood.

It can also use other models AFAIK.

It cannot be "trained".

Sorry if this comment sounds nitpicky, I'm just annoyed by the imprecise use of terminology.

krackers 19 hours ago

Yeah it should really be about post-training a model for tool-use.
desideratum 19 hours ago

I see what you mean, but I disagree. I expect that Claude Code is backed by a separate post-train of Claude base which has been trained using the Claude Code harness and toolset.
- vova_hn2 19 hours ago
  
  It is possible of course, but I see no reason to believe it.
  
  jasonjmcghee 17 hours ago
  
  fwiw, other models seem to / are reported to struggle much more with using claude code compared with codex / opencode / pi etc.
  that being said, there are other potential explanations

redman25 19 hours ago

Not to be confused with nanocoder, the agentic coding harness.

https://github.com/Nano-Collective/nanocoder

wg0 19 hours ago

Does this really work? Does this how Anthropic works?

Any practitioners can elaborate?

desideratum 19 hours ago

This is a gross simplification of the process - you would typically use order(s) of magnitude more data and compute, and a substantial amount of online reinforcement learning to elicit emergent tool use capabilities.
Many recent OSS models have great tech reports where you can learn more about these kind of things: Kimi 2.5 https://github.com/MoonshotAI/Kimi-K2.5/blob/master/tech_rep... GLM 5 https://arxiv.org/abs/2602.15763 DeepSeek R1 https://arxiv.org/pdf/2501.12948