huahaiy 1 day ago

I have been using AI to write Clojure code for the past half year. Frontier LLMs have no problem writing idiomatic Clojure code. Both Codex and Claude Code fix their missing closing parentheses quickly. So I wouldn't say "writing Lisp is AI resistant". In fact, Clojure is a great fit for AI coding agents: it is token efficient, and the existing Clojure code used for training is mostly high quality, as Clojure tends to attract experienced coders.

  • zrkrlc 1 day ago

    Ha, and I’ve been using Datalevin for my vibe-coded apps! Thank you so much for such a wonderful piece of software.

    • huahaiy 1 day ago

      I am glad you enjoyed it. I am happy to report that the next release will have many new features: Raft-consensus-based high availability (with an extensive Jepsen test suite); a built-in MCP server; built-in llama.cpp for in-DB embedding; a JSON API; and language bindings for Java, Python, and JavaScript.

  • shevy-java 1 day ago

    But we don't hear of famous AI systems written in Lisp. They seem to fly below the radar.

funkaster 1 day ago

I have found it to be the complete opposite, tbh. Not Lisp proper, but I've been generating Scheme with Claude for about 5 months and it's a pleasure. What I did was make sure CLAUDE.md had clear examples, and I also added a skill that leverages ast-grep for AST-safe replacement. (The biggest pain is that sometimes Claude will mess up the parens, but lately it has even come up with its own Python scripts to count the parens and balance the expressions on its own.)
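A minimal sketch of such a paren-counting script (an illustration, not funkaster's actual code) could look like:

```python
def paren_balance(source: str) -> int:
    """Net open-minus-close paren count; 0 means balanced.

    Skips parens inside strings and line comments, which is the main
    thing a naive character count gets wrong in Scheme/Lisp source.
    """
    depth = 0
    in_string = in_comment = False
    prev = ""
    for ch in source:
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif in_string:
            if ch == '"' and prev != "\\":
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == ";":
            in_comment = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        prev = ch
    return depth
```

A positive result means that many closing parens are missing; a negative one means there are extras.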

I created Schematra[1] and also a schematra-starter-kit[2] that can be spun up from Claude to create a project and get you ready in less than 5 minutes. I've created 10+ side projects this way and it's been a great joy. I even added a Scheme reviewer agent that is extremely strict and focuses on Scheme best practices (it's all in the starter kit, btw).

I don't think the lack of training material makes LLMs poor at writing Lisp; I think it's the lack of guidelines. If you add enough of them, the fact that Lisp has such an inherently simple pattern and grammar makes it a prime candidate (IMO) for code generation.

[1]: https://schematra.com/

[2]: https://forgejo.rolando.cl/cpm/schematra-starter-kit

  • mark_l_watson 1 day ago

    Thanks for the Scheme setup examples. I have created very simple skills markdown files for Common Lisp and Hylang/hy (a Clojure-like Lisp on top of Python). I need to spend more effort on my skills files, though.

  • nxobject 15 hours ago

    This is incredibly useful - not for Scheme, but for someone like me interested in bootstrapping languages and frameworks in general. I hope you find a way to share the best practices you've learned in a broader context.

mark_l_watson 1 day ago

Interesting, and not quite my experience. While I do get better agentic coding results for Python projects, I also get good results working with Common Lisp projects. I have a habit of opening an Emacs buffer and writing a huge prompt with documentation details, sometimes sample code in other languages, or, if I am hitting APIs, a working curl example. For Common Lisp my initial prompts are often huge, but I find thinking about a problem and prompt creation to be fun.

The article mentions a REPL skill. I don’t do that: letting model+tools run sbcl is sufficient.

  • jimlikeslimes 1 day ago

    Yes, I've also found LLMs can generate working Common Lisp code quite well, albeit I've only been solving simple problems.

    I haven't tried integrating it into a REPL or even command line tools, though. The LLM can't experience the benefit of a REPL, so it makes sense that it struggled with it and preferred feeding entire programs into sbcl each time.

js8 1 day ago

Personally, I think we're using LLMs wrong for programming. Computer programs are solutions to a given constraint logic problem (the specs).

We should be using LLMs to translate from (fuzzy) human specifications to formal specifications (potentially resolving contradictions), and then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.
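As a toy illustration of that split (all names here are hypothetical): the LLM's job would stop at emitting the formal predicate, and a separate search procedure, standing in for a real SAT/SMT/CP solver, would do the rest:

```python
from itertools import product

# Hypothetical toy pipeline: the LLM turns a fuzzy request ("split 10 into
# two parts with the biggest product") into a formal spec; a separate
# solver then finds an assignment satisfying it. Brute-force enumeration
# stands in here for a proper reasoning algorithm.

def spec(x: int, y: int) -> bool:
    """Formal constraint the LLM would emit: the parts must sum to 10."""
    return x + y == 10

def solve(objective, domain=range(11)):
    """Enumerate the domain and return the best assignment satisfying spec."""
    feasible = [(x, y) for x, y in product(domain, repeat=2) if spec(x, y)]
    return max(feasible, key=objective)
```

Because the solver only returns assignments that satisfy the spec, correctness is guaranteed relative to the spec; the remaining risk is in the translation step.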

LLMs are a "worse is better" kind of solution.

  • jmalicki 1 day ago

    > We should be using LLMs to translate from (fuzzy) human specifications to formal specifications (potentially resolving contradictions)

    Agreed! This is why having LLMs write assembly or binary, as people suggest, is IMO moving in the wrong direction.

    > then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.

    Yes! I.e. write in a high-level programming language, and have a compiler, the reasoning algorithm, output binary code.

    It seems like we're already doing this!

  • derrak 23 hours ago

    In case you’re not familiar, I will point you to the classical program synthesis literature. There the task is to take a spec written in say first-order logic, and output a program that satisfies this spec.

    I think the biggest barrier to adoption of program synthesis is writing the spec/maintaining it as the project matures. Sometimes we don’t even know what we want as the spec until we have a first draft of the program. But as you’re pointing out, LLMs could help address all of these problems.

  • zozbot234 23 hours ago

    Full program inference from specs is actually a very hard problem, because the compiler/SAT solver cannot autonomously derive loop invariants (or, similarly, inductive hypotheses) that are necessary to write correct code. So using a LLM that can look at the spec and provide a heuristic solution makes a lot of sense. Obviously the solution still has to be verified, though.

  • iLemming 20 hours ago

    > using LLMs wrong for programming

    Perhaps you meant to say "coding", not "programming". AI is immensely helpful for programming. Coding is just the last, and in a proper programming session sometimes even unnecessary, step - there are times when an adequate investigation requires deleting code rather than writing new code, or writing pages of documentation without a single code change.

    You have to be a detective and know what threads to pull to rope in the relevant data, digging inductively and deductively - soaring high to get the "big picture" of things and diving into the depths of a single code line change.

    I've been developing software for decades now (not claiming to be great, but at least I think I've built certain intuition and knack for it), and I always struggled with the "story telling" aspect of it - you need to compose a story about every bug, every feature request - in your head, your notes, your diagrams. A story with actors, with plot lines, with beginning, middle, and end. With a villain, a hero, and stakes. But software doesn't work that way. It's fundamentally an exploratory, iterative, often chaotic process. You're not telling what happened - you're constructing a plausible fiction that satisfies the format. The tension I felt for decades is that I am a systems thinker being asked to repeatedly perform as a narrator, and that is hard.

    Modern AI is already capable of digging up the details for my narrative - I gave it access to everything - Slack, Jira, GitHub, Splunk, k8s, Prometheus, Grafana, Miro, etc. - and now I can ask it to explain a single line of code - including historical context, every conversation, every debate, every ADR, diagram, bug, and stack trace - it's complete bananas.

    It doesn't mean I don't have to work anymore, if anything, I have to work more now, because now I can - the reasons become irrelevant (see Steve Jobs' janitor vs. CEO quote). I didn't earn a leadership role - AI has granted it? Forced me into it? Honestly, I don't know anymore. I have mixed feelings about all of it. It is exciting and scary at the same time. Things that I dreamed about are coming true in a way that I couldn't even imagine and I don't know how to feel about all that.

danpalmer 1 day ago

This rings true for me. LLMs in my experience are great at Go, a little less good at Java, and much less good at GCL (internal config language).

This is definitely partly training data, but if you give an LLM a simple language to use on the fly it can usually do ok. I think the real problem is complexity.

Go and Java require very little mental modelling of the problem, everything is written down on the page really quite clearly (moreso with Go, but still with Java).

In GCL, however, the semantics are _weird_ - the scoping is unlike most languages', because it's designed for DSLs. For humans, writing DSL content requires little thought, but authoring DSLs requires a fair amount of mental modelling about the structure of the data that is not present on the page. I'd wager that Lisp is similar: more of a mental model is required.

The problem is of course that LLMs don't have a mental model, or at least what they do have is far from what humans have. This is very apparent when doing non-trivial code, non-CRUD, non-React, anything that requires thinking hard about problems more than it requires monkeys at typewriters.

  • eldenring 1 day ago

    How many docs do you put in the context? We maintain a lot of DSL code internally, and each file has a copy of the spec + guide as a comment at the top. It's about 50 LOC and the relevant models are great at writing it.

    • danpalmer 1 day ago

      Oh yeah, the models are great at writing the DSLs; there are enough examples to do that very effectively. It's the building of the DSL, which is implemented in the config language, that's tricky. I.e., writing a new A/B test in the language is trivial; writing an A/B testing config DSL in the language is hard.

      The main problem is the dynamic scoping (as opposed to lexical scoping like most languages), and the fact that lots of things are untyped and implicitly referenced.

  • miki123211 1 day ago

    I bet it would do much better at hcl (or Starlark, maybe even yaml, something that it has seen plenty of examples of in the wild).

    This is a weird moment in time where proprietary technology can hurt more than it can help, even if it's superior to what's available in public in principle.

    • pjmlp 1 day ago

      Depends if the AI masters also own said proprietary technology.

      • miki123211 1 day ago

        Well, GCL is (afaik) a Google technology, and they do have some kind of internal, fine-tuned models just for their stack.

        Who owns the tech doesn't matter, what matters is whether there's a set of diverse examples of its use spread around the internet.

        • danpalmer 1 day ago

          Yeah it's internal, and we have fine tuned models and more lines of it than you can imagine.

          That's the reason I think it honestly depends more on the complexity to understand and the necessity of having a mental model of the code.

discardable_dan 1 day ago

I've had it write Scheme with little issue -- it even completed the latter half of a small toy compiler. I think the REPL is the issue, not the coding; forcing it to treat the REPL like another conversation participant is likely the only way for that to work, and this article does not handle it that way. Instead, hand it a compiler and let it use the workflow it is optimized for.

  • antonvs 1 day ago

    Agreed. The article bemoans the fact that AIs don’t need to work in the inefficient way that most humans prefer, getting micro-level feedback from IDEs and REPLs to reduce our mistake count as we go.

    If you take a hard look at that workflow, it implies a high degree of incompetence on the part of humans: the reason we generally don’t write thousands of lines without any automated feedback is because our mistake rate is too high.

truncate 1 day ago

Claude has really helped me improve my Emacs config (elisp) substantially, and sometimes even fix issues I've found in packages. My Emacs setup is the best it has ever been. I can't say it always just works and produces the best solution - sometimes it would f** up closing parens or even make things up (e.g., it suggested a load-theme-hook which doesn't exist). But overall, changing things in Emacs and learning elisp is definitely much easier for me (I'm not good with elisp, but I'm a pretty good Racket programmer).

  • prescriptivist 1 day ago

    I used Emacs for about a decade and then switched to VS Code about eight years ago. I was curious about the state of Claude Code integration with Emacs, so I installed it to try out a couple of the Claude packages. My old .emacs.d that I toiled many hours to build is somewhere on some old hard drive, so I decided to just use Claude code to configure Emacs from scratch with a set of sane defaults.

    I proceeded to spend about 45 minutes configuring Emacs. Not because Claude struggled with it, but because Claude was amazing at it and I just kept pushing it well beyond sane default territory. It was weirdly enthralling to have Claude nail customizations that I wouldn't have even bothered trying back in the day due to my poor elisp skills. It was a genuinely fun little exercise. But I went back to VS Code.

    • pneumic 1 day ago

      Came to post exactly this, except it's got me using Emacs again. I led myself into some mild psychosis where I attempted to mimic the Acme editor's windowing system, but I recovered.

      • truncate 1 day ago

        Yeah, and all the little quirks here and there I had with Emacs, or things I wished I had in my workflow, I can just fix or add without worrying about spending too much time (except sometimes, maybe). The full Emacs potential I felt I wasn't using, I'm using now - and I finally get why Emacs is so awesome.

        E.g. I work on a huge monorepo at this new company, and Emacs TRAMP was super slow to work with. With Claude's help, I figured out which packages were making it worse, added some optimizations (Magit, project find-file), hot-loaded caching onto some heavyweight operations (e.g. listing all files in a project) without making any changes to the packages themselves, and while listing files I added keybindings to my minibuffer map to quickly add filters for the subproject I'm on. I could probably have done all this earlier as well, but it would definitely have taken much longer, as I was never deep into the elisp ecosystem.

    • iLemming 19 hours ago

      Hooking up Emacs to ECA with emacs-eval MCP is fantastic - Claude can make changes in my active Emacs session, run the profiler, unload/reload things, log some computation or embark-export search results and show it in a buffer; It can play tetris and drive itself crazy with M-x doctor - it's complete and utter bonkers. I can tell it to make some face color brighter/darker on the spot, the other day I fixed a posframe scaling issue that bugged me for a long time - it's not even about "I don't know elisp", this specific thing requires you to sit down and calculate geometry of things - mechanical, boring stuff. AI did it in minutes. VS Code, IntelliJ, any other shit that has no Lisp REPL? What are you even talking about? It's like a different world.

atgreen 1 day ago

I enjoyed reading this. Thank you for sharing.

I learned Common Lisp years ago while working in the AI lab at the University of Toronto, and parts of this article resonated strongly with me.

However, if you abandon the idea of REPL-driven development, the frontier models from Anthropic and OpenAI are actually very capable of writing Lisp code. They sometimes struggle editing it (messing up parens), but usually the first pass is pretty good.

I've been on an LLM kick the past few months, and two of my favorite AI-coded (mostly) projects are, interestingly, REPL-focused. icl (https://github.com/atgreen/icl) is a TUI and browser-based front end for your CL REPL designed to make REPL programming for humans more fun, whether you use it stand-alone, or as an Emacs companion. Even more fun is whistler (https://github.com/atgreen/whistler), which allows you to write/compile/load eBPF code in lisp right from your REPL. In this case, the AI wrote the highly optimizing SSA-based compiler from scratch, and it is competitive against (and sometimes beating) clang -O2. I mean... I say the AI wrote it... but I had to tell it what I wanted in some detail. I start every project by generating a PRD, and then having multiple AIs review that until we all agree that it makes sense, is complete enough, and is the right approach to whatever I'm doing.

nkassis 1 day ago

I am a bit (ok, very) worried that AI will most likely kill language diversity in programming. I also don't see it settling on a more optimal solution; it will probably just use the most available languages out there and be very hard to push out of that rut. And it's not limited to languages - I expect knowledge ruts all over the place, and with humans and AI choosing the path of least resistance, I don't see an active way to fight this.

  • pimlottc 21 hours ago

    I think that providing a large corpus of example programs for training will become practically mandatory for any new language or framework in the future. At least that way you can help “jump-start” LLMs before it gets adopted widely enough for organic training material to emerge.

  • einsteinx2 21 hours ago

    N=1 anecdote, but I've actually found I'm more likely to use different languages now that I'm using LLMs because I don't have to think about the different syntax as much.

    For example I've been on the lookout for a better language than bash to use for shell scripting, but didn't like the options I was familiar with for various reasons (go, python, js, swift, etc). I did some research and Nim seemed to fit my needs perfectly. I was able to quickly convert some scripts I had to Nim using an LLM, where in the past I wouldn't have bothered to get used to a whole new language just for a few scripts.

    Or right now I'm working on a personal full stack project and chose Go for the backend services, TypeScript/React for the frontend, and also have one service in Python because the library I need is easier to use there than in Go. Normally it would be frustrating to context-switch languages, but with LLMs I'm thinking more about the architecture and logic than specific syntax, so it's been pretty frictionless.

    I've generally always been one to want to use the best language/stack/platform for the job, so I'm probably biased, but I think LLMs actually make it easier to use languages you're less familiar with as long as you understand fundamental programming concepts. I'm hoping they end up promoting the usage or uptake of some of the less popular languages like Nim due to the lower learning curve needed to get useful output from them.

ivan4th 1 day ago

From my experience Claude Code is not that bad with Common Lisp and can do REPL-style development. I've been using this MCP server (an older version with some tweaks): https://github.com/cl-ai-project/cl-mcp (even though I'd probably prefer some MCP-to-swank adapter if it existed). And this MCP server works quite well for Emacs: https://github.com/rhblind/emacs-mcp-server

There are some issues, of course. Sometimes Claude Code gets into a "parenthesis counting loop", which is somewhat hilarious, but luckily this doesn't happen too often for me. In the worst case I fix the problematic fragment myself and then let it continue. But overall I'd say Claude Code is not bad at all with Lisps.

iLemming 21 hours ago

Working with Lisp dialects (because of the proper Lisp REPL) is nothing short of magic. I hooked up my Emacs AI tools to it. ECA and gptel-agent are able to change any elisp code, run check-parens, and apply changes immediately - unloading and reloading things, changing the behavior of my editor on the fly. I once even asked a model to use the built-in profiler, and it worked. I vibe-coded my MCP servers through the Clojure REPL.

On Mac I can poke virtually any aspect of my system - my Hammerspoon config is written in Fennel and has a REPL.

On Linux, I have a babashka loop with nrepl, that "talks" to Hyprland's IPC through a socket - AI can diagnose the state of WM and move things around, change color temp, affect gamma, etc.

I have made little prototypes with nbb and Playwright, and the model had no difficulty understanding the REPL loop - it was able to inspect every DOM element going to it through the REPL.

We have a few services written in Clojure, and we keep an nrepl on the staging k8s cluster. I have vibe-coded, fixed, and tested things on the go - the LLM can directly eval things there. Fixing bugs in Python, Java, and Go takes a completely different kind of loop - sometimes it feels like the AI even gets excited when there's a REPL to mess around in.

If anything - being a lisper in AI-era only reinforced my belief that making a deliberate choice to learn and understand the philosophy of Lisp years ago was the best choice I could've made. I future-proofed myself for decades.

Working with Lisp for a human programmer requires mindset adjustment - AI is no different here - you just have to tell it where the REPL is.

dang 1 day ago

I'm finding the opposite: Claude Code is strikingly good at Common Lisp (unsurprising given how much CL material would have made it into the training set), and even much better than I expected with Arc.

However, a large part of OP is about REPLs and on that I've also had a hard time with CC. I was working on it this evening in fact, and while I got something running, it's clunky and slow.

layer8 1 day ago

How many closing parentheses are in strawberry?

Zak 1 day ago

I leaned on Claude Code quite a bit resurrecting Clojure on Android[0] and got good results with it. Using the Clojure REPL MCP works especially well for about the same reasons I find developing with a REPL myself important: it can query the running program to see how things work, and test implementations with rapid turnaround.

I wasn't sure if I should expect great results relative to more popular languages with more code for the LLM to train on, but it looks like that's either not a big issue, or Clojure is over the popularity threshold for good results. I also previously expected languages with a lot of static guarantees like Rust to lead to consistently better results with LLM coding agents than languages like Clojure which have few, but that's untrue to the point that "bad AI rewrite in Rust" is a meme.

[0] https://github.com/clj-android

aewens 1 day ago

Amusingly, some of the earliest AI research used Lisp, which begat the AI winter. Now we've come full circle with LLMs that struggle to write valid Lisp. Almost poetic.

  • threatofrain 1 day ago

    I have a feeling we'll care less about untyped languages going forward as LLMs prototype faster than we do, and fast prototyping was a big reason why we cared about untyped languages.

    • spartacusnacho 1 day ago

      Javascript and Python have the most training data by far though, right?

    • Jach 1 day ago

      Pedantic but Lisp is not "untyped". (Neither are JS or Python.) All data has a type you can query with the type-of function. The typing is strong, you'll get a type-error if you try to add an integer to a string. Types can be declared, and some implementations (like SBCL) can and do use that information to generate better assembly and provide some compilation-time type checks. (Those checks don't go all the way like a statically typed language would, but Lisp being a programmable programming language, you can go all the way to Haskell-style types if you want: https://coalton-lang.github.io/)
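      For comparison, the same strong-but-dynamic behavior is easy to demonstrate in Python:

```python
# Comparable behavior in Python, also dynamically but strongly typed:
# every value carries a queryable type, and mismatched operations raise
# a TypeError instead of silently coercing (contrast JS, where 1 + "1" is "11").
x = 42
assert type(x) is int  # roughly analogous to Lisp's (type-of x)

try:
    outcome = x + "1"  # no implicit int-to-string coercion
except TypeError:
    outcome = "type-error"
```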

nemoniac 1 day ago

My own experience over the last few months is quite the opposite so it's heartening to see some reputable Lispers reporting the same in the comments here.

Everything in this area is moving so quickly that I haven't yet crystallized my thinking or settled on a working methodology but I am getting a lot of value out of running Claude Code with MCP servers for Common Lisp and Emacs (cl-mcp & emacs-mcp-server). Among other things this certainly helps with the unbalanced parentheses rabbit hole.

Along with that I am showing it plenty of my own Lisp code and encouraging it to adopt my preferred coding style and libraries. It takes a little coaching and reinforcement (recalcitrant intern syndrome) but it learns as it goes. It's really quite a pleasant experience to see it write Lisp as I might have written it.

drob518 1 day ago

I don’t find many issues with Clojure. The main problem is that it sometimes gets the paren balance wrong when it’s proposing an edit. Sometimes it will spin for a bit on that. A harness can help there, I’ve heard, but thus far I’ve just done a quick hand edit each time. I think this has something to do with how Lisps are typically written with all the closing parens on the last line, as opposed to on separate lines like with C. It might also have something to do with how parens and groups of parens are tokenized in the LLM and how edits are communicated (typically line oriented diffs). Regardless, it’s a problem but not a major one.

  • mpenet 20 hours ago

    If you use clojure-mcp/clojure-mcp-light that problem goes away, and that gives it the ability to run a repl and work from it directly. It’s night and day.

    • drob518 5 hours ago

      Yea, I need to try those.

swiftcoder 1 day ago

Isn't the whole problem here trying to wedge the LLM into using a REPL loop, when it could one-shot source files just fine? Python has a REPL too, but you don't see LLMs building Python via a REPL loop either...
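For what it's worth, a REPL can be driven programmatically; here's a sketch of what that interaction looks like in Python (real agent harnesses would typically use a subprocess instead):

```python
import code
import contextlib
import io

# A Python REPL driven programmatically - the shape of the interaction an
# agent would need: push inputs one at a time, keep state between them,
# and capture the output stream.
console = code.InteractiveConsole()
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    console.push("x = 21")        # state persists across inputs
    console.push("print(x * 2)")  # output lands in the captured stream
```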

blurbleblurble 1 day ago

I think some kind of graph-capable model directly on the AST or a lower level IR would be the way to go, with bidirectionality so that changes propagate back up to the syntax without squandering LLM resources.

throw913 1 day ago

Expected, considering stuff like the recent post re: esolang benchmarks. Lisp is probably just out of distribution. This is just a popularity contest, not a reflection on anything else.

matrix12 21 hours ago

I've been vibing a full R7RS Scheme on Chez Scheme, and a proper language MCP and LSP go a long way, especially around keeping parentheses balanced at all times. Give the LLM instructions to vote for features on the MCP, and then you help reduce its friction points.

twoodfin 1 day ago

Wildly speculating here, but if you buy that human brains have innate / evolved syntactic knowledge, and that this knowledge projects itself as the common syntactic forms across the bulk of human languages, then it’s no surprise that LLMs don’t have particularly deep grooves for s-expressions, regardless of the programming language distribution of the training set.

  • lgessler 1 day ago

    Is Java or Haskell any closer to human language?

  • Telemakhos 1 day ago

    OK, I'll bite. I want to know more of the reasoning behind this, because I think it implies that S-expressions are alien to the innate/evolved syntactic knowledge in human languages. A lot of American linguistics, like Chomsky's gropings for how to construct universal grammar and deep syntax trees, or the lambda calculus of semantic functions, looks like S-expressions, and I think that's because there was some coordination between human linguists and computer science (Chomsky was, after all, at MIT). At the same time, I've had a gut instinct that these theories described some languages (like English) better than others (like ancient Greek), requiring more explanation of changes between deep structure and surface structure for languages that were less like English. If models trained on actual language handle s-expressions poorly, that could imply that s-expressions were not a good model for the deep structure of human language, or that the deep-structure vs surface-structure model did not really work. I'd be very happy to learn more about this.

    • drob518 1 day ago

      S-expressions are just lists and trees. That’s it. If a language has groups of words and any hierarchy, you can use s-expressions to represent it. Sure, some human languages might be more or less flat and the groups might represent different things, but I don’t see how that prevents s-expressions from being suitable. Greek doesn’t rely on word order nearly as much as English (it does more with suffixes to indicate subject and object, for instance), but all of that can still be represented in s-expressions.
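      As a sketch of the point that s-expressions are just nested lists, a toy parse tree (a made-up example) fits directly into list structure:

```python
# A toy constituency parse of "the cat sat" as nested lists - structurally
# the same thing as the s-expression (S (NP the cat) (VP sat)).
tree = ["S", ["NP", "the", "cat"], ["VP", "sat"]]

def leaves(node):
    """Collect the words at the leaves, skipping the phrase labels."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]
```

Word order, suffixes, and so on are properties of what the nodes contain, not of the tree shape itself, which is why the representation is language-neutral.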

      • twoodfin 20 hours ago

        Sure, no argument that s-expressions are wonderfully simple & expressive.

        But most human languages—or at least the dominant ones that compose the vast bulk of the LLM training set—use more complex structuring rules for whatever evolutionary linguistic reasons. Easier error correction? Auditory disambiguation?

        You could tell similar “just so” stories about computer language syntax, & why s-expressions didn’t win out over (say) XML-style tagging. And it turns out pseudo-XML is a great way to talk to LLMs.

        EDIT: To be clear, by “s-expressions” I mean their typical use in Lisp programming of a function expression followed by a series of parameter expressions. The “grammar” is just eval/apply.

  • js8 1 day ago

    There is interesting ongoing research https://dnhkng.github.io/posts/sapir-whorf/ showing that LLMs think in a language-agnostic way. (It will probably get posted to HN after it is finished.)

    • twoodfin 20 hours ago

      I would expect that. But I’d also expect the pattern of their thoughts to look more varied in structure like C or German, and less like totally uniform s-expressions.

nottorp 1 day ago

> With AI, code is cheap, but only if you use a language for which AI has a lot of training data.

Yep. Language and libraries too.

Archit3ch 21 hours ago

It's alright in Julia, provided that you teach it to:

1) use a running REPL session
2) ignore pre-compilation time (it will kill the running process, mistaking it for stuck...)

faangguyindia 1 day ago

I had AI write Haskell for me and it did that beautifully. I am not sure why Lisp would be any different.

  • mark_l_watson 1 day ago

    Same experience. I like Haskell a lot but I am not great at Haskell programming. LLM based coding agents are useful for helping with runtime errors, library versions, etc. (and as other people here have said, for tedious stuff like cleaning up Emacs customizations, etc.)

rcarmo 1 day ago

This must be specific to Common Lisp. I’ve had no significant issues with Fennel and Chez Scheme, although to be fair it was on existing projects and they are not languages I would start a project with today.

Ologn 23 hours ago

With Gemini 3, I wrote an Emacs Lisp function which can tell whether a number is prime using only primitive recursive functions. That was done at the end of last year; none of the frontier LLMs were able to do it earlier in 2025.

I had some test functions where minimization could optionally be used, but I wanted one where minimization was needed, like the Ackermann function. Most of the frontier models struggled with this, although I may have been prompting incorrectly. Then again, if I had been prompting totally correctly, I probably could have gotten what I got out of a frontier LLM in early 2025 and before.

Incidentally the test function that tells you if a number is prime in Emacs Lisp with primitive recursion is

(defalias 'prime (c (c (c (r 's (c 'z (p 1))) (p 1) 'z) (c (r (p 1) (c 's (p 2))) (c (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (p 1) (p 2))) (p 2) (p 1)) (c (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (p 2) (p 1))) (p 2) (p 1)))) (c (c (r 'z (c (r (p 1) (c 's (p 2))) (c (c (r 'z (c (r (p 1) (c 's (p 2))) (p 2) (p 3))) (c (c (c (r 's (c 'z (p 1))) (p 1) 'z) (c (r (p 1) (c 's (p 2))) (c (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (p 1) (p 2))) (p 2) (p 1)) (c (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (p 2) (p 1))) (p 2) (p 1)))) (c (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (c (r 'z (c (r (p 1) (c 's (p 2))) (p 2) (p 3))) (p 2) (c (r 'z (c (r (p 1) (c 's (p 2))) (p 2) (c (c (r 's (c 'z (p 1))) (p 1) 'z) (c (r 'z (c (r 'z (c (r (p 1) (c 's (p 2))) (p 2) (p 3))) (c 's (p 2)) (c (c (r 's (c 'z (p 1))) (p 1) 'z) (c (c (c (r 's (c 'z (p 1))) (p 1) 'z) (c (r (p 1) (c 's (p 2))) (c (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (p 1) (p 2))) (p 2) (p 1)) (c (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (c (r (p 1) (c (c (r 'z (p 1)) (p 1) 'z) (p 2))) (p 2) (p 1))) (p 2) (p 1)))) (c 's (p 2)) (p 3))))) (c 's (p 1)) (p 3))))) (p 1) (p 2))) (p 1)) (p 1) (p 2)) (c 'z (p 1))) (c (c (r 'z (c (c 's 'z) (p 1))) (p 1) 'z) (p 1))) (p 3) (c 's (p 1))) (p 2))) (p 1) (p 1)) (p 1)) (c 's (c 's 'z))))

101008 1 day ago

I'm working on a Math product as a side project, and AI is really bad at writing Lean, too.

z3ratul163071 1 day ago

Even the AI gets lost in the parentheses.

TMWNN 1 day ago

I gave Copilot the other day my Elisp code, and it asked if I wanted improvements. Upon my approval, it immediately produced a revision that added two new, useful features and worked out of the box. Very impressive.

nromiun 1 day ago

> I'd blow $10-$20 in a handful of minutes with not much to show for it but sort of OK lisp code that I ended up rewriting.

Damn. And here I have a Gemini Pro subscription sitting unused for a year now.

TacticalCoder 1 day ago

> I wonder what adaptations will be necessary to make AIs work better on Lisp.

Some are going to nitpick that Clojure isn't as lispy as, say, Common Lisp but I did experiment with Claude Code CLI and my paid Anthropic subscription (Sonnet 4.6 mostly) and Clojure.

It is okay-ish. I got it to write a topological sort, and pure (no side effects) functions taking in and returning not-totally-trivial data structures (maps in maps with sets and counters, etc.). But apparently it's got problems with...

... drumroll ...

The number of parentheses. It's so bad that the author of figwheel (a successful ClojureScript project) is working on a Clojure MCP that fixes parens in Clojure code spouted by AI (well, the project does more than that, but the description literally says it's "designed to handle Clojure parentheses reliably").

You can't make that up: there's literally an issue with the number of closing parens.

Now... I don't think giving an AI access to a Lisp REPL and telling it: "Do this by bumping on the guardrails left and right until something is working" is the way to go (yet?) for Clojure code.

I'm passing it a codebase (not too big, so no context-size issues) and I know what I want: I tell it "Write a function which takes this data structure in and that other parameter; the function must do xxx; the function must return the same data structure out". Before that, I told it to also implement tests (relatively easy, since they're pure functions) for each function it writes, and to run the tests after each function it implements or modifies.

And it's doing okay.

  • tasty_freeze 1 day ago

    Sometimes LLMs astonish me with the code they can write. Other times I have to laugh or cry.

    As an example, I asked claude 3.5 back when that was the latest to indent all the code in my file by four more spaces. The file was about 700 lines long. I got a busy spinner for two minutes then it said, "OK, first 50 lines done, now I'll do the rest" and got another busy spinner and it said, "this is taking too long. I'm going to write a program to do it", which of course it had no problem doing. The point is that it is superhuman at some things and completely brain-dead about others, and counting parens is one of those things I wouldn't expect it to be good at.
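    The script route works because indenting is a pure text transformation. A minimal sketch of that kind of throwaway helper (the `indent` function here is hypothetical, not the script Claude actually wrote):

```python
def indent(text: str, spaces: int = 4) -> str:
    """Indent every non-empty line of text by `spaces` extra spaces."""
    pad = " " * spaces
    return "\n".join(pad + line if line else line
                     for line in text.split("\n"))

# A two-line snippet gains a uniform four-space indent:
print(indent("def f():\n    return 1"))
```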

    • nextos 1 day ago

      I think LLMs are great at compression and information retrieval, but poor at reasoning. They seem to work well with popular languages like Python because they have been trained with a massive amount of real code. As demonstrated by several publications, on niche languages their performance is quite variable.

      • smackeyacky 1 day ago

        I used to find it better to shortcut the AI by asking it to write python to do a task. Claude 4.6 seems to do this without prompting.

        Edit: working on a lot of legacy code that needs boring refactoring, which Claude is great at.

    • lagniappe 1 day ago

      That's you at the time not knowing LLM fundamentals with regards to context management.

      • tasty_freeze 1 day ago

        That was me at the time kicking the tires to understand what it was good at or not. If I actually wanted to indent a file by four spaces it would take me less time in my editor than to prompt the LLM to do it, even if the LLM had been capable of it.

  • surround 1 day ago

    I think you're right. Try asking GPT-5 this:

    > Are the parentheses in ((((()))))) balanced?

    There was a thread about this the other day [1]. It's the same issue as "count the r's in strawberry." Tokenization makes it hard to count characters. If you put that string into OpenAI's tokenizer [2], this is how they are grouped:

    Token 1: ((((

    Token 2: ()))

    Token 3: )))

    Which of course isn't at all how our minds would group them together in order to keep track of them.

    [1] https://news.ycombinator.com/item?id=47615876 [2] https://platform.openai.com/tokenizer

    • frwrfwrfeefwf 1 day ago

      does the ai performance drop if it uses letters for tokens rather than tokens for tokens?

      • surround 1 day ago

        Try asking an LLM a question like "H o w T o P r o g r a m I n R u s t ?" - each letter, separated by spaces, will be its own token, and the model will understand just fine. The issue is that computational cost scales quadratically with the number of tokens, so processing "h e l l o" is much more expensive than "hello". "hello" has meaning, "h" has no meaning by itself. The model has to waste a lot of computation forming words from the letters.

        Our brains also process text entire words at a time, not letter-by-letter. The difference is that our brains are much more flexible than a tokenizer, and we can easily switch to letter-by-letter reading when needed, such as when we encounter an unfamiliar word.

    • otterley 1 day ago

      Don’t ask the LLM to do that directly: ask it to write a program to answer the question, then have it run the program. It works much better that way.
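      The program itself can be tiny. A minimal sketch (assuming plain Python and that only round parens matter), a simple depth counter that also catches a closer arriving before its opener:

```python
def balanced(s: str) -> bool:
    """True iff every ')' has a matching '(' and no parens are left over."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closer arrived before its opener
                return False
    return depth == 0

print(balanced("((((())))))"))  # the string from the comment above -> False
```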

      • surround 1 day ago

        But for Lisp, a more complex solution is needed. It's easy for a human Lisp programmer to keep track of which closing parenthesis corresponds to which opening parenthesis, because the editor highlights paren pairs as they are typed. How can we give an LLM that kind of feedback as it generates code?
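        One possibility (a sketch of a hypothetical checker, not any existing tool's API) is to hand the model the same pairing information an editor computes: the exact offsets of any unmatched parens, which a harness or MCP tool could return after each generation:

```python
def unmatched_parens(code: str):
    """Locate unmatched parens in code.

    A stack of offsets stands in for an editor's paren matcher: each ')'
    pops its partner; whatever remains is reported by character offset.
    Returns (unclosed_open_offsets, extra_close_offsets).
    """
    opens, extra_closers = [], []
    for i, ch in enumerate(code):
        if ch == "(":
            opens.append(i)
        elif ch == ")":
            if opens:
                opens.pop()
            else:
                extra_closers.append(i)
    return opens, extra_closers

print(unmatched_parens("(defun f (x) (+ x 1"))  # -> ([0, 13], [])
```

Feeding those offsets back ("the parens opened at offsets 0 and 13 are never closed") gives the model something far more actionable than a bare "unbalanced" verdict.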

        • otterley 21 hours ago

          That's a different question than the one you asked. Are you saying LLMs are generating invalid LISP due to paren mismatching?

      • xigoi 1 day ago

        If the LLM is intelligent, why can’t it figure out on its own that it needs to write a program?

        • sph 1 day ago

          The answer is self-evident.

    • ksaj 1 day ago

      This is mostly because people wrongly assume that LLMs can count things. Just because it looks like they can doesn't mean they do.

      Try to get your favourite LLM to read the time from a clock face. It'll fail ridiculously most of the time, and come up with all kinds of wonky reasons for the failures.

      It can code things that it's seen the logic for before. That's not the same as counting. That's outputting what it's previously seen as proper code (and even then it often fails, probably 'cos there's a lot of crap code out there).

  • whartung 1 day ago

    I had that issue with the AI doing some CL dabbling.

    Things were, on the whole, fine, save for the occasional rogue (or not) parenthesis.

    The AI would just go off the rails trying to solve the problem. I told it that if it ever encountered the problem to let me know and not try to fix it, I’d do it.

  • mark_l_watson 1 day ago

    I am lazy: when an LLM messes up parenthesis when working with any Lisp language I just quickly fix the mismatch myself rather than try to fix the tooling.

bigstrat2003 1 day ago

> I'd blow $10-$20 in a handful of minutes with not much to show for it but sort of OK lisp code that I ended up rewriting.

That's what you get with every language. So, not much to really be disappointed by in terms of Lisp performance.

shevy-java 1 day ago

AI probably saw how many (((( are to be used and said "nope, not going there".

themafia 1 day ago

> There are reasons other than a lack of training data that makes lisp particularly AI resistant.

It's tough to steal what doesn't exist.

> but AI can write hundreds of lines in one go so that it just makes sense for the AI to use a language that doesn't use the REPL. It is orders of magnitude easier and cheaper to write in high-internet-volume languages like Go and Python

Python doesn't have a REPL?

  • Jach 1 day ago

    > Python doesn't have a REPL?

    Not really in the Lisp sense. If you consider how people typically develop and modify Python code (edit file -> run from beginning -> observe effects -> start over) and how people typically develop Lisp code (rarely do "start over" and "run from beginning" happen) it becomes obvious. Most Python development resembles Go or C++, you just get to skip the explicit "compile" step and go straight to "run". The Python "REPL" is nice for little snippets and little bits of interactive modification but the experience compared to Lisp isn't the same (and I think the experience is actually better/closer to Lisp in Java, with debug mode and JRebel).

    • mark_l_watson 1 day ago

      I agree with you, but a Python REPL in Emacs (using the ancient Python Emacs support) is very nice: initially load code from a buffer, then just reload functions as you edit them. I find it to be a nice dev experience: quick and easy edit/run cycles.

    • themafia 19 hours ago

      > Not really in the Lisp sense.

      How does the way humans traditionally use the REPL impact an AI's ability to use one language over another?

      > Most Python development resembles Go or C++

      How do you know this? Or what source are you using to arrive at this conclusion?

      Again, super curious, if outright copyright theft _isn't_ the answer, why can't AI write lisp, then?

      • Jach 13 hours ago

        > why can't AI write lisp, then?

        Contrary to the blog author I don't really believe this.

        > How does traditional human use the REPL impact AIs ability to use one language over the other?

        I don't think it does very much other than it's not the normal workflow for people vibe coding. Lisp doesn't require you to develop with an interactive mindset, but it enables it and it's very enjoyable if nothing else. Vibe coding workflow is prompt -> plan / code generation / edits -> maybe create and run some tests or run program from start -> repeat (sometimes the AI is in a loop until it hits some threshold, or has subagents, or is off on its own for long periods, and other complications). The layer of interactivity is with the AI tool, not with the program itself. You can use this workflow with Lisp just fine. Sometimes an MCP tool might offer some amount of interactivity to the AI at least, e.g. I've never tried to use an AI to do Blender work but I imagine there's an MCP that lets it do stuff in the running instance without having to constantly relaunch Blender. Blender has a Python API so the AI with no eyes might even be decent at some things nevertheless.

        Others besides the blog author report using something like https://github.com/cl-ai-project/cl-mcp that lets the AI develop more bottom-up style with the REPL, perhaps even configurable to use a shared REPL with the human, where programs evolve bit by bit without restarting. I trust their report that it works, though I don't really have a desire to try it. If an AI barfs a bunch of changes across several Lisp files and I want to try them out without restarting, I can just reload the system (which reloads the necessary files) on my own separately. I also don't think representation in the training data is that important, at least to frontier models, because they express ever more general intelligence which lets them do more with less. This is further suggested by them being decently good at things like TLA+ and Lean proofs, which don't exactly have a lot of data either.

        > How do you know this?

        I've at times been a Java developer, a C++ developer, a Python developer, a PHP developer, a Lisp developer, and others. I read about and observe how people develop their programs and how commercial tooling advertises itself. Hot reloading tooling technically exists in a lot of places and gets you some of the way towards what Lisp provides out of the box but it's not used by the majority and usually comes with a lot of asterisks for where it will fail. I'd say one of the biggest differences with a lot of Python code vs. other langs is the prevalence of jupyter notebooks, but that's more similar to literate programming styles than Lisp styles, and unlike Lisp or literate programming (though I'm sure there's at least one exception) jupyter notebooks are typically used for tiny few-hundred-lines stuff at most, not large projects.

        As an example of what's out of the box in Lisp: compile is a function you can call at runtime, not a separate tool; inspect is another function you can call at runtime that lets you view and modify data; and the mouthful update-instance-for-redefined-class method is part of the standard, so you have optional custom control over class redefinitions modifying existing objects, rather than just "invalidating" them and/or forcing them to keep using older copies of methods forever, or filling new fields with default values (though this is usually a fine default), or like default Java debug mode in eclipse/intellij saying "woops, we can't reload this change, you have to restart!".

        I like to advertise JRebel because it doesn't have as many limitations and goes very far indeed by working with the whole Java ecosystem. For example, XML files that under normal development are used to configure and initialize objects at program start time, changes to which require a restart, are monitored by JRebel and when changed trigger reinitialization without having to restart anything. That's the Lisp way, though in Lisp you'd have to set up your own XML file watchers for something like that. (Djula is a Django-inspired example for web template files; it does reloading by just checking file modification times. One could use something fancier on Linux like inotifywait. Though some Lisp developers just write their HTML with s-expressions, and so changes to a page are just recompiling a function like normal development, rather than saving a separate mostly-HTML template file. Lisp gives you many options of how you prefer to develop and deploy changes to a website. I like to ship a binary.)

bitwize 1 day ago

"Expressive languages" like Lisp are for weak human minds.

Now is the time to switch to a popular language and let the machines wrangle it for you. With more training data available, you'll be far more productive in JavaScript than you ever were in Lisp.

nineteen999 1 day ago

Oh. My. God. Will the LISP community ever stop MOANING? It is the consistently most depressing, woe-is-me wailing in the entire IT segment.

You guys are depressing.