Agreed, Gleam as a language has very few, very general syntactic constructs compared to most procedural languages. There's enough of a signal in the data to be able to answer queries about the language; but when writing, LLMs universally trip over themselves. The signal from other nearby languages is too strong and it ends up trying to do early returns, if statements, even loops on occasion.
I usually use deepseek (gratis) for code, and when using defun and let it usually lacks one (or more) closing parenthesis. So the way to mark the end is not well understood by this LLM, or perhaps it's that the height of the AST is usually bigger than in Python.
I think a translation layer to a lower-density language might be a good solution; e.g. Iverson's divisible-by-11 check, 0=11|-/d, can be done verbosely in Python with something like the sketch below.
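A minimal sketch, with the digit handling spelled out verbosely (the helper name is just for illustration):

    def is_divisible_by_11(n: int) -> bool:
        # -/d in J/APL reduces right to left, giving the alternating sum d0 - d1 + d2 - ...;
        # 11| takes the remainder mod 11, and 0= tests it against zero.
        digits = [int(c) for c in str(abs(n))]
        alternating_sum = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits))
        return alternating_sum % 11 == 0

    print(is_divisible_by_11(121))  # True
    print(is_divisible_by_11(123))  # False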
I always thought APL was written in the wrong direction. It writes like a concatenative language that's backwards--you tack things onto the front. NumPy fixes it by making the verbs all dotted function calls, effectively mirroring the order. e.g. in APL you write "10 10 ⍴ ⍳100" but in NumPy you write "np.arange(1, 101).reshape(10, 10)". Even if you don't know either language, you can tell that the APL version is the reverse of the Python version.
My hot take is that Iverson was simply wrong about this. He couldn't be expected to predict that code completion, and later LLMs, would both want later tokens to depend on earlier tokens. SQL messed it up, too, with "from" not coming first. If APL were developed today, I think left-to-right evaluation would have been preferred. The popularity of dotted function calls in various languages makes it reasonably clear that people like tacking things onto the end and seeing a "pipeline" form from left to right.
It’s not very different, but the numpy way is not the math way: when you talk math, you say “the exponent of the absolute value of the cosine of x”, like in APL, not “take x, get its cosine, then take the absolute value, and then get its exponent”.
In fact, for many things, you do things the math way in numpy as well. But for other things, the dot/object-oriented way is preferred.
APL is just consistent, terse, mathematical notation.
With complicated formulas, it often makes more sense and can give more guidance by first talking about the last operations to be applied. This seems to match the LLM structure, by starting by describing what we want, and then filling in the more specialized holes as we get to them. "Top-down" design vs "bottom-up".
Your insight about APL being reverse-concatenative is very cool.
Once you get used to it, traditional ways look tedious and annoying to me. I think the power is in 'once you get used to it'. That will keep out most people. See Python LLM implementations vs k ones as a novice and you will see verbose unreadable stuff vs line noise. When you learn the math, you see verbose code where the verbosity adds nothing at all vs exactly what you would write if you could.
Tedious and annoying for one-off commands maybe. It's like regex. Pretty compelling if you're writing a one-off pattern, you get immediate feedback and then you throw it away.
But it's not a good idea to use regexes in code that you're going to use long term. It's justifiable for simple regexes, and many people go against this advice, but really for anything remotely complex regexes become totally unreadable and extremely bug prone. Complex regexes are a huge code smell and array languages are pretty much one enormous regex.
Yeah, never had that issue; maybe it's because I have been doing regexes (and APL, which people also call write-only) for 30 years: it is not unreadable, nor throwaway. I find it far more readable than the alternatives; reading pages of elaborate 'pseudocode' is more bothersome/time-consuming than a one-liner to me.
Chiming in - I've found the Pattern type in Unison[0] to be very nice to use. When you're just using the built-in patterns, it is similar to verbose regex. The real power of them is that it's easy to define, name, and re-use sub-patterns. I think it's similar to parser combinators in this way, like nom from rust[1].
It might be a question of familiarity rather than objective usability. I'm writing this comment in Latin letters rather than Cyrillic or Hebrew because I find Latin letters much more usable than Cyrillic or Hebrew. But that's because I've been surrounded by Latin letters since I was born, and have only occasionally encountered Cyrillic or Hebrew.
I think it's obvious that Cyrillic isn't any less usable than the Latin alphabet in any objective sense. In fact, I'm using English orthography, which has all kinds of unnecessary usability problems which aren't present in any Cyrillic orthography that I know of. But familiarity is a much stronger factor; even today I can barely sound out words in Russian or Ukrainian, while English text printed in Latin letters is clearer to me than speech.
On theoretical grounds, I suspect that the APL syntax Gabi is calling RL-NOP is less usable for left-to-right readers than at least LR-NOP and maybe even conventional Please Brutally Execute My Dear Aunt Sally operator precedence. But familiarity is such a strong force that this hypothesis is very difficult to test.
The theoretical grounds are that, when reading left to right, a reader must maintain a stack of pending operators and values in their mind, unless they are saved by parentheses. (The Iverson quote disagrees with this, but I think Iverson was wrong.) Maintaining mental stacks is difficult and error-prone; this is the reason for the Tim Peters proverb, "Flat is better than nested."
I suspect that operator precedence might be superior for two reasons:
1. It more often avoids parentheses, which are extra symbols to recognize and correctly pair up in your mind.
2. The meaning of high-precedence subexpressions like `x×b` are almost context-independent—although an exponentiation operator or something like a C struct field selector could still follow `b` and change its meaning, following multiplications, divisions, additions, subtractions, or comparisons will not, and preceding additions, subtractions, or comparisons also will not. I conjecture that this facilitates subconscious pattern recognition.
But the familiarity factor enormously outweighs these theoretical considerations for me.
> " I suspect that the APL syntax ... is less usable for left-to-right readers"
On the contrary, I find it much more usable for left-to-right readers, because it allows a "top-down" reading of the expressions, instead of a "bottom-up" reading.
When trying to understand an unfamiliar program, for debugging or maintenance, you normally do not want to waste time by reading completely all expressions, which provide irrelevant computation details.
You typically search where some variables are modified and how and why. For this it is frequently enough to look only at the last operations that have been performed before storing a modified value into a variable.
With the Iverson notation, the last operations are always conveniently grouped at the left side of a text line. Thus you read from left to right only as much as necessary to find what you need, then you can skip the rest of the line.
With the school notation, the required information is not grouped at one end of the line, so reading becomes slower.
The opposite of the Iverson notation, which was used in some stack-oriented languages, also groups the information, but in a way that is less usable for left-to-right users.
From natural languages, left-to-right readers expect that a sentence starts with its topic (at the left side), i.e. the most important part, e.g. the last assignment, like in the Iverson notation, instead of ending with its topic, like in the opposite notation.
> "a reader must maintain a stack of pending operators and values in their mind"
I believe that few readers, if any, do this.
The normal case when reading is that you do not want to reproduce in your mind what the computer does, but only to find the information flows between program variables. For this, it is enough to read partial expressions, as explained above.
In the very rare case when you wanted to make a mental calculation identical to that of the computer, you would normally read the expression from right to left.
When writing, the Iverson notation is usually more convenient than the school notation, while writing normally, from left to right. The reason is that for most computations the natural way to find the expression that must be computed is to go backwards, from the desired result towards the available data.
The popularity of a convention has no relationship with its usability.
Everybody learns in school the traditional convention for writing mathematical expressions.
It appears that for most people it is difficult or impossible to unlearn later such a convention, even if they encounter a superior convention.
On the other hand, I am among the few for whom this is not true, so when I first read the book "A Programming Language" by K. Iverson, on which the later APL language and its successors have been based, I immediately recognized that the Iverson convention is much better than the school convention, and I have no trouble in using it.
When reading a program written with the Iverson convention, you still read from left to right, but you typically do not read until the end of the line, but only as much of the left part as necessary to understand the purpose of the line. (Because the right operand of any operator is everything that follows it until the end of the line, and the details of that computation may be irrelevant. With school notation, when searching where a variable has been modified and how, you must jump between the beginning of the line and the end of the line, to find the last operations that have generated the stored value, when reading and understanding the complete expression would be a waste of time.)
The original motivation of the Iverson convention, which remains very important, was to give a useful meaning for a sequence of identical non-commutative operators, e.g. subtraction and division. This is particularly desirable when the operators are used in vector reductions.
(With school notation, a0 - a1 - a2 - ... - an is seldom a useful expression, but with the Iverson convention it becomes alternate sum, which is needed very frequently. Similarly for division.)
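A quick way to see the difference in Python, just comparing the two fold directions (an illustrative sketch):

    from functools import reduce

    a = [8, 4, 2, 1]
    # School notation folds left to right: ((8-4)-2)-1
    print(reduce(lambda acc, y: acc - y, a))            # 1
    # Iverson convention folds right to left: 8-(4-(2-1)) = 8-4+2-1, the alternating sum
    print(reduce(lambda acc, y: y - acc, reversed(a)))  # 5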
Even if you aren’t used to it, you’d be able to reason yourself through it, knowing how the language works, and would be aware that you need to reason through it. And it isn’t that LLMs don’t know that the language works that way, if you ask them about it. It also isn’t that they aren’t able to reason through it, if you ask them to do so. It’s that they lack awareness when to switch modes, lack the ability to have their knowledge interrupt their “intuitive” output and instead start reasoning about how to proceed.
There is not enough q/kdb full source code "out there" that would have made it into the LLM training data. It tends to be used in secretive environments and can be very site specific in convention. I bet a purpose built small fine tune with real + synthetic data would be enough to get something generating better Q code.
Interesting. Upshot: right-to-left evaluation means you generally must start at the end, or at least hold an expression in working memory, and LLMs are not so good at this.
I wonder if diffusion models would be better at this; most start out as sequential token generators and then get finetuned.
My curmudgeonly genius Q/Kdb+ programmer of a co-worker, who claims to be immune to the impact of LLMs, is going to be fucking pissed when he hears about Qython.
:D Well I'm still building Qython, but if your colleague has some example code snippets they think particularly difficult to translate, I'd love to take on the challenge!
most mainstream models are decoders vs. encoder-decoders, diffusers, etc. and lack reversible causal reasoning, which of course can be counter-intuitive since it doesn’t feel that way when models can regenerate prior content
some hacks for time / position/ space flipping the models:
- test spate of diffusion models emerging. pro is faster, con is smaller context, ymmv is if trained on that language &/or context large enough to ICL lang booster info
- exploit known LTL tricks that may work; there's a bunch of these
- e.g., tell model to gen drafts in some sort of RPN variant of lang, if tests tell it to simulate creating such a fork of this and then gen clean standard form at end
- have it be explicit about leapfrogging recall and reasoning, eg be excessively verbose with comments can regex strip later
- have it build a stack / combo of the RPN & COT & bootstrapping its own ICL
- exploit causal markers - think tags that can splinter time - this can really boost any of the above methods - eg give each instance of things disjoint time tags, A1 vs K37 for numbered instances of things that share a given space - like a time GUID
- use orthogonal groups of such tags to splinter time and space recall and reasoning in model, to include seemingly naive things like pass 1 etc
- our recent arXiv paper on HDRAM / hypertokens pushes causal markers to classic-quantum holographic extreme and was built for this, next version will be more accessible
- the motivators are simple - models fork on prefix-free modulo embedding noise, so the more you make prefix-free, the better the performance, there’s some massive caveats on how to do this perfectly which is exactly our precise work - think 2x to 10x gain on model and similar on reasoning, again ymmv as we update preprint, post second paper that makes baseline better, prep git release etc to make it tons easier to get better recall and exploit same to get better reasoning by making it possible for any model to do the equivalent of arbitrary RPN
- our future state is exactly this a prompt compiler for exactly this use case - explainable time-independent computation in any model
don't plan on it staying that way. I used to toss wads of my own forth-like language into LLMs to see what kinds of horrible failure modes the latest model would have in parsing and generating such code.
at first they were hilariously bad, then just bad, then kind of okay, and now anthropic's claude4opus reads and writes it just fine.
it varied. with the earlier models, generally more, trying to see if some apparition of mechanical understanding would eventually click into place.
IIRC, none of the gpt3 models did well with forth-like syntax. gpt4 generally did okay with it but could still get itself confused. claude4opus doesn't seem to have any trouble with it at all, and is happy to pick up the structures contextually, without explicit documentation of any sort.
another of my languages uses some parse transforming 'syntactic operators' that earlier models could never quite fully 'get', even with explanation. likely because at least one of them has no similar operator in popular languages. claude4opus, however, seems to infer them decently enough, and a single transform example is sufficient for it to generalize that understanding to the rest of the code it sees.
so far, claude has proved to be quite an impressive set of weights.
That is excellent, I am also using it to prototype designing languages and 3.7 and 4.0 models are really quite good for this. I haven't found substantial academic research in using LLMs for making prototype language compilers.
"Claude is aware of that, but it struggled to write correct code based on those rules"
It's actually not, and unless they in some way run a rule engine on top of their LLM SaaS stuff, it seems far-fetched to believe it adheres to rule sets in any way.
Local models confuse Python, Elixir, PHP and Bash when I've tried to use them for coding. They seem more stable for JS, but sometimes they slip out of that too.
Seems pretty contrived and desperate to invent transpilers from quasi-Python to other languages to try and find a software development use for LLM SaaS. Warnings about Lisp macros and other code rewrite tools ought to apply here as well. Plus, of course, the loss of 'notation as a tool of thought'.
I don't know what counts as a major model. Relevant to this, I've dabbled with Gemma, Qwen, Mistral, Llama, Granite and Phi models, mostly 3-14b varieties but also some larger ones on CPU on a machine that has 64 GB RAM.
I think the issue there is those smaller versions of those models. I regularly use Gemma3 and Qwen3 for programming without issue but in the 27b-32b range. Going smaller than that generally yields garbage.
no, it wasn't your choice how you were taught to read and write something like this:
1|2*3>>4+5
in C and k, this expression should hopefully evaluate to 1, but this is just a lucky coincidence: reading and writing these two expressions are wildly different in complexity in those two languages. if you're not sure what i mean, ask your local LLM to explain why that is, but make sure you're sitting down. what you'll discover is that what you think you "simply chose to do" is not what you're actually doing.
while it is true that you can write code anyway you deem fit, i'm afraid you're a bit confused about the actual direction you're forced to think you chose to write it.
but once you're there, it suddenly gets a lot less complicated, and - miraculously - doesn't cancel out or mess up your previous beliefs and habits.
k/q, of apl heritage, are beautiful - first and foremost because they're simple to write and simple to read.
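for what it's worth, Python shares C's grouping here, so you can check the precedence-based reading directly (a small sketch):

    # C (and Python) precedence: * binds tighter than +, + tighter than >>, >> tighter than |,
    # so the expression groups as 1 | ((2*3) >> (4+5))
    print(1 | ((2 * 3) >> (4 + 5)))   # 6 >> 9 is 0, so the result is 1
    print(1 | 2 * 3 >> 4 + 5)         # same value, with the grouping supplied by precedence
    # k has no precedence table at all: each operator takes everything to its right
    # as its right operand, so the same line is grouped very differently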
It's not just because of the right-to-left evaluation. If the difference were that simple, most humans, let alone LLMs, wouldn't struggle with picking up q when they come from the common languages.
Usually when someone solves problems with q, they don't use the way one would for Python/Java/C/C++/C#/etc.
This is probably a poor example, but if I asked someone to write a function to create an nxn identity matrix for a given number, the non-q solution would probably involve some kind of nested loop that checks if i==j and assigns 1, otherwise assigns 0 (both mindsets are sketched in Python below).
In q you'd still check equivalence, but instead of looping, you generate a list of numbers as long as the given dimension and then compare each item of the list to itself:
{x=/:x:til x}3
An LLM that's been so heavily trained on an imperative style will likely struggle to solve similar (and often more complex) problems in a standard q manner.
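Roughly, the two mindsets in Python (a sketch; the NumPy line mirrors the q comparison idea rather than translating it literally):

    import numpy as np

    n = 3
    # Imperative mindset: nested loops, check i == j
    identity_loops = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    # Array mindset, closer to {x=/:x:til x}: build 0..n-1 once,
    # then compare the whole vector against each of its own elements
    x = np.arange(n)
    identity_array = (x[:, None] == x).astype(int)

    print(identity_loops)
    print(identity_array)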
A human can deal with right-to-left evaluation by moving the cursor around to write in that direction. An LLM can’t do that on its own. A human given an editor that can only append would struggle too.
Might help. You could also allow it to output edits instead of just a sequence. Probably have to train it on edits to make that work well, and the training data might be tricky to obtain.
I was curious because there is a much larger training corpus for Lisps, so if the problem really is one of training data rather ordering then this would be a way of showing that.
this thread made me revisit some past conversations with people like atw, nsl and aab with regard to possible ways to expose humans to the way rivers flow in k/q/apl land. the choices are limited, and decision takes some agony:
a) if you don't want your audience to close the tab right away, you'd say "a k expression is written, read and evaluated strictly right to left unless the precedence is explicitly overridden by parens, and this works better than you think, no worries, you'll come around. by the way, parens are evil, avoid them if you can".
b) if your intent is to retain a sharper crowd who went to yale or something, you'd say "a k expression is to be understood right of left", and throw them a freebie in form of a prompt for their local LLM in order to get lit. the magic sequence is just "f g h x leibniz".
for my own selfish reasons, i always chose the former, and it seems to perform better than the latter, proof:
still, neither approach is anywhere near the chances of successfuly explaining which way to write python code to a 5yo kid, especially its precedence rules, which are much more intuitive (lol).
to explain the same thing to an LLM is not much different, really. all you need to do is to depress your 0yo kid with an obscene amount of _quality_ python code, of which there is no shortage. obviously, the more python code is fed to LLMs, the more humans will paste more LLM-generated python code, to be fed back to LLMs, ad lemniscate.
(and don't mind the future tense, we are already there)
============
so this is why LLMs can't write k/q/apl. first, they haven't seen enough of it. second, they are helpless to understand the meaning of a quote which was once chosen to helm a book known as SICP, not to mention countless human counterparts who came across it earlier, to the same effect:
"I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out it was an awful lot of fun. Of course the paying customers got shafted every now and then and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful error-free perfect use of these machines. I don’t think we are. I think we're responsible for stretching them setting them off in new directions and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all I hope we don’t become missionaries. Don't feel as if you're Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don’t feel as if the key to successful computing is only in your hands. What's in your hands I think and hope is intelligence: the ability to see the machine as more than when you were first led up to it that you can make it more."
LLMs are already solving this problem using the "thinking" phase. They don't just one-shot an attempt at the output. The left-to-right narrative thinking process edits multiple drafts of the code they eventually output.
Same reason the same models don't fundamentally understand all languages: they're not trained to. Frankly, the design changes needed to get this to work in training are minimal, but this isn't the way English works, so expect most of the corporate LLMs to struggle, because that's where the interest and money is.
Give it time until we have truly global, multilingual models for superior context awareness.
A byte tokenized model is naturally 100% multi-lingual in all languages in its data set. There just isn't a lot of reason for teams to spend the extra training time to build that sort of model.
Languages that are difficult for LLMs to read & write are also difficult for the general public. These languages have always had poor uptake and never reach critical mass, or are eventually replaced by better languages.
Language designers would be smart to recognize this fact and favor making their languages more LLM friendly. This should also make them more human friendly.
I actually think Ruby on Rails is incredibly difficult for LLMs to write because of how many implicit "global state" things occur. I'm always surprised how productive people are with it, but people are productive with it for sure.
That's because global state is very convenient early on. Everything is in one place and accessible. It's convenient to prototype things this way. This is very similar to doing scientific research (and why often research code is an ugly boondoggle).
Most techies (generalizing here) start with a reasonably clear spec that needs to be implemented and they can focus on how to architect the code.
Research - whether science, finance or design - is much more iterative and freeform. Your objective is often very fuzzy. You might have a vague idea what you want, but having to think about code structure is annoying and orthogonal to your actual purpose.
This is why languages like Ruby work well for certain purposes. They allow the person to prototype extremely rapidly and iterate on the idea. It will eventually reach a breaking point where global state starts being an impediment, but an experienced dev will have started refactoring stuff earlier than that as various parts of the implementation becomes stable.
I don't find this to be true. There are languages that are difficult to wrap your head around initially, but that turn out to be delightfully productive with a few adjustments to the mental model. Adjustments that LLMs don't have the training data for.
That says nothing about the language at all, actually. Just that it's small and easily confused for something more idiomatic to a newbie.
> Adjustments that LLMs don't have the training data for.
Methinks if you want job security in a post-LLM-zero-shot-app-generator world, get into Lisp or Haskell; people that know only Node+React from YouTube learn-2-code tutorials are going to be slaughtered.
I just had an idea: an app/GUI/backend framework for Lisp or Haskell (with an S-expression lib) where everything is structurally inverted so it must be manually run through foldr - behold: an LLM-resistant (if not LLM-proof?) dev environment!
This argument in favour of mediocrity and catering to the lowest common denominator is one of the key reasons why I dislike people who want to shove LLMs into everything (including art).
There is something deep in this observation. When I reflect on how I write code, sometimes it’s backwards. Sometimes I start with the data and work back through to the outer functions, unnesting as I go. Sometimes I start with the final return and work back to the inputs. I notice sometimes LLMs should work this way, but can’t. So they end up rewriting from the start.
Makes me wonder if future LLMs will be composing nonlinear things and be able to work in non-token-order spaces temporarily, or will have a way to map their output back to linear token order. I know nonlinear thinking is common while writing code though. Current LLMs might be hiding a deficit by having a large and perfect context window.
Yes, there are already diffusion language models, which start with paragraphs of gibberish and evolve them into a refined response as a whole unit.
Right, but that smoothly(ish) resolves all at the same time. That might be sufficient, but it isn't actually replicating the thought process described above. That non-linear thinking is different than diffuse thinking. Resolving in a web around a foundation seems like it would be useful for coding (and other structured thinking, in general).
With enough resolution and appropriately chosen transformation steps, it is equivalent. E.g., the diffusion could focus on one region and then later focus on another, and it's allowed to undo the effort it did in one region. Nothing architecturally prohibits that solution style from emerging.
The choice of transformation steps to facilitate this specific diffuse approach seems like a non-trivial problem. It doesn't follow such an organic solution would emerge at all, now, does it?
The pattern ", now, " is indicative of a sort of patronization I don't normally engage with, but, yes, you're correct.
In some measure of agreeing with you: for other classes of models we know for a fact that there exist problems which can be solved by those architectures and which can't be trained using current techniques. It doesn't feel like a huge stretch that such training-resistant data might exist for diffusion models.
That said, I still see three problems. Notably, the current ancestral chain of inquiry seems to care about the model and not the training process, so the point is moot. Secondarily, in other similar domains (like soft circuits) those organic solutions do seem to emerge, suggesting (but not proving) that the training process _is_ up to par. Lastly, in other related domains, when such a solution doesn't emerge it ordinarily happens because some simpler methodology achieves better results, meaning that even with individual data points suggesting that diffusion solutions don't model that sort of linearity you still need to work a little bit to prove that such an observation actually matters.
The process of developing software involves this kind of non-linear code editing. When you learn to do something (and the same should go for code, even if sometimes people don't get this critical level of instruction), you don't just look at the final result: you watch people construct the result. The process of constructing code involves a temporally linear sequence of operations on a text file, but your cursor is bouncing around as you put in commands that move the cursor through the file. We don't have the same kind of copious training data for it, which suggests that what we really need to do is to train models not on code, but on all of the input that goes into a text editor. (If we concentrate on software developers who are used to doing work entirely in a terminal this can be a bit easier, as we can then just essentially train the model on all of the keystrokes they press.)
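For concreteness, one hypothetical shape for that kind of training record (every name here is invented for illustration, not an existing dataset format):

    from dataclasses import dataclass

    @dataclass
    class EditorEvent:
        timestamp_ms: int   # when the keystroke/command happened
        cursor_line: int    # where the cursor was at that moment
        cursor_col: int
        action: str         # e.g. "insert", "delete", "move"
        text: str           # characters inserted or removed, empty for pure moves

    # A session is then a list[EditorEvent] that can be replayed into the final file,
    # so the model sees the order of construction, not just the finished code.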
I think long term LLMs should directly generate Abstract Syntax Trees. But this is hard now because all the training data is text code.
The training data is text code that can be compiled, though, so the training data can also easily be an Abstract Syntax Tree.
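For instance, Python's own ast module already exposes that structured view of the same training text; a minimal sketch:

    import ast

    tree = ast.parse("def f(x):\n    return x + 1")
    print(ast.dump(tree, indent=2))  # the node structure an AST-level tokenizer would see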
But is anyone actually doing this?
There's a fair amount of experimental work happening trying different parsing and resolution procedures such that the training data reflects an AST and/or predicts nodes in an AST as an in-filling capability.
Do you know if any such experimental work is using a special tokenizer, for example in Lisp a special token for the left or right parenthesis?
It's possible that LLMs build ASTs internally for programming. I have no 1st hand data on this, but it would not surprise me at all.
LLMs don't have memory, so they can't build anything. Insofar as they produce correct results, they have implicit structures corresponding to ASTs built into their networks during training time.
"LLMs don't have memory"
That's interesting. Is there research into adding memory or has it been proven that it provides no pragmatic value over any context it outputs?
> Sometimes I start with the final return and work back to the inputs.
Shouldn't be hard to train a coding LLM to do this too by doubling the training time: train the LLM both forwards and backwards across the training data.
GP is talking about the nonlinear way that software engineers think, reason, and write down code. Simply doing the same thing but backwards provides no benefit.
Another example of this is Claude placing unnecessary imports when writing Python, because it's hedge-importing modules that it suspects it might need later.
Is it hedging or did the training data just have lots of unnecessary imports?
Especially in Python, where it can be hard to tell if something is being imported purely for side effects.
That does happen, but not frequently in the common libraries that are going to be in public training data.
Is there a top 100 package that does something funny on import?
I'd be surprised. That kind of thing was en vogue for a little while in the early 2000s before cooler heads prevailed, but now people will understandably shout at you for changing behavior in someone else's code.
My guess is that nearly all packages that did this sort of thing were left behind in the 2-to-3 migration, which a lot of us used as the excuse for a clean break.
Not sure if that counts, but if you import both matplotlib and OpenCV at once, there is a good chance of a crash due to conflicting PyQt binaries: https://github.com/matplotlib/matplotlib/issues/29139
But I agree that observable side effects are generally pretty rare. And apparently, both libraries are not even in the top 100 packages, depending on how you count. It looks like those spots are all taken by libraries used in uncached, wasteful CI workflows: https://hugovk.github.io/top-pypi-packages/
Oof. That doesn't count in my opinion. The conflict is unfortunate, but it's not because either package is trying to modify other code. That is, the error is a side effect of how loading multiple libs into the interpreter works. In theory, at least, you could fix those bugs without modifying the packages' behavior at all.
But still a bummer, to be sure. It's easy enough for me to say it doesn't count when I haven't been affected by it.
Well “import torch” for example will resolve certain dynamically linked symbols, which must be done first before importing your own .so code that uses libtorch and pybind11. If not you will get a super fun to debug segfault, leaving you staring at gdb backtrace output while you ponder your career choice.
This is buried deep in the PyTorch docs and I don’t have the willpower to go find it right now, sorry.
Heh. A ML library was my sneaking suspicion of where there might be something unexpected. Anything goes for performance and/or to get Nvidia to cooperate.
Dunno if it counts as funny, but the following code only works if you keep the matplotlib import:
What? There’s no way that’s correct. I use PIL exactly like that and don’t have matplotlib in my codebase.
Just try it. IIRC, to do the PIL import correctly you have to import the submodule explicitly (import PIL.Image rather than a bare import PIL). Turns out that matplotlib (and probably lots of other stuff) does that, and then it gets resolved correctly.
Oh right, gotcha. I always do from PIL import Image.
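The underlying mechanism is just how Python binds submodules onto their parent package; a small demo (assuming Pillow is installed):

    import PIL
    print(hasattr(PIL, "Image"))   # False: a bare `import PIL` does not load the Image submodule

    import PIL.Image               # any import of the submodule, by your code or by another
                                   # library such as matplotlib, binds it onto the PIL package
    print(hasattr(PIL, "Image"))   # True: PIL.Image.open(...) now works everywhere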
You already know the answer. Claude is not an intelligent sentient thing.
However, it possibly was RL trained on code tasks and penalized for errors.
Seems like it could easily be training data set size as well.
I'd love to see some quantification of errors in q/kdb+ (or hebrew) vs. languages of similar size that are left-to-right.
>Seems like it could easily be training data set size as well.
I'm convinced that's the case. On any major LLM I can carpet bomb Java/Python boilerplate without issue. For Rust, at least last time I checked, it comes up with non-existent traits, more frequent hallucinations, and a general struggle to use the context effectively. In agent mode it turns into a fist fight with the compiler, often ending in credit-destroying loops.
And don't get me started when using it for Nix...
So not surprised about something with orders of magnitude smaller public corpus.
I realized this too, and it led me to the conclusion that LLMs really can't program. I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM. It turns out that it's extremely verbose, especially in variable names, function names, class names, etc. Actually, it turned out that classes were very redundant. But the real insight was that LLMs are great at naming things, and performing small operations on the little things they named. They're really not good at any logic that they can't copy paste from something they found on the web.
> I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM.
Did your experiment consist of asking an LLM to design a programming language for itself?
Yes. ChatGPT 4 and Claude 3.7. They led me to similar conclusions, but they produced very different syntax, which led me to believe that they were not just regurgitating from a common source.
Great so your experiment just consisted of having an LLM hallucinate
That's not really an experiment is it? You basically just used them to create a hypothesis but you never actually proved anything
They're great at writing text and code so the fact that the other LLM was able to use that syntax to presumably write code that worked (which you had no way of proving since you can't actually run that code) doesn't really mean anything
It would be similar to having it respond in a certain JSON format, they are great at that too. Doesn't really translate to a real world codebase
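> That's not really an experiment is it? You basically just used them to create a hypothesis but you never actually proved anything
The experiment was checking how well another unrelated LLM could write code using the syntax. And then in the reverse direction in new sessions.
> They're great at writing text and code so the fact that the other LLM was able to use that syntax to presumably write code that worked (which you had no way of proving since you can't actually run that code) doesn't really mean anything
Of course I could check the code. I had no compiler for it, but "running" code in one's head without a compiler is something first year students get very good at in their Introduction To C course. And checking how they edit and modify the code.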
This isn't a published study, it was an experiment. And it influenced how I use LLMs for work, for the better. I'd even call that a successful experiment, now that I better understand the strengths and limitations of LLMs in this field.
> And it influenced how I use LLMs for work, for the better
How so?
I let the LLM come up with all the boiler plate classes, functions, modules, etc that it wants. I let it name things. I let it design the API. But what I don't let it do, is design the flow of operations. I come up with a flow chart as a flow of operations, and explain that to the LLM. Almost any if statement is a result of something I specifically mentioned.
Is there a reason you believe the models can accurately predict this sort of thing?
There wasn't, but after taking the syntax that I developed with one model to another model, and having it write some code in that syntax, it did very well. Same in the other direction.
LLMs need all their context within easy reach. An LLM-first (for editing) language still has code comments and docstrings. Identifier names are long, and functions don't really need optional parameters. Strict typing is a must.
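In Python terms the flavor was roughly this (an invented illustration of the verbosity, not the actual syntax the models proposed):

    def compute_total_price_in_cents_including_tax(
            item_prices_in_cents: list[int],
            tax_rate_as_fraction: float) -> int:
        """Sum the item prices, then apply the tax rate.

        Long names and docstrings keep the context within easy reach of the model;
        no optional parameters, strict types throughout.
        """
        subtotal_in_cents = sum(item_prices_in_cents)
        return round(subtotal_in_cents * (1.0 + tax_rate_as_fraction))

    print(compute_total_price_in_cents_including_tax([1000, 250], 0.2))  # 1500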
Is this really a surprise? I'd hazard a guess that the ability to program and beyond that - to create new programming languages - requires more than just probabilistic text prediction. LLMs work for programming languages where they have enough existing corpus to basically ape a programmer having seen similar enough text. A real programmer can take the concepts of one programming language and express them in another, without having to have digested gigabytes of raw text.
There may be emergent abilities that arise in these models purely due to how much information they contain, but I'm unconvinced that their architecture allows them to crystallize actual understanding. E.g. I'm sceptical that there'd be an area in the LLM weights that encodes the logic behind arithmetic and gives rise to the model actually modelling arithmetic as opposed to just probabilistically saying that the text `1+1=` tended to be followed by the letter `2`.
In my experience, claude works well at writing rust, and gemini is terrible. gemini writes rust as if it's a C++ programmer who has spent one day learning the basics of rust.
i tried gemini, openai, copilot, claude on a reasonably big rust project. claude worked well to fix use, clippy, renames, refactorings, ci. i used highest-cost claude with custom context per crate. never was able to get it to write new code well.
for nix, it is a nice template engine to start or search. did not try big nix changes.
Yep. I had similar issues asking Gemini for help with F#, I assume lack of training data is the cause.
Hebrew is still written sequentially in Unicode. The right-to-left aspect there is simply about how the characters get displayed. In mixed documents, there are U+200E and U+200F to change the text direction mid-stream.
From the perspective of an LLM learning from Unicode, this would appear as a delimiter that needs to be inserted on language direction boundaries; but everything else should work the same.
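A small demonstration of that point, using the Hebrew word שלום stored in logical order (only the display layer reverses it):

    s = "abc \u05e9\u05dc\u05d5\u05dd def"   # shalom: shin, lamed, vav, final mem
    print(s)                                 # a bidi-aware terminal renders the Hebrew right-to-left
    print([f"U+{ord(c):04X}" for c in s])    # the code points themselves stay in logical order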
I know I'm being pedantic, but I just want to point out that even U+200E/U+200F are generally not needed. If you put a Hebrew word in the middle of an English sentence, it displays correctly all by itself. This is due to the Unicode bidirectional algorithm, which defines a super sensible default behavior. You only need the RTL control characters in weird circumstances, perhaps ones involving punctuation marks or unusual uses of special characters.
> Hebrew is still written sequentially
Everything is written sequentially in the sense that the character that is written first can only be followed by the character that is written next. In this sense writing non-sequentially is logically impossible.
An older Hebrew encoding actually encoded the last character first, then the penultimate character, then the character preceding that, etc.
Exercise to the reader to guess how line breaks, text wrapping, and search algorithms worked.
Multiple characters can be written at once; they can also be written in reverse or out of order.
No no, the second character you write must always be temporally preceded by the character you wrote first. Otherwise the second wouldn't have been the second, but the first, and moreover, the first would have been the second, which it wasn't.
You could write multiple characters simultaneously. CRTs sort-of did that, for example, starting characters with ascenders before those without and finishing the characters without descenders before those with descenders.
So, in the word “gif”, they would start writing the “f” first and finish writing the “i” first (just before writing the last part of the “f”). For “if”, writing the “f” would start before writing the “i” started and finish after writing the “i” finished.
In traditional printing “writing” can happen simultaneously for an entire page, but colour printing can make things more complex.
I encourage you to find some place that still uses a Hebrew typewriter. When they have to type numbers, they'll type the number in backwards. And an old Hebrew encoding also encoded characters in reverse order.
I think parent just means that "backwards" is a relative term. Your backwards is someone else's "forward". For someone who is used to reading Hebrew, they would be used to reading right to left and this would seem completely natural, no?
Basically, the numbers 1234 and 4321 are identical assuming one is written left to right and the other is right to left. Then it's just a convention which way you are used to reading.
I know nothing of Old (or New) Hebrew unfortunately so I may be completely off base.
No, because Hebrew words are read right-to-left in Hebrew letters, but numbers are read left-to-right in Arabic numerals. The direction of reading switches mid-sentence, but typewriters only type in one direction.
Arguably, Arabic numbers must always be read right-to-left, even in English, because starting from the right each digit's value is immediately known, while the value of the most significant digit depends on the number of less significant digits to its right. So in Hebrew the general reading direction actually fits Arabic numbers better.
And the Arabs actually say the numbers from right to left. It's "one and fifty", not "fifty one".
It's also written right to left. And in general, natural language is "little-endian": Less significant information tends to be mentioned first.[1]
1: https://www.thoughtco.com/given-before-new-principle-linguis...
> (or hebrew)
W.r.t. natural languages, TFA clarifies it a bit:
> And it’s not the same as translation to Arabic or Hebrew; direction here refers to the temporal order in which the tokens are produced; even for right-to-left languages, the order in which the tokens get produced remains unchanged; rather, a thin display layer handles the visual presentation.
That’s what I thought. Lack of training data might be a reason.
This is something that diffusion-based models would be capable of. For example, diffusion-coder (https://arxiv.org/abs/2506.20639) could be trained right-to-left, but it doesn't seem like they did.
Cognitive load in LLMs: When LLMs are faced with syntactic complexity (Lisp/J parentheses/RL-NOP), distractors (cat facts), or unfamiliar paradigms (right-to-left evaluation), the model’s performance degrades because its "attention bandwidth" is split or overwhelmed. This mirrors human cognitive overload.
My question: is there a way to reduce cognitive load in LLMs? One solution seems to be to process the input and output format so that the LLM can use a more common format. I don't know if there is a more general solution.
Edit: Cat attack https://the-decoder.com/cat-attack-on-reasoning-model-shows-...
Isn't the whole idea of Lisp that there is _no_ syntactic complexity? Lisp programs are roughly a serialized AST.
LLMs use tokens, with 1d positions and rich complex fuzzy meanings, as their native "syntax", so for them LISP is alien and hard to process.
That's like reading binary for humans. 1s and 0s may be the simplest possible representation of information, but not the one your wet neural network recognizes.
Agreed, Gleam as a language has very few, generalized syntactic constructs compared to most procedural languages. There's enough of a signal in the data to be able to answer queries about the language; but when writing, LLMs universally trip over themselves. The signal from other nearby languages is too strong and it ends up trying to do early returns, if statements, even loops on occasion.
I usually use deepseek (gratis) for code, and when using defun and let it usually lacks one (or more) closing parenthesis. So the way to mark the end is not well understood by this LLM, or perhaps the issue is that the height of the AST is usually bigger than in python.
I think a translation layer to a lower-density language might be a good solution; e.g. Iverson's divisible-by-11 check, 0=11|-/d, can be verbosely done in Python with
    import numpy as np

    def flippedSubtract(a, b):
        return b - a

    flipSubUfunc = np.frompyfunc(flippedSubtract, 2, 1)

    def isDivBy11(number):
        digits = list(map(int, str(number)))
        discriminant = flipSubUfunc.reduce(digits)
        return (discriminant % 11) == 0
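A quick check of the verbose version (the test values are just arbitrary examples):

    print(isDivBy11(121))   # True:  121 = 11 * 11
    print(isDivBy11(1234))  # False: 1234 = 11 * 112 + 2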
Though Claude already understands (has already seen?) 0=11|-/d so it's hard to tell for this example
As for the cat attack, my gut feeling is that it has to do with the LLM having been trained/instructed to be kind
I always thought APL was written in the wrong direction. It writes like a concatenative language that's backwards--you tack things onto the front. NumPy fixes it by making the verbs all dotted function calls, effectively mirroring the order. e.g. in APL you write "10 10 ⍴ ⍳100" but in NumPy you write "np.arange(1, 101).reshape(10, 10)". Even if you don't know either language, you can tell that the APL version is the reverse of the Python version.
My hot take is that Iverson was simply wrong about this. He couldn't have been expected to predict code completion and then LLMs, both of which want later tokens to depend on earlier tokens. SQL messed it up, too, with "from" not coming first. If APL were developed today, I think left-to-right evaluation would have been preferred. The popularity of dotted function calls in various languages makes it reasonably clear that people like tacking things onto the end and seeing a "pipeline" form from left to right.
APL was designed as a notation for math; if you pronounce it properly, it makes more sense than numpy:
The 10 by 10 reshaping of counting to 100
Numpy: Counting to 100, then reshaped to 10 x 10. Doesn't really seem all that different to me.
It’s not very different, but the numpy way is not the math way: when you talk math, you say “the exponent of the absolute value of the cosine of x”, like in APL, not “take x, get its cosine, then take the absolute value, and then get its exponent”.
In fact, for many things, you do it the math way in numpy as well. But for other things, the dot/object-oriented way is preferred.
APL is just consistent, terse, mathematical notation.
With complicated formulas, it often makes more sense, and can give more guidance, to first talk about the last operations to be applied. This seems to match the LLM structure: start by describing what we want, then fill in the more specialized holes as we get to them. "Top-down" design vs "bottom-up".
Your insight about APL being reverse-concatenative is very cool.
I think in the long run the sensible way to deal with this kind of monitoring is either shared-IP web endpoints for European ISPs, or per-connection random IPv6 addresses, reallocated continuously.
Basically, to make the IP no longer be PII.
Humans can't either? I think if this convention had been a more usable form of programming, we'd know by now.
Once you get used to it, the traditional ways look tedious and annoying to me. I think the power is in 'once you get used to it'; that will keep out most people. Look at python llm implementations vs k ones as a novice and you will see verbose unreadable stuff vs line noise. Once you learn the math, you see verbose code where the verbosity adds nothing at all vs exactly what you would write if you could.
Tedious and annoying for one-off commands maybe. It's like regex. Pretty compelling if you're writing a one-off pattern, you get immediate feedback and then you throw it away.
But it's not a good idea to use regexes in code that you're going to use long term. It's justifiable for simple regexes, and many people go against this advice, but really for anything remotely complex regexes become totally unreadable and extremely bug prone. Complex regexes are a huge code smell and array languages are pretty much one enormous regex.
Yeah never had that issue; maybe it's because I have been doing regexes (and apl which people call write only as well) for 30 years: it is not unreadable, nor throw away. I find it far more readable than the alternatives; reading pages of elaborate 'pseudocode' is more bothersome/time consuming than a oneliner to me.
What would you propose as an alternative to regexes that provides the same functionality without the unreadable syntax?
I wrote something like that in C# once [0] but I'm not getting the impression that there's a lot of demand for that kind of thing.
[0] https://github.com/Timwi/Generex
There's a whole list of alternative syntaxes here:
https://github.com/oils-for-unix/oils/wiki/Alternative-Regex...
I haven't actually used them because generally the right alternative to a regex is a proper parser.
Parsing expression grammars (PEGs) are usually, IME, more maintainable long term, partially just because of how much more testable and composable they are.
I suspect, but am not sure, that PEGs cannot do negative or positive lookbehind, but that is not a widely used feature.
Yeah that’s true IIRC but it’s rarely been a problem for my usage!
Chiming in - I've found the Pattern type in Unison[0] to be very nice to use. When you're just using the built-in patterns, it is similar to verbose regex. The real power of them is that it's easy to define, name, and re-use sub-patterns. I think it's similar to parser combinators in this way, like nom from rust[1].
[0] https://share.unison-lang.org/@unison/website/code/main/late...
[1] https://docs.rs/nom/latest/nom/
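To illustrate the same idea without Unison or nom, here's a toy Python sketch (names and structure entirely made up) of naming and reusing sub-patterns the way parser combinators let you:

    # Each parser maps text -> (match, rest) or None.
    def lit(s):
        return lambda t: (s, t[len(s):]) if t.startswith(s) else None

    def many1(chars):
        def p(t):
            i = 0
            while i < len(t) and t[i] in chars:
                i += 1
            return (t[:i], t[i:]) if i else None
        return p

    def seq(*parsers):
        def p(t):
            out = []
            for q in parsers:
                r = q(t)
                if r is None:
                    return None
                m, t = r
                out.append(m)
            return (out, t)
        return p

    # Named, reusable sub-patterns instead of one opaque regex:
    digits = many1("0123456789")
    version = seq(digits, lit("."), digits, lit("."), digits)

    print(version("1.12.3-beta"))   # (['1', '.', '12', '.', '3'], '-beta')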
I mean, I got used to RPN and think that's the utmost bestest way to write. Objectively it's not as usable. Learnability is a part of usability
It might be a question of familiarity rather than objective usability. I'm writing this comment in Latin letters rather than Cyrillic or Hebrew because I find Latin letters much more usable than Cyrillic or Hebrew. But that's because I've been surrounded by Latin letters since I was born, and have only occasionally encountered Cyrillic or Hebrew.
I think it's obvious that Cyrillic isn't any less usable than the Latin alphabet in any objective sense. In fact, I'm using English orthography, which has all kinds of unnecessary usability problems which aren't present in any Cyrillic orthography that I know of. But familiarity is a much stronger factor; even today I can barely sound out words in Russian or Ukrainian, while English text printed in Latin letters is clearer to me than speech.
On theoretical grounds, I suspect that the APL syntax Gabi is calling RL-NOP is less usable for left-to-right readers than at least LR-NOP and maybe even conventional Please Brutally Execute My Dear Aunt Sally operator precedence. But familiarity is such a strong force that this hypothesis is very difficult to test.
The theoretical grounds are that, when reading left to right, a reader must maintain a stack of pending operators and values in their mind, unless they are saved by parentheses. (The Iverson quote disagrees with this, but I think Iverson was wrong.) Maintaining mental stacks is difficult and error-prone; this is the reason for the Tim Peters proverb, "Flat is better than nested."
I suspect that operator precedence might be superior for two reasons:
1. It more often avoids parentheses, which are extra symbols to recognize and correctly pair up in your mind.
2. The meaning of a high-precedence subexpression like `x×b` is almost context-independent: although an exponentiation operator or something like a C struct field selector could still follow `b` and change its meaning, following multiplications, divisions, additions, subtractions, or comparisons will not, and preceding additions, subtractions, or comparisons also will not. I conjecture that this facilitates subconscious pattern recognition.
But the familiarity factor enormously outweighs these theoretical considerations for me.
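To make the RL-NOP-versus-precedence contrast concrete, a small sketch of my own (flat expressions, binary integer operators only, nothing of APL beyond the evaluation order):

    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def eval_rl_nop(tokens):
        # Right-to-left, no precedence: a op1 b op2 c  ==  a op1 (b op2 c)
        tokens = list(tokens)
        value = int(tokens.pop())        # start from the rightmost operand
        while tokens:
            op = tokens.pop()
            left = int(tokens.pop())
            value = OPS[op](left, value)
        return value

    expr = "2 * 3 + 4"
    print(eval_rl_nop(expr.split()))   # 14 -> 2*(3+4), the APL/k reading
    print(eval(expr))                  # 10 -> (2*3)+4, school precedence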
> " I suspect that the APL syntax ... is less usable for left-to-right readers"
On the contrary, I find it much more usable for left-to-right readers, because it allows a "top-down" reading of the expressions, instead of a "bottom-up" reading.
When trying to understand an unfamiliar program, for debugging or maintenance, you normally do not want to waste time reading every expression completely, since the full expressions provide irrelevant computation details.
You typically search where some variables are modified and how and why. For this it is frequently enough to look only at the last operations that have been performed before storing a modified value into a variable.
With the Iverson notation, the last operations are always conveniently grouped at the left side of a text line. Thus you read from left to right only as much as necessary to find what you need, then you can skip the rest of the line.
With the school notation, the required information is not grouped at one end of the line, so reading becomes slower.
The opposite of the Iverson notation, which was used in some stack-oriented languages, also groups the information, but in a way that is less usable for left-to-right users.
From natural languages, left-to-right readers expect that a sentence starts with its topic (at the left side), i.e. the most important part, e.g. the last assignment, like in the Iverson notation, instead of ending with its topic, like in the opposite notation.
> "a reader must maintain a stack of pending operators and values in their mind"
I believe that few readers, if any, do this.
The normal case when reading is that you do not want to reproduce in your mind what the computer does, but only to find the information flows between program variables. For this, it is enough to read partial expressions, as explained above.
In the very rare case when you wanted to make a mental calculation identical to that of the computer, you would normally read the expression from right to left.
When writing, the Iverson notation is usually more convenient than the school notation, while writing normally, from left to right. The reason is that for most computations the natural way to find the expression that must be computed is to go backwards, from the desired result towards the available data.
Hmm, could be. I'll have to think about that.
The popularity of a convention has no relationship with its usability.
Everybody learns in school the traditional convention for writing mathematical expressions.
It appears that for most people it is difficult or impossible to unlearn later such a convention, even if they encounter a superior convention.
On the other hand, I am among the few for whom this is not true, so when I first read the book "A Programming Language" by K. Iverson, on which the later APL language and its successors were based, I immediately recognized that the Iverson convention is much better than the school convention, and I have no trouble using it.
When reading a program written with the Iverson convention, you still read from left to right, but you typically do not read until the end of the line, but only as much of the left part as necessary to understand the purpose of the line. (Because the right operand of any operator is everything that follows it until the end of the line, and the details of that computation may be irrelevant. With school notation, when searching where a variable has been modified and how, you must jump between the beginning of the line and the end of the line, to find the last operations that have generated the stored value, when reading and understanding the complete expression would be a waste of time.)
The original motivation of the Iverson convention, which remains very important, was to give a useful meaning for a sequence of identical non-commutative operators, e.g. subtraction and division. This is particularly desirable when the operators are used in vector reductions.
(With school notation, a0 - a1 - a2 - ... - an is seldom a useful expression, but with the Iverson convention it becomes alternate sum, which is needed very frequently. Similarly for division.)
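A quick Python illustration (mine, not APL) of the difference between the two groupings of a0 - a1 - a2 - a3:

    from functools import reduce

    a = [5, 1, 2, 3]

    # School notation, left-associated: ((a0 - a1) - a2) - a3
    left = reduce(lambda x, y: x - y, a)                  # 5-1-2-3 = -1

    # Iverson convention, right-associated: a0 - (a1 - (a2 - a3)),
    # i.e. the alternating sum a0 - a1 + a2 - a3
    right = reduce(lambda acc, y: y - acc, reversed(a))   # = 3

    print(left, right)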
I think there is a reason for this, but maybe not a good one.
1. Function application should be left to right, e.g. `sqrt 4`
2. Precedence order should be very simple. In k, everything has the same precedence order (with the exceptions of brackets)
1 + 2 forces you to have this right to left convention, annoyingly.
Fwiw, I think 2 is great and I would rather give up 1 than 2. However, writing function application as `my_fun arg` is a very strong convention.
Even if you aren’t used to it, you’d be able to reason yourself through it, knowing how the language works, and would be aware that you need to reason through it. And it isn’t that LLMs don’t know that the language works that way, if you ask them about it. It also isn’t that they aren’t able to reason through it, if you ask them to do so. It’s that they lack awareness when to switch modes, lack the ability to have their knowledge interrupt their “intuitive” output and instead start reasoning about how to proceed.
I fully discount the right to left thing.
There is not enough q/kdb full source code "out there" that would have made it into the LLM training data. It tends to be used in secretive environments and can be very site specific in convention. I bet a purpose built small fine tune with real + synthetic data would be enough to get something generating better Q code.
Interesting. Upshot: right-to-left eval means you generally must start at the end, or at least hold an expression in working memory. LLMs are not so good at this.
I wonder if diffusion models would be better at this; most start out as sequential token generators and then get finetuned.
Try it out? https://deepmind.google/models/gemini-diffusion/
My curmudgeonly genius Q/Kdb+ programmer of a co-worker, who claims to be immune to the impact of LLMs, is going to be fucking pissed when he hears about Qython.
:D Well I'm still building Qython, but if your colleague has some example code snippets they think particularly difficult to translate, I'd love to take on the challenge!
most mainstream models are decoders vs. encoders-decoders, diffusers, etc. and lack reversible causal reasoning, which of course can be counter-intuitive since it doesn’t feel that way when models can regenerate prior content
some hacks for time / position/ space flipping the models:
- test the spate of diffusion models emerging. pro is faster, con is smaller context; ymmv depending on whether it was trained on that language &/or the context is large enough to ICL lang booster info
- exploit known LTL tricks that may work; there's a bunch of these
- e.g., tell the model to gen drafts in some sort of RPN variant of the lang; if it tests well, tell it to simulate creating such a fork of the lang and then gen the clean standard form at the end
- have it be explicit about leapfrogging recall and reasoning, e.g. be excessively verbose with comments you can regex-strip later
- have it build a stack / combo of the RPN & COT & bootstrapping its own ICL
- exploit causal markers - think tags that can splinter time - this can really boost any of the above methods - eg give each instance of things disjoint time tags, A1 vs K37 for numbered instances of things that share a given space - like a time GUID
- use orthogonal groups of such tags to splinter time and space recall and reasoning in model, to include seemingly naive things like pass 1 etc
- our recent arXiv paper on HDRAM / hypertokens pushes causal markers to classic-quantum holographic extreme and was built for this, next version will be more accessible
- the motivators are simple - models fork on prefix-free modulo embedding noise, so the more you make prefix-free, the better the performance, there’s some massive caveats on how to do this perfectly which is exactly our precise work - think 2x to 10x gain on model and similar on reasoning, again ymmv as we update preprint, post second paper that makes baseline better, prep git release etc to make it tons easier to get better recall and exploit same to get better reasoning by making it possible for any model to do the equivalent of arbitrary RPN
- our future state is exactly this a prompt compiler for exactly this use case - explainable time-independent computation in any model
Incidentally, I've had the same thing too with Lisps on both o-series and smaller Claude models - always a mismatched paren or two.
Another quirk is inserting random whitespace when generating code. There seem to be tokens for different lengths of whitespace.
don't plan on it staying that way. I used to toss wads of my own forth-like language into LLMs to see what kinds of horrible failure modes the latest model would have in parsing and generating such code.
at first they were hilariously bad, then just bad, then kind of okay, and now anthropic's claude4opus reads and writes it just fine.
How much incontext documentation for your language are you giving it, or does it just figure it out?
it varied. with the earlier models, generally more, trying to see if some apparition of mechanical understanding would eventually click into place.
IIRC, none of the gpt3 models did well with forth-like syntax. gpt4 generally did okay with it but could still get itself confused. claude4opus doesn't seem to have any trouble with it at all, and is happy to pick up the structures contextually, without explicit documentation of any sort.
another of my languages uses some parse transforming 'syntactic operators' that earlier models could never quite fully 'get', even with explanation. likely because at least one of them has no similar operator in popular languages. claude4opus, however, seems to infer them decently enough, and a single transform example is sufficient for it to generalize that understanding to the rest of the code it sees.
so far, claude has proved to be quite an impressive set of weights.
That is excellent. I am also using it to prototype language designs, and the 3.7 and 4.0 models are really quite good for this. I haven't found substantial academic research on using LLMs for making prototype language compilers.
"Claude is aware of that, but it struggled to write correct code based on those rules"
It's actually not, and unless they in some way run a rule engine on top of their LLM SaaS stuff it seems far fetched to believe it adheres to rule sets in any way.
Local models confuse Python, Elixir, PHP and Bash when I've tried to use them for coding. They seem more stable for JS, but sometimes they slip out of that too.
Seems pretty contrived and desperate to invent transpilers from quasi-Python to other languages to try and find a software development use for LLM SaaS. Warnings about Lisp macros and other code rewrite tools ought to apply here as well. Plus, of course, the loss of 'notation as a tool of thought'.
If your model is getting confused by python, its a bad model. Python is routinely the best language for all major models.
I don't know what counts as a major model. Relevant to this, I've dabbled with Gemma, Qwen, Mistral, Llama, Granite and Phi models, mostly 3-14b varieties but also some larger ones on CPU on a machine that has 64 GB RAM.
I think the issue there is those smaller versions of those models. I regularly use Gemma3 and Qwen3 for programming without issue but in the 27b-32b range. Going smaller than that generally yields garbage.
I've tried 24-32b sizes as well and besides being even slower they were also unreliable.
This is, in part, one of the reasons why I am interested in the emerging diffusion based text generation models.
R has right assignment (`1 -> x`); LLMs seem to enjoy it a bit too much.
I can write code right-to-left, I simply choose to not do it.
no, it wasn't your choice how you were taught to read and write something like this:
1|2*3>>4+5
in C and k, this expression should hopefully evaluate to 1, but this is just a lucky coincidence: reading and writing these two expressions are wildly different in complexity in those two languages. if you're not sure what i mean, ask your local LLM to explain why that is, but make sure you're sitting down. what you'll discover is that what you think you "simply chose to do" is not what you're actually doing.
while it is true that you can write code anyway you deem fit, i'm afraid you're a bit confused about the actual direction you're forced to think you chose to write it.
but once you're there, it suddenly gets a lot less complicated, and - miraculously - doesn't cancel out or mess up your previous beliefs and habits.
k/q, of apl heritage, are beautiful - first and foremost because they're simple to write and simple to read.
It's not because of the left of right evaluation. If the difference was that simple, most humans, let alone LLMs, wouldn't struggle with picking up q when they come from the common languages.
Usually when someone solves problems with q, they don't use the way one would for Python/Java/C/C++/C#/etc.
This is probably a poor example, but if I asked someone to write a function to create an nxn identity matrix for a given number, the non-q solution would probably involve some kind of nested loop that checks if i==j and assigns 1, otherwise assigns 0.
In q you'd still check equivalence, but instead of looping, you generate a list of numbers as long as the given dimension and then compare each item of the list against that whole list. Or, even better, also from the cookbook: {(2#x)#1,x#0} (but this really borders on obfuscation :P)
An LLM that's been so heavily trained on an imperative style will likely struggle to solve similar (and often more complex) problems in a standard q manner.
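A rough NumPy rendering of the q approach just described (the actual q one-liners are in the cookbook; this is only my own sketch of the same shape of solution):

    import numpy as np

    def identity_matrix(n: int) -> np.ndarray:
        idx = np.arange(n)                        # the list 0..n-1, as long as the dimension
        return (idx[:, None] == idx).astype(int)  # each element compared against the whole list

    print(identity_matrix(3))
    # [[1 0 0]
    #  [0 1 0]
    #  [0 0 1]]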
A human can deal with right-to-left evaluation by moving the cursor around to write in that direction. An LLM can’t do that on its own. A human given an editor that can only append would struggle too.
Idea: feed the language model the parse tree instead of the textual sequence.
Might help. You could also allow it to output edits instead of just a sequence. Probably have to train it on edits to make that work well, and the training data might be tricky to obtain.
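As a sketch of what "output edits instead of a sequence" could look like (a purely hypothetical format, not anything current models emit):

    # Hypothetical edit ops: (position, delete_count, insertion), applied to a buffer.
    def apply_edits(text: str, edits):
        # Apply from the highest position down so earlier offsets stay valid.
        for pos, delete, insert in sorted(edits, reverse=True):
            text = text[:pos] + insert + text[pos + delete:]
        return text

    buf = "fn main() {}\n"
    edits = [
        (11, 0, "\n    println!(\"hi\");\n"),  # insert a body before the closing brace
        (3, 4, "start"),                        # rename main -> start
    ]
    print(apply_edits(buf, edits))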
Ordering issues can be overcome by allowing the model to think in one direction and then reverse the output once it has created it.
How do they do with lisps?
(en passant, k is arguably more lispy than some lisps. for a lisp guy, the first cultural shock is usually the absence of 99% of superfluous parens)
as for LLM copilots and the quality of their lisp: why would you expect them to excel in lisp any better than they lisp in excel, pardon the pun?
I was curious because there is a much larger training corpus for Lisps, so if the problem really is one of training data rather than ordering, then this would be a way of showing that.
curiously enough,
this thread made me revisit some past conversations with people like atw, nsl and aab with regard to possible ways to expose humans to the way rivers flow in k/q/apl land. the choices are limited, and decision takes some agony:
a) if you don't want your audience to close the tab right away, you'd say "a k expression is written, read and evaluated strictly right to left unless the precedence is explicitly overridden by parens, and this works better than you think, no worries, you'll come around. by the way, parens are evil, avoid them if you can".
b) if your intent is to retain a sharper crowd who went to yale or something, you'd say "a k expression is to be understood right of left", and throw them a freebie in form of a prompt for their local LLM in order to get lit. the magic sequence is just "f g h x leibniz".
for my own selfish reasons, i always chose the former, and it seems to perform better than the latter, proof:
https://github.com/kparc/ksimple
https://github.com/kparc/kcc
still, neither approach is anywhere near the chances of successfully explaining which way to write python code to a 5yo kid, especially its precedence rules, which are much more intuitive (lol).
to explain the same thing to an LLM is not much different, really. all you need to do is to depress your 0yo kid with an obscene amount of _quality_ python code, of which there is no shortage. obviously, the more python code is fed to LLMs, the more humans will paste more LLM-generated python code, to be fed back to LLMs, ad lemniscate.
(and don't mind the future tense, we are already there)
============
so this is why LLMs can't write k/q/apl. first, they haven't seen enough of it. second, they are helpless to understand the meaning of a quote which was once chosen to helm a book known as SICP, not to mention countless human counterparts who came across it earlier, to the same effect:
"I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out it was an awful lot of fun. Of course the paying customers got shafted every now and then and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful error-free perfect use of these machines. I don’t think we are. I think we're responsible for stretching them setting them off in new directions and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all I hope we don’t become missionaries. Don't feel as if you're Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don’t feel as if the key to successful computing is only in your hands. What's in your hands I think and hope is intelligence: the ability to see the machine as more than when you were first led up to it that you can make it more."
― Alan J. Perlis
Those github links are so cool, thanks for sharing! :)
thanks, and you're welcome.
i hope you'll find them useful.
LLMs are already solving this problem using the "thinking" phase. They don't just one-shot an attempt at the output. The left-to-right narrative thinking process edits multiple drafts of the code they eventually output.
Same reason the same models don't fundamentally understand all languages: they're not trained to. Frankly, the design changes needed to get this to work in training are minimal, but this isn't the way English works, so expect most of the corporate LLMs to struggle, because that's where the interest and money is.
Give it time until we have truly global, multilingual models with superior context awareness.
A byte-tokenized model is naturally 100% multilingual in all languages in its data set. There just isn't a lot of reason for teams to spend the extra training time to build that sort of model.
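For what it's worth, the byte-level view is easy to see in Python (the mixed-script string is just an example):

    s = "peace = \u05e9\u05dc\u05d5\u05dd"   # English plus the Hebrew word shalom
    raw = s.encode("utf-8")

    # A byte-level model sees one flat, logically ordered byte stream for any
    # script, at the cost of longer sequences than a learned subword vocabulary.
    print(len(s), len(raw))    # 12 characters, 16 bytes (each Hebrew letter is 2 bytes)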
I read the other day here that the new Apple AI can write out-of-order. Maybe it can do this.
Languages that are difficult for LLMs to read & write are also difficult for the general public. These languages have always had poor uptake and never reach critical mass, or are eventually replaced by better languages.
Language designers would be smart to recognize this fact and favor making their languages more LLM friendly. This should also make them more human friendly.
I actually think Ruby on Rails is incredibly difficult for LLMs to write because of how many implicit "global state" things occur. I'm always surprised how productive people are with it, but people are productive with it for sure.
That's because global state is very convenient early on. Everything is in one place and accessible. It's convenient to prototype things this way. This is very similar to doing scientific research (and why often research code is an ugly boondoggle).
Most techies (generalizing here) start with a reasonably clear spec that needs to be implemented and they can focus on how to architect the code.
Research - whether science, finance or design - is much more iterative and freeform. Your objective is often very fuzzy. You might have a vague idea what you want, but having to think about code structure is annoying and orthogonal to your actual purpose.
This is why languages like Ruby work well for certain purposes. They allow the person to prototype extremely rapidly and iterate on the idea. It will eventually reach a breaking point where global state starts being an impediment, but an experienced dev will have started refactoring stuff earlier than that as various parts of the implementation becomes stable.
I don't find this to be true. There are languages that are difficult to wrap your head around initially, but that turn out to be delightfully productive with a few adjustments to the mental model. Adjustments that LLMs don't have the training data for.
That says nothing about the language at all, actually. Just that it's small and easily confused for something more idiomatic to a newbie.
> Adjustments that LLMs don't have the training data for.
Methinks if you want job security in a post-LLM-zero-shot-app-generator world, get into Lisp or Haskell; people that know only Node+React from YouTube learn-2-code tutorials are going to be slaughtered.
I just had an idea: an app/GUI/backend framework for Lisp or Haskell (with an S-expression lib) where everything is structurally inverted so it must be manually run through foldr - behold: an LLM-resistant (if not LLM-proof?) dev environment!
This argument in favour of mediocrity and catering to the lowest common denominator is one of the key reasons why I dislike people who want to shove LLMs into everything (including art).