Here's a recursive descent trick worth mentioning: instead of decomposing expressions into levels called things like "term" and "factor" by the precedence of the operators involved, you can do it all using a while loop:
function parseExpressionAtPrecedence(currentPrecedence) {
expr = parseExpressionAtom()
while op = parseOperator() && op.precedence < currentPrecedence {
if op.rightAssociative {
b = parseExpressionAtPrecedence(op.precedence)
} else {
b = parseExpressionAtPrecedence(op.precedence + 1)
}
expr = OperatorExpression(op, expr, b)
}
return expr
}
The parseExpressionAtom function handles literals, expressions in parentheses, and so on. The idea is to keep pushing more operators on to the end of an expression until a higher-precedence operator appears and you can't any more. This technique (called precedence climbing) makes parsing these sorts of arithmetic expressions a lot less painful.
Yes, and if you need more expressiveness for the ternary operator and so forth, you will want to structure it as mutually recursive functions, like Pratt parsing.
See "Pratt Parsing and Precedence Climbing Are the Same Algorithm": http://www.oilshell.org/blog/2016/11/01.html
The difference is really about how you structure your code. If you have a single recursive procedure like the above, it looks like "precedence climbing". But if you have a set of mutually recursive procedures, it looks more like Pratt parsing. But they are the same algorithm, or you can say "precedence climbing" is just a special case of Pratt parsing.
Yes! Using explicit functions for each precedence level is a little tedious. It's also really simple and fairly common, though, so I thought it was a good gentle way to ease people into parsing.
In part III of the book, when we write a second interpreter in C, we use a Pratt parser for expressions and it's a lot less boilerplate-heavy.
By going to the other extreme, you can use only precedence climbing (with just a grammar) by using Floyd's algorithm, which will treat all tokens as operators. It uses two precedences functions instead of one, a left and right precedence.
https://en.wikipedia.org/wiki/Operator-precedence_grammar
This is very close to the way Babylon (Babel's JS parser) works:
https://github.com/babel/babylon/blob/7.0/src/parser/express...
If anyone in SF is interested in hacking babylon to build a superset of JS, I'm happy to get coffee and share what I've learned building LightScript [0]. It's been a lot of fun.
[0] http://lightscript.org
As a long-time C programmer, I am immediately urged to consider the simplification
...and IMHO those variable names are a bit longer than useful --- expr(prec), atom(), and op() would be my choice, as it shows the structure more clearly, which I think is definitely the most important thing to see here: it's a loop with a recursive call inside it, and an ending condition based on the precedence.
This technique (called precedence climbing) makes parsing these sorts of arithmetic expressions a lot less painful.
This article is a more detailed explanation of this technique: https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing
It also naturally leads to being table-driven, whereby adding/removing operators and adjusting their precedences becomes extremely easy.