points by panic 9 years ago

Here's a recursive descent trick worth mentioning: instead of decomposing expressions into levels called things like "term" and "factor" by the precedence of the operators involved, you can do it all using a while loop:

    function parseExpressionAtPrecedence(currentPrecedence) {
      expr = parseExpressionAtom()
      while op = parseOperator() && op.precedence < currentPrecedence {
        if op.rightAssociative {
          b = parseExpressionAtPrecedence(op.precedence)
        } else {
          b = parseExpressionAtPrecedence(op.precedence + 1)
        }
        expr = OperatorExpression(op, expr, b)
      }
      return expr
    }

The parseExpressionAtom function handles literals, expressions in parentheses, and so on. The idea is to keep pushing more operators on to the end of an expression until a higher-precedence operator appears and you can't any more. This technique (called precedence climbing) makes parsing these sorts of arithmetic expressions a lot less painful.

chubot 9 years ago

Yes, and if you need more expressiveness for the ternary operator and so forth, you will want to structure it as mutually recursive functions, like Pratt parsing.

See "Pratt Parsing and Precedence Climbing Are the Same Algorithm": http://www.oilshell.org/blog/2016/11/01.html

The difference is really about how you structure your code. If you have a single recursive procedure like the above, it looks like "precedence climbing". But if you have a set of mutually recursive procedures, it looks more like Pratt parsing. But they are the same algorithm, or you can say "precedence climbing" is just a special case of Pratt parsing.

munificent 9 years ago

Yes! Using explicit functions for each precedence level is a little tedious. It's also really simple and fairly common, though, so I thought it was a good gentle way to ease people into parsing.

In part III of the book, when we write a second interpreter in C, we use a Pratt parser for expressions and it's a lot less boilerplate-heavy.

  • asrp 9 years ago

    By going to the other extreme, you can use only precedence climbing (with just a grammar) by using Floyd's algorithm, which will treat all tokens as operators. It uses two precedences functions instead of one, a left and right precedence.

    https://en.wikipedia.org/wiki/Operator-precedence_grammar

userbinator 9 years ago

As a long-time C programmer, I am immediately urged to consider the simplification

    b = parseExpressionAtPrecedence(op.precedence + !op.rightAssociative)

...and IMHO those variable names are a bit longer than useful --- expr(prec), atom(), and op() would be my choice, as it shows the structure more clearly, which I think is definitely the most important thing to see here: it's a loop with a recursive call inside it, and an ending condition based on the precedence.

This technique (called precedence climbing) makes parsing these sorts of arithmetic expressions a lot less painful.

This article is a more detailed explanation of this technique: https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing

It also naturally leads to being table-driven, whereby adding/removing operators and adjusting their precedences becomes extremely easy.