Parsing

Posted by Rusky on Feb. 26, 2011, 11:07 a.m.

If the "right way" is to use tools like Lex/Yacc, Flex/Bison or ANTLR, why do so few real compilers do it that way? (GCC and Clang, for example, both use hand-written recursive-descent parsers.)

The arguments I always see for using a parser generator are that "it makes it easier to change/scale up your language later if you need to," "don't reinvent the wheel/have NIH syndrome," etc. But parser generators have several problems:

1) They are slower because they use parsing tables: the generated parser walks a table at runtime, which is more like interpreting your grammar specification than compiling it.

2) Building a project that uses a parser generator is more complicated because you have to generate the parser and then compile the generated code.

3) Good error messages and recovery are almost impossible with parser generators, again due to their use of parsing tables.

4) Parser generators cannot handle as many grammars as hand-written parsers using recursive descent or parser combinators (e.g. they can't parse C++ correctly).
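To make points 1 and 3 concrete, here is a minimal sketch of a table-driven LL(1) parser in Python for a toy arithmetic grammar. The grammar, the table and the function names are assumptions made up for illustration; Yacc and Bison actually generate LALR(1) tables, which differ in detail but share the same "generic driver loop plus data table" shape. The loop interprets the table at runtime, and when a lookup fails, all it knows is which grammar symbol it was expanding.

```python
import re

# Hypothetical toy grammar (LL(1), left recursion removed):
#   E  -> T E'        E' -> '+' T E' | (empty)
#   T  -> F T'        T' -> '*' F T' | (empty)
#   F  -> '(' E ')' | num
TABLE = {
    ("E", "num"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "num"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "num"): ["num"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def tokenize(src):
    # Numbers become the terminal "num"; every other non-space char is its own token.
    tokens = ["num" if t.isdigit() else t for t in re.findall(r"\d+|\S", src)]
    tokens.append("$")  # end-of-input marker
    return tokens


def ll1_recognize(tokens):
    # The driver knows nothing about arithmetic: it just interprets TABLE.
    stack = ["$", "E"]
    pos = 0
    while stack:
        top = stack.pop()
        tok = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                # The only context available is an internal grammar symbol.
                raise SyntaxError(f"unexpected {tok!r} while expanding {top}")
            stack.extend(reversed(rhs))  # push the production right-to-left
        elif top == tok:
            pos += 1  # terminal matched; consume it
        else:
            raise SyntaxError(f"expected {top!r}, got {tok!r}")
    return True
```

For example, `ll1_recognize(tokenize("1+*2"))` raises `SyntaxError: unexpected '*' while expanding T`, a message phrased in terms of the grammar's internals rather than what the user should have written.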

Reason 4 matters less for a language whose syntax isn't as stupidly designed as C++'s, but parser generators are still obviously not a good idea for anything but the most trivial of projects. In fact, the "easier to scale up your language later" argument often made for parser generators makes them sound like a bad idea even then: if your language ever grows, you'll want a hand-written parser to provide good error messages and good performance.
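As a rough illustration of what a hand-written parser buys you, here is a minimal recursive-descent parser and evaluator in Python for a hypothetical toy grammar (`expr := term ('+' term)*`, `term := factor ('*' factor)*`, `factor := NUMBER | '(' expr ')'`). The grammar, the `Parser` class and the message wording are assumptions for illustration only. Each rule becomes an ordinary function, so the grammar is effectively compiled into code, and each function knows enough about where it is to report a specific, positioned error.

```python
import re


class ParseError(Exception):
    pass


class Parser:
    """Hypothetical recursive-descent parser/evaluator for +, * and parentheses."""

    def __init__(self, src):
        self.src = src
        # Keep each token's column so errors can point back into the source.
        self.tokens = [(m.group(), m.start()) for m in re.finditer(r"\d+|\S", src)]
        self.tokens.append(("<end of input>", len(src)))
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def error(self, message):
        text, col = self.tokens[self.pos]
        # Report the column and draw a caret under the offending token.
        raise ParseError(
            f"column {col + 1}: {message} (found {text!r})\n"
            f"  {self.src}\n  {' ' * col}^"
        )

    def parse(self):
        value = self.expr()
        if self.pos != len(self.tokens) - 1:
            self.error("expected '+', '*' or end of input")
        return value

    def expr(self):  # expr := term ('+' term)*
        value = self.term()
        while self.peek() == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):  # term := factor ('*' factor)*
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):  # factor := NUMBER | '(' expr ')'
        tok = self.peek()
        if tok.isdigit():
            return int(self.advance()[0])
        if tok == "(":
            _, open_col = self.advance()
            value = self.expr()
            if self.peek() != ")":
                # This function knows which '(' is unclosed, so it can say so.
                self.error(f"expected ')' to close '(' from column {open_col + 1}")
            self.advance()
            return value
        self.error("expected a number or '('")
```

Here `Parser("(1 + 2) * 3").parse()` returns `9`, while `Parser("(1+2").parse()` raises a `ParseError` whose first line is `column 5: expected ')' to close '(' from column 1 (found '<end of input>')`, followed by the source line with a caret under the error position.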

Parsing needs to be fast, especially for large projects and interpreted languages (which between them cover pretty much every language). The faster you can go from written code to running code, the better, because it makes iteration easier.

Error messages are just as important, both for debugging code written in the language you're parsing and for debugging the parser itself.

So if anyone ever tells you (or you start to tell someone) to use a parser generator, just say no.

Comments

ludamad 13 years, 9 months ago

Yeah. I've hand-written many a parser. I much prefer it - I don't want auto-generated code amidst real code.