An ANTLR Book

A book about ANTLR is something that I have long lusted after. ANTLR is something that I keep planning to learn, but the learning curve is too steep to do so informally. Looks like I am getting my wish: The Definitive ANTLR Reference is now available as a beta book.

ANTLR is a compiler compiler, which makes it a great tool for writing languages (Boo's parser is written in ANTLR, for instance). And although I would generally use Boo for DSLs, it is important to me to understand the parser as well.

Tags:

Books

Comments

08 Mar 2007
00:32 AM

Alex Henderson

That's great news - I've had a few goes at getting to grips with ANTLR, but as you say, it's not something you can just pick up casually...

08 Mar 2007
09:11 AM

Frans Bouma

Isn't what you need just an understanding of EBNF and how to write an LR(n) parser? Or, if you don't need LR(n), just use LL(n)?

It fits on a single sheet of paper :)

08 Mar 2007
09:22 AM

Ayende Rahien

Frans, probably.

I don't know them.

08 Mar 2007
15:56 PM

Frans Bouma

EBNF is the notation for the syntax:

Nonterminal:: Terminal|Nonterminal

etc.

It's pretty simple so once you grasp that it's easy to produce a syntax. See:

http://en.wikipedia.org/wiki/Extended_Backus_Naur_Form

LR(n) is a parsing method where you use a stack, and if a syntax rule is fulfilled by the items on top of the stack, you replace that set of items with the non-terminal of that rule.

E.g. when you have:

Statement :: ( "Foo" ) ;

and you have this on the stack:

(

"Foo"

)

;

you can 'reduce' these 4 elements to 'Statement' and the whole process starts over. LR(n) parsers use tables, so you have a generic parser engine and a set of tables which are specific to a language. You also provide a set of routines, one routine for every rule. So when a reduce takes place (like in my example), the rule handler for that rule gets the reduced elements and can thus, for example, emit something based on the tokens passed in.
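The shift and reduce steps described above can be sketched in a few lines of Python. This is only a naive illustration, not how a generated LR(n) parser works internally: a real one is driven by generated tables and lookahead, while this sketch simply compares the top of the stack against each rule. The names `RULES`, `on_reduce`, and `shift_reduce` are made up for the example.

```python
# Hypothetical single-rule grammar taken from the comment:
#   Statement :: ( "Foo" ) ;
RULES = [
    ("Statement", ["(", '"Foo"', ")", ";"]),
]


def on_reduce(nonterminal, items):
    """Rule handler: receives the reduced items and may emit something."""
    return f"{nonterminal}({' '.join(items)})"


def shift_reduce(tokens):
    """Shift tokens onto a stack; reduce whenever the top matches a rule body."""
    stack = []
    emitted = []
    for token in tokens:
        stack.append(token)  # shift
        reduced = True
        while reduced:  # keep reducing until no rule matches the stack top
            reduced = False
            for nonterminal, body in RULES:
                if len(stack) >= len(body) and stack[-len(body):] == body:
                    items = stack[-len(body):]
                    del stack[-len(body):]
                    emitted.append(on_reduce(nonterminal, items))
                    stack.append(nonterminal)  # replace items with the non-terminal
                    reduced = True
                    break
    return stack, emitted


stack, emitted = shift_reduce(["(", '"Foo"', ")", ";"])
print(stack)    # ['Statement']
print(emitted)
```

Once all four tokens are on the stack they match the rule body, so they are replaced by `Statement` and the handler is called with the reduced items.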

The 'n' is the number of items to look ahead before deciding what to do. This can resolve ambiguous elements in your language. Most of the time n=1.

LL(n) is a different parsing method. There you simply scan a token and call the handler for that token; the handler then checks what the next token is. If it's an expected token, the handler continues; otherwise it gives up and control returns to the main routine.
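A hand-written LL(1) parser along those lines might look roughly like the Python sketch below. It reuses the `Statement :: ( "Foo" ) ;` rule from the LR example, and the class and method names (`Parser`, `peek`, `expect`, `statement`, `program`) are invented for the illustration. As a simplification, "giving up" is modelled by raising an exception rather than handing control back to the main routine for recovery.

```python
class ParseError(Exception):
    """Raised when the next token is not one the handler expects."""


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Look one token ahead without consuming it (the '1' in LL(1))."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, wanted):
        """Consume the next token if it is the expected one, else give up."""
        if self.peek() != wanted:
            raise ParseError(f"expected {wanted!r}, got {self.peek()!r}")
        self.pos += 1

    def statement(self):
        """Handler for the rule: Statement :: ( "Foo" ) ;"""
        for wanted in ("(", '"Foo"', ")", ";"):
            self.expect(wanted)
        return "Statement"

    def program(self):
        """Main routine: pick a handler based on the next token."""
        results = []
        while self.peek() is not None:
            if self.peek() == "(":
                results.append(self.statement())
            else:
                raise ParseError(f"unexpected token {self.peek()!r}")
        return results


print(Parser(["(", '"Foo"', ")", ";"]).program())  # ['Statement']
```

Each rule becomes one method, and the single token of lookahead in `peek` is what decides which handler to call next.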

ANTLR generates a parsing environment for you: you provide the EBNF syntax and the handlers. Contrary to what the name might suggest, ANTLR produces LL-style recursive-descent parsers rather than LR(n) ones, so the syntax is used to generate parser code instead of tables, and that code, your handlers, and the ANTLR runtime together form the parser.

Not every syntax is parsable by an LR(n) parser. For example, UBB forum syntax is hard to do without conflicts, as the language is ambiguous. You are then better off writing an LL(n) parser, which is straightforward but more work, as you can't generate it from the syntax most of the time.

Nice to see more interest in DSLs. I think a combination of DSLs inside a single program is the future, and in that light I find it stupid that MS didn't create a DSL-aware environment in .NET 3.5, where LINQ is a DSL inside another language like C#. It would then be possible to create other DSLs as well, which could be embedded inside C# code.

Oh well... :)

08 Mar 2007
15:59 PM

Frans Bouma

Btw, if you want an example of an LL(1) parser, my LL(1) parser for UBB in C# can be found in the HnD source code:

http://www.llblgen.com/HnD

08 Mar 2007
16:11 PM

Ayende Rahien

Frans,

Thanks for the detailed explanation.

I am interested in parsing because I feel it is something that I miss.

About DSLs, I certainly agree that we are going to see a lot more of those in the future.

Although I think that languages such as Boo, which are open to extensibility but already handle a lot of the complexity for you, are the way to go.

Oren Eini

CEO of RavenDB
