This is a good listing, but what is the best one for a complete newb in this area. One for someone coming from a higher level background (VB6,C#,Java,Python) - not to familiar with C or C++. I'm much more interested in hand-written parsing versus Lex/Yacc at this stage.
If I had just majored in Computer Science instead of Psychology I might have taken a class on this in college. Oh well.
-
Please have a look at: learning to write a compiler
Also interesting:
- how to write a programming language
- parsing where can i learn about it
- learning resources on parsers interpreters and compilers (ok you already mentioned this one.
And there are more on the topic. But I can give a short introduction:
The first step is the lexical analysis. A stream of characters is translated into a stream of tokens. Tokens can be simple like == <= + - (etc) and they can be complex like identifiers and numbers. If you like I can elaborate on this.
The next step is to translate the tokenstream into a syntaxtree or an other representation. This is called the parsing step.
Before you can create a parser, you need to write the grammar. For example we create an expression parser:
Tokens
addOp = '+' | '-'; mulOp = '*' | '/'; parLeft = '('; parRight = ')'; number = digit, {digit}; digit = '0'..'9'; Each token can have different representations: + and = are both addOp and 23 6643 and 223322 are all numbers.
The language
exp = term | exp, addOp, term; // an expression is a series of terms separated by addOps. term = factor | term, mulOp, factor; // a term is a series of factors separated by mulOps factor = addOp, factor | parLeft, exp, parRight | number; // a factor can be an addOp followed by another factor, // an expression enclosed in parentheses or a number.
The lexer
We create a state engine that walks through the char stream, creating a token
s00 '+', '-' -> s01 // if a + or - is found, read it and go to state s01. '*', '/' -> s02 '(' -> s03 ')' -> s04 '0'..'9' -> s05 whitespace -> ignore and retry // if a whitespace is found ignore it else ERROR // sorry but we don't recognize this character in this state. s01 found TOKEN addOp // ok we have found an addOp, stop reading and return token s02 found TOKEN mulOp s03 found TOKEN parLeft s04 found TOKEN parRight s05 '0'..'9' -> s05 // as long as we find digits, keep collecting them else found number // last digit read, we have a number
Parser
It is now time to create a simple parser/evaluator. This is complete in code. Normally they are created uusing tables. But we keep it simple. Read the tokens and calculate the result.
ParseExp temp = ParseTerm // start by reading a term while token = addOp do // as long as we read an addop keep reading terms if token('+') then temp = temp + ParseTerm // + so we add the term if token('-') then temp = temp - ParseTerm // - so we subtract the term od return temp // we are done with the expression ParseTerm temp = ParseFactor while token = mulOp do if token('*') then temp = temp * ParseFactor if token('/') then temp = temp / ParseFactor od return temp ParseFactor if token = addOp then if token('-') then return - ParseFactor // yes we can have a lot of these if token('+') then return ParseFactor else if token = parLeft then return ParseExpression if not token = parRight then ERROR else if token = number then return EvaluateNumber // use magic to translate a string into a number
This was a simple example. In real examples you will see that error handling is a big part of th
tyndall : Thanks. This is great stuff. I guess a few of the stages were confusing me. Now I can go read up more on each stage. -
If you're a complete n00b, the most accessible resource (in both senses of the term) is probably Jack Crenshaw's tutorial. It's nowhere near comprehensive but for getting started, I can't think of anything close except for books that are long out of print.
0 comments:
Post a Comment