Compilers Lecture 22: Syntax-Directed Translation (1)


And now we will start using
the parsing techniques as we have learned to actually
do the abstract syntax tree. Remember from the
earlier lectures, we talked about the different
stages in a compiler, we have the front end and the
optimizer and the back end. And we said that the front
end takes the high level representation, the source code,
and the high level language. And the output of the
front end is what? Yeah the abstract syntax. But today we will learn how
we can construct the abstract syntax tree based on the
parsing techniques that we have. So, this is the familiar
classic expression grammar with the assignment. So, I have also added
the assignment to it. And let’s now do the
parsing for this example. And then see the corresponding
abstract syntax tree. So, let’s do the parse for this. So, this is, you know, an
assignment so it’s a goal, and the goal is an assignment. And this assignment
is an identifier. Identifier equals
expression with semicolon and then this expression
is what? The expression is this and
this expression consists of how many terms? Two terms, right. Everybody was good at
doing the derivation on the test, by the way. So, everybody, almost everyone
in the class did this, you know, the derivation did correctly. So, it’s expression
— oops — plus term. And in this case, the
expression is what? Is a term, right? And this term is
term times factor. Okay. And this term is a factor, and this factor is an
identifier which is what?>>The A.>>The A. So, this
identifier here is the X, this identifier is the A
and this is times factor and the factor is — this
factor is not an identifier. This factor is?>>A number.>>A number and this
number is the three. And this term is a factor. And this factor is an
identifier which is a B. Okay. So, so far nothing new, right? So, we know how to write this
parse tree or derivation tree. So, this is the parse tree. Or the derivation tree
for this expression. What we want is not
the parse tree, we want the abstract
syntax tree. And what’s the abstract
syntax tree conceptually? The abstract syntax tree is
basically this parse tree. What would — only the
operations that matter, you know, for code generation
purposes or the operations that correspond to real actions. So, we would like to keep
the nodes that correspond to real actions and get rid
of any node that corresponds to just grammatical
substitutions, you know, substituting a grammatical
symbol name for another. For example, you know, this edge
here, we are here, you know, this is just a substitution. Here there’s no action
between term and factor. So, you know, whether you look
at it bottom-up or top-down. If you look at it bottom-up,
then this factor is reduced to a term or if you look
at it as the top-down, this term is expanded
into a factor. But this is just a
matter of naming. There is no actual
operation that is happening. What are the actual operations? Actual operations are the
arithmetic operations multiply, add. Okay. And there are
other operations.>>Equal, the assignment.>>This is the assignment
operation. Okay. And what else? How about the leaves? So, these leaves, the
identifiers, and the numbers. Now, we should start thinking
in terms of code generation. So, in terms of code
generation, this would result in generating some code
because this is a variable. Now, this variable in order
to use it in this expression, what do we have to do, what kind
of code do we have to generate?>>A memory entrance>>Yeah, it’s exactly. So, we have to bring it from memory possibly
using a load instruction to bring it from memory. So, these leaves are
also interesting. So, in terms of constructing,
since our objective is to construct a data
structure that will help us in code generation, these are
the nodes that we care about. And everything else can
be abstracted away, right. So, we can, you know,
our abstract syntax tree in this case is going
to look like this. There is an assignment node and this assignment node
assigns this identifier. So, this is an identifier node,
assigns it to an expression but this expression is just the
name, in this case, you know, this expression is the add of
an expression and the term. So, the actual operation
here is the add. So, we are — what
does this mean? It means that this assignment
is assigning the result of this add to this variable. It’s an assignment
that assigns the result of the add operation
to this variable. And now where does this
add operation come from? It adds two things,
it adds the result of the multiplication
to this identifier. Oops. The result of
the multiplication to that identifier — okay. Let me just move this. I’m going to just tree draw it. So, I’m not going to change it. I will redraw it because
I need this space here. Okay. So, I’m going to do
assignment and then identifier, add multiply identifier. And what does this
add, this multiply — it multiplies the
identifier with the number. You know, as we will see in
the next lectures, you know, when we have an abstract syntax
tree, so this is abstract because we abstract it away. We’ve got rid of all the
nodes that correspond to grammatical symbols. In terms of code generation,
after we do the parsing, after the parsing is
done, we no longer care about grammatical symbols,
we care about what this means in terms of actual operations because we want to
generate code. And generating code here in this
case can be done as we will see in a straightforward manner. So, for this, you know, identifier if we put the names
here, X, A, 3, and B. So, the corresponding code, you
know, for this identifier node, there would be a
load so we’ll — you know, we will be
loading X into some register. And then –>>Why?>>Sorry, loading —
sorry, loading A. I’m sorry. Loading A. By the
way, so this is where the actual code
generation will start. Loading A into a register
then for this number, we do a load immediate. Load immediate of three
into another register. Then, you know, this node
— what do we do here?>>Multiply.>>Multiply it. Multiply what with what? Exactly. So, it’s like, you
know, this is this gets loaded and the result is placed
in R1, this gets loaded and the result is placed in R2. Now, this multiply is going
to multiply R1 and R2. And it’s going to put the result
in some other register R3. So, the result of the
multiply is going to be in R3. What’s the next operation?>>Load B.>>Yeah, exactly.>>Because we loaded in R1.>>Yeah. Loading B.
I’ll put it into R4. So, B is going to be in R4. Now, what’s the next operation?>>Add.>>Adding R3 and R4. And we put the result
in a new register R5. By the way, we assume that
we have an infinite number of registers in this one. So, you assume that you
have an infinite number of registers then you
leave mapping this to physical registers,
you leave it for — you leave it to which
stage in the compiler?>>Register allocation.>>Register allocation stage. Yes. So, now we are adding these
and now what should be the next?>>Store it in a
memory location.>>So, this is the store. The assignment is just a store. So, you are storing R5. So, this is going to be an R5. So, you store R5 into X. Okay. So, this abstract
syntax tree, you know, it can be nicely used
to generate code. And by the way, what
algorithm did we apply to generate this code? What graph algorithm
did we apply?>>Depth-first.>>Yeah, it’s depth-first. And is it post-order
or pre-order?>>Post-order.>>Pre-order, I think.>>Where did we start
processing, from the leaves or
from the root?>>From the leaves.>>Yeah, we started
from the leaves, right. So, it’s post-order. So, we process this and this,
then we process the parent. So, the parent is processed after the children
which is post-order. So, with just a simple
post-order DFS, you can go from an abstract
syntax tree into, you know, abstract assembly code. Okay. But then the
question is not to get from an abstract syntax tree
to this code, the question is, how can we get from, you know,
this grammar or this parse tree into this abstract syntax tree. The point here — you know, the
key idea is to look at the rules in this grammar and identify
the rules that correspond to the operations
that we care about. And whenever a reduction
happens using that rule, we create a node in the tree. So, if we do this,
if we look at this, now here we have an identifier. So, we’ll just create
an identifier node. Create… Or make — let’s use make. Make an identifier node. And here we make a number node. But here, what happens here? Should we make a node or not?>>For which one?>>For this parenthesis
expression. So, in order to see
this, you know, whether we should
create a node or not, if we have something
like, you know, A times — what if we had, you
know, parenthesis here? If we had parenthesis, then, you
know, what we want here is — we want the assign and
we are assigning — we are assigning
the expression to X but this expression
is in this case what?>>Multiply first.>>Yeah, the multiply. The multiplication
of the identifier and this is another expression. So, it’s the add of
number and identifier. So, in fact, you know, with
this parenthesis expression, we did not create a new node. So, this is basically taking the
node that already corresponds to this expression to
whatever this expression — you know, the tree, the abstract
syntax tree that corresponds to this expression, we
take the root of that tree and make it this — you
know, so this is the tree that I’m referring to. So, it’s — you know, our
expression here is this. So, this is the expression. The expression between
parentheses. So, we are taking the tree
that represent this expression and now the root of this
tree becomes, you know, the right operand
for the multiplier. So, when we have parenthesis, we are not actually
creating a new node. Okay. And remember what to keep
in mind here, what’s important to keep in mind is that
this thing is recursive. So, this process of,
you know, parsing, the grammar is recursive
and the abstract syntax tree is recursive. So, you know, this can be —
when you have a rule like this, parenthesis expression, this expression can be just
the root of a whole big tree. So, this can be an
arbitrarily big expression. So, it can have, you
know, its own tree. So, what we’re going
to say here, we’re just basically
linking trees. So, when it comes to
something like this, we are just linking trees. So, we are not making
a new node here. And here, when we are making
this substitution, you know, factor goes to term,
we’re just — we are not creating
any new operations. But the new operation is here. So, we are making a divide, no. Making… Making a divide node. And here we are making
a multiply node. Here are we making
any new nodes? So, basically these rules that
correspond to substitutions, only substitutions
without actual operations. Basically these are the rules
that are not going to appear in the abstract syntax tree. Remember the abstract
syntax tree is going to have the actual
operations and it’s going to eliminate anything
that corresponds to grammatical substitutions,
you know, substituting one
symbol for another. So, if this rule here is
just saying, you know, substitute factor for term
or expand term into factor. So, it’s not doing
any actual operation. This is doing an
actual operation. Make a subtract node and
this is make — what? An add node. And in this assignment,
what are we making?>>The assignment node.>>Making an assignment node. But now, how do we
make these nodes? So, making a node is
basically, you know, a node in the tree
will have two children. So, a tree node has
two children. So, I’m making this
identifier node or number node. This is trivial because
this is a leaf node. So, this is just a
leaf node so basically, we are creating a node and
in this node, you know — think of it as an object and
this object has in it, you know, some type, which is identifier,
this is an identifier. And then it must have the
actual name of that variable. Or what is more important
than the name of the variable is what? When we generate
code we don’t care about the name, we care about –>>The memory location.>>Yeah, the memory
location or, you know, the offset of that variable. So, in fact, you know,
the offset is necessary, it’s necessary to keep the
offset of that variable but keeping the name which
is symbolic information is not necessary. It will be useful, as we
said in the earlier lectures, keeping the name or the
symbolic information, keeping that in our data
structures will be useful for what?>>Debugging.>>For debugging. So, if we assume that we
don’t care about debugging, the essential thing that we
need to keep is only a number, you know, the offset that
corresponds to this variable. So, we no longer need to
remember that it’s called X or Y or Z or whatever, you know, fancy name the programmer
gave it. For us, it’s a number,
it’s a location in memory. Okay. So, this is
for these leaf nodes. Now these are not leaf
nodes, these are nodes that represent operations. So, let’s think about this. You know, making
a multiply node. So, what do we need
to do in order to create this multiply node? So, suppose now we
created the two leaf nodes for identifier and number. That’s easy. You know something, objects
that have some information about this variable and
information about this integer which is the value
of this integer. Then as I’m creating these
leaf nodes, what do we need in order to create this node?>>Nodes that connect to it.>>Yeah, exactly. So, a node has the left
side and the right side. So, this is a binary operation,
multiply is a binary operation. So, it needs two children,
two operands, the left operand and the right operand. So, this is the left
and the right. So, it needs to know in order
to do this multiply node, we need to know, we need to have
a pointer to the left operand and the right operand. In order to link them
together, in order to say that this node has this
identifier as its left child and this number as
its right child. Now, how do we do this? We need a mechanism for
passing variables, you know, passing certain attributes
with data passage created with the symbols
in this grammar. And the way we do
this using the — we will be using
the Yacc notation. So, what is Yacc? Yacc is a parse generator. So, it takes a grammar
that you write. So, you write a grammar and
you write actions like these. Of course, you know, these actions they’re
still incomplete, we haven’t completed them so they still don’t
make much sense. So, it takes a grammar and then
action associated with each rule in the grammar and using
this, it generates C code that does the parsing for you. So, it does all of this NR1,
remember all of this NR1 stuff, you know, the table,
you know, doing — creating the sates and then
generating the action table and the go-to table. All of that stuff
that you have enjoyed in homework number three. So, that pleasurable stuff. So, when you use the parse
generator, you don’t get to enjoy it because the parse
generator will do it for you. So, it will do all
of that work for you, it will do all the creation
of the tables and everything. And it will just
give you C code. So, all that you need to
do is to write the grammar and then write these
actions that are associated with the grammar rules. And, in fact, the main objective of today’s lecture is
understanding these actions, what they mean and what we need
to write in order to create or build this abstract
syntax tree. Okay. So, by the way, this
Yacc — so, that was the first, you know, parse generator
program. Does anyone know
what it stands for? Yeah.>>Yet Another
Compiler-Compiler.>>It stands for Yet
Another Compiler-Compiler. Because it’s a
compiler-compiler. It itself is a compiler
and you give it the grammar and it generates a parse for
you and that parse becomes part of the compiler that
you are building. Okay. So, it’s Yet
Another Compiler-Compiler but we will be using Bison
which is a more recent version of this, you know,
parse generator. Now, using Yacc and Bison,
we’ll be using Bison. So, there is a notation
that they use for each rule in the grammar, they use the
symbol dollar sign dollar sign for the left-hand side and for
all the symbols that appear on the right-hand side, they
use the symbol dollar sign and the number of the symbol. So, for example, in this rule, so this is the dollar
sign dollar sign, this is the left-hand side,
and on the right-hand side, we have three symbols, two
variables, and one terminal. So, everything counts, remember. We count the symbols
to the right-hand side. Everything counts whether it’s
a terminal or a non-terminal. So, term is going
to be dollar sign 1, multiply is dollar sign 2,
and this is dollar sign 3. Okay. So, why do we need this? We need this in order
to say, “Okay, in order to create
a multiply node, I am constructing
dollar sign dollar sign.” So, this is the node for term,
the node for term is the node for this left-hand
side term multiplied by the node for factor. So, I express this as — so
I need to know dollar sign 1, and dollar sign 2 or 3?>>Three.>>Three. Two I already know that dollar sign 2 is
the multiplication sign. So, dollar sign 3 is the third
symbol which is the factor. So, you will be writing things
like this in the assignment. Okay. So, this make
node is going to be doing something like,
you know, this dollar sign, dollar sign is a node that
is made by linking whatever, you know, I have the
node for this term and the node for this factor. So, it’s going to be linking,
you know, and in this case, it will be linking the node for
term of the node for factor. But, you know, when the term
consists of a single factor, then what makes sense here
dollar sign dollar sign equals — what? So, I only have one symbol here.>>Dollar sign 3>>Dollar sign 1. So, this is just like a pass. You know, here I’m passing so
I’m not actually, you know, creating any new nodes. So, basically because this
is just a substitution. When you are substituting,
when you are renaming a factor to term or you’re doing this
grammatical substitution, you are saying that dollar sign
dollar sign is dollar sign 1. Okay. And similar
for the make divide. Of course, we will do at
least one more example on this dollar sign notation so
that everyone understands it. So, this is the first example
but there will be more examples. I’m using this, you know,
dollar sign notation but you will use it
in the assignment. In the programming assignment,
this is what you will be using in order to build a compiler. So, in this case, similarly
dollar sign 1, dollar sign 3. Now, this make multiplication,
what will it be doing? It will be doing something
like make multiply node that takes this dollar
sign 1 and dollar sign 3. While in fact, you know, these
dollar sign 1 and dollar sign 3 for this function is going to be
a regular function that takes — you know what is tricky about
this dollar sign notation is that this dollar sign can — this attribute can
have different types. So, in this case, what do
you think the right type is, given what we have
written so far? What’s the type for dollar sign?>>Integer.>>No, it’s got to be a node.>>Yeah, it’s a node. In fact, a pointer to a node. Exactly. So, this is — what? Dollar sign here is
a pointer to a node. So, if you — it’s going
to take a node pointer, node star N1, and node star N2. And it’s going to
create, you know, create N or create a new node. And then, end of left equals
N1, and end of right equals N2. And you want to return, if it’s
a pointer, then you will return, in fact, the address of N.
Sorry, the address of N. So, don’t worry about the
syntax at this point. It’s, you know, most
likely you’ll be doing this as pointers, you know,
the pointers to nodes. So, basically creating
a multiply node, you are taking a pointer to a
node and a pointer to a node. You are creating — you create
a new node that you call N and then the left-hand
side is going to be N1, the right-hand side
is going to be N2. And you are returning
the address of this node. Or if you create a new
node, you know, at address N or with node pointer N, then you
will return N. So, if you use N for the pointer, create
new node with pointer N, then you will return N.>>That’s a pointer right, the
multiplication at the node N1? Is that a pointer?>>Yeah, that’s a pointer, yeah. So, we are passing pointers. We are passing pointers to
nodes and we are linking them. So, don’t worry about
the syntax. You’ll have to worry
about the syntax when you write the
program but not now. So, focus right now
on the concept. So, the concept is that this
make multiply node is taking two node pointers and
it’s linking them. And if you keep doing this,
you will eventually, you know, the last node that you will
create is going to be what? You keep doing this?>>The leaf node.>>No, by going from
leaves to — you are going bottom-up
so the last node that you will create is
going to be the root node. So, you create the leaves,
it’s bottom-up, right. You are building
this tree bottom-up, in a bottom-up manner. And, in fact, because
we are using this with a bottom-up parse, you
are applying here the use of an NR1 parse, a
shift reduced parse. So, what we are saying here is
that whenever the parse reduces, makes a reduction using rule
number 5, whenever there’s an R5 for this function, okay, you
know, execute this action, this is an action that we
are embedding in the grammar. So, these embedded
actions are the actions that are going to
build the tree. So, we are saying whenever you
reduce by rule 5, execute this, which means link the left-hand
side and the right-hand side. Whenever you reduce by rule
number 6, execute this action. Whenever you reduce by rule
number 7, execute this action which is basically nothing. We are just passing so we are
not creating a new node here. We are saying that
when we do this, we are not creating a node, we
are just taking this identifier, which is what the factor is,
we are taking this identifier and making it this node here. So, by doing this pass
here, we are bypassing or skipping those rules, those grammatical
rules that do nothing. This is a grammatical rule
that does nothing in terms of actual code generation,
you know, it does something
in terms of parsing. Grammatically it’s meaningful,
it’s significant but in terms of code generation,
it does nothing. So, here basically we are,
you know, passing, you know, whatever this dollar sign. So, this dollar sign here
we are passing it to this. It’s going to be the same.>>So, in that case,
it would be the same — you would use that pass
notation on line 4 as well.>>On line 4, yeah, exactly. So, and on line 4, you know,
for the rest of the grammar, in fact, this is, you know, very
similar so for the make add, we are doing, you know, dollar
sign dollar sign equals make of dollar sign 1, dollar sign 3. And this is dollar sign dollar
sign equals makes a subtract node, dollar sign
1, dollar sign 3. And this is just a pass
equals dollar sign 1. And this is an assignment node. So, dollar sign dollar sign
is make assignment node. Okay. And we are assigning again
the left side is going to be 1 and the right side
is going to be 3. So, you may think that
it’s always 1 or 3 but no, it’s not always,
you know, 1 and 3. It happened to be 1 and
3 here because we have on the right-hand side,
we have three symbols, and symbol number 1 and
symbol number 3 are the ones that matter. Symbol number 2 does not matter. So, in order to show
you an example where it’s not, you
know, 1 or 3. So, if we have an if
statement, for example. So, if we have something like
if some condition then some code else some code. Okay. Now, we need to create
a node for the if, right. So, the node for the
if, you’ve got an if. How many children
will this have? What do we need to know
for an if statement?>>The condition.>>The condition, that’s
what we need to know.>>And that codes are
both true and false.>>And the code, this
code and this code, right. So, that the then code
and the else code. So, this is the, you
know, the then code, and this is the else code. This is what we need to know. Everything else, we
don’t need to know. So, in this case, we
will say that — okay. So, we have a make if that will
take three pointers, you know, node star condition,
node star then code, and node star else code. And it’s going to link them. Now, here, in order
to write the action, we will write equals
make if of what? So, dollar sign — yeah, what’s
the number for condition?>>It’s three.>>Three.>>So, you have to
count all the symbols. So, the trick is that you have to count all the symbols whether
it’s terminal or non-terminal. So, the if is dollar sign
1, this is dollar sign 2, condition is going
to be dollar sign 3. So, dollar sign 3
is the condition. Then what is dollar sign 4? Closed parenthesis. You basically have to count
every symbol whether it’s a terminal or a non-terminal. Code is going to be dollar
sign 5, else is going to be dollar sign
6, dollar sign 7. Okay. So, what are
the ones that we need? Dollar sign 3, dollar
sign 5, and dollar sign 7. So, it’s not always 1 and 3. Okay. So, this is
the logic of these, you know, dollar sign symbols. Every symbol on the
right-hand side has a number. And don’t forget the terminals
no matter how small they are. So, if this is, don’t
skip the little ones. Okay. So, this is a small
— this has a number too. Okay. So, any question on
what we have covered so far? Okay. So, let’s try
to finish this up. So, now so we have done this,
the make node, the make node. Here, what do we do here? We didn’t write the
action for this.>>Dollar sign dollar
sign equals dollar sign 2.>>Yeah, exactly. Equals dollar sign 2. Because this is dollar sign 1. This is dollar sign 2. Okay. And making
the number node. Now these are leaf nodes
so they are not dependent on other nodes. So, the parameters to
these make number node and make identifier node, they
are not going to be other nodes because these are the leaves. So, they are not
linking other nodes. You know, all internal nodes
are going to need their children so that they can
link the children. But these are the leaves. So, what do you think the
parameter that we passed to the leaf, what
should that be?>>Number itself.>>Yeah, the number itself. Here for the identifier. So, basically –>>Dollar sign 1 then?>>Is it dollar sign?>>No. Sorry. So, the leaves are different. So, what do you need in order to construct this node
for the identifier? You in fact need the token
that comes from the scanner. Because these are the ones that
are coming from the scanner. The leaves — so these are
coming from the scanner. The scanner classified
this as an identifier but there’s an actual lexeme
that comes with it or a token. So, you need the
token and the lexeme. And, of course, you
know the details, the programming details
you would deal with it when you write the code. But what you need
here is the token. So, remember the leaf
nodes are different because in the leaf nodes you
are not linking other nodes, you are just creating a node
that should have information about this identifier or number. In our example, we
only have identifiers and numbers as leaves. Okay. Alright. So, this, you know, this dollar
sign notation or this way of passing value from
the low level rules to the higher level rules. So, what we are doing here
basically we are passing the values up in a bottom-up manner. We’re passing the values from,
you know, the low level rules to the higher level rules
in the grammar until we get to the top level rule which generates the start
variable of the grammar. Okay. Now, this is how
we will be creating, building the abstract
syntax tree. So, the idea is that whenever
you reduce the parse is going to execute this action. Remember, when the
parse does a reduction. So, you have — this is — you have some state
here, a number of states. And then you have — you reduce
by something like a beta. So, if what you have
here — or you have some, well, beta would states. So, if you reduce by a rule that
says alpha is A or reduce a beta to A. Or let’s call it here
— I don’t know, beta 1, then state beta 2
state, and so forth. So, there would be
a number of states. So, beta consists
of beta 1, beta 2, all the way to beta N. Now, when you do a reduction,
or let me — okay. Let’s do this. So, you have some state and then at some point you have beta 1
then S1, beta 2, S2, beta 3, S3. So, when you are ready to make
this reduction where, you know, beta is beta 1, beta 2 through
beta N, when you are ready to make this reduction, how
many symbols will you count?>>2N.>>2N exactly because I
have 1, 2, 3 through N. So, what you are going to
do is you are going to pop off all of these. So, you know, reducing… by A beta 1, beta 2, through
beta N. So, you’re going to pop 2N items in the stack. And then, S is going to be, you
know, the state that is here or S top, the top of the
stack after popping off. So, let’s call it S
Top is what we get. Then, we push what? If you remember what
we were doing when we were reducing
in the LR1 parse.>>Then we go to the goal.>>No. We are reducing by this
rule, so we are pushing –>>The go to of S Top A.>>We have to push the A first. So, first push A, then we
have to push go to of AN — sorry not AN, it’s the state
which is S Top, and A. So, S Top is the base state. This is the base. Remember this? Top two items push A, then
push go to up S Top and A. And that’s how we do this,
you know, and this is the code that will get generated but
you can execute the action at this point. So, you just execute the action
associated with this grammar. So, when you reduce by
this rule in the grammar, you execute this action. So, whenever a reduction
happens, you execute the action that corresponds to that
rule in the grammar. Of course, when you shift,
you don’t do anything. Right. So, these
actions are associated with reductions,
or new deductions. And in fact, some of the
reductions we are putting like useless actions or actions that will just pass
information up. So, these are pass
type of actions, they will pass the
information from one node in the derivation tree
to the node above it, its term in the derivation tree. So, some of them
will do real actions, some of them will just pass. But with the shift,
whit the parse shifts, there is no action, there
is no action associated with the shift. And the action is associated
with some of the reductions.

Leave a Reply

Your email address will not be published. Required fields are marked *