And now we will start using

the parsing techniques as we have learned to actually

do the abstract syntax tree. Remember from the

earlier lectures, we talked about the different

stages in a compiler, we have the front end and the

optimizer and the back end. And we said that the front

end takes the high level representation, the source code,

and the high level language. And the output of the

front end is what? Yeah the abstract syntax. But today we will learn how

we can construct the abstract syntax tree based on the

parsing techniques that we have. So, this is the familiar

classic expression grammar with the assignment. So, I have also added

the assignment to it. And let’s now do the

parsing for this example. And then see the corresponding

abstract syntax tree. So, let’s do the parse for this. So, this is, you know, an

assignment so it’s a goal, and the goal is an assignment. And this assignment

is an identifier. Identifier equals

expression with semicolon and then this expression

is what? The expression is this and

this expression consists of how many terms? Two terms, right. Everybody was good at

doing the derivation on the test, by the way. So, everybody, almost everyone

in the class did this, you know, the derivation did correctly. So, it’s expression

— oops — plus term. And in this case, the

expression is what? Is a term, right? And this term is

term times factor. Okay. And this term is a factor, and this factor is an

identifier which is what?>>The A.>>The A. So, this

identifier here is the X, this identifier is the A

and this is times factor and the factor is — this

factor is not an identifier. This factor is?>>A number.>>A number and this

number is the three. And this term is a factor. And this factor is an

identifier which is a B. Okay. So, so far nothing new, right? So, we know how to write this

parse tree or derivation tree. So, this is the parse tree. Or the derivation tree

for this expression. What we want is not

the parse tree, we want the abstract

syntax tree. And what’s the abstract

syntax tree conceptually? The abstract syntax tree is

basically this parse tree. What would — only the

operations that matter, you know, for code generation

purposes or the operations that correspond to real actions. So, we would like to keep

the nodes that correspond to real actions and get rid

of any node that corresponds to just grammatical

substitutions, you know, substituting a grammatical

symbol name for another. For example, you know, this edge

here, we are here, you know, this is just a substitution. Here there’s no action

between term and factor. So, you know, whether you look

at it bottom-up or top-down. If you look at it bottom-up,

then this factor is reduced to a term or if you look

at it as the top-down, this term is expanded

into a factor. But this is just a

matter of naming. There is no actual

operation that is happening. What are the actual operations? Actual operations are the

arithmetic operations multiply, add. Okay. And there are

other operations.>>Equal, the assignment.>>This is the assignment

operation. Okay. And what else? How about the leaves? So, these leaves, the

identifiers, and the numbers. Now, we should start thinking

in terms of code generation. So, in terms of code

generation, this would result in generating some code

because this is a variable. Now, this variable in order

to use it in this expression, what do we have to do, what kind

of code do we have to generate?>>A memory entrance>>Yeah, it’s exactly. So, we have to bring it from memory possibly

using a load instruction to bring it from memory. So, these leaves are

also interesting. So, in terms of constructing,

since our objective is to construct a data

structure that will help us in code generation, these are

the nodes that we care about. And everything else can

be abstracted away, right. So, we can, you know,

our abstract syntax tree in this case is going

to look like this. There is an assignment node and this assignment node

assigns this identifier. So, this is an identifier node,

assigns it to an expression but this expression is just the

name, in this case, you know, this expression is the add of

an expression and the term. So, the actual operation

here is the add. So, we are — what

does this mean? It means that this assignment

is assigning the result of this add to this variable. It’s an assignment

that assigns the result of the add operation

to this variable. And now where does this

add operation come from? It adds two things,

it adds the result of the multiplication

to this identifier. Oops. The result of

the multiplication to that identifier — okay. Let me just move this. I’m going to just tree draw it. So, I’m not going to change it. I will redraw it because

I need this space here. Okay. So, I’m going to do

assignment and then identifier, add multiply identifier. And what does this

add, this multiply — it multiplies the

identifier with the number. You know, as we will see in

the next lectures, you know, when we have an abstract syntax

tree, so this is abstract because we abstract it away. We’ve got rid of all the

nodes that correspond to grammatical symbols. In terms of code generation,

after we do the parsing, after the parsing is

done, we no longer care about grammatical symbols,

we care about what this means in terms of actual operations because we want to

generate code. And generating code here in this

case can be done as we will see in a straightforward manner. So, for this, you know, identifier if we put the names

here, X, A, 3, and B. So, the corresponding code, you

know, for this identifier node, there would be a

load so we’ll — you know, we will be

loading X into some register. And then –>>Why?>>Sorry, loading —

sorry, loading A. I’m sorry. Loading A. By the

way, so this is where the actual code

generation will start. Loading A into a register

then for this number, we do a load immediate. Load immediate of three

into another register. Then, you know, this node

— what do we do here?>>Multiply.>>Multiply it. Multiply what with what? Exactly. So, it’s like, you

know, this is this gets loaded and the result is placed

in R1, this gets loaded and the result is placed in R2. Now, this multiply is going

to multiply R1 and R2. And it’s going to put the result

in some other register R3. So, the result of the

multiply is going to be in R3. What’s the next operation?>>Load B.>>Yeah, exactly.>>Because we loaded in R1.>>Yeah. Loading B.

I’ll put it into R4. So, B is going to be in R4. Now, what’s the next operation?>>Add.>>Adding R3 and R4. And we put the result

in a new register R5. By the way, we assume that

we have an infinite number of registers in this one. So, you assume that you

have an infinite number of registers then you

leave mapping this to physical registers,

you leave it for — you leave it to which

stage in the compiler?>>Register allocation.>>Register allocation stage. Yes. So, now we are adding these

and now what should be the next?>>Store it in a

memory location.>>So, this is the store. The assignment is just a store. So, you are storing R5. So, this is going to be an R5. So, you store R5 into X. Okay. So, this abstract

syntax tree, you know, it can be nicely used

to generate code. And by the way, what

algorithm did we apply to generate this code? What graph algorithm

did we apply?>>Depth-first.>>Yeah, it’s depth-first. And is it post-order

or pre-order?>>Post-order.>>Pre-order, I think.>>Where did we start

processing, from the leaves or

from the root?>>From the leaves.>>Yeah, we started

from the leaves, right. So, it’s post-order. So, we process this and this,

then we process the parent. So, the parent is processed after the children

which is post-order. So, with just a simple

post-order DFS, you can go from an abstract

syntax tree into, you know, abstract assembly code. Okay. But then the

question is not to get from an abstract syntax tree

to this code, the question is, how can we get from, you know,

this grammar or this parse tree into this abstract syntax tree. The point here — you know, the

key idea is to look at the rules in this grammar and identify

the rules that correspond to the operations

that we care about. And whenever a reduction

happens using that rule, we create a node in the tree. So, if we do this,

if we look at this, now here we have an identifier. So, we’ll just create

an identifier node. Create… Or make — let’s use make. Make an identifier node. And here we make a number node. But here, what happens here? Should we make a node or not?>>For which one?>>For this parenthesis

expression. So, in order to see

this, you know, whether we should

create a node or not, if we have something

like, you know, A times — what if we had, you

know, parenthesis here? If we had parenthesis, then, you

know, what we want here is — we want the assign and

we are assigning — we are assigning

the expression to X but this expression

is in this case what?>>Multiply first.>>Yeah, the multiply. The multiplication

of the identifier and this is another expression. So, it’s the add of

number and identifier. So, in fact, you know, with

this parenthesis expression, we did not create a new node. So, this is basically taking the

node that already corresponds to this expression to

whatever this expression — you know, the tree, the abstract

syntax tree that corresponds to this expression, we

take the root of that tree and make it this — you

know, so this is the tree that I’m referring to. So, it’s — you know, our

expression here is this. So, this is the expression. The expression between

parentheses. So, we are taking the tree

that represent this expression and now the root of this

tree becomes, you know, the right operand

for the multiplier. So, when we have parenthesis, we are not actually

creating a new node. Okay. And remember what to keep

in mind here, what’s important to keep in mind is that

this thing is recursive. So, this process of,

you know, parsing, the grammar is recursive

and the abstract syntax tree is recursive. So, you know, this can be —

when you have a rule like this, parenthesis expression, this expression can be just

the root of a whole big tree. So, this can be an

arbitrarily big expression. So, it can have, you

know, its own tree. So, what we’re going

to say here, we’re just basically

linking trees. So, when it comes to

something like this, we are just linking trees. So, we are not making

a new node here. And here, when we are making

this substitution, you know, factor goes to term,

we’re just — we are not creating

any new operations. But the new operation is here. So, we are making a divide, no. Making… Making a divide node. And here we are making

a multiply node. Here are we making

any new nodes? So, basically these rules that

correspond to substitutions, only substitutions

without actual operations. Basically these are the rules

that are not going to appear in the abstract syntax tree. Remember the abstract

syntax tree is going to have the actual

operations and it’s going to eliminate anything

that corresponds to grammatical substitutions,

you know, substituting one

symbol for another. So, if this rule here is

just saying, you know, substitute factor for term

or expand term into factor. So, it’s not doing

any actual operation. This is doing an

actual operation. Make a subtract node and

this is make — what? An add node. And in this assignment,

what are we making?>>The assignment node.>>Making an assignment node. But now, how do we

make these nodes? So, making a node is

basically, you know, a node in the tree

will have two children. So, a tree node has

two children. So, I’m making this

identifier node or number node. This is trivial because

this is a leaf node. So, this is just a

leaf node so basically, we are creating a node and

in this node, you know — think of it as an object and

this object has in it, you know, some type, which is identifier,

this is an identifier. And then it must have the

actual name of that variable. Or what is more important

than the name of the variable is what? When we generate

code we don’t care about the name, we care about –>>The memory location.>>Yeah, the memory

location or, you know, the offset of that variable. So, in fact, you know,

the offset is necessary, it’s necessary to keep the

offset of that variable but keeping the name which

is symbolic information is not necessary. It will be useful, as we

said in the earlier lectures, keeping the name or the

symbolic information, keeping that in our data

structures will be useful for what?>>Debugging.>>For debugging. So, if we assume that we

don’t care about debugging, the essential thing that we

need to keep is only a number, you know, the offset that

corresponds to this variable. So, we no longer need to

remember that it’s called X or Y or Z or whatever, you know, fancy name the programmer

gave it. For us, it’s a number,

it’s a location in memory. Okay. So, this is

for these leaf nodes. Now these are not leaf

nodes, these are nodes that represent operations. So, let’s think about this. You know, making

a multiply node. So, what do we need

to do in order to create this multiply node? So, suppose now we

created the two leaf nodes for identifier and number. That’s easy. You know something, objects

that have some information about this variable and

information about this integer which is the value

of this integer. Then as I’m creating these

leaf nodes, what do we need in order to create this node?>>Nodes that connect to it.>>Yeah, exactly. So, a node has the left

side and the right side. So, this is a binary operation,

multiply is a binary operation. So, it needs two children,

two operands, the left operand and the right operand. So, this is the left

and the right. So, it needs to know in order

to do this multiply node, we need to know, we need to have

a pointer to the left operand and the right operand. In order to link them

together, in order to say that this node has this

identifier as its left child and this number as

its right child. Now, how do we do this? We need a mechanism for

passing variables, you know, passing certain attributes

with data passage created with the symbols

in this grammar. And the way we do

this using the — we will be using

the Yacc notation. So, what is Yacc? Yacc is a parse generator. So, it takes a grammar

that you write. So, you write a grammar and

you write actions like these. Of course, you know, these actions they’re

still incomplete, we haven’t completed them so they still don’t

make much sense. So, it takes a grammar and then

action associated with each rule in the grammar and using

this, it generates C code that does the parsing for you. So, it does all of this NR1,

remember all of this NR1 stuff, you know, the table,

you know, doing — creating the sates and then

generating the action table and the go-to table. All of that stuff

that you have enjoyed in homework number three. So, that pleasurable stuff. So, when you use the parse

generator, you don’t get to enjoy it because the parse

generator will do it for you. So, it will do all

of that work for you, it will do all the creation

of the tables and everything. And it will just

give you C code. So, all that you need to

do is to write the grammar and then write these

actions that are associated with the grammar rules. And, in fact, the main objective of today’s lecture is

understanding these actions, what they mean and what we need

to write in order to create or build this abstract

syntax tree. Okay. So, by the way, this

Yacc — so, that was the first, you know, parse generator

program. Does anyone know

what it stands for? Yeah.>>Yet Another

Compiler-Compiler.>>It stands for Yet

Another Compiler-Compiler. Because it’s a

compiler-compiler. It itself is a compiler

and you give it the grammar and it generates a parse for

you and that parse becomes part of the compiler that

you are building. Okay. So, it’s Yet

Another Compiler-Compiler but we will be using Bison

which is a more recent version of this, you know,

parse generator. Now, using Yacc and Bison,

we’ll be using Bison. So, there is a notation

that they use for each rule in the grammar, they use the

symbol dollar sign dollar sign for the left-hand side and for

all the symbols that appear on the right-hand side, they

use the symbol dollar sign and the number of the symbol. So, for example, in this rule, so this is the dollar

sign dollar sign, this is the left-hand side,

and on the right-hand side, we have three symbols, two

variables, and one terminal. So, everything counts, remember. We count the symbols

to the right-hand side. Everything counts whether it’s

a terminal or a non-terminal. So, term is going

to be dollar sign 1, multiply is dollar sign 2,

and this is dollar sign 3. Okay. So, why do we need this? We need this in order

to say, “Okay, in order to create

a multiply node, I am constructing

dollar sign dollar sign.” So, this is the node for term,

the node for term is the node for this left-hand

side term multiplied by the node for factor. So, I express this as — so

I need to know dollar sign 1, and dollar sign 2 or 3?>>Three.>>Three. Two I already know that dollar sign 2 is

the multiplication sign. So, dollar sign 3 is the third

symbol which is the factor. So, you will be writing things

like this in the assignment. Okay. So, this make

node is going to be doing something like,

you know, this dollar sign, dollar sign is a node that

is made by linking whatever, you know, I have the

node for this term and the node for this factor. So, it’s going to be linking,

you know, and in this case, it will be linking the node for

term of the node for factor. But, you know, when the term

consists of a single factor, then what makes sense here

dollar sign dollar sign equals — what? So, I only have one symbol here.>>Dollar sign 3>>Dollar sign 1. So, this is just like a pass. You know, here I’m passing so

I’m not actually, you know, creating any new nodes. So, basically because this

is just a substitution. When you are substituting,

when you are renaming a factor to term or you’re doing this

grammatical substitution, you are saying that dollar sign

dollar sign is dollar sign 1. Okay. And similar

for the make divide. Of course, we will do at

least one more example on this dollar sign notation so

that everyone understands it. So, this is the first example

but there will be more examples. I’m using this, you know,

dollar sign notation but you will use it

in the assignment. In the programming assignment,

this is what you will be using in order to build a compiler. So, in this case, similarly

dollar sign 1, dollar sign 3. Now, this make multiplication,

what will it be doing? It will be doing something

like make multiply node that takes this dollar

sign 1 and dollar sign 3. While in fact, you know, these

dollar sign 1 and dollar sign 3 for this function is going to be

a regular function that takes — you know what is tricky about

this dollar sign notation is that this dollar sign can — this attribute can

have different types. So, in this case, what do

you think the right type is, given what we have

written so far? What’s the type for dollar sign?>>Integer.>>No, it’s got to be a node.>>Yeah, it’s a node. In fact, a pointer to a node. Exactly. So, this is — what? Dollar sign here is

a pointer to a node. So, if you — it’s going

to take a node pointer, node star N1, and node star N2. And it’s going to

create, you know, create N or create a new node. And then, end of left equals

N1, and end of right equals N2. And you want to return, if it’s

a pointer, then you will return, in fact, the address of N.

Sorry, the address of N. So, don’t worry about the

syntax at this point. It’s, you know, most

likely you’ll be doing this as pointers, you know,

the pointers to nodes. So, basically creating

a multiply node, you are taking a pointer to a

node and a pointer to a node. You are creating — you create

a new node that you call N and then the left-hand

side is going to be N1, the right-hand side

is going to be N2. And you are returning

the address of this node. Or if you create a new

node, you know, at address N or with node pointer N, then you

will return N. So, if you use N for the pointer, create

new node with pointer N, then you will return N.>>That’s a pointer right, the

multiplication at the node N1? Is that a pointer?>>Yeah, that’s a pointer, yeah. So, we are passing pointers. We are passing pointers to

nodes and we are linking them. So, don’t worry about

the syntax. You’ll have to worry

about the syntax when you write the

program but not now. So, focus right now

on the concept. So, the concept is that this

make multiply node is taking two node pointers and

it’s linking them. And if you keep doing this,

you will eventually, you know, the last node that you will

create is going to be what? You keep doing this?>>The leaf node.>>No, by going from

leaves to — you are going bottom-up

so the last node that you will create is

going to be the root node. So, you create the leaves,

it’s bottom-up, right. You are building

this tree bottom-up, in a bottom-up manner. And, in fact, because

we are using this with a bottom-up parse, you

are applying here the use of an NR1 parse, a

shift reduced parse. So, what we are saying here is

that whenever the parse reduces, makes a reduction using rule

number 5, whenever there’s an R5 for this function, okay, you

know, execute this action, this is an action that we

are embedding in the grammar. So, these embedded

actions are the actions that are going to

build the tree. So, we are saying whenever you

reduce by rule 5, execute this, which means link the left-hand

side and the right-hand side. Whenever you reduce by rule

number 6, execute this action. Whenever you reduce by rule

number 7, execute this action which is basically nothing. We are just passing so we are

not creating a new node here. We are saying that

when we do this, we are not creating a node, we

are just taking this identifier, which is what the factor is,

we are taking this identifier and making it this node here. So, by doing this pass

here, we are bypassing or skipping those rules, those grammatical

rules that do nothing. This is a grammatical rule

that does nothing in terms of actual code generation,

you know, it does something

in terms of parsing. Grammatically it’s meaningful,

it’s significant but in terms of code generation,

it does nothing. So, here basically we are,

you know, passing, you know, whatever this dollar sign. So, this dollar sign here

we are passing it to this. It’s going to be the same.>>So, in that case,

it would be the same — you would use that pass

notation on line 4 as well.>>On line 4, yeah, exactly. So, and on line 4, you know,

for the rest of the grammar, in fact, this is, you know, very

similar so for the make add, we are doing, you know, dollar

sign dollar sign equals make of dollar sign 1, dollar sign 3. And this is dollar sign dollar

sign equals makes a subtract node, dollar sign

1, dollar sign 3. And this is just a pass

equals dollar sign 1. And this is an assignment node. So, dollar sign dollar sign

is make assignment node. Okay. And we are assigning again

the left side is going to be 1 and the right side

is going to be 3. So, you may think that

it’s always 1 or 3 but no, it’s not always,

you know, 1 and 3. It happened to be 1 and

3 here because we have on the right-hand side,

we have three symbols, and symbol number 1 and

symbol number 3 are the ones that matter. Symbol number 2 does not matter. So, in order to show

you an example where it’s not, you

know, 1 or 3. So, if we have an if

statement, for example. So, if we have something like

if some condition then some code else some code. Okay. Now, we need to create

a node for the if, right. So, the node for the

if, you’ve got an if. How many children

will this have? What do we need to know

for an if statement?>>The condition.>>The condition, that’s

what we need to know.>>And that codes are

both true and false.>>And the code, this

code and this code, right. So, that the then code

and the else code. So, this is the, you

know, the then code, and this is the else code. This is what we need to know. Everything else, we

don’t need to know. So, in this case, we

will say that — okay. So, we have a make if that will

take three pointers, you know, node star condition,

node star then code, and node star else code. And it’s going to link them. Now, here, in order

to write the action, we will write equals

make if of what? So, dollar sign — yeah, what’s

the number for condition?>>It’s three.>>Three.>>So, you have to

count all the symbols. So, the trick is that you have to count all the symbols whether

it’s terminal or non-terminal. So, the if is dollar sign

1, this is dollar sign 2, condition is going

to be dollar sign 3. So, dollar sign 3

is the condition. Then what is dollar sign 4? Closed parenthesis. You basically have to count

every symbol whether it’s a terminal or a non-terminal. Code is going to be dollar

sign 5, else is going to be dollar sign

6, dollar sign 7. Okay. So, what are

the ones that we need? Dollar sign 3, dollar

sign 5, and dollar sign 7. So, it’s not always 1 and 3. Okay. So, this is

the logic of these, you know, dollar sign symbols. Every symbol on the

right-hand side has a number. And don’t forget the terminals

no matter how small they are. So, if this is, don’t

skip the little ones. Okay. So, this is a small

— this has a number too. Okay. So, any question on

what we have covered so far? Okay. So, let’s try

to finish this up. So, now so we have done this,

the make node, the make node. Here, what do we do here? We didn’t write the

action for this.>>Dollar sign dollar

sign equals dollar sign 2.>>Yeah, exactly. Equals dollar sign 2. Because this is dollar sign 1. This is dollar sign 2. Okay. And making

the number node. Now these are leaf nodes

so they are not dependent on other nodes. So, the parameters to

these make number node and make identifier node, they

are not going to be other nodes because these are the leaves. So, they are not

linking other nodes. You know, all internal nodes

are going to need their children so that they can

link the children. But these are the leaves. So, what do you think the

parameter that we passed to the leaf, what

should that be?>>Number itself.>>Yeah, the number itself. Here for the identifier. So, basically –>>Dollar sign 1 then?>>Is it dollar sign?>>No. Sorry. So, the leaves are different. So, what do you need in order to construct this node

for the identifier? You in fact need the token

that comes from the scanner. Because these are the ones that

are coming from the scanner. The leaves — so these are

coming from the scanner. The scanner classified

this as an identifier but there’s an actual lexeme

that comes with it or a token. So, you need the

token and the lexeme. And, of course, you

know the details, the programming details

you would deal with it when you write the code. But what you need

here is the token. So, remember the leaf

nodes are different because in the leaf nodes you

are not linking other nodes, you are just creating a node

that should have information about this identifier or number. In our example, we

only have identifiers and numbers as leaves. Okay. Alright. So, this, you know, this dollar

sign notation or this way of passing value from

the low level rules to the higher level rules. So, what we are doing here

basically we are passing the values up in a bottom-up manner. We’re passing the values from,

you know, the low level rules to the higher level rules

in the grammar until we get to the top level rule which generates the start

variable of the grammar. Okay. Now, this is how

we will be creating, building the abstract

syntax tree. So, the idea is that whenever

you reduce the parse is going to execute this action. Remember, when the

parse does a reduction. So, you have — this is — you have some state

here, a number of states. And then you have — you reduce

by something like a beta. So, if what you have

here — or you have some, well, beta would states. So, if you reduce by a rule that

says alpha is A or reduce a beta to A. Or let’s call it here

— I don’t know, beta 1, then state beta 2

state, and so forth. So, there would be

a number of states. So, beta consists

of beta 1, beta 2, all the way to beta N. Now, when you do a reduction,

or let me — okay. Let’s do this. So, you have some state and then at some point you have beta 1

then S1, beta 2, S2, beta 3, S3. So, when you are ready to make

this reduction where, you know, beta is beta 1, beta 2 through

beta N, when you are ready to make this reduction, how

many symbols will you count?>>2N.>>2N exactly because I

have 1, 2, 3 through N. So, what you are going to

do is you are going to pop off all of these. So, you know, reducing… by A beta 1, beta 2, through

beta N. So, you’re going to pop 2N items in the stack. And then, S is going to be, you

know, the state that is here or S top, the top of the

stack after popping off. So, let’s call it S

Top is what we get. Then, we push what? If you remember what

we were doing when we were reducing

in the LR1 parse.>>Then we go to the goal.>>No. We are reducing by this

rule, so we are pushing –>>The go to of S Top A.>>We have to push the A first. So, first push A, then we

have to push go to of AN — sorry not AN, it’s the state

which is S Top, and A. So, S Top is the base state. This is the base. Remember this? Top two items push A, then

push go to up S Top and A. And that’s how we do this,

you know, and this is the code that will get generated but

you can execute the action at this point. So, you just execute the action

associated with this grammar. So, when you reduce by

this rule in the grammar, you execute this action. So, whenever a reduction

happens, you execute the action that corresponds to that

rule in the grammar. Of course, when you shift,

you don’t do anything. Right. So, these

actions are associated with reductions,

or new deductions. And in fact, some of the

reductions we are putting like useless actions or actions that will just pass

information up. So, these are pass

type of actions, they will pass the

information from one node in the derivation tree

to the node above it, its term in the derivation tree. So, some of them

will do real actions, some of them will just pass. But with the shift,

whit the parse shifts, there is no action, there

is no action associated with the shift. And the action is associated

with some of the reductions.