# Compilers Lecture 25: Code Generation for Conditionals and Loops (1)

>>Today, we’re going to be
talking about code generation for conditionals and loops. So this is — this is
an important lecture. It’s very fundamental stuff, and
it’s something that you’ll have to do in the next
assignment, in assignment five. So in assignment four, you
only generate the abstract syntax tree. In assignment five, you take
the abstract syntax tree, and you actually
generate assembly code. Last lecture, we covered — we discussed how to
generate assembly, or abstract assembly
for — what?>>The valuation.>>Yeah, for what
language construct?>>Expressions?>>Expressions.>>For an expression,
yeah, for expressions. So last time, we did code
generation for expressions, and the point there is that
it’s recursive, you know. It’s — the idea is
to do it recursively. So, because an expression
has expressions within it — an expression consists of
expressions, and the grammar for the expression is recursive. So the generation of code for
an expression is recursive. Okay, so today, we’ll
talk about code generation for conditionals and loops. So let’s consider, in a code
like this with statement — some statement, then-if
condition. So for now, let’s look at — let’s consider a
simple condition, like register one is
less than register two. Then we would see how we can
generalize this to generate code for a general — a
general condition. So let’s write it condition. Let’s write it condition
in general, and if condition is true,
then we do statement two. In fact, it will still be
the same — you know, the — conceptually, it will be the
same if statement two was a list of statement — a
list of statements. Else, do statement three, and
then you have statement four. so now, if you have an “if”
statement, or “if-else” like this, the abstract syntax
tree for this will look like — there will be an “if” node,
and for this “if” node, what are the nodes
that matter for you?>>The condition.>>Yeah, the condition.>>Statements two and
three, and “else.”>>Yeah. Okay — well, “else”
will know that it’s an “else.” So if — this is an —
well, “if-else” loop. Let’s call it an “if-else” node. So we know that it’s
an “if-else,” because when you recognize
it, when you write the action, the embedded actions, you
will write an action for an “if” statement in which you will
have an “if” and a condition, and only, you know,
one statement or statement list
to be executed. So this is for an “if,”
and for an “if-else,” your abstract syntax tree is
going to be “if, condition, statement one, and statement
two,” or “statement list.” It’s going to be “statement
list one, statement list two.” So your embedded action for
an “if” would look like — something like this,
construct something like this. And your embedded action
for an “if-else” — your embedded action for
an “if-else” is going to construct something
like this. Because for an “if-else,”
you need something to execute if the condition is true,
and something to execute if the condition is false. Okay, so here, we’ll
have an “if-else” node, and we will have a condition. And we’ll have a statement
two, and statement three. Okay? Now, how do we
generate code for this? Well, we need — in
order to generate code, the code that we will be
generating is abstract assembly. So it’s not going
to be real assembly for any specific machine. So it’s generic assembly. That is, you know, assembly
instructions that are likely to exist on most machines,
and in particular, you know, RISK [phonetic] machines,
which have fewer instructions. So we — the abstract assembly that a compiler uses is usually
generic, and it’s something that you expect to
have on most machines. Remember that some compilers,
you know, such as, you know, GCC for example — some
compilers generate code for many, many different
machines. So — and in order to — before they generate code for a
specific machine, they generate that generic assembly,
that generic assembly that is expected to — not
to be found on all machines, but to have an equivalent,
some kind of equivalent, okay? So in our generic assembly, we
will use the generic assembly in the book, which is
what they call ILOC. This is in the textbook. This is the generic assembly. ILOC stands for –>>Intermediate language
for an optimizer.>>– yeah, intermediate
language for an optimizing compiler,
and in this language, we will have some compare
— compare instructions. So “compare for less-than”,
and “compare for less-than-and-equal,”
“compare for, underscore, greater-than,” “compare for
greater-than-and-equal,” “compare for” — what else? Less than, less-than-equal,
greater than, greater-equal — what’s that? Equal, right? You need equal, and
you need what? What remains here? What’s the remaining
— not-equal, yes. Not-equal. So we have these
“compare” instructions. So this “compare” instruction
takes two registers, RX and RY, and it writes the result to RZ. So it takes two registers
with arithmetic values, and it puts the result in RZ. Now, in order for
this to be useful, to implement a conditional, what
else do we need in addition to a “compare” instruction, in
order to implement a branch? Sorry, an “if” statement? Yeah?>>You need a jump space –>>Yeah, yeah, jump, or
branch based on the condition. So we need a conditional branch. So if we put this
result in RZ, then if — if we have an instruction
that will — that will branch, a conditional branch based on a
register, and it will go to one of two possible addresses,
or labels. So “compare less-than” —
it means compare RX and RY, and if — if the result — if RX is less than
RY, this will be true. RZ will be true. Otherwise, RZ will be false. Here, this will look at the
result, which will usually come from the result of a
comparison, and it — this will look at the result,
and based on that result, if the result is true,
it will go to L1. If the result is
false, it will go to L2. So if R is true, go to L1. Else go to L2. And here, you know, to write
the — for this comparison — so if RX is less than RY,
RZ is going to be true. Else, RZ is going to be false. This is what this
“compare” instruction means. Okay?>>What was CBR?>>Conditional branch. So CBR is — stands
for conditional branch. Okay? So now, let’s try to — to
write the equivalent code here. So statement one is going to be
statement one, whatever that is, or, you know, this could
be a whole block of code. So we’ll just generate code
— this is straight-line code. So this is statement one. So we’ll have statement one. Then, assuming that —
let’s replace this condition with a simple condition. Then we’ll see how we’ll
— we can generalize it. So let’s assume that our
condition is R1 less than R2. So if our condition is R1 less
than R2, we can do “compare less than R1, R2,” and we put the
result in some register, R3. Then we do what? Then — so we are
comparing R1 and R2. If R1 is less than
R2, R3 will be true. Otherwise, R3 would be false. So what should be
the next instruction? Yeah, conditional branch on
R3, and if it’s true, go to L1. And if it’s false, go to L2. These are two labels. Now, L1 — what will
we put at L1?>>Statement two.>>Statement two, and at L2 –>>Statement three.>>– statement three, okay. So is this complete?>>No.>>What is missing?>>Jumping — well,
not jumping, but you — to execute statement four.>>Okay, but when do we
execute statement four? So, okay — so we’ll
put statement four here. Is this enough?>>No.>>Need to have a jump
after statement two to go to statement four. So you need to have an L3.>>Yeah, exactly. So this is incorrect, because
this will execute statement two. Then, if it’s true, it’s going to execute statement
three as well. But we don’t want
it to do three. If this is true, we want
it to jump to, say, L3. And this is L3. Okay? So we also — we
assume that we have a jump. Well, in fact, this
is a jump immediate. So this is a jump-I. So we have a jump that
jumps to a register, and we have a jump immediate
that jumps to a label. Okay, so jump immediate
jumps to a label. The label is just a number,
you know, a constant. It — or in — when
it gets translated to actual machine code,
this label will be what?>>Some, like –>>It’ll be an address.>>Yeah, it will
will be in this register. So this is indirect —
this is one level — one more level of indirection. The actual address
will be in a register. But we are using here
the jump immediate, because these numbers
are labels, okay? When we are generating
assembly, these are labels, and then the assembler in
fact will take these labels and convert them to addresses. Okay. So this is the
— this is code for the “if” statement, or “if-else.” Now, it’s — let’s introduce
the important concept of a basic block. So the compiler will
generally divide a program into basic blocks. What’s a basic block? A basic block is a
straight-line piece of code that doesn’t have any branches. So in this case, whenever
there is a branch, that will mark the
end of a basic block. so it’s — in this
case, we’ll say, “Okay, this is basic block one.” Then this is a branch,
so we are branching here. So this marks the
end of a basic block, and we will start
a new basic block. Now, this basic block
will end where? Yeah, exactly. At the end of the jump, we
will end the basic block. So this is basic block two. And then, we have
statement three here. So should we have a basic
block boundary here or not?>>Yes.>>Yes. Because these are?>>Because L2’s conditionally
executed.>>Yeah. So this is
not straight-line code. So in fact, whenever
there is a branch, that marks the end of a block. Whenever there is a label, that marks the beginning
of a new basic block. So let me write it here. So — — so every branch —
every branch or jump, whether it’s a conditional
or an unconditional branch, marks the end of a basic block. And every label marks the
start of a new basic block. So in order to determine
the basic blocks, you just scan the code. In the beginning, you are
in basic block number one. Whenever you encounter a branch
or a jump, that marks the end of the current basic block, so
you start a new basic block. Or if you encounter a label, that label should be the
start of a new basic block. So in this case, we
have four basic blocks. So these are important concepts,
although they are not — they are not hard at all. I mean, these concepts are very
intuitive, but they’re very — very fundamental in
compilers and code generation. Now, there is the concept
of a control-flow graph. The compiler would
construct a control-flow graph that will represent
the control flow. So in this case, the
control-flow graph — — C –>>CFG?>>– CFG — you know, again, in
this course, CFG is overloaded. CFG can stand for context-free
grammar, and can stand for control-flow graph. So a control-flow graph for
this is going to be BB1, and from BB1, you can
branch either to BB2 or BB3. Then what happens?>>Well, BB2 and 3 go to BB4.>>Yeah, exactly. Both of them are going
to merge into BB4. So the concept of a basic block
and the control-flow graph — these are fundamental concepts, but they are very
intuitive, I believe. Okay? All right. So now, why — basic block is — why does the compiler divide
the program into basic blocks? Because, as we will see
when we discuss some of the compiler optimizations,
it will — it will make things much
easier, and much more logical when it comes to, you
know, optimizing code. So, you know, for
example, moving code or making transformations
to the code within a basic block is going to
be much easier than moving code across basic blocks,
or making — you know, applying
transformations that go, you know, across basic blocks. Within the basic block,
things are very simple, easy, and it’s a straight-line
piece of code. You don’t have branches. You don’t have control flow. It’s straight-line
piece of code. When you start executing
the basic block, everything within the basic
block will necessarily get executed until you get to
the end of the basic block. So a basic block is
a straight-line piece of code, okay? Now, one typical representation
of the code, you know, after we — after
the compiler — after we generate the
abstract syntax tree, a typical intermediate
representation is to represent the program
as a control-flow graph, and within each —
within each basic block, there is a list of instructions. There is a linear list
of assembly instructions. So in this case, you
know, in basic block one, you will have these
instructions, or, you know, if statement one is not
an actual statement, if it’s a block of
100 instructions, then the basic block will have
those 100 instructions in it. Okay? So within each basic
block, there will be a list of — questions on
these concepts? Okay, so next, we will see how
we can generate code for loops. Then we’ll see how
we can generate code for a general condition. Okay? So let’s see —
let’s look at loops. Let’s look at a “while” loop. So now, for a “while” loop,
you have statement one. Then you have a “while,”
and you have a condition. So let’s now — let’s use a
simple condition again, like, you know, R1 is —
you know, equals R2. And then, you have some
statement in the loop, and it will be the same if
this was a list of statements, or a block of statements. It’s going to be the same
thing, and then, you know, you have a statement three. So now — — now you can do it —
instead of “compare less than,” it’s going to be “compare” what?>>Equal.>>”Compare equal R1 and R2.” And if they are equal,
you — R3 will be true. Then you branch to L1 or L2. Now, if it’s tree — if it’s
true, what should L1 have? Yeah, statement. So L1 will have statement
two, and L2 will have — you know, will break
out of the loop. So it would be statement three. What’s missing?>>Well, you need a jump — it needs to jump back
to the “compare equal.”>>Yeah, exactly. Exactly. So –>>So you need a label for that.>>– yeah, exactly. So there are two
ways of doing it. You can — let’s
do it both ways. I mean, you can either
do the “compare” here — you can have a copy
of the “compare” here, so you can either “compare equal
R1 and R2,” and put the result in R3, and then conditional
branch R3, L1, L2, or you can just go to — — okay, and conditional — jump conditionally
either to L1 or to L2. Or you can do statement one,
“compare equal R1, R2, R3,” then conditional
branch R3, L1, L2, and L1 is the loop
party statement two. And then, you do a
jump — jump immediate. So what do you need now?>>Another label.>>Yeah, you need a label here. So you need a label. So let’s call it L-zero — L-zero for this jump
immediate to L-zero. Then, L2 is statement three. Okay? Now, in this case,
you’ll have different — slightly different layouts. So in this case, to divide
this into basic blocks — so you scan, and this
is a conditional branch. This marks the end
of the basic block, so this is basic block one. Then the label is
a new basic block. Then compare conditional branch. This is basic block two, and
this is basic block three. While here, how many basic
blocks do you have here? No, you have one
more basic block. Again, the rule is,
scan the code. Whenever there is a label, that marks the beginning
of a new basic block. So this is basic block one. This is basic block two. This is the end of
basic block two. This is the end of
basic block three, and this is basic block four. Okay? So now, to draw the
control-flow graph, we have — control-flow graph, we
have basic block one, and from basic block one, we
can go either to two or three. Basic block two, basic
block three — okay? And what — what else? So from basic block one, we
can go either to two or three. From basic block
two, we can go –>>Back to one or to three.>>– nope. L1, which is basic
block two — okay? So it’s a self-loop, or we
can go to basic block three. From two, we can go to three. Three, we don’t have anything. So this is a control-flow graph. Yes?>>I had a question. In the first three lines
of our two examples here, I don’t see any difference
in the code, but in one, we have just one basic block. The other one, we divided
it into two basic blocks.>>Because of the label L-zero.>>Because of the label. So remember, every label
marks the beginning of a new basic block.>>Why did we put
the comparison. We don’t want to jump
to statement one.>>Okay.>>Right? So want to jump
to do the comparison, okay? So this is control-flow graph. Now, the control-flow
graph for this is going to be basic block one, then — — basic block one — sorry, no,
basic block one doesn’t go — okay, let’s put it here. Where does basic block one go? It only goes to basic
block two, right? Basic block one, there
is no conditional branch, so it falls through all the
time to basic block two. Basic block two is
going to go to B3 or B4. Okay, two is going to go
either to three or four. What else?>>Three goes back to two.>>Three goes back to two, yes. We have to have a — you
know, we have to have a cycle in the graph when
there is a loop. If we have a loop, then the
control flow graph must have a cycle. If you don’t have a cycle,
then there is something wrong. The cycle indicates a
loop in the graph, okay? So this is how we generate code
for loops and conditionals, and in fact, now, this is all
what you need to know in order to build, you know, our
basic compiler for the — for the baby language,
for our baby language. These are all the — these are
all the features that we have. We’ll talk now about
logical expressions, but it’s — in fact, it’s something that
we have already talked about. It’s not very new. So, okay. So now — — now what if we
have a condition — you know, a general condition,
like A is greater than B, and C equals D, or E is
less than or equal F? Now, if we have a
condition like this — in fact, we have seen
how to write a grammar for a Boolean expression. So in fact, writing a grammar for a Boolean expression is very
similar to writing a grammar for an arithmetic expression,
because the “and” maps to — what does the map —
the “and” correspond to?>>It’s “and.”>>Yeah. Well, what
does it correspond to in the arithmetic expression?>>It’s multiply.>>Yeah, multiply or divide,
and the “or” corresponds to –>>Oh, okay.>>– plus or minus. So the — and we have written. We have actually written a — the grammar for a
Boolean expression. Now — so generating code for
a Boolean expression is going to be similar to generating code for an arithmetic
expression, all right? It’s — it’s the same
idea, but, you know, the operations are less than or
equal, or greater than or equal. So to make sure that
we understand this, that we fully understand this, let’s draw the abstract
syntax tree for, you know, this Boolean expression. So for this Boolean expression,
the abstract syntax tree for this will look like — what
will the abstract syntax tree for this Boolean
expression look like? What will be the root node?>>”Or.”>>Yeah, it will be the “or.” So if you generate — if
you write the right action, and if you write
the right grammar for a Boolean expression, as
we did it in previous lectures, your root is going
to be the “or.” And this “or” is going
to “or” these two things. This is one thing,
and this is another. So this is an “and” —
this is an “and” term, and this is another “and” term. So it’s going to “or” what? The “and” of?>>A greater than B.>>A greater than B. So this
is going to be greater than — greater than with A and
B. And this is going to be equal, C and D, okay? And this is going to be what? Less than or equal
E and F. Now — and this is something that
you will do in the assignment. You know, the same way we
wrote code for generating — for generating assembly for
an arithmetic expression, same thing applies to a logical
expression, but, you know, you’ll have to take into account
the different operations. So here, you have
greater than, equal, less than or equal, whatever. So for example, the code
for this will look like — what will be the first
statement or the first –>>It’s going to
be a load on the A.>>– yeah, exactly. So first –>>R –>>ARP to A.>>– yeah, so load
AI address immediate, R activation record pointer,
B into R2, and then do what?>>Do a “compare.”>>Yeah, do a “compare
greater than.” “Compare greater than of R1 and
R2,” and put the result into R3. So this is R1, R2, R3, and then?>>And then load C.>>Load C and D. Address of C into R4, and load
activation record pointer, address of D into R5. And then do what?>>Compare.>>Then “compare equal — compare equal R4 and R5,”
put the result in R6. So this is R4, R5, R6. Then do what? Then you will “and,” right? So you will “and” — so, well — yeah, so let’s be
consistent and do lowercase. So you will “and” this
with this, R3 and R6, and you put the result into R7. So we assume that we have
an “and” instruction. Okay? Now, what should
— what — yeah, load E. So we
are done with this. So now, we load — — address of E into
R8, and load — — address of F into R9. And then, we “compare less
than or equal” of R8 and R9, and we put the result into R10. So this is going to be R8,
R9, R10, and the result of the “and” was put in R7. So now, what’s the
last instruction? “Or-ing” R7 and R10,
“or-ing” R7 and R10, and putting the result in –>>R11.>>– in R11. Okay, so R11 will have this. So now, if — if you had this
condition in a “while” loop — so if you have this in a
“while” loop, “while this, do statement two,
statement one.” So in this case, you will
have to generate the code for the condition,
and the result of that code is going
to be in R11. So you will need to do all of
this, then conditional branch on — conditional
branch on what?>>On R11.>>R11, which has the result of
this whole logical expression. So basically, you know, you
will write a recursive routine for generating code for a conditional no matter how
complicated that conditional is. Then the result of that
conditional will be stored in some register, and
you conditional — you do a conditional
branch on that as well. And that goes to L1 or L2. Okay. So do you know now
how to write code to — how to write code to
generate code for, you know, these basic constructs,
arithmetic expressions, logical expressions,
conditionals, and “while” loops? Okay. So this is basically, you
know, what you need to do for — in order to write that —
to build that compiler. So in this case, you know, this
is going to be our condition. So, you know, you will
have a “while” loop. So you’ll have a “while” node, and this “while” will
have a condition. And it will have a body, and
this condition, you know, can be a complicated
condition like this. So you’ll have to
basically invoke the function that generates code
for this condition, and this body could be anything. This body could be a list
of statements, and that list of statements can have
— you know, within it, it can have “if-else.” It can have another loop. It can have anything. So basically, it’s
all recursive. You know, right — you know, programming languages
are recursive. So within this body of the
loop, you can have anything. So within this, you can have
“if-else,” “while” loops. You can have a — you know,
complex expressions, arithmetic and logical expressions. Okay. So any questions on this?