Compilers Lecture 29: Global Register Allocation (1)

Compilers Lecture 29: Global Register Allocation (1)


>>So today we talk about
global register allocation. So in the previous lectures, we talked about local
register allocation within the same basic block. Now the problem with local
register allocation is that it doesn’t allow you
to assign a virtual register to a physical register and keep
the value across basic blocks. So, if you have something
like this define our one, use R1, use R1, and you have. In another basic
block, you have use R1. In local register allocation,
there is no way you can assign, you know, this R1 and
this R1 to the same, to the same physical register. Why? Because you are allocating
each block separately. So you are invoking that local
register allocation algorithm. You are invoking it on this
basic block, and it’s working on this basic block, it
doesn’t know anything about this basic block
or this basic block. And when it’s working
on this basic block, it doesn’t know anything about
what’s in this basic block. So this means that when you
do local register allocation, you cannot keep a
value in a register, and you keep it alive
across basic block. But you’ll have to do
in this case is okay, you can probably allocate
our one physical register, like P1 for example, but here in
this basic block, you don’t know about this basic block. So anything that is used in
this basic block will have to get loaded before it’s used. So you have to load R1 here. The only way you can do it with
local register allocation is to load R1 from memory. You have to load it from
memory and then use it. This means that with
local register allocation, you will have lots of loads
at basic block boundaries. And in fact, you will have
loads and you will have stores. You need stores because
in order for this to work correctly,
we have to what? What do we have to
insert to store? [ Inaudible Responses ] Well at the end of
this basic block or after the last definition
of this basic block. So in this basic
block, this happens to be the last definition,
right. So if this is the
last definition, we should insert
a, sorry, store R1. We can do it here. It doesn’t make a
difference in this case. So you can either put the
stores at the very end of the basic block
or you can put them after the last definition
of the basic block. So, what does this mean? It means that when you do
local register allocation, you will end up adding all
of these loads and stores that will make sure that the
values are stored in memory after a definition,
and a value is loaded from memory before a use. Okay. Is the idea clear here
why, you know, the difficulty of or what’s, well in fact why
local register allocation is not very efficient? Yeah. Okay. So in this case, when we
have, when we have a register that is used in this
basic block but is defined in another basic block, we say that this register
belongs to the live in set. So for each basic block, it is a
live in set, a set of registers that are live at this
point, so they’re defined in other basic blocks but
are used int his basic block. And the live-in set
here will have R1 in it. So if we had, you know,
for example, you define R1. So if we are to define R2 and
define, it’s getting messy, okay, define R2, define R3. And we have, use R3 and
we have a use R2 here. So in this case, in the live
in set for this basic block, we have R1 and what is live in? Live in is the register that is
defined in another basic block that is a predecessor basic
block in the control flow graph and is used in this basic block. So in this case, what is
live here is R1 and R3. While the live in here for
this basic block is R2. Now what’s live out
of this basic block, so what is live out? Well in fact, all
of them are live out because they are used late. So live out is going
to have R1, R2 and R3. Okay. So if you had,
you know, for example, if we let’s extend it, let’s
extend this basic block. So if we had define R4 and
use R4, and R4 is not used in any basic block below this,
so let’s call it basic block 1, and this is basic block 2,
and this is basic block 3. So, R1, R2, R3 are live out
because they are used later in successive basic blocks. While R4 here is not live
out because it’s defined here and it’s used here, and it’s
not used below this basic block. Yes.>>Live out for 1, 2, 3 only
because they’ve been stored?>>No, no, no. Because they are used in
successive basic blocks. Because R1 is defined
here and used here, okay. R2 is defined here
and used here. And R3 is defined
here and used here. So you’ll have, you know,
you’ll have a live range. So R1, the define for R1
you have this, define R1, and this is use of R1, so
this is a whole live range. And for R2, you have this
live range, and for R3, it’s defined here,
and it’s used here, so you’ll have this live range. Okay? So, live means
live at this point. It means it has been
defined above this point. But it will be used
below this point. So this applies to R1, R2, R3,
but it does not apply to R4. R4 has been defined above this
point, but there is no use of R4 below this point. So live out means it will
be used below this point. It will be used later. It will be used in a
successive basic block in the control flow graph. While live in here, live in
means that there is a register that is used in this basic
block but was defined in a previous basic block or
in a predecessor basic block. A basic block that is, you
know, above this basic block in the control flow graph. Although the meaning of above
and below may not make sense when we have cycles
in the graph, right. Because if the graph has cycles,
then you can no longer talk about above and below, right. So if you have something like
this, then you can no longer say that this is above
or below this. It’s a cycle. But we are, at least, you know,
without the cycle, we can above and below makes sense. It makes sense to
say above and below. Okay? So, any questions on
the concept of liveness? Okay. So when we have live
in and live out registers, if we are doing local
register allocation, if we are doing local register
allocation, we will need to store all live out
registers, so store live out. So in this case you’ll need
to store the live outs, and you need to load
the live ins. So you store these. Why? Because in some, this
is when you are doing, when you are doing local
register allocation, when we do local
register allocation, we must store all live out
first at the end of the block and load all live in registers
at the start of each block. At the end of each block. So for each block, we will
find the live in and live out. How do we find the live
in and the live out? There is an analysis for
finding live in and live out. It’s not in today’s lecture
covering how we can compute the live in and live out. But it’s something that we will
be covering in a future lecture. Okay? So this is the, what
we need to do with local, with local register allocation. Which means that we
will have lots of loads and stores that may be avoided. If you can do global
register allocation, look at all basic blocks
at once, you may avoid, you may be able to avoid all of
these stores for the live out and these loads for
the live ins. Okay. So let’s look
at an example. So these are virtual
registers, you know. A, B, C, D and E are
virtual registers. Okay. Now let’s look at the
live in and live out sets. Okay, so for this
basic block, live in, there is nothing live in, so
live in, so we’re assuming that this is the
empty basic block. So live in here is going
to be the empty set. Okay? Now, what’s live out?>>A.>>Live out means something
defined here and used later. So –>>So it’s just A.>>Yeah, just A. Why? Because E is not used in any
of the subsequent blocks. So live out is going to be
A. Now what’s live in here? What’s live in?>>Nothing.>>It could be A.>>Yeah, still be A.
Still, A would be live because even though A is not
used in this basic block, it is still used below
this basic block. So there is a live
range like, you know, try to imagine the live range. So there is a live range
that is going this way. Okay. So this live range, this
is a possible bath in the code. So if this path in the code is
taken, A will still be live here because it’s used here. It’s not used in
this basic block, but it’s used in
a successor block. So, A will still be
live in in this case. Okay, so live in is
going to be A. Live in here will have A. Now,
what’s live out for this?>>A and D.>>Yeah, so, live out, A is live
because at this point it’s live. It’s defined here. And so A is used
below this point, and D is used below this point. So and A and D both
are live out. And same applies to this. Live out equals A and D. Okay. And these, in fact, in general,
the live in for this will be in general, what will be
the relation between live in for a load, for a basic block
like this and the live outs of its predecessor basic blocks.>>It’s equal.>>It’s a union of all –>>The union, exactly. Generally speaking, it’s
going to be the union. So this example, they
happen to be the same. So the union is going
to give the same set. But generally speaking, what
is live out, what is live in here is the union
of these live outs. So if we had a C here or an
F or a G, it would have added to the live out to
this, live in, sorry. Live in here equals
A and D. Okay? So now we know the
live in and live out. Now remember, now we are trying to do global register
allocation, so we are trying to avoid, avoid spilling
the live outs and loading the live ins. So when you do local, you spill
or you store the live outs and you load the live ins. Now with global register
allocation, we’re trying to avoid this. And we are trying to keep
these virtual registers in the same physical
register across basic blocks. So we would like to keep A
in the register and would like to keep it in [inaudible]
until it’s used here. Okay, so throughout this
entire live range of A, now this is the definition,
and this is the use, we would like to keep
it in a register. So before we look at the live
ranges here for the global, let’s look at live ranges in the local sense just
because it’s easier. Before we look at global
live ranges, let’s see, let’s make sure that we
understand local live ranges. So if we have things like in
the same basic block define, you know, define A, use
A, define B, use A, use B, define C, use A, use C.
If we have a sequence of defines and uses like this. The live range is going
to be, is going to start at the definition, and it
will end at the last use. So the live range for
A is going to be this. This is live range for A. The
live range for B is going to be, this is defined B, this is used
B, there is only one use of B, so this is live range
of B. And what about C? C is defined here and is used
here, so this is the live range of C. So within a basic block
or in straight line code, live ranges are actually
live intervals. So in straight line code, in straight line code,
or within a basic block, live ranges are live intervals, which means that
they are linear. Everything is linear. Now, now why are live
ranges interested for register allocation? Why do you think when we do
register allocation we care about live ranges?>>So you don’t,
because if you’re going to use a value later on,
later on in that block, you need to have it in a
register so it could be used.>>Okay.>>But if a value’s no longer
needed, it can be dumped, for a different value to
be used for something else.>>Okay, but how
does that notion of live range help us with this? So we’re trying to map virtual
registers to physical registers, and we have these live ranges. So, if we have two live
ranges that overlap, then what does this tell
us about the allocation to physical registers? So in this case, for example, the live ranges of
A and B overlap. So there is an overlap
between the two live ranges. This means that we –>>You need at least
two physical registers.>>Yeah, yeah, two physical
registers, they cannot be placed in the same physical registers. So the key idea is if
two live ranges overlap, if the live ranges of two
virtual registers overlap, you cannot put them in the
same physical register. Okay? If the live ranges are
two virtual registers overlap, you can’t put them in the same physical register. So in fact, this is often, the
compiler’s often represent this by a conflict graph or
an interference graph. So in this case, you know,
A conflicts with B overlaps with B, but does
A overlap with C?>>No.>>Yes. A overlaps with
C, but do C and B overlap?>>No.>>No. So B and C
do not overlap. So if we draw a conflict graph
like this, conflict graph, sometimes it’s called
conflict graph. Sometimes they call
it interference graph. Okay. Just another name, shows
conflicts are interferences. So here, A and B conflict. So there is an edge that
indicates that A and B conflict with each other and A and
C conflict with each other, but B and C do not
conflict with each other. Now in order to find, to
allocate physical registers for these virtual registers,
we can think of this problem as a graph coloring problem. So how many people are
familiar with graph coloring? Okay, so what does it mean? Yeah.>>The graph coloring problem
is like you have like a map and you don’t want two
colors to touch each other.>>Okay. So here we
don’t have a map. We have a graph. So in the context of a graph,
what does it mean to color it? Yeah. It is the same
idea, the same idea, where in the map you don’t
want, you know, two countries that are next to each other,
adjacent, you don’t want them to have the same color. You want different colors. So adjacent, so adjacency
in the context of a graph is represented
by edges. So A and B are adjacent. So we don’t want them
to have the same color. So we can say okay, this
is red, and this has to be a different color. This is blue. Now C is adjacent to A, right. But it’s not adjacent to b.
So adjacency in the context of graph, it means there
is an edge between them. The X and Y are adjacent. There’s an edge between them. So now C can be colored blue,
but you cannot color it red. Right. So this is the
graph coloring problem. So now in this case, one of
these colors will correspond to what in the context
of register allocation?>>It will correspond
to physical registry.>>Physical registers,
yeah exactly. So now, colors correspond
to physical registers. And these nodes are
virtual registers. And we would like to do this
coloring using the minimum number of colors or minimum
number of physical registers. Now in general, for
a general graph, the graph coloring
problem is NP complete. So graph coloring
is NP complete. [ Inaudible ] Okay. So which means that there
is no polynomial time algorithm, there is no known
polynomial time algorithm that will take an
arbitrary graph and color it optimally
using the minimum number of colors all the time. So it’s, you know, you
may find an algorithm that will give you the minimum
number of colors for say 99% of the instances, if
it’s a good algorithm. So it would give you the
minimum number of colors for 99% of the instances
but not for 100%. So there is no algorithm
that is guaranteed to give you the minimum
number of colors for every single instance that
you can input to that algorithm. Okay, so but there are
heuristics, there are heuristics for doing graph coloring. So there are reasonable
heuristics for doing graph coloring. Now let’s, let’s switch
to the global problem. So far we have been talking
about this local problem. We understand what
the live range means. We understand why we are
interested in the conflict between live ranges
or the overlapping or interference between
life ranges. And the relation between
register allocation and graph coloring,
how you can think of the register allocation
problem as a graph coloring problem. You want to color this graph
using the minimum number of colors. Now, what applies to local
register allocation applies to global register
allocation, but here, the notion of a live
range is more complex because it’s no longer linear. You know, our live ranges
here are no longer linear. For example, the live range
for A is going to split here. A is defined here, and it’s
live here and live here. So it’s going to split,
then it’s going to merge. So, a live range in the
global context, in the context of a control flow graph with multiple basic blocks
is not going to be linear. But still, we can talk
about live ranges, even in the context of global. So now, you know, for
example, the live range of A is going to be like this. So A is defined here, and this
is the last use of A, so A, you can think of the live
range of A as, it starts here, and then it goes through all of this basic block,
and it ends here. But this is not the only path.>>Yeah, it’s actually used in
that other range block too, A.>>Yeah, and it goes here,
and it goes through this. But it keeps going
because it’s used here. So this is, so this is what
the live range of A looks like. It’s not a line,
it’s not an interval. It’s not linear. It’s, you know, it
looks more like a web. You know, sometimes they
call the live ranges and the global context webs
because you have lines all over, and they split and then merge. So they don’t look like lines. So this is for A. Now
the live range of E, because E is not used in
any other basic block, so E is basically defined
and used locally, right. So its live range is going
to be an interval like this. This is the live range for
E. The live range for B, so B is defined here
and is used here. So B is another local live
range, which is an interval. For C, another local one. But for D, so D is defined
here, so its live range is going to be this, but there
is another live range, another line, which is this. Okay? So, what does this mean? It means that when we look for
the conflicts or the overlaps between live ranges
in the global context, we have to be more careful. So let’s see if we
can determine now which live ranges
interfere and which don’t. Because if we can draw
that interference graph, then we can think of the problem
as a graph coloring problem. So let’s construct
the interference gap. Let’s start with A. So with which live ranges
does A conflict? So, okay, A and B. Does
it conflict with B? Yeah, so A goes this way. So in fact, it’s
conflicting with everyone. So it’s conflicting
with B. It’s conflicting with C. It’s conflicting with
E. It’s conflicting with, even with D. So if you look at
the live range for A, it overlaps with all other live
ranges, because it’s very long, and it, and this is the last use
of K. So A is going to conflict with all, it’s going to conflict
with B. It’s going to conflict with C, D and E. It
conflicts with all of them. Okay. So this is, you know, this part of constructing the
conflict graph is something that has to be done carefully,
because it’s kind of tricky. Okay? So we know
the interferences of A. Now, what about B? So maybe we should
ask the question, with which other live
ranges does B conflict?>>D.>>It conflicts with D.
Does it conflict with A? Yes, it already conflicts
with it. So B conflicts with D. Okay. So if we look at C, so
here’s the live range for C, with which –>>Just A.>>Yeah, it only conflicts
with A, so it’s already, we already have an edge from
C to A. Now, what about D? Just A and B.>>Well, it conflicts with
A, and it conflicts with B, but both of these conflicts
have already been captured in the graph. Now what about E?>>Just A.>>It only conflicts
with A. So this is going to be our conflict graph. Now looking at this
conflict graph, of course, you know with simple instances
like this, we can easily figure out the optimal solution, right. So we only have five nodes, so
by inspection, we can figure out the optimal solution here,
the minimum number of colors. What’s the minimum number
of colors that we need here?>>Three.>>Yeah. Why is it three?>>Because there’s
three connected nodes.>>Yeah, exactly. Because A, B and D, so
these are connected. So these are all, not only
connected, but each one of them is connected
to the two others. So this subgraph in red,
what do we call this?>>That’s a cycle.>>No, and when a number of nodes are all
connected to each other –>>A complete graph.>>Well, it’s a complete
subgraph. So what do we call it? What’s the name for
a complete subgraph? So have you heard of clique? It’s a click. So this is a clique
of size three. So clique, it’s a
subset of the nodes that are all connected together. They form a subgraph,
a complete subgraph. So this is clique
of a size three, a clique that includes
three nodes, this means that you cannot do
it with less than three colors. So, at best, you will
need three colors. So, if in this case, if the target machine has three
physical registers or more, we don’t have a problem. We can color this. You’re assuming that you know, the graph coloring
algorithm is going to discover the best coloring. And there are good heuristics
for doing graph coloring. They are not perfect,
because it’s NP complete. The problem is NP complete. But there are good heuristics that will discover the
optimal solution most of the time, for most instances. So assuming that we
have a good heuristic and we have three physical
registers, we have no problem. We can color them, so we can
say, for example, that this is, you know, this is
blue, and this is red. And this is maybe green. Okay. And see, you can
give it any color but blue, so it can color it red. And E, you color it
any color but blue, so it can color it red. So this is if you have
three physical registers. What if you have two now? What if you have two
physical registers? So what will make sense to do if
we have two physical registers? Well, in this case,
in this case, having two physical registers
means that you cannot, there is no way you can
allocate a physical register for every live range. So you have so many conflicts that you cannot keep each live
range in a physical register. Now, you know, before
we move to this, remember that if we have
three physical registers here and we manage to, you
know, to map each one to a physical register, now
we have this A, we will put it in one physical register,
and it can keep it through all basic blocks. So unlike local register
allocation, you will no longer need to do
your stores here and loads here. Okay. So if you have
three physical registers, you can keep A in a physical
register, and you don’t need to store A here and
load here and here. Okay, now if we have two, we have to victimize
one live range. And the live range that we will
victimize is going to be one of three live ranges, all right, the ones that have
this big conflicts. So these three cannot be kept in the same register
at the same time. So we need to victimize
one of them.>>D.>>Now how do, what will
make sense to consider when selecting a live range
to victimize or to spill? To make a spilling decision. Well, compilers will
look at how, you know, the cost of this spill. Well, compilers will
look at how, you know, the cost of this spill. So how much, they will
estimate the spill cost. What’s this spill cost? The spill cost is how
much it will cost me to spill this live range. And spilling a live range
means you insert a store after every definition, and you
insert a load before every use. So spilling, spilling
a live range means inserting a store
after every definition. Spilling live range of X,
spilling the live range of X means inserting a store
after every definition of X and a load before every use of
X. So in this case, for example, if we choose to spill A,
spilling A, we haven’t decided to spill A. But just assuming
A will be the register that will end up
getting spilled, we will put a store here,
store A, after the definition. And this is the use of,
no, there is no A here. It is a use of A here, so
what do we insert here? Load. Load A. And what else?>>There’s one more use.>>Yeah, there is a use. So what do we insert?>>Load.>>Load. Load A. So,
what’s spilling means is that we decided, we haven’t
decided this for A yet, but if we decide, if we end up
spilling A, we will put a store after A, after every definition,
so that we store it in memory. And we put a load
before every use. What does it mean? It means that we
spilled it to memory. It’s no longer in the register. And A is no longer
in the register. And it means that after
we are done with it, we just put it in memory. And whenever we need
it, we have to load it or fetch it from memory. Okay, so this is
what spilling means? Now, given this definition
of spilling or how spilling is
implemented, will it make sense to spill the register that has
lots of definitions and uses or the register that has fewer?>>Fewer.>>Yeah, the one that has fewer. Because the one that
has fewer definitions and uses will require
fewer loads and stores. Our objective is to minimize
the number of loads and stores that we will be inserting
into the code. So we prefer to spill
the register that has fewer defs and uses. But the number of defs and uses
is not the only consideration. We also have to take
into account what? Well, not all basic
blocks are going to get executed the
same number of times. Because some basic blocks
may be inside loops. Some basic blocks
may be outside loops. So if a register is used
only once in a basic block that is inside a deeply
nested loop, then that load that we will be inserting, that load will get executed
a large number of times. So if there is a block and
this block is inside of a loop, and we just put one load here, and this loop gets
executed a million times, then this load will get
executed a million times. So, we’ll have to take
into account the frequency of execution of the
loops, if we know that. Of course, you know, compile
time, remember that this is, all of this is done
at compile time. So the compiler doesn’t know
precisely what, you know, how many times each basic
block will get executed. Because not all basic blocks,
not all loop, trip counts or loop, the number of
iterations of the loops, not all of them are known
at the compile time. So you can have something
like this 4I equals zero, I less than X, I plus,
plus, and you have a loop. And then read X. So X is read
from the input or from a file. So if X is read from the screen
or from a file or whatever, at compile time when the
compiler is looking at this, it doesn’t know how many times
this loop will get executed, right. Yeah. So in general, the
compiler doesn’t know at compile time how many
times a loop gets executed. It doesn’t know how many times
a basic block will get executed, but usually it comes
up with some estimates. And it can distinguished
between basic blocks that are inside loops and basic
blocks that are outside loops. So if there is a, you know,
if there is code in here, basic block here, then we
know that it’s inside the loop and most likely it will get
executed more frequently than other basic blocks.>>So it gets a higher
priority for register hours.>>Yeah, exactly. So we try not to spill, we
try not to spill inside loops. We try to spill outside loops. And again spilling, you know,
adding these loads and stores. We try to do it outside
loops, not inside loops. But the point that I’m trying
to make here is at compile time, the compiler doesn’t
have precise information. It may come up with
some estimates. So in fact, register
allocation would be based on these estimates. So if the compiler comes
up with, for example, this loop will get, this basic
block will get executed once, and then it will split. So this is going to be half,
we’ll have a weight of a half. And this will have
a weight of a half. But this is inside of a loop. So, this is going to get
executed ten times, for example. So, the register allocator
is going to do the analysis, the analysis based on
the execution frequency with these estimates
that are, like I said, usually are not accurate. Now given these estimates, which register do you
think it will be spilling?>>D.>>Yeah. Well no, D is here. So D is here. So A and D are here.>>B.>>Yeah, so B because B is the
one that is outside the loop. So if it spills D, sorry,
B, B is only in this loop. And it will insert, you
know, a store here, store B, and here it will insert what?>>Load.>>Load B.>>Oh, okay.>>Okay. So in this case, it’s
going to split this live range. So the live range of
B is this and this. So now if it does this, B
will no longer conflict with, will it still conflict with D?>>Yeah D.>>It will still conflict
with D and A. So in fact, what this means is that the
compiler, with this algorithm, we will identify the live
range that is least expensive to spill, which is in
this case, B. We spill it, and then we try again. We construct the
conflict graph again, and we try to color again. This may or may not help. So the spilling may
or may not help. May or may not make
things easier. Things may continue to be hard. And in this case,
it didn’t help. So if it doesn’t help, it
will make another decision. Okay. It will, it
will try now to take, to spill the next cheapest
live range to spill. And now it’s not obvious,
so we’ll, you know, we have to do some calculations. In fact in general,
it’s, you know, one way of calculating this is
to count the number of loads and the number of stores and
to multiply each load or store with the execution
frequency of that basic block. And that will give you
an accurate estimate of the spill cost. And since we are running
out of time, you now, we will not get the chance
to do this in detail. But if you do this, if
you count how many stores and loads you will need, and
then you multiply each one by the execution frequency, you
will get a higher cost for A because A has more
definitions and uses than D.

Leave a Reply

Your email address will not be published. Required fields are marked *