Compilers Lecture 2: Compiler Overview (2): Register Allocation Concepts

>>OK. So, we’ll continue
our overview of compilers. And, today we will focus
on register allocation and instruction scheduling. So, first, let’s draw that
picture that we drew last time. So, the compiler, we
have the front end. We have the optimizer. And, we have the back end. OK? And, let’s try to remember,
you know, what each one of these does, just to
review what we covered in the previous lectures. So, what does the front end do? The whole front end, yes. What’s the function
of the front end?>>[inaudible]>>Yes. Go ahead.>>Yes. The scanning,
portioning, and semantic analysis.>>OK. So, scanning
and semantic analysis. There is one other [inaudible].>>Parsing.>>Parsing. So, it’s scanning, parsing,
and semantic analysis. So, what do these
all combined do? You know, what’s the input and what’s the output
of the front end?>>Source code.>>The input is source code. And, what’s the output
of the front end? Yes?>>Abstract syntax tree?>>Yes. An intermediate
representation of the code, which is typically an
abstract syntax tree. And, the abstract syntax
tree is a tree representation of the code. And, we have seen an
example last time. Now, in this middle part of the
compiler we have the optimizer. So, last time, we
gave an example of the optimizations
that take place here. What was that example?>>[inaudible] Subexpression
elimination.>>Yes, exactly. So, it’s common subexpression
elimination. And, last time we mentioned
that the optimizations that take place here
are intended to be machine-independent. So, they are not
low-level optimizations that target a specific machine, or that try to do
good utilization of the capabilities
of the hardware. Optimizations that
try to maximize and optimize the utilization of the hardware are
here in the backend. So, the backend is intended to have those low-level
optimizations that make the best use of
the available hardware, that optimize the code
on a specific hardware. So, in principle, this
should be machine-dependent. The backend is machine-dependent
by nature, while the middle
optimizer is supposed to be machine-independent. But, last time we
said that in practice, these optimizations
here are not going to be strictly
machine-independent. Their nature has nothing
to do with the machine, it has to do with
the code, right? And, common subexpression
elimination, it’s an optimization that has
to do with the code itself. It’s trying to eliminate
some redundancy in the code, or it’s trying to
do some re-use. The idea here in common
subexpression elimination is reuse. Instead of computing
something multiple times, compute it once,
and then reuse it. That’s the idea.
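The reuse idea can be put in a couple of lines; a hand-applied illustration (the function names here are made up for this sketch, not a compiler pass):

```python
# Common subexpression elimination by hand: a*b appears twice, so compute
# it once, keep it in a temporary, and reuse it.

def without_cse(a, b, c, d):
    # a * b is computed twice -- redundant work
    return (a * b + c, a * b - d)

def with_cse(a, b, c, d):
    t = a * b  # computed once; t must now live somewhere, ideally a register
    return (t + c, t - d)

# Both versions compute the same values; the CSE version trades
# recomputation for storage.
print(without_cse(2, 3, 4, 5), with_cse(2, 3, 4, 5))
```

The temporary `t` is exactly the kind of value that wants a fast storage location, which is where the register demand comes from.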
Now, this reuse requires resources. So, when you reuse something,
you want to store it somewhere. So, you have to store it
somewhere, in some place, so that you can reuse it. But, that place that you store
it has to be fast enough. It has to have easy access. If it doesn’t have easy access,
then that defeats the purpose. So– if you don’t
have a fast enough device to store this thing in,
it will not be an optimization. It may be just faster to
recompute, instead of reusing. And, having said that, this
means that for this kind of reuse, the best storage
device for this is the register. So, you want it to
be in a CPU register, which is the fastest
storage device. And, when you keep doing
optimizations like this, that require, you know,
reuse and storage, and require registers
in particular, you are increasing the
demand for registers. And, increasing the demand for registers is not a
good thing, you know? Increasing the demand
for registers may end up slowing performance. Why? Because you
have a limited number of physical registers
on the machine. So, this is something that
we will face in the backend. There is a limited number of physical registers
on the machine. And, if you have, you know,
like, you know, 200 front– you know, variables
in your function. Two hundred variables. And, you have 300
temporaries– why do you think that the generated code
will have temporaries? So, there are the
original variables that are in the source program,
and there are temporaries that the compiler will generate. Why do you think those would–>>[inaudible] swap in
and out to registers?>>Not necessarily. So, let’s look at our example. In fact, the example we will be
doing today, a times b, plus c, divided by d, is assigned to x.>>[inaudible]>>Exactly. Intermediate values. So, if we look at the abstract
syntax tree, we’ll have a and b, they will get multiplied,
but the result of this multiplication has to
be stored somewhere, right? And then, c and d,
they are divided. And then, you take the
result and you add it, and then you assign to x. You assign this to x. Now,
this intermediate result– you know when we wrote some–
the assembly code for this, we wrote the code like this. You know, load a
into register r1. Load b
into register r2. Then, multiply r1 and r2. And, put the result in r3. Now, in this code, we
have three registers. Register r1 holds an
actual program variable. Register number 2 has
a program variable. Register number 3
has a temporary. It has a– you know, it’s– it doesn’t correspond
to a program variable. It stores some intermediate
result, which is the result of the multiplication
between r1 and r2. So, it corresponds to this node
in the abstract syntax tree. So, in general, the code that
is generated by the compiler, it’s generated– the first step in the code generation is
this abstract syntax tree. And, some compilers may, you
know, immediately convert this into some linear
representation like this. Some compilers may delay
this a little bit and work on the abstract, on the tree
representation of the code. So, you know, this is what we
call a tree representation, or a graphical representation
of the code. And, this is linear,
a linear form, a linear representation
of the code. And, last time, you know, we said that usually a compiler
uses multiple representations, multiple intermediate
representations, throughout the compilation
process. So, it will start with– typically starts with a tree
representation that moves to a linear representation
like this, which is abstract assembly. Then it will start making
this abstract assembly more real, targeting the
real machine. So, here we have a
program with some– you know, 200 variables,
and 400 temporaries. And, the machine
has 16 registers. So, you cannot put
all the variables and all the temporaries
in CPU registers. So, there is a competition
for registers. In fact, this is going to be the
main topic in today’s lecture. But, the point here is that optimizations may
require more registers. They increase the
demand for registers. They increase what we call
the register pressure. And, when that register
pressure increases– you know, not all of these
virtual registers can be stored in physical registers. So some of these virtual
registers may end up in memory. We will see this in
detail in today’s lecture. But, when they end up
in memory, that’s slow. So, this means that,
you know, we could end up slowing the execution. Some optimizations
that appear to be good, because they are eliminating
redundancies, or they are doing a lot
of reuse, may end up increasing register pressure,
increasing the competition for registers and
some of the variables on the temporaries will
not be stored in registers, they will be placed in memory. And, in that case, our
execution may slow down. OK. So, in the backend, we
have instruction selection. And, instruction scheduling,
and register allocation. So, we talked about
instruction selection last time. So, what’s the instruction
selection?>>[inaudible]>>Yes?>>[inaudible]>>OK. The best sequence of
instructions for the given code. So, you can either apply
instruction selection– you can apply it to a tree
representation of the code. You can apply directly to a
tree representation of the code, or you can apply it to
a linear representation of the code like this. And then, you want to decide,
you know, what are the best
machine instructions for implementing this. So, this could be a
one-to-one mapping. We could just map these abstract
assembly instructions to actual, you know, concrete instructions
on the target machine. This is possible. But, sometimes this may not
be the most efficient way. Sometimes, you know– what
was the example we presented last time? Yes?>>The load taking
multiple cycles, so rather than having a
multiplied weight on the load, have another load between
those two to pick up the–>>OK. So, that was
instruction scheduling. But, now we’re talking
about instruction selection. In instruction selection,
what was the– you know, a better sequence
for these three instructions? On the target machine? Yes?>>You said there was
[inaudible] machines that the multiply didn’t– couldn’t do it for two
loaded variables rather than registers [inaudible].>>Two memory locations, yes. So, if the machine– you know,
we are doing this assuming that the machine does not
have a multiply instruction that can operate on memory. On memory operands. But if that machine, if the target machine
has an instruction that can
operate on memory operands, we can replace this sequence
of three instructions with one instruction that operates directly
on memory operands. Instead of loading the first, loading the second,
and then multiplying. So, instruction selection
is about utilizing, making the best use
of an instruction set that is available on
the target processor. Making the best choices
and using– selecting the most
efficient sequence of the machine instructions
for this, you know, intermediate representation.
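As an aside, that selection step can be sketched as a toy peephole pass; the instruction tuples and the `mul_mem` name are hypothetical, just to make the idea concrete:

```python
# A toy peephole-style instruction-selection pass (hypothetical instruction
# names): if the target has a multiply that accepts memory operands,
# collapse a load/load/multiply triple into a single instruction.

def select(instrs, has_memory_multiply):
    out, i = [], 0
    while i < len(instrs):
        w = instrs[i:i + 3]
        if (has_memory_multiply and len(w) == 3
                and w[0][0] == "load" and w[1][0] == "load"
                and w[2][0] == "mul"
                and w[2][1] == w[0][2] and w[2][2] == w[1][2]):
            # ("load", var, reg) x2 feeding ("mul", reg, reg, dst):
            # replace with one multiply on memory operands
            out.append(("mul_mem", w[0][1], w[1][1], w[2][3]))
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out

seq = [("load", "a", "r1"), ("load", "b", "r2"), ("mul", "r1", "r2", "r3")]
print(select(seq, has_memory_multiply=True))
```

With `has_memory_multiply=False` the one-to-one mapping is kept; with it `True`, the three instructions become one.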
Remember that this abstract assembly is still
an intermediate representation. By the way, can you think of a– you know, can you think of
an algorithm that takes you from this tree– that
converts this tree into this abstract assembly? Yes?>>Like, just traversing
the tree if–>>Traversing the tree. What kind of traversal
will give us this?>>I’m forgetting
the words for this.>>OK. So, what kind of traversal
will help us in this case? Will get us from this
abstract syntax tree to–>>[inaudible]>>Yes, it’s depth first. And, can you explain
why depth first?>>Because you got to go down
the tree to hit a. And then, you got to back up and
hit b for the [inaudible] and then multiply them
together for the operation.>>OK. So, we need the depth
because if we are here, we want to perform the add. We cannot perform the add until we completely
compute this left branch and this right branch. So, we cannot do the add until
we do everything below it. Until we completely
compute this, and completely compute this. And– but, we cannot
compute the multiply– we cannot do the multiply until
we do everything below it. Everything beneath it, right? So, this is a depth
first by nature. In this case, we have– you
know, these are the leaves. So, leaves are loads, because
we’re loading these variables. Then, when we get
to the multiply, you multiply register 1
with register 2, and then, you load c into register
r3– r4, sorry. And, load d into register r5. And, divide what? [ Inaudible ] Divide r4 and r5. And, put the result in r6. Then– so we have
traversed this, and traversed this,
and now we are here. So, we need to do–>>Add.>>Add, yes. So, this result here
is in register–>>R3.>>R3, and this is in r6. Right? So, dividing r4 and
r5, we have this result in r3 and this result in r6. And then, we need
to add r3 and r6. And, put the result in some
other register, like r7. And now the result
is here in r7. So, what should we do with it?>>[inaudible]>>[inaudible]>>Yes, store it in– so,
the assign is a store. We will– you know, we will
cover this in more detail when we get to code generation,
but this is just, in this case it’s
just so intuitive. You know, this straightforward
conversion from this abstract syntax tree
to this abstract assembly. Store r7 into x. OK.
So, here, the most important observation is that when we are constructing
this abstract assembly, we are assuming that we have an
infinite number of registers. So, this assumption is
one of the main things that make it abstract assembly. So, it’s not real assembly yet. It’s not real because we
are assuming that we have as many registers as we want. So, what did we call
these registers?>>Virtual.>>Virtual registers. So, these are not
real registers. We’re just using as many
registers as we need. So, these registers
are virtual registers. What the register
allocator does– the register allocation phase in
the compiler– what it does is, it maps these virtual registers
into physical registers as we will see in a minute. OK? We’ll see how it does this
mapping of virtual registers into physical registers. Now, I have to make a note
about, you know, these a’s and b’s that I’m
using in my assembly. I think I pointed out that
real assembly does not have variable names. Right? So, real assembly– machines are not aware
of variable names. A machine doesn’t
know variables. Doesn’t even– is not aware of
the whole notion of a variable. A machine knows storage
locations. So, it knows registers
and memory locations. That’s all that the
machine understands. The machine doesn’t
understand variables. But, what are we implying here? If you remember from
Lecture 1, or Lecture 2. When we are using this variable
name here, what are we implying?>>The address?>>Yes, the address of
this variable in memory. And, in fact, let’s introduce
the notation in the book. So, in the textbook, the
textbook uses a language, an intermediate representation
called ILOC, which stands for Intermediate Language
for an Optimizing Compiler. ILOC, Intermediate Language
for an Optimizing Compiler. And, in this language, you know, a load instruction is written
load address immediate, r activation record pointer
address of a into r1. So, what I’m writing here,
this is a shorthand for this. This is a shorthand for this. What does this mean? So, this is load
address immediate. So, this is a load instruction
that takes an address, and it takes an immediate
operand. So, the address is
[inaudible] register. This register has a pointer
to the activation record for the current routine
that we are compiling. And, this address of a,
this is an immediate number that represents what? It’s just a number that–>>Offset.>>Offset, exactly. The offset of variable
a. So, let’s– since we will be using this,
let’s clarify it even further. So, your– you know, if
you have a main function. Your main function– a
program with a main function. The main function will have an
activation record on the stack. This activation record
has the local variables, and it has the parameters,
and it has the return value, and it has the return
address for every function. This is what we mean by
the activation record. When main calls function
1, the activation record for function 1 will get
pushed on the stack. When function 1 calls function
2, the activation record for function 2 will get
pushed on the stack. So, each function, each active
function, has this location on the stack, that we
call the activation record for this function, and it
has the local variables of this function, it has the
parameters of this function. It has the return value. And, it has the return
address, and other things. So, this is the activation
record. So, when function 2
completes, what will happen? When function 2 completes
executing, and needs to return to f1?>>[inaudible]>>Yes. The activation record of
f2 gets popped off the stack. So now, each of these activation
records will have a, you know, an address associated with it. So, if we are in function
2, the address for– the address for function
2 could be, you know, for example, 0xff10. So, this is the starting
address of f2. So, if we are compiling, if the compiler is
currently compiling f2, then this is the starting
address of– let’s make it 00. This is the starting
address of function f2. And, the function f2, you know,
has variables a and b and c. So, the address of variable a
is this starting address, which we assume is
stored in a register that we call the activation
record pointer register. So, it’s a special register
that holds the starting address of the activation record
of the current function. So, in this case, it’s
going to have 0xff00. So, to compute the actual
address of a variable, for a, it’s going to be
this– you know, 0xff00. That’s going to be
for, you know, variable a. For variable
b, it’s going to be 0xff04. Assuming that we have
32-bit variables. This is going to be,
you know, offset 0. This is offset 4. This is offset 8. And, for c, 0xff08. OK.
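That offset arithmetic can be checked in a few lines; a sketch assuming the 0xff00 base and 4-byte variables from the example:

```python
# Address arithmetic for locals in f2's activation record: the activation
# record pointer (ARP) register holds the frame's starting address, and
# each variable is a fixed offset from it (32-bit, i.e. 4-byte, variables).

ARP = 0xFF00                       # starting address of f2's activation record
OFFSETS = {"a": 0, "b": 4, "c": 8}

def address(var):
    # loadAI-style addressing: base (the ARP) plus an immediate offset
    return ARP + OFFSETS[var]

for v in "abc":
    print(v, hex(address(v)))
```

So the shorthand "load a" really means "load from ARP plus the offset of a", which the machine can compute without ever knowing a variable name.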
So, this is the notation in the book. So, a load instruction takes
the activation register which is in– you know,
for a local variable. The base is the activation
record pointer. The address in the– the register that holds the
activation record pointer. And, this is the
offset for the variable. And, what I have here is
just a shorthand for this. You know, it’s understood
that we’re assuming that variables are
local variables, and all addresses are
relative to that activation– to the starting address
of that stack frame. And, a is not a variable,
it’s in fact an offset. OK? Questions on this? Alright. So, usually we
will just do this shorthand. We will not write the full
syntax of the assembly in the book, or the
abstract assembly. OK? Now, how does the compiler– how does the register allocator
map these virtual registers into physical registers? Now, let’s assume that we
have– or, the target machine has three physical registers. If the target machine has
three physical registers– so let’s call them
P1, P2, and P3. So, we have three physical
registers of the target machine, and how many virtual
registers do we have here?>>[inaudible]>>Seven. So, the job of
the register allocator is to do the mapping between these. So, how will it map
them in this case? Well, it’s going to say, OK,
register r1, I’m going to map r1 to P1, the first
physical register. And, I’m going to map r2 to P2. Now, here, obviously we
need two physical registers. Now, this multiply
now is multiplying r1 and r2, and putting the result in r3. The question is, for r3,
do we need a new register, or we can just reuse P1 or P2?>>Reuse.>>So, why can we reuse? Yes, we can reuse.>>Because they’re not using it.>>Exactly. Because they are not used again. So, because r1 and r2 are
not used again, we don’t have to use a third physical
register. We can just reuse r1 or r2. So, we can say, OK, r3 is
going to get mapped to P1. Because after this instruction,
we do not need r1 and r2 again. To explain this further, suppose we had
an additional instruction like this: add r1 and r7,
and put the result in r8. Now, if we had
an instruction like this, would this allocation have
been legal? No. It won’t be legal. Now, here, for r3,
you cannot reuse– you cannot reuse the register
for r1, because I still need it. I still need it here. So, I still need to keep
r1 in a physical register until I get to this instruction. So, basically, you always need
to keep a virtual register in the same physical register
until you are done with it. Until the last use
of that register. So, with this instruction in
red added, I won’t be able to reuse P1, but I
can still reuse P2. Right? P2– you know, r2 is not used. So, with the instruction in
red, I cannot reuse P1,
but I can reuse P2. So, in register allocation–
and, by the way, we are not describing
an algorithm now. We are describing the concept. We’re trying to use the minimum
number of physical registers, or at least use the
registers that we have. And, we are trying to do as
much reuse as possible, OK? So, if we have an
instruction like this, then we would say that,
you know, the live range of register r1 is
going to be this much. So, this is what we call the
live range of register r1. So, register r1 got
defined here, and it got used here and here. So, the live range of our
register is the distance between the definition, this
is where we’re defining it. We’re putting a value in it. OK, we’re storing a
value or a result in it. So, it’s defined here. And, it’s used here and here. So, this is the live
range of register r1. So, register r1 has
a long live range with this instruction in red. While– what is the live
range of register r2?>>It’s just those two below it.>>Yes, just the three, the
first three instructions. So, this is the live
range of r2.>>[inaudible]>>Would the live range start
at the second instruction?>>We’re using it here, right? Oh– yes. You’re right. Yes. So, this is where we are
starting, with the definition. Yes. So, this is the
live range of r2. So, this is the definition,
and this is the last use. While for r1, this
is the definition and this is the last use. OK. Questions on that
concept of a live range? OK. So, the point here is when
we have longer live ranges, we have a stronger
competition for registers. So, we say that we have
high register pressure. In fact, by register pressure,
we mean the number of registers that are live at
a certain point. We’ll say more about this later.
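Both definitions can be computed mechanically from the instruction list; a sketch for the running example, using the convention (explained shortly) that a result may reuse a source's register because the sources are read before the result is written:

```python
# Live ranges and register pressure for the running example. Each entry
# names the instruction, the registers it reads, and the register it
# writes. A virtual register is live from its definition to its last use;
# the pressure after instruction i is the number of ranges still open.

CODE = [
    ("load a",  [],           "r1"),
    ("load b",  [],           "r2"),
    ("mul",     ["r1", "r2"], "r3"),
    ("load c",  [],           "r4"),
    ("load d",  [],           "r5"),
    ("div",     ["r4", "r5"], "r6"),
    ("add",     ["r3", "r6"], "r7"),
    ("store x", ["r7"],       None),
]

def live_ranges(code):
    ranges = {}
    for i, (_, uses, dst) in enumerate(code):
        if dst is not None:
            ranges[dst] = [i, i]      # definition point
        for r in uses:
            ranges[r][1] = i          # extend the range to this (last) use
    return ranges

def max_pressure(code):
    ranges = live_ranges(code)
    return max(sum(1 for s, e in ranges.values() if s <= i < e)
               for i in range(len(code)))

print(live_ranges(CODE)["r1"], max_pressure(CODE))
```

The maximum pressure comes out to 3, which is why three physical registers were enough in the first walkthrough.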
But, let’s now get back to this example. So, let’s now get rid
of this red instruction. Now, we get rid of this
instruction, then the live range of r1 now becomes this. Live range of r1. OK. So now, we can do,
again, you know, we can use P1. Now, for r4, what can I use? Let’s say we will always
use the physical register with the smallest
number that is available. So, what will be available
for us to use for r4?>>P2.>>P2. Exactly. So, for r4, we’ll use P2. Of course, if we use P3,
it would be perfectly fine, but we’re trying to
use the minimum number of registers here. So, r4 we can use P2. For r5– We must use P3, right? Because here, at this point– At this point, we have registers
r3 and r4, both of them are live. So, here when you get to
this point, r3 is live, because it got defined
here, and it’s used here. So, this is the live
range of r3. Right? And, the live
range of r4 is this. So, at this point, r3
and r4 are still live, so we cannot reuse their
registers, P1 and P2, for r5. We must use a third register. Here. OK. So, for
r5, we must use P3. OK? And, for r6, now– so, which registers are no
longer live at this point? R4 and r5, right? R4 and r5, after we execute the
divide, we no longer need them. By the way, we are
assuming that, you know, this instruction, you can reuse
the registers that you are using for the input for the source
operand of an instruction. Because you will read these
values before you write to the register that
will hold the result. So, that’s why we are able
to reuse the registers that are used for the source
operands of an instruction. So now, r4 and r5, they’re
not used below this point, so we can just reuse
either P2 or P3. We will reuse P2. And now, for r6–
so, r3 is in P1. R6 is in P2. Right? And, in fact, after
we are done with this– at this point, after
we’re done with r3 and r6, we can use any register
that we want, right? So, for r7, you can use
anything that you want. And, let’s stick to the rule
of using the smallest number. So, we’ll just use P1 for r7. And then, we store P1 into x. OK? So, in this case, we
managed to map virtual registers into physical registers. And, we could do it with
three physical registers. Now, let’s assume that we only
have two physical registers.>>[inaudible]>>So now, assume two
physical registers. P1 and P2. So, that’s all that you have. Now, let’s redo it with
two physical registers, and see what happens. OK. So now, we can do,
you know, r1 in P1. And, r2 in P2. There is no other
way of doing it. You know, we have two of them. So, we put this in
P1, and this in P2. Now, we have this multiply and we need a register
for the result. Without that red
instruction that I deleted, we can use the register for
r1 or r2, because we assume that it consumes the source
operands before it writes to the destination operand. So, we can put r3 in P1. Now, we need a register for r4, so we can put it where–
where can we put r4?>>Into P2?>>In P2. Now, we need
a register for r5. This is where we hit a problem. Now, we need a register for r5. And, we have, you
know, r3 and r4– — at this point. We are looking for
a register for r5. Now, r3 and r4 are the ones
that have the registers. Are they live or dead?>>They’re live.>>Live.>>They’re live, because
we’re using r4 here, and we’re using r3 here. So, live means that they
will get used later. So, both r4 and r3 are live. So, we can’t reuse P1 or P2. But, we need a register
for this. So, what we can do here is that,
OK, we can say that this r3, we can store it in memory. So, the register allocator in this case would generate
a store instruction. Store r3, which is, you
know, P1, store P1 in a temp. So, this temp stands for a
temporary memory location. Or, let’s call it,
to be explicit, let’s call it m, or mem temp. OK? Now, we can use
this register P1 for r5. So now we can say r5 goes to P1. So now, this is P2, this is P1. When we divide, we divide
P1 by P2, or P2 by P1. And, we’ll put the result in r6. OK? Now, for r6, which
register can we use? Now, after this point,
r4 and r5, are they going to
be live or dead?>>They’re dead.>>Dead. So, we can
reuse either P1 or P2. OK? Because r4 and r5
will no longer be needed. So, we can say, OK,
we’ll put r6– — in P1. Because both of
these guys will not be needed after this instruction. Now, we need r3. But, virtual register
r3 is no longer in a physical register. So, it’s
not in a physical register. We stored it in memory. So, what do we need to do? What’s the logical thing to do?>>[inaudible]>>Load it, yes. Bring it back from memory. So, we load that memory, that
temporary memory location, we store it into–
which register? What are the options
that we have? We only have one option.>>P2.>>P2.>>P2. Because P1 has r6 in
it, and we still need r6. So, the only option
that we have is P2. OK? So now, we can,
you know, add– so, basically adding,
you know, P2 and P1. And then, we can
put the result– once we are done with this, we
can use any register we want, so we can put r7 in P1. OK?
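The whole walkthrough can be sketched as a little greedy allocator; this is a concept illustration, not the algorithm the course will develop, and the farthest-next-use spill choice is one common heuristic rather than anything stated in the lecture:

```python
# A greedy allocation sketch over the running example. Physical registers
# are handed out lowest-number-first; a source's register may be reused for
# the destination, since sources are read before the result is written.
# When nothing is free, the live value whose next use is farthest away is
# spilled to memory and reloaded at its next use (this sketch assumes a
# register is free at reload time, which holds for this example).

CODE = [
    ("load a",  [],           "r1"),
    ("load b",  [],           "r2"),
    ("mul",     ["r1", "r2"], "r3"),
    ("load c",  [],           "r4"),
    ("load d",  [],           "r5"),
    ("div",     ["r4", "r5"], "r6"),
    ("add",     ["r3", "r6"], "r7"),
    ("store x", ["r7"],       None),
]

def next_use(code, v, i):
    return next((j for j in range(i, len(code)) if v in code[j][1]), len(code))

def allocate(code, k):
    last_use = {r: i for i, (_, uses, _) in enumerate(code) for r in uses}
    phys = [f"P{n}" for n in range(1, k + 1)]
    loc, out, spills = {}, [], 0       # virtual -> physical register or "mem"
    for i, (op, uses, dst) in enumerate(code):
        for r in uses:                 # reload any spilled source operand
            if loc.get(r) == "mem":
                free = [p for p in phys if p not in loc.values()]
                out.append(f"load mem_{r} => {free[0]}")
                loc[r] = free[0]
        for r in uses:                 # registers free up after the last use
            if last_use[r] <= i:
                loc.pop(r, None)
        if dst is not None:
            free = [p for p in phys if p not in loc.values()]
            if not free:               # no room: spill a live value
                spills += 1
                victim = max((v for v, p in loc.items() if p != "mem"),
                             key=lambda v: next_use(code, v, i))
                out.append(f"store {loc[victim]} => mem_{victim}")
                loc[victim] = "mem"
                free = [p for p in phys if p not in loc.values()]
            loc[dst] = free[0]
        out.append(op if dst is None else f"{op} => {loc[dst]}")
    return out, spills

print(allocate(CODE, 3)[1], allocate(CODE, 2)[1])   # spill counts for k=3, k=2
```

With three physical registers it reproduces the spill-free mapping; with two, it stores r3 to memory before loading d and reloads it into P2 for the add, matching the walkthrough.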
Now, what’s the point here? The point here is that with fewer registers, when we have two physical
registers instead of three, the register allocator could
not map each virtual register to a physical register. It had to do some
storing and loading. So, in this case, r3 did
not get mapped, you know, entirely to a physical register. In fact, we put it in a
physical register temporarily. Then, we stored it in memory. Then, we loaded it
again when we needed it. So, this kind of store,
we call it spilling. With the terminology
of compilers and register allocation,
this is a spill. So, we basically
spilled it to memory, because we don’t
have enough room for it on the register file. There is no room for it
on the register file, so we spilled it to memory. And, here we loaded it again. So, this means that when we
have high demand for registers, the register allocator
will not be able to accommodate all
the virtual registers. It will not be able to
find a physical register for every virtual register. So, it will be forced
to do spilling. It will be forced to spill
some of them to memory. And, obviously, you know,
these stores and loads that we are adding,
that we call spill code, obviously these are
slowing the execution. These are going to
slow the execution. We are adding instructions. These instructions are going
to be using machine resources, they will use functional units, and they will use all
the machine resources. And, they will be using
the memory system. And, the memory system
has a limited bandwidth. So, you are executing
more loads and stores, and that’s slowing
your execution. So, the job of the register
allocator is to do the mapping with minimum spilling. The job of the register
allocator is to minimize these spills. It tries to accommodate all the
virtual registers, and map them to physical registers. Any questions on these concepts? Yes?>>I have a question now. With the architecture
of today’s processors, wouldn’t when it spills
over to write to memory, wouldn’t it go into
the L1 cache?>>Oh, even if it does– well,
we can– generally speaking, we can never control if
something is in L1 cache or not. But, most likely, when you
are spilling, you know, these spills are going
to go to the stack.>>Yes.>>And, they’re going
to the stack, and they are getting reused–
reused within a short period of time, so they’re very
likely to be in the L1 cache. So, it’s, you know, most likely, you will be hitting the L1
cache, or things will hit in the L1 cache, and there
will not be cache misses. But, even if you hit in the
L1 cache, this is still slower than storing them in a register. You know, L1 cache is still
slower than the register. Accessing L1 cache will
take two or three cycles, while accessing a
register is instantaneous. You can access the
register immediately. So, definitely L1 cache is
much faster than main memory. So, L1 cache you can access it
in two to three to four cycles, while main memory–
to access main memory, you need hundreds of cycles. So, it’s two orders of
magnitude, on modern processors, it’s two orders of magnitude
faster than main memory, but it is still slower
than a register. And, you are also using,
you know, other resources. You are using the memory system. And, your memory system
has a limited bandwidth. You are executing more
memory operations. And, memory operations
are expensive. So, your memory system
has a limited bandwidth. Overall, you have limited
resources on the machine. And, these instructions that
you are adding are just going to use some resources. They will use issue slots. So, it’s an instruction
that will use an issue
slot that could be used for another instruction. OK? Yes. So, you know, we
would expect spill code to hit on the cache most of the
time, but even if it is, it will still slow
the execution. Of course, if it
misses on the cache, it’s a huge performance hit. Any questions on the concepts of register allocation
and spilling? OK. So, is it crystal clear? Yes. OK. So, unfortunately,
you know, optimizations tend to increase the demand
for registers. And, one of these optimizations
that increase the demand for registers is the common
subexpression elimination that we described last time. The other one is the other
optimization in the backend, which is instruction scheduling. So, instruction scheduling– — increases the
demand for registers. We only described it, described
instruction scheduling briefly, but we will describe it in
greater detail next time, what instruction scheduling is. So, the idea in instruction
scheduling is hiding the latencies of the loads. You know, this is a
multiply that uses the result of the load, so why
don’t we just, you know, execute another instruction
while we’re waiting for the load to complete? So, this is a brief and
possibly vague description of instruction scheduling, but next time we will
explain instruction scheduling in detail. OK?
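As a preview, the load-latency idea can be sketched with toy cycle counts (all latencies here are made up for illustration; next lecture covers the real thing):

```python
# Instruction scheduling in miniature: on a single-issue machine where a
# load takes 3 cycles, moving an independent load into the gap hides part
# of the load latency instead of stalling the dependent multiply.

LATENCY = {"load": 3, "mul": 1, "add": 1}

def finish_time(schedule, deps):
    """Cycle on which the last instruction completes, honoring dependences."""
    done, clock = {}, 0
    for instr in schedule:
        start = max([clock] + [done[d] for d in deps.get(instr, [])])
        done[instr] = start + LATENCY[instr.split()[0]]
        clock = start + 1            # issue at most one instruction per cycle
    return max(done.values())

deps = {"mul": ["load a"], "add": ["load b", "mul"]}
naive     = ["load a", "mul", "load b", "add"]   # multiply stalls on load a
scheduled = ["load a", "load b", "mul", "add"]   # load b fills the stall
print(finish_time(naive, deps), finish_time(scheduled, deps))
```

Same four instructions, different order, fewer total cycles; the cost, as noted above, is that more values are in flight at once, which raises register pressure.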
Alright. So– any questions before we end the lecture? OK. So, I will see you.
