The following content is

provided under a Creative Commons license. Your support will help MIT

OpenCourseWare continue to offer high quality educational

resources for free. To make a donation or view

additional materials from hundreds of MIT courses, visit

MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: Today, we’re moving

on to what will be a major unit of the course, which is

the topic of efficiency. Thus far, we focused our

attention on the admittedly more important problem, getting

our programs to work, i.e., to do what we

want them to do. For the next several lectures, I

want to talk about how do we get them to work quickly

enough to be useful. It is in practice often a very

important consideration in designing programs. The goal is not to make you

an expert in this topic. It’s hard to be an expert

in this topic. I’m certainly not an expert. But I want to give you some

intuition about how to approach the question of

efficiency, how to understand why some programs take much

longer to run than others, and how to go about writing programs

that will finish before you die. And we’ll see that if you

write things wrong, the programs could, in principle,

run longer than you can. So why is efficiency

so important? Earlier in the term, I started

to spend some time talking about how really fast computers

are and showing you that we can use brute force

algorithms to solve fairly large problems. The difficulty is that some of

the computational problems we’re confronted with are not

fairly large but enormous. So for example, in my research

group where we work at the intersection of computer science

and medicine, we have a big database of roughly a

billion and half heart beats. And we routinely run

computations that run for two weeks on that data. And the only reason they

complete it in two weeks and not two years is we were

really careful about efficiency. So it really can matter. And increasingly, it matters

is we see the scale of problems growing. The thing I want you to take

home to remember is that efficiency is rarely about

clever coding. It’s not about some little

trick that saves one instruction here or two

instructions there. It’s really about choosing

the right algorithm. So the take home message is

that efficiency is about algorithms, not about

coding details. Clever algorithms are

hard to invent. A successful computer scientist

might invent maybe one in his or her

whole career. I have to say I invented zero

important algorithms in my whole career. Therefore, we don’t depend upon

being able to do that. Instead what we depend upon

is problem reducing. When confronted with a problem,

we want to reduce it to a previously solved

problem. And this is really often the key

to taking some problem and fitting into a useful

computation. We sit back, say, well, this

looks a little bit like this other problem. How come I transform my problem

to match a problem that some clever person already

knows how to solve? Before I spend time on problem

reduction, however, I want to draw back and look at the

general question of how do we think about efficiency. When we think about it, we

think about it in two dimensions, space and time. And as we’ll see later in the

term, we can often trade one for the other. We can make a program run faster

by using more memory or use less memory at the cost of

making it run more slowly. For now, and the next

few lectures, I’m going to focus on time. Because really, that’s mostly

what people worry about these days when they’re dealing

with complexity. So now, suppose I ask you the

question, how long does some algorithm implemented by

a program take to run? How would you go about answering

that question? Well, you could say, all right,

I’m going to run it on some computer on some

input and time it. Look at my watch. That took three minutes. I ran this other algorithm,

and it took two minutes. It’s a better algorithm. Well, that would be really

a bad way to look at it. The reasons we don’t think

about computational complexity, and that’s really

what people call this topic in terms of how long a program

takes to run on a particular computer, and it’s not

a stable measure. To do that, it’s influenced by

the speed of the machine. So a program that took 1 minute

on my computer might take 30 seconds on yours. It has to do with the cleverness

of the Python implementation. Maybe I have a better

implementation of Python than you do, so my programs will

run a little bit faster. But most importantly, the reason

we don’t depend upon running programs is it depends

upon the input. So I might choose one input for

which the program took 2 minutes and another seemingly

similar input in which it took 1 hour. So I need to get some way to

talk about it more abstractly. The way we do that is

by counting the number of basic steps. So we define some function,

say time, which maps the natural numbers to the

natural numbers. The first n, in this case, the

first natural number, the argument corresponds to the size

of the input, how big an input do we want to run

the program on. And the result of the function

is the number of steps that the computation will take for

an input of that size. I’ll come back to this in a

little bit more precise detail momentarily. A step is an operation that

takes constant time. And that’s important. So steps are not variable,

but they’re constant. So we have lots of these, for

example, an assignment, a comparison, an array

access, et cetera. In looking at computational

complexity in this course, we’re going to use a model

of the computer. It’s known as random access,

a random access machine, frequently abbreviated as RAM. In a random access machine,

instructions are executed one after another, that is to

say they’re sequential. Only one thing happens

at a time. And we assume constant time

required to access memory. So we can access at random any

object in memory in the same amount of time as any

other object. In the early days of computers,

this model was not accurate, because memory

was often say, a tape. And if you wanted to read

something at the end of the tape, it took a lot longer to

read than something at the beginning of the tape. In modern computers, it’s

also not quite accurate. Modern computers have what’s

called a memory hierarchy where you have levels of memory,

the level one cache, the level two cache,

the actual memory. And it can differ by say a

factor of a 100, how long it takes access data

depending upon whether it’s in the cache. The cache keeps track of

recently accessed objects. Nevertheless, if we start

going into that level of detail, we end up losing the

forest for the trees. So almost everybody when they

actually try and analyze algorithms typically works

with this model. We also know in modern computers

that some things happen in parallel. But again, for most of

us, these will be second order effects. And the random access model is

quite good for understanding algorithms. Now when we think about how long

an algorithm will take to run, there are several

different ways we could look at it. We could think of

the best case. And as we think about these

things, as a concrete example, we can think about

linear search. So let’s say we have

an algorithm that’s using linear search. We’ve looked at that before to

find out whether or not an element is in the list. Well, the best case would be

that the first element is three, and I’m searching for 3,

and I find it right away, and I stop. So that would be my best

case complexity. It’s the minimum running time

over all possible inputs. Is the best case. I can also look at

the worst case. What’s the worst case

for linear search? It’s not there. Exactly. So I go and I have to look at

every element, and whoops, it’s not there. So the worst case is the maximum

over all possible inputs of a given size. The size here is the

length of the list. And then I can ask what’s the

expected or average case, what would happen most of the time. The expected case seems, in

principle, like the one we should care about. But the truth is when we do

algorithmic analysis, we almost never deal with

the expected case because it’s too hard. We think about the expected case

for say linear search, we can’t talk about it without some

detailed model of what the list itself looks like, what

elements are in it, and what the distribution of

queries looks like. Are we most of the time asking

for elements that are not in the list in which case

the expected value is out here somewhere? Or are we most of the time

looking for things that are in the list in which case the

expected value would be somewhere near halfway through

the length of the list? We don’t know those things. We have a tough time modeling

expected value. And one of the things we know is

that frequently we don’t — when we release a program — have a good sense of how people

will actually use it. And so we don’t usually

focus on that. Similarly, we don’t usually

focus on the best case. It would be nice. But you could imagine that it’s

not really what we care about, what happens when

we get really lucky. Because we all believe

in Murphy’s law. If something bad can

happen, it will. And that’s why complexity

analysis almost always focuses on the worst case. What the worst case does is it

provides an upper bound. How bad can things

possibly get? What’s the worst that

can happen? And that’s nice because

it means that there are no surprises. You say the worst that this

thing can do is look at every element of the list once. And so if I know that the list

is a million elements, I know, OK, it might have to do

a million comparisons. But it won’t have to do any

more than a million. And so I won’t be suddenly

surprised that it takes overnight to run. Alas, the worst case

happens often. We do frequently end up asking

whether something is in a list, and it’s not. So even though it seems

pessimistic to worry about the worst case, it is the right

one to worry about. All right. Let’s look at an example. So I’ve got a little

function here, f. You can see it here. It’s on the handout as well. First of all, what mathematical

function is f computing, just to force you

to look at it for a minute? What’s it computing? Somebody? It is a function that should

be familiar to almost all of you. Nobody? Pardon. AUDIENCE: Exponentiation. PROFESSOR: Exponentiation? Don’t think so. But I appreciate

you’re trying. It’s worth some candy,

not a lot of candy, but a little candy. Yeah? AUDIENCE: Factorial. PROFESSOR: Factorial. Exactly. It’s computing factorial. Great grab. So let’s think about how long

this will take to run in terms of the number of steps. Well, the first thing it does

is it executes an assertion. And for the sake of argument for

the moment, we can assume that most instructions in Python

will take one step. Then, it does an assignment,

so that’s two steps. Then, it goes through

the loop. Each time through the loop, it

executes three steps, the test at the start of the

loop and the two instructions inside the loop. How many times does it

go through the loop? Somebody? Right. n times. So it will be 2 plus

3 times n. And then it executes a return

statement at the end. So if I want to write down the

function that characterizes the algorithm implemented by

this code, I say it’s 2 plus 3 times n plus 1. Well, I could do that. But it would be kind of silly. Let’s say n equals 3,000. Well, if n equals 3,000, this

tells me that it takes 9,000– well, what does it take? 9,003 steps. Right. Well, do I care whether

it’s 9,000 or 9,003? I don’t really. So in fact, when I look at

complexity, I tend to– I don’t tend to I do ignore

additive constants. So the fact that there’s a 2

here and a 1 here doesn’t really matter. So I say, well, if we’re trying

to characterize this algorithm, let’s ignore those. Because what I really

care about is growth with respect to size. How does the running time

grow as the size of the input grows? We can even go further. Do I actually care whether

it’s 3,000 or 9,000? Well, I might. I might care whether a program

take say 3 hours to run or 9 hours to run. But in fact, as it gets bigger,

and we really care about this as things get

bigger, I probably don’t care that much. If I told you this was going to

take 3,000 years or 9,000 years, you wouldn’t care. Or probably, even if I told you

it was going to take 3,000 days or 9,000 days, you’d say,

well, it’s too long anyway. So typically, we even ignore

multiplicative constants and use a model of asymptotic growth

that talks about how the complexity grows as you

reach the limit of the sizes of the inputs. This is typically done using

a notation we call big O notation written as a single

O. So if I write order n, O(n), what this says is this

algorithm, the complexity, the time grows linearly with n. Doesn’t say whether it’s

3 times n or 2 times n. It’s linear in n is

what this says. Well, why do we call it big O? Well, some people think it’s

because, oh my God, this program will never end. But in fact, no. This notion was introduced

to computer science by Donald Knuth. And he chose the Greek letter

omicron because it was used in the 19th century by people

developing calculus. We don’t typically write

omicron because it’s harder to types. So we usually use the capital

Latin letter O, hence, life gets simple. What this does is it gives

us an upper bound for the asymptotic growth

of the function. So formerly, we would write

something like f of x, where f is some function of

the input x, is order, let’s say x squared. That would say it’s quadratic

in the size of x. Formally what this means is

that the function f– I should probably write

this down– the function f grows no faster

than the quadratic polynomial x squared. So let’s look at what

this means. I wrote a little program that

talks about some of the– I should say probably most

popular values we see. So some of the most popular

orders we would write down, we often see order 1. And what that means

is constant. The time required is

independent of the size of the input. It doesn’t say it’s one step. But it’s independent

of the input. It’s constant. We often see order log n,

logarithmic growth. Order n, linear. One we’ll see later this

week is nlog(n). This is called log linear. And we’ll see why that occurs

surprisingly often. Order n to the c where c is

some constant, this is polynomial. A common polynomial would be

squared as in quadratic. And then, if we’re terribly

unlucky, you run into things that are order c to the

n exponential in the size of the input. To give you an idea of what

these classes actually mean, I wrote a little program that

produces some plots. Don’t worry about what

the code looks like. In a few weeks, you’ll

be able to write such programs yourself. Not only will you be able

to, you’ll be forced to. So I’m just going to run this

and produce some plots showing different orders of growth. All right. This is producing

these blocks. Excuse me. I see. So let’s look at the plots. So here, I’ve plotted

linear growth versus logarithmic growth. And as you can see, it’s

quite a difference. If we can manage to get a

logarithmic algorithm, it grows much more slowly than

a linear algorithm. And we saw this when we looked

at the graded advantage of binary search as opposed

to linear search. Actually, this is linear

versus log linear. What happened to figure one? Well, we’ll come back to it. So you’ll see here that

log linear is much worse than linear. So this factor of nlog(n)

actually makes a considerable difference in running time. Now, I’m going to compare a

log linear to quadratic, a small degree polynomial. As you can see, it almost looks

like log linear is not growing at all. So as bad as log linear looked

when we compared it to linear, we see that compared to

quadratic, it’s pretty great. And what this tells us is that

in practice, even a quadratic algorithm is often impractically

slow, and we really can’t use them. And so in practice, we worked

very hard to avoid even quadratic, which somehow

doesn’t seem like it should be so bad. But in fact, as you can see,

it gets bad quickly. Yeah, this was the log versus

linear, not surprising. And now, if we look at quadratic

versus exponential, we can see hardly anything. And that’s because exponential

is growing so quickly. So instead, what we’re going to

do is I’m going to plot the y-axis logarithmically just so

we can actually see something. And as you can see on input of

size 1,000, an exponential algorithm is roughly order

10 to the 286th. That’s an unimaginably

large number. Right? I don’t know what it compares

to, the number of atoms in the universe, or something

ridiculous, or maybe more. But we can’t possibly think of

running an algorithm that’s going to take this long. It’s just not even

conceivable. So exponential, we sort of throw

up our hands and say we’re dead. We can’t do it. And so nobody uses exponential

algorithms for everything, yet for anything. Yet as we’ll see, there are

problems that we care about that, in principle, can only

be solved by exponential algorithms. So what do we do? As we’ll see, well, we

usually don’t try and solve those problems. We try and solve some approximation to those problems. Or we use some other tricks to

say, well, we know the worst case will be terrible, but

here’s how we’re going to avoid the worst case. We’ll see a lot of that towards

the end of the term. The moral is try not to do

anything that’s worse than log linear if you possibly can. Now some truth in advertising,

some caveats. If I look at my definition of

what big O means, I said it grows no faster than. So in principle, I could say,

well, what the heck, I’ll just write 2 to the x here. And it’s still true. It’s not faster than that. It’s not what we actually

want to do. What we actually want

is a type bound. We’d like to say it’s no faster

than this, but it’s no slower than this either, to try

and characterize the worst cases precisely as we can. Formally speaking, a theorist

used something called big Theta notation for this. They write a theta instead of

an O. However, most of the time in practice, when somebody

writes something like f of x is order x squared, what

they mean is the worst case is really about

x squared. And that’s the way we’re

going to use it here. We’re not going to try

and get too formal. We’re going to do what people

actually do in practice when they talk about complexity. All right, let’s look at

another example now. Here, I’ve written factorial

recursively. Didn’t even try to disguise

what it was. So let’s think about how

we would analyze the complexity of this. Well, we know that we can ignore

the first two lines of code because those are just

the additives pieces. We don’t care about that– the first line, and the

if, and the return. So what’s the piece

we care about? We care about the number of

times the factorial is called. In the first implementation of

factorial, we cared about the number of iterations

of a loop. Now instead of using a loop, you

use recursion to do more or less the same thing. And so we care about

the number of times fact is called. How many times will

that happen? Well, let’s think about why I

know this doesn’t run forever, because that’s always the way

we really think about complexity in some sense. I know it doesn’t run forever

because each time I call factorial, I call it on a number

one smaller than the number before. So how many times can

I do that if I start with a number n? n times, right? So once again, it’s order n. So the interesting thing we see

here is that essentially, I’ve given you the same

algorithm recursively and iteratively. Not surprisingly, even though

I’ve coded it differently, it’s the same complexity. Now in practice, the recursive

one might take a little longer to run, because there’s a

certain overhead to function calls that we don’t have

with while loops. But we don’t actually

care about that. Its overhead is one of those

multiplicative constants I said we’re going to ignore. And in fact, it’s a very small

multiplicative constant. It really doesn’t make

much of a difference. So how do I decide whether to

use recursion or iteration has nothing to do with efficiency,

it’s whichever is more convenient to code. In this case, I kind of like

the fact that recursive factorial is a little neater. So that’s what I would use

and not worry about the efficiency. All right. Let’s look at another example. How about g? What’s the complexity of g? Well, I can ignore the

first statement. But now I’ve got two

nested loops. How do I go and think

about this? The way I do it is I start by

finding the inner loop. How many times do I go through

the inner loop? I go through the inner

loop n times, right? So it executes the inner

for statement is going to be order n. The next question I ask is how

many times do I start the inner loop up again? That’s also order n times. So what’s the complexity

of this? Somebody? AUDIENCE: n-squared. PROFESSOR: Yes. I think I heard the

right answer. It’s order n-squared. Because I execute the inner

loop n times, or each time around is n, then I multiply it

by n because I’m doing the outer loop n times. So the inner loop is

order n-squared. That makes sense? So typically, whenever I have

nested loops, I have to do this kind of reasoning. Same thing if I have recursion

inside a loop or nested recursions. I start at the inside and

work my way out is the way I do the analysis. Let’s look at another example. It’s kind of a different take. How about h? What’s the complexity of h? First of all, what’s h doing? Kind of always a good

way to start. What is answer going to be? Yeah? AUDIENCE: The sum of the

[UNINTELLIGIBLE]. PROFESSOR: Right. Exactly. It’s going to be the

sum of the digits. Spring training is already

under way, so sum of the digits. And what’s the complexity? Well, we can analyze it. Right away, we know

we can ignore everything except the loop. So how many times do I

go through this loop? It depends upon the number

of digits in the string representation of

the int, right? Now, if I were careless, I would

write something like order n, where n is the

number of digits in s. But really, I’m not allowed

to do that. Why not? Because I have to express the

complexity in terms of the inputs to the program. And s is not an input. s is a local variable. So somehow, I’m going to have to

express the complexity, not in terms of s, but

in terms of what? x. So that’s no go. So what is in terms of x? How many digits? Yeah? AUDIENCE: Is it constant? PROFESSOR: It’s not constant. No. Because I’ll have more

digits in a billion than I will in four. Right. Log — in this case, base 10 of x. The number of decimal digits

required to express an integer is the log of the magnitude

of that integer. You think about we looked at

binary numbers and decimal numbers last lecture, that was

exactly what we were doing. So that’s the way I have

to express this. Now, what’s the moral here? The thing I really care about

is not that this is how you talk about the number

of digits in an int. What I care about is

that you always have to be very careful. People often think that they’re

done when they write something like order n. But they’re not until they tell

you what n means, Because that can be pretty subtle. Order x would have been wrong

because it’s not the magnitude of the integer x. It’s this is what controls

the growth. So whenever you’re looking at

complexity, you have to be very careful what you mean,

what the variables are. This is particularly true now

when you look at functions say with multiple inputs. Ok. Let’s look at some

more examples. So we’ve looked before

at search. So this is code you’ve

seen before, really. Here’s a linear search

and a binary search. And in fact, informally, we’ve

looked at the complexity of these things before. And we can run them, and we can

see how they will grow. But it won’t surprise you. So if we look at the

linear search– whoops, I’m printing

the values here, just shows it works. But now, the binary search, what

we’re going to look at is how it grows. This is exactly the search

we looked at before. And the thing I want you to

notice, and as we looked at before, we saw it was

logarithmic is that as the size of the list grows, doubles,

I only need one more step to do the search. This is the beauty of a

logarithmic algorithm. So as I go from a 100, which

takes 7 steps, to 200, it only takes 8. 1,600 takes 11. And when I’m all the way up to

some very big number, and I’m not even sure what that number

is, it took 23 steps. But very slow growth. So that’s a good thing. What’s the order of this? It’s order n where n is what? Order log n where n is what? Let’s just try and write

it carefully. Well, it’s order length

of the list. We don’t care what the actual

members of the list are. Now, that’s an interesting

question to ask. Let’s look at the code

for a minute. Is that a valid assumption? Well, it seems to be when

we look at my test. But let’s look at what

I’m doing here. So a couple of things

I want to point out. One is I used a very common

trick when dealing with these kinds of algorithms. You’ll notice that I have

something called bsearch and something called search. All search does is

called bsearch. Why did I even bother

with search? Why didn’t I just with my code

down here call bsearch with some initial values? The answer is really, I started

with this search. And a user of search shouldn’t

have to worry that I got clever and went from this linear

search to a binary search or maybe some more

complex search yet. I need to have a consistent

interface for the search function. And the interface is what it

looks like to the caller. And it says when I call

it, it just takes a list and an element. It shouldn’t have to take the

high bound and the lower bound as arguments. Because that really is

not intrinsic to the meaning of search. So I typically will organize

my program by having this search look exactly like this

search to a caller. And then, it does whatever

it needs to do to call binary search. So that’s usually the way

you do these things. It’s very common with recursive

algorithms, various things where you need some

initial value that’s only there for the initial call,

things like that. Let me finish– wanted to point out is the use

of this global variable. So you’ll notice down

here, I define something called NumCalls. Remember we talked

about scopes. So this is now an identifier

that exists in the outermost scope of the program. Then in bsearch, I used it. But I said I’m going to use

this global variable, this variable declared outside

the scope of bsearch inside bsearch. So it’s this statement that

tells me not to create a new local variable here but to use

the one in the outer scope. This is normally considered

poor programming practice. Global variables can

often lead to very confusing programs. Occasionally, they’re useful. Here it’s pretty useful because

I’m just trying to keep track of a number of times

this thing is called. And so I don’t want a new

variable generated each time it’s instantiated. Now, you had a question? Yeah? AUDIENCE: Just checking. The order len L, the size

of the list is the order of which search? PROFESSOR: That’s the order

of the linear search. The order of the binary search

is order log base 2 of L– sorry, not of L, right? Doesn’t make sense to take

the log of a list of the length of the list. Typically, we don’t bother

writing base 2. If it’s logarithmic, it doesn’t

really very much matter what the base is. You’ll still get that

very slow growth. Log base 10, log base 2, not

that much difference. So we typically just

write log. All right. People with me? Now, for this to be true, or in

fact, even if we go look at the linear search, there’s

kind of an assumption. I’m assuming that I can extract

the elements from a list and compare them to a

value in constant time. Because remember, my model of

computation says that every step takes the same amount

of time, roughly. And if I now look say a binary

search, you’ll see I’m doing something that apparently

looks a little bit complicated up here. I am looking at L of low and

comparing it to e, and L of high and comparing it to e. How do I know that’s

constant time? Maybe it takes me order length

of list time to extract the last element. So I’ve got to be very careful

when I look at complexity, not to think I only have to look

at the complexity of the program itself, that is to say,

in this case, the number of recursive calls, but is

there something that it’s doing inside this function that

might be more complex than I think. As it happens, in this case,

this rather complicated expression can be done

in constant time. And that will be the

first topic of the lecture on Thursday.

Great!thank you MIT

lol im in it

i have the 3.3.2 version, and there is no PYLAB

do i have to download that or does it has a new name?

i love seeing the view count drop as you go vid = vid + 1 through the series.

Thank you John and MIT for making this happen. I an in an MIT class from Nairobi, Kenya!!! You are truly touching the world one person at a time

He delivers the lecture in a very descriptive way but at the same time, one might will feel drowsy or super sleepy. Isn't it?

the 2 dislikers are the guys who could not give right answers

the obvious frustration John feels when no one answers how many times the recursive factorial function runs…painful.

But seriously, great course MIT – very well taught.

if there's one thing that I want to know, that is to how will he run "that specific part of code" in a constant time. Need to look for that vid. 😀