Cloud Computing – Computer Science for Business Professionals – by CS50 at Harvard


Cloud computing– it’s
this term that rather swept onto the scene in recent years. And it sounds like it’s some
new and trendy technology. But in reality, it’s really
just a very nice packaging up of a whole number of
technologies that have actually been with us for some time. In fact, cloud computing,
in its simplest form, can really be thought
of as just outsourcing the hosting of your applications
and really outsourcing the hosting of your physical servers
to someone else– put another way, renting space and renting time
on someone else’s computers. But these days, we just have so much
computational capability– that is, our computers are so fast, our CPUs
are so many, and we have so much RAM– that new and fancier
technologies have lent themselves to this trend of hosting
more and more software and putting more and more hardware
off-site in the so-called cloud so that companies, both big
and small, no longer need to host their own physical hardware
or even staff a whole number of roles in their own local offices. And so what we’ll do now is
dive into cloud computing, look at some of the
problems it solves, look at some of the opportunities
it affords, but ultimately, take a look from the ground up
at what’s underneath the hood here so that by the end
of this, we have a better understanding of what the
cloud is, why it is useful, and what it actually is not. So with that said, let’s
start with a simple scenario. Of course, the cloud
perhaps derives its origins from how the internet,
for some time, was drawn, which was just this big, nebulous
cloud, in that it doesn’t really matter what’s inside that cloud. Although at this point, you most surely
appreciate that inside of this cloud are things like routers, and
running through those routers are packets, both TCP/IP and the like. And underneath the hood,
then, of this cloud is some transport mechanism that
gets data from point A to point B. So what might those point
A’s and point B’s be? Well, if this here is my little,
old laptop, connected somehow to the internet here,
and maybe down here there is some web server on which lives
a whole bunch of web pages– maybe it’s my email. Maybe it’s the day’s news. Maybe it’s some social
media site or the like. I, at point A, want to somehow
connect to point B down here. Now, it turns out it’s not all
that hard to get a website up and running on the internet. You can, of course, use
any number of languages. You can use any number of databases. And you can do it with
relatively little experience, just getting something on the internet. In fact, it’s not all that
hard, relatively speaking, to get a prototype of
your application or even your first version of your
business up and running. But things start to get hard quickly,
especially if you have some success. Indeed, a good problem to have is
that you have so many customers and so many users hitting your
websites that you can’t actually handle all of the load. Now, it’s a good problem in the
sense that business is booming. But it’s, of course, an
actual problem in the sense that your customers aren’t going
to be able to visit your web site and buy whatever it is
you’re selling or read whatever it is you’re posting if your
servers can’t actually handle the load. And by load, I simply mean the number
of users per minute or per unit of time that your website is
actually experiencing. And its capacity, meanwhile,
would be the number of users it can actually support. Now, why are there these
limits in the first place? Well, you may recall
that inside of a computer is a CPU, the brains of that computer. And inside of a computer
is some memory, like RAM. And there might be some longer-term
storage, like hard disk space. At the end of the day, all of those
resources and more are finite. You can only fit so much
physical hardware in a computer. Humans have only been able
to pack so many resources into the physical space of a computer. And then, of course, there’s cost. You might be able to only afford
so much computing capacity. So if a computer can only do
some number of things per second, there is surely an upper bound on
how many people can visit your web site, how many people can add things
to their shopping cart, how many people can check out with their credit card. Because you only have, at the end of
the day, a finite number of resources. Now, what does that mean in real terms? Well, maybe your web server can
handle 100 users per minute. Maybe it can handle
1,000 users per minute. Maybe it can handle 1,000 users per
second, or even much more than that. It really depends on the specifications
of your hardware– how much RAM, how much CPU and so forth that
you actually have– and it also depends, to some extent, on how
well-written your code is and how fast or how slow your code, your
software actually runs. So these are knobs that
can ultimately be turned. And through testing, you can
figure this out in advance by simulating traffic in order to
estimate exactly how many users you might be able to handle at a time.
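To make that concrete, here is a minimal sketch of such a simulation, using only Python’s standard library and a placeholder URL; real load-testing tools are far more sophisticated, but the idea is the same: fire many concurrent requests at your own test server and see how many succeed and how long they take.

```python
# Minimal load-test sketch: send many concurrent requests to one URL
# and report successes and average latency. The URL is a placeholder;
# point it only at a test copy of a site you own.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://example.com/"   # hypothetical endpoint
REQUESTS = 100                # total requests to simulate
CONCURRENCY = 10              # how many to have in flight at once

def fetch(_):
    start = time.time()
    try:
        with urlopen(URL, timeout=5) as response:
            response.read()
        return (True, time.time() - start)
    except Exception:
        return (False, time.time() - start)

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(fetch, range(REQUESTS)))

latencies = [latency for ok, latency in results if ok]
print(f"succeeded: {len(latencies)}/{REQUESTS}")
if latencies:
    print(f"average latency: {sum(latencies) / len(latencies):.3f}s")
```

The numbers that come out of a simulation like this are exactly the capacity estimates we have been talking about. Now, the relevance to today is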
that the cloud, so to speak, allows us to start to solve
some of these problems and also allows us to start
abstracting away the solutions to some of these problems. Well, let’s see what
this actually means. So at some point or other– especially when it’s not
just my laptop, but it’s like 1,000 laptops, or 10,000 laptops
and desktops and phones and more that are somehow trying
to access my server here– at some point, we hit that upper
limit whereby no more users can fit onto my web site per unit of time. So what is the symptom that my
users experience at that point if I’m over capacity? Well, they might see an
error message of some sort. They might just
experience a spinning icon because the website is
super slow to respond. And maybe it does respond, but
maybe it’s 10 seconds later. So at the end of the day, they either
have a bad experience or no experience whatsoever, because my server can only
handle so many requests at a time. So what do you do to solve this problem? If one server is not enough, maybe
the most intuitive solution is, well, if one server is not
giving me enough headroom, why don’t I just have two servers? So let’s go ahead and do that. Instead of having just one server,
let’s go ahead and have two. And let me propose that on the second
server, it’s the exact same software. So whatever code I’ve written, in
whatever language it’s written, I just have copies of my web
site on both the original server and the second server. Now I’ve solved the
problem in the simple sense that I’ve doubled my capacity. If one server can handle
1,000 people per second, well, then surely two servers can
handle 2,000 people per second, so I’ve doubled my capacity. So that’s good. I’ve hopefully solved the problem. But it’s not quite as simple as that. At least pictorially, I’m still
pointing at just one of those servers, so we’re going to have to clean
up this picture and somehow figure out how to get users– or more generally, traffic– to both of these servers. I could just naively
draw an arrow like this. But what does that actually mean? We don’t want to abstract
away so much of the detail that we’re ignoring this problem. How do we implement this notion of
choosing between left arrow and right arrow? Well, let’s consider what
our solutions might be. If a user, like me on my laptop,
is trying to visit this web site– and the web site, ideally, is going
to live at something like example.com, or facebook.com, or
gmail.com, or whatever– I don’t want to have to broadcast
different names for my servers. And you might actually
notice this on the internet. You might notice, if you start noticing
the URLs of websites you’re visiting– especially for certain older, stodgier
companies who haven’t necessarily implemented this in
the most modern way– you might find yourself not
just at www.something.com, but if you look closely, you
might find yourself occasionally at www1.something.com,
www2.something.com, or even www13.something.com. Which is to say that some companies
appear to solve this problem by just giving different names– similar names, but different names–
to their two servers, three servers, 13 servers, or however many they have. And then they somehow redirect
users from their main domain name, www.something.com, to any one
of those two or three or 13 servers. But this isn’t very elegant. The marketing folks
would surely hate this, because you’re trying to build some
brand recognition around your URL. Why would you dirty it by just putting
these arbitrary numbers in the URLs? Plus if you fast forward
a bit in this story, if, for some reason down
the road, you get fancier, bigger servers that
can handle more users, and therefore you
don’t need 13 of them– you can get away with just six of them– well, what happens if some of
your customers have bookmarked, very reasonably, one of those older
names, like www13.something.com? So now when they try to visit that
URL, gosh, they might hit a dead end. So you could solve
that in some other way. But the point is it would seem
to create a problem quickly, and it’s just a naming mess. Why actually bother having
your users see something as messy as these numbered servers? It would be nice to do this
a little more transparently. So how could we do this? Well, let me propose that we
kind of need some middleman here, so to speak, whereby traffic comes
from people like me on the internet and then either goes to the
left or goes to the right, or no matter how many
servers we have, goes to one of those actual web servers. So how does this middleman– and
to borrow some past terminology, how does this black
box potentially work? Well, let’s consider
some of the building blocks, some of the puzzle pieces we
have technologically at our disposal now. You may recall that every
server on the internet has an IP address, an internet protocol
address, a unique address for it. And that’s, again, a bit of
a white lie, because there are technologies by which
you can have private IP addresses that the
outside world doesn’t see. But let’s stipulate,
for today’s purposes, that every computer on the
internet certainly has an IP address, whether public or private. So maybe, just maybe, we could
leverage an existing technology– DNS, the Domain Name System– so that rather than only return
one IP address of a server when you look up www.something.com,
we return the IP address of the server on the
left some of the time or the IP address of the server
on the right some of the time, effectively balancing our load,
our traffic across the two servers. And in fact, if you
do this 50-50, you can take, really, what’s called
a round robin approach, and ideally uniformly distribute
your traffic across multiple servers.
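As a rough sketch of what this looks like from the client’s side, the following looks up every IP address published for a hostname and then hands them out in rotation; the hostname here is just a placeholder, and with real round-robin DNS the name server itself rotates the order of its answers so you don’t have to.

```python
# Round-robin sketch: resolve a hostname to all of its published
# IP addresses, then hand them out in rotation, one per request.
# The hostname is a placeholder; a real site would publish several A records.
import socket
from itertools import cycle

HOST = "www.example.com"   # hypothetical domain with multiple A records

# getaddrinfo returns one tuple per address; the IP is in the last field.
infos = socket.getaddrinfo(HOST, 80, 0, socket.SOCK_STREAM)
addresses = sorted({info[4][0] for info in infos})
print("published addresses:", addresses)

# A round-robin picker: each "request" goes to the next address in turn.
picker = cycle(addresses)
for request_number in range(6):
    print(f"request {request_number} -> {next(picker)}")
```

Each visitor simply ends up at one of the servers, and nobody has to see or bookmark a www2 or a www13. And what’s nice in this model is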
that because you’re using DNS, the user doesn’t really
notice what’s going on. At the end of the day,
none of us humans really care what IP address we’re
actually going to if we visit Facebook.com or Gmail.com or the like. We just care that our computer can find
that server or servers on the internet. So via DNS, we could, very
cleverly, via this middleman here, which is really just going to be some
third device, some separate server– it, as a DNS device, could
just respond to requests from customers with either this
IP address or this IP address, or any number of different IP addresses. So does this solve the problem? Again, most everything
in computer science would seem to be a tradeoff
at the end of the day. And this seems almost too
good to be true, perhaps. It’s so simple. It leverages an existing technology. It just works. So what prices might we pay? Well, DNS, it turns out,
gets cached quite a bit. And what does caching mean? Caching something means
keeping some past answer– or more generally, piece
of information– around so that you can access it more quickly the
second and the third time and beyond. And so computers today,
Macs and PCs, as well as servers on the internet, other
DNS servers on the internet, for performance reasons, will
often remember the responses that they get from DNS servers. For instance, if, on my Mac, I
visit Facebook.com, hypothetically a lot of times during the day, it’s kind
of stupid if my laptop, again and again and again and again,
asks some DNS server for Facebook.com’s IP
address if it already asked that same question an hour
ago– or more realistically, two minutes ago, or something like that. It would be smarter if my operating
system– or even my browser, Chrome or Firefox or whatever
I’m using– actually remembers that answer for me so that
my computer can just pull up that web site faster by skipping a step, by
not wasting time asking a server again for the IP address of a server. And after all, IP addresses, it turns
out, generally don’t change that often. It’s certainly possible for a company
or a university or even a home user to change their computer’s IP addresses. But the reality is it doesn’t
change all that often. The common case is to
have the same IP address now as you might an hour from now,
or even a day or a week or a month from now. But the key thing is that it can change. And especially if you’re worried about
customers– not just some personal web site, but you might lose business. You might lose orders if users
can’t visit your website. Anything that puts your
server’s uptime, so to speak– being accessible on
the internet– at risk probably is worthy of
some consideration. So let me propose, then, that just one
of these servers goes offline somehow. Maybe it’s deliberate. You need to do some service for it. Or maybe it crashed in some way,
or it got unplugged somehow, or something went wrong such that
now, one or more of your servers, across which you’ve been load balancing,
no longer can talk to the internet. What might happen? Well, if some customer’s
Mac, like my own, has remembered or cached that
particular server’s IP address, that is not a good situation. Because your Mac or PC
or whatever is going to now try to revisit your
web site again and again and again at that old cached IP address
that apparently can be a dead end. And so even though you still have
servers that could potentially handle that customer’s
request, that customer’s order, that customer’s desire
to check out, he or she really is still not
going to be able to visit the website unless that cache expires. Maybe they reboot their computer
so that the cache forcibly expires. Maybe they just wait some amount
of time so that that IP address is forgotten by the browser
or by the operating system or by some other DNS server
until the newly available IP addresses are picked up instead. But there is that risk. And I would argue that this
risk is even higher especially for companies that might be considering
moving their infrastructure from one service to another. If you’re deliberately going to move
your servers from one IP address to another, as might happen if you
change cloud providers, so to speak– more on those in a minute– really, if you change the companies
that you’re using to host your servers, your IP addresses will change. And you certainly don’t want to
incur a huge amount of downtime in a situation like that. So there are these tradeoffs. Easy solution, technologically
pretty inexpensive to do. It just works using existing technology. But you open up yourselves to this risk. So let’s address that. Putting back the old
proverbial engineering hat, let’s try to solve this problem. It seems that giving a unique
IP address to this server and to this server, and any number
of other servers that are back there, might not be the smartest idea in
so far as those IPs can get cached. So what if we use DNS as follows? When my laptop or anyone else’s requests
the IP address for www.something.com, why don’t we return the IP
address of this device here– this load balancer, as
we’ll start calling it, where a load balancer is
usually just a physical device, or multiple physical devices, whose
purpose in life is to balance load? Packets come in, and similar
in spirit to a router, they do route information to the left,
to the right, or some other direction. But their overarching purpose isn’t just
to get data from point A to point B, but to somehow intelligently
balance that traffic over multiple possible destinations
for point B, identical servers in the case of our story here. So what if, instead, we addressed
this problem of potential downtime by returning the IP address
of the load balancer, and then, by nature of
private IP addresses or some other mechanism
that the end user does not need to know or care about, this load
balancer somehow routes the traffic to either the first device
or the second device, LB here being our load balancer? So we seem to have
solved this problem. Insofar as now we have
configured our DNS servers to return the IP address
of the load balancer, there’s no problem of downtime
as we described a moment ago. Because if Server 1 goes offline
for whatever reason, no big deal. The load balancer should hopefully
just notice that and subsequently start proactively routing all incoming data
that reaches its IP address to Server 2 and not Server 1. Now how does the load balancer know? Well, either a human could intervene. Maybe someone gets a late-night
call or text or page saying, uh oh, server 1 is down,
you better do something. And then he or she can manually
configure the load balancer to no longer send any
traffic to Server 1. That seems kind of stupid in an age
of automation and smart software. Maybe we can do better. And indeed, we can. A technique that’s
often used by servers is something modeled from
the human world to use what you might describe as
heartbeats to actually configure the load balancer and Servers
1 and 2 to operate as follows. Maybe every second, every half a
second, maybe every five seconds you configure Server 1 and Server 2
to send some kind of heartbeat message to the load balancer. This is just a TCP/IP packet,
some kind of network packet that’s the equivalent
of saying I’m alive. I’m alive. Or more goofily, like boom, boom, boom,
boom, ergo the heartbeat metaphor. But the point is that 1 and 2,
and any number of other servers, should be configured to
just constantly reassure the load balancer that they are alive. They are accessible. They are ready to receive traffic. And the load balancer, similarly– and you might see where this
is going– can very simply be configured to listen
for that heartbeat. And if it ever doesn’t hear a
heartbeat from Server 1 or Server 2, it should just assume
that something is wrong. The server has died. It’s gone offline. Something bad has happened. So the load balancer
subsequently should simply not route any traffic to
that particular server until some human or
some automated process brings the server back alive, so to
speak, and the heartbeat resumes.
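To make the heartbeat idea concrete, here is a minimal sketch of both halves, with made-up ports and timings: each back-end server periodically sends a tiny I’m-alive packet over UDP, and the load balancer only routes to servers it has heard from recently. A real load balancer typically has health checks like this built in, so this is only to illustrate the algorithm.

```python
# Heartbeat sketch: back-end servers announce themselves over UDP,
# and the load balancer treats any server it hasn't heard from
# recently as offline. Port and timings are arbitrary choices.
import socket
import time

HEARTBEAT_PORT = 9999     # hypothetical port the load balancer listens on
TIMEOUT_SECONDS = 3       # silence longer than this means "assume it's down"

def send_heartbeats(balancer_ip, server_name):
    """Run on each back-end server: say 'I'm alive' once per second."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(server_name.encode(), (balancer_ip, HEARTBEAT_PORT))
        time.sleep(1)

def healthy_servers(last_seen):
    """Keep only the servers heard from within the timeout window."""
    now = time.time()
    return [name for name, seen in last_seen.items() if now - seen < TIMEOUT_SECONDS]

def listen_for_heartbeats():
    """Run on the load balancer: record when each server last checked in."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", HEARTBEAT_PORT))
    last_seen = {}
    while True:
        data, _ = sock.recvfrom(1024)
        last_seen[data.decode()] = time.time()
        print("routing only to:", healthy_servers(last_seen))
```

The interval and the timeout are tuning knobs: the shorter they are, the faster you notice a dead server, at the cost of a bit more chatter on the network. Now, of course, this problem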
doesn’t go away permanently. If servers 1 and 2 stop
emitting a heartbeat, we really have no capacity for users. But that would be an extreme scenario. Hopefully it’s just one or a few of
our servers that go offline in that way. So we can configure our servers
for these heartbeats, which is– think about it– a very simple physiologically-inspired
solution to a problem. And even if it’s not obvious
how you implemented it in code, it really is just an algorithm,
a simple set of instructions with which we can solve this problem. And yet, damnit, we’ve
introduced a new problem. And so this really is
the old leaky hose, where just as we’ve plugged
one leak or solved one problem, another one has sprung up
somewhere else along the line. So what’s the problem now? What’s the problem now? The whole motivation of introducing
Server Number 2, in addition to Server Number 1, was to make
sure that we have enough capacity, and better yet, to make sure that if
Server 1 or Server 2 goes offline, the other one can hopefully
pick up the load unless it’s a super busy time with lots and
lots of users visiting all at once. So in fact, the general
idea at play here is high availability ensuring
that if one server goes down, you have other servers
that can pick up the load. Being highly available means you
can be tolerant to issues like that. And then load balancing,
of course, is just the mere process of splitting the
load across those two endpoints. But we have introduced another problem. This might be abbreviated SPOF, or more
explicitly, Single Point Of Failure. Just as I’ve solved one problem
by introducing this load balancer, so have I introduced a new
problem, which is this. There is now, as you
might infer from the name alone, a single point of failure. It’s fine that I can now tolerate
Server 1 or Server 2 going down, but what can I not tolerate, clearly? What if the load balancer goes down? So this is a very real concern. Maybe the load balancer
itself gets overloaded. Maybe the load balancer
itself has some kind of issue. And if the load balancer
goes down, it doesn’t matter how many web
servers I have down here, or how much money I’ve spent down
here to ensure my high availability. My server is offline if this single
point of failure indeed fails. Now, you’d like to think
that the load balancer– especially since it only
has one job in life– can at least handle more traffic
than any individual server. Indeed, clearly, it must be
the case that the load balancer is fast enough and capable
enough to handle twice as much traffic as any individual server. But that’s generally accepted as
feasible, insofar as your website, your real intellectual
property is probably doing a lot of work–
talking to a database, writing out files, downloading things,
or any number of other features that just take more effort than
just routing data from one server to another as a load balancer does. But it doesn’t matter
how performant it is. If the load balancer breaks,
goes offline for some reason, your entire infrastructure
is inaccessible. So how do we solve this? How do we go about and
architect a solution to this? Well, how did we address
this issue earlier? We addressed the issue of insufficient
capacity or potential downtime by just throwing
hardware at the problem. And so maybe we could
do that same thing here. Maybe we could just introduce
a second load balancer. I’ll call this LB as well. And now we somehow have to– I feel like we’re just endlessly going
to be adding more and more rectangles to the picture. But somehow, we need to be able to load
balance across now two servers and two load balancers. So how do we do this? Well, let me clean this up so that we
have a bit more room to play with here and consider how a pair of load
balancers might actually work. So if my first server is here
and my second server is here, and I’m proposing now to have two load
balancers– one here and one here– surely, both of these have to
be able to talk to both servers. So we already have this necessity. And somehow, traffic has
to come from the internet into this set of load balancers,
but probably only to one, because we don’t want
to solve this with DNS and just have two IP
addresses out there. Because if one breaks, we
can recreate the same problem as before if we’re not careful. So what if we do this? What if we use this building block
of heartbeats in another way as well? What if we ensure that
our load balancers– plural– have just one IP
address, which a moment ago seemed to create a single point of failure? But what if we do this? What if we also allow the
load balancers to talk to, to communicate over a network with each
other so that one of the load balancers is constantly saying to
the other, I’m alive. I’m alive. I’m alive. And so what the load balancers
could be configured to do is that only one of them operates
at any given point in time. But if the other server,
the other load balancer, no longer hears from that primary load
balancer because of the heartbeats that are ideally both being
emitted in both directions so that they can both be
assured of the other’s up time– if the secondary load balancer stops
hearing the primary load balancer, the secondary load balancer
can just presumptuously reconfigure itself to take on
that one and only IP address, effectively assuming that the
first load balancer is not going to be responding to any traffic anyway. And the second load balancer can
simply take on the entire load itself. But the key difference now
in this particular solution is that there’s only one IP address that
describes this whole architecture, only one IP address between
the two load balancers so we don’t risk those potential dead
ends that we had a little bit ago with our back end servers.
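Conceptually, the job of the standby load balancer is then as simple as the sketch below, reusing the heartbeat idea from before; in practice this kind of failover is usually handled by a protocol such as VRRP, via tools like keepalived, which move the shared, virtual IP address between machines for you, rather than by application code of your own.

```python
# Conceptual failover sketch for the standby load balancer: if the
# primary's heartbeat goes quiet for too long, assume its role.
# Actually claiming the shared IP is what VRRP/keepalived would do.
import time

TIMEOUT_SECONDS = 3   # silence longer than this means the primary is presumed dead

def should_take_over(last_heartbeat_from_primary, currently_active):
    """Decide whether this standby should become the active load balancer."""
    if currently_active:
        return True
    return time.time() - last_heartbeat_from_primary > TIMEOUT_SECONDS

# Example: the primary was last heard from 10 seconds ago.
if should_take_over(time.time() - 10, currently_active=False):
    print("Primary looks dead; claim the shared IP and start serving traffic.")
```

So now it’s starting to get more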
robust, more highly available. So that’s pretty good. We’ve solved most of these problems. We’ve generously, though, swept one
problem underneath the rug, whereby every time I draw another rectangle– not just the first time,
but now the second time– and add some interconnectivity,
somehow, among them someone somewhere is spending some money. And indeed, I am solving
these problems thus far by throwing money at the problem,
and frankly introducing complexity. Already look at how many
arrows or edges there are now, which might simply refer
to physical wires, which is fine. But there’s also a logical
configuration that’s now necessary. And God forbid we have a third load
balancer for extra high availability or any number of servers here– 13 or 20 or 100 or 1,000 servers. It’s a lot of cross-connections–
not just physically, but logically in terms of
the requisite configuration. So this complexity does add up. And the cost certainly adds up. And now, once upon a
time– and not all that long ago– if a company wanted to
architect this kind of solution, you would literally
buy two load balancers, and you would buy two
or more web servers, and you would buy the
requisite physical ethernet cables to interconnect the two. And you’d probably buy a
whole bunch of other hardware that we’ve not even talked about,
like firewalls and switches and more. But you would physically
buy all of this hardware. You would physically
connect all of this hardware and configure it to implement
these several kinds of features. But the catch is that the
more and more hardware you buy, just probabilistically,
the more and more you invite some kind of failure. Maybe it’s some stupid human error. But more realistically, one of
your hard drives is going to fail. And hard drives are typically rated for
the enterprise in terms of Mean Time Between Failure, MTBF,
which generally means how long you should expect a hard drive
to work on average before it fails. It breaks. It just stops working. So if you have a whole bunch
of servers, each of which has a whole bunch of hard
drives, at some point, combinatorially, one or more of
those drives is just going to fail, which is to say you’re
going to have a problem, and you’re going to
have to fix it yourself.
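To put a rough number on that combinatorial intuition, here is a tiny back-of-the-envelope calculation, using an assumed, purely illustrative annual failure rate per drive; the point is only that even a small per-drive failure rate adds up quickly across a fleet.

```python
# Back-of-the-envelope: probability that at least one drive fails this year.
# The 2% annual failure rate per drive is an illustrative assumption only.
annual_failure_rate = 0.02

for drives in [1, 10, 100, 1000]:
    p_none_fail = (1 - annual_failure_rate) ** drives
    p_at_least_one = 1 - p_none_fail
    print(f"{drives:>4} drives -> {p_at_least_one:.0%} chance of at least one failure")
```

At some point, too, you’re going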
to run out of physical space. In fact, perhaps one of the
most constraining resources, especially for startups, is
the physical space itself. You probably don’t want to start housing
your servers in your physical office, because you need a special room for
it, typically, with enough cooling, with enough access, with enough
electricity, and enough humans to actually maintain it. Or you graduate from your own office
space and go to a data center, a co-location facility,
whereby you maybe rent space in a physical
cage with a locking door, inside of which you
put racks of servers, just racked up on big metal poles,
and you pack as many servers in there as you can. But at some point, you’re
going to be bumping up against other constrained resources–
physical space, actual power capacity, cooling, as well as the
humans to actually run this. And so very quickly
does operations, ops, so to speak, become an increasing
cost and an increasing challenge. And one of the most alluring
features of the cloud, so to speak, is that you can move all
of these details off-site. And you can abstract many of these,
let’s say, implementation details away whereby you yourself don’t have
to worry about the physical wires. You don’t have to worry about
the make and model of servers that you’re buying. You don’t have to worry about
things actually breaking, because someone else will
deal with that for you. But you have to still understand
the topology and the architecture and the features that you want to
implement so that you can actually configure them in the cloud. So what do you actually
get from cloud providers? There’s any number of
them out there these days. But perhaps three of the biggest
are Amazon, Google, and Microsoft, all of whom offer, these days,
very similar palettes of options. And it’s outright
overwhelming, if you visit each of their web sites, just how
many cloud products they offer. But they would generally offer
a number of standard products in the cloud– for instance,
a virtualized server. So you don’t have to physically
buy a server these days and plug it into your own ethernet
connection, your own internet connection in your own office. You can instead
essentially rent a server in the cloud, which is to
say that Amazon, Google, Microsoft, or any number
of other companies will host that server
physically for you, and they will take care of the
issues of power and cooling. And if a hard drive fails,
they will go remove the old one and plug in the new one. And ideally, they will provide
you with backup services. But more sophisticated
than that, they can also help us recreate, in software,
this kind of topology. In other words, even without having
a human physically wire together this kind of graph, so to speak,
that we’ve been building up here logically, thanks
to software these days, you can implement this whole paradigm– not with physical cables,
not with physical devices, but with software virtually. What does that mean? It means that humans, over
the past several years, have been writing software that mimics
the behavior of physical servers. Humans have been writing software
that mimics the behavior of a router. Humans have been writing software that
mimics the behavior of a load balancer. Really, then, we’re just building, in software, what historically
might have been implemented entirely in hardware. And even that’s a bit of
an oversimplification. Because even when something
is bought as hardware, there is, of course, software running
on that hardware that actually makes it do something. But they’re no longer dedicated devices. You can use generic commodity
PC server hardware, really, and transform that hardware into
a certain role, a back end web server, a back end database, a
load balancer, a router, a switch, any number of other things. And so what you were getting from
companies like Amazon and Google and Microsoft and more is
the ability to build up your infrastructure in software. In fact, the buzzword here, the acronym,
is IaaS, Infrastructure as a Service. So you sign up for an account on any
of those companies’ cloud services web sites, and you put in your credit
card information or your invoicing information, and you literally, via
a command line tool– so a keyboard, or via a nice, web-based
graphical user interface, GUI– you point and click and say, give
me two servers and one load balancer.
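By way of illustration, here is a minimal sketch of what give-me-two-servers looks like programmatically on Amazon’s cloud using its boto3 Python library; the machine image, instance type, and region are placeholders, it assumes you already have credentials configured, and in practice you would also specify networking, security groups, and so on.

```python
# Sketch: ask AWS for two virtual servers via the boto3 library.
# Requires AWS credentials to be configured; IDs below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image
    InstanceType="t3.micro",          # a small, inexpensive instance type
    MinCount=2,                       # "give me two servers"
    MaxCount=2,
)

for instance in response["Instances"]:
    print("launched:", instance["InstanceId"])
```

Analogous calls exist for load balancers, firewalls, and the rest; the point is that provisioning becomes an API call rather than a purchase order. Or if you have enough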
money in the bank, you say give me two servers
and two load balancers configured for high availability. Or better yet, you
don’t say any of that. You just tell the provider, give me a
web server and give me a load balancer, and you deal with the process of
scaling those things as needed. In fact, a buzzword du jour is auto
scaling, which refers to a feature, implemented in software,
whereby if a cloud provider notices that your servers
are getting a lot of traffic– business is good, or
it’s the holiday season, and you are bumping up against
just how many users your one or two or three or more servers can handle– auto-scaling is a feature that will
enable the cloud provider to just turn on, virtually, more servers for you
so that you go from two to three automatically. You can be happily asleep
in the middle of the night, and even though your traffic
is peaking, it doesn’t matter. Your architecture is
going to auto scale. And better yet–
especially financially– if the cloud provider notices, maybe 12
hours later– oh, all of your customers have gone to sleep, we don’t really
need all of this excess capacity. Or maybe the holidays
are now in the past. You really don’t need
this excess capacity. Auto scaling also dictates that those
servers can be virtually turned off. So you’re no longer using them. You’re no longer load balancing to them. And most importantly, you’re
no longer paying for them.
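Under the hood, an auto-scaling policy is conceptually little more than a rule evaluated on a loop, something like the sketch below; the thresholds and limits are invented for illustration, and real providers let you configure the equivalents of these numbers against metrics they collect for you, rather than have you write the loop yourself.

```python
# Conceptual auto-scaling rule: add servers when they're busy,
# remove servers when they're idle. All thresholds are illustrative.
MIN_SERVERS = 2
MAX_SERVERS = 10
SCALE_UP_ABOVE = 0.70    # average CPU utilization above 70%? add a server
SCALE_DOWN_BELOW = 0.30  # below 30%? remove one

def desired_server_count(current_count, average_cpu_utilization):
    if average_cpu_utilization > SCALE_UP_ABOVE and current_count < MAX_SERVERS:
        return current_count + 1
    if average_cpu_utilization < SCALE_DOWN_BELOW and current_count > MIN_SERVERS:
        return current_count - 1
    return current_count

# Traffic peaks while you sleep, then subsides the next morning.
print(desired_server_count(2, 0.85))  # -> 3 (scale up)
print(desired_server_count(3, 0.20))  # -> 2 (scale back down)
```

So this is a really, really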
nice value add at this point. There’s no human crawling around
on the floor rewiring things and plugging in new servers. There’s no finance person having to
approve the PO to actually order more servers just to increase your capacity. And most importantly, there is
no latency between the time when you notice, oh, my god, we’re
getting really successful and can’t handle our load– uh oh. It’s going to be a two,
three-week lead time before we can even get
the new servers in. Thanks to cloud computing, you can
literally log in to Amazon’s, Google’s, Microsoft’s web site
and, click, click, click, have more server capacity
within seconds, within minutes, far faster than the physical
world traditionally allowed. So those are just some
of the features now that we gain from outsourcing
to the so-called cloud. So where does some of
this capability come from? Well, it turns out that
over the past many years, humans have been getting
better and better and better at packing more physical hardware
into the same form factor, into the same physical space. So at the level of CPUs,
the brains of a computer, we humans have gotten much better at
packing more and more transistors, for instance, onto a CPU. And transistors are the little switches
that can turn things on and off– 0 and 1, 1 and 0. So you can store more
information and you can do more with that
information more quickly. CPUs today also have
more cores, which you can think of as mini CPUs
inside of the main CPU, so that a computer with
multiple cores can literally do multiple things at a time. But the funny thing is that we
humans, over the past decade or two, really haven’t been getting
fundamentally faster at life. At the end of the day, I can
only check my email so quickly. I can only post on Facebook so quickly. I can only check out
from Amazon so quickly. Because we humans have, of course,
a finite speed to ourselves. We’re not just getting–
we’re not doubling in speed a la Moore’s law every year or two. So we have, it would seem, a lot of
excess computing capacity these days. Computers are getting so darn
fast, we don’t necessarily know what to do with all of these
CPU cycles and with all of the RAM that we can fit into the same
physical box at half the price that it cost us last year. And so manufacturers
and companies realize that we could actually build a
business on this increased capacity. We can implement the computer
equivalent of timesharing, so to speak, which has long been with us
in the history of computing. But we can do this on a
much more massive scale now by taking one physical server
that has maybe two CPUs, or 16 CPUs, or 64 CPUs, and maybe gigabytes– tens of gigabytes or hundreds
of gigabytes of RAM– all inside of the same physical device,
plug it in to an internet connection, and then run special software on that
one server that creates the illusion that there’s multiple servers
living inside of that box. And this virtualization
software is implemented by way of software called a virtual machine monitor, or another word might be hypervisor. There’s different ways to describe
essentially the same thing. But a virtual machine
is a piece of software running on a computer inside of which
is running some other operating system, typically. So you might have one
server running Windows. But inside of that server are multiple
virtual machines, each of which itself is running Windows. So you might be able to chop up one
computer into 10, or even into 100. Or perhaps more commonly,
you might have a server running Linux or some
Unix-based operating system, also with virtual machines on it. But those virtual
machines might be running Linux themselves, or Unix, or Windows,
or any number of versions of Windows. And so this is the beauty. When you have so much
excess capacity and so many available CPU cycles
and so much RAM, you can slice that up and then sell portions
of the server’s capacity to customers. And if you’re really clever, you might
look at your customers’ usage patterns and realize that, you know
what, it’s not necessarily as simple as just taking my server and
dividing it up into n different slices, where n is a generic
variable for number, and then selling it or renting that
space, really, to end customers. Because you know what? Some of those customers might have some
booming businesses, which is great. But some of those customers
might not have many users. Maybe it’s a few dozen. Maybe it’s a few hundred. But it’s really a drop in the bucket. So instead of selling my computing
resources to just end customers, maybe I’ll sell it to twice as
many customers or three times as many customers, and essentially
over-sell my server’s capacity, but expect that on
average, this is just going to work out because some customers
will be using a lot of those cycles because business is
good, and some won’t be, because it’s just they
don’t have many customers, or really, it’s a personal website
that doesn’t get much usage anyway. And so for some time,
there has, of course, been this risk, when you sign up
for a web hosting company or a cloud provider, that your web site actually
might get really slow for reasons outside of your control. If you are co-located on a server that
some other booming business is on, your users might actually suffer if
your web host has oversold itself. And so in fact, this is
one of those situations where you get what you pay for. If you’re googling around and
finding various cloud providers, or web hosting companies
more specifically, you might be able to find a deal,
like $10 per month or $50 per month, as opposed to $100 or
$200 or more per month. And you do get what you pay for, because
those fly-by-night operations that are selling you space and
capacity super cheaply probably are overselling and over-committing. So these are the trade-offs, too– how much money do you want
to save versus how much risk do you actually want to take on? Generally, it’s safer to go with some
of the bigger fish these days, certainly when building a business, as you might
on a company like Amazon or Google or Microsoft or derivatives thereof. So just to paint a more
concrete technical picture of what virtualization is, here’s a
picture, as you might think of it. So you have your physical
infrastructure here. So that’s the actual server
from Dell or IBM or whoever. Then you have the host operating
system, which might be Windows, but is often Linux or some
variant of Unix instead. And then you have the hypervisor. This is the piece of
software that you install on your server that allows you to run
multiple virtual machines on top of it. And those virtual machines
can each run any number of different operating systems
themselves, or even different versions of operating systems. And so depicted here up top are
the disparate guest operating systems that might be on there. Maybe this is Linux and Solaris,
and this is Windows itself, or any number of other combinations. Whatever your customers want or whatever
you want to provide or essentially rent to customers, you can install. But you do pay a price. So as beautiful as this
situation is, and as clever as it is that we’re leveraging
these excess resources by slicing up one server into the illusion of, in this
case, three, or more generally more, there is some overhead. Because this hypervisor has to be a
middleman between your guest operating systems and your host operating
system, the one actually physically installed on the server. And any layers of indirection
like this, so to speak, have got to cost you
some amount of time. If there’s some work being
done here and you only have a finite number of
resources, the hypervisor itself is surely consuming
some of your resources. And gosh, this just
seems really inefficient, especially if all of your customers
are using the same operating system. My god, why do you have to have copies
of the same OS multiply installed? This just doesn’t feel like it’s
leveraging much economy of scale. And so it turns out there’s a newer
technology that’s gaining steam, and this is known not as virtualization,
per se, but containerization, the most popular instance of which
is perhaps a company called Docker. And the stack in the world of Docker
is a little shorter. It’s a little smarter about
how resources are shared. You still have your infrastructure,
your physical server, and you still have your
host operating system, whether it’s Linux or Unix
or something like that. But then instead of a hypervisor,
you have the Docker engine, which is really just an equivalent
of that base layer of software. But notice what’s different. In this case here, we’ve
collapsed the previous picture. In fact, thanks to our friends at
Docker who put this together here, the guest OS has disappeared. And you instead have your
different applications and your different
binaries and libraries, as this abbreviation means, all
running on the Docker engine. Now, what does this mean? This means when running
Docker, you typically choose your operating system– for instance, Ubuntu Linux or Debian
Linux or something else altogether– and then you essentially share
that one operating system across multiple containers. Instead of virtual machines,
we now have containers. So in other words, you ensure
that your different slices all share some common software–
the kernel, so to speak, the base core of the operating system. But then you uniquely layer
on top of that base system, that base set of default files, whatever
customizations your customers or you yourself want, but you
share some of the resources. And long story short, what
this means is that containers tend to be a little lighter weight. There’s less waste of resources because
there’s less overhead of running them, which is to say that you can generally
start them even more quickly. And better yet, you can still
isolate your different products and your different services–
database and web server and email server and any number
of other features– all within the illusion of their own
installation, their own operating system, even though there are
some shared resources here. So this, too, has been made possible
by the capabilities of modern hardware and the cleverness, frankly,
of humans in actually finding solutions or creative uses
for those available resources. But what other features
or topics come into play in this world of cloud computing? We’ve talked about availability
and caching and costing, really figuring out where we’re
going to actually spend our money by throwing hardware at
problems and scaling more generally. But there’s also issues
of replication, which actually do relate to high
availability, so to speak. But replication refers
to duplication of data, and really backups more
generally as a topic. And then there’s also some other funky
acronyms that are very much in vogue these days. Besides Infrastructure
as a Service, there’s also Platform as a Service, PaaS,
or Software as a Service, SaaS. Now, SaaS, even if you’ve
not used it under this name, odds are you have been using it. If you do use Gmail or Outlook.com
or any web-based email service, you are using software as a service. You don’t really know, or need
to care, where in the world your emails physically live, or how
many servers they’re spread across, or how your data is backed
up, or for that matter, when you click Send, how the email
even gets from point A to point B. You are treating Gmail
and Outlook as a software as a service with all of the underlying
implementation details abstracted away. You just don’t know or care
how it’s implemented– well, at least if everything is working. You probably do care
if something goes down. But there’s this intermediate
step between this extreme form of abstraction where all you see
is just the top-level service, and the lowest-level
implementation that we’ve discussed, which is
infrastructure as a service, whereby when using
something like Amazon, you literally click the button
that says give me a load balancer. You literally click a button
that says give me two servers. You literally click a
button that says give me a firewall or any number
of other features. So Amazon and Microsoft
and Google, to some extent, have all implemented
these low-level services that still require that you
understand the technology, and you understand networking, and you
understand scaling and availability. But you can so much more easily and
inexpensively and efficiently– literally with just a laptop or desktop,
without any data center of your own– stitch together the topology or
the architecture that you actually want, albeit in the cloud. Platform as a service, though,
has arisen as a middle ground here, whereby you might
have services like Heroku, which you might have heard
of, which themselves actually run on infrastructures like Amazon
or Google or Microsoft or the like. But they provide themselves
a layer of abstraction that isn’t quite as high
level, so to speak, as what you get from software as a service. In fact, these platforms as a service
don’t provide you with applications. They just make it easier for you to
run your applications in the cloud. Now, what does that mean? Well, it’s all fun and exciting
to understand load balancing and understand networking
and understand the need for multiple servers and the entire
conversation that we’ve had thus far. But at the end of the day,
if I’m a software developer or I’m trying to build
a business, all I care about is making my internet
application available to real users. I really don’t care about
how many servers I have, how many databases I have, how the
load balancers talk to one another. That’s all fine and
intellectually interesting. But I just want to get real work done. So I’m willing to pay
a bit more for this. I’m willing to pay some middleman,
like a Heroku, or any number
platform as a service, to abstract away those kinds of details. So I have the wherewithal,
and I have the willingness to actually say host
this as a web server. So give me a web server. I will pay you some number of dollars
per month to give me a web server. But I want you, Herouku, to deal
with the auto scaling of it. I don’t care how many servers it is. I don’t care how they are connected. I don’t care anything
about these heartbeats. I just want to have the
illusion, for my own sake, of just one server that
somehow grows or shrinks dynamically to handle my customer base. Meanwhile, things like load
balancing, I just want my customers to be able to reach my server. I don’t care how it’s implemented. I don’t care how it’s made
to be highly available. I just want that to work. And so companies like Herouku
provide these platforms as a service that just make
your life a little bit easier. And you don’t have to think about
or know about or worry about as many of these details. Now, to be fair, if
something breaks, you might not understand
exactly what’s going wrong, and you yourself might
not be able to solve it. Indeed, you might be entirely at the
mercy of the cloud provider, or the PaaS provider, to solve the problem for you. But you’re saving time. You’re saving energy
elsewhere by not having to worry about those lower-level
implementation details, at least in the common case. But odds are you’re paying a little more
to Heroku than you would to an Amazon
you with this value-added service. So as cryptic as these
acronyms may sound, they’re really just
referring to disparate levels of abstraction, all of which
somehow relate to the cloud. But infrastructure as a
service is a virtualization of these hardware ideas,
the physical cabling that we drew here on the screen. Software as a service really
is just that application that the user interacts with. And platform as a service
is an intermediate step, whereby you, in building
your software in the cloud, can worry a little bit less about how to
actually make it available to users. But let’s consider one
other challenge now– that of database replication
since, of course, thus far, we’ve been talking about a web server
as though it’s the entire picture. But the reality is
most any business that has a web-based presence
or a mobile presence is going to be storing information. When users register, when
users check something out, add something to their shopping
cart, so to speak, all of that data needs to somehow be stored. So let’s consider now what the
world really likely looks like. So here is my laptop again. And here is the cloud that’s
between me and some service that I’m interested in. We’ll assume for now that there
is some kind of load balancing. And I’m just going to
draw it a little bigger this time to suggest that– let’s
just think of it now as a black box. And maybe it’s one server. Maybe it’s two. Maybe it’s more. But somehow or other, load
balancing is implemented. Then I’m going to have
all of my servers here, which we’ll abstract away as maybe
three or more at this point– one, two, and then we’ll call this n. But a web server typically does
not do everything these days. In fact, it’s been
trending for some time to actually have different servers
or different virtual machines, or even more recently,
different containers, each providing individual services. Sometimes people call
these micro services if a container only does one, and
one very narrowly defined thing, like send emails, or save
information to a database, or respond to HTTP requests. So these back end web servers are not
the only types of servers we have. Odds are we at least have one database. So let’s consider now
the implication of all of these architectural
decisions we’ve made thus far on how we actually store our data. So in simplest form, our
database might look like this. And for historical reasons, it’s
generally drawn as a cylinder. And this is our database. Now, it’s immediately
obvious that if all servers– 1, 2, dot, dot, dot, n– need to
save information or read information from a database, they’ve all got to
somehow communicate with that database so they all have some kind of
connectivity, physically or otherwise. So this seems fine so long as the
software that’s running on servers 1, 2, dot, dot, dot, n– and no matter
what language we’re using, whether it’s Java or Python or
PHP or C# or something else– so long as those servers can talk
to, via the network, this database, that’s great. They can all save their
data to the same place, and they can all read their
data from the same place. So everything stays nicely in sync. But what’s the first problem
that motivated the entirety of this discussion from the outset? Well, what if one database
isn’t really enough? Well, we could take the
approach of vertically scaling our architecture, which is another
piece of jargon in this space. So vertical scaling means if your
one database isn’t quite up to snuff, and you’re running low on disk
space or capacity, because the number of requests per second it can handle is, of course,
limited, you know what you can do? You can go ahead and disconnect this one
and go ahead and put in a bigger one, and therefore increase your capacity. And vertical scaling means
to really pay more money or get something higher
end, a higher, more premium model, a more expensive model that’s
got more disk space and more RAM and a faster CPU or more CPUs. So you just throw
hardware at the problem– not in the sense of multiple servers,
but just one bigger and better server. But what are the challenges here? Well, if you’ve ever
bought a home computer, odds are whether it’s been on Dell’s
site or Microsoft’s or Apple’s or the like, you often have
this good, better, best thing where, for the top of the
line laptop or desktop, you’re going to be
paying through the roof– through the nose, so to speak. You’re going to be paying a premium
for that top of the line model. But you might actually be able to
save a decent number of dollars by going for the second
best or the third best, because the marginal gains
of each additional dollar really aren’t all that much. Because for marketing
reasons, they know that there might be some people out
there that will always pay top dollar for the fastest one. But just because you’re
paying twice as much doesn’t mean the laptop is going
to be twice as good, for instance. So this is to say to vertically
scale your database, you might end up paying, through the nose, for some
very expensive hardware just to eke out some more performance. But that’s not even the biggest problem. The most fundamental problem
is at the end of the day, there is a top-of-the-line server for
your database that only can support a finite number of database
connections at a time, or a finite number of reads
or writes, so to speak, saving and reading from the database. So at some point or other, it doesn’t
matter how much money you have or how willing you are to
throw hardware at the problem. There exists no server that can handle
more users than you currently have. So at some point, you actually
have to put away your wallet and put back on the engineering
hat alone and figure out how to not vertically scale, but
horizontally scale your architecture. And by this, I mean actually introducing
not just one big, fancy server, but two or more maybe
smaller, cheaper servers. In fact, one of the things
that companies like Google were especially good
at early on was using off-the-shelf, inexpensive hardware and
building supercomputers out of them, but much more economically
than they might have had they gone top
of the line everywhere, even though that would
mean fewer servers. Better to get more cheaper
servers and somehow figure out how to interconnect
them and write the software that lets them all be useful
simultaneously so that we can instead have a picture that looks a
bit more like this, with maybe a pair of databases in the picture now. Of course, we’ve now
created that same problem that we had earlier about
where does the data go. Where does the traffic
or the users flow, especially now where we have one
on the left and one on the right? So there’s a couple of solutions here,
but there are some different problems that arise with databases. If we very simply put a load balancer in
here, LB, and route traffic uniformly– say, to the left or to the right– that’s probably not the best thing. Because then you’re going to
end up with a world where you’re saving some data for a user
here and some data for a user here just by chance, because you’re
using round robin, so to speak, or just some probabilistic
heuristic where some of the traffic goes this way, some of
the traffic goes that way. And that’s not so good. OK. But we could solve that by somehow
making sure that if this user, User A, visits my web site, I should always
send him or her to the same database. And you can do this in a couple of ways. You can enforce some
notion of stickiness, so to speak, whereby you
somehow notice that, oh, this is User A. We’ve seen him or her before. Let’s make sure we send him
to this database on the left and not the one on the right. Or you can more formally use
a process known as sharding. In fact, this is very common
early on in databases, and even in websites like Facebook,
where you have so many users that you need to start splitting
them across multiple databases. But gosh, how to do that? Back in the earliest days of
Facebook, what they might have done was put all Harvard users on one
database, all MIT users on another, all BU users on another, and so forth. Because Facebook, as you may
recall, started scaling out initially to disparate schools. That was a wonderful
opportunity to shard their data by putting similar users
in their respective databases. And at the time, I
think you couldn’t even be friends with people in other
schools, at least very early on, because those databases,
presumably, were independent, or certainly could
have been, topologically. Or you might do something
simpler that doesn’t create that kind of isolation problem. Maybe all of your users whose last
names start with A go on one server, and all of your users
whose names start with B go on another server, and so forth. So you can almost hash your users, to
borrow some terminology from hash tables, and decide where to put that data.
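Sharding by hash could look something like this short Python sketch; the shard hostnames and the deliberately simple character-sum hash are illustrative assumptions (a real system would likely use more shards and a stronger hash), but the key property is the same: a given user always maps to the same database.

# Hypothetical shard hosts; the hash below is intentionally simple.
SHARDS = ["db-shard-0.example.internal", "db-shard-1.example.internal"]

def stable_hash(username: str) -> int:
    """Sum of character codes: stable across runs, unlike Python's built-in hash() for strings."""
    return sum(ord(c) for c in username.lower())

def shard_for(username: str) -> str:
    """Deterministically map a user to one shard."""
    return SHARDS[stable_hash(username) % len(SHARDS)]

print(shard_for("alice"))  # always the same shard for alice
print(shard_for("bob"))    # possibly a different shard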
Of course, that does not help with backups or redundancy. Because if you’re putting all of your
A names here and all of your B names here, what happens, god forbid,
if one of the servers goes down? You’ve lost half of your customers. So it would seem that no matter
how you balance the load, you really want to maintain
duplicates of data. And so there are a few different
ways people solve this. In fact, let me go
ahead and temporarily go back to that first model, where we
had a really fancy, bigger database that I’ll deliberately
draw as pretty big. And this is big in the sense that
it can respond to requests quickly and it can store a lot of data. This might be generally called our
primary or our master database. And it’s where our data
goes to live long term. It’s where data is written to, so to
speak, and could also be read from. But if we’re going to bump up
against some limit of how much work this database can do
at once, it would be nice to have some secondary
servers or tertiary servers. So a very common paradigm would be to
use this primary database for writes– we’ll abbreviate it W– and then also have maybe a couple
of smaller databases, or even the same size databases, that are
meant for reads, abbreviated R. And so long as these databases are
somehow talking to one another, this topology will just work. This is a feature known as replication. So long as the databases
are configured in such a way that any time data is written to
the primary database or the master database, that data gets replicated
to any replicas, as they’re called. Meanwhile, servers 1, 2, and n should
also be able to talk to these replicas. And if your code is smart enough–
and you would have to think about this and design this into your codebase–
you could ensure that any time you read data from a database, it comes from
one, or really any, of your replicas, replicas in the sense that they
are meant to have duplicate data. But anytime you write data– a
SQL INSERT or UPDATE or DELETE, as opposed to a SQL SELECT– you only send your write operations
to the primary or master database and leave it to the primary to then
replicate that data to the read replicas.
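As a rough illustration of that read/write split, here is what the routing decision might look like in Python; the hostnames are hypothetical, and real applications would more likely lean on a database driver, proxy, or ORM feature than inspect SQL strings by hand.

import random

# Hypothetical hosts: one primary that accepts writes, plus read replicas.
PRIMARY = "db-primary.example.internal"
REPLICAS = ["db-replica-1.example.internal", "db-replica-2.example.internal"]

def host_for(sql: str) -> str:
    """Send INSERT/UPDATE/DELETE to the primary; spread SELECTs across replicas."""
    is_write = sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return PRIMARY if is_write else random.choice(REPLICAS)

print(host_for("SELECT name FROM users WHERE id = 1"))           # some read replica
print(host_for("UPDATE users SET name = 'Alice' WHERE id = 1"))  # the primary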
Now, of course, there are some problems here. There’s some latency, potentially. Maybe it takes a split second. Maybe it takes a couple seconds
for that data to replicate. So things might not appear to
be updated instantaneously. But you have now a very scalable model
in that if you have the money to spend, you can even have more read replicas and
have even more and more read capacity. Of course, you’re going to eventually
bump up against a limit on your writes, at which point we need to
introduce another solution. But again, this is a very
incremental approach. And we can throw a little bit of
money at the problem each time and a little bit of
engineering wherewithal in order to at least get us
over that next ledge, which is super important, certainly, when
you’re first building your business. If you don’t necessarily have the
resources to go all in on things, you at least want to
get over this hurdle or at least build in some capacity
for the next load of users. So what if we run out
of capacity, though, with that writable server,
the master database, so to speak? We need to be a little more clever. And it turns out we can borrow this
idea of these horizontal arrows here to replicate our data, but
for a slightly different purpose. We could still have a pretty
souped up writable database. But we could have another one, maybe
identical in its specs, writable. But somehow, these things need to be
able to synchronize with one another. And maybe there’s still some
read replicas over here– R for read, and another
one over here, R for read. And these are all somehow
interconnected as well. But you can have what’s called
master-master replication, whereby your server’s code writes
to one of these servers. And maybe it’s either of them now. Maybe the load balancer actually
does send some of the writes this way, some of the writes this way. But the master databases,
the writable ones now, are configured, in software, to
replicate horizontally, so to speak.
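Extending that same sketch to a master-master setup, the application (or its load balancer) might simply alternate writes between the two writable databases; the hostnames are again hypothetical, and the replication between the masters happens in the databases' own configuration, not in this code.

import itertools
import random

# Hypothetical hosts: two writable masters that replicate to each other,
# plus read replicas hanging off of them.
MASTERS = ["db-master-a.example.internal", "db-master-b.example.internal"]
REPLICAS = ["db-replica-1.example.internal", "db-replica-2.example.internal"]

next_master = itertools.cycle(MASTERS)  # alternate writes between the two masters

def host_for(sql: str) -> str:
    """Round-robin writes across both masters; spread reads across replicas."""
    is_write = sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return next(next_master) if is_write else random.choice(REPLICAS)

print(host_for("INSERT INTO users (name) VALUES ('Bob')"))    # one master
print(host_for("INSERT INTO users (name) VALUES ('Carol')"))  # the other master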
So here too, you might have a little bit of latency. It might take a few
milliseconds or seconds for the data to actually replicate. But at least now we’ve doubled
the capacity for our writes so as to handle twice as
many write operations. And we can continue to hang more
and more read replicas off of these if you want in order to
handle more and more users. And so this is the challenge and,
dare I say, the fun of engineering architecturally– understanding
some of these basic building blocks. And even if you might not know
the particular manufacturers or how you physically
configure the servers, or how in software you configure
these servers, at the end of the day, these really are just puzzle pieces
that can somehow be interlocked. And these puzzle pieces can be used
to solve more and more interesting problems. But back to our discussion of PaaS and Software
as a Service and Infrastructure as a Service, there are also these
different layers of abstraction. And so thematic throughout
all of our discussions has been this layering. Indeed, we started, really, down here
with those zeros and ones and bits, and very quickly went to
ASCII, and very quickly went to colors and images
and videos and so forth. Because once you understand some of
those ingredients or puzzle pieces, you can build something
more interesting. And then you can slap a name on it– sometimes cryptic, like
IaaS, or PaaS, or SaaS. But at the end of the
day, those are just labels that describe, really,
black boxes, inside of which is a decent amount of complexity,
a clever amount of engineering, but ultimately, a solution to a problem. And so in cloud computing, we really
have this catch-all phrase that’s referring to a whole class of solutions
to problems that ultimately are all about getting one’s business or
getting one’s personal website out on the internet for users to
access, whether via laptops or desktops or mobile devices and more. So at the end of the
day, what is the cloud? It’s this evolving definition. It’s this evolving class of services
that just continues to grow. But each of those services
is solving a problem. Each of those problems derives from
plugging one hole in a leaky hose, seeing another one spring up,
and then addressing that one, and then layering on top of those
solutions these abstractions, and ultimately some marketing
speak, like cloud computing itself, so that you can build, out of these
more sophisticated puzzle pieces, bigger and better solutions
to actual problems you have when you’re trying
to build your own site.
