Page Rank – Intro to Computer Science

Page Rank – Intro to Computer Science


So, that idea for how to rank webpages is the same idea as how we measure popularity of people. But instead of thinking about friendships as the way to measure popularity, what we’re measuring is links on the web. When one page has a link to another page, well that indicates it’s more likely that this other page is popular, just like when someone is a friend, it indicates that the person they are friends with is more likely to be popular. So, the goal in ranking web pages is to get a measure of how popular are pages, based on the other pages that link to it but we have the same issue with popularity then not all links are the same. That a link from a page that’s really important counts for a lot more than a link from a page that is not important. So, if the New York Times has a link to your page, that counts for a lot more than if your mom sets up a web page and puts a link to your page in it. Unless your mom is Lady Gaga, in which case her link probably counts for more. So, another way of thinking about this is what we’re trying to model is a random web surfer. So, our random web surfer has some set of pages that they know about. And those pages have links to other pages. Some pages might have a lot of links. Some pages might just have one link. Some pages might have no links. So, one way to think about this is that we’re trying to model a random web surfer. The web surfer starts knowing about some pages. And she picks one page at random, let’s say she picks this page. And then, when she’s on that page, she picks a random link and follows that links. Whoops, this was a bad starting page. It actually has no out links. So, then, she picks another random page. Let’s say she picks this one. She follows the link from that page, and now she got to the page with no links again. Let’s say she picked a better starting point. Let’s say she randomly picked this one. Now she’s got two links to follow. She randomly picks one of those. She follows it. She gets to a new page. She randomly picks a link from that page, in this case, it only had one, and in this case, it seems we have a bit of a problem, because all of the starting pages eventually lead into this one, which has no outgoing links. So, we’ll think about how to solve that problem later, but we can think of what our random web surfer is doing, is picking random pages, following links, and what we want to measure is the popularity of a page. And that’s the probability that she reaches that given page, starting from these random pages. So, if you did this over and over again, and you counted the number of times you reached each page, that would give you a measure of that page’s popularity. So, this is very similar to the popularity function. We’re going to define a function that we’ll call the rank of the page. And, like the way we defined the popularity function, it’s going to have two inputs. It’s going to have a time stamp, and it’s going to have the page which we’ll use a URL for. And the output of rank will be some number. Except for we’ll define for time step zero. This is our base case, and we’re going to find all the ranks have value one. We’ll actually change that shortly, but we’ll start out think of it, all the ranks having value one, like we did with the popularity scores. And then we’ll define the value of the rank at time step t for any given URL. Just like we defined the popularity score. It’s going to be the sum of all the pages that are friends with this page, and what it means for our webpage to be friends with another page is that it has a link to it. So, this is going to be for all the pages that exist that have some link to that URL, or its friends. And so, we’re going to go through each of those pages. We’ll call them inlinks instead of friends. We’re going to go through those and we’re going to sum up the ranks that we got for those pages at time t minus one. So, this is our first model of popularity on, of webpages. This is exactly the same as the model we had of popularity for people. It’s not going to work that well yet and one of the reasons it’s not going to work that well is some pages might have lots of links. And if a page has lots of links, the value of each one of its links should be diminished. It shouldn’t have the same value as the page that only has one link that links to this URL. Maybe that should be the same case for a friend’s popularity, if someone has lots of friends, each friend is less valuable. Whereas someone who only has a small number of friends has lots of time for each friend. So this is the way we’re going to model web popularity. We don’t want to just give the same score to each link. We’re going to change this by dividing by the number of outlinks. So, if a page has many outgoing links, the value to the pages that it links to is less for each page. So, a page that’s just a long list of lots of links won’t have that much influence on the rankings. If a page only has a few outgoing links, well then, they are worth more. So, what we are going to do, is divide this by the number of out links from p. So, each of the p pages, right? These will be the values of p, as they go through the inlinks of URL. We are going to sum up the rank that we got on the previous time step and divide that by the number of out links.

3 thoughts on “Page Rank – Intro to Computer Science”

Leave a Reply

Your email address will not be published. Required fields are marked *