PageRank is the technique Google uses to determine the importance of a page on the web. It treats every incoming link to a page as a vote for that page's PageRank. If votes from all sources were weighted equally, however, the results would be misleading, so each vote is weighted according to the PageRank of the voting page and the number of links that page contains. Google's founders, Sergey Brin and Lawrence Page, defined PageRank by a formula. Researchers have since run many experiments and drawn conclusions from them, and their opinions sometimes conflict. This paper starts with the formula for calculating PageRank and describes how to use it. It then works through some examples and derives observations from them. Finally, it outlines basic ideas for improving the PageRank of a web site.

What is PageRank?

Most people start navigating the web with a search engine, and Google is the most famous search engine in use nowadays. Search results should be ordered by their relevance and importance on the web, because users cannot go through every page a search returns. Thus all the pages in the collection should be weighted and presented in order of their weights.

One of the most important factors Google uses is PageRank, a numeric value that represents how important a page is on the web. Of course, PageRank is not the only factor that determines a page's importance, but it is one of them. PageRank is defined by a mathematical formula that seems very difficult at first but actually is not.

Formula:

The citation (link) graph of the web is the main resource for calculating PageRank. In the paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Google's founders, Sergey Brin and Lawrence Page, defined PageRank as:

“We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor, which can be set between 0 and 1. We usually set d to 0.85. … C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))”
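Because PR(A) depends on the PR values of the pages linking to it, which in turn depend on other pages, the formula is recursive. One common way to evaluate it (an assumption here; the quoted excerpt does not say how Google computes it) is to give every page the same starting value and apply the formula repeatedly until the values stop changing. The sketch below does this for a hypothetical three-page graph; the `pagerank` function name, the starting value of 1.0 and the iteration limit are illustrative choices, not part of the original definition.

```python
def pagerank(links, d=0.85, iterations=200, tol=1e-10):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a link graph.

    links maps each page to the list of pages it links to (its out-links).
    Illustrative sketch only; the starting value of 1.0 is an assumption.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # For each page A, collect the pages T1..Tn that point to it.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    # C(T): number of links going out of page T.
    # Pages with no out-links simply cast no votes in this sketch.
    out_count = {p: len(links.get(p, [])) for p in pages}

    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[a])
            for a in pages
        }
        delta = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if delta < tol:  # values have stopped changing
            break
    return pr


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical graph
for page, value in sorted(pagerank(links).items()):
    print(page, round(value, 3))
# A 1.163
# B 0.644
# C 1.192
```

With this original (non-normalized) formula and no pages lacking out-links, the values sum to the number of pages, here 3.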

The formula above is the original one, published when PageRank was being developed; Google probably uses a variation of it, but they have not said what it is. When one page links to another, the first page casts a vote for the second. The PageRank of a page is the sum of a constant (1 - d = 0.15, with d = 0.85) and the damped sum (the sum multiplied by d) of the votes from all pages pointing to it. The value of a vote from a particular page depends on that page's PageRank and its total number of outbound links: the higher the PageRank, the more the vote is worth, and the more outbound links, the less each vote is worth.
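To make the effect of PR(T) and C(T) concrete, here is a small sketch of the value of a single vote, d * PR(T)/C(T). The PageRank values and out-link counts are hypothetical, and the function name `vote_value` is only illustrative.

```python
D = 0.85  # damping factor, as in the quoted paper


def vote_value(pr, out_links, d=D):
    """Contribution d * PR(T) / C(T) that page T adds to each page it links to."""
    if out_links < 1:
        raise ValueError("a voting page must have at least one out-link")
    return d * pr / out_links


# Higher PageRank -> bigger vote (same number of out-links).
print(vote_value(pr=4.0, out_links=2))  # 1.7
print(vote_value(pr=2.0, out_links=2))  # 0.85
# More out-links -> each vote is worth less (same PageRank).
print(vote_value(pr=4.0, out_links=8))  # 0.425
```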

So the votes a page casts add up to 85% of its PageRank, split evenly among all the pages it points to. For example, if page A points to four other pages, the term ‘d * PR(A)/4’ appears in the PageRank equation of each of those four pages, and the four terms add up to d * PR(A), i.e. 85% of the PageRank of A. (From here onwards I’ll refer to PageRank as PR.) Note that when a page votes for another page, it gives away none of its own PR. It is just voting; the only difference is that the weight of a page's vote depends on its own PR. This is like a shareholders' meeting, where the weight of each shareholder's vote depends on the number of shares held.
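The four-way split described above can be checked with a few lines of Python. This sketch assumes a hypothetical PR(A) of 2.0 and four hypothetical target pages B to E; `distribute_votes` is an illustrative name, not part of any library.

```python
import math


def distribute_votes(pr_a, targets, d=0.85):
    """Return the vote d * PR(A) / C(A) that page A contributes to each target."""
    share = d * pr_a / len(targets)
    return {t: share for t in targets}


pr_a = 2.0  # hypothetical PR of page A
votes = distribute_votes(pr_a, ["B", "C", "D", "E"])
total = sum(votes.values())

print(votes)  # each target receives 0.85 * 2.0 / 4 = 0.425
print(math.isclose(total, 0.85 * pr_a))  # True: the four votes add up to d * PR(A)
print(pr_a)  # 2.0: casting the votes does not reduce A's own PR
```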