5 - Link Analysis for the World Wide Web

from Part III - Graph-Based Information Retrieval

Published online by Cambridge University Press:  01 June 2011

Rada Mihalcea
Affiliation: University of North Texas
Dragomir Radev
Affiliation: University of Michigan, Ann Arbor

Summary

This chapter addresses link-analysis methods used by search engines, such as PageRank and HITS, and covers topics relevant to their application, including method stability, the combination of link- and content-based models, topic-sensitive ranking, and query-dependent link analysis.

The Web as a Graph

The Web – a common abbreviation for the World Wide Web – consists of billions of interlinked hypertext pages. These pages contain text, images, videos, or sounds and are usually viewed using Web browsers, such as Firefox or Internet Explorer. Users can navigate the Web either by typing the address of a Web page (i.e., the URL) directly into a browser or by following the links that connect Web pages to one another.

The Web is a typical example of a graph, with Web pages corresponding to vertices in the graph and links between pages corresponding to directed edges. For instance, suppose the page http://www.unt.edu includes a link to the page http://www.cs.unt.edu and another to the page http://www.htsc.unt.edu, and the latter page in turn links to the page of the National Institutes of Health http://www.nih.gov and also back to the http://www.unt.edu page. These four pages then form a subgraph of four vertices with four directed edges, as illustrated in Figure 5.1.
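The four-page example above can be written down as a small directed graph. The following is a minimal sketch, assuming a plain adjacency-list representation; the names `edges`, `build_graph`, and `in_degrees` are illustrative choices and do not come from the chapter.

```python
# Directed edges (source page -> target page) taken from the example in the text.
edges = [
    ("http://www.unt.edu", "http://www.cs.unt.edu"),
    ("http://www.unt.edu", "http://www.htsc.unt.edu"),
    ("http://www.htsc.unt.edu", "http://www.nih.gov"),
    ("http://www.htsc.unt.edu", "http://www.unt.edu"),
]


def build_graph(edge_list):
    """Return an adjacency list: each page maps to the pages it links to."""
    graph = {}
    for source, target in edge_list:
        graph.setdefault(source, []).append(target)
        # Register the target as well, so pages with no outgoing links
        # still appear as vertices.
        graph.setdefault(target, [])
    return graph


def in_degrees(graph):
    """Count, for each page, how many pages in the graph link to it."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


web = build_graph(edges)
num_vertices = len(web)
num_edges = sum(len(targets) for targets in web.values())
incoming = in_degrees(web)
```

Here `num_vertices` and `num_edges` recover the counts stated in the text, and `incoming` gives the number of links pointing at each page, which is the kind of quantity link-analysis methods build on.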

Although the size of the Web is generally considered to be unknown, there are various estimates concerning the size of the indexed Web – that is, the subset of the Web that is covered by search engines.

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2011
