How Search Engines Work

Search engines have four functions: crawling, building an index, calculating relevance and rankings, and serving results.


Crawling and Indexing 
Imagine the World Wide Web as a network of stops in a big city subway system.

Each stop is its own unique document (usually a web page, but sometimes a PDF, JPG or other file). The search engines need a way to “crawl” the entire city and find all the stops along the way, so they use the best path available – links.
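
The link-following idea can be sketched as a breadth-first traversal of a small, hypothetical link graph. The `LINKS` dictionary and the `crawl` function are illustrative assumptions, not any engine's actual crawler. A real crawler would also fetch pages over HTTP, extract links from the HTML, and honor robots.txt.

```python
from collections import deque

# Hypothetical toy web: each page (a "stop") maps to the pages it links to.
LINKS: dict[str, list[str]] = {
    "/home": ["/about", "/blog"],
    "/about": ["/home"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/files/report.pdf"],
    "/files/report.pdf": [],
    "/orphan": ["/home"],  # links out, but nothing links in
}


def crawl(seed: str, links: dict[str, list[str]]) -> list[str]:
    """Visit pages breadth-first from `seed`, following each page's links once."""
    seen = {seed}
    queue = deque([seed])
    visited_order = []
    while queue:
        page = queue.popleft()
        visited_order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip stops that are already queued
                seen.add(target)
                queue.append(target)
    return visited_order


print(crawl("/home", LINKS))
```

Starting from `/home`, the traversal reaches every page connected to it by links, including the PDF, but never `/orphan`, because no other page links to it. In the subway analogy, that is a stop with no track leading in.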


“The link structure of the web serves to bind together all of the pages in existence.”

(Or, at least, all those that the engines can access.) Through links, search engines’ automated robots, called “crawlers” or “spiders,” can reach the many billions of interconnected documents.

Once the engines find these pages, their next job is to parse the code from them and store selected pieces of the pages on massive hard drives, to be recalled when needed for a query. To accomplish the monumental task of holding billions of pages that can be accessed in a fraction of a second, the search engines have constructed massive datacenters in cities all over the world. These monstrous storage facilities hold thousands of machines processing unimaginably large quantities of information. After all, when a person performs a search at any of the major engines, they expect results instantaneously: even a 3- or 4-second delay can cause dissatisfaction, so the engines work hard to provide answers as fast as possible.
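
Storing selected pieces of pages so they can be recalled in a fraction of a second is commonly done with an inverted index. This standard information-retrieval structure maps each word to the documents that contain it. The sketch below assumes the pages have already been crawled and reduced to plain text. `PAGES`, `build_index`, and `lookup` are hypothetical names, and production indexes are spread across many machines rather than held in a single dictionary.

```python
import re
from collections import defaultdict

# Hypothetical pages already fetched by a crawler (URL -> extracted text).
PAGES = {
    "/home": "Fresh coffee beans roasted daily",
    "/blog/post-1": "How to brew coffee at home",
    "/blog/post-2": "Green tea versus black tea",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of URLs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs containing every query word, without rescanning any page."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages that have this word too
    return results


index = build_index(PAGES)
print(lookup(index, "coffee home"))
```

A query such as `coffee home` is answered by intersecting two small sets instead of rereading every stored page. This is why lookups can stay fast as the collection of documents grows.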

Providing Answers 
When a person searches for something online, the search engines must scour their corpus of billions of documents and do two things: first, return only those results that are relevant or useful to the searcher’s query, and second, rank those results in order of perceived value (or importance). It is both “relevance” and “importance” that the process of search engine optimization is meant to influence.

To the search engines, relevance means more than simply having a page with the words you searched for prominently displayed. In the early days of the web, search engines didn’t go much further than this simplistic step, and their results suffered as a consequence. Through iterative evolution, smart engineers at the various engines devised better ways to find valuable results that searchers would appreciate and enjoy. Today, hundreds of factors influence relevance, many of which we’ll discuss throughout this guide.
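
The simplistic step that early engines relied on can be sketched as a raw keyword count. This is a hypothetical illustration, not any engine's real scoring. It shows why word matching alone is easy to game: a page that merely repeats the query words outranks a genuinely useful one. The `count_matches` and `rank` functions and the two sample pages are assumptions made for the example.

```python
import re

# Hypothetical pages: one useful, one that just repeats the keywords.
PAGES = {
    "/guide": "Compare cheap flights to Paris and book with flexible dates",
    "/spam": "cheap flights cheap flights cheap flights cheap flights",
}


def count_matches(query: str, text: str) -> int:
    """Naive relevance: how many words on the page are query words?"""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for word in words if word in terms)


def rank(query: str, pages: dict[str, str]) -> list[str]:
    """Order URLs by keyword count, highest first."""
    return sorted(pages, key=lambda url: count_matches(query, pages[url]), reverse=True)


print(rank("cheap flights", PAGES))
```

The repetitive page scores 8 against the useful page's 2, so it ranks first. This is the kind of result quality problem that pushed engines toward many additional relevance signals.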

Importance is an equally tough concept to quantify, but search engines must do their best.
Currently, the major engines typically interpret importance as popularity: the more popular a site, page or document, the more valuable the information contained therein must be. This assumption has proven fairly successful in practice, as the engines have continued to increase users’ satisfaction by using metrics that interpret popularity.

Popularity and relevance aren’t determined manually (and thank goodness, because those trillions of man-hours would require earth’s entire population as a workforce). Instead, the engines craft careful mathematical equations, called algorithms, to sort the wheat from the chaff and then rank the wheat in order of tastiness (or however it is that farmers determine wheat’s value). These algorithms often comprise hundreds of components. In the search marketing field, we often refer to them as “ranking factors.” For those who are particularly interested, a dedicated resource covers this subject: Search Engine Ranking Factors.
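
One well-known way to turn links into a popularity score is PageRank, the algorithm published by Google's founders in the late 1990s. A page is treated as popular when popular pages link to it. The sketch below is a simplified power-iteration version over a hypothetical four-page graph, using the commonly cited damping factor of 0.85. It illustrates link-based popularity in general and is not a description of how any engine ranks pages today. The `pagerank` function and the `LINKS` graph are assumptions made for the example.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank via power iteration; scores sum to 1."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal popularity
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its current score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


scores = pagerank(LINKS)
print(sorted(scores, key=scores.get, reverse=True))
```

Page `c` ends up most popular because three pages link to it. Page `d`, which nothing links to, keeps only the baseline share of 0.15 / 4.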

SEO INFORMATION FROM YAHOO
Many factors influence whether a particular web site appears in Web Search results and where it falls in the ranking.
These factors can include:
  • The number of other sites linking to it
  • The content of the pages
  • The updates made to indicies
  • The testing of new product versions
  • The discovery of additional sites
  • Changes to the search algorithm – and other factors
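
The first factor in Yahoo's list is the number of other sites linking to a page. One plausible way to count it, assumed here rather than taken from Yahoo, is to count distinct linking hosts. With this approach, many links from one site count once, and links from the page's own site are ignored. The `BACKLINKS` data and the `linking_sites` function are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical backlink data: (linking page URL, linked-to page URL).
BACKLINKS = [
    ("https://news.example.org/story", "https://shop.example.com/"),
    ("https://news.example.org/other-story", "https://shop.example.com/"),
    ("https://blog.example.net/review", "https://shop.example.com/"),
    ("https://shop.example.com/about", "https://shop.example.com/"),
]


def linking_sites(target: str, backlinks: list[tuple[str, str]]) -> set[str]:
    """Distinct external hosts that link to `target`."""
    target_host = urlparse(target).netloc
    return {
        urlparse(source).netloc
        for source, destination in backlinks
        if destination == target and urlparse(source).netloc != target_host
    }


print(len(linking_sites("https://shop.example.com/", BACKLINKS)))
```

Four links point at the shop's home page, but they come from only two other sites. The internal link from `/about` is excluded.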
SEO INFORMATION FROM BING 
Bing engineers at Microsoft recommend the following to get better rankings in their search engine:
  • In the visible page text, include words users might choose as search query terms to find the information on your site.
  • Limit all pages to a reasonable size. We recommend one topic per page. An HTML page with no pictures should be under 150 KB.
  • Make sure that each page is accessible by at least one static text link.
  • Don’t put the text that you want indexed inside images. For example, if you want your company name or address to be indexed, make sure it is not displayed inside a company logo.
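
Two of these recommendations, page size and static-text-link reachability, lend themselves to a quick self-check. The sketch below rests on two assumptions. First, it reads "150 kb" as 150 kilobytes of 1,024 bytes. Second, it treats a page as an orphan when no other page reaches it through a plain text link. The `SITE` map and both functions are hypothetical, and a real audit would extract links from the live HTML.

```python
# Assumption: "150 kb" means 150 kilobytes of 1,024 bytes each.
MAX_HTML_BYTES = 150 * 1024


def exceeds_size_limit(html: str) -> bool:
    """True if the page's UTF-8 encoded HTML is larger than the assumed limit."""
    return len(html.encode("utf-8")) > MAX_HTML_BYTES


def orphan_pages(text_links: dict[str, list[str]]) -> set[str]:
    """Pages that no other page reaches through a static text link."""
    linked_to = {
        target
        for source, targets in text_links.items()
        for target in targets
        if target != source  # a page linking to itself does not count
    }
    return set(text_links) - linked_to


# Hypothetical site map: page -> pages it links to with plain <a href> text links.
SITE = {
    "/": ["/products", "/contact"],
    "/products": ["/", "/products"],
    "/contact": ["/"],
    "/old-promo": ["/"],
}

print(orphan_pages(SITE))
```

Here `/old-promo` is flagged, because only its own outgoing link involves it. The link from `/products` to itself does not rescue a page either.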
SEO INFORMATION FROM GOOGLE
Googlers recommend the following to get better rankings in their search engine:
  • Make pages primarily for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as cloaking.
  • Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
  • Create a useful, information-rich site, and write pages that clearly and accurately describe your content. Make sure that your <title> elements and ALT attributes are descriptive and accurate.
  • Keep the links on a given page to a reasonable number (fewer than 100).
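
The `<title>`, ALT-attribute, and link-count recommendations can be checked mechanically. Below is a sketch using Python's standard-library `html.parser.HTMLParser`. The `PageAudit` class and the `audit` function are illustrative assumptions, and the threshold of 100 comes from the guideline above. A clean report only shows that these elements are present. It says nothing about whether the title or ALT text is actually descriptive and accurate.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the title text, counts <a href> links and images lacking ALT text."""

    def __init__(self) -> None:
        super().__init__()
        self.link_count = 0
        self.images_missing_alt = 0
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # valueless attributes map to None
        if tag == "a" and attributes.get("href"):
            self.link_count += 1
        elif tag == "img" and not (attributes.get("alt") or "").strip():
            self.images_missing_alt += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit(html: str, max_links: int = 100) -> list[str]:
    """Return human-readable problems found on one page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.title.strip():
        problems.append("missing or empty <title>")
    if parser.images_missing_alt:
        problems.append(f"{parser.images_missing_alt} image(s) without ALT text")
    if parser.link_count >= max_links:
        problems.append(f"{parser.link_count} links (guideline: fewer than {max_links})")
    return problems


SAMPLE = """<html><head><title>Handmade Mugs | Example Shop</title></head>
<body><img src="mug.jpg" alt="Blue stoneware mug"><img src="logo.png">
<a href="/mugs">All mugs</a></body></html>"""

print(audit(SAMPLE))
```

For the sample page, only the logo image is reported, because it has no ALT attribute.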