Search Engine Tools And Services

To encourage webmasters to create sites and content in accessible ways, each of the major search engines has built support- and guidance-focused services. Each provides varying levels of value to search marketers, but all of them are worth understanding. These tools provide data points and opportunities for exchanging information with the engines that are not available anywhere else.


The sections below explain the common interactive elements that each of the major search engines supports and identify why they are useful. Each of these elements has enough detail to warrant its own article, but for the purposes of this guide, only the most crucial and valuable components will be discussed.

Common Search Engine Protocols
1. SITEMAPS
Sitemaps are a tool that enables you to give hints to the search engines about how they should crawl your website. You can read the full details of the protocol at Sitemaps.org. In addition, you can build your own sitemaps at XML-Sitemaps.com. Sitemaps come in three varieties:

XML
Extensible Markup Language 
  • This is the most widely accepted format for sitemaps. It is extremely easy for search engines to parse and can be produced by a plethora of sitemap generators. Additionally, it allows for the most granular control of page parameters (see the example after this list).
  • Relatively large file sizes. Since XML requires an open tag and a close tag around each element, file sizes can get very large.
RSS
Really Simple Syndication or Rich Site Summary
  • Easy to maintain. RSS sitemaps can easily be coded to automatically update when new content is added.
  • Harder to manage. Although RSS is a dialect of XML, it is actually much harder to manage due to its updating properties.
Txt
Text File
  • Extremely easy. The text sitemap format is one URL per line up to 50,000 lines.
  • Does not provide the ability to add meta data to pages.
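
An Example of an XML Sitemap
The snippet below is a minimal sketch of what an XML sitemap for the hypothetical site www.example.com might look like. Only the <loc> tag is required for each URL; <lastmod>, <changefreq> and <priority> are optional tags defined by the protocol, and the values shown here are purely illustrative.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page; only <loc> is required -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>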
2. ROBOTS.TXT
The robots.txt file (a product of the Robots Exclusion Protocol) should be stored in a website's root directory (e.g., www.google.com/robots.txt). The file serves as an access guide for automated visitors (web robots). By using robots.txt, webmasters can indicate which areas of a site they would like to disallow bots from crawling, as well as indicate the locations of sitemap files (discussed above) and crawl-delay parameters. You can read more details about this at the robots.txt Knowledge Center page.
The following commands are available:

Disallow
Prevents compliant robots from accessing specific pages or folders.

Sitemap
Indicates the location of a website’s sitemap or sitemaps.

Crawl Delay
Indicates the delay (in seconds) that a robot should wait between successive requests to a server.

An Example of Robots.txt
#Robots.txt www.example.com/robots.txt
User-agent: *
Disallow:
# Don’t allow spambot to crawl any pages
User-agent: spambot
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
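
The Crawl Delay directive listed above is not shown in the example; the lines below are a hypothetical illustration of how it could be added (the bot name msnbot and the 10-second value are examples only, and support for this directive varies by engine):

# Ask msnbot to wait 10 seconds between requests
User-agent: msnbot
Crawl-delay: 10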

Warning: It is very important to realize that not all web robots follow robots.txt. People with bad intentions (e.g., e-mail address scrapers) build bots that don't follow this protocol, and in extreme cases they can use it to identify the location of private information. For this reason, it is recommended that the location of administration sections and other private sections of publicly accessible websites not be included in the robots.txt file. Instead, these pages can utilize the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.

3. META ROBOTS
The meta robots tag creates page-level instructions for search engine bots. The meta robots tag should be included in the head section of the HTML document.

An Example of Meta Robots
<html>
<head>
<title>The Best Webpage on the Internet</title>
<meta name="ROBOT NAME" content="ARGUMENTS" />
</head>
<body>
<h1>Hello World</h1>
</body>
</html> 

In the example above, “ROBOT NAME” is the user-agent of a specific web robot (e.g., Googlebot) or an asterisk to identify all robots, and “ARGUMENTS” is one or more of the supported directives, such as index/noindex (whether the page may be included in the index) and follow/nofollow (whether the links on the page may be followed).
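
For instance, a page that should be kept out of the index and whose links should not be followed could use the tag below; noindex and nofollow are one common combination of arguments (index and follow are the defaults):

<meta name="robots" content="noindex, nofollow" />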

REL="NOFOLLOW"
The rel=nofollow attribute creates link-level instructions for search engine bots that suggest how the given link should be treated. While the search engines claim not to follow nofollowed links, tests show they actually do follow them to discover new pages. These links certainly pass less juice (and in most cases no juice) than their followed counterparts; as such, followed links are still the recommended choice for SEO purposes.

An Example of nofollow

<a href="http://www.example.com" title="Example" rel="nofollow">Example Link</a>

In the example above, the value of the link would not be passed to example.com as the rel=nofollow attribute has been added.

Search Engine Tools
GOOGLE WEBMASTERS TOOLS

Geographic Target - If a given site targets users in a particular location, webmasters can provide Google with information that will help determine how that site appears in Google's country-specific search results, and also improve Google search results for geographic queries.

Preferred Domain - The preferred domain is the one that a webmaster would like used to index their site's pages. If a webmaster specifies a preferred domain as http://www.example.com and Google finds a link to that site that is formatted as http://example.com, Google will treat that link as if it were pointing at http://www.example.com.

Image Search - If a webmaster chooses to opt in to enhanced image search, Google may use tools such as Google Image Labeler to associate the images included in their site with labels that will improve indexing and search quality of those images.

Crawl Rate - The crawl rate affects the speed of Googlebot's requests during the crawl process. It has no effect on how often Googlebot crawls a given site. Google determines the recommended rate based on the number of pages on a website.

Diagnostics
Web Crawl - Web Crawl identifies problems Googlebot encountered while crawling a given website. Specifically, it lists Sitemap errors, HTTP errors, nofollowed URLs, URLs restricted by robots.txt and URLs that time out.

Mobile Crawl - Identifies problems with mobile versions of websites.

Content Analysis - This analysis identifies search engine unfriendly HTML elements. Specifically, it lists meta description issues, title tag issues and non-indexable content issues.

Statistics
These statistics are a window into how Google sees a given website. Specifically, they identify top search queries, crawl stats, subscriber stats, “What Googlebot sees” and index stats.

Link Data
This section provides details on links. Specifically, it outlines external links, internal links and sitelinks. Sitelinks are section links that sometimes appear under websites when they are especially applicable to a given query.

Sitemaps
This is the interface for submitting and managing sitemaps directly with Google.

YAHOO! SITE EXPLORER

Features
Statistics - These statistics are very basic and include data like the title tag of a homepage and number of indexed pages for the given site.

Feeds - This interface provides a way to directly submit feeds to Yahoo! for inclusion into its index. This is mostly useful for websites with frequently updated blogs.

Actions - This simplistic interface allows webmasters to delete URLs from Yahoo!'s index and to specify dynamic URLs. The latter is especially important because Yahoo! traditionally has a lot of difficulty differentiating dynamic URLs.

BING WEBMASTER CENTER

Features
Profile - This interface provides a way for webmasters to specify the location of sitemaps and a form to provide contact information so Bing can contact them if it encounters problems while crawling their website.

Crawl Issues - This helpful section identifies HTTP status code errors, robots.txt problems, long dynamic URLs, unsupported content types and, most importantly, pages infected with malware.

Backlinks - This section allows webmasters to find out which webpages (including their own) are linking to a given website.

Outbound Links - Similar to the aforementioned section, this interface allows webmasters to view all outbound links on a given webpage.

Keywords - This section allows webmasters to discover which of their webpages are deemed relevant to specific queries.

Sitemaps - This is the interface for submitting and managing sitemaps directly to Microsoft.

SEOMOZ OPEN SITE EXPLORER
While not run by the search engines, SEOmoz's Open Site Explorer does provide similar data.

Features
Identify Powerful Links - Open Site Explorer sorts all of your inbound links by metrics that help you determine which links are most important.

Find the Strongest Linking Domains - This tool shows you the strongest domains linking to your domain.

Analyze Link Anchor Text Distribution - Open Site Explorer shows you the distribution of the text people used when linking to you.

Head to Head Comparison View - This feature allows you to compare two websites to see why one is outranking the other.

It is a relatively recent occurrence that search engines have started to provide tools that allow webmasters to interact with their search results. This is a big step forward in SEO and in the webmaster/search engine relationship. That said, the engines can only go so far in helping webmasters. It is true today, and will likely be true in the future, that the ultimate responsibility for SEO rests with marketers and webmasters. It is for this reason that learning SEO is so important.