Getting Started with HTML: How to Instruct Web Crawlers to Index Your Webpage Content

If you want to give web crawlers specific instructions about a page, one common way is to use meta tags. Meta tags can tell web crawlers, such as Google’s Googlebot and other search engine robots, how to index and interact with webpage content.

Meta tags provide metadata about an HTML document. A <meta> tag with the attribute name="robots" or name="googlebot", together with a content attribute listing directives, is commonly used to control whether search engines index a page and follow its links.

Example

<!DOCTYPE html>
<html lang="en">
  <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">

      ...

      <!-- Googlebot-specific instructions: index the page, follow links, 
           allow snippets, and archive the page -->
      <meta name="googlebot" content="index,follow,snippet,archive">
    
      <!-- General instructions for all search engine robots: index the page and 
           follow links -->
      <meta name="robots" content="index, follow">

      ...

  </head>
  <body>

    ...

  </body>
</html>

The following meta tag is specifically aimed at Google’s web crawler, Googlebot. It provides directives on how Google should handle the page.

<meta name="googlebot" content="index,follow,snippet,archive">

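index: Instructs Google to include this page in its search index.
follow: Instructs Google to follow the links on this page.
snippet: Allows Google to show a snippet of the page in search results.
archive: Allows Google to keep a cached copy of the page.

All four values describe Google’s default behavior, so this tag makes the intent explicit rather than changing how the page is handled. Note that snippet and archive do not appear among the rules listed in Google’s robots meta tag documentation.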
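Because only Googlebot reads the googlebot tag, it is also the natural place for restrictions that should apply to Google alone. A minimal sketch, using the nosnippet directive from the Additional Directives list; other crawlers ignore this tag and rely on the robots tag or their own defaults:

```html
<!-- Googlebot only: the page may still be indexed, but Google
     should not show a text snippet for it in search results -->
<meta name="googlebot" content="nosnippet">
```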

The next meta tag provides general instructions for all web crawlers (including Googlebot, Bingbot, etc.).

<meta name="robots" content="index, follow">

index: Instructs search engines to include this page in their search index.
follow: Instructs search engines to follow the links on this page.

Additional Directives

Here is a list of some additional directives you might want to use:

noindex: Prevent the page from being indexed.
nofollow: Prevent the crawler from following links on the page.
noarchive: Prevent search engines from storing a cached copy of the page.
nosnippet: Prevent search engines from showing a snippet of the page in search results.
noimageindex: Prevent images on the page from being indexed.
unavailable_after: Specifies the date/time after which the page should no longer be shown in search results.
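Several directives can be combined in a single content attribute as a comma-separated list. The sketch below shows three independent alternatives, not the head of one page (the first tag would keep the page out of the index regardless of the others). The date in the last tag is a placeholder; the YYYY-MM-DD form is an ISO 8601 date, one of the formats Google’s documentation accepts for unavailable_after.

```html
<!-- Alternative 1: keep the page out of search results
     and do not follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- Alternative 2: index the page, but not its images,
     and show no text snippet -->
<meta name="robots" content="noimageindex, nosnippet">

<!-- Alternative 3: stop showing the page in search results after
     the given date (2030-12-31 is a placeholder) -->
<meta name="robots" content="unavailable_after: 2030-12-31">
```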

For the full list of directives Google supports, see https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag

Conclusion

Using both meta tags together gives Googlebot its own, more detailed instructions while still providing general instructions for all other search engines. This is useful when you want Googlebot to behave differently from other crawlers. If the two tags contain conflicting directives, Google follows the more restrictive one.
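A minimal sketch of this combination, for a hypothetical page that should appear in other search engines but not in Google:

```html
<!-- All crawlers: index the page and follow its links -->
<meta name="robots" content="index, follow">

<!-- Googlebot only: do not index the page. Google applies the more
     restrictive rule, so the page stays out of Google's index while
     other search engines may still index it -->
<meta name="googlebot" content="noindex">
```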

About Matej Lednár
