Preventing Google and Other Search Engines From Indexing Your Website Using Meta Tags

Not everything you post on the internet needs to be crawled, indexed, and cached by Google or other search engines. That is why the robots.txt file exists, but sometimes we don’t need to get that granular in robots.txt, or we may not have access to edit it. In those cases we can add specific instructions to the META tags in our web pages. The robots field is a comma-separated list. If you do not set it, it defaults to ALL, which means that the page can be indexed and that all links on the page can be crawled. Here is the API for the robots field:

CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
default = "ALL"
"NONE" = "NOINDEX, NOFOLLOW"
  • ALL – Robots may index the page, follow its links, and archive it
  • NONE – Robots should ignore this page, i.e. act as if it doesn’t exist and they never saw it
  • NOINDEX – Robots should not index this page
  • INDEX – Robots may index this page
  • NOFOLLOW – Robots can index this page, but should not follow any links on it
  • FOLLOW – Robots may follow any links on this page
  • NOARCHIVE – A special value used by Google that prevents it from keeping an archived (cached) copy of the page
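
To make the NONE shorthand concrete, here is a small sketch of a page that uses it. Going by the mapping above, it should behave the same as writing NOINDEX, NOFOLLOW explicitly; that alternative is shown in a comment. How strictly a given crawler honors either form is an assumption, not something this snippet can guarantee.

```html
<html>
   <head>
      <!-- NONE is shorthand for NOINDEX, NOFOLLOW -->
      <meta name="ROBOTS" content="NONE">
      <!-- Equivalent explicit form (use one or the other, not both):
           <meta name="ROBOTS" content="NOINDEX, NOFOLLOW"> -->
   </head>
</html>
```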

Following the above mapping, here are the scenarios that should cover most users’ needs:

  1. index this page, and any pages that I link to
  2. index this page, but don’t crawl any links referenced here
  3. don’t index this page, but crawl any links referenced on this page
  4. don’t index this page, and don’t crawl any links referenced on this page
  5. in addition to any of the above, prevent this page from being archived by Google

Example #1, index this page, and any pages that it links to

<html>
   <head>
      <meta name="ROBOTS" content="ALL">
   </head>

Example #2, Index this page, but don’t follow any links

<html>
   <head>
      <meta name="ROBOTS" content="INDEX, NOFOLLOW">
   </head>

Example #3, Don’t index this page, but follow links on this page

<html>
   <head>
      <meta name="ROBOTS" content="NOINDEX, FOLLOW">
   </head>

Example #4, Don’t index this page, and don’t follow any links

<html>
   <head>
      <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
   </head>

Example #5, Index this page and follow its links, but don’t archive it. If we do not want the page cached, we can add NOARCHIVE: the following example allows the page to be indexed and its links to be crawled, but the page cannot be archived

<html>
   <head>
      <meta name="ROBOTS" content="INDEX, FOLLOW, NOARCHIVE">
   </head>

You can add NOARCHIVE to any of the examples above to prevent the page from being cached by search engines. Remember that not being cached does not mean that your web page will not be indexed by a search engine.
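
As a sketch of that last point, the snippet below appends NOARCHIVE to Example #2. Assuming the crawler honors all three values, the page can still be indexed, its links are not followed, and no cached copy is kept.

```html
<html>
   <head>
      <!-- Example #2 (INDEX, NOFOLLOW) with NOARCHIVE appended -->
      <meta name="ROBOTS" content="INDEX, NOFOLLOW, NOARCHIVE">
   </head>
</html>
```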
