Preventing Google and Other Search Engines From Indexing Your Website Using Meta Tags

Not everything you post on the internet needs to be crawled, indexed, and cached by Google or any other search engine. That is why the robots.txt file was created, but sometimes we don’t need to get too granular in the robots.txt file, or we may not have access to edit it. In those cases we can add specific instructions to the META tags in our web pages. The robots field is a comma-separated list. If you do not set it, it defaults to ALL, which means that the page can be indexed and that all links on the page can be crawled. Here are the values the robots field accepts…

CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
default = "ALL"
"NONE" = "NOINDEX, NOFOLLOW"
  • ALL – Robots are allowed to index, follow links, and archive the page
  • NONE – Robots should ignore this page, i.e. act as if this page doesn’t exist and as if they never saw it.
  • NOINDEX – Robots should not index this page
  • INDEX – Robots should index this page
  • NOFOLLOW – Robots should not follow any links on this page
  • FOLLOW – Robots can follow any links on this page
  • NOARCHIVE – A special value used by Google which prevents the page from being archived (cached)
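
As a small illustration of the NONE shorthand in the mapping above, the two tags below should be treated the same way by a robot that honors this field. This is only a sketch of the equivalence; a real page would carry one of the two tags, not both.

```html
<!-- Shorthand form -->
<meta name="ROBOTS" content="NONE">

<!-- Equivalent long form, per the mapping "NONE" = "NOINDEX, NOFOLLOW" -->
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
```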

Following the above mapping, I put together examples that should fit most users’ needs.

  1. index this page, and any pages that I link to
  2. index this page, but don’t crawl any links referenced here
  3. don’t index this page, but crawl any links referenced on this page
  4. don’t index this page, and don’t crawl any links referenced on this page
  5. in addition to the above, prevent this page from being archived by Google

Example #1, index this page, and any pages that it links to

<html>
   <head>
      <meta name="ROBOTS" content="ALL">
   </head>

Example #2, Index this page, but don’t follow any links

<html>
   <head>
      <meta name="ROBOTS" content="INDEX, NOFOLLOW">
   </head>

Example #3, Don’t index this page, but follow links on this page

<html>
   <head>
      <meta name="ROBOTS" content="NOINDEX, FOLLOW">
   </head>

Example #4, Don’t index this page, and don’t follow any links

<html>
   <head>
      <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
   </head>

Example #5, Lastly, assuming we do not want this page cached, we can add the NOARCHIVE value. The following example allows the page to be indexed and all of its links to be crawled, but the page cannot be archived

<html>
   <head>
      <meta name="ROBOTS" content="INDEX, FOLLOW, NOARCHIVE">
   </head>

You can add the NOARCHIVE value to any example above to prevent the page from being cached by search engines. It’s important to remember that not being cached does not mean that your web page will not be indexed by a search engine.
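
For instance, a sketch of Example #2 with NOARCHIVE appended might look like the following. The combination is chosen here purely for illustration: the page can still be indexed, its links are not followed, and a cached copy should not be kept.

```html
<html>
   <head>
      <!-- Example #2 (INDEX, NOFOLLOW) with NOARCHIVE added to the list -->
      <meta name="ROBOTS" content="INDEX, NOFOLLOW, NOARCHIVE">
   </head>
</html>
```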

Categories: SEO
  1. November 6th, 2009 at 00:20 | #1

    Certainly, I agree with the author’s view point. I love the idea since it is well
    explained. Thanks for the posts.

  2. January 26th, 2010 at 12:41 | #2

    The author of http://www.brangle.com has written an excellent article. You have made your point and there is not much to argue about. It is like the following universal truth that you can not argue with: truth is always truthing itself with every new thought or creation Thanks for the info.
