Wednesday, February 9, 2011

Prevent robots from crawling your site

Today my professor asked me to keep robots from crawling our website. After some googling, I found it is quite simple. There are two ways to keep robots away from your web pages.
One way is to ask search engines to drop your whole site from their index and stop crawling it in the future. The other way is to control access to your website on a page-by-page basis.

To prevent robots from crawling your whole website, put a robots.txt file in your root directory.
The content of robots.txt looks like this:
User-agent: *
Disallow: /
To control access to your website on a page-by-page basis:

Add the following tag to the head of each web page you want to keep out of search results:
<meta name="robots" content="noindex">
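
If you have many pages, it can be handy to check by script whether a page really carries the tag. Below is a minimal sketch in Python using only the standard library. The names RobotsMetaFinder and has_noindex are my own, made up for illustration, and the sketch only looks at the HTML text you pass in; it does not fetch anything from the web.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None, so fall back to an empty string.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content may hold several comma-separated directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html):
    """Return True if the page asks robots not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex"></head><body>hi</body></html>'
print(has_noindex(page))
```

For the sample page above this should print True, and False for a page without the tag.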

How to check your robots.txt?
Google Webmaster Tools can check it for you, but you must have a Google account.
Under 'Block or remove pages using a robots.txt file', click on 'Test a robots.txt file'.
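
If you do not want to use a Google account, you can also test the rules locally. Here is a small sketch with Python's standard urllib.robotparser module. The function name is_allowed and the example.com URL are just assumptions for the example, and this only tells you how Python's parser reads the file, which may differ in details from how a particular search engine reads it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The same rules as in the robots.txt shown in this post.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

print(is_allowed(BLOCK_ALL, "Googlebot", "http://example.com/index.html"))
```

With the block-everything rules this should print False, which means the robot is not allowed to fetch the page.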

