I don’t want Google to crawl part or all of my site

There is a standard method involving a “robots.txt” file for excluding robot crawlers. This will prevent Googlebot and other well-behaved crawlers from visiting part or all of your site. Googlebot uses the user-agent “Googlebot”. In addition, Googlebot understands some extensions to the robots.txt standard: Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate that the pattern must match the end of a URL. For example, to prevent Googlebot from crawling files that end in .gif, you may use the following robots.txt entry:

User-agent: Googlebot
Disallow: /*.gif$
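
To exclude the whole site rather than a pattern of files, a minimal robots.txt sketch (addressed to all crawlers rather than only Googlebot) would look like this:

# Ask all compliant crawlers to stay out of the entire site
User-agent: *
Disallow: /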

There is another standard for telling robots not to index a particular web page or follow links on it, which may be more helpful, since it can be used on a page-by-page basis. This method involves placing a “META” element into a page of HTML.
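
For example, a page that should be neither indexed nor used as a source of links to follow might include the following “META” element in its <head> (a minimal sketch using the standard robots META convention):

<!-- ask robots not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">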

Remember, changing your server’s robots.txt file or the “META” elements on its pages will not cause an immediate change in the results Google returns. It will likely take some time for any changes you make to be reflected in Google’s next index of the web.

Excerpt taken from Google Webmaster Info
