Crawler access through the robots.txt file
Can we allow or disallow a crawler's access to a particular page? Yes, definitely we can. Through robots.txt, which is part of on-page SEO, we can disallow crawler access to a web page.
If our website has content that we do not want search engines to access, we can use a robots.txt file to specify how search engines should crawl the site's content.
If some pages contain out-of-date content that is still appearing in Google search results, use the URL removal tool in Google Search Console (formerly Google Webmaster Tools) to remove those URLs.
Before removing a URL, make sure the following steps have been completed:
- Make sure the page no longer exists on the web: requests for it must return an HTTP 404 (Not Found) or 410 (Gone) status code.
- Disallow the page using the robots.txt file.
- Block the content using a meta noindex tag.
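The first check above can be sketched in a few lines of Python. This is an illustrative helper, not part of any Google tool; the `is_removable` and `check_url` names are made up for this example:

```python
# Sketch: verify a page is really gone before requesting removal.
# Both 404 (Not Found) and 410 (Gone) signal the page no longer exists.
from urllib import request, error

REMOVABLE_STATUSES = {404, 410}

def is_removable(status_code):
    """Return True if the status code indicates the page is gone."""
    return status_code in REMOVABLE_STATUSES

def check_url(url):
    """Fetch the URL and report whether it is safe to request removal."""
    try:
        with request.urlopen(url) as resp:
            return is_removable(resp.status)
    except error.HTTPError as e:
        # urllib raises HTTPError for 4xx/5xx; the code is still usable.
        return is_removable(e.code)

print(is_removable(404))  # True: page not found, safe to remove
print(is_removable(200))  # False: page is still live
```

A page that still returns 200 should be taken down (or blocked) first, otherwise Google may re-index it after the removal request expires.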
Robots.txt file code:
Copy the code below into a plain-text file (for example, in Notepad) and save it as robots.txt.
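The original code listing is not shown here, so as a minimal sketch: a robots.txt that blocks all crawlers from one page could look like the following, where /old-page.html is a placeholder path for whatever page you want to block:

```
# Apply the rule to all crawlers
User-agent: *
# Placeholder path: replace with the page you want to block
Disallow: /old-page.html
```

The file must be placed at the root of the site (e.g. example.com/robots.txt) for crawlers to find it.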
Using Meta noindex tag:
To prevent all robots from indexing a page on your site, place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex">
To allow other robots to index the page while preventing only Google's robots from indexing it, use:
<meta name="googlebot" content="noindex">
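In context, the tag sits inside the page's <head> section; a minimal sketch with placeholder content:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Out-of-date page</title>
    <!-- Keep all crawlers from indexing this page -->
    <meta name="robots" content="noindex">
  </head>
  <body>
    <p>Placeholder body content.</p>
  </body>
</html>
```

Note that for the noindex tag to be seen, the page must not also be blocked in robots.txt, since a blocked crawler never fetches the page and therefore never reads the tag.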
In short: if private or out-of-date content is appearing in Google search results, use the URL removal tool in Google Search Console to request its removal.