Monday, April 24, 2017

What is the crawler again?

A search engine such as Google or Yahoo consists of a crawler, an indexer, and a highly sophisticated ranking algorithm. The crawler follows links across the web and reaches your website through links on other sites (backlinks). Once it reaches your site, it saves the content in the index.
A crawler is also called a robot, a bot, or a spider. It is simply a program whose job is to navigate from one web page to the next, and it roams the internet 24/7. When a crawler visits your website, it saves the HTML version of each page in a very large database called the index, and the index is updated every time the crawler returns. For this reason you are advised to update your website often, so that the crawler comes back often. This can increase the importance of your site to Google and help its ranking. In short, the more content updates and backlinks you have, the more often crawlers visit and the better your ranking tends to be.
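
To make the idea concrete, here is a minimal sketch of what a crawler does: start at one page, save its HTML in an index, and follow every link it finds. This is only an illustration, not how Google's crawler is actually built. The names crawl, LinkExtractor and FAKE_WEB are made up for this example, and the "web" is a small dictionary held in memory so the code runs without an internet connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: follow links and save each page's HTML in an index."""
    index = {}                          # url -> saved HTML
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in index:                # already visited
            continue
        html = fetch(url)
        if html is None:                # page could not be fetched
            continue
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Turn relative links such as "/about" into full URLs.
            queue.append(urljoin(url, href))
    return index


# Hypothetical four-page "web" (assumption: stands in for real HTTP requests).
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": "<p>No links here</p>",
    "http://example.com/orphan": "<p>Nothing links to this page</p>",
}

index = crawl("http://example.com/", FAKE_WEB.get)
print(sorted(index))
```

Notice that the "orphan" page never ends up in the index: no other page links to it, so the crawler has no way to reach it. That is why links pointing to your pages matter.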

What is crawlability?
Crawlability is the ability of Google to crawl and access your website. Crawlers can also be blocked from your site, and there are a few ways to do this. If your website or a page on your website is blocked, you are telling Google’s crawler: “do not come here”. In most of these cases, your website or the blocked page won’t show up in the search results.
There are a few things that could prevent Google from crawling (or indexing) your website:
§  If your robots.txt file blocks the crawler, Google and other search engines will not crawl your website or the specific web page.
§  When it requests a page, the crawler first looks at the HTTP header of the response. This header contains a status code. If the status code says that the page doesn’t exist, Google won’t crawl that page.
§  If the robots meta tag on a specific page blocks the search engine from indexing that page, Google will crawl that page, but won’t add it to its index.
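
The checks in the list above can be sketched in a few lines of code. This is a simplified illustration under some assumptions, not Google's actual logic: the function name check_page, the sample robots.txt rules and the returned messages are invented for this example, and "page doesn't exist" is taken to mean status code 404 or 410. The robots.txt rules are parsed with Python's standard urllib.robotparser module.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def check_page(robots_txt, url, status_code, html, user_agent="Googlebot"):
    """Return what a well-behaved bot would do with this page (simplified)."""
    # 1. robots.txt: is the bot allowed to request this URL at all?
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return "blocked by robots.txt"
    # 2. HTTP status code: does the page exist? (assumption: 404/410 = no)
    if status_code in (404, 410):
        return "not crawled: page does not exist"
    # 3. Robots meta tag: may the page be added to the index?
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "crawled but not indexed"
    return "crawled and indexed"


# Hypothetical robots.txt that blocks every bot from the /private/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
NORMAL_PAGE = "<html><head><title>Hello</title></head></html>"

print(check_page(ROBOTS_TXT, "http://example.com/private/a.html", 200, NORMAL_PAGE))
print(check_page(ROBOTS_TXT, "http://example.com/old-page", 404, ""))
print(check_page(ROBOTS_TXT, "http://example.com/thanks", 200, NOINDEX_PAGE))
print(check_page(ROBOTS_TXT, "http://example.com/blog", 200, NORMAL_PAGE))
```

The order of the checks matters: a page blocked in robots.txt is never requested, so the bot never gets to see its status code or its robots meta tag.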
This flow chart might help you understand the process bots follow when attempting to index a page:

