Robots.txt

What is Robots.txt?

Robots.txt is a text file that tells web crawlers (search engine robots) how to crawl a website.

It also tells search engine robots which pages not to crawl. The main idea behind this file is to reduce the request load on your website.

Only working, error-free webpages should be crawled, which gives users a richer experience.

The robots.txt file is part of the REP (Robots Exclusion Protocol).

REP is a group of web standards that supervise or control how search engine robots crawl the web.

REP also governs how crawlers access and index the content of webpages and how that content is served to users.

Basic format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

The above is the basic format of a robots.txt file; this is how robots.txt looks.
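
For instance, a minimal robots.txt that keeps one crawler out of a single directory might look like this (the bot name and folder below are only placeholders):

User-agent: Bingbot
Disallow: /private/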

Robots.txt terms

  • User-agent
  • Disallow
  • Allow
  • Sitemap
  • Crawl-delay

User-agent

This directive names the search engine crawler the instructions are for, so you can give crawl instructions to a specific crawler. You can find a list of user agents at http://www.robotstxt.org/db.html.
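
As a rough sketch, an asterisk targets every crawler, while a named group applies only to that bot (the blocked folder here is just a placeholder):

User-agent: *
Disallow: /tmp/

User-agent: Googlebot
Disallow:

Most crawlers obey only the most specific group that matches their name, so in this sketch Googlebot ignores the wildcard group and crawls everything.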

Disallow

It tells the user agent not to crawl the specified URL path.
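
For example, each Disallow line blocks one path prefix for the matching crawler (the paths below are made up):

User-agent: *
Disallow: /checkout/
Disallow: /search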

Allow

This command applies only to Googlebot. It tells the bot that it may crawl a specific page or subfolder even though its parent folder is disallowed.
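
A minimal sketch of that exception, using a made-up folder and file name:

User-agent: Googlebot
Disallow: /media/
Allow: /media/press-kit.pdf

Here Googlebot skips everything under /media/ except the one file that is explicitly allowed.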

Sitemap

A sitemap is an XML file that lists all the webpages on your website.

The Sitemap directive tells the bot where to find that list so it can discover, crawl, and index the pages of the website.
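
The directive simply points at the sitemap's URL and can be placed anywhere in the file; the URL below is only a placeholder:

Sitemap: https://www.example.com/sitemap.xml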

Crawl-delay

Googlebot does not follow this directive. For crawlers that do, it tells them to wait a specified amount of time before loading and crawling page content, so the server is not hit with requests too quickly.
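
For crawlers that do honor it (Bing's and Yandex's crawlers, for example), the value is the number of seconds to wait:

User-agent: Bingbot
Crawl-delay: 10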

Check if you have a robots.txt file

If you are not sure whether your website has a robots.txt file, you can find out easily.

Just follow these instructions and you will know whether you have the file or not.

Go to the address bar, type your domain name, and add /robots.txt at the end of the URL:

https://createpassiveincomes.com/robots.txt

After this, if you see a page of directives, you have the file; if you don't see any page, you don't have a robots.txt file yet.

Why Is the robots.txt File Important?

The robots.txt file is very sensitive: if you accidentally disallow a user agent, the bot will not crawl your website, which means your website will not show up in search.

So it is important, and you should use it carefully.

  • It helps in specifying the sitemap of the website.
  • It helps your website's pages appear correctly in search results (SERPs).
  • Robots.txt can be used to allow or block crawling of specific webpages.
  • It prevents duplicate content from appearing in the SERPs.
  • It can keep crawlers away from some of the files on your website, such as images and PDFs (see the sketch after this list).
  • It reduces the risk of overloading the server by using Crawl-delay.
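
As a rough sketch of that file-blocking rule, Googlebot and some other major crawlers understand * and $ wildcards in Disallow patterns, so blocking every PDF on a site could look like this (the pattern is only an example):

User-agent: *
Disallow: /*.pdf$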

Here are some common robots.txt setups.

Allow Full Access

User-agent: *
Disallow:

Block All Access

User-agent: *
Disallow: /

Block One Folder

User-agent: *
Disallow: /folder/

Block One File

User-agent: *
Disallow: /file.html

Google has also provided documentation about robots.txt. In the end, I recommend that you use your robots.txt file carefully. This link will help you create a robots.txt file if you don't have one.