Robots.txt For Blog: Complete Guide | Manage Your Crawl Budget Effectively

Anyone can start blogging with a free blog on Blogger, but if you want to become a professional blogger, you need to learn some of the technical details behind blogging.

One of them is the robots.txt file concept 🙂 

I have found that many bloggers underestimate basics like a blog’s sitemap, its robots.txt file, theme selection, and more.

But with Knowledge Shout, you’ll be able to polish your blogging and SEO skills, and you will be able to skyrocket your rankings!

After reading this guide you will not only be able to understand the concept of robots.txt but also make your own customized robots.txt file for your blog and website.

I will provide a complete tutorial on how to set up your Blogspot blog’s robots.txt file, and at the end I will give you a well-structured robots.txt file for your blog.

For now, grab a cup of tea, because the guide is about to begin 🙂

What Is robots.txt?

A robots.txt file tells search engine crawlers which pages or sections of a website they should not crawl.

The goal is to prevent your site from being swamped by requests. It is not intended to prevent a website from appearing in Google or any other search engine.

You can use the robots.txt file to keep crawlers away from any page on your site, such as your blog labels, your demo pages, or other unimportant pages. Keep in mind that a blocked page can still show up in search results if other sites link to it; to keep a page out of the index, use the robots meta directives covered later in this guide.
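
To see what "telling crawlers not to crawl" looks like in practice, here is a minimal sketch using Python's built-in urllib.robotparser module, which reads rules roughly the way a well-behaved crawler would. The domain, the paths, and the crawler name MyCrawler are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of /demo/ only.
rules = """\
User-agent: *
Disallow: /demo/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler asks this question before fetching each URL.
demo_allowed = parser.can_fetch("MyCrawler", "https://www.example.com/demo/page.html")
post_allowed = parser.can_fetch("MyCrawler", "https://www.example.com/my-first-post.html")

print("demo page:", demo_allowed)
print("blog post:", post_allowed)
```

The file only asks crawlers to stay away; it does not password-protect anything, so never rely on it to hide private content.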

Why robots.txt File Is Important?

Many professional bloggers use robots.txt to make the most of their crawl budget by blocking undesirable pages, along with several other benefits that we will discuss in detail.

Strictly speaking, a robots.txt file is not required, because Google will usually find and index all of the important pages on your site on its own.

However, you may have files that are of little or no value to search engines, such as files that are only used internally. It is wise to keep crawlers away from these so that your crawl budget is spent on the pages that matter.

There are a few main reasons to add a robots.txt file:

Managing Your Crawl Budget: If you have a large website and are facing indexing problems, you might have a crawl budget issue. By blocking unimportant pages via robots.txt, you can manage your crawl budget.

Block Non-Public Pages: Sometimes you don’t want certain pages on your site to be indexed. For example, you might have a login page or OAuth authentication pages.

These pages need to exist, but you don’t want random people landing on them. This is where robots.txt comes in.

Restricting Resources From Being Indexed: If you have multimedia resources such as PDFs, images, and documents, you can use robots.txt to keep crawlers away from them as well.
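
The three reasons above can be sketched as one small rule set. This is only an illustration: the /login, /oauth/ and /downloads/ paths are assumptions about a hypothetical site layout, and the check uses Python's built-in urllib.robotparser rather than any search engine's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: a login page, OAuth pages, and a folder of PDFs and images.
rules = """\
User-agent: *
Disallow: /login
Disallow: /oauth/
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

site = "https://www.example.com"
paths = ["/login", "/oauth/callback", "/downloads/guide.pdf", "/blog/hello-world"]

# True means "may be crawled", False means "blocked by robots.txt".
results = {path: parser.can_fetch("*", site + path) for path in paths}

for path, allowed in results.items():
    print(f"{path}: {'crawl' if allowed else 'skip'}")
```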

Format Of robots.txt File:

For your convenience, we have added a demo code that contains a number of new items such as Mediapartners-Google, User-agent, Allow, Disallow, etc.

Code

User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://www.Example.com/sitemap.xml

Let’s go through it one by one.

Mediapartners-Google: This is the user agent of Google AdSense. If you have Google AdSense approval, you should include this group in your robots.txt file. It lets the AdSense crawler read your pages so that it can serve relevant ads to your audience.

Note: If you block this crawler, AdSense cannot analyze the blocked pages, so ads on those pages may be missing or irrelevant.

User-agent: There are many crawlers on the internet, such as Google, Google Mobile, Yahoo, GigaBlast, and Alexa/Wayback. The User-agent line identifies the specific crawler or set of crawlers that the rules below it apply to. In the code you can see User-agent: *, which means these instructions are valid for all crawlers.

Note: * (asterisk) means all crawlers.

Allow: This keyword tells crawlers that they are allowed to crawl the given pages.

Disallow: This keyword tells crawlers not to crawl the given page or path. In this code, Disallow: /search means you are blocking your blog’s search pages, which Blogger does by default. Paths are case sensitive, so /search and /Search are not the same.

Note: A small mistake here could seriously hurt your site. While setting up, please make sure that your primary pages are not blocked; otherwise they are unlikely to be indexed by any search engine.

Sitemap: A sitemap is simply a map of your site that helps search engine crawlers discover your pages. Crawlers read the robots.txt file first, so it is a good idea to include the sitemap URL inside the robots.txt file.

These are the general robots.txt rules that most search engines follow. Google’s own robots.txt specification differs in a few details.
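
If you want to see how a crawler would read the demo file, the sketch below feeds it to Python's built-in urllib.robotparser. It assumes the lowercase /search path that Blogger uses and a placeholder example.com domain; the post and label URLs are invented. Python's parser is not Google's parser, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

demo = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(demo.splitlines())

label_page = "https://www.example.com/search/label/SEO"
post_page = "https://www.example.com/2024/01/my-post.html"

# The AdSense crawler has its own group with an empty Disallow, so nothing is blocked.
adsense_on_label = parser.can_fetch("Mediapartners-Google", label_page)
# Every other crawler falls under "*", where /search is blocked and the rest is allowed.
google_on_label = parser.can_fetch("Googlebot", label_page)
google_on_post = parser.can_fetch("Googlebot", post_page)
# site_maps() needs Python 3.8 or newer.
sitemaps = parser.site_maps()

print(adsense_on_label, google_on_label, google_on_post, sitemaps)
```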

Robots Meta Directives:

If you are familiar with the Blogspot platform, you may have seen the “Meta tags” option. These are actually called “robots meta directives”, and they are sometimes also called robots header tags.

A robots meta directive is simply a line of code that you add to a specific page when you want to block it, by putting the “noindex, nofollow” value in a meta element. The robots.txt file, by contrast, gives bots instructions on how to crawl the website as a whole.

If you are a programmer, think of robots meta directives as inline CSS and the robots.txt file as an external stylesheet: one applies to a single page, the other to the whole site. It is as simple as that.

Format Of Robots Meta Directives:

In Blogspot you can find these meta tags under the “Settings” option. Here are the attributes you may come across while setting up your robots tags:

  • Noindex: Tells search engines not to index the page.
  • Index: Tells search engines to index the page. This meta tag is not required, because it is the default.
  • Nofollow: Tells crawlers not to follow links on the page or pass any link equity through them.
  • Follow: Even if the page itself isn’t indexed, the crawler should follow all the links on the page and pass equity to the linked pages.
  • None: The same as using both the noindex and nofollow tags together.
  • Noarchive: Search engines should not show a cached link to this page on a SERP.
  • Nocache: Similar to noarchive, but only used by Internet Explorer and Firefox.
  • Nosnippet: Tells search engines not to show a snippet of this page (such as the meta description) on a SERP.
  • Noimageindex: Tells crawlers not to index the images on the page.

Note: This is the common format for robots header tags. Google’s own robots directive format differs in a few details.
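
To check which of these directives a page actually carries, you can read its robots meta tag. The sketch below uses Python's built-in html.parser on a made-up page; the class name RobotsMetaParser and the sample HTML are assumptions for illustration only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            # The content value is a comma-separated list such as "noindex, nofollow".
            for item in (attr.get("content") or "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


# Hypothetical page that asks search engines not to index it or follow its links.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Demo</body></html>'

reader = RobotsMetaParser()
reader.feed(page)

# "none" is shorthand for noindex plus nofollow, so it also makes a page non-indexable.
indexable = not ({"noindex", "none"} & reader.directives)

print(sorted(reader.directives), "indexable:", indexable)
```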

Setup Robots Meta Directives in Blogspot:

On the Blogger platform, robots meta directives are called robots header tags. In my Blogger SEO Settings post, I also discussed how to set proper robots meta tags.

Just go to Blogger.com > Settings > Crawlers and indexing > Enable custom robots header tags.

Home Page: Tick “all”, which means you allow search engines to fully index your blog’s homepage.

Archive & Search Pages: Select “noindex” and “noarchive”, because we do not want search engines to index these pages. The URLs of archive and search pages are different from the original article URLs, so they would only duplicate your posts.

Default Post and Pages: Tick “all”, and then click the Save button. That completes the custom robots header tags setup.

How To Create Custom robots.txt File

Remember that robots.txt is optional. Crawlers can still crawl your entire website without it, but it looks more professional to set rules for crawlers to follow.

Creating a custom robots.txt file is quite simple. You can find many online tools that generate a customized robots.txt file for you.

The tool I recommend is the SmallSeoTools generator, which provides many options such as search engine selection, Google Image bot selection, Google Mobile bot selection, and more.

Whether you have a static or a dynamic website, this tool will be very helpful for creating an SEO-friendly robots.txt file.

Furthermore, it is quite simple to create the best robots.txt file for your blog, whether it is hosted on Blogger or WordPress.
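
Under the hood, a generator like this just assembles a few lines of text. Here is a rough sketch of the idea in Python; the function build_robots_txt is hypothetical and is not how SmallSeoTools or any other tool is actually implemented.

```python
def build_robots_txt(groups, sitemaps=()):
    """Build robots.txt text.

    groups: list of (user_agent, [(directive, path), ...]) tuples.
    sitemaps: iterable of sitemap URLs to list at the end of the file.
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            # rstrip() turns an empty path into a bare "Disallow:" line.
            lines.append(f"{directive}: {path}".rstrip())
        lines.append("")  # a blank line separates the groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    groups=[
        ("Mediapartners-Google", [("Disallow", "")]),
        ("*", [("Disallow", "/search"), ("Allow", "/")]),
    ],
    sitemaps=["https://www.example.com/sitemap.xml"],
)
print(robots_txt)
```

Whichever tool you use, read the generated file line by line before publishing it.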

Robots.txt File For Blogger:

Step 1: Go to Blogger.com. In the left sidebar you will see the Settings option; click on it.

Step 2: Find “Custom robots.txt” and paste the code given below there.

Code:

User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://www.Example.com/sitemap.xml

For your convenience, I’ve provided the robots.txt code; all you need to do is replace “https://www.Example.com/sitemap.xml” with your own site’s sitemap URL.

Robots.txt File For WordPress:

Method 1:

This is a simple, effective, and fast way to create a search-engine-optimized robots.txt in WordPress: choose either of the plugins (RankMath or Yoast SEO), which will automatically create an optimized robots.txt file according to your needs.

Don’t know how to add a robots.txt file using these plugins? You can read the complete guides here:

1- How To Add robots.txt File Using RankMath

2- How to edit robots.txt through Yoast SEO

Method 2:

If you want to create a robots.txt file manually in WordPress, you can use an FTP client or your hosting file manager.

Simply go to your hosting cPanel > File Manager > public_html.

In that folder you will probably see a robots.txt file; if not, create your own.

Please note that the file name is case sensitive. It must be written in lowercase with a “.txt” extension.

You can also drag and drop your robots.txt file there if you already have one.

Note: The robots.txt file must sit in the root folder of your site (usually public_html), so that it is reachable at example.com/robots.txt. Crawlers only look for it there and will not find a file placed in wp-includes or any other subfolder.

We recommend adding the following rules to the robots.txt file for WordPress sites; just replace example.com with your domain name.

User-Agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /readme.html
Disallow: /refer/
Sitemap: http://www.example.com/post-sitemap.xml
Sitemap: http://www.example.com/page-sitemap.xml
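
Before uploading rules like these, you can do a quick local check with Python's built-in urllib.robotparser. The sketch below assumes each rule sits on its own line; the sample file names (photo.jpg, some-plugin, hello-world) are invented. Python applies the first matching rule while Google prefers the most specific one, so results may differ for more complex files.

```python
from urllib.robotparser import RobotFileParser

wp_rules = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /readme.html
Disallow: /refer/
"""

parser = RobotFileParser()
parser.parse(wp_rules.splitlines())

site = "https://www.example.com"
sample_paths = [
    "/wp-content/uploads/2024/01/photo.jpg",
    "/wp-content/plugins/some-plugin/script.js",
    "/wp-admin/",
    "/readme.html",
    "/hello-world/",
]

# True means the path may be crawled, False means it is blocked.
checks = {path: parser.can_fetch("Googlebot", site + path) for path in sample_paths}

for path, allowed in checks.items():
    print(f"{'ALLOW' if allowed else 'BLOCK'}  {path}")
```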

Testing Your robots.txt File

You should always test your robots.txt file after creating it, because one small mistake can lead to your entire site being deindexed.

Checking the robots.txt file is quite easy, because Google itself provides a robots.txt tester tool.

If this is your first time, you’ll need to verify your site with Google Search Console.

Simply set up your property and the tool will automatically pull up your robots.txt file, or you can manually paste it in there.

If the tool reports no errors, hurrah, you are done!
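
As an extra safety net, you can run a small check of your own before publishing. The sketch below is only one possible approach: the helper blocked_paths is hypothetical, the URLs are placeholders, and Python's urllib.robotparser may not match Google's behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, "https://www.example.com" + p)]


# Pages that must never be blocked (hypothetical home page and post).
must_stay_crawlable = ["/", "/2024/01/my-post.html"]

good = "User-agent: *\nDisallow: /search\nAllow: /\n"
bad = "User-agent: *\nDisallow: /\n"  # a single stray slash blocks the whole site

good_report = blocked_paths(good, must_stay_crawlable)
bad_report = blocked_paths(bad, must_stay_crawlable)

print("good file blocks:", good_report)
print("bad file blocks:", bad_report)
```

To check the file that is already live, RobotFileParser also offers set_url() and read(), which download it from your domain.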

Final Words

The tutorial has ended, and I hope you enjoyed reading this post. I did my best to make it both brief and informative.

The robots.txt file you create for your website will not only enhance your site’s SEO but also help crawlers. As a result, you will have managed your crawl budget, and your content may become more visible in SERPs.

If you enjoyed this article, be sure to subscribe to our YouTube channel for more blogging and SEO video tutorials in Hindi-Urdu. You can also find us on LinkedIn and Facebook.

Abdul Rafay

Hey, I’m Abdul Rafay Qazi, a Blogging Scientist. I am a full-time blogger with robust skills in blogging, affiliate marketing, WordPress, and SEO. I am here to encourage aspiring new bloggers and help them earn their first dollar online.
