Auto generated robots.txt file in WordPress

Last updated on February 10, 2013.

I have never used robots.txt on any of my sites. I usually want all parts of my sites to be indexed by search engines. In the rare instances that I need to block some pages from appearing in search results, I rely on the meta robots tag instead.
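For reference, the meta robots approach on a WordPress site can look like this. This is just a minimal sketch: the wp_head hook and is_page() are standard WordPress, but the function name and the private-page slug are placeholders I made up for illustration.

function codegrad_noindex_meta() {
    // Print a noindex tag only on one specific page.
    // "follow" still lets crawlers follow the links on that page.
    if ( is_page( 'private-page' ) ) {
        echo '<meta name="robots" content="noindex, follow" />' . "\n";
    }
}
add_action( 'wp_head', 'codegrad_noindex_meta' );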

I don't think I will be discussing robots.txt in the near future since I have never used it. You can check this entry in Wikipedia for more comprehensive information.

One of my blogs, the real estate review PREFER (powered by WordPress, just like my other sites), is taking longer than usual to get indexed. I asked for help in some web development forums, and a few members suggested that I check the robots.txt. I didn't know why I should do that since I never created a robots.txt, but I tried it anyway. I typed prefer.hub.ph/robots.txt into my browser's address bar and pressed Enter. Guess what I saw?

User-agent: *
Disallow:

There is an existing robots.txt, and it contains the directive shown above. I checked Wikipedia and found out that this directive allows all search engine crawlers to visit all pages and folders in the site, just as if there were no robots.txt at all. So this is not the reason that site is not getting indexed, but that's not our topic here.

Even though it's not a problem, why is it there? I don't remember creating such a file. I checked my directory in the file manager (where the WordPress files are located), and I didn't find any robots.txt there.

Apparently, this robots.txt is auto-generated by WordPress and can be controlled using the Privacy Options under the Settings tab.

[Screenshot: the WordPress privacy setting that controls robots.txt]

In the Privacy Options, you have only two choices:

  • allow all search engines - "I would like my blog to be visible to everyone, including search engines (like Google, Sphere, Technorati) and archivers"
  • block all search engines - "I would like to block search engines, but allow normal visitors"

The first option is the one currently selected in the screenshot, and it generates a directive similar to the code above. If you select the second option (block all search engines), the directive in the auto-generated robots.txt changes to:

User-agent: *
Disallow: /

This robots.txt directive prevents all search engines from visiting and indexing any page, folder, or file on your site.
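For the curious, here is roughly what happens under the hood. This is a simplified sketch of what WordPress core's do_robots() does, as far as I can tell from the 2.x source (the exact code varies by version): the privacy setting is stored in the blog_public option, and WordPress checks it whenever robots.txt is requested.

function do_robots_sketch() { // simplified stand-in for core's do_robots()
    header( 'Content-Type: text/plain; charset=utf-8' );
    if ( '0' == get_option( 'blog_public' ) ) {
        // "block all search engines" is selected
        echo "User-agent: *\n";
        echo "Disallow: /\n";
    } else {
        // "allow all search engines" is selected
        echo "User-agent: *\n";
        echo "Disallow:\n";
    }
}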

Current observations on WordPress' auto-generated robots.txt

  • I tried creating a robots.txt (with different directives) on the prefer.hub.ph subdomain, and it overrode the one auto-generated by WordPress. Since robots.txt does not work if placed in folders other than the root directory, I tried doing the same on G8 Ventures (my mom's website, which I maintain), and it also overrode the auto-generated one.
  • In the WordPress support forum, a member mentioned that he uploaded a robots.txt but it wasn't able to overwrite the auto-generated robots.txt. I also tried uploading a robots.txt, as opposed to creating it in the file manager, and I was still able to overwrite the auto-generated one. This is possibly due to the WordPress version: they were talking about 2.6, while I tested the overwriting of robots.txt using versions 2.7 and 2.8.
  • Some plugins can alter the directives of the auto-generated robots.txt. For example, XML Sitemap Generator can add information about the location of the sitemap (a sketch of how a plugin can do this follows after the example below):
User-agent: *
Disallow: /

Sitemap: http://codegrad.hub.ph/sitemap.xml.gz
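Here is a rough sketch of how a plugin can append that Sitemap line. I'm not claiming this is how XML Sitemap Generator actually does it; this assumes WordPress 3.0 or later, where the robots_txt filter is available, and the function name is made up:

function codegrad_add_sitemap_line( $output, $public ) {
    // Append the sitemap location to whatever WordPress generated.
    $output .= "\nSitemap: http://codegrad.hub.ph/sitemap.xml.gz\n";
    return $output;
}
add_filter( 'robots_txt', 'codegrad_add_sitemap_line', 10, 2 );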

For further study

  • Permanently removing WordPress' auto-generated robots.txt.
  • A robots.txt file can take effect only if placed in the root directory. Do the robots.txt files auto-generated by WordPress-powered websites on subdomains work?

Posted by Greten on October 2, 2009 under WordPress
