An Updated robots.txt File for Magento 2 Stores

The stock robots.txt file that worked well for Magento 1.x stores does not work perfectly for Magento 2.x stores. A store’s robots.txt file tells search engines like Google and Bing what to crawl and, more importantly, what not to crawl on your website.

For a complex ecommerce platform like Magento, it is important to limit what search engines will crawl. You likely don’t want them crawling cart contents, wishlists, search results, and similar pages.

With Magento 2, the URL patterns for a few of the sorting options have changed. Listed below is what we use for Magento 2 stores to limit search engine crawling, avoid duplicate content, and prevent a store from being crawled so heavily that it causes load issues.


 

User-agent: *
# Directories
Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /setup/
Disallow: /update/
Disallow: /var/
Disallow: /vendor/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /sendfriend/
Disallow: /wishlist/
# Files
Disallow: /composer.json
Disallow: /composer.lock
Disallow: /CONTRIBUTING.md
Disallow: /CONTRIBUTOR_LICENSE_AGREEMENT.html
Disallow: /COPYING.txt
Disallow: /Gruntfile.js
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /nginx.conf.sample
Disallow: /package.json
Disallow: /php.ini.sample
Disallow: /RELEASE_NOTES.txt
# Do not index pages that are sorted or filtered.
Disallow: /*?*product_list_mode=
Disallow: /*?*product_list_order=
Disallow: /*?*product_list_limit=
Disallow: /*?*product_list_dir=
# Do not index session ID
Disallow: /*?SID=
# PHP files
Disallow: /*.php$
# CVS, SVN directory and dump files
Disallow: /*.CVS
Disallow: /*.zip$
Disallow: /*.svn$
Disallow: /*.idea$
Disallow: /*.sql$
Disallow: /*.tgz$
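
If you want to sanity-check which URLs rules like these catch before deploying them, here is a minimal Python sketch. It is not how any search engine is actually implemented. It only assumes the commonly documented wildcard behaviour: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and matching is case-sensitive. Because the file contains only Disallow rules, any match is treated as blocked. The function names `rule_to_regex` and `is_blocked` and the sample URLs are made up for illustration.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate one robots.txt path rule into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is a literal prefix match.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url_path: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if any Disallow rule matches the path from its start."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# A few rules copied from the listing above.
rules = ["/checkout/", "/*?*product_list_order=", "/*.php$", "/*?SID="]

# Hypothetical URLs, for illustration only.
for path in [
    "/checkout/cart/",
    "/women/tops.html?product_list_order=price",
    "/women/tops.html",
    "/index.php?SID=abc",
]:
    print(path, "->", "blocked" if is_blocked(path, rules) else "allowed")
```

Only the plain category page `/women/tops.html` comes back as allowed; the cart, the sorted listing, and the session-ID URL are all blocked.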

This may need to be adjusted for customized themes, for example those that add price-range or other non-standard on-page filters, if those filters could lead to duplicate content being crawled.
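
As a rough illustration of that kind of adjustment, the sketch below generates extra Disallow lines in the same `/*?*name=` style as the sorting rules. The helper name `disallow_lines_for_params` and the parameter names `price` and `color` are assumptions for the example; check the query strings your own theme actually produces before adding anything.

```python
from typing import Iterable, List


def disallow_lines_for_params(params: Iterable[str]) -> List[str]:
    """Build one Disallow line per query parameter name.

    Uses the same '/*?*name=' pattern as the product_list_* rules, so it
    matches the parameter anywhere in the query string.
    """
    return [f"Disallow: /*?*{name}=" for name in params]


# Hypothetical filter parameters; replace with the ones your theme emits.
extra_filters = ["price", "color"]
print("\n".join(disallow_lines_for_params(extra_filters)))
```

Note that a pattern such as `/*?*price=` also matches any longer parameter name ending in `price=`, so it is worth reviewing the generated lines by hand before pasting them into robots.txt.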

Did we miss anything? Please let us know in the comments.  :)


8 Comments

  1. Steve says:

    I really appreciate that you have embraced Magento 2 with your constant learning. This along with your other information is quite useful. I just updated my old robots.txt with this new one.

  2. It’s worth noting that this assumes that the root of the site is the whole Magento 2 installation. This is not recommended. The root of the public site should be the pub/ folder, in which case quite a few of these restrictions are not needed, e.g. app/ and vendor/

  3. alli says:

    Our Google Webmaster tool is throwing us errors while trying to access our sitemap: it says URL restricted by robots.txt

    This is our current robots.txt file. Is there anything unnecessary in here that could cause this issue? Thank you so much in advance!
    User-agent: *
    Disallow: /index.php/
    Disallow: /*?
    Disallow: /checkout/
    Disallow: /app/
    Disallow: /lib/
    Disallow: /*.php$
    Disallow: /pkginfo/
    Disallow: /report/
    Disallow: /var/
    Disallow: /catalog/
    Disallow: /customer/
    Disallow: /sendfriend/
    Disallow: /review/
    Disallow: /*SID=

  4. Does your sitemap URL have a query string in it? I don’t see anything that would prevent it in the list above.

  5. alli says:

    Here is what I am seeing

    My actual sitemap file that is submitted:
    Sitemap: /sitemap/sitemap.xml
    This Sitemap was submitted Oct 8, 2013, and processed Jan 27, 2017.
    and it points to this URL >> which is our Canada website (set up as a subfolder, I think, under our main default US website), so this seems really strange.
    http://www.heididausdesigns.com/canada/catalogsearch/result/?f=site+xml&q=xml

    Our 2 errors and dates are below:
    June 19 We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit. example URL restricted by robots.txt

    Jan 27 Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead.

    Any insight you have would be awesome, thank you!

  6. Emil Shamloo says:

    Hi, recently I’ve been getting a lot of 404 (not found) errors from Google for pub/static/…. files!
    Should I disallow /pub for crawling my site?
    Regards
