Saturday, January 12, 2008

Vulnerability Series: Robots.txt

The robots.txt file exists on the web server to instruct automated crawling engines (such as Yahoo! or Google) NOT to index specified areas of the application.

A standard robots.txt file would look something like....
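For illustration, a minimal hypothetical example (the directory names are made up and do not come from any real host):

```
User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /cgi-bin/
```

Each `Disallow` line names a path the site owner would prefer crawlers skip.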


As you can see from the sample above, the robots file provides an additional source of information when mapping out a web application.

How to apply this attack

As stated above, the robots file is placed in the root of the web application, so no standard access controls protect it. Simply pointing your browser to http://insert_target_host/robots.txt will suffice.
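Once retrieved, the file is trivial to mine for mapping targets. A minimal sketch in Python (the robots.txt body below is a made-up sample, not fetched from any real host) that pulls out the Disallow entries an attacker would probe next:

```python
# Hypothetical robots.txt body used for illustration only.
sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/
Disallow: /cgi-bin/
"""

def disallowed_paths(robots_body):
    """Return the paths listed in Disallow directives."""
    paths = []
    for line in robots_body.splitlines():
        line = line.split('#', 1)[0].strip()  # drop trailing comments
        if line.lower().startswith('disallow:'):
            path = line.split(':', 1)[1].strip()
            if path:
                paths.append(path)
    return paths

print(disallowed_paths(sample))
# → ['/admin/', '/backup/', '/cgi-bin/']
```

Every path printed is a directory the administrator considered sensitive enough to hide from search engines, which makes it a natural first stop during application mapping.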


The remediation for this file is to remove it if it's really not needed. The reason we list this in our vulnerability series (and please hold the laughter) is that many administrators actually attempt to use this file as an access control for their applications. The robots file is simply a very polite request to automated web crawlers not to index specific components of the web application....nothing more.

So....if you really don't need the file to exist on your web server, please remove it. If the directories listed in your application require protection, you must add proper access and authorization controls in the application or server itself.
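As one illustration of what a real control looks like (assuming an Apache server; the directory path and password file location are hypothetical), directory-level authentication can be enforced in the server configuration rather than hinted at in robots.txt:

```
<Directory "/var/www/app/admin">
    AuthType Basic
    AuthName "Restricted Area"
    AuthUserFile /etc/httpd/passwords
    Require valid-user
</Directory>
```

Unlike a robots entry, this denies the request outright unless the client authenticates, regardless of whether the visitor is a crawler or a person with a browser.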