As website indexing is automated, the program used is often referred to as a ‘robot’ or ‘bot’.
A robots.txt file is a permissions file that can be used to control which webpages of a website a search engine indexes. The file must be located in the root directory of the website so that a search engine's website-indexing program (spider) can find it. For example, if the website address is:
www.yoursite.co.nz
then the robots.txt file must be located at:
www.yoursite.co.nz/robots.txt
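The root-directory rule above can be sketched in a few lines of Python. This is only an illustration: the function name robots_txt_url, the https scheme, and the example page path are assumptions, not part of any search engine's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always sits at the root path,
    # no matter how deep the page itself is in the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page somewhere inside the example site.
print(robots_txt_url("https://www.yoursite.co.nz/products/index.html"))
```

Whatever page of the site is passed in, the result points at the same single file in the root directory.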
Respecting robots.txt directives is a courtesy—spiders are not prevented from indexing content even if a robots.txt ‘ban’ is in place.
The robots.txt file contains instructions (directives) that are read by spiders. Directives can be written to apply to all spiders or only target spiders from a specific search engine.
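As a rough illustration of how directives target spiders, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module. The file contents, the /private/ directory, and the spider names ExampleBot and SomeOtherBot are made up for the example; real spiders publish their own user-agent names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all spiders, one for a single spider.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.yoursite.co.nz"

# A spider with no group of its own falls back to the "*" group.
other_home = parser.can_fetch("SomeOtherBot", site + "/index.html")
other_private = parser.can_fetch("SomeOtherBot", site + "/private/report.html")

# ExampleBot is matched by its own group, which disallows the whole site.
example_home = parser.can_fetch("ExampleBot", site + "/index.html")

print(other_home, other_private, example_home)  # True False False
```

A well-behaved spider would run a check like this before requesting each page. Nothing in the file itself enforces the result.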
If a website has no robots.txt file, then a spider will follow links from the webpage that has been registered with its parent search engine to index the website content.
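Link following can be pictured with a minimal sketch that collects the hyperlinks on one page, using Python's standard html.parser module. The class name LinkCollector, the page markup, and the URLs are assumptions for illustration; a real spider also handles fetching, queuing, and duplicate detection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


# Hypothetical markup for the registered starting page.
PAGE = '<p><a href="/about.html">About</a> <a href="contact.html">Contact</a></p>'

collector = LinkCollector("https://www.yoursite.co.nz/index.html")
collector.feed(PAGE)
print(collector.links)
```

Each collected address would then be visited in turn, and the same step repeated on every new page found.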
crawler/spider/robot, hyperlink, root directory, search engine, SEO,