Yahoo! has announced new robots.txt syntax for its Slurp crawler, giving webmasters finer control over which parts of a site the bot may access.
Product manager Priyank Garg writes, “I was going through my notes from Danny Sullivan’s Open Feedback sessions during the ‘Meet the Crawlers’ panels at Search Engine Strategies conferences. One of the items on my list was a request for enhanced syntax in robots.txt to make it easier for webmasters to manage how search crawlers, including Slurp, access your content.”
The two new wildcard characters are '*', which matches any sequence of characters, and '$', which anchors a rule to the end of a URL. They can be used, for example, to tell Slurp to crawl (or skip) any page in a directory whose name contains the string '_print' ('/*_print*.html'). The two can also be combined; for instance, to disallow all '.gif' files ('/*.gif$').
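Putting the two patterns from the announcement together, a robots.txt using the new syntax might look like the following sketch (the directory and file names are hypothetical; only the 'Slurp' user-agent name and the two wildcard rules come from Yahoo's examples):

    # Rules for Yahoo!'s crawler
    User-agent: Slurp
    # Skip any printer-friendly page whose name contains '_print'
    Disallow: /*_print*.html
    # Skip all .gif files anywhere on the site
    Disallow: /*.gif$

Because these rules sit under a 'User-agent: Slurp' record, other crawlers that do not understand the extended syntax simply ignore them.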