BOM in the UTF-8 encoded robots.txt

7/13/2013 2:22:32 PM

SYNTAX NOT UNDERSTOOD. robots.txt is supposed to be encoded in UTF-8, but search engines can be confused by a BOM (byte order mark) at the beginning of the file, which may cause disallow rules to be read incorrectly.

Some Windows programs (Notepad, for example) add a 3-byte BOM (the bytes EF BB BF) to the beginning of the file when saving text with UTF-8 encoding.
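If you want to check whether your file starts with that BOM, here is a minimal sketch in Python (the filename robots.txt is just an example path, and this script is only an illustration, not anything Notepad or Google provides):

    # The 3-byte UTF-8 BOM that Notepad and friends may prepend.
    UTF8_BOM = b"\xef\xbb\xbf"

    # Read the first three bytes of the file and compare.
    with open("robots.txt", "rb") as f:
        has_bom = f.read(3) == UTF8_BOM

    print("BOM present" if has_bom else "no BOM")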

This triggers a Google "Syntax not understood" error for the first line of the robots.txt, which often contains the instruction User-agent: *.
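To see why, here is a small illustration (the byte string is a made-up example) of what a parser that decodes plain UTF-8 sees: the BOM decodes to the character U+FEFF and sticks to the first directive, so the line no longer starts with User-agent.

    # A BOM-prefixed robots.txt as raw bytes, the way it sits on disk.
    raw = b"\xef\xbb\xbfUser-agent: *\r\nDisallow:\r\n"

    # Plain UTF-8 decoding keeps the BOM as U+FEFF glued to the first token.
    first_line = raw.decode("utf-8").splitlines()[0]
    print(repr(first_line))                      # '\ufeffUser-agent: *'
    print(first_line.startswith("User-agent"))   # False -> "Syntax not understood"

A decoder that uses Python's "utf-8-sig" codec would strip the BOM automatically, but there is no guarantee that every crawler's parser does.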

This is bad news for our indexing. Do we need to remove the BOM?

No! Just add a line break at the top of the file, so that the first line of instructions sits on the second row.

That second row is then a clean (BOM-free) stream of UTF-8 characters that search engines read without problems. The first line, consisting only of BOM + CRLF, is ignored and causes no further trouble.
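Here is the same illustration with the workaround applied (again an assumed example, not anything prescribed by a search engine): the blank first line absorbs the BOM, and the directives on the second line decode cleanly.

    # BOM, then an empty first line (CRLF), then the real directives.
    raw = b"\xef\xbb\xbf\r\nUser-agent: *\r\nDisallow:\r\n"

    lines = raw.decode("utf-8").splitlines()
    print(repr(lines[0]))   # '\ufeff' -- the throwaway BOM line
    print(repr(lines[1]))   # 'User-agent: *' -- a clean directive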

Tata!