I recently rediscovered this strange behaviour in Python’s Unicode handling.—Evan
The BOM—M.-A.

You are correct: it is a legitimate character. However, its use as a ZWNBSP character has been deprecated:

  The overloading of semantics for this code point has caused problems for programs and protocols. The new character U+2060 WORD JOINER has the same semantics in all cases as U+FEFF, except that it cannot be used as a signature. Implementers are strongly encouraged to use word joiner in those circumstances whenever word joining semantics is intended.
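For reference, here is a small sketch of how the two code points look from Python, using only the standard `unicodedata` module. The variable names are just for illustration.

```python
import unicodedata

zwnbsp = "\ufeff"       # doubles as the byte order mark / signature
word_joiner = "\u2060"  # the recommended replacement for word joining

for ch in (zwnbsp, word_joiner):
    # Print the code point, its official name, and its general category.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

Both are format characters, so neither shows up when printed, which is part of why a stray U+FEFF is easy to miss.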

Also, the Unicode specification is ambiguous about what an implementation should do with a leading ZWNBSP that is encoded in UTF-16. As I mentioned, version 4 of the Unicode standard, section 15.9, says:

  2. Unmarked Character Set. In some circumstances, the character set information for a stream of coded characters (such as a file) is not available. The only information available is that the stream contains text, but the precise character set is not known.

This seems to indicate that it is permitted to strip the BOM from the beginning of UTF-8 text.—Evan
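To make the behaviour under discussion concrete, here is a minimal sketch, assuming a current Python 3 interpreter (which may differ from the version the thread was written against). It compares how the standard `utf-8`, `utf-8-sig`, `utf-16` and `utf-16-le` codecs treat a leading U+FEFF. The sample text and variable names are only for illustration.

```python
import codecs

# UTF-8 bytes that start with the three-byte signature EF BB BF.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain utf-8 codec treats the signature as an ordinary character.
plain = data.decode("utf-8")

# The utf-8-sig codec strips a leading signature if one is present.
stripped = data.decode("utf-8-sig")

# The generic utf-16 codec writes a BOM on encode and consumes it on decode.
utf16_bytes = "hello".encode("utf-16")
utf16_text = utf16_bytes.decode("utf-16")

# With an explicit byte order, the codec leaves a leading U+FEFF in place.
le_text = (codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")).decode("utf-16-le")

print(repr(plain), repr(stripped), repr(utf16_text), repr(le_text))
```

Under those assumptions, only the codecs that are documented as signature-aware (`utf-8-sig` and the generic `utf-16`) drop the leading character. The others hand it back to the caller as U+FEFF.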