I recently rediscovered this strange behaviour in Python’s Unicode handling.—Evan
Well,—M.-A.

The UTF-16 stream codecs implement this logic.

The UTF-16 encode and decode functions will, however, always strip the BOM from the beginning of a string.

If the application doesn’t want this stripping to happen, it should use the UTF-16-LE or UTF-16-BE codec, respectively.—M.-A.
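
As a rough illustration of the decoding side of this, the sketch below assumes a current Python 3 interpreter, which may differ in detail from the version under discussion. The sample text and variable names are made up for the example. It decodes the same bytes once with the generic UTF-16 codec and once with the explicit little-endian codec.

```python
import codecs

# Hypothetical sample: a little-endian BOM followed by "abc" in UTF-16-LE.
data = codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")

# The generic codec reads the BOM to pick a byte order, then drops it.
stripped = data.decode("utf-16")

# The explicit codec treats the BOM bytes as an ordinary U+FEFF character.
kept = data.decode("utf-16-le")

print(repr(stripped))  # 'abc'
print(repr(kept))      # '\ufeffabc'
```

An application that needs to see the leading U+FEFF would, on this reading, pick the explicit codec and handle the character itself.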

That sounds like it would work fine almost all the time. If it doesn’t, it’s straightforward to work around, and it would certainly be more convenient for the non-standards-geek programmer.—Stephen