I recently rediscovered this strange behaviour in Python’s Unicode handling.
—Evan
-1; there’s no standard for UTF-8 BOMs—adding it to the codecs module was probably a mistake to begin with.
—M.-A.
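For context, the codecs module exposes the UTF-8 signature bytes as `codecs.BOM_UTF8`. The plain `utf-8` codec does not treat them specially; it decodes them to U+FEFF (ZERO WIDTH NO-BREAK SPACE) at the start of the text. Here is a minimal sketch in Python 3 that shows this. The sample bytes are made up for illustration.

```python
import codecs

# Prepend the UTF-8 signature bytes (EF BB BF) to some ordinary UTF-8 text.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain "utf-8" codec passes the signature through as the character U+FEFF.
text = data.decode("utf-8")

print(text == "\ufeffhello")  # True
print(len(text))              # 6: five letters plus the leading U+FEFF
```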
There is a standard for UTF-8 signatures, however.
—Stephen
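The signature in question is the character U+FEFF run through the ordinary UTF-8 encoder, which yields the bytes EF BB BF. Without a dedicated codec, a caller has to detect and strip it by hand. The sketch below shows one way to do that. `strip_utf8_signature` is a hypothetical helper for illustration, not part of any library.

```python
import codecs
from typing import Tuple

# The UTF-8 signature is U+FEFF encoded with the plain UTF-8 codec.
SIGNATURE = "\ufeff".encode("utf-8")
assert SIGNATURE == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def strip_utf8_signature(data: bytes) -> Tuple[bytes, bool]:
    """Hypothetical helper: remove one leading UTF-8 signature, if present.

    Returns the remaining bytes and whether a signature was found.
    """
    if data.startswith(SIGNATURE):
        return data[len(SIGNATURE):], True
    return data, False


body, found = strip_utf8_signature(b"\xef\xbb\xbfabc")
print(body, found)  # b'abc' True
```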
With the UTF-8-SIG codec, it would apply to all operation modes of the codec, whether stream-based or from strings.
—Martin
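In current Python 3 the `utf-8-sig` codec does behave uniformly across modes. It drops a leading signature whether the input is decoded in one shot, through a StreamReader, or incrementally, and it prepends the signature when encoding. The sketch below checks all of these. The sample text and the byte-at-a-time feeding are arbitrary choices for illustration.

```python
import codecs
import io

raw = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")

# 1. One-shot decoding of a bytes object.
from_bytes = raw.decode("utf-8-sig")

# 2. Stream-based decoding through a StreamReader wrapped around a byte stream.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(raw))
from_stream = reader.read()

# 3. Incremental decoding, one byte per call, so the signature arrives split
#    across three calls and the decoder must buffer it before discarding it.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
from_chunks = "".join(decoder.decode(raw[i:i + 1]) for i in range(len(raw)))
from_chunks += decoder.decode(b"", final=True)

print(from_bytes == from_stream == from_chunks == "caf\u00e9")  # True

# Encoding with utf-8-sig writes the signature in front of the text.
encoded = "caf\u00e9".encode("utf-8-sig")
print(encoded.startswith(codecs.BOM_UTF8))  # True
```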
I’d suggest using the same mode of operation as we have in the UTF-16 codec:
—M.-A.
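For comparison, the UTF-16 codec works this way. On encoding it emits a BOM once, at the start of the output. On decoding it consumes a leading BOM and uses it to choose the byte order. Here is a minimal Python 3 sketch of that behaviour. The sample strings are arbitrary.

```python
import codecs
import io

# A StreamWriter emits the BOM only on its first write, not on every call.
buf = io.BytesIO()
writer = codecs.getwriter("utf-16")(buf)
writer.write("ab")
writer.write("cd")
data = buf.getvalue()

# One 2-byte BOM (in the platform's native order) plus 4 two-byte code units.
print(len(data))  # 10
print(data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)))  # True

# Decoding consumes the BOM and uses it to pick the byte order.
print(data.decode("utf-16"))                              # abcd
print((codecs.BOM_UTF16_BE + b"\x00a").decode("utf-16"))  # a
print((codecs.BOM_UTF16_LE + b"a\x00").decode("utf-16"))  # a
```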
I’ve actually been confused about this point for quite some time now, but never had a chance to bring it up.
—Nicholas
See above.
Thanks,
—M.-A.