I recently rediscovered this strange behaviour in Python’s Unicode handling.—Evan

This can be improved, of course: if the first byte is "a", it is most definitely not a UTF-8 signature.

So we only need a second byte for the characters between U+F000 and U+FFFF, and a third byte only for the characters U+FEC0...U+FEFF. But with the first byte being \xef, we need three bytes anyway, so we can always decide with the first byte only whether we need to wait for three bytes.—Martin
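
A rough sketch of the first-byte rule described above, for anyone who wants to try it. This is only an illustration, not the actual codec implementation: the function name `bom_status` and its three return strings are made up for this example, and it assumes the caller passes in whatever bytes have been buffered from the start of the stream so far.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", the UTF-8 signature


def bom_status(buffered: bytes) -> str:
    """Classify the bytes buffered so far at the start of a stream.

    Hypothetical helper: returns "no-bom", "need-more", or "bom".
    """
    if not buffered:
        return "need-more"      # nothing seen yet
    if buffered[0] != BOM[0]:
        return "no-bom"         # e.g. b"a": decided from the first byte alone
    if len(buffered) < len(BOM):
        return "need-more"      # \xef always starts a three-byte sequence
    return "bom" if buffered.startswith(BOM) else "no-bom"


# The code point ranges mentioned above, checked against Python's encoder.
assert "\uf000".encode("utf-8")[:1] == b"\xef"
assert "\uffff".encode("utf-8")[:1] == b"\xef"
assert "\ufec0".encode("utf-8")[:2] == b"\xef\xbb"
assert "\ufeff".encode("utf-8") == BOM
```

With this, `bom_status(b"a")` is decided immediately, while `bom_status(b"\xef")` and `bom_status(b"\xef\xbb")` both ask for more input, since a sequence starting with \xef is three bytes long either way.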