This is roughly what we do now: we catch UnicodeError, add a BOM to the file, and read it again. We know our files are UTF-16BE when they have no BOM, because they are written by code that observes the spec. We can’t use the UTF-16BE codec all the time, because sometimes the files are UTF-16LE, and in those cases the BOM is set.
It would be nice if you could optionally specify that the codec should assume UTF-16BE when no BOM is present, rather than raising UnicodeError. That would preserve the current behaviour by default while allowing users to ask for behaviour that conforms to the standard.
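As a rough sketch of the behaviour being requested (not how the codec is actually implemented), something like the following would honour a BOM when one is present and otherwise fall back to a caller-chosen byte order. The helper name `decode_utf16` and its `default` parameter are hypothetical and not part of the `codecs` module, and the sketch assumes the whole file has already been read into memory as bytes.

```python
import codecs


def decode_utf16(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UTF-16 bytes, honouring a BOM and assuming `default` otherwise.

    Hypothetical helper for illustration only; `default` is an assumed
    parameter name, not an option of the standard utf-16 codec.
    """
    # A BOM, when present, decides the byte order and is not part of the text.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: fall back to the requested default (big-endian here).
    return data.decode(default)
```

With this, a BOM-less big-endian payload and a BOM-marked little-endian payload both decode through the same call, without modifying the file on disk.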
I’m not saying that you can’t work around the issue now; what I’m saying is that you shouldn’t have to. I think there is a reasonable expectation that the UTF-16 codec conforms to the spec, and users who want it to do something else are the ones who should have to come up with a workaround.

Nicholas