I doubt that they are encoded in UCS-2 as that character set isn't able to encod...

eklitzke · on Feb 8, 2011

In Python 2.x, strings are encoded in UCS-2, not UTF-16, at least by default (I'm not sure about Python 3.x; I assume it's the same, though). If you want to support every possible Unicode code point, you can tell Python to do so at compile time (via a ./configure flag).

In practice, the characters that aren't in UCS-2 tend to be characters that don't exist in modern languages, e.g. the character sets for Linear B, Domino tiles, and Cuneiform, so they're not supported since they're not of practical use to most people. There's a fairly good list at http://en.wikipedia.org/wiki/Plane_(Unicode) . Of the planes in that list, Python by default doesn't support anything outside the BMP.
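
For anyone who wants to check which kind of build they are running, here is a minimal sketch. It relies on `sys.maxunicode`, which reports the largest code point a single string element can hold. The helper name `unicode_build` and the "narrow"/"wide" labels are my own for illustration, not anything Python itself prints. As far as I know, Python 3.3 and later always report 0x10FFFF, so the distinction only matters on older interpreters.

```python
import sys


def unicode_build(max_code_point=None):
    """Classify an interpreter build by its largest single-element code point.

    Hypothetical helper: 0xFFFF means 16-bit code units (a "narrow" build),
    anything larger is treated as a "wide" build.
    """
    if max_code_point is None:
        max_code_point = sys.maxunicode
    return "narrow" if max_code_point == 0xFFFF else "wide"


if __name__ == "__main__":
    print(hex(sys.maxunicode), unicode_build())
```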

Locke1689 · on Feb 8, 2011

No, the Python internals support surrogates so you can support characters outside the BMP. This makes it (basically) UTF-16.
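
To make the surrogate point concrete, here is a rough sketch of how a code point outside the BMP splits into two 16-bit code units in UTF-16. It is written for Python 3 (it uses `chr`), and `to_surrogate_pair` is a name I made up for illustration. The arithmetic is the standard UTF-16 rule; the result is compared against Python's own `utf-16-be` codec.

```python
import struct


def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000       # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the low surrogate
    return high, low


# U+1F030 is one of the Domino tile characters mentioned upthread.
code_point = 0x1F030
pair = to_surrogate_pair(code_point)

# Ask the codec for the same character and unpack its two 16-bit units.
units = struct.unpack(">2H", chr(code_point).encode("utf-16-be"))

if __name__ == "__main__":
    print([hex(u) for u in pair], pair == units)
```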

sedachv · on Feb 9, 2011

Things outside of the BMP aren't just dead languages anymore. You have to be able to support characters outside the BMP if you want to sell your software in China:

http://en.wikipedia.org/wiki/GB_18030
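
As a quick illustration (a sketch, assuming Python 3 and its bundled `gb18030` codec), you can round-trip a CJK ideograph that lives outside the BMP. U+20000 is just an example I picked from CJK Extension B; the variable names are illustrative.

```python
# U+20000 is the first code point of CJK Unified Ideographs Extension B,
# which sits in plane 2, outside the BMP.
ideograph = "\U00020000"

# Encode with Python's built-in gb18030 codec and decode it back.
encoded = ideograph.encode("gb18030")
decoded = encoded.decode("gb18030")

is_outside_bmp = ord(ideograph) > 0xFFFF

if __name__ == "__main__":
    print(len(encoded), decoded == ideograph, is_outside_bmp)
```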