Traditionally, character encodings use 8 bits and are thus limited to 256 characters, which is far too few to represent, let alone mix, all of the world's scripts.
META: The following stuff about what is done by whom is a little fuzzy; I have to investigate that further.
Thus the 16-bit UCS-2 (Universal Character Set, 2 bytes) and the 32-bit UCS-4 (yes, 4 bytes) were created to handle and mix all of the world's scripts. For convenience, the UTF-8 encoding was designed as a variable-length, ASCII-compatible encoding (at most 6 bytes per character in its original definition, now restricted to 4); every character that has a UCS-4 encoding can be expressed as a UTF-8 sequence, and vice versa.
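To make the variable-length property concrete, here is a minimal sketch of a UTF-8 encoder for a single code point. The function name utf8_encode is purely illustrative; real programs should rely on iconv(3) or their language's Unicode library. ASCII code points come out as one unchanged byte; higher code points take 2 to 4 bytes under the current 4-byte restriction.

    #include <stdio.h>
    #include <stdint.h>

    /* Encode one UCS-4 code point into buf (at least 4 bytes long).
     * Returns the number of bytes written, or 0 for values above
     * U+10FFFF. Minimal sketch: surrogate values are not rejected. */
    static int utf8_encode(uint32_t cp, unsigned char *buf)
    {
        if (cp < 0x80) {                 /* ASCII: 1 byte, unchanged */
            buf[0] = (unsigned char)cp;
            return 1;
        } else if (cp < 0x800) {         /* 2 bytes */
            buf[0] = 0xC0 | (cp >> 6);
            buf[1] = 0x80 | (cp & 0x3F);
            return 2;
        } else if (cp < 0x10000) {       /* 3 bytes */
            buf[0] = 0xE0 | (cp >> 12);
            buf[1] = 0x80 | ((cp >> 6) & 0x3F);
            buf[2] = 0x80 | (cp & 0x3F);
            return 3;
        } else if (cp < 0x110000) {      /* 4 bytes */
            buf[0] = 0xF0 | (cp >> 18);
            buf[1] = 0x80 | ((cp >> 12) & 0x3F);
            buf[2] = 0x80 | ((cp >> 6) & 0x3F);
            buf[3] = 0x80 | (cp & 0x3F);
            return 4;
        }
        return 0;
    }

    int main(void)
    {
        /* 'A', 'e acute', euro sign, an emoji: 1, 2, 3 and 4 bytes */
        uint32_t samples[] = { 0x41, 0xE9, 0x20AC, 0x1F600 };
        for (int i = 0; i < 4; i++) {
            unsigned char buf[4];
            int n = utf8_encode(samples[i], buf);
            printf("U+%04X ->", samples[i]);
            for (int j = 0; j < n; j++)
                printf(" %02X", buf[j]);
            printf("\n");
        }
        return 0;
    }

Running it prints, for example, "U+0041 -> 41" (identical to ASCII) and "U+20AC -> E2 82 AC", showing how the sequence length grows with the code point.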
Note that there is also a standardization effort at ISO (ISO 10646), which, according to the unicode(7) manpage, produces the UCS character sets.
The Unicode Consortium defines its own standard, named Unicode, which I believe is compatible with the ISO 10646 character sets.
See: unicode(7), utf-8(7).