In recent (as of 1997/10/31) kernels, the screen driver is based on the 16-bit Unicode (UCS2) encoding, which means that every console font loaded should be described by a Unicode Screen Font Map (SFM for short), which tells, for each character in the font, the list of UCS2 characters it will render.
(SFM's were formerly called ``Unicode Map'', or ``unimap'' for short, but this term should be dropped: what used to be called ``screen maps'' now uses Unicode as well, so the old name probably confuses many people.)
Starting with release 1997.11.13 of the Linux Console Tools,
consolechars(8) now understands SFM fallback tables. Before that, an
SFM had to contain both the Unicode values of the characters the font
was primarily meant to render and any approximations the user wanted.
Fallback tables make it possible to put only the primary mappings in
the SFM provided with the font file, and to keep a separate list
telling ``if no glyph for that character is available in the current
font, then try to display it with the glyph for this one, or else the
one for that one, or ...''. This keeps all possible fallbacks in a
single place, and lets each user choose which fallback tables to use.
Have a look at data/consoletrans/*.fallback for examples.
A fallback-table file is made of fallback entries, each entry being on
its own line. Empty lines, and lines beginning with the #
comment character are ignored.
A fallback entry is a series of 2 or more UCS2 codes. The first one is the character for which we want a glyph; the following ones are those whose glyph we want to use when no glyph designed specially for our character is available. The order of the codes defines a priority order (the character's own glyph if available, then the second code's, then the third's, and so on).
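The file format described above can be sketched as a small parser. This is only an illustration, not the code used by consolechars: the function names are hypothetical, and the spelling of a code as ``U+XXXX'' (or bare hexadecimal) is an assumption, since the text does not say how codes are written in the file.

```python
def parse_code(token):
    """Parse one UCS2 code. The 'U+XXXX' / bare-hex spelling is an assumption."""
    text = token.upper()
    if text.startswith("U+"):
        text = text[2:]
    value = int(text, 16)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("not a UCS2 code: %r" % token)
    return value


def parse_fallback_table(text):
    """Return a list of (wanted_code, [fallback_code, ...]) entries."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), 1):
        # Empty lines and lines beginning with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        codes = [parse_code(token) for token in line.split()]
        if len(codes) < 2:
            raise ValueError("line %d: an entry needs 2 or more codes" % lineno)
        # First code: the character we want a glyph for.
        # Remaining codes: candidates, in priority order.
        entries.append((codes[0], codes[1:]))
    return entries
```

For instance, a line made of three codes yields one entry whose first element is the wanted character and whose second element is the ordered list of the two candidates.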
If an SFM is to be loaded, the fallback mappings are added to this map
before it is loaded. If there is none (i.e. a font without an SFM was
loaded and no --sfm option was given to consolechars, or the
--force-no-sfm option was given), then the current SFM is requested
from the kernel, the fallback mappings are added, and the resulting
SFM is loaded back into the kernel.
Note that each fallback entry is checked against the original SFM
(the one read from a file, or given by the kernel), not against the
SFM obtained by adding earlier fallback entries to it; this applies
even to entries in different files, and thus the order of -k options
has no effect. If you want some entries to be influenced by previous
ones, you will have to put them in different fallback files, and load
them with several consecutive invocations of consolechars -k.
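The rule that entries are checked against the original SFM can be illustrated with a short sketch. It makes several assumptions: the SFM is modelled as a dictionary from UCS2 code to font position (the inverse of the per-glyph lists stored in a real SFM), the function name is hypothetical, and the handling of two entries for the same character (the first one wins here) is not specified by the text.

```python
def add_fallbacks(original_sfm, entries):
    """Return a new SFM with fallback mappings added.

    original_sfm: dict mapping UCS2 code -> font position (a simplified model).
    entries: iterable of (wanted_code, [fallback_code, ...]).
    """
    result = dict(original_sfm)
    for wanted, candidates in entries:
        if wanted in result:
            # The character already has a glyph (or an earlier entry
            # supplied one; first-wins is an assumption of this sketch).
            continue
        for candidate in candidates:
            # Candidates are looked up in the ORIGINAL map only, so a
            # mapping added by another entry is never used here.
            if candidate in original_sfm:
                result[wanted] = original_sfm[candidate]
                break
    return result
```

With an original map that only knows U+0041, an entry asking for U+00C1 via U+00C0 adds nothing, even when another entry maps U+00C0 to the glyph of U+0041, and swapping the two entries gives the same result. Calling add_fallbacks a second time on the returned map corresponds to a second invocation of consolechars -k.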
There are basically 2 screen modes: byte mode and UTF mode. The simpler to explain is UTF mode, in which the bytes received from the application (i.e. written to the console screen) are interpreted as UTF8 sequences, which are converted into the equivalent UCS2 codes and then looked up in the SFM to determine the glyphs used to display each character.
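The UTF-mode path (bytes, then UCS2 codes, then glyphs) might be modelled as follows. This is a user-space sketch, not the kernel's code: the SFM is simplified to a dictionary from UCS2 code to font position, the function name is hypothetical, and the treatment of malformed UTF8 and of characters outside the 16-bit range is an assumption.

```python
REPLACEMENT = 0xFFFD  # Unicode replacement character


def glyphs_in_utf_mode(data, sfm):
    """Map a UTF8 byte string to a list of font positions."""
    # Malformed sequences decode to U+FFFD here; what the kernel does
    # with them is not stated in the text (assumption).
    text = data.decode("utf-8", errors="replace")
    positions = []
    for char in text:
        code = ord(char)
        if code > 0xFFFF:
            # UCS2 is 16-bit; anything larger cannot be represented
            # (mapping it to U+FFFD is an assumption of this sketch).
            code = REPLACEMENT
        # Characters absent from the SFM fall back to the replacement
        # glyph; position 0 is used if the map lacks U+FFFD (assumption).
        positions.append(sfm.get(code, sfm.get(REPLACEMENT, 0)))
    return positions
```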
Switching to and from UTF mode is done by sending to the screen the
escape sequences <ESC>%G and <ESC>%@ respectively. You may use the
unicode_start(1) and unicode_stop(1) scripts instead, as they also
change the keyboard mode, and let you optionally change the screen
font.
Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.
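A program can emit these escape sequences itself. The sketch below is a minimal illustration with hypothetical function names; it only changes the screen mode, and unlike unicode_start(1) and unicode_stop(1) it does not touch the keyboard mode or the font.

```python
import sys

ENTER_UTF_MODE = b"\x1b%G"  # <ESC>%G
LEAVE_UTF_MODE = b"\x1b%@"  # <ESC>%@


def utf_mode_sequence(enable):
    """Return the escape sequence that switches UTF mode on or off."""
    return ENTER_UTF_MODE if enable else LEAVE_UTF_MODE


def set_utf_mode(enable, stream=None):
    """Write the sequence to a binary stream (stdout by default)."""
    out = stream if stream is not None else sys.stdout.buffer
    out.write(utf_mode_sequence(enable))
    out.flush()
```

Calling set_utf_mode(True) only has the intended effect when standard output is a Linux console VT.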
Byte mode is a bit more complicated, as it uses an additional map to transform the byte characters sent by the application into UCS2 characters, which are then handled as described above. I call this map the Application Charset Map (ACM), because it defines the encoding the application uses, but it used to be called a ``screen map'' or ``console map'' (this comes from the time when the screen driver didn't use Unicode, and there was only one map down there).
Although only one ACM is active at a given time, there are always 4 of them in the kernel. 3 of them are built-in and never change; they define the IBM codepage 437 (the i386's default, and thus the kernel's default even on other archs), the DEC VT100 charset, and the ISO latin1 charset. The 4th is user-definable, and defaults on boot to the ``straight to font'' mapping, described below under ``Special UCS2 codes''.
The consolechars(1) command can be used to change the ACM, as well as
the font and its associated SFM.
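The byte-to-UCS2 step of byte mode can be pictured as a 256-entry table lookup. The sketch below is an illustration only, with hypothetical names: it models an ACM as a Python list indexed by byte value, builds the ISO latin1 map (whose codes coincide with the first 256 Unicode values) and a ``straight to font'' map, assumed here to send byte b to U+F000 + b, and ignores the control characters that the console interprets itself.

```python
def latin1_acm():
    """ISO latin1: byte b stands for the UCS2 code U+00bb (identity)."""
    return list(range(256))


def straight_to_font_acm():
    """Assumed form of the boot-time user map: byte b -> U+F000 + b."""
    return [0xF000 + b for b in range(256)]


def bytes_to_ucs2(data, acm):
    """Translate application bytes into UCS2 codes through an ACM."""
    if len(acm) != 256:
        raise ValueError("an ACM must have exactly 256 entries")
    return [acm[byte] for byte in data]
```

The resulting UCS2 codes are then looked up exactly as in UTF mode.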
There are special UCS2 values you should care about, but the present list is probably not exhaustive:
Codes C from U+F000 to U+F1FF are not looked up in the SFM, and
directly access the character in font position C & 0x01FF (yes, a
font can have 512 characters on many hardware platforms, like VGA).
This is referred to as the straight to font zone.
U+FFFD is the replacement character, usually at font position 0 in a
font. It is displayed by the kernel each time the application
requests a Unicode character that is not present in the SFM. This not
only keeps the driver safe in Unicode mode, but also prevents
displaying invalid characters when the ACM on a particular VT
contains characters not in the current font!
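Both special cases can be combined into one lookup routine. This is a sketch under stated assumptions, not the kernel's implementation: the SFM is modelled as a dictionary from UCS2 code to font position, the function name is hypothetical, and font position 0 is used when the map has no entry for U+FFFD.

```python
STRAIGHT_FIRST = 0xF000
STRAIGHT_LAST = 0xF1FF
REPLACEMENT = 0xFFFD


def font_position(code, sfm):
    """Return the font position used to display a UCS2 code."""
    if STRAIGHT_FIRST <= code <= STRAIGHT_LAST:
        # Straight to font zone: bypass the SFM entirely.
        return code & 0x01FF
    if code in sfm:
        return sfm[code]
    # Unknown character: show the replacement glyph instead.
    # Defaulting to position 0 is an assumption of this sketch.
    return sfm.get(REPLACEMENT, 0)
```

For example, U+F041 selects font position 0x41 whatever the SFM contains, and U+F1FF selects position 511, the last glyph of a 512-character font.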
There was a time when the kernel didn't know anything about Unicode. In those days, Application Charset Maps were called ``screen maps'', and just mapped the application's characters into font positions. The file format used for these 8-bit ACM's is still supported for backward compatibility, but should not be used any more.
The old way of using custom ACM's didn't know about Unicode, so the
ACM had to depend on the font. Now, as each VT chooses its own ACM
(from the 4 in the kernel at a given time), and as the console font
is common to all VT's, we can use a charset even if the font can't
display all of its characters; the missing ones are then displayed as
the replacement character (U+FFFD).
psfaddtable(1), psfgettable(1), psfstriptable(1), showfont(1).