There are three types of characters in the input file:
normal characters
       can be displayed directly to the screen.
control characters
       should not be displayed directly, but are expected to be
       found in ordinary text files (such as backspace and tab).
binary characters
       should not be displayed directly and are not expected to
       be found in text files.
A "character set" is simply a description of which characters are
to be considered normal, control, and binary. The LESSCHARSET
environment variable may be used to select a character set.
Possible values for LESSCHARSET are:
ascii  BS, TAB, NL, CR, and formfeed are control characters, all
       chars with values between 32 and 126 are normal, and all
       others are binary.
iso8859
       Selects an ISO 8859 character set. This is the same as
       ASCII, except characters between 160 and 255 are treated
       as normal characters.
latin1 Same as iso8859.
latin9 Same as iso8859.
dos    Selects a character set appropriate for MS-DOS.
ebcdic Selects an EBCDIC character set.
IBM-1047
       Selects an EBCDIC character set used by OS/390 Unix
       Services. This is the EBCDIC analogue of latin1. You get
       similar results by setting either LESSCHARSET=IBM-1047 or
       LC_CTYPE=en_US in your environment.
koi8-r Selects a Russian character set.
next   Selects a character set appropriate for NeXT computers.
utf-8  Selects the UTF-8 encoding of the ISO 10646 character set.
       UTF-8 is special in that it supports multi-byte characters
       in the input file. It is the only character set that
       supports multi-byte characters.
windows
       Selects a character set appropriate for Microsoft Windows
       (cp 1251).
In rare cases, you may want less to use a character set other
than the ones selectable with LESSCHARSET. In this case, the
environment variable LESSCHARDEF can be used to define a
character set. It should be set to a string where each
character in the string represents one character in the character
set. The character "." is used for a normal character, "c" for
control, and "b" for binary. A decimal number placed before one
of these characters repeats it that many times. For example,
"bccc4b." would mean character 0 is
binary, 1, 2 and 3 are control, 4, 5, 6 and 7 are binary, and 8
is normal. All characters after the last are taken to be the
same as the last, so characters 9 through 255 would be normal.
(This is an example, and does not necessarily represent any real
character set.)
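
The expansion rules above can be illustrated with a short Python sketch. It is not the code less itself uses; the function name expand_chardef and its error messages are hypothetical, and the 256-entry table size is taken from the "characters 9 through 255" wording above.

```python
def expand_chardef(spec, size=256):
    """Expand a LESSCHARDEF-style string into one class per character code.

    Returns a list of `size` entries, each ".", "c" or "b".
    (Hypothetical helper for illustration only.)
    """
    table = []
    repeat = 0  # pending decimal repetition count; 0 means "none given"
    for ch in spec:
        if ch in "0123456789":
            repeat = repeat * 10 + int(ch)
            continue
        if ch not in ".cb":
            raise ValueError("invalid character in definition: %r" % ch)
        # A class character with no count in front of it stands for itself once.
        table.extend([ch] * max(repeat, 1))
        repeat = 0
    if not table:
        raise ValueError("definition names no character class")
    if len(table) > size:
        raise ValueError("definition covers more than %d characters" % size)
    # Characters after the last one described take the last class.
    table.extend([table[-1]] * (size - len(table)))
    return table


example = expand_chardef("bccc4b.")
print("".join(example[:10]))  # prints "bcccbbbb.."
```

Characters 0 through 9 come out as "bcccbbbb..", which matches the description: one binary, three control, four binary, then normal for everything from 8 onward.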
This table shows the value of LESSCHARDEF which is equivalent to
each of the possible values for LESSCHARSET:
ascii      8bcccbcc18b95.b
dos        8bcccbcc12bc5b95.b.
ebcdic     5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b
           9.8b8.17b3.3b9.7b9.8b8.6b10.b.b.b.
IBM-1047   4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc
           191.b
iso8859    8bcccbcc18b95.33b.
koi8-r     8bcccbcc18b95.b128.
latin1     8bcccbcc18b95.33b.
next       8bcccbcc18b95.bb125.bb
If neither LESSCHARSET nor LESSCHARDEF is set, but any of the
strings "UTF-8", "UTF8", "utf-8" or "utf8" is found in the
LC_ALL, LC_CTYPE or LANG environment variables, then the default
character set is utf-8.
If that string is not found, but your system supports the
setlocale interface, less will use setlocale to determine the
character set. setlocale is controlled by setting the LANG or
LC_CTYPE environment variables.
Finally, if the setlocale interface is also not available, the
default character set is latin1.
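
The fallback order described above can be sketched in Python as follows. This is only an illustration: the function name default_charset, the have_setlocale flag, and the placeholder return values "(LESSCHARDEF)" and "(setlocale)" are hypothetical, and checking LESSCHARSET before LESSCHARDEF is an assumption, since the text above does not state which of the two wins when both are set.

```python
import os

UTF8_MARKERS = ("UTF-8", "UTF8", "utf-8", "utf8")


def default_charset(env, have_setlocale=True):
    """Return a label for the character set source chosen from `env`.

    `env` is any mapping of environment variable names to values.
    """
    # Assumption: LESSCHARSET is consulted before LESSCHARDEF.
    if env.get("LESSCHARSET"):
        return env["LESSCHARSET"]
    if env.get("LESSCHARDEF"):
        return "(LESSCHARDEF)"  # placeholder: use the custom definition
    # Neither variable is set: look for a UTF-8 marker in the locale variables.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if any(marker in value for marker in UTF8_MARKERS):
            return "utf-8"
    if have_setlocale:
        return "(setlocale)"  # placeholder: ask the setlocale interface
    return "latin1"


print(default_charset(os.environ))
```

Passing a plain dictionary instead of os.environ makes it easy to try the different branches, for example default_charset({"LANG": "en_US.UTF-8"}).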
Control and binary characters are displayed in standout (reverse
video). Each such character is displayed in caret notation if
possible (e.g. ^A for control-A). Caret notation is used only if
inverting the 0100 bit results in a normal printable character.
Otherwise, the character is displayed as a hex number in angle
brackets. This format can be changed by setting the LESSBINFMT
environment variable. LESSBINFMT may begin with a "*" and one
character to select the display attribute: "*k" is blinking, "*d"
is bold, "*u" is underlined, "*s" is standout, and "*n" is
normal. If LESSBINFMT does not begin with a "*", normal
attribute is assumed. The remainder of LESSBINFMT is a string
which may include one printf-style escape sequence (a % followed
by x, X, o, d, etc.). For example, if LESSBINFMT is "*u[%x]",
binary characters are displayed in underlined hexadecimal
surrounded by brackets. The default if no LESSBINFMT is
specified is "*s<%02X>". Warning: the result of expanding the
character via LESSBINFMT must be less than 31 characters.
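
As a rough illustration of the rules above, the following Python sketch splits a LESSBINFMT value into its attribute and format parts and then renders one character code. All names here are hypothetical. The sketch assumes the ascii character set (codes 32 through 126 are normal), treats an unrecognized attribute letter as normal, and assumes the format contains at most one plain escape such as %x or %02X.

```python
ATTRIBUTES = {"k": "blinking", "d": "bold", "u": "underlined",
              "s": "standout", "n": "normal"}
DEFAULT_BINFMT = "*s<%02X>"


def parse_binfmt(value=None):
    """Split a LESSBINFMT-style value into (attribute, format string)."""
    if not value:
        value = DEFAULT_BINFMT
    attribute = "normal"  # used when the value does not begin with "*"
    if value.startswith("*") and len(value) >= 2:
        # Assumption: an unknown attribute letter falls back to normal.
        attribute = ATTRIBUTES.get(value[1], "normal")
        value = value[2:]
    return attribute, value


def is_normal_ascii(code):
    """True for characters the ascii character set treats as normal."""
    return 32 <= code <= 126


def show_unprintable(code, binfmt=None):
    """Return (attribute, text) for a control or binary character code."""
    attribute, fmt = parse_binfmt(binfmt)
    flipped = code ^ 0o100  # invert the 0100 (octal) bit
    if is_normal_ascii(flipped):
        text = "^" + chr(flipped)  # caret notation
    else:
        text = fmt % code if "%" in fmt else fmt
    if len(text) >= 31:
        raise ValueError("expanded text must be less than 31 characters")
    return attribute, text


print(show_unprintable(1))                 # prints ('standout', '^A')
print(show_unprintable(0x80, "*u[%x]"))    # prints ('underlined', '[80]')
```

Character 1 flips to 0101 (octal), the letter A, so it is shown in caret notation. Character 0x80 flips to 0xC0, which is not normal in the ascii set, so it falls through to the format string.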
When the character set is utf-8, the LESSUTFBINFMT environment
variable acts similarly to LESSBINFMT but it applies to Unicode
code points that were successfully decoded but are unsuitable for
display (e.g., unassigned code points). Its default value is
"<U+%04lX>". Note that LESSUTFBINFMT and LESSBINFMT share their
display attribute setting ("*x") so specifying one will affect
both; LESSUTFBINFMT is read after LESSBINFMT so its setting, if
any, will have priority. Problematic octets in a UTF-8 file
(octets of a truncated sequence, octets of a complete but
non-shortest form sequence, invalid octets, and stray trailing
octets) are displayed individually using LESSBINFMT to help
diagnose how the UTF-8 file is ill-formed.
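
The effect can be approximated in Python with a decoding error handler. This is a sketch only: it relies on Python's own UTF-8 decoder as a stand-in for the decoding less performs, so edge cases may be classified differently, and the handler name "binfmt_sketch" and the function names are hypothetical. The two format constants are the format parts of the defaults quoted above.

```python
import codecs

BINFMT = "<%02X>"        # format part of the default LESSBINFMT
UTFBINFMT = "<U+%04lX>"  # default LESSUTFBINFMT


def _show_bad_octets(error):
    """Render each octet of an undecodable span individually."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    # Resume decoding right after the offending octets.
    return "".join(BINFMT % octet for octet in bad), error.end


codecs.register_error("binfmt_sketch", _show_bad_octets)


def render_utf8(data):
    """Decode bytes as UTF-8, showing ill-formed octets one by one."""
    return data.decode("utf-8", errors="binfmt_sketch")


def render_code_point(code_point):
    """Format a decoded but undisplayable code point."""
    # Python accepts the "l" length modifier in the format and ignores it.
    return UTFBINFMT % code_point


print(render_utf8(b"a\xc0\x80b"))  # prints "a<C0><80>b"
```

Here the two octets C0 80 form a non-shortest form encoding, so each one is shown separately instead of being decoded to a character.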