translate characters
Rationale
In some early proposals, an explicit option -n was added to
disable the historical behavior of stripping NUL characters from
the input. It was considered that automatically stripping NUL
characters from the input was not correct functionality.
However, the removal of -n in a later proposal does not remove
the requirement that tr correctly process NUL characters in its
input stream. NUL characters can be stripped by using tr -d
'\000'.
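As a minimal sketch of the point above, the following builds a short byte stream containing NUL characters with printf, then shows both that tr passes NULs through when translating and that tr -d '\000' strips them. The variable names are illustrative only and do not come from the standard.

```shell
# Delete NUL bytes explicitly; the input is the 5 bytes a, NUL, b, NUL, c.
stripped=$(printf 'a\000b\000c' | tr -d '\000')
printf '%s\n' "$stripped"    # prints: abc

# A conforming tr must not drop NULs on its own: translating 'a' to 'x'
# leaves the byte count unchanged (3 bytes in, 3 bytes out).
passthrough=$(printf 'a\000b' | tr 'a' 'x' | wc -c | tr -d ' ')

# Byte count after explicit deletion of the two NULs.
after=$(printf 'a\000b\000c' | tr -d '\000' | wc -c | tr -d ' ')
```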
Historical implementations of tr differ widely in syntax and
behavior. For example, the BSD version has not needed the bracket
characters for the repetition sequence. The tr utility syntax is
based more closely on the System V and XPG3 model while
attempting to accommodate historical BSD implementations. In the
case of the short string2 padding, the decision was to unspecify
the behavior and preserve System V and XPG3 scripts, which might
find difficulty with the BSD method. The assumption was made
that BSD users of tr have to make accommodations to meet the
syntax defined here. Since it is possible to use the repetition
sequence to duplicate the desired behavior, whereas there is no
simple way to achieve the System V method, this was the correct,
if not desirable, approach.
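The repetition sequence mentioned above can be sketched as follows. Because the result of a string2 shorter than string1 is unspecified, the example pads string2 explicitly with [y*] (repeat to the length of string1) or [y*3] (repeat exactly three times), which reproduces the BSD-style padding portably. The variable names are illustrative.

```shell
# 'a' maps to 'x'; [y*] pads string2 with 'y' up to the length of string1,
# so 'b', 'c', and 'd' all map to 'y'.
padded=$(printf 'abcd\n' | tr 'abcd' 'x[y*]')
printf '%s\n' "$padded"      # prints: xyyy

# The same mapping with an explicit repeat count.
explicit=$(printf 'abcd\n' | tr 'abcd' 'x[y*3]')
printf '%s\n' "$explicit"    # prints: xyyy
```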
The use of octal values to specify control characters, while
having historical precedents, is not portable. The introduction
of escape sequences for control characters should provide the
necessary portability. It is recognized that this may cause some
historical scripts to break.
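A small sketch of the difference: the escape sequence '\t' names the tab character regardless of encoding, whereas the octal form '\011' names a byte value that is a tab only on ASCII-based systems. The second command below therefore assumes an ASCII-compatible character set.

```shell
# Portable: the escape sequence \t always denotes the tab character.
spaced=$(printf 'a\tb\n' | tr '\t' ' ')
printf '%s\n' "$spaced"      # prints: a b

# Not portable: octal 011 is a tab only where the encoding is
# ASCII-compatible (an assumption made for this example).
octal=$(printf 'a\tb\n' | tr '\011' ' ')
```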
An early proposal included support for multi-character collating
elements. It was pointed out that, while tr does employ some
syntactical elements from REs, the aim of tr is quite different;
ranges, for example, do not have a similar meaning (``any of the
chars in the range matches'', versus ``translate each character
in the range to the output counterpart''). As a result, the
previously included support for multi-character collating
elements has been removed. What remains are ranges in current
collation order (to support, for example, accented characters),
character classes, and equivalence classes.
In XPG3 the [:class:] and [=equiv=] conventions are shown with
double brackets, as in RE syntax. However, tr does not implement
RE principles; it just borrows part of the syntax. Consequently,
[:class:] and [=equiv=] should be regarded as syntactical
elements on a par with [x*n], which is not an RE bracket
expression.
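For illustration, character classes in tr are written with a single pair of brackets, not the double brackets of an RE bracket expression. The sketch below fixes the locale to C so that the class contents are predictable; the variable names are illustrative.

```shell
# Single brackets: [:lower:] and [:upper:] are tr syntax elements,
# not RE bracket expressions.
upper=$(printf 'hello\n' | LC_ALL=C tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"       # prints: HELLO

# A class can also be used with -d to delete its members.
letters=$(printf 'a1b2\n' | LC_ALL=C tr -d '[:digit:]')
printf '%s\n' "$letters"     # prints: ab
```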
The standard developers will consider changes to tr that allow it
to translate characters between different character encodings, or
they will consider providing a new utility to accomplish this.
On historical System V systems, a range expression requires
enclosing square-brackets, such as:
tr '[a-z]' '[A-Z]'
However, BSD-based systems did not require the brackets, and this
convention is used here to avoid breaking large numbers of BSD
scripts:
tr a-z A-Z
The preceding System V script will continue to work because the
brackets, treated as regular characters, are translated to
themselves. However, any System V script that relied on "a-z"
representing the three characters 'a', '-', and 'z' has to be
rewritten as "az-".
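The three cases above can be compared side by side. This is a sketch run in the C locale; the sample input and variable names are chosen for illustration. Note that the bracketed form leaves literal brackets in the input unchanged because '[' maps to '[' and ']' maps to ']'.

```shell
# BSD-style range without brackets.
bsd_style=$(printf 'hello [x]\n' | LC_ALL=C tr 'a-z' 'A-Z')
printf '%s\n' "$bsd_style"   # prints: HELLO [X]

# System V-style range: the brackets are ordinary characters that
# translate to themselves, so the result is the same.
sysv_style=$(printf 'hello [x]\n' | LC_ALL=C tr '[a-z]' '[A-Z]')
printf '%s\n' "$sysv_style"  # prints: HELLO [X]

# To translate the three literal characters a, z, and -, place the
# hyphen last so that it cannot be read as a range operator.
literal=$(printf 'a-z m\n' | LC_ALL=C tr 'az-' 'AZ_')
printf '%s\n' "$literal"     # prints: A_Z m
```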
The ISO POSIX‐2:1993 standard had a -c option that behaved
similarly to the -C option, but did not supply functionality
equivalent to the -c option specified in POSIX.1‐2008.
The earlier version also said that octal sequences referred to
collating elements and could be placed adjacent to each other to
specify multi-byte characters. However, it was noted that this
caused ambiguities because tr would not be able to tell whether
adjacent octal sequences were intending to specify multi-byte
characters or multiple single byte characters. POSIX.1‐2008
specifies that octal sequences always refer to single byte binary
values when used to specify an endpoint of a range of collating
elements.
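As a hedged illustration of octal sequences used as range endpoints, the sketch below treats \141 and \143 as single byte values. It assumes the C locale and an ASCII-compatible encoding, in which those bytes are 'a' and 'c'; on other encodings the range would cover different characters.

```shell
# \141-\143 is the byte range 0141..0143 (a..c in ASCII, an assumption
# of this example); each endpoint is one byte, never part of a
# multi-byte character.
ranged=$(printf 'abcd\n' | LC_ALL=C tr '\141-\143' 'ABC')
printf '%s\n' "$ranged"      # prints: ABCd
```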
Earlier versions of this standard allowed for implementations
with bytes other than eight bits, but this has been modified in
this version.