Путеводитель по Руководству Linux

  User  |  Syst  |  Libr  |  Device  |  Files  |  Other  |  Admin  |  Head  |



   tr.1p    ( 1 )

переводить символы (translate characters)

Обоснование (Rationale)

In some early proposals, an explicit option -n was added to disable the historical behavior of stripping NUL characters from the input. It was considered that automatically stripping NUL characters from the input was not correct functionality. However, the removal of -n in a later proposal does not remove the requirement that tr correctly process NUL characters in its input stream. NUL characters can be stripped by using tr -d '\000'.

Historical implementations of tr differ widely in syntax and behavior. For example, the BSD version has not needed the bracket characters for the repetition sequence. The tr utility syntax is based more closely on the System V and XPG3 model while attempting to accommodate historical BSD implementations. In the case of the short string2 padding, the decision was to unspecify the behavior and preserve System V and XPG3 scripts, which might find difficulty with the BSD method. The assumption was made that BSD users of tr have to make accommodations to meet the syntax defined here. Since it is possible to use the repetition sequence to duplicate the desired behavior, whereas there is no simple way to achieve the System V method, this was the correct, if not desirable, approach.

The use of octal values to specify control characters, while having historical precedents, is not portable. The introduction of escape sequences for control characters should provide the necessary portability. It is recognized that this may cause some historical scripts to break.

An early proposal included support for multi-character collating elements. It was pointed out that, while tr does employ some syntactical elements from REs, the aim of tr is quite different; ranges, for example, do not have a similar meaning (``any of the chars in the range matches'', versus ``translate each character in the range to the output counterpart''). As a result, the previously included support for multi-character collating elements has been removed. What remains are ranges in current collation order (to support, for example, accented characters), character classes, and equivalence classes.

In XPG3 the [:class:] and [=equiv=] conventions are shown with double brackets, as in RE syntax. However, tr does not implement RE principles; it just borrows part of the syntax. Consequently, [:class:] and [=equiv=] should be regarded as syntactical elements on a par with [x*n], which is not an RE bracket expression.

The standard developers will consider changes to tr that allow it to translate characters between different character encodings, or they will consider providing a new utility to accomplish this.

On historical System V systems, a range expression requires enclosing square-brackets, such as:

tr '[a-z]' '[A-Z]'

However, BSD-based systems did not require the brackets, and this convention is used here to avoid breaking large numbers of BSD scripts:

tr a-z A-Z

The preceding System V script will continue to work because the brackets, treated as regular characters, are translated to themselves. However, any System V script that relied on "a‐z" representing the three characters 'a', '-', and 'z' have to be rewritten as "az-".

The ISO POSIX‐2:1993 standard had a -c option that behaved similarly to the -C option, but did not supply functionality equivalent to the -c option specified in POSIX.1‐2008.

The earlier version also said that octal sequences referred to collating elements and could be placed adjacent to each other to specify multi-byte characters. However, it was noted that this caused ambiguities because tr would not be able to tell whether adjacent octal sequences were intending to specify multi-byte characters or multiple single byte characters. POSIX.1‐2008 specifies that octal sequences always refer to single byte binary values when used to specify an endpoint of a range of collating elements.

Earlier versions of this standard allowed for implementations with bytes other than eight bits, but this has been modified in this version.