sort, merge, or sequence check text files
Rationale
Examples in some historical documentation state that options -um
with one input file keep the first in each set of lines with
equal keys. This behavior was deemed to be an implementation
artifact and was not standardized.
The -z option was omitted; it is not standard practice on most
systems and is inconsistent with using sort to sort several files
individually and then merge them together. The text concerning -z
in historical documentation appeared to require implementations
to determine the proper buffer length during the sort phase of
operation, but not during the merge.
The -y option was omitted because of non-portability. The -M
option, present in System V, was omitted because of
non-portability in international usage.
An undocumented -T option exists in some implementations. It is
used to specify a directory for intermediate files.
Implementations are encouraged to support the use of the TMPDIR
environment variable instead of adding an option to support this
functionality.
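
As a minimal sketch of the TMPDIR approach, the commands below point sort at a scratch directory for a single invocation. The directory and file names are hypothetical, and whether a given implementation actually places its intermediate files there is implementation-dependent.

```shell
# Hypothetical scratch directory; any writable directory would do.
scratch=$(mktemp -d)
printf '%s\n' pear apple fig > "$scratch/fruit.txt"

# TMPDIR is set only for this one invocation of sort.
sorted=$(TMPDIR="$scratch" LC_ALL=C sort "$scratch/fruit.txt")
printf '%s\n' "$sorted"

rm -rf "$scratch"
```
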
The -k option was added to satisfy two objections. First, the
zero-based counting used by sort is not consistent with other
utility conventions. Second, it did not meet syntax guideline
requirements.
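
The one-based field counting of -k can be illustrated with a small sketch. The sample records and variable names are made up for the example; the comment notes the zero-based spelling that -k replaced.

```shell
# Sample data: name, department, age (whitespace-separated).
people=$(printf '%s\n' 'carol ops 25' 'alice dev 41' 'bob dev 35')

# -k 2,2 restricts the key to field 2 (fields are counted from 1).
# The older zero-based spelling of the same key was "+1 -2".
by_dept=$(printf '%s\n' "$people" | LC_ALL=C sort -k 2,2)

# -k 3,3n sorts on field 3 and compares it as a number.
by_age=$(printf '%s\n' "$people" | LC_ALL=C sort -k 3,3n)

printf '%s\n' "$by_dept" "--" "$by_age"
```
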
Historical documentation indicates that "setting -n implies -b".
The description of -n already states that optional leading
<blank>s are tolerated in doing the comparison. If -b is enabled,
rather than implied, by -n, this has unusual side-effects. When a
character offset is used in a column of numbers (for example, to
sort modulo 100), that offset is measured relative to the most
significant digit, not to the column. Based upon a
recommendation from the author of the original sort utility, the
-b implication has been omitted from this volume of POSIX.1‐2017,
and an application wishing to achieve the previously mentioned
side-effects has to code the -b flag explicitly.
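
The "sort modulo 100" case can be sketched as follows, assuming a right-aligned column exactly five characters wide (the data and variable names are illustrative only). Because -b is not in effect, the character offsets count from the start of the field, leading blanks included, so they line up with the column.

```shell
# Right-aligned numbers in a 5-character column.
column=$(printf '%s\n' '  105' '   23' ' 1207')

# Without -b, offsets 1.4,1.5 select the 4th and 5th characters of
# the field, that is, the last two digits of each number.
mod100=$(printf '%s\n' "$column" | LC_ALL=C sort -k 1.4,1.5n)

printf '%s\n' "$mod100"
```

Adding the b modifier to the key start would instead measure the offset from the first non-blank character, which is the side-effect described above.
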
Earlier versions of this standard allowed the -o option to appear
after operands. Historical practice allowed all options to be
interspersed with operands. This version of the standard allows
implementations to accept options after operands but conforming
applications should not use this form.
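
A short sketch of the form a conforming application would use, with -o placed before the file operand. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 > "$workdir/in.txt"

# Conforming form: the -o option precedes the file operand.
LC_ALL=C sort -o "$workdir/out.txt" "$workdir/in.txt"

result=$(cat "$workdir/out.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```
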
Earlier versions of this standard also allowed the -number and
+number options. These options are no longer specified by
POSIX.1‐2008 but may be present in some implementations.
Historical implementations produced a message on standard error
when -c was specified and disorder was detected, and when -c and
-u were specified and a duplicate key was detected. An earlier
version of this standard contained wording that did not make it
clear that this message was allowed and some implementations
removed this message to be sure that they conformed to the
standard's requirements. Confronted with this difference in
behavior, interactive users that wanted to be sure that they got
visual feedback instead of just exit code 1 could have used a
command like:
    sort -c file || echo disorder
whether or not the sort utility provided a message in this case.
However, it was not easy for a user to find where the disorder or
duplicate key occurred on implementations that do not produce a
message, especially when some parts of the input line were not
part of the key and when one or more of the -b, -d, -f, -i, -n,
or -r options or keydef type modifiers were in use. POSIX.1‐2008
requires a message to be produced in this case. POSIX.1‐2008 also
contains the -C option giving users the ability to choose either
behavior.
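
The difference between the two checking options can be sketched as below. The variable names are illustrative, and the exact wording of the -c diagnostic varies between implementations, so the sketch only captures it rather than relying on its contents.

```shell
unsorted=$(printf '%s\n' b a c)

# -c: exit status 1 and a diagnostic on standard error.
status_c=0
msg_c=$(printf '%s\n' "$unsorted" | LC_ALL=C sort -c 2>&1) || status_c=$?

# -C: exit status 1 only; no diagnostic is written.
status_C=0
msg_C=$(printf '%s\n' "$unsorted" | LC_ALL=C sort -C 2>&1) || status_C=$?

printf 'sort -c: status=%s message=[%s]\n' "$status_c" "$msg_c"
printf 'sort -C: status=%s message=[%s]\n' "$status_C" "$msg_C"
```
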
When a disorder or duplicate is found when the -c option is
specified, some implementations print a message containing the
first line that is out of order or contains a duplicate key;
others print a message specifying the line number of the
offending line. This standard allows either type of message.
Implementations are encouraged to perform the recommended further
byte-by-byte comparison of lines that collate equally, even
though this may affect efficiency. The impact on efficiency can
be mitigated by only performing the additional comparison if the
current locale's collating sequence does not have a total
ordering of all characters (if the implementation provides a way
to query this) or by only performing the additional comparison if
the locale name associated with the LC_COLLATE category has an
'@' modifier in the name (since locales without an '@' modifier
should have a total ordering of all characters; see the Base
Definitions volume of POSIX.1‐2017, Section 7.3.2, LC_COLLATE).
Note that if the implementation provides a stable sort option as
an extension (usually -s), the additional comparison should not
be performed when this option has been specified.
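
The effect of a stable sort option on lines with equal keys can be sketched as follows. This assumes an implementation that provides -s as an extension (it is not specified by POSIX) and uses the C locale, where the final tie-break is a plain comparison of the whole line; the sample records are made up.

```shell
records=$(printf '%s\n' 'b 1' 'a 1' 'c 0')

# Default: lines whose keys compare equal are ordered by a final
# comparison of the whole line, so 'a 1' precedes 'b 1'.
last_resort=$(printf '%s\n' "$records" | LC_ALL=C sort -k 2,2n)

# With -s (an extension), lines with equal keys keep their input
# order, so 'b 1' still precedes 'a 1'.
stable=$(printf '%s\n' "$records" | LC_ALL=C sort -s -k 2,2n)

printf '%s\n' "$last_resort" "--" "$stable"
```
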