Guide to the Linux Manual



   sort.1p    ( 1 )

sort, merge, or sequence check text files

Application usage

The default value for -t, <blank>, has different properties from, for example, -t"<space>". If a line contains:

<space><space>foo

the following treatment would occur with default separation as opposed to specifically selecting a <space>:

┌──────┬───────────────────┬──────────────┐
│Field │      Default      │ -t "<space>" │
├──────┼───────────────────┼──────────────┤
│  1   │ <space><space>foo │ empty        │
│  2   │ empty             │ empty        │
│  3   │ empty             │ foo          │
└──────┴───────────────────┴──────────────┘

The leading field separator itself is included in a field when -t is not used. For example, this command returns an exit status of zero, meaning the input was already sorted:

sort -c -k 2 <<eof
y<tab>b
x<space>a
eof

(assuming that a <tab> precedes the <space> in the current collating sequence). The field separator is not included in a field when it is explicitly set via -t. This is historical practice and allows usage such as:

sort -t "|" -k 2n <<eof
Atlanta|425022|Georgia
Birmingham|284413|Alabama
Columbia|100385|South Carolina
eof

where the second field can be correctly sorted numerically without regard to the non-numeric field separator.
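The effect of leading blanks under default separation can be seen in a small experiment. The following is a minimal sketch, assuming the C locale (where <space> collates before letters); the sample lines and the variable names are illustrative and not taken from the standard.

```shell
# Two sample lines that differ in the number of blanks before field 2.
input=$(printf 'x  b\ny a\n')

# Default separation: field 2 is "  b" and " a" (leading blanks included),
# so "x  b" sorts first because <space> precedes "a" in the C locale.
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# With -b the leading blanks are ignored, so the keys are "b" and "a".
blank_ignored_order=$(printf '%s\n' "$input" | LC_ALL=C sort -b -k 2)

printf 'default:\n%s\n' "$default_order"
printf 'with -b:\n%s\n' "$blank_ignored_order"
```

The two orders differ only because the blanks that precede a field count as part of that field when -t is not given.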

The wording in the OPTIONS section clarifies that the -b, -d, -f, -i, -n, and -r options have to come before the first sort key specified if they are intended to apply to all specified keys. The way it is described in this volume of POSIX.1‐2017 matches historical practice, not historical documentation. The results are unspecified if these options are specified after a -k option.
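As an illustration of option placement, the sketch below sorts on the second field numerically in reverse, once with the global options placed before the first -k and once with the modifiers attached to the key itself. The sample data and variable names are assumptions made for this example.

```shell
# Sample data: a label followed by a number.
data=$(printf 'a 10\nb 2\nc 33\n')

# Global form: -n and -r come before the first -k, so they apply to every key.
global_form=$(printf '%s\n' "$data" | LC_ALL=C sort -n -r -k 2,2)

# Per-key form: the n and r modifiers are attached to the key definition.
per_key_form=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2nr)

printf 'global:\n%s\n' "$global_form"
printf 'per-key:\n%s\n' "$per_key_form"
```

For this input both forms should print the lines in descending numeric order of the second field. Attaching modifiers to each key avoids relying on the position of the global options.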

The -f option might not work as expected in locales where there is not a one-to-one mapping between an uppercase and a lowercase letter.

When using sort to process pathnames, it is recommended that LC_ALL, or at least LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment, since pathnames can contain byte sequences that do not form valid characters in some locales, in which case the utility's behavior would be undefined. In the POSIX locale each byte is a valid single-byte character, and therefore this problem is avoided.
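One way to follow this recommendation is to set LC_ALL only for the sort invocation. The sketch below uses a hypothetical helper named sort_paths and assumes that no pathname contains a <newline>.

```shell
# Hypothetical helper: sort pathnames read from standard input byte by byte,
# independent of the caller's locale settings.
sort_paths() {
    LC_ALL=C sort
}

# Sample pathnames (illustrative); uppercase sorts before lowercase in C.
sorted=$(printf '%s\n' ./b ./A ./a | sort_paths)
printf '%s\n' "$sorted"
```

Setting the variable on the command itself leaves the locale of the rest of the script unchanged.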

If the collating sequence of the current locale does not have a total ordering of all characters, this can affect the behavior of sort in the following ways:

* As sort -u suppresses lines with duplicate keys, it suppresses lines that collate equally but are not identical.

* The output of sort (without -u) can contain identical lines that are not adjacent, if it does not implement the recommended further byte-by-byte comparison of lines that collate equally. This affects the use of sort with comm and uniq; see the APPLICATION USAGE for those utilities.
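A possible way to sidestep both effects is to run sort and the downstream utility in the POSIX locale, where the ordering is total. The sketch below, with illustrative input and variable names, checks that sort -u and sort piped into uniq then agree.

```shell
# Sample input with one duplicate line and two lines differing only in case.
words=$(printf 'b\na\nb\nA\n')

# In the C locale only identical lines collate equally, so -u drops
# exact duplicates and nothing else.
unique_via_u=$(printf '%s\n' "$words" | LC_ALL=C sort -u)

# Identical lines are adjacent after a C-locale sort, so uniq sees them.
unique_via_uniq=$(printf '%s\n' "$words" | LC_ALL=C sort | LC_ALL=C uniq)

printf '%s\n' "$unique_via_u"
```

In other locales the two pipelines are not guaranteed to produce the same result.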