The default value for -t, <blank>, has different properties from,
for example, -t "<space>". If a line contains:
<space><space>foo
the following treatment would occur with default separation as
opposed to specifically selecting a <space>:
┌──────┬───────────────────┬──────────────┐
│Field │ Default           │ -t "<space>" │
├──────┼───────────────────┼──────────────┤
│ 1    │ <space><space>foo │ empty        │
│ 2    │ empty             │ empty        │
│ 3    │ empty             │ foo          │
└──────┴───────────────────┴──────────────┘
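The difference in the table can be observed with a short sketch. It
assumes that -u compares only the selected key, and it uses two
made-up input lines that each begin with two <space> characters; the
helper name "lines" and the variable names are illustrative only.

```shell
# Two lines, each starting with two <space> characters.
lines() { printf '  foo\n  bar\n'; }

# Default separation: field 3 is empty on both lines, so the keys
# compare equal and -u keeps only one of the two lines.
default_count=$(lines | LC_ALL=C sort -u -k 3,3 | wc -l)

# -t " ": field 3 is "foo" or "bar", so the keys differ and both
# lines are kept.
space_count=$(lines | LC_ALL=C sort -u -t ' ' -k 3,3 | wc -l)

echo "default: $((default_count)) line(s), -t: $((space_count)) line(s)"
```

With default separation one line should remain; with -t " " both
lines should remain.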
The leading field separator itself is included in a field when -t
is not used. For example, this command returns an exit status of
zero, meaning the input was already sorted:
sort -c -k 2 <<eof
y<tab>b
x<space>a
eof
(assuming that a <tab> precedes the <space> in the current
collating sequence). The field separator is not included in a
field when it is explicitly set via -t. This is historical
practice and allows usage such as:
sort -t "|" -k 2n <<eof
Atlanta|425022|Georgia
Birmingham|284413|Alabama
Columbia|100385|South Carolina
eof
where the second field can be correctly sorted numerically
without regard to the non-numeric field separator.
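As a rough illustration of the separator being part of the field
under default separation, the following sketch repeats the sort -c
check on the <tab>/<space> input and then adds the b modifier to the
key, which skips leading blanks. The variable names are illustrative,
and the POSIX locale is assumed so that <tab> collates before <space>.

```shell
# Same two lines as in the sort -c example: "y<tab>b" and "x<space>a".
input() { printf 'y\tb\nx a\n'; }

# Field 2 keeps its leading separator, so the keys are "<tab>b" and
# "<space>a"; the input counts as sorted and the status stays 0.
with_separator=0
input | LC_ALL=C sort -c -k 2 2>/dev/null || with_separator=$?

# With the b modifier the leading blanks are skipped, so the keys are
# "b" and "a"; the input is out of order and the status is non-zero.
without_separator=0
input | LC_ALL=C sort -c -k 2b 2>/dev/null || without_separator=$?

echo "default key: $with_separator, key with b: $without_separator"
```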
The wording in the OPTIONS section clarifies that the -b, -d, -f,
-i, -n, and -r options have to come before the first sort key
specified if they are intended to apply to all specified keys.
The way it is described in this volume of POSIX.1‐2017 matches
historical practice, not historical documentation. The results
are unspecified if these options are specified after a -k option.
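A minimal sketch of the two portable placements follows: the
ordering option is given either before the first -k, where it
applies to every key that has no modifiers of its own, or as a
modifier attached to the key. The sample data and the names "data",
"global", and "per_key" are made up for illustration.

```shell
data() { printf 'x 10\ny 9\n'; }

# -n before the first key: applies to all keys without own modifiers.
global=$(data | LC_ALL=C sort -n -k 2,2)

# n attached to the key: applies to that key only.
per_key=$(data | LC_ALL=C sort -k 2,2n)

printf '%s\n' "$global"
```

Both forms order the second field numerically here, so "y 9" comes
before "x 10". Writing the option after the key, as in
sort -k 2,2 -n, is the form whose results are unspecified.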
The -f option might not work as expected in locales where there
is not a one-to-one mapping between an uppercase and a lowercase
letter.
When using sort to process pathnames, it is recommended that
LC_ALL, or at least LC_CTYPE and LC_COLLATE, are set to POSIX or
C in the environment, since pathnames can contain byte sequences
that do not form valid characters in some locales, in which case
the utility's behavior would be undefined. In the POSIX locale
each byte is a valid single-byte character, and therefore this
problem is avoided.
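For instance, a pathname list might be sorted as in the sketch
below, with the locale overridden for the sort invocation only. The
pathnames are made up, and the variable name "sorted" is
illustrative.

```shell
# LC_ALL=C applies to this one sort invocation; every byte is then a
# valid character and lines are ordered by byte value.
sorted=$(printf '%s\n' ./b/file ./a/file ./B/file | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

In the POSIX locale uppercase letters collate before lowercase ones,
so ./B/file is listed first, followed by ./a/file and ./b/file.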
If the collating sequence of the current locale does not have a
total ordering of all characters, this can affect the behavior of
sort in the following ways:
* As sort -u suppresses lines with duplicate keys, it
  suppresses lines that collate equally but are not identical.
* The output of sort (without -u) can contain identical lines
  that are not adjacent, if it does not implement the
  recommended further byte-by-byte comparison of lines that
  collate equally. This affects the use of sort with comm and
  uniq; see the APPLICATION USAGE for those utilities.
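One way to sidestep both effects, assuming byte-value ordering is
acceptable for the data, is to run sort and the utility that
consumes its output in the POSIX locale, where lines that collate
equally are identical. This is only a sketch of that workaround; the
input values and the variable name "deduped" are made up.

```shell
# In the POSIX locale collation is by byte value, so identical lines
# are always adjacent after sorting and uniq removes exactly those.
deduped=$(printf '%s\n' b a b A | LC_ALL=C sort | LC_ALL=C uniq)

printf '%s\n' "$deduped"
```

The lines "A" and "a" are both kept because they are not identical;
only the repeated "b" is dropped.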