The -h option was omitted because it was insufficiently specified
and does not add to application portability.
Historical implementations employ algorithms that do not always
produce a minimum list of differences; the current language about
making every effort is the best this volume of POSIX.1‐2017 can
do, as there is no metric that could be employed to judge the
quality of implementations against any and all file contents. The
statement "This list should be minimal" clearly implies that
implementations are not expected to provide the following output
when comparing two 100-line files that differ in only one
character on a single line:

    1,100c1,100
    all 100 lines from file1 preceded with "< "
    ---
    all 100 lines from file2 preceded with "> "
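To make the notion of a minimal listing concrete, the following
Python sketch renders the normal diff format from the edit
operations found by the standard library's difflib.SequenceMatcher.
The names normal_diff and _range are hypothetical. Python's own
documentation states that SequenceMatcher does not yield minimal
edit sequences in general, so it is itself an example of an
algorithm that makes a good effort rather than guaranteeing a
minimum. For the one-character case above, however, it reports a
single changed line.

```python
from __future__ import annotations

import difflib


def _range(start: int, stop: int) -> str:
    """Render the 0-based half-open range [start, stop) as 1-based lines."""
    first, last = start + 1, stop
    return f"{first}" if first == last else f"{first},{last}"


def normal_diff(a: list[str], b: list[str]) -> list[str]:
    """Return normal-format diff lines turning `a` into `b` (hypothetical helper)."""
    out: list[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            out.append(f"{_range(i1, i2)}c{_range(j1, j2)}")
        elif tag == "delete":
            # j1 is the line of b after which the deleted lines would have been.
            out.append(f"{_range(i1, i2)}d{j1}")
        else:  # "insert"
            # i1 is the line of a after which the new lines are appended.
            out.append(f"{i1}a{_range(j1, j2)}")
        out.extend(f"< {line}" for line in a[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend(f"> {line}" for line in b[j1:j2])
    return out
```

For two 100-line files that differ only in line 50, normal_diff
reports 50c50 followed by the two versions of that line, rather
than replacing all 100 lines.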
The "Only in" messages required when the -r option is specified
are not used by most historical implementations if the -e option
is also specified. They are required here because they provide
information that is needed to update a target directory hierarchy
to match a source hierarchy. The "Common subdirectories" messages
are written by System V and 4.3 BSD when the -r option is
specified. They are allowed here but are not required because they
report on something that is the same rather than on a difference,
and they are not needed to update a target hierarchy.
The -c option, which writes output in a format using lines of
context, has been included. The format is useful for a variety of
reasons, among them much improved readability and the ability to
understand changes even when the target file has line numbers that
differ from those of a similar, but slightly different, copy. The
patch utility is most valuable when working with difference
listings in a context format. The BSD version of -c takes an
optional argument specifying the amount of context. Rather than
overloading -c and breaking the Utility Syntax Guidelines for
diff, the standard developers decided to add a separate option
(-C) for requesting a context diff with a specified amount of
context. Also, the format for context diffs was extended slightly
in 4.3 BSD to allow multiple changes that are within context lines
of each other to be merged together. The output format contains an
additional four <asterisk> characters after the range of affected
lines in the first filename. This was to provide a flag for old
programs (like old versions of patch) that understand only the old
context format. The version of the context format described here
does not require that multiple changes within context lines be
merged, but it does not prohibit it either. The extension is
upwards-compatible, so any vendors that wish to retain the old
version of diff can do so by adding the extra four <asterisk>
characters (that is, utilities that currently use diff and
understand the new merged format will also understand the old
unmerged format, but not vice versa).
The -u and -U options of GNU diff have been included. Their output
format, designed by Wayne Davison, takes up less space than the -c
and -C formats and in many cases is easier to read. The format's
timestamps do not vary by locale, so LC_TIME does not affect it.
The format's line numbers are rendered with the %1d format, not %d,
because the file format notation rules would allow extra <blank>
characters to appear around the numbers.
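As an illustration of the unpadded numbers, the sketch below
formats the line ranges of a unified hunk header in Python. It
assumes the range conventions followed by GNU diff and Python's
difflib: the count is omitted when a range holds exactly one line,
and an empty range names the line just before it (0 at the start of
the file). The names unified_range and hunk_header are
hypothetical, and ranges are given as 0-based half-open pairs, as
in difflib opcodes.

```python
from __future__ import annotations


def unified_range(start: int, stop: int) -> str:
    """Format the 0-based half-open range [start, stop) for a unified header."""
    first = start + 1  # 1-based beginning line number
    count = stop - start
    if count == 1:
        return f"{first}"
    if count == 0:
        # Empty range: name the line just before it (0 at the file start).
        first -= 1
    return f"{first},{count}"


def hunk_header(i1: int, i2: int, j1: int, j2: int) -> str:
    """Build an "@@ -... +... @@" line; integers are written with no padding."""
    return f"@@ -{unified_range(i1, i2)} +{unified_range(j1, j2)} @@"
```

Like %1d, Python's default integer formatting writes only the
digits, so a header such as "@@ -50 +50 @@" never picks up
surrounding <blank> characters.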
The substitute command was added as an additional format for the
-e option. This was added to give implementations a way to fix the
classic "dot alone on a line" bug present in many versions of diff
(in an ed append command, a line containing only a period ends the
appended text, so such a line from the file cannot be written
literally). Since many implementations have fixed this bug, the
standard developers decided not to standardize broken behavior,
but rather to provide the necessary tool for fixing the bug. One
way to fix this bug is to output two periods whenever a lone period
is needed, terminate the append command with a period, and then use
the substitute command to convert the two periods into one period.
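The fix described above can be sketched in Python as a function
that builds one ed append command. The name ed_append is
hypothetical, and so is the exact substitute command it emits (a
line-addressed s/^\.\.$/./); the STDOUT format that implementations
actually write is defined elsewhere in the standard. The sketch
only shows the technique: double each lone period, end the append,
then repair each doubled line by its line number.

```python
from __future__ import annotations


def ed_append(after: int, lines: list[str]) -> list[str]:
    """Build an ed script that appends `lines` after line `after`.

    A lone "." would end the append early, so it is written as ".."
    and repaired afterward with a substitute command on that line.
    """
    script = [f"{after}a"]
    fixups: list[str] = []
    for offset, line in enumerate(lines, start=1):
        if line == ".":
            script.append("..")
            # Once appended, this text occupies line after + offset.
            fixups.append(f"{after + offset}s/^\\.\\.$/./")
        else:
            script.append(line)
    script.append(".")  # terminates the append command
    return script + fixups
```

For example, appending the lines "foo", ".", and "bar" after line 3
yields 3a, foo, .., bar, and a terminating period, followed by
5s/^\.\.$/./ to turn line 5 back into a single period.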
The BSD-derived -r option was added to provide a mechanism for
using diff to compare two file system trees. This behavior is
useful, is standard practice on all BSD-derived systems, and is
not easily reproducible with the find utility.
The requirement that diff not compare files in some
circumstances, even though they have the same name, is based on
the actual output of historical implementations. The specified
behavior precludes the problems arising from running into FIFOs
and other files that would cause diff to hang waiting for input
with no indication to the user that diff was hung. An earlier
version of this standard specified the output format more
precisely, but in practice this requirement was widely ignored
and the benefit of standardization seemed small, so it is now
unspecified. In most common usage, diff -r should indicate
differences in the file hierarchies, not differences in the
contents of devices referenced by the hierarchies.
Many early implementations of diff require seekable files. Since
the System Interfaces volume of POSIX.1‐2017 supports named
pipes, the standard developers decided that such a restriction
was unreasonable. Note also that the filename operand -, which
denotes standard input, almost always refers to a pipe.
No directory search order is specified for diff. The historical
ordering is, in fact, not optimal, in that it prints out all of
the differences at the current level, including the statements
about all common subdirectories, before recursing into those
subdirectories.
The message:
    "diff %s %s %s\n", <diff_options>, <filename1>, <filename2>
does not vary by locale because it is the representation of a
command, not an English sentence.
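The tree comparison discussed in the preceding paragraphs can be
sketched in Python as follows. The function compare_trees is
hypothetical and returns messages instead of writing them. It
assumes the "Only in %s: %s" wording used by historical
implementations, writes the "diff %s %s %s" line for regular files
whose contents differ (the line-by-line differences themselves are
omitted), never opens FIFOs or devices so it cannot hang on them,
and finishes each level before recursing, as historical
implementations do. The optional "Common subdirectories" message is
not written, and the wording of the "not compared" message is
invented for illustration, since the standard leaves that output
unspecified.

```python
from __future__ import annotations

import filecmp
import os
import stat


def compare_trees(dir1: str, dir2: str, options: str = "-r") -> list[str]:
    """Return the per-file messages a recursive comparison might write."""
    messages: list[str] = []
    names1 = set(os.listdir(dir1))
    names2 = set(os.listdir(dir2))
    subdirs: list[tuple[str, str]] = []

    for name in sorted(names1 | names2):
        if name not in names2:
            messages.append(f"Only in {dir1}: {name}")
            continue
        if name not in names1:
            messages.append(f"Only in {dir2}: {name}")
            continue
        path1 = os.path.join(dir1, name)
        path2 = os.path.join(dir2, name)
        # os.stat follows symbolic links (and raises on a dangling one).
        mode1 = os.stat(path1).st_mode
        mode2 = os.stat(path2).st_mode
        if stat.S_ISDIR(mode1) and stat.S_ISDIR(mode2):
            # Defer recursion so the current level is finished first.
            subdirs.append((path1, path2))
        elif stat.S_ISREG(mode1) and stat.S_ISREG(mode2):
            if not filecmp.cmp(path1, path2, shallow=False):
                messages.append(f"diff {options} {path1} {path2}")
                # A real diff would write the differences here.
        else:
            # FIFOs, devices, or a directory paired with a file: never
            # opened, so the comparison cannot block waiting for input.
            messages.append(f"File {path1} and {path2} not compared")

    for path1, path2 in subdirs:
        messages.extend(compare_trees(path1, path2, options))
    return messages
```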