Differences between GNU roff and AT&T troff
Implementation differences
groff has a number of features that cause incompatibilities with
documents written using old versions of roff. Some GNU
extensions to roff have become supported by other
implementations.
When adjusting to both margins, AT&T troff at first adjusts
spaces starting from the right; GNU troff begins from the left. Both
implementations adjust spaces from opposite ends on alternating
output lines in this adjustment mode to prevent 'rivers' in the
text.
groff does not always hyphenate words as AT&T troff does. The
AT&T implementation uses a set of hard-coded rules specific to
U.S. English, while groff uses language-specific hyphenation
pattern files derived from TeX. Furthermore, in old versions of
troff there was a limited amount of space to store hyphenation
exceptions (arguments to the .hw request); groff has no such
restriction.
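As a minimal sketch of the .hw request mentioned above, the following fragment registers two hyphenation exceptions. The words are arbitrary examples chosen for illustration, not taken from any particular document.

```roff
.\" Sketch: the words below are arbitrary examples.
.\" Each hyphen in an argument marks a permitted break point.
.hw data-base type-set-ter
The typesetter reads the database.
```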
Long names may be groff's most obvious innovation. AT&T troff
interprets '.dsabcd' as defining a string 'ab' with contents
'cd'. Normally, groff interprets this as a call of a macro named
'dsabcd'. AT&T troff also interprets \*[ and \n[ as an
interpolation of a string or number register, respectively,
called '['. In groff, however, the '[' is normally interpreted
as delimiting a long name. In compatibility mode, groff
interprets names in the traditional way, which means that they
are limited to one or two characters. See the -C option in
groff(1) and, above, the .C and .cp registers, and .cp and .do
requests, for more on compatibility mode.
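The contrast between traditional and long names can be sketched as follows. The names 'greeting' and 'chapter_count' are hypothetical, and the long-name lines assume groff is not in compatibility mode.

```roff
.\" Two-character name: valid in both AT&T troff and groff.
.ds ab cd
\*(ab
.\" Long string name: a groff extension.
.ds greeting Hello, world!
\*[greeting]
.\" Long register name, interpolated with the bracketed form.
.nr chapter_count 3
\n[chapter_count]
```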
The register \n[.cp] is specialized and may require a statement
of rationale. When writing macro packages or documents that use
groff features and which may be mixed with other packages or
documents that do not (common scenarios include serial processing
of man pages or use of the .so or .mso requests), you may desire
correct operation regardless of compatibility mode in the
surrounding context. It may occur to you to save the existing
value of \n(.C into a register, say, _C, at the beginning of your
file, turn compatibility mode off with '.cp 0', then restore it
from that register at the end with '.cp \n(_C'. At the same
time, a modular design of a document or macro package may lead
you to multiple layers of inclusion. You cannot use the same
register name everywhere or you risk 'clobbering' the value from
a preceding or enclosing context. The two-character register
name space of AT&T troff is confining and mnemonically
challenging; you may wish to use groff's more capacious name
space. However, attempting '.nr _my_saved_C \n(.C' will not work
in compatibility mode; the register name is too long. 'This is
exactly what .do is for,' you think, '.do nr _my_saved_C \n(.C'.
The foregoing will always save zero to your register, because .do
turns compatibility mode off while it interprets its argument
list, so \n(.C is already zero by the time it is read. The
\n[.cp] register instead reports the compatibility mode that was
in effect before .do suspended it. What you need is:
.do nr _my_saved_C \n[.cp]
.cp 0
at the beginning of your file, followed by
.cp \n[_my_saved_C]
.do rr _my_saved_C
at the end. As in the C language, we all have to share one big
name space, so choose a register name that is unlikely to collide
with other uses.
The existence of the .T string is a common feature of
post-CSTR #54 troffs (DWB 3.3, Solaris, Heirloom Doctools, and
Plan 9 troff all support it), but valid values are specific to
each implementation. In groff, the read-only .T register
interpolates 1 if the formatter was called with the -T option and
0 otherwise; this differs from AT&T troff, which interpolated 1
only if nroff was the formatter and was called with -T.
AT&T troff and other implementations handle .lf differently. For
them, its line argument changes the line number of the current
line; in groff, it sets the number of the next input line.
AT&T troff had only environments named '0', '1', and '2'. In GNU
troff, any number of environments may exist, using any valid
identifiers for their names.
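A named environment might be used as in the sketch below. The environment name 'sidebar' and the settings chosen are hypothetical, and the fragment assumes groff outside compatibility mode.

```roff
.\" Switch to an environment with a long name; groff creates it
.\" on first use.  The name "sidebar" is hypothetical.
.ev sidebar
.ll 3i
.ps 9
This text is formatted with the sidebar environment's settings.
.br
.\" With no argument, .ev returns to the previous environment.
.ev
```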
Normally, groff preserves the interpolation depth in delimited
arguments, but not in compatibility mode. For example, on
terminal devices,
.ds xx '
\w'abc\*(xxdef'
produces '168' ordinarily, but '72def'' in compatibility mode.
Furthermore, the escapes \f, \H, \m, \M, \R, \s, and \S are
transparent for the purpose of recognizing a control character at
the beginning of a line only in compatibility mode. For example,
this code produces bold output in both cases, but the text
differs,
.de xx
Hello!
..
\fB.xx\fP
producing '.xx' in normal mode and 'Hello!' in compatibility
mode.
groff does not allow the use of the escape sequences \|, \^, \&,
\{, \}, '\ ', \', \`, \-, \_, \!, \%, and \c in names of strings,
macros, diversions, number registers, fonts, or environments;
AT&T troff does. The \A escape sequence (see subsection 'Escape
sequences' above) may be helpful in avoiding use of these escape
sequences in names.
Normally, the syntax form \sn accepts only a single character (a
digit) for n, consistently with other forms that originated in
AT&T troff, like \*, \$, \f, \g, \k, \n, and \z. In
compatibility mode only, a non-zero n must be in the range 4–39.
Legacy documents relying upon this quirk of parsing should be
migrated to another \s form. [Background: The Graphic Systems
C/A/T phototypesetter (the original device target for AT&T troff)
supported only a few discrete point sizes in the range 6–36, so
Ossanna contrived a special case in the parser to do what the
user must have meant. Kernighan warned of this in the 1992
revision of CSTR #54 (§2.3), and more recently, McIlroy referred
to it as a 'living fossil'.]
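Migrating away from the legacy two-digit form could look like the following sketch. The size 36 and the word 'Headline' are arbitrary; the first line is shown only as the form to avoid, on the assumption (per the paragraph above) that it selects size 36 solely in compatibility mode.

```roff
.\" Legacy form: read as size 36 only in compatibility mode.
\s36Headline\s0
.\" Unambiguous replacements for a two-digit size.
\s(36Headline\s0
\s[36]Headline\s0
```

The bracketed form is a groff extension, so documents that must remain portable to other implementations may prefer the \s( form.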
Fractional point sizes cause one noteworthy incompatibility. In
AT&T troff the .ps request ignores scaling indicators and thus
'.ps 10u' sets the point size to 10 points, whereas in groff it
sets the point size to 10 scaled points. See subsection
'Fractional point sizes and new scaling indicators' above.
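The difference can be illustrated with a short sketch. It assumes groff's z scaling indicator for fractional sizes, as described in the subsection referenced above; the sizes themselves are arbitrary.

```roff
.\" 10 points in both implementations.
.ps 10
.\" groff only: 10.5 points, using the z scaling indicator.
.ps 10.5z
.\" Incompatible: 10 points in AT&T troff, 10 scaled points in groff.
.ps 10u
```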
The .bp request differs from AT&T troff: GNU troff does not
accept a scaling indicator on the argument, a page number; AT&T
troff (somewhat uselessly) does.
In AT&T troff the .pm request reports macro, string, and
diversion sizes in units of 128-byte blocks, and an argument
reduces the report to a sum of the above in the same units.
groff ignores any arguments and reports the sizes in bytes.
Unlike AT&T troff, groff does not ignore the .ss request if the
output is a terminal device; instead, the values of minimal
inter-word and additional inter-sentence spacing are rounded down
to the nearest multiple of 12.
In groff, there is a fundamental difference between unformatted
input characters, and formatted output characters (glyphs).
Everything that affects how a glyph is output is stored with the
glyph; once a glyph has been constructed, it is unaffected by any
subsequent requests that are executed, including the .bd, .cs,
.tkf, .tr, or .fp requests. Normally, glyphs are constructed
from input characters immediately before the glyph is added to
the current output line. Macros, diversions, and strings are
all, in fact, the same type of object; they contain lists of
input characters and glyphs in any combination. Special
characters can be both: before being added to the output, they
act as input entities; afterwards, they denote glyphs. A glyph
does not behave like an input character for the purposes of macro
processing; it does not inherit any of the special properties
that the input character from which it was constructed might have
had. Consider the following example.
.di x
\\\\
.br
.di
.x
It prints '\\' in groff; each pair of input backslashes is turned
into one output backslash and the resulting output backslashes
are not interpreted as escape characters when they are reread.
AT&T troff would interpret them as escape characters when they
were reread and would end up printing one '\'.
One correct way to obtain a printable backslash in most documents
is to use the \e escape sequence; this always prints a single
instance of the current escape character, regardless of whether
or not it is used in a diversion; it also works in both groff and
AT&T troff. (Naturally, if you've changed the escape character,
you need to prefix the 'e' with whatever it is, and you'll likely
get something other than a backslash in the output.)
The other correct way, appropriate in contexts independent of the
backslash's common use as a roff escape character (perhaps in
discussion of character sets or other programming languages), is
the character escape \(rs or \[rs], for 'reverse solidus', from
its name in the ECMA-6 (ISO/IEC 646) standard. [This character
escape is not portable to AT&T troff, but is to its lineal
descendant, Heirloom Doctools troff, as of its 060716 release
(July 2006).]
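Both ways of producing a backslash can be sketched in a short fragment. The sentences are illustrative only, and the fragment assumes the escape character has not been changed from its default.

```roff
.\" \e prints the current escape character (a backslash by default).
The \efB escape sequence selects bold.
.\" \[rs] prints a reverse solidus glyph; the bracketed form is
.\" a groff extension.
In C, a newline is written \[rs]n.
```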
To store an escape sequence in a diversion that is interpreted
when the diversion is reread, either use the traditional \!
transparent output facility, or, if this is unsuitable, the new
\? escape sequence. See subsection 'Escape sequences' above and
sections 'Diversions' and 'Gtroff Internals' in Groff: The GNU
Implementation of troff, the groff Texinfo manual.
In the somewhat pathological case where a diversion exists
containing a partially collected line and a partially collected
line at the top-level diversion has never existed, AT&T troff
will output the partially collected line at the end of input;
groff will not.
Intermediate output format
Its extensions notwithstanding, the groff intermediate output
format has some incompatibilities with that of AT&T troff, but
full compatibility is sought; problem reports and patches are
welcome. The following incompatibilities are known.
• The positioning after drawing polygons conflicts with the
AT&T troff practice.
• The intermediate output cannot be rescaled to other
devices as AT&T troff's could.