pax - portable archive interchange
Rationale
The pax utility was new for the ISO POSIX‐2:1993 standard. It
represents a peaceful compromise between advocates of the
historical tar and cpio utilities.
A fundamental difference between cpio and tar was in the way
directories were treated. The cpio utility did not treat
directories differently from other files, and to select a
directory and its contents required that each file in the
hierarchy be explicitly specified. For tar, a directory matched
every file in the file hierarchy it rooted.
The pax utility offers both interfaces; by default, directories
map into the file hierarchy they root. The -d option causes pax
to skip any file not explicitly referenced, as cpio historically
did. The tar-style behavior was chosen as the default because it
was believed that this was the more common usage and because tar
is the more commonly available interface, as it was historically
provided on both System V and BSD implementations.
The data interchange format specification in this volume of
POSIX.1‐2017 requires that processes with ``appropriate
privileges'' shall always restore the ownership and permissions
of extracted files exactly as archived. If viewed from the
historic equivalence between superuser and ``appropriate
privileges'', there are two problems with this requirement.
First, users running as superusers may unknowingly set dangerous
permissions on extracted files. Second, it is needlessly
limiting, in that superusers cannot extract files and own them as
superuser unless the archive was created by the superuser. (It
should be noted that restoration of ownerships and permissions
for the superuser, by default, is historical practice in cpio,
but not in tar.) In order to avoid these two problems, the pax
specification has an additional ``privilege'' mechanism, the -p
option. Only a pax invocation with the privileges needed, and
which has the -p option set using the e specification character,
has appropriate privileges to restore full ownership and
permission information.
Note also that this volume of POSIX.1‐2017 requires that the file
ownership and access permissions shall be set, on extraction, in
the same fashion as the creat() function when provided with the
mode stored in the archive. This means that the file creation
mask of the user is applied to the file permissions.
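The masking rule above is the usual mode & ~umask computation. A minimal Python sketch, assuming only the nine rwx permission bits matter; extracted_mode and current_umask are hypothetical helper names, not part of pax:

```python
import os


def extracted_mode(archived_mode: int, umask: int) -> int:
    """Permission bits left after creat()-style masking (sketch).

    Only the nine rwx permission bits are modelled; set-user-ID,
    set-group-ID and sticky bits are ignored here.
    """
    perms = archived_mode & 0o777      # drop file-type and special bits
    # creat(path, mode) clears every bit that is set in the umask.
    return perms & ~umask & 0o777


def current_umask() -> int:
    """Read the process umask; os.umask only reports it by replacing it."""
    old = os.umask(0)
    os.umask(old)                      # restore immediately (not thread-safe)
    return old
```

For example, an archived mode of 0666 extracted under a umask of 022 yields 0644, and 0777 under 077 yields 0700.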
Users should note that directories may be created by pax while
extracting files with permissions that are different from those
that existed at the time the archive was created. When extracting
sensitive information into a directory hierarchy that no longer
exists, users are encouraged to set their file creation mask
appropriately to protect these files during extraction.
The table of contents output is written to standard output to
facilitate pipeline processing.
An early proposal had hard links displaying for all pathnames.
This was removed because it complicates the output of the case
where -v is not specified and does not match historical cpio
usage. The hard-link information is available in the -v display.
The description of the -l option allows implementations to make
hard links to symbolic links. Earlier versions of this standard
did not specify any way to create a hard link to a symbolic link,
but many implementations provided this capability as an
extension. If there are hard links to symbolic links when an
archive is created, the implementation is required to archive the
hard link in the archive (unless -H or -L is specified). In both
read and copy modes, implementations supporting hard links to
symbolic links should use them when appropriate.
The archive formats inherited from the POSIX.1‐1990 standard have
certain restrictions that have been brought along from historical
usage. For example, there are restrictions on the length of
pathnames stored in the archive. When pax is used in copy (-rw)
mode (copying directory hierarchies), the ability to use
extensions from the -x pax format overcomes these restrictions.
The default blocksize value of 5120 bytes for cpio was selected
because it is one of the standard block-size values for cpio, set
when the -B option is specified. (The other default block-size
value for cpio is 512 bytes, and this was considered to be too
small.) The default block value of 10240 bytes for tar was
selected because that is the standard block-size value for BSD
tar. The maximum block size of 32256 bytes (2^15 - 512 bytes) is
the largest multiple of 512 bytes that fits into a signed 16-bit
tape controller transfer register. There are known limitations in
some historical systems that would prevent larger blocks from
being accepted. Historical values were chosen to improve
compatibility with historical scripts using dd or similar
utilities to manipulate archives. Also, default block sizes for
any file type other than character special files have been deleted
from this volume of POSIX.1‐2017 as unimportant and not likely to
affect the structure of the resulting archive.
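The block-size arithmetic above can be checked mechanically. A small Python sketch; the constant names and valid_blocksize are hypothetical, and treating only multiples of 512 as valid is an assumption drawn from the defaults discussed above rather than a stated pax rule:

```python
RECORD = 512            # historical tape record unit
CPIO_DEFAULT = 5120     # cpio -B block size (10 records)
TAR_DEFAULT = 10240     # BSD tar block size (20 records)

# A signed 16-bit transfer register can count at most 2**15 - 1 bytes.
SIGNED_16_MAX = 2**15 - 1

# Largest multiple of 512 that still fits in that register.
MAX_BLOCK = (SIGNED_16_MAX // RECORD) * RECORD


def valid_blocksize(n: int) -> bool:
    """Hypothetical check: a positive multiple of 512, at most MAX_BLOCK."""
    return 0 < n <= MAX_BLOCK and n % RECORD == 0
```

Since 32767 // 512 is 63, MAX_BLOCK works out to 63 * 512 = 32256, which is 2^15 - 512.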
Implementations are permitted to modify the block-size value
based on the archive format or the device to which the archive is
being written. This is to provide implementations with the
opportunity to take advantage of special types of devices, and it
should not be used without a great deal of consideration as it
almost certainly decreases archive portability.
The intended use of the -n option was to permit extraction of one
or more files from the archive without processing the entire
archive. This was viewed by the standard developers as offering
significant performance advantages over historical
implementations. The -n option in early proposals had three
effects; the first was to cause special characters in patterns to
not be treated specially. The second was to cause only the first
file that matched a pattern to be extracted. The third was to
cause pax to write a diagnostic message to standard error when no
file was found matching a specified pattern. Only the second
behavior is retained by this volume of POSIX.1‐2017, for many
reasons. First, it is in general not acceptable for a single
option to have multiple effects. Second, the ability to make
pattern matching characters act as normal characters is useful
for parts of pax other than file extraction. Third, a finer
degree of control over the special characters is useful because
users may wish to normalize only a single special character in a
single filename. Fourth, given a more general escape mechanism,
the previous behavior of the -n option can be easily obtained
using the -s option or a sed script. Finally, writing a
diagnostic message when a pattern specified by the user is
unmatched by any file is useful behavior in all cases.
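Pattern selection with -n, the per-pattern diagnostic, and single-character escaping can be sketched with Python's fnmatch module. This is an approximation, not pax's matcher: fnmatchcase follows Python's rules rather than POSIX Pattern Matching Notation (for example, it does not treat a backslash as an escape, so the bracket form [*] is used to match a literal asterisk), select_members is a hypothetical name, and the special case of a directory matching its whole hierarchy is ignored.

```python
from fnmatch import fnmatchcase


def select_members(names, patterns, first_only=False):
    """Select archive members by pattern operands (sketch).

    names:      member pathnames in archive order.
    patterns:   pattern operands; with none, every member is selected.
    first_only: emulates -n, so each pattern selects at most one member.
    Returns (selected names, patterns that matched nothing); the second
    list is what a diagnostic message would report.
    """
    if not patterns:
        return list(names), []
    selected, matched = [], set()
    for name in names:
        for pat in patterns:
            if first_only and pat in matched:
                continue  # -n: this pattern has already selected a member
            if fnmatchcase(name, pat):
                selected.append(name)
                matched.add(pat)
                break
    unmatched = [p for p in patterns if p not in matched]
    return selected, unmatched
```

With members a*b, axb, doc/x.txt and doc/y.txt, the pattern a[*]b selects only a*b, and doc/*.txt selects both text files normally but only doc/x.txt when first_only is set.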
In this version, the -n option was removed from the copy mode
synopsis of pax; it is inapplicable because there are no pattern
operands specified in this mode.
POSIX.1‐2008 describes another method for copying subtrees, as
part of the cp utility. Both methods are historical practice: cp
provides a simpler, more intuitive interface, while pax offers a
finer granularity of control. Each provides functionality that the
other lacks; in particular, pax maintains the hard-link structure
of the hierarchy while cp does not. It is the intention of the
standard developers that the
results be similar (using appropriate option combinations in both
utilities). The results are not required to be identical; there
seemed insufficient gain to applications to balance the
difficulty of implementations having to guarantee that the
results would be exactly identical.
A single archive may span more than one file. It is suggested
that implementations provide informative messages to the user on
standard error whenever the archive file is changed.
The -d option (do not create intermediate directories not listed
in the archive) found in early proposals was originally provided
as a complement to the historic -d option of cpio. It has been
deleted.
The -s option in early proposals specified a subset of the
substitution command from the ed utility. As there was no reason
for only a subset to be supported, the -s option is now
compatible with the current ed specification. Since the delimiter
can be any non-null character, the following usage with single
<space> characters is valid:
pax -s " foo bar " ...
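The delimiter rule can be sketched in Python. This is a minimal sketch, not pax's parser: parse_subst and apply_subst are hypothetical names, Python's re syntax stands in for POSIX basic regular expressions, ed's & in the replacement is not translated, and the trailing flags (such as g and p) are returned unparsed.

```python
import re


def parse_subst(replstr: str):
    """Split an ed-style -s argument such as /old/new/gp (sketch).

    The first character is the delimiter and may be any non-null
    character, including <space>. A backslash before the delimiter makes
    it a literal character. Returns (old, new, flags).
    """
    if not replstr or replstr[0] == "\0":
        raise ValueError("missing or null delimiter")
    delim = replstr[0]
    fields, cur, i = [], [], 1
    while i < len(replstr) and len(fields) < 2:
        ch = replstr[i]
        if ch == "\\" and i + 1 < len(replstr) and replstr[i + 1] == delim:
            # Escaped delimiter: escape it for the regular expression,
            # copy it verbatim into the replacement.
            cur.append(re.escape(delim) if not fields else delim)
            i += 2
            continue
        if ch == delim:
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    if len(fields) < 2:
        raise ValueError("unterminated substitution: " + repr(replstr))
    return fields[0], fields[1], replstr[i:]


def apply_subst(path: str, old: str, new: str, flags: str) -> str:
    """Apply one substitution; 'g' replaces every match, else only the first."""
    return re.sub(old, new, path, count=0 if "g" in flags else 1)
```

For the usage shown above, parse_subst(" foo bar ") returns ("foo", "bar", "").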
The -t description is worded so as to note that this may cause
the access time update caused by some other activity (which
occurs while the file is being read) to be overwritten.
The default behavior of pax with regard to file modification
times is the same as historical implementations of tar. It is
not the historical behavior of cpio.
Because the -i option uses /dev/tty, utilities without a
controlling terminal are not able to use this option.
The -y option, found in early proposals, has been deleted because
a line containing a single <period> for the -i option has
equivalent functionality. The special lines for the -i option (a
single <period> and the empty line) are historical practice in
cpio.
In early drafts, a -e charmap option was included to increase
portability of files between systems using different coded
character sets. This option was omitted because it was apparent
that consensus could not be formed for it. In this version, the
use of UTF‐8 should be an adequate substitute.
The ISO POSIX‐2:1993 standard and ISO POSIX‐1 standard
requirements for pax, however, made it very difficult to create a
single archive containing files created using extended characters
provided by different locales. This version adds the hdrcharset
keyword to make it possible to archive files in these cases
without dropping files due to translation errors.
Translating filenames and other attributes from a locale's
encoding to UTF‐8 and then back again can lose information, as
the resulting filename might not be byte-for-byte equivalent to
the original. To avoid this problem, users can specify the
-o hdrcharset=binary option, which will cause the resulting archive
to use binary format for all names and attributes. Such archives
are not portable among hosts that use different native encodings
(e.g., EBCDIC versus ASCII-based encodings), but they will allow
interchange among the vast majority of POSIX file systems in
practical use. Also, the -o hdrcharset=binary option will cause
pax in copy mode to behave more like other standard utilities
such as cp.
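The information loss can be reproduced in a few lines. A hedged sketch: via_utf8 and via_binary are hypothetical helpers, and a filename whose bytes are invalid in the locale's codeset (Latin-1 bytes on a UTF-8 system) stands in for the general problem.

```python
def via_utf8(name: bytes, locale_codec: str) -> bytes:
    """Round-trip a filename: locale -> UTF-8 header value -> locale.

    Bytes that are invalid in the locale's codeset cannot be converted,
    so they are replaced, and the restored name differs from the original.
    """
    text = name.decode(locale_codec, errors="replace")  # bad bytes -> U+FFFD
    header_value = text.encode("utf-8")                 # what the archive stores
    return header_value.decode("utf-8").encode(locale_codec, errors="replace")


def via_binary(name: bytes) -> bytes:
    """hdrcharset=binary: the header carries the raw bytes unchanged."""
    return name
```

Under a UTF-8 locale, the Latin-1 name b"caf\xe9" comes back as something else, while the same name survives unchanged when the locale really is Latin-1 or when binary headers are used.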
If the values specified by the -o exthdr.name=value,
-o globexthdr.name=value, or by $TMPDIR (if -o globexthdr.name is
not specified) require a character encoding other than that
described in the ISO/IEC 646:1991 standard, a path extended
header record will have to be created for the file. If a
hdrcharset extended header record is active for such headers, it
will determine the codeset used for the value field in these
extended path header records. These path extended header records
always need to be created when writing an archive even if
hdrcharset=binary has been specified and would contain the same
(binary) data that appears in the ustar header record prefix and
name fields. (In other words, an extended header path record is
always required to be generated if the prefix or name fields
contain non-ASCII characters even when hdrcharset=binary is also
in effect for that file.)
The -k option was added to address international concerns about
the dangers involved in the character set transformations of -e
(if the target character set were different from the source, the
filenames might be transformed into names matching existing
files) and also was made more general to protect files
transferred between file systems with different {NAME_MAX} values
(truncating a filename on a smaller system might also
inadvertently overwrite existing files). As stated, it prevents
any overwriting, even if the target file is older than the
source. This version adds finer-grained control over this problem
by introducing -o invalid=option, specifically the UTF‐8 and
binary actions. (Note that an existing file is still subject to
overwriting in this case. The -k option closes that loophole.)
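The {NAME_MAX} hazard and the effect of -k can be illustrated with a small planner. It is a sketch under stated assumptions: plan_extraction is a hypothetical helper, names are treated as single ASCII path components (so characters equal bytes), and plain truncation stands in for whatever a real file system or implementation would do with an over-long name.

```python
def plan_extraction(names, name_max, existing, keep_existing):
    """Decide what extracting each member would do (sketch).

    names:         member filenames (single components) in archive order.
    name_max:      the target file system's {NAME_MAX}.
    existing:      names already present in the target directory.
    keep_existing: emulates -k, never overwriting an existing file.
    Returns a list of (member, target, action) tuples.
    """
    present = set(existing)
    plan = []
    for member in names:
        target = member[:name_max]       # naive truncation to {NAME_MAX}
        if target in present and keep_existing:
            plan.append((member, target, "skip"))
        else:
            action = "overwrite" if target in present else "create"
            plan.append((member, target, action))
            present.add(target)
    return plan
```

With {NAME_MAX} of 11, report_2023_q1 and report_2023_q2 both become report_2023, so the second silently overwrites the first unless keep_existing is set.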
Some of the file characteristics referenced in this volume of
POSIX.1‐2017 might not be supported by some archive formats. For
example, neither the tar nor cpio formats contain the file access
time. For this reason, the e specification character has been
provided, intended to cause all file characteristics specified in
the archive to be retained.
It is required that extracted directories, by default, have their
access and modification times and permissions set to the values
specified in the archive. This has obvious problems in that the
directories are almost certainly modified after being extracted
and that directory permissions may not permit file creation. One
possible solution is to create directories with the mode
specified in the archive, as modified by the umask of the user,
with sufficient permissions to allow file creation. After all
files have been extracted, pax would then reset the access and
modification times and permissions as necessary.
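The two-pass approach described above might look like the following Python sketch. extract_tree is hypothetical; it assumes the directory list names parents before children and that the supplied modes are already the final ones (umask handling is left to the caller).

```python
import os
import stat


def extract_tree(root, dirs, files):
    """Restore directories in two passes (sketch).

    dirs:  (relative path, final mode, atime, mtime) tuples, parents first.
    files: (relative path, data bytes) tuples placed inside those dirs.
    """
    # Pass 1: create each directory with owner rwx added, so files can be
    # created inside it whatever mode the archive recorded.
    for path, mode, _atime, _mtime in dirs:
        full = os.path.join(root, path)
        os.makedirs(full, exist_ok=True)
        os.chmod(full, mode | stat.S_IRWXU)
    for path, data in files:
        with open(os.path.join(root, path), "wb") as f:
            f.write(data)  # this updates the parent directory's mtime
    # Pass 2: deepest directories first. A parent's restored mode may drop
    # search permission, which would make its children unreachable.
    for path, mode, atime, mtime in sorted(
            dirs, key=lambda d: d[0].count("/"), reverse=True):
        full = os.path.join(root, path)
        os.chmod(full, mode)
        os.utime(full, (atime, mtime))  # undo the mtime changes from pass 1
```

A directory archived as 0555 can thus still receive its files, and ends up with the archived mode and times once extraction is complete.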
The list-mode formatting description borrows heavily from the one
defined by the printf utility. However, since there is no
separate operand list to get conversion arguments, the format was
extended to allow specifying the name of the conversion argument
as part of the conversion specification.
The T conversion specifier allows time fields to be displayed in
any of the date formats. Unlike the ls utility, pax does not
adjust the format when the date is less than six months in the
past. This makes parsing the output more predictable.
The D conversion specifier, used as %-8(size)D, displays the
major/minor device numbers or the file size, as ls does.
The L conversion specifier handles the ls display for symbolic
links.
Conversion specifiers were added to generate the existing field
types known from ls output.
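Because the named argument sits inside the conversion specification, a formatter has to parse it out before applying ordinary printf rules. A rough Python sketch, assuming the %[flags][width][.precision](keyword)conversion order seen in %-8(size)D; list_format is hypothetical, every conversion except %% is assumed to carry a keyword, and pax-specific specifiers such as T, D, and L are not implemented. Python's own % operator expects the mapping key before the flags (%(size)-8d), which is why the sketch rebuilds each specification.

```python
import re

# %[flags][width][.precision](keyword)conversion, e.g. %-8(size)d
_SPEC = re.compile(r"%([-+ #0]*)(\d*)(?:\.(\d+))?(?:\(([^)]*)\))?([a-zA-Z%])")


def list_format(fmt: str, member: dict) -> str:
    """Expand a list-mode style format for one archive member (sketch)."""
    def expand(m):
        flags, width, prec, keyword, conv = m.groups()
        if conv == "%":
            return "%"
        spec = "%" + flags + width + ("." + prec if prec else "") + conv
        # Plain printf-style formatting of the named member field.
        return spec % (member[keyword],)
    return _SPEC.sub(expand, fmt)
```

For instance, list_format("%-8(size)d|%(name)s", {"size": 42, "name": "a"}) left-justifies 42 in eight columns before the bar.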
pax Interchange Format
The new POSIX data interchange format was developed primarily to
satisfy international concerns that the ustar and cpio formats
did not provide for file, user, and group names encoded in
characters outside a subset of the ISO/IEC 646:1991 standard. The
standard developers realized that this new POSIX data interchange
format should be very extensible because there were other
requirements they foresaw in the near future:
* Support international character encodings and locale
information
* Support security information (ACLs, and so on)
* Support future file types, such as realtime or contiguous
files
* Include data areas for implementation use
* Support systems with words larger than 32 bits and timers
with subsecond granularity
The following were not goals for this format because these are
better handled by separate utilities or are inappropriate for a
portable format:
* Encryption
* Compression
* Data translation between locales and codesets
* inode storage
The format chosen to support the goals is an extension of the
ustar format. Of the two formats previously available, only the
ustar format was selected for extensions because:
* It was easier to extend in an upwards-compatible way. It
offered version flags and header block type fields with room
for future standardization. The cpio format, while possessing
a more flexible file naming methodology, could not be
extended without breaking some theoretical implementation or
using a dummy filename that could be a legitimate filename.
* Industry experience since the original ``tar wars'' fought in
developing the ISO POSIX‐1 standard has clearly been in favor
of the ustar format, which is generally the default output
format selected for pax implementations on new systems.
The new format was designed with one additional goal in mind:
reasonable behavior when an older tar or pax utility happened to
read an archive. Since the POSIX.1‐1990 standard mandated that a
``format-reading utility'' had to treat unrecognized typeflag
values as regular files, this allowed the format to include all
the extended information in a pseudo-regular file that preceded
each real file. An option is given that allows the archive
creator to set up reasonable names for these files on the older
systems. Also, the normative text suggests that reasonable file
access values be used for this ustar header block. Making these
header files inaccessible for convenient reading and deleting
would not be reasonable. File permissions of 600 or 700 are
suggested.
The ustar typeflag field was used to accommodate the additional
functionality of the new format rather than magic or version
because the POSIX.1‐1990 standard (and, by reference, the
previous version of pax) mandated the behavior of the
format-reading utility when it encountered an unknown typeflag,
but was silent about the other two fields.
Early proposals for the first version of this standard contained
a proposed archive format that was based on compatibility with
the standard for tape files (ISO 1001, similar to the format used
historically on many mainframes and minicomputers). This format
was overly complex and required considerable overhead in volume
and header records. Furthermore, the standard developers felt
that it would not be acceptable to the community of POSIX
developers, so it was later changed to be a format more closely
related to historical practice on POSIX systems.
The prefix and name split of pathnames in ustar was replaced by
the single path extended header record for simplicity.
The concept of a global extended header (typeflag g) was
controversial. If this were applied to an archive being recorded
on magnetic tape, a few unreadable blocks at the beginning of the
tape could be a serious problem; a utility attempting to extract
as many files as possible from a damaged archive could lose a
large percentage of file header information in this case.
However, if the archive were on a reliable medium, such as a CD‐
ROM, the global extended header offers considerable potential
size reductions by eliminating redundant information. Thus, the
text warns against using the global method for unreliable media
and provides a method for implanting global information in the
extended header for each file, rather than in the typeflag g
records.
No facility for data translation or filtering on a per-file basis
is included because the standard developers could not invent an
interface that would allow this in an efficient manner. If a
filter, such as encryption or compression, is to be applied to
all the files, it is more efficient to apply the filter to the
entire archive as a single file. The standard developers
considered interfaces that would invoke a shell script for each
file going into or out of the archive, but the system overhead in
this approach was considered to be too high.
One such approach would be to have filter= records that give a
pathname for an executable. When the program is invoked, the file
and archive would be open for standard input/output and all the
header fields would be available as environment variables or
command-line arguments. The standard developers did discuss such
schemes, but they were omitted from POSIX.1‐2008 due to concerns
about excessive overhead. Also, the program itself would need to
be in the archive if it were to be used portably.
There is currently no portable means of identifying the character
set(s) used for a file in the file system. Therefore, pax has not
been given a mechanism to generate charset records automatically.
The only portable means of doing this is for the user to write
the archive using the -o charset=string command-line option. This
assumes that all of the files in the archive use the same
encoding. The ``implementation-defined'' text is included to
allow for a system that can identify the encodings used for each
of its files.
The table of standards that accompanies the charset record
description is acknowledged to be very limited. Only a limited
number of character set standards is reasonable for maximal
interchange. Any character set is, of course, possible by prior
agreement. It was suggested that EBCDIC be listed, but it was
omitted because it is not defined by a formal standard. Formal
standards, and then only those with reasonably large followings,
can be included here, simply as a matter of practicality. The
<value>s represent names of officially registered character sets
in the format required by the ISO 2375:1985 standard.
The normal <comma> or <blank>-separated list rules are not
followed in the case of keyword options to allow ease of argument
parsing for getopts.
Further information on character encodings is in pax Archive
Character Set Encoding/Decoding.
The standard developers have reserved keyword name space for
vendor extensions. It is suggested that the format to be used is:
VENDOR.keyword
where VENDOR is the name of the vendor or organization in all
uppercase letters. It is further suggested that the keyword
following the <period> be named differently than any of the
standard keywords so that it could be used for future
standardization, if appropriate, by omitting the VENDOR prefix.
The <length> field in the extended header record was included to
make it simpler to step through the records, even if a record
contains an unknown format (to a particular pax) with complex
interactions of special characters. It also provides a minor
integrity checkpoint within the records to aid a program
attempting to recover files from a damaged archive.
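Because <length> counts its own digits, a writer has to solve a small fixed-point problem, while a reader can skip any record without understanding it. A hedged Python sketch of both sides; ext_record and parse_records are hypothetical names, and values are returned as bytes so that hdrcharset=binary data survives.

```python
def ext_record(keyword: str, value: str) -> bytes:
    """Build one record of the form "<length> <keyword>=<value>\\n".

    <length> is the decimal byte count of the whole record, including its
    own digits, so it is found by iterating until it stops changing.
    """
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    while True:
        candidate = len(str(length)) + len(body)
        if candidate == length:
            return str(length).encode("ascii") + body
        length = candidate


def parse_records(data: bytes):
    """Step through records by <length> alone, returning (keyword, value)."""
    pos, records = 0, []
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[pos:pos + length]
        if not record.endswith(b"\n"):
            # The length doubles as a small integrity check.
            raise ValueError(f"corrupt record at offset {pos}")
        keyword, _, value = record[space - pos + 1:-1].partition(b"=")
        records.append((keyword.decode("utf-8"), value))
        pos += length
    return records
```

For example, ext_record("path", "foo") produces b"12 path=foo\n", and a value containing "=" or a newline is still recovered intact because the reader never scans for terminators.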
There are no extended header versions of the devmajor and
devminor fields because the unspecified format ustar header field
should be sufficient. If they are not, vendor-specific extended
keywords (such as VENDOR.devmajor) should be used.
Device and i-number labeling of files was not adopted from cpio;
files are interchanged strictly on a symbolic name basis, as in
ustar.
Just as with the ustar format descriptions, the new format makes
no special arrangements for multi-volume archives. Each of the
pax archive types is assumed to be inside a single POSIX file and
splitting that file over multiple volumes (diskettes, tape
cartridges, and so on), processing their labels, and mounting
each in the proper sequence are considered to be implementation
details that cannot be described portably.
The pax format is intended for interchange, not only for backup
on a single (family of) systems. It is not as densely packed as
might be possible for backup:
* It contains information as coded characters that could be
coded in binary.
* It identifies extended records with name fields that could be
omitted in favor of a fixed-field layout.
* It translates names into a portable character set and
identifies locale-related information, both of which are
probably unnecessary for backup.
The requirements on restoring from an archive are slightly
different from the historical wording, allowing for non-monolithic
privilege so that as much as possible can be restored. In
particular, attributes such as ``high performance file'' might be
broadly but not universally granted while set-user-ID or chown()
might be much more restricted. There is no implication in
POSIX.1‐2008 that the security information be honored after it is
restored to the file hierarchy, in spite of what might be
improperly inferred by the silence on that topic. That is a topic
for another standard.
Links are recorded in the fashion described here because a link
can be to any file type. It is desirable in general to be able to
restore part of an archive selectively and restore all of those
files completely. If the data is not associated with each link,
it is not possible to do this. However, the data associated with
a file can be large, and when selective restoration is not
needed, this can be a significant burden. The archive is
structured so that files that have no associated data can always
be restored by the name of any of their links, and the
user may choose whether data is recorded with each instance of a
file that contains data. The format permits mixing of both types
of links in a single archive; this can be done for special needs,
and pax is expected to interpret such archives on input properly,
despite the fact that there is no pax option that would force
this mixed case on output. (When -o linkdata is used, the output
must contain the duplicate data, but the implementation is free
to include it or omit it when -o linkdata is not used.)
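The choice between recording data once or with every link can be sketched by grouping paths on (st_dev, st_ino). This is an illustration under assumptions, not pax's algorithm: plan_links is hypothetical, and its "link" action models an implementation that omits duplicate data when -o linkdata is not used, which the text permits but does not require.

```python
import os


def plan_links(paths, linkdata=False):
    """Decide how each path is archived (sketch).

    Paths sharing (st_dev, st_ino) are hard links to one another. The
    first occurrence is archived with its data; later ones become links
    to it, carrying a duplicate copy of the data when linkdata is true.
    Returns (path, action, link target or None) tuples.
    """
    first_seen = {}
    plan = []
    for p in paths:
        st = os.lstat(p)
        key = (st.st_dev, st.st_ino)
        if st.st_nlink > 1 and key in first_seen:
            action = "link+data" if linkdata else "link"
            plan.append((p, action, first_seen[key]))
        else:
            plan.append((p, "data", None))
            first_seen[key] = p
    return plan
```

Restoring only the second name selectively works in the "link+data" case, but in the "link" case it needs the first member's data as well.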
The time values are included as extended header records for those
implementations needing more than the eleven octal digits allowed
by the ustar format. Portable file timestamps cannot be negative.
If pax encounters a file with a negative timestamp in copy or
write mode, it can reject the file, substitute a non-negative
timestamp, or generate a non-portable timestamp with a leading
'-'. Even though some implementations can support finer
file-time granularities than seconds, the normative text requires
support only for seconds since the Epoch because the ISO POSIX‐1
standard states them that way. The ustar format includes only
mtime; the new format adds atime and ctime for symmetry. The
atime access time restored to the file system will be affected by
the -p a and -p e options. The ctime creation time (actually
inode modification time) is described with appropriate privileges
so that it can be ignored when writing to the file system. POSIX
does not provide a portable means to change file creation time.
Nothing is intended to prevent a non-portable implementation of
pax from restoring the value.
The gid, size, and uid extended header records were included to
allow expansion beyond the sizes specified in the regular tar
header. New file system architectures are emerging that will
exhaust the 12-digit size field. There are probably not many
systems requiring more than 8 digits for user and group IDs, but
the extended header values were included for completeness,
allowing overrides for all of the decimal values in the tar
header.
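The octal field limits behind these records can be computed directly. A sketch, assuming each numeric ustar field reserves its last byte for a NUL or <space> terminator, so an 8-byte field holds 7 octal digits and a 12-byte field holds 11; FIELD_WIDTHS and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Byte widths of some numeric ustar header fields.
FIELD_WIDTHS = {"uid": 8, "gid": 8, "size": 12, "mtime": 12}


def field_max(name: str) -> int:
    """Largest value the octal field holds (width - 1 digits)."""
    return 8 ** (FIELD_WIDTHS[name] - 1) - 1


def needs_ext_record(name: str, value: int) -> bool:
    """True if the value must go in a pax extended header record instead."""
    return value < 0 or value > field_max(name)


def mtime_overflow_date() -> datetime:
    """Last instant the ustar mtime field can represent."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=field_max("mtime"))
```

Under that assumption the size field tops out one byte short of 8 GiB, user and group IDs stop at 2097151, and mtime runs out in the year 2242, which is why larger (or negative) values go into extended header records.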
The standard developers intended to describe the effective
results of pax with regard to file ownerships and permissions;
implementations are not restricted in timing or sequencing the
restoration of such, provided the results are as specified.
Much of the text describing the extended headers refers to use in
``write or copy modes''. The copy mode references are due to the
normative text: ``The effect of the copy shall be as if the
copied files were written to an archive file and then
subsequently extracted ...''. There is certainly no way to test
whether pax is actually generating the extended headers in copy
mode, but the effects must be as if it had.
pax Archive Character Set Encoding/Decoding
There is a need to exchange archives of files between systems of
different native codesets. Filenames, group names, and user names
must be preserved to the fullest extent possible when an archive
is read on the receiving platform. Translation of the contents of
files is not within the scope of the pax utility.
There will also be the need to represent characters that are not
available on the receiving platform. These unsupported characters
cannot be automatically folded to the local set of characters due
to the chance of collisions. This could result in overwriting
previously extracted files from the archive or pre-existing files
on the system.
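A quick demonstration of why automatic folding is unsafe: fold_to_ascii is a hypothetical, deliberately naive folding (Unicode compatibility decomposition, then dropping every non-ASCII character), used only to show how distinct archive names collapse onto one local name.

```python
import unicodedata


def fold_to_ascii(name: str) -> str:
    """Hypothetical lossy folding: strip accents, drop non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def folding_collisions(names):
    """Group archive names that would fold onto the same local name."""
    groups = {}
    for n in names:
        groups.setdefault(fold_to_ascii(n), []).append(n)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Here "résumé" and "resume" collide, and any two all-CJK names would both fold to the empty string.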
For these reasons, the codeset used to represent characters
within the extended header records of the pax archive must be
sufficiently rich to handle all commonly used character sets. The
fields requiring translation include, at a minimum, filenames,
user names, group names, and link pathnames. Implementations may
wish to have localized extended keywords that use non-portable
characters.
The standard developers considered the following options:
* The archive creator specifies the well-defined name of the
source codeset. The receiver must then recognize the codeset
name and perform the appropriate translations to the
destination codeset.
* The archive creator includes within the archive the character
mapping table for the source codeset used to encode extended
header records. The receiver must then read the character
mapping table and perform the appropriate translations to the
destination codeset.
* The archive creator translates the extended header records in
the source codeset into a canonical form. The receiver must
then perform the appropriate translations to the destination
codeset.
The approach that incorporates the name of the source codeset
poses the problem of codeset name registration, and makes the
archive useless to pax archive decoders that do not recognize
that codeset.
Because parts of an archive may be corrupted, the standard
developers felt that including the character map of the source
codeset was too fragile. The loss of this one key component could
result in making the entire archive useless. (The difference
between this and the global extended header decision was that the
latter has a workaround—duplicating extended header records on
unreliable media—but this would be too burdensome for large
character set maps.)
Both of the above approaches also put an undue burden on the pax
archive receiver to handle the cross-product of all source and
destination codesets.
To simplify the translation from the source codeset to the
canonical form and from the canonical form to the destination
codeset, the standard developers decided that the internal
representation should be a stateless encoding. A stateless
encoding is one where each codepoint has the same meaning,
without regard to the decoder being in a specific state. An
example of a stateful encoding would be the Japanese Shift-JIS;
an example of a stateless encoding would be the ISO/IEC 646:1991
standard (equivalent to 7-bit ASCII).
For these reasons, the standard developers decided to adopt a
canonical format for the representation of file information
strings. The obvious, well-endorsed candidate is the
ISO/IEC 10646‐1:2000 standard (based in part on Unicode), which
can be used to represent the characters of virtually all
standardized character sets. The standard developers initially
agreed upon using UCS2 (16-bit Unicode) as the internal
representation. This repertoire of characters provides a
sufficiently rich set to represent all commonly used codesets.
However, the standard developers found that the 16-bit Unicode
representation had some problems. It forced the issue of
standardizing byte ordering. The 2-byte length of each character
made the extended header records twice as long for the case of
strings coded entirely from historical 7-bit ASCII. For these
reasons, the standard developers chose the UTF‐8 defined in the
ISO/IEC 10646‐1:2000 standard. This multi-byte representation
encodes UCS2 or UCS4 characters reliably and deterministically,
eliminating the need for a canonical byte ordering. In addition,
NUL octets and other characters possibly confusing to POSIX file
systems do not appear, except to represent themselves. It was
realized that certain national codesets take up more space after
the encoding, due to their placement within the UCS range; it was
felt that the usefulness of the encoding of the names outweighs
the disadvantage of size increase for file, user, and group
names.
The encoding of UTF‐8 is as follows:
UCS4 Hex Encoding     UTF-8 Binary Encoding
00000000-0000007F     0xxxxxxx
00000080-000007FF     110xxxxx 10xxxxxx
00000800-0000FFFF     1110xxxx 10xxxxxx 10xxxxxx
00010000-001FFFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
00200000-03FFFFFF     111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
04000000-7FFFFFFF     1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
where each 'x' represents a bit value from the character being
translated.
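The table translates directly into code. A sketch in Python; utf8_encode is a hypothetical name. The five- and six-byte forms come from the original ISO/IEC 10646 definition shown above; current UTF-8 (RFC 3629) stops at U+10FFFF and four bytes, so only that range can be checked against Python's built-in codec.

```python
# (highest code point, encoded length, lead-byte marker), one row per table line
UTF8_TABLE = [
    (0x0000007F, 1, 0b00000000),
    (0x000007FF, 2, 0b11000000),
    (0x0000FFFF, 3, 0b11100000),
    (0x001FFFFF, 4, 0b11110000),
    (0x03FFFFFF, 5, 0b11111000),
    (0x7FFFFFFF, 6, 0b11111100),
]


def utf8_encode(cp: int) -> bytes:
    """Encode one UCS4 code point following the table above."""
    if cp < 0:
        raise ValueError("negative code point")
    for upper, nbytes, marker in UTF8_TABLE:
        if cp <= upper:
            break
    else:
        raise ValueError("code point beyond 7FFFFFFF")
    out = []
    for _ in range(nbytes - 1):
        out.append(0b10000000 | (cp & 0b00111111))  # continuation 10xxxxxx
        cp >>= 6
    out.append(marker | cp)  # lead byte takes the remaining high bits
    return bytes(reversed(out))
```

For U+20AC, for example, the sketch emits the three-byte sequence E2 82 AC, matching row three of the table.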
ustar Interchange Format
The description of the ustar format reflects numerous
enhancements over pre-1988 versions of the historical tar
utility. The goal of these changes was not only to provide the
functional enhancements desired, but also to retain compatibility
between new and old versions. This compatibility has been
retained. Archives written using the old archive format are
compatible with the new format.
Implementors should be aware that the previous file format did
not include a mechanism for archiving directories. For this
reason, the convention of using a filename ending with <slash>
was adopted to denote a directory in the archive.
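A reader that accepts both old and new archives can combine the two conventions. This Python sketch assumes the ustar typeflag values '5' (directory) and '0' or NUL (regular file), which are defined in the ustar header description rather than in this rationale; the function name is hypothetical.

```python
def is_directory_member(name: bytes, typeflag: bytes) -> bool:
    """Return True if a tar member should be extracted as a directory."""
    if typeflag == b"5":  # explicit ustar directory type
        return True
    # Old-format archives had no directory type: a regular-file member
    # whose name ends in <slash> denotes a directory.
    return typeflag in (b"0", b"\0") and name.endswith(b"/")


print(is_directory_member(b"src/", b"0"))  # old-style directory entry
print(is_directory_member(b"src", b"5"))   # ustar directory entry
```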
The total size of the name and prefix fields has been set to
meet the minimum requirements for {PATH_MAX}. If a pathname fits
within the name field, it is recommended that the pathname be
stored there without using the prefix field. Although the name
field is known to be too small to contain {PATH_MAX} characters,
its size was not changed in this version of the archive file
format, in order to retain backward compatibility; the prefix
field was introduced instead. Similarly, because of the earlier
version of the format, there is no way to remove the restriction
that limits the linkname field to the size of the name field.
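The following Python sketch illustrates the recommendation above. It assumes the ustar field sizes of 100 octets for name and 155 octets for prefix (from the ustar header description, not restated here) and the reading rule that a non-empty prefix is joined to name with a <slash>. The helper names are hypothetical, and the sketch does not address the separate linkname limit, which has no prefix counterpart.

```python
NAME_LEN = 100    # size of the ustar name field, in octets
PREFIX_LEN = 155  # size of the ustar prefix field, in octets


def split_ustar_path(path: bytes) -> tuple[bytes, bytes]:
    """Return (prefix, name) for storing path in a ustar header."""
    if len(path) <= NAME_LEN:
        return b"", path  # recommended: use the name field alone
    # Scan left to right and take the first <slash> at which both halves
    # fit; this keeps the prefix as short as possible.
    for i, octet in enumerate(path):
        name_len = len(path) - i - 1
        if octet == ord("/") and 0 < i <= PREFIX_LEN and 0 < name_len <= NAME_LEN:
            return path[:i], path[i + 1:]
    raise ValueError("pathname does not fit the ustar name and prefix fields")


def join_ustar_path(prefix: bytes, name: bytes) -> bytes:
    """Rebuild the pathname as a reader would."""
    return prefix + b"/" + name if prefix else name
```

Under these assumptions, a 256-octet pathname with a <slash> at offset 155 is the longest that can be stored, while a 101-octet pathname containing no <slash> cannot be stored at all.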
The size field is required to be meaningful in all implementation
extensions, although it can be zero. This ensures that the data
blocks can always be properly counted, even by a reader that does
not understand the extension.
If device special files need to be represented in a way that the
standard format cannot express, it is suggested that one of the
extension types (A-Z) be used, that the additional information
for the special file be represented as data, and that this data
be reflected in the size field.
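Because the size field is always meaningful, a reader can step over members it does not understand. The following Python sketch walks a ustar stream on that basis. The header offsets it uses (name at 0, size at 124, typeflag at 156, prefix at 345) and the rule that types 1 through 6 carry no data records come from the ustar header description; the function names are hypothetical, and checksum and magic verification are omitted.

```python
BLOCK = 512
# Typeflags for which ustar stores no data records (links, special
# files, directories, FIFOs); every other type is skipped using size.
NO_DATA_TYPES = {b"1", b"2", b"3", b"4", b"5", b"6"}


def octal_field(field: bytes) -> int:
    """Parse a NUL- or space-terminated octal number field."""
    text = field.split(b"\0", 1)[0].strip().decode("ascii")
    return int(text, 8) if text else 0


def walk_ustar(stream):
    """Yield (pathname, typeflag, size) for each member of a ustar stream.

    Data is skipped using the size field alone, so a member with an
    unrecognized typeflag (for example, an A-Z extension) is passed over
    correctly without being interpreted.
    """
    while True:
        header = stream.read(BLOCK)
        # Simplification: stop at the first all-zero block (an archive
        # ends with two) or at a truncated read.
        if len(header) < BLOCK or header == bytes(BLOCK):
            return
        name = header[0:100].split(b"\0", 1)[0]
        prefix = header[345:500].split(b"\0", 1)[0]
        if prefix:
            name = prefix + b"/" + name
        typeflag = header[156:157]
        size = octal_field(header[124:136])
        yield name, typeflag, size
        if typeflag not in NO_DATA_TYPES:
            # Data occupies whole 512-octet blocks.
            stream.read((size + BLOCK - 1) // BLOCK * BLOCK)
```

In practice, Python's standard tarfile module performs this bookkeeping, together with checksum verification.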
When a special file is restored as ordinary data and its name
conflicts with an existing file, the utility need not detect the
conflict specially. If run as an ordinary user, pax should not be
able to overwrite the entries in, for example, /dev in any case
(whether or not the file is converted to another type). If run as
a privileged user, it should be able to do so, and it would be
considered a bug if it did not.
The same is true of ordinary data files and similarly named
special files; it is impossible to anticipate the needs of the
user (who could really intend to overwrite the file), so the
behavior should be predictable (and thus regular) and rely on the
protection system as required.
The value 7 in the typeflag field is intended to define how
contiguous files can be stored in a ustar archive. POSIX.1‐2008
does not require the contiguous file extension, but does define a
standard way of archiving such files so that all conforming
systems can interpret these file types in a meaningful and
consistent manner. On a system that does not support extended
file types, the pax utility should do the best it can with the
file and go on to the next.
The file protection modes are those conventionally used by the ls
utility. This is extended beyond the usage in the ISO POSIX‐2
standard to support the ``shared text'' or ``sticky'' bit. It is
intended that the conformance document should not document
anything beyond the existence of and support of such a mode.
Further extensions to these bits are expected, particularly ones
that overload the set-user-ID and set-group-ID flags.
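The mode conventions can be checked with Python's standard stat.filemode function, which renders mode bits the way ls -l does. In this sketch, ls_mode is a hypothetical helper; 01000 is the sticky bit (TSVTX in <tar.h>), and 04000 and 02000 are the set-user-ID and set-group-ID bits.

```python
import stat


def ls_mode(perm: int, file_type: int = stat.S_IFREG) -> str:
    """Render a file type plus permission bits (including 04000, 02000,
    and 01000) in the ten-character form used by ls -l."""
    return stat.filemode(file_type | (perm & 0o7777))


print(ls_mode(0o1777, stat.S_IFDIR))  # sticky, world-writable directory
print(ls_mode(0o1644))                # sticky bit without execute for others
print(ls_mode(0o6755))                # set-user-ID and set-group-ID program
```

The first call gives drwxrwxrwt, the familiar mode of a shared temporary directory; when the sticky bit is set without execute permission for others, ls shows an uppercase T instead.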
cpio Interchange Format
The reference to appropriate privileges in the cpio format refers
to an error on standard output; the ustar format does not make
comparable statements.
The model for this format was the historical System V cpio -c
data interchange format. This model documents the portable version
of the cpio format and not the binary version. It has the
POSIX.1‐2008, yet is extensible to transfer data types specific
to extensions beyond POSIX.1‐2008 (for example, contiguous
files). Because it describes existing practice, there is no
question of maintaining upwards-compatibility.
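In the portable version every header field is written as ASCII octal digits, so reading it involves no byte-order or word-size assumptions. The following Python sketch decodes one such header. The field order, the widths, and the magic value 070707 are taken from the cpio Interchange Format description of the pax specification rather than from the rationale above, and parse_cpio_header is a hypothetical name.

```python
# Fixed-width fields of the portable cpio header, each an octal number
# written in ASCII digits.
ODC_FIELDS = (
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
)
HEADER_LEN = sum(width for _, width in ODC_FIELDS)  # 76 octets
MAGIC = b"070707"


def parse_cpio_header(data: bytes) -> dict:
    """Decode one portable cpio header and the pathname that follows it."""
    if data[:6] != MAGIC:
        raise ValueError("not a portable cpio header")
    if len(data) < HEADER_LEN:
        raise ValueError("truncated cpio header")
    fields, pos = {}, 0
    for field, width in ODC_FIELDS:
        fields[field] = int(data[pos:pos + width], 8)
        pos += width
    # c_namesize counts the terminating NUL, so drop it from the name.
    name_end = HEADER_LEN + fields["c_namesize"]
    fields["name"] = data[HEADER_LEN:name_end].split(b"\0", 1)[0]
    fields["data_offset"] = name_end  # the file data starts here
    return fields
```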
cpio Header
There has been some concern that the size of the c_ino field of
the header is too small to handle those systems that have very
large inode numbers. However, the c_ino field in the header is
used strictly as a hard-link resolution mechanism for archives.
It is not necessarily the same value as the inode number of the
file in the location from which that file is extracted.
The name c_magic is based on historical usage.
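Since c_ino only serves to tie members together, an extractor can treat the pair of c_dev and c_ino as an opaque key within one archive. The following Python sketch plans hard links that way; keying on both c_dev and c_ino, and linking only when c_nlink exceeds 1, are assumptions of this sketch, and plan_hard_links is a hypothetical name.

```python
def plan_hard_links(members):
    """Decide which archive members to extract as hard links.

    members: iterable of (c_dev, c_ino, c_nlink, name) in archive order.
    Returns {name: earlier_name} for a member to link to an earlier one,
    or {name: None} for a member to extract as a new file.
    """
    first_seen = {}
    plan = {}
    for dev, ino, nlink, name in members:
        key = (dev, ino)  # identifies the file only within this archive
        if nlink > 1 and key in first_seen:
            plan[name] = first_seen[key]
        else:
            plan[name] = None
            if nlink > 1:
                first_seen[key] = name
    return plan
```

Nothing in the plan depends on the inode numbers of the extracting system; the archived values only need to be consistent within the archive.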
cpio Filename
For most historical implementations of the cpio utility,
{PATH_MAX} octets can be used to describe the pathname without
the addition of any other header fields (the NUL character would
be included in this count). {PATH_MAX} is the minimum value for
pathname size, documented as 256 bytes. However, an
implementation may use c_namesize to determine the exact length
of the pathname. With the current description of the <cpio.h>
header, this pathname size can be as large as the largest number
expressible in six octal digits (0777777, that is, 262143 octets).
Two values are documented under the c_mode field values to
provide for extensibility for known file types:
0110000
Reserved for contiguous files. The implementation may
treat the rest of the information for this archive entry
like a regular file. If this file type is undefined, the
implementation may create the file as a regular file.
This provides for extensibility of the cpio format while still
allowing old archives to be read. Files of an unknown type may be
read as ``regular files'' on some implementations. On a system
that does not support extended file types, the pax utility should
do the best it can with the file and go on to the next.
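A reader on a system without contiguous files might apply the fallback described above as in this Python sketch. The constants C_ISREG (0100000) and C_ISCTG (0110000) are the <cpio.h> values; the file-type mask 0170000 and the function name mode_to_extract are assumptions of the sketch.

```python
C_ISREG = 0o100000         # regular file, from <cpio.h>
C_ISCTG = 0o110000         # reserved for contiguous files, from <cpio.h>
FILE_TYPE_MASK = 0o170000  # assumption: the bits that hold the file type


def mode_to_extract(c_mode: int, supports_contiguous: bool = False) -> int:
    """Return the c_mode to use when extracting a cpio member.

    A contiguous file becomes a regular file with the same permission
    bits when the system cannot create contiguous files.
    """
    if (c_mode & FILE_TYPE_MASK) == C_ISCTG and not supports_contiguous:
        return C_ISREG | (c_mode & 0o7777)
    return c_mode
```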