   pax.1p    ( 1 )

portable archive interchange

Rationale

The pax utility was new for the ISO POSIX‐2:1993 standard. It represents a peaceful compromise between advocates of the historical tar and cpio utilities.

A fundamental difference between cpio and tar was in the way directories were treated. The cpio utility did not treat directories differently from other files, and to select a directory and its contents required that each file in the hierarchy be explicitly specified. For tar, a directory matched every file in the file hierarchy it rooted.

The pax utility offers both interfaces; by default, directories map into the file hierarchy they root. The -d option causes pax to skip any file not explicitly referenced, as cpio historically did. The tar-style behavior was chosen as the default because it was believed that this was the more common usage and because tar is the more commonly available interface, as it was historically provided on both System V and BSD implementations.

The data interchange format specification in this volume of POSIX.1‐2017 requires that processes with ``appropriate privileges'' shall always restore the ownership and permissions of extracted files exactly as archived. If viewed from the historic equivalence between superuser and ``appropriate privileges'', there are two problems with this requirement. First, users running as superusers may unknowingly set dangerous permissions on extracted files. Second, it is needlessly limiting, in that superusers cannot extract files and own them as superuser unless the archive was created by the superuser. (It should be noted that restoration of ownerships and permissions for the superuser, by default, is historical practice in cpio, but not in tar.) In order to avoid these two problems, the pax specification has an additional ``privilege'' mechanism, the -p option. Only a pax invocation with the privileges needed, and which has the -p option set using the e specification character, has appropriate privileges to restore full ownership and permission information.

Note also that this volume of POSIX.1‐2017 requires that the file ownership and access permissions shall be set, on extraction, in the same fashion as the creat() function when provided with the mode stored in the archive. This means that the file creation mask of the user is applied to the file permissions.
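As a rough illustration of the paragraph above, the following Python sketch applies a file creation mask to an archived mode in the way creat() would. The function name and the sample values are illustrative assumptions, not part of pax, and the sketch ignores the -p option, which can change this behavior.

```python
def extracted_mode(archived_mode: int, creation_mask: int) -> int:
    """Return the permission bits a file would receive when created
    with archived_mode under the given file creation mask."""
    # creat() clears every permission bit that is set in the mask.
    return archived_mode & ~creation_mask & 0o777


# A file archived as rw-rw-rw- and extracted under a mask of 022.
print(oct(extracted_mode(0o666, 0o022)))
```

With a mask of 077, the same calculation leaves only the owner bits, which is the kind of setting the next paragraph recommends for sensitive data.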

Users should note that directories may be created by pax while extracting files with permissions that are different from those that existed at the time the archive was created. When extracting sensitive information into a directory hierarchy that no longer exists, users are encouraged to set their file creation mask appropriately to protect these files during extraction.

The table of contents output is written to standard output to facilitate pipeline processing.

An early proposal had hard links displaying for all pathnames. This was removed because it complicates the output of the case where -v is not specified and does not match historical cpio usage. The hard-link information is available in the -v display.

The description of the -l option allows implementations to make hard links to symbolic links. Earlier versions of this standard did not specify any way to create a hard link to a symbolic link, but many implementations provided this capability as an extension. If there are hard links to symbolic links when an archive is created, the implementation is required to archive the hard link in the archive (unless -H or -L is specified). When in read mode and in copy mode, implementations supporting hard links to symbolic links should use them when appropriate.

The archive formats inherited from the POSIX.1‐1990 standard have certain restrictions that have been brought along from historical usage. For example, there are restrictions on the length of pathnames stored in the archive. When pax is used in copy (-rw) mode (copying directory hierarchies), the ability to use extensions from the -x pax format overcomes these restrictions.

The default blocksize value of 5120 bytes for cpio was selected because it is one of the standard block-size values for cpio, set when the -B option is specified. (The other default block-size value for cpio is 512 bytes, and this was considered to be too small.) The default block value of 10240 bytes for tar was selected because that is the standard block-size value for BSD tar. The maximum block size of 32256 bytes (2^15 - 512 bytes) is the largest multiple of 512 bytes that fits into a signed 16-bit tape controller transfer register. There are known limitations in some historical systems that would prevent larger blocks from being accepted. Historical values were chosen to improve compatibility with historical scripts using dd or similar utilities to manipulate archives. Also, default block sizes for any file type other than character special file have been deleted from this volume of POSIX.1‐2017 as unimportant and not likely to affect the structure of the resulting archive.
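The block-size arithmetic in the paragraph above can be checked with a few lines of Python. This is only a worked calculation; the constant names are chosen here for illustration.

```python
SECTOR = 512             # archive block sizes are multiples of 512 bytes
INT16_MAX = 2**15 - 1    # largest value a signed 16-bit register can hold

# Largest multiple of 512 that does not exceed the register limit.
max_block = (INT16_MAX // SECTOR) * SECTOR

print(max_block)                         # 32256
print(max_block == 2**15 - 512)          # True
print(5120 // SECTOR, 10240 // SECTOR)   # 10 20
```

The last line shows the cpio and tar defaults expressed as counts of 512-byte units.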

Implementations are permitted to modify the block-size value based on the archive format or the device to which the archive is being written. This is to provide implementations with the opportunity to take advantage of special types of devices, and it should not be used without a great deal of consideration as it almost certainly decreases archive portability.

The intended use of the -n option was to permit extraction of one or more files from the archive without processing the entire archive. This was viewed by the standard developers as offering significant performance advantages over historical implementations. The -n option in early proposals had three effects; the first was to cause special characters in patterns to not be treated specially. The second was to cause only the first file that matched a pattern to be extracted. The third was to cause pax to write a diagnostic message to standard error when no file was found matching a specified pattern. Only the second behavior is retained by this volume of POSIX.1‐2017, for many reasons. First, it is in general not acceptable for a single option to have multiple effects. Second, the ability to make pattern matching characters act as normal characters is useful for parts of pax other than file extraction. Third, a finer degree of control over the special characters is useful because users may wish to normalize only a single special character in a single filename. Fourth, given a more general escape mechanism, the previous behavior of the -n option can be easily obtained using the -s option or a sed script. Finally, writing a diagnostic message when a pattern specified by the user is unmatched by any file is useful behavior in all cases.

In this version, the -n was removed from the copy mode synopsis of pax; it is inapplicable because there are no pattern operands specified in this mode.

There is another method than pax for copying subtrees in POSIX.1‐2008 described as part of the cp utility. Both methods are historical practice: cp provides a simpler, more intuitive interface, while pax offers a finer granularity of control. Each provides additional functionality to the other; in particular, pax maintains the hard-link structure of the hierarchy while cp does not. It is the intention of the standard developers that the results be similar (using appropriate option combinations in both utilities). The results are not required to be identical; there seemed insufficient gain to applications to balance the difficulty of implementations having to guarantee that the results would be exactly identical.

A single archive may span more than one file. It is suggested that implementations provide informative messages to the user on standard error whenever the archive file is changed.

The -d option (do not create intermediate directories not listed in the archive) found in early proposals was originally provided as a complement to the historic -d option of cpio. It has been deleted.

The -s option in early proposals specified a subset of the substitution command from the ed utility. As there was no reason for only a subset to be supported, the -s option is now compatible with the current ed specification. Since the delimiter can be any non-null character, the following usage with single <space> characters is valid:

pax -s " foo bar " ...
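To make the delimiter rule concrete, here is a minimal Python sketch that splits such an expression on whatever character comes first. The helper name parse_substitution is hypothetical. The sketch does not handle escaped delimiters, and Python's re syntax differs from the basic regular expressions and replacement syntax that ed and pax use, so it only illustrates the parsing step.

```python
import re


def parse_substitution(expr: str):
    """Split an ed-style 'DoldDnewD[flags]' expression, where D is the
    first character of expr, into (old, new, flags)."""
    if len(expr) < 2:
        raise ValueError("expression too short")
    delim = expr[0]
    parts = expr[1:].split(delim)
    if len(parts) != 3:
        raise ValueError("expected exactly three delimiters")
    old, new, flags = parts
    return old, new, flags


# The example from the text: a single space acts as the delimiter.
old, new, flags = parse_substitution(" foo bar ")
print(repr(old), repr(new), repr(flags))
print(re.sub(old, new, "src/foo.c", count=1))
```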

The -t description is worded so as to note that this may cause the access time update caused by some other activity (which occurs while the file is being read) to be overwritten.

The default behavior of pax with regard to file modification times is the same as historical implementations of tar. It is not the historical behavior of cpio.

Because the -i option uses /dev/tty, utilities without a controlling terminal are not able to use this option.

The -y option, found in early proposals, has been deleted because a line containing a single <period> for the -i option has equivalent functionality. The special lines for the -i option (a single <period> and the empty line) are historical practice in cpio.

In early drafts, a -e charmap option was included to increase portability of files between systems using different coded character sets. This option was omitted because it was apparent that consensus could not be formed for it. In this version, the use of UTF‐8 should be an adequate substitute.

The ISO POSIX‐2:1993 standard and ISO POSIX‐1 standard requirements for pax, however, made it very difficult to create a single archive containing files created using extended characters provided by different locales. This version adds the hdrcharset keyword to make it possible to archive files in these cases without dropping files due to translation errors.

Translating filenames and other attributes from a locale's encoding to UTF‐8 and then back again can lose information, as the resulting filename might not be byte-for-byte equivalent to the original. To avoid this problem, users can specify the -o hdrcharset=binary option, which will cause the resulting archive to use binary format for all names and attributes. Such archives are not portable among hosts that use different native encodings (e.g., EBCDIC versus ASCII-based encodings), but they will allow interchange among the vast majority of POSIX file systems in practical use. Also, the -o hdrcharset=binary option will cause pax in copy mode to behave more like other standard utilities such as cp.

If the values specified by the -o exthdr.name=value, -o globexthdr.name=value, or by $TMPDIR (if -o globexthdr.name is not specified) require a character encoding other than that described in the ISO/IEC 646:1991 standard, a path extended header record will have to be created for the file. If a hdrcharset extended header record is active for such headers, it will determine the codeset used for the value field in these extended path header records. These path extended header records always need to be created when writing an archive even if hdrcharset=binary has been specified and would contain the same (binary) data that appears in the ustar header record prefix and name fields. (In other words, an extended header path record is always required to be generated if the prefix or name fields contain non-ASCII characters even when hdrcharset=binary is also in effect for that file.)

The -k option was added to address international concerns about the dangers involved in the character set transformations of -e (if the target character set were different from the source, the filenames might be transformed into names matching existing files) and also was made more general to protect files transferred between file systems with different {NAME_MAX} values (truncating a filename on a smaller system might also inadvertently overwrite existing files). As stated, it prevents any overwriting, even if the target file is older than the source. This version adds more granularity of options to solve this problem by introducing the -o invalid=option, specifically the UTF‐8 and binary actions. (Note that an existing file is still subject to overwriting in this case. The -k option closes that loophole.)

Some of the file characteristics referenced in this volume of POSIX.1‐2017 might not be supported by some archive formats. For example, neither the tar nor cpio formats contain the file access time. For this reason, the e specification character has been provided, intended to cause all file characteristics specified in the archive to be retained.

It is required that extracted directories, by default, have their access and modification times and permissions set to the values specified in the archive. This has obvious problems in that the directories are almost certainly modified after being extracted and that directory permissions may not permit file creation. One possible solution is to create directories with the mode specified in the archive, as modified by the umask of the user, with sufficient permissions to allow file creation. After all files have been extracted, pax would then reset the access and modification times and permissions as necessary.

The list-mode formatting description borrows heavily from the one defined by the printf utility. However, since there is no separate operand list to get conversion arguments, the format was extended to allow specifying the name of the conversion argument as part of the conversion specification.

The T conversion specifier allows time fields to be displayed in any of the date formats. Unlike the ls utility, pax does not adjust the format when the date is less than six months in the past. This makes parsing the output more predictable.

The D conversion specifier handles the ability to display the major/minor or file size, as with ls, by using %-8(size)D.

The L conversion specifier handles the ls display for symbolic links.

Conversion specifiers were added to generate existing known types used for ls.

pax Interchange Format

The new POSIX data interchange format was developed primarily to satisfy international concerns that the ustar and cpio formats did not provide for file, user, and group names encoded in characters outside a subset of the ISO/IEC 646:1991 standard. The standard developers realized that this new POSIX data interchange format should be very extensible because there were other requirements they foresaw in the near future:

* Support international character encodings and locale information

* Support security information (ACLs, and so on)

* Support future file types, such as realtime or contiguous files

* Include data areas for implementation use

* Support systems with words larger than 32 bits and timers with subsecond granularity

The following were not goals for this format because these are better handled by separate utilities or are inappropriate for a portable format:

* Encryption

* Compression

* Data translation between locales and codesets

* inode storage

The format chosen to support the goals is an extension of the ustar format. Of the two formats previously available, only the ustar format was selected for extensions because:

* It was easier to extend in an upwards-compatible way. It offered version flags and header block type fields with room for future standardization. The cpio format, while possessing a more flexible file naming methodology, could not be extended without breaking some theoretical implementation or using a dummy filename that could be a legitimate filename.

* Industry experience since the original ``tar wars'' fought in developing the ISO POSIX‐1 standard has clearly been in favor of the ustar format, which is generally the default output format selected for pax implementations on new systems.

The new format was designed with one additional goal in mind: reasonable behavior when an older tar or pax utility happened to read an archive. Since the POSIX.1‐1990 standard mandated that a ``format-reading utility'' had to treat unrecognized typeflag values as regular files, this allowed the format to include all the extended information in a pseudo-regular file that preceded each real file. An option is given that allows the archive creator to set up reasonable names for these files on the older systems. Also, the normative text suggests that reasonable file access values be used for this ustar header block. Making these header files inaccessible for convenient reading and deleting would not be reasonable. File permissions of 600 or 700 are suggested.

The ustar typeflag field was used to accommodate the additional functionality of the new format rather than magic or version because the POSIX.1‐1990 standard (and, by reference, the previous version of pax), mandated the behavior of the format-reading utility when it encountered an unknown typeflag, but was silent about the other two fields.

Early proposals for the first version of this standard contained a proposed archive format that was based on compatibility with the standard for tape files (ISO 1001, similar to the format used historically on many mainframes and minicomputers). This format was overly complex and required considerable overhead in volume and header records. Furthermore, the standard developers felt that it would not be acceptable to the community of POSIX developers, so it was later changed to be a format more closely related to historical practice on POSIX systems.

The prefix and name split of pathnames in ustar was replaced by the single path extended header record for simplicity.

The concept of a global extended header (typeflag g) was controversial. If this were applied to an archive being recorded on magnetic tape, a few unreadable blocks at the beginning of the tape could be a serious problem; a utility attempting to extract as many files as possible from a damaged archive could lose a large percentage of file header information in this case. However, if the archive were on a reliable medium, such as a CD‐ROM, the global extended header offers considerable potential size reductions by eliminating redundant information. Thus, the text warns against using the global method for unreliable media and provides a method for implanting global information in the extended header for each file, rather than in the typeflag g records.

No facility for data translation or filtering on a per-file basis is included because the standard developers could not invent an interface that would allow this in an efficient manner. If a filter, such as encryption or compression, is to be applied to all the files, it is more efficient to apply the filter to the entire archive as a single file. The standard developers considered interfaces that would invoke a shell script for each file going into or out of the archive, but the system overhead in this approach was considered to be too high.

One such approach would be to have filter= records that give a pathname for an executable. When the program is invoked, the file and archive would be open for standard input/output and all the header fields would be available as environment variables or command-line arguments. The standard developers did discuss such schemes, but they were omitted from POSIX.1‐2008 due to concerns about excessive overhead. Also, the program itself would need to be in the archive if it were to be used portably.

There is currently no portable means of identifying the character set(s) used for a file in the file system. Therefore, pax has not been given a mechanism to generate charset records automatically. The only portable means of doing this is for the user to write the archive using the -o charset=string command line option. This assumes that all of the files in the archive use the same encoding. The ``implementation-defined'' text is included to allow for a system that can identify the encodings used for each of its files.

The table of standards that accompanies the charset record description is acknowledged to be very limited. Only a limited number of character set standards is reasonable for maximal interchange. Any character set is, of course, possible by prior agreement. It was suggested that EBCDIC be listed, but it was omitted because it is not defined by a formal standard. Formal standards, and then only those with reasonably large followings, can be included here, simply as a matter of practicality. The <value>s represent names of officially registered character sets in the format required by the ISO 2375:1985 standard.

The normal <comma> or <blank>-separated list rules are not followed in the case of keyword options to allow ease of argument parsing for getopts.

Further information on character encodings is in pax Archive Character Set Encoding/Decoding.

The standard developers have reserved keyword name space for vendor extensions. It is suggested that the format to be used is:

VENDOR.keyword

where VENDOR is the name of the vendor or organization in all uppercase letters. It is further suggested that the keyword following the <period> be named differently than any of the standard keywords so that it could be used for future standardization, if appropriate, by omitting the VENDOR prefix.

The <length> field in the extended header record was included to make it simpler to step through the records, even if a record contains an unknown format (to a particular pax) with complex interactions of special characters. It also provides a minor integrity checkpoint within the records to aid a program attempting to recover files from a damaged archive.
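The following Python sketch shows why a leading length makes records easy to step through. It assumes the record layout given in the pax extended header specification (a decimal length, a space, keyword=value, and a trailing newline, with the length counting the whole record including its own digits); that layout is not restated in this passage. The function names are illustrative.

```python
def make_record(keyword: str, value: str) -> bytes:
    """Build one extended header record: b'<length> <keyword>=<value>\\n'.
    <length> is the size in bytes of the whole record, digits included."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    # Adding the digits can change the digit count, so iterate until stable.
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


def iter_records(data: bytes):
    """Yield (keyword, raw value) pairs, using only the length prefix to
    find record boundaries; values are never scanned for delimiters."""
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space])
        record = data[space + 1 : pos + length - 1]  # drop the final newline
        keyword, _, value = record.partition(b"=")
        yield keyword.decode("utf-8"), value
        pos += length


blob = make_record("path", "foo") + make_record("VENDOR.note", "a b=c\n")
for keyword, value in iter_records(blob):
    print(keyword, value)
```

The second record carries a value containing a space, an equals sign, and a newline, yet the reader skips past it correctly because it never has to interpret the value.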

There are no extended header versions of the devmajor and devminor fields because the unspecified format ustar header field should be sufficient. If they are not, vendor-specific extended keywords (such as VENDOR.devmajor) should be used.

Device and i-number labeling of files was not adopted from cpio; files are interchanged strictly on a symbolic name basis, as in ustar.

Just as with the ustar format descriptions, the new format makes no special arrangements for multi-volume archives. Each of the pax archive types is assumed to be inside a single POSIX file and splitting that file over multiple volumes (diskettes, tape cartridges, and so on), processing their labels, and mounting each in the proper sequence are considered to be implementation details that cannot be described portably.

The pax format is intended for interchange, not only for backup on a single (family of) systems. It is not as densely packed as might be possible for backup:

* It contains information as coded characters that could be coded in binary.

* It identifies extended records with name fields that could be omitted in favor of a fixed-field layout.

* It translates names into a portable character set and identifies locale-related information, both of which are probably unnecessary for backup.

The requirements on restoring from an archive are slightly different from the historical wording, allowing for non-monolithic privilege to bring forward as much as possible. In particular, attributes such as ``high performance file'' might be broadly but not universally granted while set-user-ID or chown() might be much more restricted. There is no implication in POSIX.1‐2008 that the security information be honored after it is restored to the file hierarchy, in spite of what might be improperly inferred by the silence on that topic. That is a topic for another standard.

Links are recorded in the fashion described here because a link can be to any file type. It is desirable in general to be able to restore part of an archive selectively and restore all of those files completely. If the data is not associated with each link, it is not possible to do this. However, the data associated with a file can be large, and when selective restoration is not needed, this can be a significant burden. The archive is structured so that files that have no associated data can always be restored by the name of any link, and the user may choose whether data is recorded with each instance of a file that contains data. The format permits mixing of both types of links in a single archive; this can be done for special needs, and pax is expected to interpret such archives on input properly, despite the fact that there is no pax option that would force this mixed case on output. (When -o linkdata is used, the output must contain the duplicate data, but the implementation is free to include it or omit it when -o linkdata is not used.)

The time values are included as extended header records for those implementations needing more than the eleven octal digits allowed by the ustar format. Portable file timestamps cannot be negative. If pax encounters a file with a negative timestamp in copy or write mode, it can reject the file, substitute a non-negative timestamp, or generate a non-portable timestamp with a leading '-'. Even though some implementations can support finer file-time granularities than seconds, the normative text requires support only for seconds since the Epoch because the ISO POSIX‐1 standard states them that way. The ustar format includes only mtime; the new format adds atime and ctime for symmetry. The atime access time restored to the file system will be affected by the -p a and -p e options. The ctime creation time (actually inode modification time) is described with appropriate privileges so that it can be ignored when writing to the file system. POSIX does not provide a portable means to change file creation time. Nothing is intended to prevent a non-portable implementation of pax from restoring the value.
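As a worked calculation, the limit implied by eleven octal digits can be computed directly. The variable names below are chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

OCTAL_DIGITS = 11                      # digits available in the ustar time field
max_ustar_time = 8**OCTAL_DIGITS - 1   # largest value those digits can hold

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_moment = epoch + timedelta(seconds=max_ustar_time)

print(max_ustar_time)        # seconds since the Epoch
print(oct(max_ustar_time))   # eleven sevens in octal
print(last_moment.year)      # last year the field can represent
```

Timestamps beyond that value, like negative ones, cannot be stored in the ustar field itself and are a case for the extended header records.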

The gid, size, and uid extended header records were included to allow expansion beyond the sizes specified in the regular tar header. New file system architectures are emerging that will exhaust the 12-digit size field. There are probably not many systems requiring more than 8 digits for user and group IDs, but the extended header values were included for completeness, allowing overrides for all of the decimal values in the tar header.

The standard developers intended to describe the effective results of pax with regard to file ownerships and permissions; implementations are not restricted in timing or sequencing the restoration of such, provided the results are as specified.

Much of the text describing the extended headers refers to use in ``write or copy modes''. The copy mode references are due to the normative text: ``The effect of the copy shall be as if the copied files were written to an archive file and then subsequently extracted ...''. There is certainly no way to test whether pax is actually generating the extended headers in copy mode, but the effects must be as if it had.

pax Archive Character Set Encoding/Decoding

There is a need to exchange archives of files between systems of different native codesets. Filenames, group names, and user names must be preserved to the fullest extent possible when an archive is read on the receiving platform. Translation of the contents of files is not within the scope of the pax utility.

There will also be the need to represent characters that are not available on the receiving platform. These unsupported characters cannot be automatically folded to the local set of characters due to the chance of collisions. This could result in overwriting previous extracted files from the archive or pre-existing files on the system.

For these reasons, the codeset used to represent characters within the extended header records of the pax archive must be sufficiently rich to handle all commonly used character sets. The fields requiring translation include, at a minimum, filenames, user names, group names, and link pathnames. Implementations may wish to have localized extended keywords that use non-portable characters.

The standard developers considered the following options:

* The archive creator specifies the well-defined name of the source codeset. The receiver must then recognize the codeset name and perform the appropriate translations to the destination codeset.

* The archive creator includes within the archive the character mapping table for the source codeset used to encode extended header records. The receiver must then read the character mapping table and perform the appropriate translations to the destination codeset.

* The archive creator translates the extended header records in the source codeset into a canonical form. The receiver must then perform the appropriate translations to the destination codeset.

The approach that incorporates the name of the source codeset poses the problem of codeset name registration, and makes the archive useless to pax archive decoders that do not recognize that codeset.

Because parts of an archive may be corrupted, the standard developers felt that including the character map of the source codeset was too fragile. The loss of this one key component could result in making the entire archive useless. (The difference between this and the global extended header decision was that the latter has a workaround—duplicating extended header records on unreliable media—but this would be too burdensome for large character set maps.)

Both of the above approaches also put an undue burden on the pax archive receiver to handle the cross-product of all source and destination codesets.

To simplify the translation from the source codeset to the canonical form and from the canonical form to the destination codeset, the standard developers decided that the internal representation should be a stateless encoding. A stateless encoding is one where each codepoint has the same meaning, without regard to the decoder being in a specific state. An example of a stateful encoding would be the Japanese Shift-JIS; an example of a stateless encoding would be the ISO/IEC 646:1991 standard (equivalent to 7-bit ASCII).

For these reasons, the standard developers decided to adopt a canonical format for the representation of file information strings. The obvious, well-endorsed candidate is the ISO/IEC 10646‐1:2000 standard (based in part on Unicode), which can be used to represent the characters of virtually all standardized character sets. The standard developers initially agreed upon using UCS2 (16-bit Unicode) as the internal representation. This repertoire of characters provides a sufficiently rich set to represent all commonly-used codesets.

However, the standard developers found that the 16-bit Unicode representation had some problems. It forced the issue of standardizing byte ordering. The 2-byte length of each character made the extended header records twice as long for the case of strings coded entirely from historical 7-bit ASCII. For these reasons, the standard developers chose the UTF‐8 defined in the ISO/IEC 10646‐1:2000 standard. This multi-byte representation encodes UCS2 or UCS4 characters reliably and deterministically, eliminating the need for a canonical byte ordering. In addition, NUL octets and other characters possibly confusing to POSIX file systems do not appear, except to represent themselves. It was realized that certain national codesets take up more space after the encoding, due to their placement within the UCS range; it was felt that the usefulness of the encoding of the names outweighs the disadvantage of size increase for file, user, and group names.

The encoding of UTF‐8 is as follows:

UCS4 Hex Encoding    UTF-8 Binary Encoding

00000000-0000007F    0xxxxxxx
00000080-000007FF    110xxxxx 10xxxxxx
00000800-0000FFFF    1110xxxx 10xxxxxx 10xxxxxx
00010000-001FFFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
00200000-03FFFFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
04000000-7FFFFFFF    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

where each 'x' represents a bit value from the character being translated.
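The table can be turned into a short encoder. The Python sketch below follows the six ranges exactly as listed, including the five- and six-byte forms; current UTF‐8 definitions restrict the encoding to code points up to 10FFFF, so those longer forms appear here only because the table lists them. The function name is illustrative.

```python
# (upper bound of the range, lead-byte marker, continuation byte count)
_RANGES = (
    (0x7FF, 0xC0, 1),
    (0xFFFF, 0xE0, 2),
    (0x1FFFFF, 0xF0, 3),
    (0x3FFFFFF, 0xF8, 4),
    (0x7FFFFFFF, 0xFC, 5),
)


def utf8_encode(code_point: int) -> bytes:
    """Encode one UCS4 value following the table's bit layout."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("value outside the UCS4 range in the table")
    if code_point <= 0x7F:
        return bytes([code_point])          # 0xxxxxxx
    for limit, lead, count in _RANGES:
        if code_point <= limit:
            # Lead byte: marker bits plus the highest-order payload bits.
            out = [lead | (code_point >> (6 * count))]
            # Continuation bytes: 10xxxxxx, six payload bits each.
            for i in range(count - 1, -1, -1):
                out.append(0x80 | ((code_point >> (6 * i)) & 0x3F))
            return bytes(out)
    raise AssertionError("unreachable")


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"{cp:08X}", utf8_encode(cp).hex(" "))
```

For values that Python itself can encode, the result should agree with chr(cp).encode("utf-8").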

ustar Interchange Format

The description of the ustar format reflects numerous enhancements over pre-1988 versions of the historical tar utility. The goal of these changes was not only to provide the functional enhancements desired, but also to retain compatibility between new and old versions. This compatibility has been retained. Archives written using the old archive format are compatible with the new format.

Implementors should be aware that the previous file format did not include a mechanism to archive directory type files. For this reason, the convention of using a filename ending with <slash> was adopted to specify a directory on the archive.
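
A reader that accepts both old and new archives might apply the convention roughly as follows. This is a minimal sketch: the function name is hypothetical, and it assumes the ustar typeflag values '5' for a directory and '0' or NUL for a regular file.

```python
def is_directory(name: str, typeflag: str) -> bool:
    """Treat an entry as a directory if ustar says so or the old convention applies."""
    if typeflag == "5":  # ustar directory type
        return True
    # Older archives have no directory type: rely on the trailing <slash>.
    return typeflag in ("0", "\0") and name.endswith("/")
```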

The total size of the name and prefix fields has been set to meet the minimum requirements for {PATH_MAX}. If a pathname will fit within the name field, it is recommended that the pathname be stored there without the use of the prefix field. Although the name field is known to be too small to contain {PATH_MAX} characters, the value was not changed in this version of the archive file format to retain backwards-compatibility, and instead the prefix was introduced. Also, because of the earlier version of the format, there is no way to remove the restriction on the linkname field being limited in size to just that of the name field.
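
The recommendation can be sketched as below, assuming the ustar field widths of 100 octets for name and 155 octets for prefix, and assuming the split is made at a <slash> that is not itself stored. The function name and the choice of the first usable <slash> are illustrative, not prescribed.

```python
NAME_SIZE = 100    # octets in the ustar name field
PREFIX_SIZE = 155  # octets in the ustar prefix field

def split_ustar_path(path: str) -> tuple[str, str]:
    """Return (prefix, name); raise ValueError if the path cannot be stored."""
    raw = path.encode("utf-8")
    if len(raw) <= NAME_SIZE:
        return "", path  # recommended case: leave the prefix field empty
    # Try each slash as the split point; the slash itself is not stored.
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix, name = raw[:i], raw[i + 1:]
        if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
            return prefix.decode("utf-8"), name.decode("utf-8")
    raise ValueError("pathname does not fit in the name and prefix fields")
```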

The size field is required to be meaningful in all implementation extensions, although it could be zero. This is required so that the data blocks can always be properly counted.
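
To show why the size field matters for block counting, here is a minimal sketch that assumes the 512-octet block size of the ustar format and a single header block per entry. The helper names are hypothetical.

```python
BLOCK_SIZE = 512  # octets per ustar block

def data_blocks(size: int) -> int:
    """Number of data blocks that follow a header whose size field is `size`."""
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up to whole blocks

def next_header_offset(header_offset: int, size: int) -> int:
    """Offset of the next header: skip this header block plus its data blocks."""
    return header_offset + BLOCK_SIZE * (1 + data_blocks(size))
```

A reader that does not understand an extension type can still skip it correctly this way, provided the size field is meaningful.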

It is suggested that if device special files that cannot be represented in the standard format need to be archived, one of the extension types (A-Z) be used, and that the additional information for the special file be represented as data and be reflected in the size field.

Attempting to restore a special file type, where it is converted to ordinary data and conflicts with an existing filename, need not be specially detected by the utility. If run as an ordinary user, pax should not be able to overwrite the entries in, for example, /dev in any case (whether the file is converted to another type or not). If run as a privileged user, it should be able to do so, and it would be considered a bug if it did not. The same is true of ordinary data files and similarly named special files; it is impossible to anticipate the needs of the user (who could really intend to overwrite the file), so the behavior should be predictable (and thus regular) and rely on the protection system as required.

The value 7 in the typeflag field is intended to define how contiguous files can be stored in a ustar archive. POSIX.1‐2008 does not require the contiguous file extension, but does define a standard way of archiving such files so that all conforming systems can interpret these file types in a meaningful and consistent manner. On a system that does not support extended file types, the pax utility should do the best it can with the file and go on to the next.

The file protection modes are those conventionally used by the ls utility. This is extended beyond the usage in the ISO POSIX‐2 standard to support the ``shared text'' or ``sticky'' bit. It is intended that the conformance document should not document anything beyond the existence of and support of such a mode. Further extensions are expected to these bits, particularly with overloading the set-user-ID and set-group-ID flags.

cpio Interchange Format

The reference to appropriate privileges in the cpio format refers to an error on standard output; the ustar format does not make comparable statements.

The model for this format was the historical System V cpio-c data interchange format. This model documents the portable version of the cpio format and not the binary version. It has the flexibility to transfer data of any type described within POSIX.1‐2008, yet is extensible to transfer data types specific to extensions beyond POSIX.1‐2008 (for example, contiguous files). Because it describes existing practice, there is no question of maintaining upwards-compatibility.

cpio Header

There has been some concern that the size of the c_ino field of the header is too small to handle those systems that have very large inode numbers. However, the c_ino field in the header is used strictly as a hard-link resolution mechanism for archives. It is not necessarily the same value as the inode number of the file in the location from which that file is extracted.
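
One way a reader could use c_ino purely for link resolution is sketched below. The tuple layout and function name are assumptions made for the example; only the c_dev, c_ino, and c_nlink field names come from the cpio header.

```python
def resolve_links(entries):
    """Map each path to the earlier path it links to, or None for a new file.

    entries: iterable of (c_dev, c_ino, c_nlink, path) tuples in archive order.
    """
    first_seen = {}  # (c_dev, c_ino) -> first path archived with that pair
    result = {}
    for c_dev, c_ino, c_nlink, path in entries:
        key = (c_dev, c_ino)
        if c_nlink > 1 and key in first_seen:
            result[path] = first_seen[key]  # recreate as a hard link
        else:
            if c_nlink > 1:
                first_seen[key] = path
            result[path] = None  # extract as a file in its own right
    return result
```

The values only need to be unique within the archive, so a writer is free to renumber them.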

The name c_magic is based on historical usage.

cpio Filename

For most historical implementations of the cpio utility, {PATH_MAX} octets can be used to describe the pathname without the addition of any other header fields (the NUL character would be included in this count). {PATH_MAX} is the minimum value for pathname size, documented as 256 bytes. However, an implementation may use c_namesize to determine the exact length of the pathname. With the current description of the <cpio.h> header, this pathname size can be as large as a number that is described in six octal digits.
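
The six-octal-digit limit works out as follows. The sketch assumes the field is stored as ASCII octal digits; the function and constant names are hypothetical.

```python
C_NAMESIZE_DIGITS = 6  # width of the c_namesize field in octal digits

def parse_namesize(field: bytes) -> int:
    """Parse a c_namesize field of exactly six ASCII octal digits."""
    if len(field) != C_NAMESIZE_DIGITS or not all(c in b"01234567" for c in field):
        raise ValueError("c_namesize must be six octal digits")
    return int(field.decode("ascii"), 8)

# Largest pathname size expressible in the field: octal 777777.
MAX_NAMESIZE = int("7" * C_NAMESIZE_DIGITS, 8)
```

MAX_NAMESIZE evaluates to 262143, and the field value 000400 corresponds to the 256 bytes mentioned above.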

Two values are documented under the c_mode field values to provide for extensibility for known file types:

0110 000 Reserved for contiguous files. The implementation may treat the rest of the information for this archive like a regular file. If this file type is undefined, the implementation may create the file as a regular file.

This provides for extensibility of the cpio format while allowing for the ability to read old archives. Files of an unknown type may be read as ``regular files'' on some implementations. On a system that does not support extended file types, the pax utility should do the best it can with the file and go on to the next.
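
The fallback described above might look like the following sketch. It assumes the <cpio.h> values C_ISREG (0100000) and C_ISCTG (0110000) and a file-type mask of 0170000; FILE_TYPE_MASK and effective_type are names invented for the example.

```python
FILE_TYPE_MASK = 0o170000  # hypothetical name for the file-type bits of c_mode
C_ISREG = 0o100000         # regular file
C_ISCTG = 0o110000         # reserved for contiguous files

def effective_type(c_mode: int, known_types: frozenset) -> int:
    """Return the file type to create, falling back to a regular file."""
    file_type = c_mode & FILE_TYPE_MASK
    if file_type in known_types:
        return file_type
    return C_ISREG  # unknown or unsupported type: treat as a regular file
```

On a system that supports only regular files, effective_type(0o110644, frozenset({C_ISREG})) returns C_ISREG, so the entry is still extracted.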