tar and cpio
tar(5) and cpio(5) are two competing archive formats that provide almost identical features. Both are streaming formats, originally designed for use on tapes. Their major practical difference is the style of their standard command-line utilities:
- cpio(1) takes the list of files to archive on its standard input, which allows very fine control but requires another utility such as find(1) to archive an entire tree;
- tar(1) usually takes the list of files to archive on its command line, and descends into directories recursively by default.
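To make the contrast concrete, here is a minimal sketch of the two selection styles. It uses Python's tarfile module to stand in for both tools, so it only illustrates how the file list is chosen, not the real cpio(1) or tar(1) commands. The tree layout and the function names (`build_tree`, `tar_style`, `cpio_style`, `demo`) are made up for the example, and POSIX path separators are assumed.

```python
import os
import tarfile
import tempfile


def build_tree(root):
    """Create a small hypothetical source tree under root."""
    os.makedirs(os.path.join(root, "src", "sub"))
    for rel in ("src/a.c", "src/a.o", "src/sub/b.c"):
        with open(os.path.join(root, rel), "w") as f:
            f.write("x\n")


def tar_style(root, out):
    """tar-like: name one directory, recursion is implicit."""
    with tarfile.open(out, "w") as tar:
        tar.add(os.path.join(root, "src"), arcname="src")
    with tarfile.open(out) as tar:
        return sorted(tar.getnames())


def cpio_style(root, out, names):
    """cpio-like: archive exactly the names handed over, nothing more."""
    with tarfile.open(out, "w") as tar:
        for rel in names:
            tar.add(os.path.join(root, rel), arcname=rel, recursive=False)
    with tarfile.open(out) as tar:
        return sorted(tar.getnames())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        build_tree(tmp)
        # The caller plays the role of find(1): keep only the .c files.
        selected = []
        for dirpath, _dirs, files in os.walk(os.path.join(tmp, "src")):
            for name in files:
                if name.endswith(".c"):
                    full = os.path.join(dirpath, name)
                    selected.append(os.path.relpath(full, tmp))
        everything = tar_style(tmp, os.path.join(tmp, "all.tar"))
        chosen = cpio_style(tmp, os.path.join(tmp, "some.tar"), sorted(selected))
        return everything, chosen


if __name__ == "__main__":
    everything, chosen = demo()
    print("tar-like :", everything)
    print("cpio-like:", chosen)
```

In the first style the archive picks up the whole tree, object file included. In the second, the selection logic lives entirely outside the archiver, which is where the finer control comes from.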
Peace!
There is a traditional rivalry between these two formats and utilities, and
between their respective supporters. To settle it, the POSIX specification
deprecated both commands in 2001, replacing them with a single one called
pax(1), which can be used in either a cpio-like or a tar-like style.
This specification also defined a pax interchange format, which these vile
trolls derived from… tar!
Now, looking at actual practice, you can see that tar has won and that almost nobody uses pax. Almost every free software project I know releases tarballs, not cpio archives. This is probably the reason why the GNU tar utility has evolved much more than GNU cpio: tar now integrates seamlessly with compressors such as gzip, bzip2 or xz, autodetecting the compression format when needed, whereas cpio has no such features as far as I know. However, cpio is notably used by the Linux kernel for its initial RAM file system (initramfs), often still loosely called initrd.
tar and cpio drawbacks
Now I come to the title of this article: I personally prefer the cpio format over tar, although I usually use tar because of the advanced features of GNU tar. In fact, tar suffers from a drawback that usually has to be worked around: it cannot store every type of file. Unix has seven file types: regular files, directories, symbolic links, character devices, block devices, FIFOs and Unix sockets. Well, tar can only represent six of them, and lacks the ability to store sockets. Thus, if you try to archive your entire filesystem with tar, you may get the following output:
tar: /var/spool/postfix/private/rewrite: socket ignored
tar: /var/spool/postfix/private/bsmtp: socket ignored
If you try the same with cpio, you will see that sockets get archived without any problem. Given that the tar format has a type byte with one reserved value, I am surprised that its 2001 revision did not use it for sockets, but for some reason it did not. In practice, programs that use sockets work around this problem by creating them at startup. To be fair, I must mention that cpio also has one significant drawback: when a file has several links, it gets copied several times into the archive.
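The tar side of this limitation can be reproduced without touching a real mail spool. The sketch below assumes a POSIX system with Unix-domain socket support and uses Python's tarfile module rather than GNU tar, so it shows the same format limitation but not the same message. The function name `archive_socket_with_tar` and the file names are invented for the example.

```python
import os
import socket
import tarfile
import tempfile


def archive_socket_with_tar():
    """Try to archive a regular file and a Unix socket; return what was stored."""
    with tempfile.TemporaryDirectory() as tmp:
        file_path = os.path.join(tmp, "demo.txt")
        sock_path = os.path.join(tmp, "demo.sock")
        tar_path = os.path.join(tmp, "demo.tar")

        with open(file_path, "w") as f:
            f.write("hello\n")

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            server.bind(sock_path)  # binding creates the socket file on disk
            with tarfile.open(tar_path, "w") as tar:
                tar.add(file_path, arcname="demo.txt")
                # tarfile has no type flag for sockets, so this member is
                # skipped without raising an error.
                tar.add(sock_path, arcname="demo.sock")
            with tarfile.open(tar_path) as tar:
                return tar.getnames()
        finally:
            server.close()


if __name__ == "__main__":
    print(archive_socket_with_tar())
```

Only the regular file ends up in the archive. Unlike GNU tar, the tarfile module stays silent about the skipped socket unless its debug level is raised.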
Edit: It appears that archiving sockets is completely pointless, given that they must be unlinked if they exist, and then created before they are used by a program. Thus, the question becomes: why does cpio support sockets?