20 05 | 2011

cpio > tar

Written by Tanguy

Classified in : Homepage, Debian, Miscellaneous


tar and cpio

tar(5) and cpio(5) are two competing archive formats that provide almost identical features. Both are streaming formats, originally designed to be used on tapes. Their major practical difference lies in the style of their standard command-line utilities:

  • cpio(1) takes the list of files to archive on its standard input, which gives very fine control but requires another utility such as find(1) to archive an entire tree;
  • tar(1) usually takes the list of files to archive on its command line, and browses directories recursively by default.
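The two calling styles can be sketched in Python. This is only an illustration under assumptions: Python's standard library has no cpio writer, so both functions below write tar archives with the tarfile module, and only the way the file list is supplied differs. The names archive_tar_style, archive_cpio_style, find and demo are made up for this sketch.

```python
import os
import tarfile
import tempfile


def archive_tar_style(archive_path, root):
    """tar-like: name a directory and let the archiver recurse into it."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(root, arcname=os.path.basename(root))  # recursive by default


def archive_cpio_style(archive_path, names, base):
    """cpio-like: archive exactly the names supplied, with no recursion."""
    with tarfile.open(archive_path, "w") as tar:
        for name in names:
            tar.add(name, arcname=os.path.relpath(name, base), recursive=False)


def find(root):
    """Rough stand-in for find(1): yield root and everything below it."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in sorted(dirnames + filenames):
            yield os.path.join(dirpath, entry)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "project")
        os.makedirs(os.path.join(root, "src"))
        for rel in ("README", "notes.bak", os.path.join("src", "main.c")):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("example\n")

        full = os.path.join(tmp, "full.tar")
        archive_tar_style(full, root)

        # The caller filters the list, much like piping find(1) into cpio(1).
        selected = os.path.join(tmp, "selected.tar")
        wanted = [p for p in find(root) if not p.endswith(".bak")]
        archive_cpio_style(selected, wanted, tmp)

        with tarfile.open(full) as tar:
            full_names = sorted(tar.getnames())
        with tarfile.open(selected) as tar:
            selected_names = sorted(tar.getnames())
        return full_names, selected_names
```

In the second style the archiver never decides what to include: the filtering happens entirely in the code that produces the list, which is the kind of control the cpio approach offers.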

Peace!

There is a traditional rivalry between these two formats and utilities, and between their respective supporters. To settle it, the POSIX specification dropped both commands in 2001 in favour of a single one called pax(1), which can be used in either cpio-like or tar-like style. This specification also defined a pax interchange format, which these vile trolls derived from… tar!

Now, looking at actual practice, you can see that tar has won, and that almost nobody uses pax. Almost every free software project I know of releases tarballs, not cpio archives. This is probably the reason why the GNU tar utility has evolved much more than GNU cpio: tar now integrates seamlessly with compressors such as gzip, bzip2 or xz, autodetecting the compression format when needed, whereas cpio has no such features as far as I know. However, cpio is notably used by the Linux kernel for its initial RAM filesystem (initramfs), often still called the initrd.
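To give an idea of what this autodetection looks like from the reader's side, here is a small sketch using Python's tarfile module, which offers a comparable feature. It is not GNU tar's implementation, only an analogy; make_tarball, list_members and demo are names invented for the example.

```python
import io
import tarfile


def make_tarball(mode):
    """Build a one-file tarball in memory; mode is 'w', 'w:gz', 'w:bz2' or 'w:xz'."""
    payload = b"hello\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def list_members(data):
    """Open a tarball without saying how it was compressed."""
    # Mode 'r' (the same as 'r:*') lets tarfile probe the compression itself.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return tar.getnames()


def demo():
    modes = ("w", "w:gz", "w:bz2", "w:xz")
    return {mode: list_members(make_tarball(mode)) for mode in modes}
```

The reading side is identical for all four archives: only the writer had to choose a compressor.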

tar and cpio drawbacks

Now I come to the title of this article: I personally prefer the cpio format over tar, although I usually use tar because of the advanced features of GNU tar. In fact, tar suffers from a drawback that usually has to be worked around: it cannot store every type of file. There are seven file types: regular files, directories, symbolic links, character devices, block devices, FIFOs and Unix sockets. Well, tar can only represent six of them, and lacks the ability to store sockets. Thus, if you try to archive your entire filesystem with tar, you may get the following output:

tar: /var/spool/postfix/private/rewrite: socket ignored
tar: /var/spool/postfix/private/bsmtp: socket ignored

If you try the same with cpio, you will see that sockets get archived without any problem. Given that the tar format has a type byte, with one reserved value, I am surprised that its 2001 revision did not use it for sockets, but for some reason it did not. In practice, programs that use sockets work around this problem by creating them at startup. To be honest, I must mention that cpio also has one significant drawback: when a file has several hard links, its content gets copied several times in the archive.
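The missing socket type can also be observed from Python, whose tarfile module has no representation for sockets either. The sketch below assumes a system with Unix domain sockets (Linux, for instance); file_type and demo are hypothetical helper names, and the behaviour shown is that of Python's tarfile, not of GNU tar itself.

```python
import os
import socket
import stat
import tarfile
import tempfile


def file_type(path):
    """Name the type of a path, without following symbolic links."""
    mode = os.lstat(path).st_mode
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISFIFO, "FIFO"),
        (stat.S_ISSOCK, "socket"),
    ]
    for predicate, label in checks:
        if predicate(mode):
            return label
    return "unknown"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        plain_path = os.path.join(tmp, "plain.txt")
        with open(plain_path, "w") as handle:
            handle.write("example\n")

        sock_path = os.path.join(tmp, "demo.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(sock_path)  # binding creates the socket file
        try:
            kind = file_type(sock_path)
            archive = os.path.join(tmp, "out.tar")
            with tarfile.open(archive, "w") as tar:
                # gettarinfo() returns None for types it cannot represent.
                info = tar.gettarinfo(sock_path)
                tar.add(plain_path, arcname="plain.txt")
                tar.add(sock_path, arcname="demo.sock")  # skipped, no error
            with tarfile.open(archive) as tar:
                names = tar.getnames()
        finally:
            server.close()
        return kind, info, names
```

Here the socket is dropped silently, whereas GNU tar at least prints the "socket ignored" message quoted above.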

Edit: It appears that archiving sockets is completely pointless, given that they must be unlinked if they exist, and then created before they are used by a program. Thus, the question becomes: why does cpio support sockets?
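The unlink requirement is easy to check. The following sketch, which assumes a Linux-like system with Unix domain sockets, binds a socket to a path, closes it, and then tries to bind to the same path again; try_bind and demo are names chosen for the example.

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Try to bind a new Unix socket to path; return None or the errno."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        first = try_bind(path)           # creates the socket file, then closes
        leftover = os.path.exists(path)  # the file outlives the closed socket
        second = try_bind(path)          # fails: the path is already taken
        os.unlink(path)
        third = try_bind(path)           # works once the stale file is gone
        return first, leftover, second, third


if __name__ == "__main__":
    first, leftover, second, third = demo()
    print("second bind failed with", errno.errorcode.get(second, second))
```

A socket file restored from an archive would be in the same position as the leftover file here: a program has to remove it before it can bind to that path.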

6 comments

Saturday 21 May 2011 at 01:53 Justin Rovang said: #1

Don't forget incremental with GNU tar =)

Tar can also skip over a corrupted file

Saturday 21 May 2011 at 07:26 Buck said: #2

It doesn't make much sense to archive a socket, does it?
When you restore the socket, nothing will be bound to it,
and it will have to be unlinked for anything to bind() that
path again

Saturday 21 May 2011 at 08:32 Tanguy said: #3

@Buck: I disagree: the socket by itself has no value, but its permissions do.

For instance, some programs create their sockets when started, and remove them when stopped. Some of them do not let you choose the socket permissions, so you have to modify their init script to set the permissions you want after they have started, for instance to allow your mail server to talk to your milter.

Edit: according to unix(7), sockets must indeed be unlinked when they are not used. Too bad, but indeed, archiving sockets is then useless. I wonder if this is a requirement, and if it is impossible to reuse an unused socket.

Saturday 21 May 2011 at 10:01 Mathieu said: #4

A drawback of both: they don't archive ACLs and attr.

Saturday 21 May 2011 at 15:01 mirabilos said: #5

tar (ustar) also has a limit of 2GB; GNU tar writes a gnutar format that’s not portable either; GNU cpio’s ustar support is totally broken (as is mc’s), and paxtar often doesn’t support the pax format (the only one allowing more than 2 (tar) / 4 (newc/crc) / 8 (cpio) GiB long archive members)……… any questions? ☺☹

Friday 27 May 2011 at 16:41 Ted said: #6

If you like cpio, you should take a look at afio. It can create a cpio-format archive, but includes more advanced options such as the -Z option, which compresses each file that makes up the archive, and support for files larger than 2GB.
