09 12 | 2014

Using bsdtar to change an archive format

Written by Tanguy

Classified in: Homepage, Debian, Command line, To remember

Streamable archive formats

Archive formats such as tar(5) and cpio(5) have the advantage of being streamable: you can use them to transfer data through pipes and remote shells without ever storing the whole archive along the way. For instance:

$ cd public_html/blog
$ rgrep -lF "archive" data/articles \
      | pax -w \
      | ssh newserver "mkdir -p public_html/blog &&
                       cd public_html/blog &&
                       pax -r"
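
The same streaming property can be tried locally, without a remote host. The following is a minimal sketch, assuming a tar implementation that accepts -C (GNU tar and bsdtar both do); the function name stream_copy is hypothetical. No archive file is ever written to disk: the archive only exists inside the pipe.

```shell
#!/bin/sh
# stream_copy SRC DST (hypothetical helper): copy the contents of
# directory SRC into DST through a tar stream, without storing an
# intermediate archive file.
stream_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # The first tar packs SRC to stdout ("-f -"); the second one
    # unpacks from stdin into DST as the bytes arrive.
    tar -cf - -C "$src" . | tar -xf - -C "$dst"
}
```

Usage: stream_copy data/articles /tmp/articles-copy. Replacing the second tar with ssh newserver "tar -xf - -C public_html/blog" gives a remote variant of the pax example.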

Turning a ZIP archive into a tarball

Unfortunately, many people will send you data in non-streamable archive formats such as ZIP¹. For such cases, bsdtar(1) can be useful, as it is able to convert an archive from one format to another:

$ bsdtar -cf - @archive.zip \
      | COMMAND

These arguments tell bsdtar to:

  • create an archive (-c);
  • write it to stdout (-f -); unlike GNU tar, which defaults to stdout, bsdtar defaults to a tape device;
  • fill it with the entries found in the existing archive archive.zip (the @ prefix).

The result is a tape archive, which is easier to manipulate in a stream than a ZIP archive.
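
For example, COMMAND can simply be gzip, which turns the ZIP file into a compressed tarball in one pass. This is a minimal sketch assuming bsdtar is installed (on current Debian releases it is provided by the libarchive-tools package); the function name zip_to_tgz is hypothetical.

```shell
#!/bin/sh
# zip_to_tgz IN.zip OUT.tar.gz (hypothetical helper):
# convert a ZIP archive into a gzip-compressed tarball in one pass.
#   -c       create a new archive
#   -f -     write it to stdout instead of the default tape device
#   @IN.zip  copy the entries of the existing ZIP archive into it
# gzip plays the role of COMMAND in the bsdtar example.
# Note: POSIX sh has no pipefail, so the status returned is gzip's.
zip_to_tgz() {
    bsdtar -cf - "@$1" | gzip > "$2"
}
```

Usage: zip_to_tgz archive.zip archive.tar.gz. As comment #9 below shows, @- makes bsdtar read the ZIP archive from stdin instead, e.g. wget -qO- http://example.org/file.zip | bsdtar -cf - @- | gzip > file.tar.gz.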

Notes

  1. Some will say that although ZIP is based on a file index, it can be streamed because that index is placed at the end of the archive. In fact, that characteristic only allows streaming the archive creation; extracting it still requires storing the full archive first.
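
The position of that index can be checked directly. In the ZIP format, each entry is preceded by a local file header starting with the bytes "PK\x03\x04", and the archive ends with an "end of central directory" record starting with "PK\x05\x06", which is 22 bytes long when the archive has no comment. The sketch below, a hypothetical helper named zip_eocd_at_end, assumes an archive without a trailing comment and uses only tail, od and tr.

```shell
#!/bin/sh
# zip_eocd_at_end FILE (hypothetical helper): succeed if FILE ends
# with a ZIP "end of central directory" record (signature 50 4b 05 06).
# Assumes no archive comment, so the record is exactly the last 22 bytes.
zip_eocd_at_end() {
    # Take the last 22 bytes, dump their first 4 as hex, drop spaces/newlines.
    sig=$(tail -c 22 "$1" | od -An -tx1 -N4 | tr -d ' \n')
    [ "$sig" = "504b0506" ]
}
```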

11 comments

Tuesday 09 December 2014 at 16:48 mirabilos said: #1

Please note that this is not “BSD tar”, but “bsdtar”, a piece of software used by the FreeBSD® family (also, DragonFly, MidnightBSD, etc.) for its tar implementation. The real BSDs (MirBSD, OpenBSD, NetBSD®, etc.) use “paxtar” (Debian package “pax”) instead.

Tuesday 09 December 2014 at 17:20 Tanguy said: #2

@mirabilos : Thank you, I have corrected my article accordingly.

Sunday 14 December 2014 at 13:09 pini said: #3

> ZIP [...] requires to store the full archive before being able to extract it

Actually you can use bsdtar to stream a zip archive extraction:

$ wget -qO- http://example.org/file.zip | bsdtar -xvf-

Credits: http://unix.stackexchange.com/a/125102

Monday 15 December 2014 at 11:01 Tanguy said: #4

@pini : Yes, you can use bsdtar to unzip in a stream, which can be useful for organizing your command line, but that will not provide the material advantages of streaming, as it will store the whole archive in memory before starting to extract it.

So my explanation still stands: ZIP is not a streamable format, as it does require storing the full archive before being able to extract it, even though bsdtar can hide that and fake a streaming usage by storing it in memory.

Tuesday 16 December 2014 at 23:11 pini said: #5

> it will store the whole archive in memory before starting to extract it.
Not true. It doesn't need to. The PKZIP format doesn't need this final index actually. It is there for redundancy. The very same information is available along the archive. That's why 'zip -FF' can repair a truncated archive.
Try this:
1- zip a folder with many files
2- split the archive in 2 pieces using 'split'
3- cat <first part> | bsdtar xf -
This last step will successfully extract what is stored in the archive until it encounters an unexpected EOF.

Thursday 12 May 2016 at 14:58 Romiras said: #6

@Tanguy : I found your article very useful for me.

Can you advise me on how to stream a ZIP archive as a TAR archive through a pipe, instead of unpacking the files?

Thursday 12 May 2016 at 15:15 Tanguy said: #7

@Romiras : That depends on what you have, and what you want to do exactly. Do you already have an archive, or do you want to create one? What do you want to do with it: transmit it to some remote host, allow people to download it, unpack it?

Thursday 12 May 2016 at 15:22 Romiras said: #8

@Tanguy : I want to convert a ZIP archive to a tarball on the fly, through a pipe. My ZIP file can be located remotely or locally. For example, if I read a remote ZIP with

wget -qO- http://example.org/file.zip | bsdtar -xvf-

it will extract the files locally. Instead, I need to pass the data through a pipe to additional filters, like nc, gzip or whatever.

Thursday 12 May 2016 at 16:01 Romiras said: #9

Finally I've found the way!
This worked for me:

wget -qO- http://example.org/file.zip | bsdtar -cf - @- > converted.tar

One can add further filters, for example:

cat file.zip | bsdtar -cf - @- | filter1 | filter2 | filter3

Thursday 12 May 2016 at 16:02 Tanguy said: #10

@Romiras : Well yes, this is exactly what my article was about!

Thursday 12 May 2016 at 16:07 Romiras said: #11

@Tanguy : Yes. Initially I thought the archive had to be extracted first, like:

cat file.zip | bsdtar -xf- | bsdtar_pack_it_to_tar

but there was no need to do so.
