03 September 2011

Repacking ZIP-based containers

Written by Tanguy

Classified in: Homepage, Debian, Command line, To remember


Several modern complex file formats are based on a ZIP container; this is the case for at least OpenDocument and EPUB. However, they are not simply a bunch of files packed into an archive: they follow some rules so that tools such as the file command can easily recognize them. As I had to unpack, modify and repack such a container, here is a recipe to do that.

The zip command is flexible enough to let you add, remove and replace files directly in an existing archive (although I am not sure it really works in place). However, I find it more convenient to work on the unpacked tree and repack it afterwards.

Unpack and work

To unpack, just use unzip, keeping in mind that these containers have no root directory (their files sit directly at the top level of the archive), so it is better to extract them into a dedicated directory.

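Modify the files as you wish. If you add, remove or replace files, remember that there is a file list to update accordingly: META-INF/manifest.xml for OpenDocument, and OEBPS/content.opf and possibly OEBPS/toc.ncx for EPUB (these are the usual EPUB locations; the actual path of the .opf file is given in META-INF/container.xml).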
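Since forgetting to update the file list is an easy mistake, here is a rough heuristic sketch for EPUB, assuming the OPF file lives at OEBPS/content.opf. The function name check_opf_manifest is hypothetical, and it does a plain substring search rather than parsing XML, so treat its output as a hint, not as a validation.

```shell
# check_opf_manifest DIR
# Hypothetical helper: a rough heuristic, not a validator.
# Assumes the OPF file is DIR/OEBPS/content.opf.
# Prints every file under DIR/OEBPS whose path (relative to OEBPS)
# does not appear anywhere in content.opf.
# The body runs in a subshell ( ... ) so that cd does not leak out.
check_opf_manifest() (
    cd "$1/OEBPS" || exit 1
    find . -type f ! -name content.opf | while IFS= read -r f; do
        f=${f#./}   # find prints ./ch1.xhtml, while hrefs say ch1.xhtml
        # -F: fixed-string (substring) match, no regular expressions
        grep -qF -- "$f" content.opf || echo "not in manifest: OEBPS/$f"
    done
)
```

Because it only matches substrings, URL-encoded hrefs can produce false alarms, and a short name such as 1.xhtml is wrongly considered listed whenever ch1.xhtml is.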

Repack

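To repack, you must first create a new ZIP archive containing only the mimetype file, stored uncompressed and without extra file attributes, so that it is the first entry. The purpose is to have its content visible as plain text at a fixed position, where it serves as a magic file type indication: the fixed part of a ZIP local file header is 30 bytes long, so without an extra field the name mimetype occupies bytes 30 to 37 and its content starts at byte 38. The remaining files can then be added compressed and in any order. For an EPUB, for instance, this gives the following commands, run from the unpacked directory: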

% zip --no-extra --compression-method store ../book.epub mimetype
% zip --recurse-paths ../book.epub META-INF OEBPS

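Make sure that ../book.epub does not exist beforehand: otherwise zip updates the existing archive instead of creating a new one, and files you removed from the tree would remain in it. This procedure could be automated by a dedicated script, but I have not felt the need to write one yet. By the way, these options can be abbreviated as -X0 (that is, -X -0) and -r.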

1 comment

Saturday 03 September 2011 at 19:18, Christoph Anton Mitterer said:

It would actually be nice to see a general compression-optimising tool or so... one which repacks [tar].bz2|gz|xz|zip etc. at the best possible quality and also detects and handles files like ODF/EPUB correctly.
