Like it or not, XML has been used everywhere, even in cases where a simpler text format would have been sufficient. Unfortunately, standard tools such as grep, sed or awk work line by line, which makes them poorly suited to XML, whose structure does not follow line boundaries. Let us take the following example:
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0">
<title>The Debian distribution</title>

<para>Debian is a free operating system, describing itself as “the
 universal operating system”. It is mostly known as a GNU/Linux
 distribution, but it also exist in other variants such as GNU/Hurd
 and GNU/kFreeBSD…</para>
</chapter>
PYX and xml2
There are at least two line-oriented alternative formats for XML:
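- PYX is an event-oriented format derived from the ESIS output of SGML parsers; XMLStarlet can convert XML to PYX (xmlstarlet pyx) and back (xmlstarlet p2x),
- xml2 is a tool that transforms XML into a flat, path-oriented format, where each line holds the full path of a node, an equals sign and the node's value.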
This is what our example looks like in PYX:
(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n\n
(para
-Debian is a free operating system, describing itself as 
-“the\n universal operating system”. It is mostly known as a GNU/Linux\n distribution, but it also exist in other variants such as GNU/Hurd\n and GNU/kFreeBSD…
)para
-\n
)chapter
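Because every PYX line starts with a one-character marker ("(" for a start tag, ")" for an end tag, "A" for an attribute, "-" for text), plain line tools can already filter it by event type. As a minimal sketch, assuming a POSIX shell and a trimmed copy of the PYX output above stored in a variable (so XMLStarlet is not needed to try it), this lists the element names in document order; element_names is a hypothetical helper name:

```shell
#!/bin/sh
# Trimmed copy of the PYX output above (the para text lines are omitted).
# Note: "\n" here is a literal backslash-n, exactly as PYX writes it.
pyx='(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n\n
(para
)para
-\n
)chapter'

# Hypothetical helper: keep only start-tag lines ("(" marker) and strip
# the marker, leaving one element name per line.
element_names() {
    grep '^(' | cut -c2-
}

printf '%s\n' "$pyx" | element_names
```

This prints chapter, title and para, one per line. The same approach works for the other markers, for example grep '^A' to list attributes.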
And in the xml2 format:
/chapter/@xmlns=http://docbook.org/ns/docbook
/chapter/@version=5.0
/chapter/title=The Debian distribution
/chapter/para=Debian is a free operating system, describing itself as “the
/chapter/para= universal operating system”. It is mostly known as a GNU/Linux
/chapter/para= distribution, but it also exist in other variants such as GNU/Hurd
/chapter/para= and GNU/kFreeBSD…
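With xml2, selecting a node comes down to matching its path at the start of the line. A minimal sketch, assuming a POSIX shell and the xml2 output above pasted into a variable; xml2_values is a hypothetical helper, and because it uses the path as a sed regular expression it assumes the path contains no regex metacharacters such as "." or "[":

```shell
#!/bin/sh
# The xml2 output above, pasted into a variable so the example runs
# without xml2 installed.
xml2_out='/chapter/@xmlns=http://docbook.org/ns/docbook
/chapter/@version=5.0
/chapter/title=The Debian distribution
/chapter/para=Debian is a free operating system, describing itself as “the
/chapter/para= universal operating system”. It is mostly known as a GNU/Linux
/chapter/para= distribution, but it also exist in other variants such as GNU/Hurd
/chapter/para= and GNU/kFreeBSD…'

# Hypothetical helper: print the value of every line whose path is
# exactly $1. "sed -n" suppresses default output, and the "p" flag prints
# only the lines where the "path=" prefix was actually removed.
xml2_values() {
    sed -n "s|^$1=||p"
}

printf '%s\n' "$xml2_out" | xml2_values /chapter/title
printf '%s\n' "$xml2_out" | xml2_values /chapter/para
```

The first command prints "The Debian distribution". Because xml2 repeats the path on every line of a multi-line text node, the second one returns the paragraph as four lines, mirroring the line breaks of the source.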
Examples of use
We want to extract the DocBook version number. This is hard to do reliably on the raw XML, but with xml2 the value sits on a line of its own:
$ xml2 < chapter.xml | grep '^/chapter/@version=' \
    | cut -d= -f2
5.0
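Two refinements may be worth knowing, sketched here on a copy of the relevant xml2 lines. First, grep and cut can be merged into a single sed call. Second, cut -d= -f2 keeps only the text between the first and second "=", so a value that itself contains "=" would be truncated; -f2- keeps everything after the first "=". The /doc/@url line is a hypothetical example, not part of chapter.xml:

```shell
#!/bin/sh
xml2_out='/chapter/@xmlns=http://docbook.org/ns/docbook
/chapter/@version=5.0'

# Equivalent to grep + cut: strip the path prefix and print only the
# lines where the substitution matched.
version=$(printf '%s\n' "$xml2_out" | sed -n 's|^/chapter/@version=||p')
echo "$version"    # prints 5.0

# Hypothetical xml2 line whose value contains "=".
line='/doc/@url=http://example.org/page?lang=en'
printf '%s\n' "$line" | cut -d= -f2     # prints http://example.org/page?lang
printf '%s\n' "$line" | cut -d= -f2-    # prints http://example.org/page?lang=en
```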
Next, we want to move the title into an info element, using PYX and sed:
$ xmlstarlet pyx chapter.xml | sed -e '/^(title$/i\
(info
/^)title$/a\
)info' | xmlstarlet p2x
<chapter version="5.0">
<info><title>The Debian distribution</title></info>
[…]
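The sed step only sees lines, so it can be tried on its own. A minimal sketch, assuming a POSIX shell and a trimmed copy of the PYX output (paragraph text omitted); wrap_title_in_info is a hypothetical name for the same sed script used in the command above. The i\ command inserts a line before each matching line, and a\ appends one after it:

```shell
#!/bin/sh
pyx='(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n\n
(para
)para
-\n
)chapter'

# Hypothetical helper wrapping the sed script from the command above:
# "i\" inserts "(info" before the title start tag and "a\" appends
# ")info" after the title end tag.
wrap_title_in_info() {
    sed -e '/^(title$/i\
(info
/^)title$/a\
)info'
}

printf '%s\n' "$pyx" | wrap_title_in_info
```

The result is still well-formed PYX, with "(info" and ")info" now surrounding the title events, which is why xmlstarlet p2x can turn it back into XML.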
We could go further, for instance by adding a keywords entry to that info element, but you get the idea: when you need to process XML reliably with line-oriented tools, try xmlstarlet pyx or xml2.
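Going one step further as suggested, keywords could be added inside the same info element. This is a sketch under assumptions: it uses DocBook 5's keywordset and keyword elements, the keyword value "Debian" is purely illustrative, and add_info_with_keywords is a hypothetical helper name. In POSIX sed, every line of multi-line a\ text except the last ends with a backslash:

```shell
#!/bin/sh
pyx='(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n\n
(para
)para
-\n
)chapter'

# Hypothetical helper: same idea as wrapping the title in info, but the
# text appended after the title end tag now opens and closes a
# keywordset (with one illustrative keyword) before ")info".
add_info_with_keywords() {
    sed -e '/^(title$/i\
(info
/^)title$/a\
(keywordset\
(keyword\
-Debian\
)keyword\
)keywordset\
)info'
}

# Pipe the result into "xmlstarlet p2x" to get XML back.
printf '%s\n' "$pyx" | add_info_with_keywords
```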