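Like it or not, XML is used everywhere, even in cases where a simpler line-oriented text format would have been sufficient. Unfortunately, standard tools such as grep, sed or awk are not well suited to XML: they work line by line, whereas line breaks carry no structural meaning in XML, so one element can span several lines and several elements can share one. Let us take the following example, saved as chapter.xml: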
<chapter
xmlns="http://docbook.org/ns/docbook" version="5.0">
<title>The Debian distribution</title>
<para>Debian is a free operating system, describing itself as “the
universal operating system”. It is mostly known as a GNU/Linux
distribution, but it also exist in other variants such as GNU/Hurd
and GNU/kFreeBSD…</para>
</chapter>
PYX and xml2
There are at least two line-oriented alternative formats for XML:
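- PYX is an event-oriented format derived from ESIS, the output format of SGML parsers; XMLStarlet can convert XML to PYX and back,
- xml2 is a tool that converts XML to a content-oriented format in which each line holds a path and a value (its companion 2xml converts it back).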
This is what our example would look like in PYX:
(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n\n
(para
-Debian is a free operating system, describing itself as
-“the\n universal operating system”. It is mostly known as a GNU/Linux\n distribution, but it also exist in other variants such as GNU/Hurd\n and GNU/kFreeBSD…
)para
-\n
)chapter
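Because every PYX line starts with a one-character event type ("(" for a start tag, ")" for an end tag, "A" for an attribute, "-" for text), plain grep and cut are enough to query it. The following is a minimal sketch, assuming a POSIX shell; the PYX stream is an abridged copy of the one above, typed in by hand so that it runs without XMLStarlet, and the function names are hypothetical:

```shell
# Abridged PYX for the example chapter (paragraph text shortened),
# inlined so the sketch runs without XMLStarlet. In practice it would
# come from: xmlstarlet pyx chapter.xml
pyx_sample() {
  cat <<'EOF'
(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n
(para
-Debian is a free operating system
)para
-\n
)chapter
EOF
}

# Start tags are the lines beginning with "(": strip that prefix to get
# element names in document order.
pyx_elements() {
  grep '^(' | cut -c2-
}

# Attributes are the lines beginning with "A", in the form "name value".
pyx_attributes() {
  grep '^A' | cut -c2-
}

pyx_sample | pyx_elements     # chapter, title, para (one per line)
pyx_sample | pyx_attributes   # version 5.0
```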
And in the xml2 format:
/chapter/@xmlns=http://docbook.org/ns/docbook
/chapter/@version=5.0
/chapter/title=The Debian distribution
/chapter/para=Debian is a free operating system, describing itself as “the
/chapter/para= universal operating system”. It is mostly known as a GNU/Linux
/chapter/para= distribution, but it also exist in other variants such as GNU/Hurd
/chapter/para= and GNU/kFreeBSD…
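In the xml2 format, everything before the first "=" is a path, so a quick outline of a document is one cut, sort and uniq away. Here is a small sketch, assuming a POSIX shell; the xml2 output shown above is pasted in by hand so it runs without xml2 installed, and xml2_sample and xml2_paths are hypothetical names:

```shell
# The xml2 output of the example chapter, inlined for the demo.
# In practice it would come from: xml2 < chapter.xml
xml2_sample() {
  cat <<'EOF'
/chapter/@xmlns=http://docbook.org/ns/docbook
/chapter/@version=5.0
/chapter/title=The Debian distribution
/chapter/para=Debian is a free operating system, describing itself as “the
/chapter/para= universal operating system”. It is mostly known as a GNU/Linux
/chapter/para= distribution, but it also exist in other variants such as GNU/Hurd
/chapter/para= and GNU/kFreeBSD…
EOF
}

# Keep the path (text before the first "="), then count how often each
# path occurs. LC_ALL=C makes the sort order independent of the locale.
xml2_paths() {
  cut -d= -f1 | LC_ALL=C sort | uniq -c
}

# /chapter/para shows up 4 times: xml2 emits one line per line of text.
xml2_sample | xml2_paths
```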
Examples of use
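Suppose we want to extract the DocBook version number. This is hard to do reliably with grep on the raw XML, since the attribute can sit on any line of the start tag (here it is on the second line of <chapter>), but with xml2 it gets a line of its own: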
$ xml2 < chapter.xml | grep '^/chapter/@version=' \
| cut -d= -f2
5.0
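The same pipeline works for any path. One caveat: cut -d= -f2 keeps only the text up to the next "=", so a value that itself contains "=" (a URL with a query string, say) would be truncated; cut -d= -f2- avoids that. The sketch below, assuming a POSIX shell and awk, does the same with an exact prefix match instead of a regular expression; xml2_get is a hypothetical helper name:

```shell
# Print the value(s) stored under one exact path. Everything after the
# first "=" is kept, so values containing "=" are not truncated.
xml2_get() {
  awk -v path="$1" 'index($0, path "=") == 1 { print substr($0, length(path) + 2) }'
}

# A few xml2 lines from the example, inlined for the demo. In practice:
#   xml2 < chapter.xml | xml2_get /chapter/@version
printf '%s\n' \
  '/chapter/@xmlns=http://docbook.org/ns/docbook' \
  '/chapter/@version=5.0' \
  '/chapter/title=The Debian distribution' |
  xml2_get /chapter/@version   # prints 5.0
```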
Now suppose we want to move the title into an info element, using PYX:
$ xmlstarlet pyx chapter.xml | sed -e '/^(title$/i\
(info
/^)title$/a\
)info' | xmlstarlet p2x
<chapter version="5.0">
<info><title>The Debian distribution</title></info>
[…]
We could go further, for instance by adding a keywords entry to that info element, but you get the idea: when you need to process XML reliably with line-oriented tools, try xmlstarlet pyx or xml2.
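As an illustration of going further, here is a sketch, assuming a POSIX shell and sed, that extends the sed edit above so that a DocBook keywordset is appended right after the title, inside info. The helper name wrap_info and the keyword "Debian" are hypothetical; the PYX input is an abridged copy of the example, inlined so the demo runs without XMLStarlet:

```shell
# Wrap the title in <info> and append a keywordset after it, as PYX
# events. Each line of inserted text except the last ends with "\".
wrap_info() {
  sed '/^(title$/i\
(info
/^)title$/a\
(keywordset\
(keyword\
-Debian\
)keyword\
)keywordset\
)info'
}

# Abridged PYX for the example chapter (paragraph text shortened).
pyx_sample() {
  cat <<'EOF'
(chapter
Aversion 5.0
-\n
(title
-The Debian distribution
)title
-\n
(para
-Debian is a free operating system
)para
-\n
)chapter
EOF
}

# In practice: xmlstarlet pyx chapter.xml | wrap_info | xmlstarlet p2x
pyx_sample | wrap_info
```

Working on the PYX stream keeps each edit to a single anchored line match, so the sed script does not need to know how the original XML was laid out.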