I have a huge XML document holding the metadata for Bushman drawings that were compiled in the late 19th century and recently digitised. I am trying to generate a view of part of that information, but to do so I need to rearrange the structure of some XML elements.
Oh, and YES! I do know that this can potentially be done seamlessly in XSLT 2.0 [1], but I want to generalise my solution and not restrict myself to one particular XSLT processor; Saxon [2] is the only processor I know of that supports XSLT 2.0. Just so you know, I am working with xsltproc [3] on Ubuntu 11.04, but in theory this should work on any platform with any XSLT 1.0 processor. My categories are structured as shown below.
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <categories>
    <category>
      <id>7</id>
      <name>Plants and animals</name>
    </category>
    <category>
      <id>7</id>
      <name>History (personal)</name>
    </category>
    <category>
      <id>8</id>
      <name>Artefact and dress</name>
    </category>
  </categories>
</data>
What I want to do is rearrange the categories so that they end up with the structure below. In other words, I would like to list each category name once and group under it all the ids that belong to it.
<categories>
  <category>
    <name>plants and animals</name>
    <id>2</id>
    <id>5</id>
    <id>7</id>
  </category>
</categories>
A quick check on the Web led me to Jeni’s XML Pages [4] and dpawson’s website [5], both of which describe the Muenchian method of grouping.
Solution
The solution takes a two-step approach: first, identify the categories that exist; second, extract all the ids that fall under each individual category. In essence, I want to group all ids by category. I came up with the XSLT stylesheet below.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every category element by the text of its name child -->
  <xsl:key name="ids-by-category" match="category" use="name" />
  <xsl:output method="xml" indent="yes" encoding="utf-8" />
  <xsl:template match="/data">
    <xsl:apply-templates select="categories" />
  </xsl:template>
  <xsl:template match="categories">
    <categories>
      <!-- Step 1: keep only the first category element for each distinct name -->
      <xsl:for-each select="category[count(. | key('ids-by-category', name)[1]) = 1]">
        <xsl:sort select="name" />
        <category>
          <name>
            <xsl:value-of select="name" />
          </name>
          <!-- Step 2: list every id sharing this name, in numeric order -->
          <xsl:for-each select="key('ids-by-category', name)">
            <xsl:sort select="id" data-type="number" />
            <id>
              <xsl:value-of select="id" />
            </id>
          </xsl:for-each>
        </category>
      </xsl:for-each>
    </categories>
  </xsl:template>
</xsl:stylesheet>
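The predicate count(. | key('ids-by-category', name)[1]) = 1 is the part that picks out one representative per category: the union of the current node and the first node the key returns for its name contains a single node only when the two are the same node. As far as I can tell, the same test can also be written with generate-id(), which some people find easier to read. The stylesheet below is a minimal sketch of that variant, assuming the same input structure and the same key as above; it is an alternative spelling of the test, not a different technique.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Same key as above: index every category element by its name -->
  <xsl:key name="ids-by-category" match="category" use="name" />
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <xsl:template match="/data">
    <xsl:apply-templates select="categories" />
  </xsl:template>

  <xsl:template match="categories">
    <categories>
      <!-- Keep a category only if it is the first node the key returns for its name -->
      <xsl:for-each select="category[generate-id() = generate-id(key('ids-by-category', name)[1])]">
        <xsl:sort select="name" />
        <category>
          <name>
            <xsl:value-of select="name" />
          </name>
          <!-- Emit every id that shares this name, in numeric order -->
          <xsl:for-each select="key('ids-by-category', name)">
            <xsl:sort select="id" data-type="number" />
            <id>
              <xsl:value-of select="id" />
            </id>
          </xsl:for-each>
        </category>
      </xsl:for-each>
    </categories>
  </xsl:template>
</xsl:stylesheet>
```

Both forms rely only on XSLT 1.0 and XPath 1.0 features, so either should run under xsltproc [3] as well as other XSLT 1.0 processors.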
The resulting XML document has the structure below.
<?xml version="1.0" encoding="utf-8"?>
<categories>
  <category>
    <name>Artefact and dress</name>
    <id>8</id>
    <id>16</id>
    <id>18</id>
    <id>39</id>
    <id>57</id>
    <id>85</id>
    <id>118</id>
    <id>200</id>
    <id>257</id>
    <id>339</id>
    <id>346</id>
    <id>365</id>
    <id>391</id>
References
[1] XSL Transformations (XSLT) Version 2.0, (N.D.) Retrieved October 13, 2011, from W3C: http://www.w3.org/TR/xslt20/
[2] SAXON – The XSLT and XQuery Processor, (N.D.) Retrieved October 13, 2011, from Sourceforge: http://saxon.sourceforge.net/
[3] XSLTProc, (N.D.) Retrieved October 13, 2011, from XMLSoft: http://xmlsoft.org/XSLT/xsltproc.html
[4] Grouping Using the Muenchian Method, (N.D.) Retrieved October 13, 2011, from Jeni’s XML Pages: http://jenitennison.com/xslt/grouping/muenchian.xml
[5] Special Techniques, (N.D.) Retrieved October 13, 2011, from XSLT FAQ. DocBook FAQ. Braille: http://www.dpawson.co.uk/xsl/sect2/muench.html

