I have a huge XML document containing metadata for Bushman drawings that were compiled in the late 19th century and recently digitised. I am trying to generate a view of part of that information, but I need to rearrange the structure of some XML elements. And yes, I do know that this can be done far more easily in XSLT 2.0 [1], but I want to generalise my solution and not tie myself to a particular XSLT processor: Saxon [2] is the only XSLT 2.0 processor I am aware of. For the record, I am working with xsltproc [3] on Ubuntu 11.04, but in theory this should work on any platform with any XSLT 1.0 processor. My categories are structured as shown below.
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <categories>
    <category>
      <id>7</id>
      <name>Plants and animals</name>
    </category>
    <category>
      <id>7</id>
      <name>History (personal)</name>
    </category>
    <category>
      <id>8</id>
      <name>Artefact and dress</name>
    </category>
  </categories>
</data>
What I want to do is rearrange the categories so that each category name appears once, with all the ids that fall under it grouped beneath it, ending up with the structure below.
<categories>
  <category>
    <name>Plants and animals</name>
    <id>2</id>
    <id>5</id>
    <id>7</id>
  </category>
</categories>
A quick check on the Web led me to Jeni’s XML Pages [4] and dpawson’s website [5], both of which describe the Muenchian method for grouping in XSLT 1.0.
Solution
The solution is a two-step approach:
1. Identify the categories that exist.
2. Extract all the ids that fall under each individual category.
In essence, I want to group all ids by category. I came up with the XSLT stylesheet below.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every category element by its name -->
  <xsl:key name="ids-by-category" match="category" use="name" />
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <xsl:template match="/data">
    <xsl:apply-templates select="categories" />
  </xsl:template>

  <xsl:template match="categories">
    <categories>
      <!-- Step 1: keep only the first category element for each distinct name -->
      <xsl:for-each select="category[count(. | key('ids-by-category', name)[1]) = 1]">
        <xsl:sort select="name" />
        <category>
          <name>
            <xsl:value-of select="name" />
          </name>
          <!-- Step 2: list the id of every category element that shares this name -->
          <xsl:for-each select="key('ids-by-category', name)">
            <xsl:sort select="id" data-type="number" />
            <id>
              <xsl:value-of select="id" />
            </id>
          </xsl:for-each>
        </category>
      </xsl:for-each>
    </categories>
  </xsl:template>
</xsl:stylesheet>
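The first-in-group test can also be written with generate-id() instead of count(); both forms appear in the usual write-ups of the Muenchian method. The following is only a sketch of that variant, assuming the same input layout and the same key name as the stylesheet above. The single root template is my own simplification; the only substantive change is the select expression of the outer xsl:for-each.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <!-- Same key as before: look up category elements by their name -->
  <xsl:key name="ids-by-category" match="category" use="name" />

  <xsl:template match="/">
    <xsl:apply-templates select="data/categories" />
  </xsl:template>

  <xsl:template match="categories">
    <categories>
      <!-- A category is kept only if it is the first node the key returns for its name -->
      <xsl:for-each select="category[generate-id() = generate-id(key('ids-by-category', name)[1])]">
        <xsl:sort select="name" />
        <category>
          <name><xsl:value-of select="name" /></name>
          <!-- All category elements sharing this name contribute one id each -->
          <xsl:for-each select="key('ids-by-category', name)">
            <xsl:sort select="id" data-type="number" />
            <id><xsl:value-of select="id" /></id>
          </xsl:for-each>
        </category>
      </xsl:for-each>
    </categories>
  </xsl:template>
</xsl:stylesheet>
```

Which form to use is mostly a matter of taste: count(. | X) = 1 checks that the union of the current node and X is a single node, while the generate-id() comparison states the same identity check more directly.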
The resulting XML document has the structure below (excerpt).
<?xml version="1.0" encoding="utf-8"?>
<categories>
  <category>
    <name>Artefact and dress</name>
    <id>8</id>
    <id>16</id>
    <id>18</id>
    <id>39</id>
    <id>57</id>
    <id>85</id>
    <id>118</id>
    <id>200</id>
    <id>257</id>
    <id>339</id>
    <id>346</id>
    <id>365</id>
    <id>391</id>
    ...
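For comparison, here is roughly what the XSLT 2.0 route mentioned at the start would look like, using xsl:for-each-group. Treat it as a sketch under the same assumptions about the input layout; it needs an XSLT 2.0 processor such as Saxon, and xsltproc, which only implements XSLT 1.0, will not run it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <xsl:template match="/">
    <categories>
      <!-- One iteration per distinct category name; no key is needed -->
      <xsl:for-each-group select="data/categories/category" group-by="name">
        <xsl:sort select="current-grouping-key()" />
        <category>
          <name><xsl:value-of select="current-grouping-key()" /></name>
          <!-- current-group() holds every category element with this name -->
          <xsl:for-each select="current-group()">
            <xsl:sort select="id" data-type="number" />
            <id><xsl:value-of select="id" /></id>
          </xsl:for-each>
        </category>
      </xsl:for-each-group>
    </categories>
  </xsl:template>
</xsl:stylesheet>
```

The grouping logic is the same as in the Muenchian version; XSLT 2.0 simply builds the grouping into the language, which is why the 1.0 stylesheet needs the key and the first-in-group test.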
References
[1] XSL Transformations (XSLT) Version 2.0. (n.d.). Retrieved October 13, 2011, from W3C: http://www.w3.org/TR/xslt20/
[2] SAXON: The XSLT and XQuery Processor. (n.d.). Retrieved October 13, 2011, from SourceForge: http://saxon.sourceforge.net/
[3] xsltproc. (n.d.). Retrieved October 13, 2011, from XMLSoft: http://xmlsoft.org/XSLT/xsltproc.html
[4] Grouping Using the Muenchian Method. (n.d.). Retrieved October 13, 2011, from Jeni’s XML Pages: http://jenitennison.com/xslt/grouping/muenchian.xml
[5] Special Techniques. (n.d.). Retrieved October 13, 2011, from XSLT FAQ. DocBook FAQ. Braille: http://www.dpawson.co.uk/xsl/sect2/muench.html