Muenchian Grouping Method – XSLT 1.0

I have a huge XML document holding the metadata for bushman drawings that were compiled in the late 19th century and recently digitised. I am trying to generate a view of part of that information, but to do so I need to rearrange the structure of some XML elements. And yes, I do know that this can be done far more easily in XSLT 2.0 [1], but I want a general solution that does not tie me to a particular XSLT processor; Saxon [2] is the only processor I know of that supports XSLT 2.0. For the record, I am working with xsltproc [3] on Ubuntu 11.04, but in theory this should work on any platform with any XSLT 1.0 processor. My categories are structured as shown below.

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <categories>
    <category>
      <id>7</id>
      <name>Plants and animals</name>
    </category>
    <category>
      <id>7</id>
      <name>History (personal)</name>
    </category>
    <category>
      <id>8</id>
      <name>Artefact and dress</name>
    </category>
  </categories>
</data>

I want to rearrange the categories so that they end up with the structure below. In other words, I want to group all ids under the category they belong to.

<categories>
   <category>
     <name>Plants and animals</name>
     <id>2</id>
     <id>5</id>
     <id>7</id>
   </category>
</categories>
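
For comparison, the XSLT 2.0 route mentioned earlier could look roughly like the sketch below. It is only an illustration, assuming a 2.0 processor such as Saxon and the same input layout as above; it will not run under xsltproc, which implements XSLT 1.0 only.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <xsl:template match="/data">
    <categories>
      <!-- XSLT 2.0 only: form one group per distinct category name -->
      <xsl:for-each-group select="categories/category" group-by="name">
        <xsl:sort select="current-grouping-key()" />
        <category>
          <name><xsl:value-of select="current-grouping-key()" /></name>
          <!-- current-group() holds every category element in this group -->
          <xsl:for-each select="current-group()">
            <xsl:sort select="id" data-type="number" />
            <id><xsl:value-of select="id" /></id>
          </xsl:for-each>
        </category>
      </xsl:for-each-group>
    </categories>
  </xsl:template>
</xsl:stylesheet>
```

The grouping is a single instruction here, which is the convenience being given up by staying on XSLT 1.0.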

A quick check on the Web led me to the Muenchian method, as described on Jeni’s XML Pages [4] and dpawson’s website [5].

Solution

The solution involves two steps:

1. Identifying the categories that exist.
2. Extracting all the ids that fall under each individual category.

In essence, I want to group all ids by category. I came up with the XSLT stylesheet below.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every category element by its name so key() can fetch a whole group -->
  <xsl:key name="ids-by-category" match="category" use="name" />
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <xsl:template match="/data">
    <xsl:apply-templates select="categories" />
  </xsl:template>

  <xsl:template match="categories">
    <categories>
      <!-- Step 1: visit only the first category element for each distinct name -->
      <xsl:for-each select="category[count(. | key('ids-by-category', name)[1]) = 1]">
        <xsl:sort select="name" />
        <category>
          <name>
            <xsl:value-of select="name" />
          </name>
          <!-- Step 2: output the id of every category element sharing that name -->
          <xsl:for-each select="key('ids-by-category', name)">
            <xsl:sort select="id" data-type="number" />
            <id>
              <xsl:value-of select="id" />
            </id>
          </xsl:for-each>
        </category>
      </xsl:for-each>
    </categories>
  </xsl:template>
</xsl:stylesheet>
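
The predicate in the outer for-each is the heart of the method. The union `. | key('ids-by-category', name)[1]` contains a single node only when the current category is itself the first node the key returns for its name, so `count(...) = 1` keeps exactly one representative per name. The same test is often written with generate-id() instead. The variant below is a sketch of that alternative, not a different algorithm; it folds the two templates into one for brevity and should give the same output.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <!-- Same key as above: look up category elements by their name -->
  <xsl:key name="ids-by-category" match="category" use="name" />

  <xsl:template match="/data">
    <categories>
      <!-- Keep a category only if it is the first node the key returns for its name -->
      <xsl:for-each select="categories/category[generate-id() = generate-id(key('ids-by-category', name)[1])]">
        <xsl:sort select="name" />
        <category>
          <name><xsl:value-of select="name" /></name>
          <!-- Emit every id recorded under this name, in numeric order -->
          <xsl:for-each select="key('ids-by-category', name)">
            <xsl:sort select="id" data-type="number" />
            <id><xsl:value-of select="id" /></id>
          </xsl:for-each>
        </category>
      </xsl:for-each>
    </categories>
  </xsl:template>
</xsl:stylesheet>
```

Both forms compare node identity rather than string values, which is why two category elements with the same name are still told apart.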

The start of the resulting XML document is shown below.

<?xml version="1.0" encoding="utf-8"?>
<categories>
  <category>
    <name>Artefact and dress</name>
    <id>8</id>
    <id>16</id>
    <id>18</id>
    <id>39</id>
    <id>57</id>
    <id>85</id>
    <id>118</id>
    <id>200</id>
    <id>257</id>
    <id>339</id>
    <id>346</id>
    <id>365</id>
    <id>391</id>

References

[1] XSL Transformations (XSLT) Version 2.0, (n.d.) Retrieved October 13, 2011, from W3C: http://www.w3.org/TR/xslt20/
[2] SAXON – The XSLT and XQuery Processor, (n.d.) Retrieved October 13, 2011, from Sourceforge: http://saxon.sourceforge.net/
[3] XSLTProc, (n.d.) Retrieved October 13, 2011, from XMLSoft: http://xmlsoft.org/XSLT/xsltproc.html
[4] Grouping Using the Muenchian Method, (n.d.) Retrieved October 13, 2011, from Jeni’s XML Pages: http://jenitennison.com/xslt/grouping/muenchian.xml
[5] Special Techniques, (n.d.) Retrieved October 13, 2011, from XSLT FAQ. DocBook FAQ. Braille: http://www.dpawson.co.uk/xsl/sect2/muench.html
