Transforming OS X Plists Into XML


The challenge this week: convert recipe data from an app’s proprietary plist format to another XML format, using XSLT and PHP.

About the challenge

I am slowly building a recipe manager web app, and as a first step I am converting all the data I have to a common XML format I defined in part 1. Some of the data was stored in a cute-looking, but ultimately disappointing, OS X app called Yummy Soup!. I need to liberate that data, which the app can export as an OS X plist, which is an XML format.

Some artefacts available on GitHub.

Converting from YummySoup! plist format to XML

Feature: Converting from YummySoup! plist format to XML
    In order to consolidate my recipe data
    As a geeky recipe author
    I need to convert from plist to XML

Converting from one type of XML to another is a job for XSLT, and Apache Ant is what I normally run XSLT with. But first of all, I need to get the YS! recipes. I can only export them all to a single file, by opening the app, selecting all the recipes in it, and choosing ‘export’. That created a plist with 239 dict nodes in it, each of them a recipe.

Using Apache Ant to run an XSLT job on a file

Scenario: Using Apache Ant to run an XSLT job on a file
    Given that I have an XSLT and an XML file
    When I run the extractFromYS Ant job
    Then a new XML file should be produced
    And it should include data from the input file

This is easily achieved using my standard Ant setup:

<project name="WeeklyChallenge12" default="extractFromYS" basedir=".">
  <description>Gotofritz Weekly Challenge 15</description>

  <property environment="env"/>
  <property file="common.properties"/>
  <property file="local.properties"/>

  <target
    name = "extractFromYS"
    description = "Extracts recipes from Yummy Soup"
    depends = "increaseBuildNumber"
    >
    <xslt
      in    = "239_YummySoup_recipes.ysr"
      out   = "output.xml"
      style = "recipes.xsl">
      <outputproperty name="method" value="xml"/>
      <outputproperty name="standalone" value="yes"/>
      <outputproperty name="encoding" value="utf-8"/>
      <outputproperty name="indent" value="yes"/>
    </xslt>
  </target>

  <target
    name = "increaseBuildNumber" >
    <propertyfile
      file    = "buildnumber.txt"
      comment = "Version number">
      <entry  key="build" type="int" default="0" operation="+"/>
    </propertyfile>
  </target>


</project>

Normally I split property files between common and local (i.e., those that go under version control and those that don’t). In this case I need neither, but Ant will carry on working even if it doesn’t find the files, which is nice. The job extractFromYS is set as default for the project, which allows me to run it from Sublime Text 2 without doing any special work - just choose ‘Ant’ as the build system, then hit CMD-B.

The target increaseBuildNumber is a standard one I use which increases a number in a text file every time I run the job. This is often useful, although probably not this time, but I am happy to keep it for consistency with my other projects. Finally the XSLT job - as simple as it gets, with a file in, file out, and XSLT file, plus some output parameters.

The XSL file is also as simple as it can be:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:preserve-space elements="*"/>
<xsl:output indent="yes" cdata-section-elements="title description source step"/>

<!-- Print a separator and the value of the second string node of every recipe dict -->
<xsl:template match="/plist/array/dict/string[2]">
------------------
  <xsl:value-of select="."/>
</xsl:template>

</xsl:stylesheet>

What I was looking for was one string per recipe, to check that the XSLT runs and recognizes them. It did.

Transforming plist into XML

Scenario: Transforming plist into XML
    Given that an OS X plist document is being XSL transformed
    And key value pairs are stored as <key>KEY</key><string>VALUE</string>
    When a node is passed through a template
    And KEY is passed as a parameter
    Then the associated VALUE should be returned

The XML schema of plists is rather awkward for XSLT, so I created a template for handling it. Note that ‘string’ is only one of the possible plist types, but it’s the only one I need in this particular case. I started by fetching only the name of the recipe as a test.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:preserve-space elements="*"/>
<xsl:output indent="yes" cdata-section-elements="title description source step"/>

<!-- Entry point: hand every recipe dict to its own template -->
<xsl:template match="/">
  <xsl:copy>
    <xsl:apply-templates select="/plist/array/dict"/>
  </xsl:copy>
</xsl:template>

<!-- Returns the string value that follows a plist key node -->
<xsl:template name="val">
    <xsl:param name="node" />
    <xsl:value-of select="$node/following-sibling::string[1]" />
</xsl:template>

<!-- Once per recipe -->
<xsl:template match="/plist/array/dict">

-------------------------

<xsl:processing-instruction name="xml-stylesheet">
href="recipe.xsl" type="text/xsl"
</xsl:processing-instruction>

<recipe lang="en-uk">
    <title><xsl:call-template name="val"><xsl:with-param name="node" select="*[. = 'name']"/></xsl:call-template></title>
    <description><xsl:call-template name="val"><xsl:with-param name="node" select="*[. = 'recipeDescription']"/></xsl:call-template></description>
</recipe>
</xsl:template>

</xsl:stylesheet>

The xsl:template match="/" matches the whole document and is the entry point that controls which other templates get to handle which nodes. Every time it encounters a dict node in the source XML document, which is once per recipe, it hands control to xsl:template match="/plist/array/dict". There a new XML document is appended to the output, separated by a row of dashes. This is not valid XML of course, but I will split it later.

xsl:processing-instruction name="xml-stylesheet" adds the stylesheet reference to the output - you can’t just type <?xml-stylesheet ... ?> into the XSL file, otherwise it is treated as an instruction for the XSL file itself rather than as content to be output.

Then a couple of tags are mapped from one file to another: ‘name’ becomes ‘title’, ‘recipeDescription’ becomes ‘description’ and so on. A named template, val, is used to convert from <key>name</key><string>Mulligatawny</string> to Mulligatawny. This is easily achieved using following-sibling to fetch the value associated with a plist key.

Another thing worth noting is the cdata-section-elements="title description source step" attribute on the xsl:output node. That defines a list of elements whose content will be wrapped in a CDATA section, which is nice.
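To try the val lookup in isolation, here is a minimal sketch that runs a cut-down version of the stylesheet through PHP's XSLTProcessor. It assumes the PHP xsl extension is installed; the two-key sample recipe and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample data, shaped like the key/string pairs of the plist export.
$plist = <<<'XML'
<plist version="1.0"><array><dict>
  <key>name</key><string>Mulligatawny</string>
  <key>recipeDescription</key><string>A spicy soup</string>
</dict></array></plist>
XML;

// Cut-down stylesheet: only the val template and the name -> title mapping.
// A nowdoc (<<<'XSL') is used so PHP leaves $node alone.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template name="val">
  <xsl:param name="node"/>
  <xsl:value-of select="$node/following-sibling::string[1]"/>
</xsl:template>
<xsl:template match="/plist/array/dict">
  <title><xsl:call-template name="val"><xsl:with-param name="node" select="*[. = 'name']"/></xsl:call-template></title>
</xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($plist);
$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);
$out = $proc->transformToXml($source);
echo $out;
```

The key node is selected by its text, and following-sibling::string[1] then picks the value that sits right after it, so the output should contain a title element holding Mulligatawny.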

Splitting string with XSL built in string functions

Scenario: Splitting string with XSL built in string functions
    Given that cuisine is stored as a string in format "STYLE / REGION"
    When I pass it through the template
    Then it should return a node for style, and one for region
    And it should be equal to "<style>STYLE</style><region>REGION</region>"
    Given that cuisine is stored as a string in format "STYLE"
    And it has no REGION
    When I pass it through the template
    Then it should return a node for style, and an empty one for region
    And it should be equal to "<style>STYLE</style><region></region>"

There are two use cases here - one deals with strings like ‘Italian / Sardinian’ or ‘Indian / Kerala’, the other with ‘English’, ‘Spanish’ etc. First the ‘val’ template is called, to save the ‘cuisine’ string to a variable, then I use a simple xsl:choose and some of XSL’s built-in string functions to handle both use cases.

<xsl:template match="/plist/array/dict">
    <xsl:variable name="cuisine"><xsl:call-template name="val"><xsl:with-param name="node" select="*[. = 'cuisine']"/></xsl:call-template></xsl:variable>
    ...

<xsl:processing-instruction name="xml-stylesheet">
href="recipe.xsl" type="text/xsl"
</xsl:processing-instruction>

<recipe lang="en-uk">
    ...
    <cuisine>
        <xsl:choose>
            <xsl:when test="contains( $cuisine, '/')">
                <style><xsl:value-of select="normalize-space( substring-before( $cuisine, '/' ) )" /></style>
                <region><xsl:value-of select="normalize-space( substring-after( $cuisine, '/') )" /></region>
            </xsl:when>
            <xsl:otherwise>
                <style><xsl:value-of select="$cuisine" /></style>
                <region></region>
            </xsl:otherwise>
        </xsl:choose>
        <approach></approach>
    </cuisine>
    ...
</recipe>

Splitting string with XSL using a recursive function

Scenario: Splitting string with XSL using a recursive function
    Given that tags are stored as a single comma separated string
    When I pass it through the splitstring template
    Then it should return a node for each tag

A comma separated list can have any number of items, so a recursive template called splitstring will be used. It is a generic template that can be reused in other projects - it takes three parameters: the input string, the delimiter and the output tag to be generated. The string to be passed in is saved to a variable with the val template.

<!--
splitstring - splits a string and assigns each substring to a node

@param {String} list the string to be split
@param {String} delimiter the string to split by, [optional  default ,]
@param {String} tag the name of the tag to be created, [optional  default 'tag']
-->
<xsl:template name="splitstring">
    <xsl:param name="list" />
    <xsl:param name="delimiter" select="','"/>
    <xsl:param name="tag" select="'tag'"/>
    <!-- make sure the string holds at least one delimiter, so substring-before always finds something -->
    <xsl:variable name="newstring"><xsl:choose>
        <xsl:when test="contains( $list, $delimiter )"><xsl:value-of select="normalize-space($list)" /></xsl:when>
        <xsl:otherwise><xsl:value-of select="concat(normalize-space($list), $delimiter)" /></xsl:otherwise>
    </xsl:choose></xsl:variable>
    <xsl:variable name="first" select="substring-before($newstring, $delimiter)" />
    <xsl:variable name="remaining" select="substring-after($newstring, $delimiter)" />
    <xsl:element name="{$tag}"><xsl:value-of select="$first" /></xsl:element>
    <!-- recurse on whatever is left after the first delimiter; the parameter name must match 'list' -->
    <xsl:if test="$remaining">
        <xsl:call-template name="splitstring">
            <xsl:with-param name="list" select="$remaining" />
            <xsl:with-param name="delimiter" select="$delimiter" />
            <xsl:with-param name="tag" select="$tag" />
        </xsl:call-template>
    </xsl:if>
</xsl:template>

<xsl:template match="/plist/array/dict">
    <xsl:variable name="tags"><xsl:call-template name="val"><xsl:with-param name="node" select="*[. = 'keywords']"/></xsl:call-template></xsl:variable>
    ...

<xsl:processing-instruction name="xml-stylesheet">
href="recipe.xsl" type="text/xsl"
</xsl:processing-instruction>

<recipe lang="en-uk">
...
    <tags>
        <xsl:call-template name="splitstring"><xsl:with-param name="list" select="$tags"/></xsl:call-template>
    </tags>
...
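The recursion can be checked outside Ant with a small harness. This is only a sketch: it assumes the PHP xsl extension is installed, passes the remainder of the string back in through the list parameter, and uses a made-up keywords element and wrapper template as input.

```php
<?php
// The splitstring template plus a hypothetical wrapper that feeds it
// the text of a <keywords> element. A nowdoc keeps {$tag} away from PHP.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template name="splitstring">
    <xsl:param name="list"/>
    <xsl:param name="delimiter" select="','"/>
    <xsl:param name="tag" select="'tag'"/>
    <xsl:variable name="newstring">
        <xsl:choose>
            <xsl:when test="contains($list, $delimiter)"><xsl:value-of select="normalize-space($list)"/></xsl:when>
            <xsl:otherwise><xsl:value-of select="concat(normalize-space($list), $delimiter)"/></xsl:otherwise>
        </xsl:choose>
    </xsl:variable>
    <xsl:variable name="first" select="substring-before($newstring, $delimiter)"/>
    <xsl:variable name="remaining" select="substring-after($newstring, $delimiter)"/>
    <xsl:element name="{$tag}"><xsl:value-of select="$first"/></xsl:element>
    <xsl:if test="$remaining">
        <xsl:call-template name="splitstring">
            <xsl:with-param name="list" select="$remaining"/>
            <xsl:with-param name="delimiter" select="$delimiter"/>
            <xsl:with-param name="tag" select="$tag"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>

<xsl:template match="/keywords">
    <tags>
        <xsl:call-template name="splitstring">
            <xsl:with-param name="list" select="string(.)"/>
        </xsl:call-template>
    </tags>
</xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML('<keywords>soup, indian, spicy</keywords>'); // made-up tags
$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);
$out = $proc->transformToXml($source);
echo $out;
```

Each call emits one element for the text before the first delimiter and recurses on the rest, so the sample input should yield a tags element with three tag children: soup, indian and spicy.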

Using XSL extension to process tricky strings

Scenario: Using XSL extension to process tricky strings
    Given that ingredient data in XML node is too complex for XSLT
    And directions are not split in individual steps
    When I run the XSLT
    Then I want to process the list with a different language

I have two issues here. Firstly, the directions are split into steps in various different ways, none of them useful, depending on which version of the app they were created in. This is part of what I found so annoying with the software. The ingredients list is more regular, but it is a Python list of tuples, where a tuple either signals the start of a new group of ingredients, or it is a single ingredient.

The first case could be solved with a few regular expressions. XSLT 2 has them, but I am still on 1 so no joy there. Either way, the second case is simply too complex for XSLT, so I need to look at ways to bring other languages into the equation.

XSLT can be extended with various languages - Java as expected, but also Python (or rather Jython, the Java implementation thereof), and JavaScript. Alternatively, when running the transform through PHP, the XSLT processor there is able to use PHP functions to process nodes.

I first tried Jython - the ingredients list is (almost) valid Python data, so I thought it should be easy. Then I tried JavaScript, because I know it quite well. But I couldn’t get either to work - it looks like Xalan kept treating the scripts as Java instead of using the lang attribute. I posted on StackOverflow and hope that will help.

I had slightly more luck with extending XSLT with Java, but only by calling simple methods of built-in Java classes from Xalan. For example, here’s how to print the date using Java:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="java"
    xmlns:java="http://xml.apache.org/xslt/java">
    ....

<xsl:value-of select="java:java.util.Date.new()"/>

But anything more complex proved to be problematic. That left one last option: good old PHP.

PHP’s XSLTProcessor has a registerPHPFunctions method, which makes PHP functions callable from the stylesheet, much like XSL extensions, but you have to run the transform from within PHP. This is not such a big deal though, as I am only processing one file.
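As a minimal sketch of that mechanism, assuming the PHP xsl extension is installed: the stylesheet declares the php namespace, calls php:function, and the processor only allows the call after registerPHPFunctions() has been invoked. The shout helper and the one-element recipe document are made up for illustration.

```php
<?php
// Hypothetical helper: upper-cases whatever string the stylesheet passes in.
function shout($text) {
    return strtoupper($text);
}

$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:php="http://php.net/xsl"
     exclude-result-prefixes="php">
<xsl:output method="text"/>
<xsl:template match="/recipe">
  <xsl:value-of select="php:function('shout', string(title))"/>
</xsl:template>
</xsl:stylesheet>
XSL;

$xml = new DOMDocument();
$xml->loadXML('<recipe><title>Mulligatawny</title></recipe>');
$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->registerPHPFunctions();   // without this call, php:function() is not available
$proc->importStylesheet($style);
$result = $proc->transformToXml($xml);
echo $result;
```

Wrapping the argument in string() means the PHP function receives a plain string; passing a node-set instead hands it an array of DOM nodes.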

First of all, I created the simplest possible PHP file to get started. The following will simply run the XML file through an XSL that copies the XML out as is, without changing it.

<?php
$file_in  = "/PATH/TO/239_YummySoup_recipes.ysr";
$file_out = "/PATH/TO/239_YummySoup_recipes_preprocessed.xml";

$xsl = <<<EOB
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:php="http://php.net/xsl">

<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="/">
  <xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
EOB;

$xml_in  = new DOMDocument;
$xml_in->load( $file_in );
$xsl_doc  = new DOMDocument( "1.0", "utf-8" );
$xsl_doc->loadXML( $xsl );
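$xslt = new XSLTProcessor();
// Not needed for a plain copy, but required as soon as the stylesheet calls php:function()
$xslt->registerPHPFunctions();
$xslt->importStyleSheet( $xsl_doc );

#NOTE that using <xsl:processing-instruction> doesn't seem to work here:
#the processing instruction is generated as <?xml version=".." >
#i.e., without the closing ?, which is how the html output method writes processing instructions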
$xml_out = '<?xml version="1.0" encoding="utf-8"?>' . "\n\n" . $xslt->transformToXML( $xml_in );

$FH_file_out = fopen( $file_out, "w" );
fwrite( $FH_file_out, $xml_out );
fclose( $FH_file_out );

echo 'XML preprocessed';

Then I changed the Ant job so that it runs the PHP preprocessing before the XSLT:

<project name="WeeklyChallenge12" default="extractFromYS" basedir=".">
    <description>Gotofritz Weekly Challenge 15</description>

  <property environment="env"/>
  <property file="common.properties"/>
  <property file="local.properties"/>

  <target
    name = "extractFromYS"
    description = "Extracts recipes from Yummy Soup"
    depends = "increaseBuildNumber"
    >
    <exec
      executable = "php">
      <arg value="-f=recipes.php" />
    </exec>
    <xslt
      in    = "239_YummySoup_recipes_preprocessed.xml"
      out   = "output.xml"
      style = "recipes.xsl"
      force = "yes">
      <outputproperty name="method" value="xml"/>
      <outputproperty name="standalone" value="yes"/>
      <outputproperty name="encoding" value="utf-8"/>
      <outputproperty name="indent" value="yes"/>
    </xslt>
  </target>

  <target
    name = "increaseBuildNumber" >
    <propertyfile
      file    = "buildnumber.txt"
      comment = "Version number">
      <entry  key="build" type="int" default="0" operation="+"/>
    </propertyfile>
  </target>

</project>

That works well, nothing changes in the output XML files. So finally I have a way to call functions written in a different language from XSLT.

Using PHP functions to process regular expressions in XSLT

Scenario: Using PHP functions to process regular expressions in XSLT
    Given that the directions data is not always split into steps
    But each step could be enclosed in &lt;li&gt; tags
    Or each step could be enclosed by <li> tags
    Or steps could be separated by two new lines
    Or steps could be separated by <br> or &lt;br&gt; tags
    And steps could be preceded by (1) or 1)
    When the PHP function processes them
    Then it should output a list of <step> nodes

The first step here is to arrange the XSLT so that it leaves everything untouched except for the nodes I want to process via PHP. I do this by changing the XSL embedded in the PHP file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:php="http://php.net/xsl">

<xsl:output method="html" encoding="utf-8" indent="yes"/>

<!-- ENTRY POINT -->
<xsl:template match="/">
<plist version="1.0">
<array>
     <xsl:apply-templates select="/plist/array/dict" />
</array>
</plist>
</xsl:template>

<!-- ONCE PER RECIPE -->
<xsl:template match="/plist/array/dict">
    <dict>
        <xsl:variable name="directions"><xsl:value-of select="./string[preceding-sibling::key='directions'][1]"/></xsl:variable>
        <xsl:copy-of select="php:function('splitSteps', \$directions )"/>
        <xsl:copy-of select="./*[not(preceding-sibling::*[1][. = 'directions'])]" />
    </dict>
</xsl:template>

</xsl:stylesheet>

The xsl:variable finds the node I am interested in, a string node preceded by a key node with value ‘directions’, and stores its content; the xsl:copy-of that follows passes it to a PHP function. Note that the PHP function is called in a copy-of node, which allows the function to return some DOM nodes of its own - if I used value-of, I could only return a string.

The last xsl:copy-of finds all the other nodes, i.e. the ones that are not string nodes preceded by a ‘directions’ key, and just copies them as they are.

/**
 * Handles the directions, which are split in all sorts of weird and wonderful ways, and turns them into a set of nodes.
 * Note that the set of nodes needs to be a valid XML document, i.e. with a single root node. Luckily, that works here.
 * @param  {String} $node the content of the node
 * @return {DOMDocument}       a DOM tree
 */
function splitSteps( $node ) {

    $find    = array();
    $replace = array();

    #html_entity_decode didn't work
    array_push( $find, "'&lt;'" );
    array_push( $replace,  "<" );

    array_push( $find, "'&gt;'" );
    array_push( $replace,  ">" );

    array_push( $find, "'<br>'" );
    array_push( $replace,  "\n\n" );

    array_push( $find, "'</?[ou]l>'" );
    array_push( $replace,  "" );

    array_push( $find, "'<li>'" );
    array_push( $replace,  "\n" );

    array_push( $find, "'</li>'" );
    array_push( $replace,  "\n" );

    array_push( $find, "'\n[ \t]+'" );
    array_push( $replace,  "\n" );

    array_push( $find, "'\n{2,}'" );
    array_push( $replace,  "\n\n" );

    array_push( $find, "'(^|\n)\(?\d+\.?\d?\)\s*'" );
    array_push( $replace,  "\n\n" );

    array_push( $find, "'\n\n'" );
    array_push( $replace,  "</step>\n<step>" );

    $node = "<directions><step>" . preg_replace( $find, $replace, $node ) . "</step></directions>";
    $dom = new DOMDocument("1.0","UTF-8");
    $dom->loadXML( $node ) or die ( $node );
    return $dom;
}

The PHP function runs a bunch of regular expressions on the strings, using the array form of preg_replace. The end result is an XML tree with a single root node. This is then parsed into an XML document and returned. The PHP XSLT processor will treat that XML as a DOM fragment, and happily add the nodes to the document it is processing. If I had only returned the XML string, all the > and < would have been transformed into &gt; and &lt;.

I ran this, and apart from a few stray entities (which is why I added the die statement) it processed fine.
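The array form of preg_replace applies each pattern in turn to the result of the previous one, so the order of the pairs matters. The following is a reduced sketch of that idea, not the exact list used in splitSteps: the sample text, the three patterns and their order are assumptions chosen to keep the trace short.

```php
<?php
// Hypothetical directions text mixing <br>, blank lines and step numbers.
$raw = "(1) Chop the onions<br>2) Fry until golden\n\n\n3) Add the stock";

$find = array(
    '~<br>~',               // 1. turn <br> into a blank line
    '~(^|\n)\(?\d+\)\s*~',  // 2. drop leading step numbers such as "(1) " or "2) "
    '~\n{2,}~',             // 3. collapse any run of newlines into one blank line
);
$replace = array(
    "\n\n",
    '$1',                   // keep the newline that preceded the number, if any
    "\n\n",
);

// Each pattern sees the output of the one before it.
$clean = preg_replace($find, $replace, $raw);

// One array entry per step, ready to be wrapped in <step> nodes.
$steps = explode("\n\n", trim($clean));
print_r($steps);
```

With this input the result should be three entries: Chop the onions, Fry until golden and Add the stock.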

More XSLT processing with PHP

Scenario: More XSLT processing with PHP
    Given that the ingredient list is almost a list of Python tuples
    And strings are not quoted unless empty or longer than one word
    When I parse the ingredient node
    Then I want a list of nodes compatible with the XML schema I have been working with

The ingredient list is stored as a single string in this format:

<string>(
        (
        "For The Pastry:",
        "",
        "",
        "",
        YES,
        NO,
        945989
    ),
        (
        125,
        g,
        "unsalted butter",
        "",
        NO,
        NO,
        2364227
    ), ....

where the first YES/NO flag determines whether the entry is an ingredient group or a single ingredient. Not a great format, but at least regular.

First I change the XSL so that it now passes the ingredients and directions nodes to their respective PHP functions, and passes the rest through untouched.

<xsl:template match="/plist/array/dict">
    <dict>
        <xsl:variable name="directions"><xsl:value-of select="./string[preceding-sibling::key='directions'][1]"/></xsl:variable>
        <xsl:variable name="ingredients"><xsl:value-of select="./string[preceding-sibling::key='ingredientsArray'][1]"/></xsl:variable>

        <xsl:copy-of select="php:function('getIngredients', \$ingredients )"/>
        <xsl:copy-of select="php:function('splitSteps', \$directions )"/>
        <xsl:copy-of select="./*[not(preceding-sibling::*[1][. = 'ingredientsArray' or . = 'directions'])]" />
    </dict>
</xsl:template>

The PHP function to handle ingredients is not so hard (in PHP - it would have been much worse in XSL). It uses regular expressions to massage the string into something that resembles a CSV string, then splits it, then copies the various bits into DOM nodes.

/**
 * parses the ingredient string, which looks like a Python list of tuples, but without consistently quoted strings
 * @param  String $node the node being processed
 * @return DOMElement       the ingredients element holding the list of nodes
 */
function getIngredients( $node ) {
    //removes enclosing ()
    $node = substr( $node, 1, -1 );

    //gets rid of stuff and flatten
    $node = preg_replace( "'\"'", "", $node );
    $node = preg_replace( "'\n\s*'", " ", $node );

    //gets individual  ingredients or group names
    $lines = preg_split( "'(^|\),)\s*\('", $node );
    $header = array(
            "quantity" => 0
          , "measurement" => 1
          , "name"        => 2
          , "preparation" => 3
          , "isgroup"     => 4
          , "ignore"      => 5
          , "ignore2"     => 6
          );

    $dom  = new DOMDocument( "1.0", "UTF-8" );
    $root = $dom->appendChild( new DOMElement( 'ingredients' ) );

    for( $i=0, $i2=sizeof( $lines ); $i<$i2; $i++ ){
        $line = trim( $lines[$i] );
        if( "" === $line ){
            continue;
        }
        //treats line as a CSV
        $fields = preg_split( "':?\,\s*'", $line );
        //a line is either a group name or an ingredient
        if( "YES" === $fields[ $header["isgroup"] ] ){
            $ndGroup = $root->appendChild( new DOMElement( 'group' ) );
            $ndGroup->appendChild( new DOMElement( 'name', $fields[ $header["quantity"] ] ) );
        } else {
            if( !isset( $ndGroup ) ){
                $ndGroup = $root->appendChild( new DOMElement( 'group' ) );
            }
            $ndIngredient = $ndGroup->appendChild( new DOMElement( 'ingredient' ) );
            $ndIngredient->appendChild( new DOMElement( 'quantity',     $fields[ $header["quantity"] ] ) );
            $ndIngredient->appendChild( new DOMElement( 'measurement',  $fields[ $header["measurement"] ] ) );
            $ndIngredient->appendChild( new DOMElement( 'name',         $fields[ $header["name"] ] ) );
            $ndIngredient->appendChild( new DOMElement( 'preparation',  $fields[ $header["preparation"] ] ) );
        }
    }
    return $root;
}

Splitting XSL output to separate files with Xalan’s redirect

Scenario: Splitting XSL output to separate files with Xalan's redirect
    Given that XSLT are being processed with Xalan
    And that the input is a single XML file with several recipes in it
    When a recipe start is encountered
    Then the processor should create a new file for it

There are several possible approaches to splitting the XML into separate files, but the easiest is to use one of the XSLT extensions supported by Xalan, the default XSL processor used by Ant.

The extension needed in this case is redirect. To import it into the XSL document, I added a namespace to the xsl:stylesheet declaration:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:jython-extension="http://www.jython.org/"
    xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
    extension-element-prefixes="redirect"
    xmlns:lxslt="http://xml.apache.org/xslt"
    exclude-result-prefixes="lxslt">

(NOTE: some sources suggest using xmlns:redirect=”org.apache.xalan.xslt.extensions.Redirect” but that doesn’t work for the version of Xalan included with Ant).

Then using it is simply a matter of enclosing the recipe template in a redirect:write tag, which takes a ‘file’ attribute to specify the file path. Note that ‘file’ is relative to the project home, which by default is where the build.xml file sits.

I also used the xsl:fallback tag for catching errors, but for a one off job it doesn’t really matter.

Note that among the XSLT extensions supported by Xalan there are also some dealing with strings, including splitting a string, but the recursive template I have created works fine and is independent of the processor, so I’ll stick to that.

<xsl:template match="/plist/array/dict">
    <xsl:variable name="title"><xsl:call-template name="val"><xsl:with-param name="node" select="*[. = 'name']"/></xsl:call-template></xsl:variable>
...

    <redirect:write  file="recipes/{$title}.xml">
<xsl:processing-instruction name="xml-stylesheet">href="recipe.xsl" type="text/xsl"</xsl:processing-instruction>

<recipe lang="en-uk">
....
</recipe>
<xsl:fallback>
--- REDIRECT FAILED ---
</xsl:fallback>
    </redirect:write>

At first the transform failed with the error: Error! Unrecognized XSLTC extension ‘org.apache.xalan.xslt.extensions.Redirect:write’. After a bit of digging around, I downloaded the latest version of Xalan, extracted the zip, and copied the jars with sudo cp ~/Downloads/xalan-j_2_7_1/lib/*.jar /usr/share/ant/lib/. That fixed the issue, and I was able to generate all the recipes.

Challenge 100% complete

If this were a proper job, I would have first explored whether I could do it all the way I planned it (i.e., using Ant, XSLT, and Jython) and would have changed plan once I discovered I couldn’t. That would have meant doing the whole thing in PHP rather than as a two-step procedure. But I am only playing around, and the important thing is that the conversion was successful.

The scripts are available on GitHub.
