RSS 2.0
Sign In
# Friday, September 05, 2008

We're facing a task of conversion of a java method into a state machine. This is like to convert a SAX Parser, pushing data, into an Xml Reader, which pulls data.

The task is formalized as:

  • for a given method containing split markers create a class perimitting iteration;
  • each iteration performs part of a logic of a method.

We have defined rules converting all statements into a state machine except of the statement synchronied. In fact the logic it rather linear, however the most untrivial conversion is for try statement. Consider an example:

public class Test
{
  void method()
    throws Exception
  {
    try
    {
      A();
      B();
    }
    catch(Exception e)
    {
      C(e);
    }
    finally
    {
      D();
    }

    E();
  }

  private void A()
    throws Exception
  {
    // logic A
  }

  private void B()
    throws Exception
  {
    // logic B
  }

  private void C(Exception e)
    throws Exception
  {
    // logic C
  }

  private void D()
    throws Exception
  {
    // logic D
  }

  private void E()
    throws Exception
  {
    // logic E
  }
}

Suppose we want to see method() as a state machine in a way that split markers are after calls to methods A(), B(), C(), D(), E(). This is how it looks as a state machine:

Callable<Boolean> methodAsStateMachine()
  throws Exception
{
  return new Callable<Boolean>()
  {
    public Boolean call()
      throws Exception
    {
      do
      {
        try
        {
          switch(state)
          {
            case 0:
            {
              A();
              state = 1;

              return true;
            }
            case 1:
            {
              B();
              state = 3;

              return true;
            }
            case 2:
            {
              C(ex);
              state = 3;

              return true;
            }
            case 3:
            {
              D();

              if (currentException != null)
              {
                throw currentException;
              }

              state = 4;

              return true;
            }
            case 4:
            {
              E();
              state = -1;

              return false;
            }
          }

          if (currentException == null)
          {
            currentException = new IllegalStateException();
          }
        }
        catch(Throwable e)
        {
          currentException = null;

          switch(state)
          {
            case 0:
            case 1:
            {
              if (e instanceof Exception)
              {
                ex = (Exception)e;
                state = 2;
              }
              else
              {
                currentException = e;
                state = 3;
              }

              continue;
            }
            case 2:
            {
              currentException = e;
              state = 3;

              continue;
            }
          }

          currentException = e;
          state = -1;
        }
      }
      while(false);

      return this.<Exception>error();
    }

    @SuppressWarnings("unchecked")
    private <T extends Throwable> boolean error()
      throws T
    {
      throw (T)currentException;
    }

    private int state = 0;
    private Throwable currentException = null;
    private Exception ex = null;
  };
}

Believe it, or not but this transformation can be done purely in xslt 2.0 with the help of the jxom (Java xml object model). We shall update jxom.zip whenever this module will be implemented and tested.

Friday, September 05, 2008 3:39:50 PM UTC  #    Comments [0] -
xslt
# Wednesday, September 03, 2008

In the xslt one can express logically the same things in different words like:

  exists($x)
and
  every $y in $x satisfies exists($y)

newbie> Really the same?
expert> Ops... You're right, these are different things!

What's the difference?

Wednesday, September 03, 2008 12:34:06 PM UTC  #    Comments [0] -
xslt
# Saturday, August 30, 2008

I was already writing about tuples and maps in the xslt (see Tuples and maps - Status: CLOSED, WONTFIX, and Tuples and maps in Saxon).

Now, I want to argue on a use case, and on how xslt processor can detect such a use case and implement it as map. This way, for a certain conditions, a sequences could be treated as maps (or as sets).

Use case.

There are two stages:

  • a logic collecting nodes/values satisfying some criteria.
  • process data, and take a special action whenever a node/value is collected on the previous stage.

Whenever we're talking of nodes than result of the first stage is a sequence $set as node()*. The role of this sequence is a set of nodes (order is not important).

The second stage is usually an xsl:for-each, an xsl:apply-templates, or something of this kind, which repeatedly verifies whether a some $node as node()? belongs to the $set, like a following: $node intersect $set, or $node except $set.

In spite of that we're still using regular xpath 2.0, we have managed to express a set based operation. It's a matter of xslt processor's optimizer to detect such a use case and consider a sequence as a set. In fact the detection rule is rather simple.

For expressions $node except $set and $node intersect $set:
  • $set can be considered as a set, as order of elements is not important;
  • chances are good that a $set being implemented as a set outperforms implementation using a list or an array.

Thus what to do? Well, I do not think I'm the smartest child, quite opposite... however it worth to hint this idea to xslt implementers (see Suggest optimization). I still do not know if it was fruitful...

P.S. A very similar use case exists for a function index-of($collection, $item).

Saturday, August 30, 2008 7:44:44 AM UTC  #    Comments [0] -
xslt
# Tuesday, August 12, 2008

I know we're not the first who create a parser in xslt. However I still want to share our implementation, as I think it's beautiful.

In our project, which is conversion from a some legacy language to java, we're dealing with dynamic expressions. For example in the legacy language one can filter a collection using an expression defined by a string: collection.filter("a > 0 and b = 7");

Whenever expression string is calculated there is nothing to do except to parse such string at runtime and perform filtering dynamically. On the other hand we have found that in the majority of cases literal strings are used. Thus we have decided to optimize this route like this:

  collection.filter(
    new Filter<T>()
    {
      boolean filter(T value)
      {
        return (value.getA() > 0) and (value.getB() = 7);
      }
    });

This means that we're converting that expression string into java code on the generation stage.

In the xslt - our generator engine - this means that we have to convert a string into expression tree like this:

(a > 7 or a= 3) and c * d = 2.2

to

<and>
  <or>
    <gt>
      <identifier>a</identifier>
      <integer>7</integer>
    </gt>
    <eq>
      <identifier>a</identifier>
      <integer>3</integer>
    </eq>
  </or>
  <eq>
    <mul>
      <identifier>c</identifier>
      <identifier>d</identifier>
    </mul>
    <decimal>2.2</decimal>
  </eq>
</and>

Our parser fits naturally to the world of parsers: it uses xsl:analyze-string instruction to tokenize input and parses tokens according to an expression grammar. During implementation I've found some new to me things. I think they worth mentioning:

  • As tokenizer is defined as a big regular expression, we have rather verbose regex attribute over xsl:analyze-string. It was hard to edit such a big line until I've found there is flag="x" option that solves formatting problems:

    The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored.

    This means that I can use spaces to format regular expression and /s to specify space as part of expression.
  • Saxon 9.1.0.1 has inefficiency in implementation of xsl:analyze-string instruction, whenever regex contains literal value however with '{' character (e.g. "\p{{L}}"), as it considers the value to be an AVT and delays pattern compilation until runtime, which it does every time instruction is executed.

Use following link to see the xslt: expression-parser.xslt.
To see how to generate java from an xml follow this link: Xslt for the jxom (Java xml object model), jxom.zip.

Tuesday, August 12, 2008 2:45:54 PM UTC  #    Comments [0] -
xslt
# Thursday, July 31, 2008

Yesterday, incidentally, I've arrived to a problem of a dynamic error during evaluation of a template's match. This reminded me SFINAE in C++. There the principle is applied at compile time to find a matching template.

I think people underestimate the meaning of this behaviour. The effect of dynamic errors occurring during pattern evaluation is described in the specification:

Any dynamic error or type error that occurs during the evaluation of a pattern against a particular node is treated as a recoverable error even if the error would not be recoverable under other circumstances. The optional recovery action is to treat the pattern as not matching that node.

This has far reaching consequences, like an error recovery. To illustrate what I'm talking about please look at this simple stylesheet that recovers from "Division by zero.":

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xsl:template match="/">
  <xsl:variable name="operator" as="element()+">
    <div divident="10" divisor="0"/>
    <div divident="10" divisor="2"/>
  </xsl:variable>

  <xsl:apply-templates select="$operator"/>
</xsl:template>

<xsl:param name="NaN" as="xs:double" select="1.0 div 0"/>

<xsl:template
  match="div[(xs:integer(@divident) div xs:integer(@divisor)) ne $NaN]">
  <xsl:message select="xs:integer(@divident) div xs:integer(@divisor)"/>
</xsl:template>

<xsl:template match="div">
  <xsl:message select="'Division by zero.'"/>
</xsl:template>

</xsl:stylesheet>

Here, if there is a division by zero a template is not matched and other template is selected, thus second template serves as an error handler for the first one. Definitely, one may define much more complex construction to be handled this way.

I never was a purist (meaning doing everything in xslt), however this example along with indirect function call, shows that xslt is rather equiped language. One just need to be smart enough to understand how to do a things.

See also: Try/catch block in xslt 2.0 for Saxon 9.

Thursday, July 31, 2008 11:52:21 AM UTC  #    Comments [0] -
Tips and tricks | xslt
# Monday, July 28, 2008

Among other job activities, we're from time to time asked to check technical skills of job applicants.

Several times we were interviewing people who're far below the acceptable professional skills. It's a torment for both sides, I should say.

To ease things we have designed a small questionnaire (specific to our projects) for job applicants. It's sent to an applicant before the meeting. Even partially answered, this questionnaire constitutes a good filter against profanes:

<questionnaire>
  <item>
    <question>
      Please estimate your knowledge in XML Schema (xsd) as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in xslt 2.0/xquery 1.0 as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in xslt 1.0 as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in java as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in c# as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in sql as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For logical values A, B, please rewrite logical expression "A and B" using operator "or".
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For logical values A, B, please rewrite logical expression "A = B" using operators "and" and "or".
    </question>
    <answer/>
  </item>
  <item>
    <question>
      There are eight balls, with only one heavier than some other.
      What is a minimum number of weighings reveals the heavier ball?
      Please be suspicious about the "trivial" solution.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      If A results in B. What one may say about the reason of B?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      If only A or B result in C. What one may say about the reason of C?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please define an xml schema for this questionnaire.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please create a simple stylesheet creating an html table based on this questionnaire.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For a table A with columns B, C, and D, please create an sql query selecting B groupped by C and ordered by D.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For a sequence of xml elements A with attribute B, please write a stylesheet excerpt creating a sequence of elements D, grouping elements A with the same string value of attribute B, sorted in the order of ascending of B.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Having a java class A with properties B and C, please sort a collection of A for B in ascending, and C in descending order.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      What does a following line mean in c#?
      int? x;
    </question>
    <answer/>
  </item>
  <item>
    <question>
      What is a parser?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      How to issue an error in the xml stylesheet?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      What is a lazy evaluation?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      How do you understand a following sentence?
      For each line of code there should be a comment.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Have you used any supplemental information to answer these questions?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Have you independently answered these questions?
    </question>
    <answer/>
  </item>
</questionnaire>

Monday, July 28, 2008 10:54:54 AM UTC  #    Comments [0] -
Tips and tricks | xslt
# Thursday, July 10, 2008

I've found that proposition to introduce tuples and maps to xslt/xquery type system has not found a support:

At the joint meeting of the XSL and XQuery Working groups 2008-06-23 it was decided that a change of this nature would be too large for the next "point" release of the Recommendations. The request for new functionality will be considered for a future "main" release.

Boor> *****!

Pessimist> Ah, there won't be tuples and maps in xslt/xquery...

Optimist> Wow, chances are good to see this addition by the year 2018!

Thursday, July 10, 2008 5:54:52 AM UTC  #    Comments [0] -
xslt
# Thursday, July 03, 2008

Today Michael Kay has announced an update for the Saxon processor. The latest version for now is 9.1.

I've checked our saxon.extensions, and has fixed incompatibilities.

The source for the new version of the Saxon can be found at http://saxon.sourceforge.net/.

New features are discussed at: http://www.saxonica.com/documentation/changes/intro.html

Our extensions can be found at saxon.extensions.9.1.zip.

Thursday, July 03, 2008 1:47:13 PM UTC  #    Comments [0] -
xslt
# Thursday, June 26, 2008

We are designing a rather complex xslt 2.0 application, dealing with semistructured data. We must tolerate with errors during processing, as there are cases where an input is not perfectly valid (or the program is not designed or ready to get such an input).

The most typical error is unsatisfied expectation of tree structure like:
  <xsl:variable name="element" as="element()" select="some-element"/>

Obviously, dynamic error occurs if a specified element is not present. To concentrate on primary logic, and to avoid a burden of illegal (unexpected) case recovery we have created a try/catch API. The goal of such API is:

  • to be able to continue processing in case of error;
  • report as much as possible useful information related to an error.

Alternatives:

Do not think this is our arrogance, which has turned us to create a custom API. No, we were looking for alternatives! Please see [xsl] saxon:try() discussion:

  • saxon:try() function - is a kind of pseudo function, which explicitly relies on lazy evaluation of its arguments, and ... it's not available in SaxonB;
  • ex:error-safe  extension instruction - is far from perfect in its implementation quality, and provides no error location.

We have no other way except to design this feature by ourselves. In our defence one can say that we are using innovatory approach that encapsulates details of the implementation behind template and calls handlers indirectly.

Use:

Try/catch API is designed as a template <xsl:template name="t:try-block"/> calling a "try" handler, and, if required, a "catch" hanler using <xsl:apply-templates mode="t:call"/> instruction. Caller passes any information to these handlers by the means of tunnel parameters.

Handlers must be in a "t:call" mode. The "catch" handler may recieve following error info parameters:

<xsl:param name="error" as="xs:QName"/>
<xsl:param name="error-description" as="xs:string"/>
<xsl:param name="error-location" as="item()*"/>

where $error-location is a sequence of pairs (location as xs:string, context as item())*.

A sample:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/public/"
  exclude-result-prefixes="xs t">

<xsl:include href="try-block.xslt"/>

<xsl:template match="/">
  <result>
    <xsl:for-each select="1 to 10">
      <xsl:call-template name="t:try-block">
        <xsl:with-param name="value" tunnel="yes" select=". - 5"/>
        <xsl:with-param name="try" as="element()">
          <try/>
        </xsl:with-param>
        <xsl:with-param name="catch" as="element()">
          <t:error-handler/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:for-each>
  </result>
</xsl:template>

<xsl:template mode="t:call" match="try">
  <xsl:param name="value" tunnel="yes" as="xs:decimal"/>

  <value>
    <xsl:sequence select="1 div $value"/>
  </value>
</xsl:template>

</xsl:stylesheet>

The sample prints values according to the formula "1/(i - 5)", where "i" is a variable varying from 1 to 10. Clearly, division by zero occurs when "i" is equal to 5.

Please notice how to access try/catch API through <xsl:include href="try-block.xslt"/>. The main logic is executed in <xsl:template mode="t:call" match="try"/>, which recieves parameters using tunneling. A default error handler <t:error-handler/> is used to report errors.

Error report:

Error: FOAR0001
Description:
Decimal divide by zero

Location:
1. systemID: "file:///D:/style/try-block-test.xslt", line: 34
2. template mode="t:call" match="element(try, xs:anyType)"
  systemID: "file:///D:/style/try-block-test.xslt", line: 30
  context node:
    /*[1][local-name() = 'try']
3. template mode="t:call"
  match="element({http://www.nesterovsky-bros.com/xslt/private/try-block}try, xs:anyType)"
  systemID: "file:///D:/style/try-block.xslt", line: 53
  context node:
    /*[1][local-name() = 'try']
4. systemID: "file:///D:/style/try-block.xslt", line: 40
5. call-template name="t:try-block"
  systemID: "file:///D:/style/try-block-test.xslt", line: 17
6. for-each
  systemID: "file:///D:/style/try-block-test.xslt", line: 16
  context item: 5
7. template mode="saxon:_defaultMode" match="document-node()"
  systemID: "file:///D:/style/try-block-test.xslt", line: 14
  context node:
    /

Implementation details:

You were not expecting this API to be pure xslt, weren't you? :-)

Well, you're right, there is an extension function. Its pseudo code is like this:

function tryBlock(tryItems, catchItems)
{
  try
  {
    execute xsl:apply-templates for tryItems.
  }
  catch
  {
    execute xsl:apply-templates for catchItems.
  }
}

 

The last thing. Please get the implementation saxon.extensions.zip. There you will find sources of the try/catch, and tuples/maps API.

Thursday, June 26, 2008 9:18:50 AM UTC  #    Comments [0] -
Announce | Tips and tricks | xslt
# Tuesday, June 17, 2008

Right now we're inhabiting in the java world, thus all our tasks are (in)directly related to this environment.

We want to store stylesheets as resources of java application, and at the same time to point to these stylesheets without jar qualification. In .NET this idea would not appear at all, as there are well defined boundaries between assemblies, but java uses rather different approach. Whenever you have a resource name, it's up to ClassLoader to find this resource. To exploit this feature we've created an uri resolver for the stylesheet transformation. The protocol we use has a following format: "resource:/resource-path".

For example to store stylesheets in the META-INF/stylesheets folder we use uri "resource:/META-INF/stylesheets/java/main.xslt". Relative path is resolved naturally. A path "../jxom/java-serializer.xslt" in previously mentioned stylesheet is resolved to "resource:/META-INF/stylesheets/jxom/java-serializer.xslt".

We've created a small class ResourceURIResolver. You need to supply an instance of TransformerFactory with this resolver:
  transformerFactory.setURIResolver(new ResourceURIResolver());

The class itself is so small that we qoute it here:

import java.io.InputStream;

import java.net.URI;
import java.net.URISyntaxException;

import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;

import javax.xml.transform.stream.StreamSource;

/**
 * This class implements an interface that can be called by the processor
 * to turn a URI used in document(), xsl:import, or xsl:include into a
 * Source object.
 */
public class ResourceURIResolver implements URIResolver
{
  /**
   * Called by the processor when it encounters
   * an xsl:include, xsl:import, or document() function.
   *
   * This resolver supports protocol "resource:".
   * Format of uri is: "resource:/resource-path", where "resource-path" is an
   * argument of a {@link ClassLoader#getResourceAsStream(String)} call.
   * @param href - an href attribute, which may be relative or absolute.
   * @param base - a base URI against which the first argument will be made
   *   absolute if the absolute URI is required.
   * @return a Source object, or null if the href cannot be resolved, and
   *   the processor should try to resolve the URI itself.
   */
  public Source resolve(String href, String base)
    throws TransformerException
  {
    if (href == null)
    {
      return null;
    }

    URI uri;

    try
    {
      if (base == null)
      {
        uri = new URI(href);
      }
      else
      {
        uri = new URI(base).resolve(href);
      }
    }
    catch(URISyntaxException e)
    {
      // Unsupported uri.
      return null;
    }

    if (!"resource".equals(uri.getScheme()))
    {
      return null;
    }

    String resourceName = uri.getPath();

    if ((resourceName == null) || (resourceName.length() == 0))
    {
      return null;
    }

    if (resourceName.charAt(0) == '/')
    {
      resourceName = resourceName.substring(1);
    }

    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
    InputStream stream = classLoader.getResourceAsStream(resourceName);

    if (stream == null)
    {
      return null;
    }

    return new StreamSource(stream, uri.toString());
  }
}

Tuesday, June 17, 2008 7:57:52 AM UTC  #    Comments [0] -
Tips and tricks | xslt
# Monday, June 09, 2008

We've uploaded an update for the jxom.

It has turned out that jxom schema is so powerful that you can do a great number of manipulations over xml representation of java program.

In our case this is an optimization of unreachable code, defined at Sun's spec. We're facing this problem as result of translation from other ancient language, which also has well defined xml schema.

We also have introduced an ability to annotate jxom elements (see meta element), which in practice we use to annotate expressions with their types and perform "compile time" expression evaluation.

You may download jxom version at usual place.

See also: Java Xml Object Model.

Monday, June 09, 2008 6:47:54 AM UTC  #    Comments [0] -
xslt
# Sunday, May 18, 2008

Recently I've proposed to add two new atomic types tuple and map to the xpath/xslt/xquery type system (see "Tuples an maps"). Later I've implemented tuple and map pure xslt approximation. Now I want to present java implementation for Saxon.

I've created TupleValue and MapValue atomic types, and Collections class exposing extension functions api. It's easy to use this api. I'll repeat an example that I was showing earlier:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="http://www.nesterovsky-bros.com/xslt/functions/public"
  xmlns:p="http://www.nesterovsky-bros.com/xslt/functions/private"
  xmlns:c="java:com.nesterovskyBros.saxon.Functions"
  exclude-result-prefixes="xs f p c">

<xsl:template match="/">
   <root>
     <xsl:variable name="tuples" as="item()*" select="
       for $i in 1 to 20
         return c:tuple(1 to $i)"/>

    <total-items>
      <xsl:sequence select="
        sum
        (
          for $tuple in $tuples return
            count(c:tuple-items($tuple))
        )"/>
    </total-items>
    <tuples-size>
      <xsl:sequence select="count($tuples)"/>
    </tuples-size>
    <sums-per-tuples>
      <xsl:for-each select="$tuples">
        <xsl:variable name="index" as="xs:integer" select="position()"/>

        <sum index="{$index}" value="{sum(c:tuple-items(.))}"/>
      </xsl:for-each>
    </sums-per-tuples>

    <xsl:variable name="cities" as="element()*">
      <city name="Jerusalem" country="Israel"/>
      <city name="London" country="Great Britain"/>
      <city name="Paris" country="France"/>
      <city name="New York" country="USA"/>
      <city name="Moscow" country="Russia"/>
      <city name="Tel Aviv" country="Israel"/>
      <city name="St. Petersburg" country="Russia"/>
     </xsl:variable>

    <xsl:variable name="map" as="item()" select="
      c:map
      (
        for $city in $cities return
        (
          $city/string(@country),
          $city
        )
      )"/>

    <xsl:for-each select="c:map-keys($map)">
      <xsl:variable name="key" as="xs:string" select="."/>

      <country name="{$key}">
        <xsl:sequence select="c:map-value($map, $key)"/>
     </country>
    </xsl:for-each>
  </root>
</xsl:template>

</xsl:stylesheet>

Download java source.

P.S. I would wish this api be integrated into Saxon, as at present java extension functions are called through reflection.

Sunday, May 18, 2008 8:44:09 AM UTC  #    Comments [0] -
Announce | xslt
# Monday, May 12, 2008

Today I've found another new language (working draft in fact). It's an XML Pipeline Language.

XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.

An XML Pipeline specifies a sequence of operations to be performed on zero or more XML documents. Pipelines generally accept zero or more XML documents as input and produce zero or more XML documents as output. Pipelines are made up of simple steps which perform atomic operations on XML documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed.

An experience shows a process of language invention is an essential part of computer industry from the very beginning, however...

I must confess I must be too reluctant to any new language: I was happy with C++, but then all these new languages like Delphi, Java, C#, and so many others started to appear. It's correct to say that there is no efficient universal language, however I think it's wrong to say that a domain specific language is required to solve a particular problem in a most efficient way.

And now a question to the point: why do you need a new language for describing operations to be performed on XML documents?

Monday, May 12, 2008 7:11:58 AM UTC  #    Comments [0] -
xslt
# Friday, May 09, 2008
Георгиевская ленточка

A project I'm currently working on, requires me to manipulate with a big number of documents. This includes accessing these documents with key() function.

I never thought this task poses any problem, until I've discovered that Saxon caches documents loaded using document() function to preserve their identities:

By default, this function is ·stable·. Two calls on this function return the same document node if the same URI Reference (after resolution to an absolute URI Reference) is supplied to both calls. Thus, the following expression (if it does not raise an error) will always be true:

doc("foo.xml") is doc("foo.xml")

However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of stability. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a stable result or must raise an error: [err:FODC0003].

Saxon provides a saxon:discard-document() function to release documents from cache. The use case is like this:

<xsl:variable name="document" as="document-node()"
   select="saxon:discard-document(document(...))"/>

You may see, that saxon:discard-document() is bound to a place where document is loaded. In my case this is inefficient, as my code repeatedly accesses documents from different places. To release loaded documents I need to collect them after main processing.

Other issue in Saxon is that, processor may keep document references through xsl:key, thus saxon:discard-document() provides no guaranty of documents to be garbage collected.

To deal with this, I've designed (Saxon specific) api to manage document pools:

t:begin-document-pool-scope() as item()
  Begins document pool scope.
    Returns scope id.

t:end-document-pool-scope(scope as item())
  Terminates document pool scope.
    $scope - scope id.

t:put-document-in-pool(document as document-node()) as document-node()
  Puts a document into a current scope of document pool.
    $document - a document to put into the document pool.
    Returns the same document node.

The use case is:

<xsl:variable name="scope" select="t:begin-document-pool-scope()"/>

<xsl:sequence select="t:assert($scope)"/>

...
<xsl:variable name="document" as="document-node()"
  select="t:put-document-in-pool(...)"/>
...

<xsl:sequence select="t:end-document-pool-scope($scope)"/>

Download document-pool.xslt to use this api.

Friday, May 09, 2008 6:58:29 AM UTC  #    Comments [0] -
xslt
# Saturday, May 03, 2008

I was already writing about the logical difference between tamplates and functions. This time I've realized another, technical one. It's related to lazy evaluation, permitted by language specification.

I was arguing as follows:

  • suppose you define a function returning a sequence;
  • this function at final step constructs document using xsl:result-document;
  • caller invokes this function and uses only first item of sequence;
  • lazy evaluation allows to xslt processor to calculate first item only, thus to avoid creation of output document altogether.

This conclusion looked ridiculous to me, as it means that I cannot reliably expect creation of documents built with xsl:result-document instruction.

To resolve the issue I've checked specification. Someone has already thought of this. This is what specification says:

[Definition: Each instruction in the stylesheet is evaluated in one of two possible output states: final output state or temporary output state].

[Definition: The first of the two output states is called final output state. This state applies when instructions are writing to a final result tree.]

[Definition: The second of the two output states is called temporary output state. This state applies when instructions are writing to a temporary tree or any other non-final destination.]

The instructions in the initial template are evaluated in final output state. An instruction is evaluated in the same output state as its calling instruction, except that xsl:variable, xsl:param, xsl:with-param, xsl:attribute, xsl:comment, xsl:processing-instruction, xsl:namespace, xsl:value-of, xsl:function, xsl:key, xsl:sort, and xsl:message always evaluate the instructions in their contained sequence constructor in temporary output state.

[ERR XTDE1480] It is a non-recoverable dynamic error to evaluate the xsl:result-document instruction in temporary output state.

As you can see, xsl:function is always evaluated in temporary output state, and cannot contain xsl:result-document, in contrast to xsl:template, which may be evaluated in final output state. This difference dictates the role of templates as a "top level functions" and functions as standalone algorithms.

You can find more on subject at "Lazy evaluation and predicted results".

Saturday, May 03, 2008 4:36:38 PM UTC  #    Comments [0] -
xslt
# Wednesday, April 30, 2008

In the era of parallel processing it's so natural to inscribe your favorite programming language in the league of "Multithreading supporter". I've seen such appeals before "Wide Finder in XSLT --> deriving new requirements for efficiency in XSLT processors."

... I am not aware of any XSLT implementation that provides explicit or implicit support for parallel processing (with the obvious goal to take advantage of the multi-core processors that have almost reached a "prevalent" status today) ...

I think both xslt and xquery are well fitted for parrallel processing in terms of type system. This is because of "immutable" nature (until recent additions) of the execution state, which prevents many race conditions. The only missing ingredients are indirect function call, and a couple of core functions to queue parallel tasks.

Suppose there is a type to encapsulate a function call (say function-id), and a function accepting a sequence and a function-id. This function calls function-id for each element of the sequence in a parallel way, and then combines a final result, as if it were implemented serially.

Pretty simple, isn't it?

<!--
  This function runs $id function for each item in a sequence.
    $items - items to process.
    $id - function id.
    Returns a sequece of results of calls to $id function.
-->
<xsl:function name="x:queue-tasks" as="items()*">
  <xsl:param name="items" as="item()*"/>
  <xsl:param name="id" as="x:function-id"/>

  <!-- The pseudo code. -->
  <xsl:sequence select="$items/call $id (.)"/>
</xsl:function>

Wednesday, April 30, 2008 6:59:29 AM UTC  #    Comments [0] -
xslt
# Saturday, April 05, 2008

Yesterday's idea has inspired me as much as to create a prototype implementation of map and tuple in the xslt 2.0.

Definitely I wished these were a built-in types, and were considered as atomic values for purposes of comparasions and iteration. This way it were possible to create highly efficient grouping per several fields at once.

This pure implementation (xslt-tuple.zip) is rather scetchy, however it allows to feel what can be done with tuples and maps. I guess a good example may say more than many other words, so have a pleasure:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="http://www.nesterovsky-bros.com/xslt/functions"
  exclude-result-prefixes="xs f">

  <xsl:include href="tuple.xslt"/>
  <xsl:include href="map.xslt"/>

  <xsl:template match="/">
    <root>
      <xsl:variable name="tuples" as="item()*" select="
        f:tuple
        (
          for $i in 1 to 10 return
            f:tuple(1 to $i)
        )"/>

      <total-items>
        <xsl:sequence select="count($tuples)"/>
      </total-items>
      <tuples-size>
        <xsl:sequence select="f:tuple-size($tuples)"/>
      </tuples-size>
      <sums-per-tuples>
        <xsl:for-each select="1 to f:tuple-size($tuples)">
          <xsl:variable name="index" as="xs:integer" select="position()"/>

          <sum
            index="{$index}"
            value="{sum(f:tuple-items(f:tuple-item($tuples, $index)))}"/>
        </xsl:for-each>
      </sums-per-tuples>

      <xsl:variable name="cities" as="element()*">
        <city name="Jerusalem" country="Israel"/>
        <city name="London" country="Great Britain"/>
        <city name="Paris" country="France"/>
        <city name="New York" country="USA"/>
        <city name="Moscow" country="Russia"/>
        <city name="Tel Aviv" country="Israel"/>
        <city name="St. Petersburg" country="Russia"/>
      </xsl:variable>

      <xsl:variable name="map" as="item()*" select="
        f:map
        (
          for $city in $cities return
            ($city/string(@country), $city)
        )"/>

      <xsl:for-each select="f:map-keys($map)">
        <xsl:variable name="key" as="xs:string" select="."/>

        <country name="{$key}">
          <xsl:sequence select="f:map-value($map, $key)"/>
        </country>
      </xsl:for-each>
    </root>
  </xsl:template>

</xsl:stylesheet>

Saturday, April 05, 2008 3:49:03 PM UTC  #    Comments [0] -
xslt
# Friday, April 04, 2008

The type system of xslt 2.0 is not complete (see Sequence of sequences in xslt 2.0). You cannot perform manipulations over items as you could do. The reason is in the luck of set based constructs: xslt 2.0 supports sequences, but not associative maps of items.

If you think that xml can be used as a good approximation of a map, I shan't agree with you. Xml has an application in a very specific cases only. Maps I'm thinking of,  would allow associate items by reference, like sequences do.

This opens a perspective to create a state objects, to manage sequence of sequences, to create cyclic graphs of items, and so on. These maps are richer than what key() function provides right now, and allow to implement for-each-group in xquery.

Such maps can be modeled with several functions, however I would wish they were built in:

f:map($items as item()*) as item()
Returns a map from a sequence $items of pairs (key, value).

f:map-items($map as item()) as item()*
Returns a sequence of pairs (key, value) for a map $map.

f:map-keys($map as item()) as item()*
Returns a sequence of keys contained in a map $map.

f:map-values($map as item()) as item()*
Returns a sequence of values contained in a map $map.

f:map-value($map as item(), $key as item()) as item()*
Returns a sequence of values corresponding to a specified key $key contained a specified map $map.

The other thing I would add is items tuple. It's like a sequence, however a sequence of tuples is never transformed into single sequence, but stays as sequence of tuples.

Fortunately it's possible to implement such extension functions.

Friday, April 04, 2008 1:49:56 PM UTC  #    Comments [0] -
xslt
# Wednesday, April 02, 2008

xslt 2.0 is a beautiful language and at the same time it allows constructs, which may trouble anyone.

Look at this valid stylesheet:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="x" as="node()" select="."/>
    <xsl:variable name="x" as="xs:int" select="***"/>

    <xsl:sequence select="$x"/>
  </xsl:template>

</xsl:stylesheet>

Fun, isn't it? :-)

Wednesday, April 02, 2008 5:45:28 AM UTC  #    Comments [0] -
xslt
# Monday, March 31, 2008

I was thinking earlier about the difference between named tamplates and functions in xslt 2.0 and have not found satisfactory criterion for a decision of what to use in each case. I was not first one who has troubled with this, see stylesheet functions or named templates.

To feel easy I deliberately have decided to use functions whenever possible, avoid named tamplates completely, and use matching templates to apply logic depending on context (something like virtual function). I've forgot about the issue until yesterday. To realize the difference one should stop thinking of it, quite opposite she must start solving practical xslt tasks, and if there is any difference, except syntactic, it will manifest itself somehow.

To make things obvious to those whose programming roots are in a language like C++ I shall compare xsl:function with free standing (or static) C++ function, and named xsl:template with C++ member function. In C++ you can use both free standing and member functions interchangeably, however if there is only one argument (among others) whose state transition this function represents then it's preferrable to define it as a member function. The most important difference between these two type of functions is that a member function has hidden argument "this", and is able to access its private state.

Please, do not try to think I'm going to compare template context item in xslt 2.0 with "this" in C++, quite opposite I consider context item as a part of a state. I'm arguing however, of private state that can be passed through template call chain with tunnel parameters. Think of a call tunneling some state (like options, flags, values), and that state accessed several levels deep in call hierarchy, whenever one needs to. You cannot do it with xsl:function, you cannot pass all private state through the function call, you just do not know of it.

This way my answer to the tacit question is:

  •  use xsl:function to perform independent unit of logic;
  •  use named xsl:template when a functionality is achieved cooperatively, and when you will possibly need to share the state between different implementation blocks;

After thinking through this, I've noticed that such distinction does not exist in XQuery 1.0. There is no tunneling there. :-)

Monday, March 31, 2008 6:54:22 AM UTC  #    Comments [0] -
xslt
# Tuesday, March 25, 2008

In the xslt world there is no widely used custom to think of stylesheet members as of public and private in contrast to other programming languages like C++/java/c# where access modifiers are essential. The reason is in complexity of stylesheets: the less size of code - the easier to developer to keep all details in memory. Whenever xslt program grows you should modularize it to keep it manageable.

At the point where modules are introduced one starts thinking of public interface of module and its implementation details. This separation is especially important for the template matching as you won't probably want to match private template just because you've forgotten about some template in implementation of some module.

To make public or private member distinction you can introduce two namespaces in your stylesheet, like:

For the private namespace you can use a unique name, e.g. stylesheet name as part of uri.

The following example is based on jxom. This stylesheet builds expression from expression tree. Public part consists only of t:get-expression function, other members are private:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/public"
  xmlns:p="http://www.nesterovsky-bros.com/private/expression.xslt"
  xmlns="http://www.nesterovsky-bros.com/download/jxom.zip"
  xpath-default-namespace="http://www.nesterovsky-bros.com/download/jxom.zip"
  exclude-result-prefixes="xs t p">

  <xsl:output method="text" indent="yes"/>

  <!-- Entry point. -->
  <xsl:template match="/">
    <xsl:variable name="expression" as="element()">
      <lt>
        <sub>
          <mul>
            <var name="b"/>
            <var name="b"/>
          </mul>
          <mul>
            <mul>
              <int>4</int>
              <var name="a"/>
            </mul>
            <var name="c"/>
          </mul>
        </sub>
        <double>0</double>
      </lt>
    </xsl:variable>

    <xsl:value-of select="t:get-expression($expression)" separator=""/>
  </xsl:template>

  <!--
    Gets expression.
      $element - expression element.
      Returns expression tokens.
  -->
  <xsl:function name="t:get-expression" as="item()*">
    <xsl:param name="element" as="element()"/>

    <xsl:apply-templates mode="p:expression" select="$element"/>
  </xsl:function>

  <!--
    Gets binary expression.
      $element - assignment expression.
      $type - expression type.
      Returns expression token sequence.
  -->
  <xsl:function name="p:get-binary-expression" as="item()*">
    <xsl:param name="element" as="element()"/>
    <xsl:param name="type" as="xs:string"/>

    <xsl:sequence select="t:get-expression($element/*[1])"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="t:get-expression($element/*[2])"/>
  </xsl:function>

  <!-- Mode "expression". Empty match. -->
  <xsl:template mode="p:expression" match="@*|node()">
    <xsl:sequence select="error(xs:QName('invalid-expression'), name())"/>
  </xsl:template>

  <!-- Mode "expression". or. -->
  <xsl:template mode="p:expression" match="or">
    <xsl:sequence select="p:get-binary-expression(., '||')"/>
  </xsl:template>

  <!-- Mode "expression". and. -->
  <xsl:template mode="p:expression" match="and">
    <xsl:sequence select="p:get-binary-expression(