# Sunday, 08 January 2023

While migrating a big xslt 3 project to plain C#, we ran into a case that was not obvious to resolve.

The documents we process range from tiny to moderate in size: stored as xml, they take anywhere from virtually zero to, say, 10-20 MB.

In C# we can rewrite xslt code almost one-to-one using standard features like XDocument, LINQ, regular classes, built-in collections, and so on. C# clearly has a richer repertoire, so the task is easily solved, unless you get lost among the many ways to solve it.

The simplest solution is to use the XDocument API to represent data at runtime, and LINQ to query it. Features like xslt keys, templates, functions, xpath sequences, arrays, maps and primitive types all map naturally onto the C# language and its APIs.

Rewriting several xslt transformations, we saw that the xslt-to-C# rewrite is rather straightforward and produces recognizable functional programs whose C# source is close in size to the original xslt. As a bonus, C# lets you write asynchronous code, so C# wins in runtime scalability and in design-time support.

But can you do better in C#, especially when some data has well-defined xml schemas?

The natural next step, in our opinion, is to produce a plain C# object model from the xml schema and use it for runtime processing. Fortunately, .NET has xml serialization attributes and tools to produce classes from xml schemas. With little effort we created a corresponding class hierarchy for a rather big xml schema. XmlSerializer converts the object model to and from xml through XmlReader and XmlWriter. So we get a typed replacement for the generic XDocument that still supports the same LINQ API over collections of objects, and that is expected to take less memory at runtime.

The next step was to run a simple test:

  • read object model;

  • transform it;

  • write it back.

We created such tests for both the XDocument and the object model cases, and compared the results from different perspectives.
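For readers on the JVM, here is a minimal sketch of the same read, transform, write test using the JDK's untyped DOM API. DOM plays roughly the role that XDocument plays in C#. This is an analogy, not the C# code we measured. The `item` element, its `name` attribute, and the upper-casing rule are hypothetical placeholders for a real transformation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Untyped (runtime-typed) round trip: read xml into a generic DOM tree,
// transform it in place, and write it back as text.
class UntypedRoundTrip {
  static String roundTrip(String xml) {
    try {
      // Read.
      Document document = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));

      // Transform: hypothetical rule that upper-cases item/@name.
      NodeList items = document.getElementsByTagName("item");

      for (int i = 0; i < items.getLength(); i++) {
        Element item = (Element) items.item(i);

        item.setAttribute("name", item.getAttribute("name").toUpperCase());
      }

      // Write.
      Transformer serializer = TransformerFactory.newInstance().newTransformer();
      StringWriter writer = new StringWriter();

      serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      serializer.transform(new DOMSource(document), new StreamResult(writer));

      return writer.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A typed variant would replace the DOM calls with generated classes and a serializer. The round trip itself stays the same.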

Both solutions produce very similar code, which is also similar to the original xslt in both style and size.

The object model has static typing, which makes it much easier to maintain.

The most unexpected outcome, however, was that the object model was up to 20% slower because of serialization and deserialization, even with pregenerated XmlSerializer assemblies. The differences in transformation performance and memory consumption were negligible. These results were confirmed by multiple tests, each with multiple cycles, including warm-up cycles.

So here we have a case where static typing harms more than it helps. Because our processing pipeline is an offline batch, this difference can translate into tens of minutes or even more.

Thus in this particular case we decided to stay with runtime typing as a more performant way of processing in C#.

Sunday, 08 January 2023 13:28:14 UTC
.NET | xslt
# Saturday, 16 April 2022

Xslt is often thought of as a tool that takes input xml and transforms it into html or some other xml. Our use case is more complex and closer to batch data mining of big data. Our transformation pipelines often take an hour or more to run, even with SSD disks and all CPU cores fully loaded.

So we're looking for performance opportunities, and xml vs json might be a promising one.

Here are our hypotheses:

  • json is lighter than xml to serialize and deserialize;
  • json stored as map(*), array(*) and other item() types is lighter than node() at runtime; in particular, copying a subtree is zero cost in json;
  • templates with match patterns can be implemented efficiently with maps;
  • there is an incremental path from xml to json.

If it pays off, we might switch from xml to json everywhere, even though that takes development effort.

But to proceed, we need to run an experiment to measure the processing speed of xml vs json in xslt.

Now our task is to find a small, isolated, representative sample to prove or reject our hypotheses.

It is better to start with some existing transformation and change it from xml to json.

The question is whether there is such a candidate.

Saturday, 16 April 2022 19:03:04 UTC
Thinking aloud | xslt
# Monday, 08 June 2020

We're not sure what the use of our Xslt Graph exercises is, but we are sure that they stress different parts of the Saxon Xslt engine and help to find and resolve bugs.

While implementing the biconnected components algorithm, we incidentally ran into an internal error in Saxon 10.1 with a rather simple xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:array="http://www.w3.org/2005/xpath-functions/array"
  exclude-result-prefixes="xs array">

  <xsl:template match="/">
    <xsl:sequence select="
      array:fold-left
      (
        [8, 9], 
        (), 
        function($first as item(), $second as item()) 
        {  
          min(($first, $second))
        }
      )"/>
  </xsl:template>

</xsl:stylesheet>

More detail can be found at Saxon's issue tracker: Bug #4578: NullPointerException when array:fold-left|right $zero argument is an empty sequence.

The bug was promptly resolved.

Monday, 08 June 2020 05:58:32 UTC
Tips and tricks | xslt
# Sunday, 24 May 2020

While working on an algorithm to find biconnected components for the Graph API in XSLT, we realized that we had implemented it unconventionally.

The pseudocode in Wikipedia is:

GetArticulationPoints(i, d)
    visited[i] := true
    depth[i] := d
    low[i] := d
    childCount := 0
    isArticulation := false

    for each ni in adj[i] do
        if not visited[ni] then
            parent[ni] := i
            GetArticulationPoints(ni, d + 1)
            childCount := childCount + 1
            if low[ni] ≥ depth[i] then
                isArticulation := true
            low[i] := Min (low[i], low[ni])
        else if ni ≠ parent[i] then
            low[i] := Min (low[i], depth[ni])
    if (parent[i] ≠ null and isArticulation) or (parent[i] = null and childCount > 1) then
        Output i as articulation point

That algorithm is based on the fact that a connected graph can be represented as a tree of biconnected components that are joined to each other at articulation points. The implementation tracks the depth of each vertex and its lowpoint: the smallest depth reachable from the vertex's depth-first search subtree using at most one back edge.

Out of interest, we approached the problem from a different perspective. A vertex is an articulation point if it has neighbors that cannot be connected by a path that avoids this vertex. Like the classical algorithm, we use depth-first search to navigate the graph, but instead we collect the cycles that pass through each vertex. If, on the way back of the depth-first search, we find no cycle leading from a "child" to an "ancestor", then the vertex is necessarily an articulation point.

Here is pseudocode:

GetArticulationPoints(v, p) -> result
    index = index + 1
    visited[v] = index 
    result = index
    articulation = (p = null) ? -1 : 0

    for each n in neighbors of v except p do
        if visited[n] = 0 then
            nresult = GetArticulationPoints(n, v)
            result = min(result, nresult)

            if nresult >= visited[v] then
                articulation = articulation + 1
        else
            result = min(result, visited[n])

    if articulation > 0 then
        Output v as articulation point
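To make the comparison concrete, here is a direct Java transcription of our pseudocode. It is a sketch under a few assumptions: the graph is simple and undirected, its vertices are numbered 0..n-1, and the adjacency lists are given as `List<List<Integer>>`. The class name `ArticulationPoints` is ours. The recursion depth equals the depth of the search tree, so very deep graphs would need an explicit stack.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ArticulationPoints {
  private final List<List<Integer>> neighbors;
  private final int[] visited; // 0 = not visited, otherwise the DFS entry index
  private final Set<Integer> points = new TreeSet<>();
  private int index = 0;

  private ArticulationPoints(List<List<Integer>> neighbors) {
    this.neighbors = neighbors;
    this.visited = new int[neighbors.size()];
  }

  static Set<Integer> find(List<List<Integer>> neighbors) {
    ArticulationPoints finder = new ArticulationPoints(neighbors);

    // Start a search in every connected component.
    for (int v = 0; v < neighbors.size(); v++) {
      if (finder.visited[v] == 0) {
        finder.visit(v, -1);
      }
    }

    return finder.points;
  }

  // Returns the smallest entry index that v's subtree reaches through a back edge.
  private int visit(int v, int parent) {
    visited[v] = ++index;
    int result = visited[v];
    // A root needs two independent subtrees, any other vertex needs one.
    int articulation = parent == -1 ? -1 : 0;

    for (int n : neighbors.get(v)) {
      if (n == parent) {
        continue;
      }

      if (visited[n] == 0) {
        int nresult = visit(n, v);

        result = Math.min(result, nresult);

        // No cycle leads from n's subtree back to an ancestor of v.
        if (nresult >= visited[v]) {
          articulation++;
        }
      } else {
        result = Math.min(result, visited[n]);
      }
    }

    if (articulation > 0) {
      points.add(v);
    }

    return result;
  }
}
```

For example, take a triangle 0-1-2 with a tail 1-3-4. The articulation points are 1 and 3.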

Both algorithms have the same complexity: a single depth-first search, linear in the number of vertices and edges.

What is interesting is that we see no obvious way to transform one algorithm into the other, other than by starting from graph theory.

More is on Wiki.

Sunday, 24 May 2020 12:15:02 UTC
Thinking aloud | xslt
# Tuesday, 19 May 2020

Michael Kay's "A Proposal for XSLT 4.0" has sparked our interest in what could be added to or changed in XSLT, so we decided to implement a Graph API purely in xslt. Our goals were:

  • to prove that it's possible to provide efficient implementations of various graph algorithms in XSLT;
  • to build the Graph API in a way that lets an engine provide native implementations of graph algorithms;
  • to find through experiments what could be added to XSLT as a language.

At present we can confirm that the first two goals are reachable, and the experiments have shown that XSLT could do more to help write better programs; e.g., we have seen that the language could simplify coding loops.

Graph algorithms are often expressed with while loops, e.g. "Dijkstra's algorithm" has:

while Q is not empty:
    u ← vertex in Q with min dist[u]

The body is executed while the condition is satisfied, but the condition is affected by the body itself.

In xslt 3.0 we did this with simple recursion:

<xsl:template name="f:while" as="item()*">
  <xsl:param name="condition" as="function(item()*) as xs:boolean"/>
  <xsl:param name="action" as="function(item()*) as item()*"/>
  <xsl:param name="next" as="function(item()*, item()*) as item()*"/>
  <xsl:param name="state" as="item()*"/>

  <xsl:if test="$condition($state)">
    <xsl:variable name="items" as="item()*" select="$action($state)"/>

    <xsl:sequence select="$items"/>

    <xsl:call-template name="f:while">
      <xsl:with-param name="condition" select="$condition"/>
      <xsl:with-param name="action" select="$action"/>
      <xsl:with-param name="next" select="$next"/>
      <xsl:with-param name="state" select="$next($state, $items)"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
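For readers more at home in a general-purpose language, the f:while template above corresponds roughly to the following Java sketch. The name `While.loop` is hypothetical. Java has a native loop, so the tail recursion becomes an ordinary `while` statement. The condition, action and next-state functions keep the same roles as in the template.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

class While {
  // While condition(state) holds: emit action(state), then compute the next
  // state from the current state and the items just emitted.
  static <S, T> List<T> loop(
    Predicate<S> condition,
    Function<S, List<T>> action,
    BiFunction<S, List<T>, S> next,
    S state)
  {
    List<T> result = new ArrayList<>();

    while (condition.test(state)) {
      List<T> items = action.apply(state);

      result.addAll(items);
      state = next.apply(state, items);
    }

    return result;
  }
}
```

For instance, `While.<Integer, Integer>loop(s -> s < 100, s -> List.of(s), (s, items) -> s * 2, 1)` collects the powers of two below 100.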

But here is the point: it could be done in a more comprehensible way, e.g. by letting xsl:iterate without a select attribute loop until xsl:break is reached.

<xsl:iterate>
  <xsl:param name="name" as="..." select="..."/>
  
  <xsl:if test="...">
    <xsl:break/>
  </xsl:if>

  ...
</xsl:iterate>

So what we propose is to make xsl:iterate/@select optional, and to change the processor's behavior when the attribute is missing from a compilation error to a valid behavior. This should not affect any existing valid XSLT 3.0 program.

Tuesday, 19 May 2020 07:00:25 UTC
Thinking aloud | xslt
# Tuesday, 12 May 2020

Recently we read the article "A Proposal for XSLT 4.0" and thought it worth suggesting one more idea. We wrote a message to Michael Kay, the author of the proposal. Here it is:

A&V

Historically, xslt, xquery and xpath have dealt with trees. Nowadays it has become much more common to process graphs. Many tasks can be formulated in terms of graphs; in particular, any task that processes trees is also a graph task.

I suggest taking a deeper look in this direction.

As an inspiration, I suggest looking at "P1709R2: Graph Library", the C++ proposal.


Michael Kay

I have for many years found it frustrating that XML is confined to hierarchic relationships (things like IDREF and XLink are clumsy workarounds); also the fact that the arbitrary division of data into "documents" plays such a decisive role: documents should only exist in the serialized representation of the model, not in the model itself.

I started my career working with the Codasyl-defined network data model. It's a fine and very flexible data model; its downfall was the (DOM-like) procedural navigation language. So I've often wondered what one could do trying to re-invent the Codasyl model in a more modern idiom, coupling it with an XPath-like declarative access language extended to handle networks (graphs) rather than hierarchies.

I've no idea how close a reinvention of Codasyl would be to some of the modern graph data models; it would be interesting to see. The other interesting aspect of this is whether you can make it work for schema-less data.

But I don't think that would be an incremental evolution of XSLT; I think it would be something completely new.


A&V

I was not so radical in my thoughts. :-)

Even the C++ API is not so radical: it does not impose hard requirements on the internal graph representation, but rather defines a template API that works both with third-party representations (they even mention Fortran) and with several built-in implementations that use standard vectors.

Their strong point is the algorithms provided as part of the library, not the graph's internal structure (I think the authors of that paper did not structure it in the best way). E.g., in the second part they list graph algorithms: Depth First Search (DFS); Breadth First Search (BFS); Topological Sort (TopoSort); Shortest Paths Algorithms; Dijkstra Algorithms; and so on.

If we try to map it to the xpath world, then a graph at the API level might be represented as a user function or as a map of user functions.

At the storage level, a user may implement a graph using a sequence of maps, a map of maps, or even xdm elements.

So my approach is evolutionary. In fact, I suggest a pure API that could be implemented even now.


Michael Kay

Yes, there's certainly scope for graph-oriented functions such as closure($origin, $function) and is-reachable($origin, $function) and find-path($origin, $destination, $function) where we use the existing data model, treating any item as a node in a graph, and representing the arcs using functions. There are a few complications, e.g. what's the identity comparison between arbitrary items, but it can probably be done.


A&V
> There are a few complications, e.g. what's the identity comparison between arbitrary items, but it can probably be done.

One approach to address this is through the definition of a graph API, e.g. to define a graph as a map of functions (an analog of an interface), with equality functions if required:

map
{
  vertices: function(),
  edges: function(),
  value: function(vertex),
  in-vertex: function(edge),
  out-vertex: function(edge),
  edges: function(vertex),
  is-in-vertex: function(edge, vertex),
  is-out-vertex: function(edge, vertex)
  ...
}

Not sure how far this will go but who knows.
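Here is a minimal Java sketch of the idea of representing arcs as functions: closure and is-reachable over a graph known only through a neighbor function. The names follow Michael Kay's examples, but the signatures and semantics are our assumptions; for instance, closure here includes the origin. Vertex identity relies on Java's `equals`/`hashCode`, which is just one possible answer to the identity question he raised.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

class GraphFunctions {
  // All vertices reachable from origin, origin included, in breadth-first order.
  // The graph is known only through edges: vertex -> its successors.
  static <V> Set<V> closure(V origin, Function<V, ? extends Iterable<V>> edges) {
    Set<V> visited = new LinkedHashSet<>();
    Deque<V> queue = new ArrayDeque<>();

    visited.add(origin);
    queue.add(origin);

    while (!queue.isEmpty()) {
      V vertex = queue.remove();

      for (V next : edges.apply(vertex)) {
        // Set.add returns false for vertices seen before, which also stops cycles.
        if (visited.add(next)) {
          queue.add(next);
        }
      }
    }

    return visited;
  }

  // True if target is in the closure of origin (so a vertex reaches itself).
  static <V> boolean isReachable(V origin, V target, Function<V, ? extends Iterable<V>> edges) {
    return closure(origin, edges).contains(target);
  }
}
```

The storage can be anything: a map of lists, xdm elements, or a database query. Only the edge function has to be supplied.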

Tuesday, 12 May 2020 06:08:51 UTC
Thinking aloud | xslt
# Sunday, 24 March 2019

This story started half a year ago, when Michael Kay, the author of the Saxon XSLT processor, was dealing with performance in a multithreaded environment. See Bug #3958.

The problem is as follows.

Given XSLT:

<xsl:stylesheet exclude-result-prefixes="#all" 
  version="3.0" 
  xmlns:saxon="http://saxon.sf.net/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:template name="main">
    <xsl:for-each saxon:threads="4" select="1 to 10">
      <xsl:choose>
        <xsl:when test=". eq 1">
          <!-- Will take 10 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/10')?url"/>
        </xsl:when>
        <xsl:when test=". eq 5">
          <!-- Will take 9 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/9')?url"/>
        </xsl:when>
        <xsl:when test=". eq 10">
          <!-- Will take 8 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/8')?url"/>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
    <xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Implement the engine so that it achieves the best performance for the parallel for-each.

A naive implementation that statically distributes iterations among threads runs into unfair load on the threads, so some load balancing is required. That was the case with Saxon EE.
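One common way to get such load balancing is to treat every iteration as a separate task in a shared queue. An idle thread then takes the next pending item instead of owning a fixed slice of the input. The following Java sketch shows this with a fixed pool of 4 threads standing in for saxon:threads="4". It is our illustration under those assumptions, not Saxon's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelForEach {
  static <T, R> List<R> map(List<T> items, Function<T, R> action, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);

    try {
      List<Future<R>> futures = new ArrayList<>();

      // One task per item: threads pull work from the pool's queue as they
      // become free, so a slow item does not block a whole precomputed slice.
      for (T item : items) {
        futures.add(pool.submit(() -> action.apply(item)));
      }

      List<R> results = new ArrayList<>();

      // Reading futures in submission order preserves the input order.
      for (Future<R> future : futures) {
        results.add(future.get());
      }

      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because results are collected in order, the output matches a sequential for-each. The total time is still bounded by the slowest item.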

Michael Kay has been trying to find the most elegant way to implement it and wrote this comment:

I can't help feeling that the answer to this must lie in using the Streams machinery, and Spliterators in particular. I've spent another hour or so reading all about Spliterators, and I have to confess I really don't understand the paradigm. If someone can enlighten me, please go ahead...

We decided to take the challenge and model the expected behavior using Streams. Here is our attempt:

import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.function.Consumer;
import java.util.function.Function;

public class Streams
{
  public static class Item<T>
  {
    public Item(int index, T data)
    {
      this.index = index;
      this.data = data;
    }
    
    int index;
    T data;
  }

  public static void main(String[] args)
  {
    run(
      "Sequential",
      input(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel ordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel unordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      false);    
  }
  
  private static void run(
    String description,
    Stream<Item<String>> input,
    Function<Item<String>, String[]> action,
    Consumer<String[]> output,
    boolean ordered)
  {
    System.out.println(description);
    
    long start = System.currentTimeMillis();
   
    if (ordered)
    {
      input.map(action).forEachOrdered(output);
    }
    else
    {
      input.map(action).forEach(output);
    }
    
    long end = System.currentTimeMillis();
    
    System.out.println("Execution time: " + (end - start) + "ms.");
    System.out.println();
  }
  
  private static Stream<Item<String>> input()
  {
    return IntStream.range(0, 10).
      mapToObj(i -> new Item<String>(i + 1, "Data " + (i + 1)));
  }
  
  private static String[] action(Item<String> item)
  {
    switch(item.index)
    {
      case 1:
      {
        sleep(10);
        
        break;
      }
      case 5:
      {
        sleep(9);
        
        break;
      }
      case 10:
      {
        sleep(8);
        
        break;
      }
      default:
      {
        sleep(1);
        
        break;
      }
    }
    
    String[] result = { "data:", item.data, "index:", item.index + "" };
    
    return result;
  }
  
  private synchronized static void output(String[] value)
  {
    boolean first = true;
    
    for(String item: value)
    {
      if (first)
      {
        first = false;
      }
      else
      {
        System.out.print(' ');
      }
    
      System.out.print(item);
    }

    System.out.println();
  }
  
  private static void sleep(int seconds)
  {
    try
    {
      Thread.sleep(seconds * 1000);
    }
    catch(InterruptedException e)
    {
      throw new IllegalStateException(e);
    }
  }
}

We model three cases:

"Sequential"
slowest, single-threaded execution, with output:
data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 34009ms.
"Parallel ordered"
fast, multithreaded execution preserving order, with output:
data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 10019ms.
"Parallel unordered"
fastest, multithreaded execution not preserving order, with output:
data: Data 6 index: 6
data: Data 2 index: 2
data: Data 4 index: 4
data: Data 3 index: 3
data: Data 9 index: 9
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 5 index: 5
data: Data 10 index: 10
data: Data 1 index: 1
Execution time: 10001ms.

In conclusion, we can add that an xslt engine could try to decide automatically which approach to use, as many SQL engines do, rather than force the developer to go into low-level engine details.

Sunday, 24 March 2019 07:52:02 UTC
Java | Thinking aloud | xslt
# Saturday, 03 November 2018

Recently we looked at how we have solved the same task in different versions of XPath: 2.0, 3.0, and 3.1.

Suppose you have a sequence $items, and you want to call some function on each item of the sequence and return the combined result.

In XPath 2.0 this was solved like this:

for $item in $items return
  f:func($item)

In XPath 3.0 this was solved like this:

$items!f:func(.)

Now, with XPath 3.1, which defines the arrow operator =>, we attempted to write something as simple as:

$items=>f:func()

That definitely does not work, as it is the same as f:func($items).

Next attempt was:

$items!=>f:func()

That does not even compile.

So, finally, working expression using => looks like this:

$items!(.=>f:func())

This looks like a step back compared to the XPath 3.0 variant.

Moreover, the XPath grammar of the arrow operator does not allow the function call to be followed by predicates, axis steps or mapping operators, so these won't compile:

$items!(.=>f:func()[1])
$items!(.=>f:func()!something)

Our conclusion is that the arrow operator is a rather confusing addition to XPath.
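The distinction that tripped us up is applying a function to the whole sequence versus applying it to each item. It can be mirrored in Java. In this sketch a list stands in for $items. The hypothetical `FUNC` returns the size of its argument, so the two call styles give visibly different results.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ArrowVsMap {
  // Hypothetical stand-in for f:func: returns the size of its argument sequence.
  private static final Function<List<String>, Integer> FUNC = List::size;

  // $items=>f:func() is f:func($items): one call with the whole sequence.
  static int arrow(List<String> items) {
    return FUNC.apply(items);
  }

  // $items!f:func(.), or $items!(.=>f:func()): one call per item,
  // where each item is a sequence of length one.
  static List<Integer> simpleMap(List<String> items) {
    return items.stream()
      .map(item -> FUNC.apply(List.of(item)))
      .collect(Collectors.toList());
  }
}
```

For three items, `arrow` makes one call and yields 3. `simpleMap` makes three calls and yields [1, 1, 1].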

Saturday, 03 November 2018 20:59:28 UTC
Thinking aloud | xslt
# Tuesday, 02 October 2018

Xslt 3.0 defines a feature called streamability: a technique for writing xslt code that can handle inputs of arbitrary size.

This contrasts with conventional xslt code (and xslt engines), where inputs are completely loaded into memory.

To make code streamable, a developer should declare her code as such, and the code should pass streamability analysis.

The goal is to define a subset of xslt/xpath operations that allows input to be processed in one pass.

In simple cases it's indeed easy to verify that code is streamable, but the more complex your code is, the less trivial it is to show that it is streamable.

On the forums we have seen a lot of discussions where experts tried to figure out whether a particular xslt is streamable. At times it's a remarkably nontrivial task!

This, in our opinion, clearly shows that the feature is a largely failed attempt to inscribe an optimization technique into the xslt spec.

The place for such optimization is in the implementation space, not in the spec. An engine should attempt such optimization and fall back to the traditional implementation.

The latest such example is: Getting SXST0060 "No streamable path found in expression" when trying to push a map with grounded nodes to a template of a streamable mode, where neither the xslt code developers nor the engine developers are sure that the code is streamable in the first place.

By the way, besides streamability there is another optimization technique that works in probably all SQL engines: when data does not fit into memory, the engine may spill it to disk, thus trading memory pressure for disk access. So why hasn't such a technique found its way into the Xslt or SQL specs?

Tuesday, 02 October 2018 12:50:22 UTC
Thinking aloud | xslt
# Friday, 28 September 2018

Saxon 9.9.0-1 is out!

Shortly afterwards we reported our first bug in the new version. See https://saxonica.plan.io/issues/3923.

Friday, 28 September 2018 17:47:37 UTC
xslt
# Thursday, 27 September 2018

After 17 years of experience, we still run into silly bugs in xslt (in xpath, in fact).

The latest one is related to the order of nodes produced by the ancestor-or-self axis.

Consider the code:

<xsl:stylesheet version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="data" as="element()">
      <a>
        <b>
          <c/>
        </b>
      </a>
    </xsl:variable>

    <xsl:variable name="item" as="element()" select="($data//c)[1]"/>

    <xsl:message select="$item!ancestor-or-self::*!local-name()"/>
    <xsl:message select="$item!local-name(), $item!..!local-name(), $item!..!..!local-name()"/>
  </xsl:template>

</xsl:stylesheet>

We expected the following outcome:

  • c b a
  • c b a

But the correct one is:

  • a b c
  • c b a

Here is why:

ancestor-or-self::* is an AxisStep. From XPath §3.3.2:

[Definition: An axis step returns a sequence of nodes that are reachable from the context node via a specified axis. Such a step has two parts: an axis, which defines the "direction of movement" for the step, and a node test, which selects nodes based on their kind, name, and/or type annotation.] If the context item is a node, an axis step returns a sequence of zero or more nodes; otherwise, a type error is raised [err:XPTY0020]. The resulting node sequence is returned in document order.

For some reason we thought that a reverse axis produces its result in reverse order. It turns out that reverse order applies only within a predicate of such an axis step.
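The same rule can be checked with the XPath 1.0 engine bundled with the JDK. This is a different XPath version and engine than Saxon, used here only as a convenient cross-check. The result of the axis step comes back in document order, while a predicate on a reverse axis counts positions from the context node outward.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ReverseAxisOrder {
  // Evaluates an XPath 1.0 expression as a string against <a><b><c/></b></a>.
  static String evaluate(String expression) {
    try {
      Document document = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader("<a><b><c/></b></a>")));
      XPath xpath = XPathFactory.newInstance().newXPath();

      return xpath.evaluate(expression, document);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

`name((//c/ancestor-or-self::*)[1])` gives "a", because the step result is in document order. `name(//c/ancestor-or-self::*[1])` gives "c", because inside the step predicate the positions follow the reverse axis.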

See more at https://saxonica.plan.io/boards/3/topics/7312

Thursday, 27 September 2018 05:52:58 UTC
xslt
# Tuesday, 18 September 2018

XPath 3 has introduced syntactic sugar for string concatenation, so the following:

concat($a, $b)

can now be written as:

$a || $b

This is a nice addition, except when you run into trouble. Being rooted in the C world, we unintentionally wrote the following xslt code:

<xsl:if test="$a || $b">
...
</xsl:if>

Clearly, we intended to write $a or $b. Instead, $a || $b is evaluated as concat($a, $b). If both variables are false(), we get 'falsefalse', whose effective boolean value is true(). Since the concatenation of two booleans is never an empty string, the test condition of xsl:if is always true().

What can be done to avoid such an unfortunate typo, which manifests itself neither during compilation nor at runtime?

The answer is to issue an informational message during compilation: e.g., if the result of the || operator is converted to a boolean, and its arguments are booleans too, then chances are high this is a typo and not an intentional expression.
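A minimal sketch of such a check, in Java over a toy expression tree. The types `Var` and `Concat` and the method `checkBooleanContext` are hypothetical and do not reflect Saxon's internals. The rule fires only when both operands of || are statically typed as booleans and the result is used where an effective boolean value is expected.

```java
import java.util.ArrayList;
import java.util.List;

class ConcatLint {
  enum Type { BOOLEAN, STRING }

  interface Expr {
    Type type();
  }

  // A variable reference with a statically known type.
  static final class Var implements Expr {
    private final Type type;

    Var(Type type) { this.type = type; }

    public Type type() { return type; }
  }

  // The "||" operator: its result is always a string.
  static final class Concat implements Expr {
    final Expr left;
    final Expr right;

    Concat(Expr left, Expr right) { this.left = left; this.right = right; }

    public Type type() { return Type.STRING; }
  }

  // Invoked for an expression used in a boolean context, such as xsl:if/@test.
  static List<String> checkBooleanContext(Expr test) {
    List<String> warnings = new ArrayList<>();

    if (test instanceof Concat) {
      Concat concat = (Concat) test;

      if (concat.left.type() == Type.BOOLEAN && concat.right.type() == Type.BOOLEAN) {
        warnings.add("'||' concatenates two booleans in a boolean context; did you mean 'or'?");
      }
    }

    return warnings;
  }

  // What the typo computes: "falsefalse", "truefalse", etc. are never empty,
  // so the effective boolean value of the concatenation is always true.
  static boolean effectiveBooleanValueOfConcat(boolean a, boolean b) {
    return !(String.valueOf(a) + String.valueOf(b)).isEmpty();
  }
}
```

The message is informational rather than an error, since concatenating booleans is legal and occasionally intended.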

We advised implementing such a message in the Saxon processor (see https://saxonica.plan.io/boards/3/topics/7305).

Tuesday, 18 September 2018 12:23:08 UTC
xslt
# Thursday, 13 September 2018

It seems we've found a discrepancy in the regex implementation of Saxon. Consider the following xslt:

<xsl:stylesheet version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="text" as="xs:string" 
      select="'A = &quot;a&quot; OR B = &quot;b&quot;'"/>

    <xsl:analyze-string regex="&quot;(\\&quot;|.)*?&quot;" select="$text">
      <xsl:matching-substring>
        <xsl:message>
          <xsl:sequence select="regex-group(0)"/>
        </xsl:message>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
vs. javascript:
<html>
<body>
  <script>
    var text = 'A = "a" OR B = "b"';
    var regex = /"(\\"|.)*?"/;
    var match = text.match(regex);

    alert(match[0]);
  </script>
</body>
</html>

xslt produces: "a" OR B = "b"

while javascript produces: "a"
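As a cross-check with yet another engine, Java's java.util.regex also returns the shortest match here, as javascript does. The class and method names below are ours.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedString {
  // The same expression: a double quote, then lazily either an escaped
  // quote (\") or any character, then the closing double quote.
  private static final Pattern QUOTED = Pattern.compile("\"(\\\\\"|.)*?\"");

  // Returns the first quoted string in text, or null if there is none.
  static String firstQuoted(String text) {
    Matcher matcher = QUOTED.matcher(text);

    return matcher.find() ? matcher.group() : null;
  }
}
```

On `A = "a" OR B = "b"` this returns `"a"`. An escaped quote inside a string, as in `"a\"b"`, is kept within the match.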

What is interesting is that we're certain this worked correctly in Saxon several years ago.

You can track progress of the bug at: https://saxonica.plan.io/boards/3/topics/7300 and at https://saxonica.plan.io/issues/3902.

Thursday, 13 September 2018 06:29:05 UTC
xslt
# Tuesday, 27 June 2017

We found that there is a Saxon HE update that is supposed to fix the problems we mentioned in the previous post, and decided to give it a second chance.

Now Saxon fails with two other errors:

We'll wait for the fixes. Meanwhile, we're back to version 9.7.

Tuesday, 27 June 2017 22:30:28 UTC
Announce | xslt
# Wednesday, 14 June 2017
Finally, Saxon 9.8 is out!
This means that basic xslt 3 is available in the HE version.

Update: as usual, each new release has new bugs...
See https://saxonica.plan.io/boards/3/topics/6809
Wednesday, 14 June 2017 21:05:51 UTC
xslt
# Tuesday, 16 May 2017

We found that Saxon HE 9.7.0-18 has finally exposed partial support for the map and array item types. So now you can encapsulate your data in maps rather than keep it in a single flat sequence and treat odd and even items specially.

A basic example is:

<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="t"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  exclude-result-prefixes="xs t map">

  <xsl:template match="/">
    <xsl:variable name="map" as="map(xs:string, xs:string)" select="
      map 
      {
        'Su': 'Sunday',
        'Mo': 'Monday',
        'Tu': 'Tuesday',
        'We': 'Wednesday',
        'Th': 'Thursday',
        'Fr': 'Friday',
        'Sa': 'Saturday'
      }"/>
      
     <xsl:message select="map:keys($map)"/>
  </xsl:template>  

</xsl:stylesheet>

A list of map functions can be found at http://www.w3.org/2005/xpath-functions/map/, though not all of them are available, as Saxon HE still does not allow inline functions.
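To show what maps replace, here is a small Java sketch. It contrasts the old workaround, keys and values interleaved in one flat sequence, with a map built from that sequence. The names are hypothetical. Note that a `LinkedHashMap` keeps insertion order, whereas the order of map:keys in xpath is implementation-dependent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DayNames {
  // The workaround: keys at odd positions (1-based, as in xpath),
  // values at the even positions right after them.
  static final List<String> FLAT = List.of(
    "Su", "Sunday", "Mo", "Monday", "Tu", "Tuesday", "We", "Wednesday",
    "Th", "Thursday", "Fr", "Friday", "Sa", "Saturday");

  // Lookup in the flat sequence: scan every key position.
  static String lookupFlat(String key) {
    for (int i = 0; i + 1 < FLAT.size(); i += 2) {
      if (FLAT.get(i).equals(key)) {
        return FLAT.get(i + 1);
      }
    }

    return null;
  }

  // The map keeps each key/value pair together, as map { ... } does in xslt 3.
  static Map<String, String> toMap(List<String> flat) {
    Map<String, String> map = new LinkedHashMap<>();

    for (int i = 0; i + 1 < flat.size(); i += 2) {
      map.put(flat.get(i), flat.get(i + 1));
    }

    return map;
  }
}
```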

P.S. From the development perspective, it does great harm that Saxon HE is so limited: it is basically xslt 2.0 plus some selected parts of 3.0.

Tuesday, 16 May 2017 06:20:48 UTC
Thinking aloud | xslt
# Sunday, 26 March 2017

Lately we program not so much in XSLT as in java, C#, SQL and javascript, but from time to time we still have tasks in XSLT.

People claim that those languages are too different and use this argument to explain why XSLT is only a niche language. We, on the other hand, often spot similarities between them.

So, what is it in other languages that XSLT implements as tunnel parameters?

To answer this, we recalled how they work in XSLT. You:

  • define a template with parameters marked as tunnel="yes";
  • use these parameters the same way as regular parameters;
  • pass template parameters down to other templates marking them as tunnel="yes";

The important difference between regular and tunnel template parameters is that tunnel parameters are implicitly passed down the call chain of templates. This means that you:

  • define your API that is expected to receive some parameter;
  • pass these parameters somewhere high in the stack, or override them later in the stack chain;
  • do not bother to propagate them (you might not even know all of the tunnel parameters passed, so encapsulation is in action);

As a result, we have a template with some parameters passed explicitly and others receiving values from somewhere else, usually not from the direct caller. One can say that these tunnel parameters are injected into a template call. This closely resembles the injection APIs in other languages, where you configure some parameters to be prepared for you by a container rather than by the direct caller.

Now that we have expressed this idea, it seems obvious, but before we thought of it we did not realize that tunnel parameters in XSLT and Dependency Injection in other languages are the same thing.
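The analogy can be made concrete with a minimal Java sketch of injection-style context passing. `Tunnel` is a hypothetical helper built on `ThreadLocal`, not an existing library. A value bound high in the call chain is visible to any callee without being forwarded, and it can be overridden for a nested part of the chain, much like tunnel parameters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class Tunnel {
  private static final ThreadLocal<Map<String, Object>> PARAMS =
    ThreadLocal.withInitial(HashMap::new);

  // Like xsl:with-param tunnel="yes": run body with name bound to value.
  static <T> T with(String name, Object value, Supplier<T> body) {
    Map<String, Object> previous = PARAMS.get();
    Map<String, Object> current = new HashMap<>(previous);

    current.put(name, value);
    PARAMS.set(current);

    try {
      return body.get();
    } finally {
      PARAMS.set(previous); // restore the caller's bindings
    }
  }

  // Like xsl:param tunnel="yes": read the value injected by some caller.
  static Object get(String name) {
    return PARAMS.get().get(name);
  }
}

// A call chain where only some levels know about "lang".
class TunnelDemo {
  static String outer() {
    return Tunnel.with("lang", "en", TunnelDemo::middle);
  }

  // Never forwards "lang"; it only overrides it for one nested call.
  static String middle() {
    return inner() + "," + Tunnel.with("lang", "fr", TunnelDemo::inner);
  }

  static String inner() {
    return "lang=" + Tunnel.get("lang");
  }
}
```

`TunnelDemo.outer()` returns "lang=en,lang=fr". After it returns, the binding is gone again.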

Sunday, 26 March 2017 04:21:36 UTC
Thinking aloud | xslt
# Wednesday, 21 December 2016

Recently we found and fixed a bug in the unreachable statement optimization in jxom.

The latest version of the stylesheets can be found at github.com languages-xom.

Wednesday, 21 December 2016 22:10:06 UTC
xslt
# Friday, 03 June 2016

Good, bad and good news.

Good: recently a new version of the Saxon XSLT processor was published:

12 May 2016
Saxon 9.7.0.5 maintenance release for Java and .NET.

Bad: we ran that release on our code base and found a bug:

See Internal error in Saxon-HE-9.7.0-5

Good: Michael Kay has confirmed the problem and even fixed it:

See Bug #2770

The only missing piece is when the patch will be available to the public:

"We tend to do a new maintenance release every 4-6 weeks. Can't commit to firm dates."

Friday, 03 June 2016 21:09:10 UTC
xslt
# Tuesday, 09 February 2016

Visitor pattern is often used to separate operation from object graph it operates with. Here we assume that the reader is familiar with the subject.

The idea is like this:

  • The operation over the object graph is implemented as a type called Visitor.
  • The Visitor defines a method for each type of object in the graph; these methods are called while the graph is traversed.
  • Traversal of the graph is implemented by a type called Traverser, by the Visitor itself, or by each object type in the graph.

The implementation collects, aggregates, or performs other actions while visiting the objects in the graph, so that by the end of the visit the purpose of the operation is fulfilled.

Such an implementation is push-like: you create an operation object and call a method that takes the object graph as input and returns the result of the operation.
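Here is a minimal Java sketch of such a push-like visitor over a boolean expression tree. The types Expr, Const, Not, And and the two visitors are illustrative; this is not the code from nesterovsky-bros/VisitorPattern. Each operation is a separate visitor, the tree classes only dispatch to it, and here the visitor itself drives the traversal.

```java
// The object graph: a tiny boolean expression tree.
interface Expr {
  <R> R accept(Visitor<R> visitor);
}

// One method per node type; R is the result type of the operation.
interface Visitor<R> {
  R visitConst(Const e);
  R visitNot(Not e);
  R visitAnd(And e);
}

final class Const implements Expr {
  final boolean value;
  Const(boolean value) { this.value = value; }
  public <R> R accept(Visitor<R> v) { return v.visitConst(this); }
}

final class Not implements Expr {
  final Expr operand;
  Not(Expr operand) { this.operand = operand; }
  public <R> R accept(Visitor<R> v) { return v.visitNot(this); }
}

final class And implements Expr {
  final Expr left;
  final Expr right;
  And(Expr left, Expr right) { this.left = left; this.right = right; }
  public <R> R accept(Visitor<R> v) { return v.visitAnd(this); }
}

// Operation 1: evaluate the expression.
final class Evaluator implements Visitor<Boolean> {
  public Boolean visitConst(Const e) { return e.value; }
  public Boolean visitNot(Not e) { return !e.operand.accept(this); }
  public Boolean visitAnd(And e) { return e.left.accept(this) && e.right.accept(this); }
}

// Operation 2: print the expression; the tree classes stay untouched.
final class Printer implements Visitor<String> {
  public String visitConst(Const e) { return Boolean.toString(e.value); }
  public String visitNot(Not e) { return "not(" + e.operand.accept(this) + ")"; }
  public String visitAnd(And e) {
    return "(" + e.left.accept(this) + " and " + e.right.accept(this) + ")";
  }
}

class VisitorDemo {
  public static void main(String[] args) {
    Expr e = new And(new Const(true), new Not(new Const(false)));
    System.out.println(e.accept(new Printer()) + " = " + e.accept(new Evaluator()));
  }
}
```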

In the past we often dealt with big graphs (usually virtual graphs backed by a database or a file system).

Also, having strong experience in XSLT, we see that the visitor pattern in OOP maps directly onto the xsl:template and xsl:apply-templates technique.

Another thought was that in XML processing there are two camps:

  • SAX (push-like) - those who process xml in callbacks, which is very similar to the visitor pattern; and
  • XML Reader (pull-like) - those who pull xml components from a source, and then iterate and process them.

As with SAX vs XML Reader or, more generally, push vs pull processing models, neither is best in general; each is preferable in particular circumstances. For example, a pull-like component fits into a transformation pipeline where one pull component uses another as its source. Another example is processing two sources at once, which is nontrivial with a push-like model. On the other hand, push processing fits better into the Reduce part of the MapReduce pattern, where you need to accumulate results from a source.
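Both styles exist in Java's standard library. SAX pushes events into a handler, while StAX (XMLStreamReader) lets the caller pull them. The short sketch below uses illustrative method names. It collects element names both ways, then uses two pull readers to walk two documents in lockstep, which is the case that is awkward with callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PushVsPull {
  // Push (SAX): the parser drives the loop and calls our handler back.
  static List<String> pushNames(String xml) throws Exception {
    List<String> names = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attributes) {
        names.add(qName); // the default factory is not namespace-aware, so qName holds the name
      }
    };
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return names;
  }

  // Pull (StAX): we drive the loop and ask for the next element.
  static List<String> pullNames(String xml) throws XMLStreamException {
    XMLStreamReader reader = open(xml);
    List<String> names = new ArrayList<>();
    String name;
    while ((name = nextElement(reader)) != null) {
      names.add(name);
    }
    reader.close();
    return names;
  }

  // Pulling from two sources in lockstep is straightforward; with callbacks it is not.
  static List<String> interleave(String xmlA, String xmlB) throws XMLStreamException {
    XMLStreamReader a = open(xmlA);
    XMLStreamReader b = open(xmlB);
    List<String> names = new ArrayList<>();
    while (true) {
      String x = nextElement(a);
      String y = nextElement(b);
      if (x == null && y == null) {
        break;
      }
      if (x != null) names.add(x);
      if (y != null) names.add(y);
    }
    a.close();
    b.close();
    return names;
  }

  static XMLStreamReader open(String xml) throws XMLStreamException {
    return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
  }

  // Advances to the next start tag and returns its local name, or null at the end.
  static String nextElement(XMLStreamReader reader) throws XMLStreamException {
    while (reader.hasNext()) {
      if (reader.next() == XMLStreamConstants.START_ELEMENT) {
        return reader.getLocalName();
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(pushNames("<a><b/><c/></a>"));
    System.out.println(interleave("<a><b/></a>", "<x><y/><z/></x>"));
  }
}
```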

So our idea was to complement the classic push-like visitor pattern with an example of a pull-like implementation.

For the demonstration we selected the Java language and a simple boolean expression calculator.

Please follow GitHub nesterovsky-bros/VisitorPattern to see the detailed explanation.

Tuesday, 09 February 2016 12:37:10 UTC
Java | Thinking aloud | xslt
# Monday, 04 January 2016

The essence of the problem (see the forum thread Error during transformation in Saxon 9.7):

  1. An XPath engine may arbitrarily reorder predicates whose expressions do not depend on the context position.
  2. While the XPath expression $N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")] cannot raise an error if it is evaluated from left to right, the expression with reordered predicates $N[xs:date(@x) gt xs:date("2000-01-01")][@x castable as xs:date] may raise an error when @x is not a valid xs:date.

To avoid a potential problem one should rewrite the expression like this: $N[if (@x castable as xs:date) then xs:date(@x) gt xs:date("2000-01-01") else false()].

Please note that the following rewrite will not work: $N[(@x castable as xs:date) and (xs:date(@x) gt xs:date("2000-01-01"))]. The operands of an and expression can be evaluated in any order, and an error that occurs while evaluating either operand may be propagated.
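Java has no predicate-reordering optimizer, so the sketch below reorders the filters by hand to show the effect. It is an analogy, not XPath. The methods castable and after2000 are hypothetical stand-ins for @x castable as xs:date and xs:date(@x) gt xs:date("2000-01-01"). Java's && short-circuits left to right, so it would be safe here. The conditional form mirrors the if/then/else rewrite that XPath needs, because XPath's and gives no such guarantee.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PredicateOrder {
  static final LocalDate THRESHOLD = LocalDate.of(2000, 1, 1);

  // Stand-in for "@x castable as xs:date".
  static boolean castable(String x) {
    try {
      LocalDate.parse(x);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  // Stand-in for 'xs:date(@x) gt xs:date("2000-01-01")'; throws when x is not a date.
  static boolean after2000(String x) {
    return LocalDate.parse(x).isAfter(THRESHOLD);
  }

  // $N[castable][after2000] evaluated left to right: never fails.
  static List<String> leftToRight(List<String> n) {
    return n.stream().filter(PredicateOrder::castable).filter(PredicateOrder::after2000)
        .collect(Collectors.toList());
  }

  // The same predicates reordered, as an optimizer may do: fails on a non-date.
  static List<String> reordered(List<String> n) {
    return n.stream().filter(PredicateOrder::after2000).filter(PredicateOrder::castable)
        .collect(Collectors.toList());
  }

  // $N[if (castable) then after2000 else false()]: one predicate, nothing to reorder.
  static List<String> guarded(List<String> n) {
    return n.stream().filter(x -> castable(x) ? after2000(x) : false)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> n = Arrays.asList("1999-12-31", "2001-05-01", "n/a");
    System.out.println(leftToRight(n) + " " + guarded(n));
    try {
      reordered(n);
    } catch (DateTimeParseException e) {
      System.out.println("reordered: " + e.getMessage());
    }
  }
}
```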

Given these facts, we faced the task of checking our code base and fixing possible problems.

A search found ~450 XPath expressions that use two or more consecutive predicates. Careful analysis narrowed this down to ~20 instances that had to be rewritten. But then, all of a sudden, we decided to run an experiment: what if we split an XPath expression into two subexpressions? Can the error still resurface?

Consider:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:variable name="a" as="element()*" select="$elements[self::d or self::e]"/>
    <xsl:variable name="b" as="element()*" select="$a[xs:integer(@value) = 1]"/>

    <xsl:sequence select="$b"/>
  </xsl:template>

</xsl:stylesheet>

As we expected, Saxon 9.7 internally assembles a final XPath expression with two predicates and reorders them. As a result we get an error:

Error at char 20 in xsl:variable/@select on line 8 column 81 of Saxon9.7-filter_speculation.xslt:
  FORG0001: Cannot convert string "c" to an integer

This turn of events greatly complicates the code review we have to carry out.

Michael Kay's answer to this example:

I think your argument that the reordering is inappropriate when the expression is written using variables is very powerful. I shall raise the question with my WG colleagues.

In fact, we think that either reordering of predicates is inappropriate, or (a weaker option that still allows reordering) an error during the evaluation of a predicate expression should be treated as false(). This is what is done in XSLT patterns. Other solutions make XPath less intuitive.

In other words, we should use the XPath language to express ideas, and the engine should implement them correctly and efficiently. We should not be forced to rewrite an expression to please an implementation.

Monday, 04 January 2016 10:07:12 UTC
Thinking aloud | xslt
# Saturday, 02 January 2016

On December 30 we opened a thread in the Saxon help forum showing a stylesheet that generates an error. This is the stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:sequence select="$elements[self::d or self::e][xs:integer(@value) = 1]"/>
  </xsl:template>

</xsl:stylesheet>

We get an error:

Error at char 47 in xsl:sequence/@select on line 7 column 83 of Saxon9.7-filter_speculation.xslt:
  FORG0001: Cannot convert string "c" to an integer
Exception in thread "main" ; SystemID: .../Saxon9.7-filter_speculation.xslt; Line#: 7; Column#: 47
ValidationException: Cannot convert string "c" to an integer
  at ...

It's interesting that the error happens in Saxon 9.7 but not in earlier versions.

The answer we got was expected but disheartening:

The XPath specification (section 2.3.4, Errors and Optimization) explicitly allows the predicates of a filter expression to be reordered by an optimizer. See this example, which is very similar to yours:

The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.

$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]

Following the spec, Michael Kay advises us to rewrite the XPath expression:

$elements[self::d or self::e][xs:integer(@value) = 1]

like this:

$elements[if (self::d or self::e) then xs:integer(@value) = 1 else false()]

Such subtleties make XPath hard to reason about and to teach. We doubt many people will spot the difference immediately.

We think that if such optimization was so important to the spec writers, they should have changed the filter rules to treat failed predicates as false(). This would eliminate any obscure difference between these two otherwise equivalent expressions. In fact, something similar already exists for templates, where a failed evaluation of a pattern is treated as a non-match.
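The rule proposed above can be sketched in Java as a wrapper that turns a failing predicate into a non-match. With it, the order of the two filters no longer affects the result. This is a hypothetical helper, and lenient, Element and select are illustrative names. Note the trade-off: the wrapper also hides genuine errors in a predicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class LenientFilter {
  // Hypothetical element model: a name and an optional "value" attribute.
  static final class Element {
    final String name;
    final String value;

    Element(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  // Turns a failing predicate into a non-match, as xslt already does for patterns.
  static <T> Predicate<T> lenient(Predicate<T> predicate) {
    return item -> {
      try {
        return predicate.test(item);
      } catch (RuntimeException e) {
        return false;
      }
    };
  }

  // Applies two filters in the given order and returns the names that pass.
  static List<String> select(
      List<Element> elements, Predicate<Element> first, Predicate<Element> second) {
    return elements.stream()
        .filter(lenient(first))
        .filter(lenient(second))
        .map(e -> e.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Element> elements = Arrays.asList(
        new Element("a", null), new Element("b", "c"), new Element("d", "1"));
    Predicate<Element> isDOrE = e -> e.name.equals("d") || e.name.equals("e");
    Predicate<Element> valueIsOne = e -> Integer.parseInt(e.value) == 1;

    // Both orders select d; without lenient() the second order would throw.
    System.out.println(select(elements, isDOrE, valueIsOne));
    System.out.println(select(elements, valueIsOne, isDOrE));
  }
}
```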

Saturday, 02 January 2016 21:32:16 UTC
Thinking aloud | xslt
# Monday, 24 August 2015

It's time to align csharpxom to the latest version of C#. The article New Language Features in C# 6 sums up what's being added.

Sources can be found at nesterovsky-bros/languages-xom, and the C# model is in the csharp folder.

In general we feel hostile toward any new feature until it proves it brings added value. So here is our list of new features, from most to least useless:

  1. String interpolation

    var s = $"{p.Name} is {p.Age} year{{s}} old";

    This is useless, as it does not account for resource localization.

  2. Null-conditional operators

    int? first = customers?[0].Orders?.Count();

    They claim to reduce the clutter of null checks, but in our opinion they do the opposite. It's better to get a NullReferenceException if arguments are wrong.

  3. Exception filters

    private static bool Log(Exception e) { /* log it */ ; return false; }

    try { … } catch (Exception e) when (Log(e)) {}

    "It is also a common and accepted form of “abuse” to use exception filters for side effects; e.g. logging."

    Designing a feature for abuse just does not taste good.

  4. Expression-bodied function and property members.

    public Point Move(int dx, int dy) => new Point(x + dx, y + dy);
    public string Name => First + " " + Last;

    We are not sure it's that useful.

Monday, 24 August 2015 10:52:07 UTC
.NET | Announce | Java | xslt
# Friday, 31 July 2015

Considering that we have used Saxon for many years, it was strange to run into such a simple error as the following:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:variable name="doc" as="element()+"><a/><b/><c/></xsl:variable>   
    <xsl:sequence select="$doc = 3"/>
  </xsl:template>
</xsl:stylesheet>

This is a simplified case that should produce the dynamic error FORG0001 as per General Comparisons; the real code is more complex, as it relies on SFINAE and continues processing.

This case crashes Saxon with an exception:

Exception in thread "main" java.lang.RuntimeException: 
      Internal error evaluating template  at line 3 in module ICE9.6.xslt
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail()
    at net.sf.saxon.trans.Mode.applyTemplates()
    at net.sf.saxon.Controller.transformDocument()
    at net.sf.saxon.Controller.transform()
    at net.sf.saxon.s9api.XsltTransformer.transform()
    at net.sf.saxon.jaxp.TransformerImpl.transform()
    ...
Caused by: java.lang.NumberFormatException: For input string: "" 
    at java.lang.NumberFormatException.forInputString()
    at java.lang.Long.parseLong()
    at java.lang.Long.parseLong()
    at net.sf.saxon.expr.GeneralComparison.quickCompare()
    at net.sf.saxon.expr.GeneralComparison.compare()
    at net.sf.saxon.expr.GeneralComparison.evaluateManyToOne()
    at net.sf.saxon.expr.GeneralComparison.evaluateItem()
    at net.sf.saxon.expr.GeneralComparison.evaluateItem()
    at net.sf.saxon.expr.Expression.process()
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail()
    ... 8 more

We reported the problem on Saxon's forum, and as usual it was promptly resolved.

Friday, 31 July 2015 08:36:39 UTC
xslt
# Thursday, 09 April 2015

Following the ECMAScript Xml Object Model, we have aligned JXOM with Java 8. This includes support for:

As with ECMAScript, all sources are available at https://github.com/nesterovsky-bros/languages-xom

Thursday, 09 April 2015 19:46:22 UTC
Announce | Java | xslt
# Monday, 06 April 2015

Much time has passed since we last fixed or extended the Languages Xml Object Model. But now we needed to manipulate and generate javascript programs.

Though xslt today is not a language of choice but rather a niche language, it still fits code generation and transformation tasks very well.

So we're pleased to announce the ECMAScript Xml Object Model, which includes:

All sources are available at github: https://github.com/nesterovsky-bros/languages-xom

Monday, 06 April 2015 12:17:04 UTC
Announce | javascript | xslt
# Monday, 06 October 2014

After investigation we found that, contrary to what we assumed earlier, Saxon 9.6 HE does not support xslt 3.0.

On Saxonica site it's written: "Support for XQuery 3.0 and XPath 3.0 (now Recommendations) has been added to the open-source product."

As one can notice, xslt is not mentioned.

More details are on open-source 3.0 support.

:-(
Monday, 06 October 2014 10:16:03 UTC
xslt
# Sunday, 05 October 2014

The new release of Saxon HE (version 9.6) claims basic support for xslt 3.0. We were eager to test it, but... errors happen. See the error reports at Error in SaxonHE9-6-0-1J and Bug #2160.

As with the previous release (see Exception during execution in Saxon-HE-9.5.1-6), we bumped into an internal error in the engine.

We expect to see an update very soon, so that we can continue with the long-awaited xslt 3.0.

Here is an argument for the discussion of open source vs commercial projects: open source projects with a rich community may benefit, as problems are detected promptly, while commercial projects risk living with more unnoticed bugs.

Sunday, 05 October 2014 08:10:14 UTC
xslt
# Friday, 03 October 2014

With Saxon 9.6 we can finally play with open source xslt 3.0.

It's sad that it took so much time to make it available.

See Saxonica's home page for details.

Friday, 03 October 2014 10:37:11 UTC
xslt
# Monday, 14 July 2014

These days we're not active xslt developers, though we still consider xslt and xquery an important part of our personal experience and self-education.

Besides, we have a pretty large xslt code base that is, and will remain, in use. We think xslt/xquery are used in many other big and small projects, so they hold a strong position as niche languages.

Thus we think it's important to help those who support xslt/xquery engines. That's what we regularly do (thanks to our code base).

Now, to the problem we have just found. Please consider the code:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="this"
  exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <xsl:param name="new-line-text" as="xs:string" select="' '"/>
    <xsl:variable name="items" as="item()*" select="'select', $new-line-text"/>

    <xsl:message select="t:string-join($items)"/>
  </xsl:template>

  <!--
    Joins adjacent string items.
      $items - items to join.
      Returns joined items.
  -->
  <xsl:function name="t:string-join" as="item()*">
    <xsl:param name="items" as="item()*"/>

    <xsl:variable name="indices" as="xs:integer*" select="
      0,
      index-of
      (
        (
          for $item in $items return
            $item instance of xs:string
        ),
        false()
      ),
      count($items) + 1"/>

    <xsl:sequence select="
      for $i in 1 to count($indices) - 1 return
      (
        $items[$indices[$i]],
        string-join
        (
          subsequence
          (
            $items,
            $indices[$i] + 1,
            $indices[$i + 1] - $indices[$i] - 1
          ),
          ''
        )
      )"/>
  </xsl:function>

</xsl:stylesheet>

The output is:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
  at net.sf.saxon.om.Chain.itemAt(Chain.java:161)
  at net.sf.saxon.om.SequenceTool.itemAt(SequenceTool.java:130)
  at net.sf.saxon.expr.FilterExpression.iterate(FilterExpression.java:1143)
  at net.sf.saxon.expr.LetExpression.iterate(LetExpression.java:365)
  at net.sf.saxon.expr.instruct.BlockIterator.next(BlockIterator.java:49)
  at net.sf.saxon.expr.MappingIterator.next(MappingIterator.java:70)
  at net.sf.saxon.expr.instruct.Message.processLeavingTail(Message.java:264)
  at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:660)
  at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:239)
  at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1057)
  at net.sf.saxon.Controller.transformDocument(Controller.java:2088)
  at net.sf.saxon.Controller.transform(Controller.java:1911)
  ...

The problem is reported at Exception during execution in Saxon-HE-9.5.1-6 and is also tracked at https://saxonica.plan.io/issues/2104.

Update: according to Michael Kay the issue is fixed. See note #4:

I have committed a patch to Chain.itemAt() on the 9.5 and 9.6 branches to check for index<0
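For readers less fluent in XPath, here is a simplified Java analogue of what t:string-join computes: runs of adjacent strings are concatenated, and other items are kept as is. This is a sketch, not a port. Unlike the xslt function, it does not produce the empty strings that the xslt version yields where there is no run of strings (for example, before a leading non-string item).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StringJoin {
  // Concatenates each run of adjacent String items; other items are copied as is.
  static List<Object> joinAdjacentStrings(List<?> items) {
    List<Object> result = new ArrayList<>();
    StringBuilder run = null; // the current run of adjacent strings, if any

    for (Object item : items) {
      if (item instanceof String) {
        if (run == null) {
          run = new StringBuilder();
        }
        run.append((String) item);
      } else {
        if (run != null) {
          result.add(run.toString());
          run = null;
        }
        result.add(item);
      }
    }
    if (run != null) {
      result.add(run.toString());
    }
    return result;
  }

  public static void main(String[] args) {
    // Like t:string-join(('select', ' ')) in the template above.
    System.out.println(joinAdjacentStrings(Arrays.asList("select", " ")));
    System.out.println(joinAdjacentStrings(Arrays.asList("a", "b", 42, "c")));
  }
}
```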

Monday, 14 July 2014 08:08:01 UTC
xslt
# Monday, 28 April 2014

Among the new features proposed for XPath 3.1 (other than Maps and Arrays), we like the Arrow operator (=>).

It's defined like this:

[Definition: An arrow operator is a postfix operator that applies a function to an item, using the item as the first argument to the function.] If $i is an item and f() is a function, then $i=>f() is equivalent to f($i), and $i=>f($j) is equivalent to f($i, $j).

This syntax is particularly helpful when conventional function call syntax is unreadable, e.g. when applying multiple functions to an item. For instance, the following expression is difficult to read due to the nesting of parentheses, and invites syntax errors due to unbalanced parentheses:

tokenize((normalize-unicode(upper-case($string))),"\s+")

Many people consider the following expression easier to read, and it is much easier to see that the parentheses are balanced:

$string=>upper-case()=>normalize-unicode()=>tokenize("\s+")

What does it look like?

Right! It's like extension methods in C#.
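Java has no extension methods, but composing java.util.function.Function values with andThen gives a similar left-to-right reading. Below is a sketch of the quoted example. The helpers upperCase, normalizeUnicode and tokenize are illustrative, and Normalizer.Form.NFC matches the default form of normalize-unicode.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class ArrowDemo {
  static String upperCase(String s) {
    return s.toUpperCase(Locale.ROOT); // locale-independent, like fn:upper-case
  }

  static String normalizeUnicode(String s) {
    return Normalizer.normalize(s, Normalizer.Form.NFC);
  }

  static List<String> tokenize(String s) {
    return Arrays.asList(s.split("\\s+"));
  }

  // Nested calls, read inside out, like tokenize(normalize-unicode(upper-case($string))).
  static List<String> nested(String s) {
    return tokenize(normalizeUnicode(upperCase(s)));
  }

  // Left-to-right composition, like $string=>upper-case()=>normalize-unicode()=>tokenize(...).
  static final Function<String, List<String>> PIPELINE =
      Function.<String>identity()
          .andThen(ArrowDemo::upperCase)
          .andThen(ArrowDemo::normalizeUnicode)
          .andThen(ArrowDemo::tokenize);

  public static void main(String[] args) {
    System.out.println(nested("to be  or not"));
    System.out.println(PIPELINE.apply("to be  or not"));
  }
}
```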

Monday, 28 April 2014 06:20:27 UTC
xslt
# Wednesday, 03 July 2013

A while ago we created a set of xml schemas and xslt to represent different languages as xml and to generate source code from those xml documents. This way we know how to represent and generate java, c#, cobol, and several sql dialects (read about languages xom on this site).

Here we'd like to describe a nuisance we had with the sql dialects schema.

Our goal was to define a basic sql schema plus dialect extensions, so that we could express both general and dialect-specific constructs. Let's consider an example.

General:

-- Select one row
select * from A

DB2:

select * from A fetch first row only

T-SQL:

select top 1 * from A

Oracle:

select * from A where rownum = 1

All these queries share a common core syntax, while each dialect has its own means to express the intention to return only the first row.

In the xml schema, the basic select statement looks like this:

<xs:complexType name="select-statement">
  <xs:complexContent>
    <xs:extension base="full-select-statement">
      <xs:sequence>
        <xs:element name="columns" type="columns-clause"/>
        <xs:element name="from" type="from-clause" minOccurs="0"/>
        <xs:element name="where" type="unary-expression" minOccurs="0"/>
        <xs:element name="group-by" type="expression-list" minOccurs="0"/>
        <xs:element name="having" type="unary-expression" minOccurs="0"/>
        <xs:element name="order-by" type="order-by-clause" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="specification" type="query-specification"
        use="optional" default="all"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Here all is relatively clear. The generic select looks like:

<sql:select>
  <sql:columns>
    <sql:column wildcard="true"/>
  </sql:columns>
  <sql:from>
    <sql:table name="A"/>
  </sql:from>
</sql:select>

But how would you define dialect specifics?

E.g. for T-SQL we would like to see a markup:

<sql:select>
  <tsql:top>
    <sql:number value="1"/>
  </tsql:top>
  <sql:columns>
    <sql:column wildcard="true"/>
  </sql:columns>
  <sql:from>
    <sql:table name="A"/>
  </sql:from>
</sql:select>

While for DB2 there should be:

<sql:select>
  <sql:columns>
    <sql:column wildcard="true"/>
  </sql:columns>
  <sql:from>
    <sql:table name="A"/>
  </sql:from>
  <db2:fetch-first rows="1"/>
</sql:select>

So, again, the questions are:

  • how to define a basic sql schema that can be extended toward DB2 or T-SQL?
  • how to define an xslt sql serializer that is also extensible?

Though we tried several solutions to that problem, none is satisfactory enough. To allow extensions, we defined all elements in the sql schema to be based on sql-element, which permits extensions:

<xs:complexType name="sql-element" abstract="true">
  <xs:sequence>
    <xs:element ref="extension" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="extension" type="extension"/>

<xs:complexType name="extension" abstract="true">
  <xs:complexContent>
    <xs:extension base="sql-element"/>
  </xs:complexContent>
</xs:complexType>

...

<xs:element name="top" type="top-extension" substitutionGroup="sql:extension"/>

<xs:complexType name="top-extension">
  <xs:complexContent>
    <xs:extension base="sql:extension">
      <xs:sequence>
        <xs:element ref="sql:expression"/>
      </xs:sequence>
      <xs:attribute name="percent" type="xs:boolean" use="optional" default="false"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Unfortunately, this makes the schema for extensions too weakly typed, so intellisense suggests too many options.
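Outside of xml schema, the extensibility we are after is easy to express. For comparison, here is a hypothetical Java sketch of a serializer. Its base class renders the common select core and exposes hooks, which the DB2, T-SQL and Oracle dialects override to produce the queries shown at the start of this post. The class and method names are illustrative only; this is not the xslt serializer from languages xom.

```java
// Base dialect: renders the common core and exposes extension points.
class SqlSerializer {
  // firstRows == null means "no row limit".
  String select(String table, Integer firstRows) {
    return "select " + top(firstRows) + "* from " + table
        + where(firstRows) + fetchFirst(firstRows);
  }

  // Hooks; the general dialect has no row-limiting syntax, so they render nothing.
  String top(Integer rows) { return ""; }
  String where(Integer rows) { return ""; }
  String fetchFirst(Integer rows) { return ""; }
}

class Db2Serializer extends SqlSerializer {
  @Override
  String fetchFirst(Integer rows) {
    if (rows == null) return "";
    return rows == 1 ? " fetch first row only" : " fetch first " + rows + " rows only";
  }
}

class TSqlSerializer extends SqlSerializer {
  @Override
  String top(Integer rows) {
    return rows == null ? "" : "top " + rows + " ";
  }
}

class OracleSerializer extends SqlSerializer {
  // Sketch only: a real serializer would have to merge this with an existing where clause.
  @Override
  String where(Integer rows) {
    if (rows == null) return "";
    return rows == 1 ? " where rownum = 1" : " where rownum <= " + rows;
  }
}

class DialectDemo {
  public static void main(String[] args) {
    SqlSerializer[] dialects = {
      new SqlSerializer(), new Db2Serializer(), new TSqlSerializer(), new OracleSerializer()
    };
    for (SqlSerializer dialect : dialects) {
      System.out.println(dialect.select("A", 1));
    }
  }
}
```

Running DialectDemo prints the four queries from the example, one per dialect.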

Wednesday, 03 July 2013 05:50:43 UTC
Thinking aloud | xslt
# Monday, 29 October 2012

If you work on web applications, you have probably already dealt with exporting data to Excel. There are several options for preparing data for Excel:

  • generate CSV;
  • generate HTML that excel understands;
  • generate XML in Spreadsheet 2003 format;
  • generate data using Open XML SDK or some other 3rd party libraries;
  • generate data in XLSX format, according to Open XML specification.

You may find a good article with the pros and cons of each solution here. We, in turn, would like to share our experience in this field. Let's start with the requirements:

  • We often have to export huge data sets.
  • We should be able to format and parametrize the exported data and apply different styles to it.
  • There are cases when exported data may contain more than one table per sheet or even more than one sheet.
  • Some exported data have to be illustrated with charts.

All these requirements led us to a solution based on XSLT processing of streamed data. The advantage of this solution is that the result is forwarded to the client as soon as XSLT starts to generate output. This approach is much more efficient than generating XLSX with the Open XML SDK or any other third-party library, since it avoids keeping huge data sets in memory on the server side.

Another advantage is simple maintenance, as we achieve a clear separation of the data and presentation layers. For each request to change formatting or apply another style to a cell, you just have to modify the xslt file(s) that generate the variable parts of the XLSX.

As a result, our clients get XLSX files that conform to the Open XML specification. See our next posts for the implementation details of our solution.

Monday, 29 October 2012 15:34:38 UTC
.NET | ASP.NET | Thinking aloud | xslt
# Friday, 03 August 2012

Earlier we showed how to build a streaming xml reader from business data, and reminded readers about ForwardXPathNavigator, which helps to create a streaming xslt transformation. Now we want to show how to stream content produced by xslt out of a WCF service.

To achieve streaming in WCF one needs:

1. Configure the service to use streaming. A description of how to do this can be found on the internet; see web.config in the sample Streaming.zip for details.

2. Create a service with a method returning Stream:

[ServiceContract(Namespace = "http://www.nesterovsky-bros.com")]
[AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class Service
{
  [OperationContract]
  [WebGet(RequestFormat = WebMessageFormat.Json)]
  public Stream GetPeopleHtml(int count, int seed)
  {
    ...
  }
}

3. Return a Stream from the xsl transformation.

Unfortunately (as we have already mentioned), XslCompiledTransform writes its output into an XmlWriter (or into an output Stream) rather than exposing the result as an XmlReader, while WCF takes an input stream and passes it to the client.

We could generate the xslt output into a file or a memory Stream and then return that content as an input Stream. However, this would defeat the goal of streaming, because the client would start to get data no earlier than the xslt completed its work. What we need instead is a pipe from the xslt output Stream to the input Stream returned from WCF.

.NET implements pipe streams, so our task is trivial. We have defined a utility method that creates an input Stream from a generator populating an output Stream:

public static Stream GetPipedStream(Action<Stream> generator)
{
  var output = new AnonymousPipeServerStream();
  var input = new AnonymousPipeClientStream(
    output.GetClientHandleAsString());

  Task.Factory.StartNew(
    () =>
    {
      using(output)
      {
        generator(output);
        output.WaitForPipeDrain();
      }
    },
    TaskCreationOptions.LongRunning);

  return input;
}
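Java offers the same building block in PipedInputStream/PipedOutputStream. Below is a sketch of an equivalent of GetPipedStream. It assumes a hypothetical Generator interface in place of Action<Stream>. The generator runs on its own thread, and the caller reads its output as it is produced.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class Pipes {
  // Hypothetical stand-in for Action<Stream>: writes content into an output stream.
  interface Generator {
    void generate(OutputStream output) throws IOException;
  }

  // Returns an input stream fed by the generator, which runs on its own thread.
  static InputStream getPipedStream(Generator generator) throws IOException {
    PipedInputStream input = new PipedInputStream(64 * 1024);
    PipedOutputStream output = new PipedOutputStream(input);

    Thread producer = new Thread(() -> {
      // Closing the output signals end of stream to the reader.
      try (OutputStream out = output) {
        generator.generate(out);
      } catch (IOException e) {
        // Sketch only: the reader just sees an early end of stream.
      }
    }, "piped-stream-generator");
    producer.setDaemon(true);
    producer.start();
    return input;
  }

  // Reads a stream to the end and decodes it as UTF-8.
  static String readAll(InputStream input) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = input.read(chunk)) != -1) {
      buffer.write(chunk, 0, n);
    }
    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    InputStream input = getPipedStream(out -> {
      for (int i = 1; i <= 3; i++) {
        out.write(("row " + i + "\n").getBytes(StandardCharsets.UTF_8));
      }
    });
    try (InputStream in = input) {
      System.out.print(readAll(in));
    }
  }
}
```

In this sketch, as in the C# version above, an exception in the generator just ends the stream early. A production version should pass the error on to the reader.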

We wrapped the xsl transformation as such a generator:

[OperationContract]
[WebGet(RequestFormat = WebMessageFormat.Json)]
public Stream GetPeopleHtml(int count, int seed)
{
  var context = WebOperationContext.Current;

  context.OutgoingResponse.ContentType = "text/html";
  context.OutgoingResponse.Headers["Content-Disposition"] =
    "attachment;filename=reports.html";

  var cache = HttpRuntime.Cache;
  var path = HttpContext.Current.Server.MapPath("~/People.xslt");
  var transform = cache[path] as XslCompiledTransform;

  if (transform == null)
  {
    transform = new XslCompiledTransform();
    transform.Load(path);
    cache.Insert(path, transform, new CacheDependency(path));
  }

  return Extensions.GetPipedStream(
    output =>
    {
      // We have a streamed business data.
      var people = Data.CreateRandomData(count, seed, 0, count);

      // We want to see it as streamed xml data.
      using(var stream =
        people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
      using(var reader = XmlReader.Create(stream))
      {
        // XPath forward navigator is used as an input source.
        transform.Transform(
          new ForwardXPathNavigator(reader),
          new XsltArgumentList(),
          output);
      }
    });
}

This way we have built code that streams data directly from the business layer to a client in the form of a report. A set of utility functions and classes helped us overcome .NET's limitations and build simple code that is easy to maintain.

The sources can be found at Streaming.zip.

Friday, 03 August 2012 22:32:49 UTC
.NET | ASP.NET | Thinking aloud | Tips and tricks | xslt
# Thursday, 26 July 2012

In the previous post about streaming we stopped at the point where we had an XmlReader in hand that continuously gets data from an IEnumerable<Person> source. Now we shall recall ForwardXPathNavigator, a class we built back in 2002, which adds streaming transformations to .NET's xslt processor.

While XslCompiledTransform is desperately obsolete and no upgrade is likely to follow, it is still among the fastest xslt 1.0 processors. With ForwardXPathNavigator we give this processor the ability to transform input data of arbitrary size.

We find it interesting that the xslt 3.0 Working Draft defines streaming processing in a way that closely matches the rules for ForwardXPathNavigator:

Streaming achieves two important objectives: it allows large documents to be transformed without requiring correspondingly large amounts of memory; and it allows the processor to start producing output before it has finished receiving its input, thus reducing latency.

The rules for streamability, which are defined in detail in 19.3 Streamability Analysis, impose two main constraints:

  • The only nodes reachable from the node that is currently being processed are its attributes and namespaces, its ancestors and their attributes and namespaces, and its descendants and their attributes and namespaces. The siblings of the node, and the siblings of its ancestors, are not reachable in the tree, and any attempt to use their values is a static error. However, constructs (for example, simple forms of xsl:number, and simple positional patterns) that require knowledge of the number of preceding elements by name are permitted.

  • When processing a given node in the tree, each descendant node can only be visited once. Essentially this allows two styles of processing: either visit each of the children once, and then process that child with the same restrictions applied; or process all the descendants in a single pass, in which case it is not possible while processing a descendant to make any further downward selection.

The only significant difference between ForwardXPathNavigator and xslt 3.0 streaming is that we report violations of the streamability rules at runtime, while xslt 3.0 attempts to perform this analysis at compile time.

Here is the C# code for the streamed xslt transformation:

var transform = new XslCompiledTransform();

transform.Load("People.xslt");

// We have a streamed business data.
var people = Data.CreateRandomData(10000, 0, 0, 10000);

// We want to see it as streamed xml data.
using(var stream =
  people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
using(var output = File.Create("people.html"))
{
  // XPath forward navigator is used as an input source.
  transform.Transform(
    new ForwardXPathNavigator(reader),
    new XsltArgumentList(),
    output);
}

Notice how XmlReader is wrapped into ForwardXPathNavigator.

To complete the picture we need an xslt that follows the streaming rules:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  xmlns:d="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="msxsl d">

  <xsl:output method="html" indent="yes"/>

  <!-- Root template processed in the streaming mode. -->
  <xsl:template match="/d:people">
    <html>
      <head>
        <title>List of persons</title>
        <style type="text/css">
          .even
          {
          }

          .odd
          {
            background: #d0d0d0;
          }
        </style>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>ID</th>
            <th>First name</th>
            <th>Last name</th>
            <th>City</th>
            <th>Title</th>
            <th>Age</th>
          </tr>

          <xsl:for-each select="d:person">
            <!--
              Get element snapshot.
              A snapshot allows arbitrary access to the element's content.
            -->
            <xsl:variable name="person">
              <xsl:copy-of select="."/>
            </xsl:variable>

            <xsl:variable name="position" select="position()"/>

            <xsl:apply-templates mode="snapshot" select="msxsl:node-set($person)/d:person">
              <xsl:with-param name="position" select="$position"/>
            </xsl:apply-templates>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template mode="snapshot" match="d:person">
    <xsl:param name="position"/>

    <tr>
      <xsl:attribute name="class">
        <xsl:choose>
          <xsl:when test="$position mod 2 = 1">
            <xsl:text>odd</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>even</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>

      <td>
        <xsl:value-of select="d:Id"/>
      </td>
      <td>
        <xsl:value-of select="d:FirstName"/>
      </td>
      <td>
        <xsl:value-of select="d:LastName"/>
      </td>
      <td>
        <xsl:value-of select="d:City"/>
      </td>
      <td>
        <xsl:value-of select="d:Title"/>
      </td>
      <td>
        <xsl:value-of select="d:Age"/>
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>

So we started with streamed entity data, proceeded to a streamed XmlReader, and arrived at a streamed xslt transformation.
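For comparison, the snapshot technique used in the stylesheet can be sketched in Java with StAX: copy one d:person, process the copy, move on. Only one person's fields are held in memory at a time. This is a hypothetical analogue, not ForwardXPathNavigator. It assumes flat person elements whose children contain text only, and it writes just the table rows.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamingRows {
  // Column order of the table in People.xslt.
  static final String[] COLUMNS = {"Id", "FirstName", "LastName", "City", "Title", "Age"};

  // Writes one <tr> per person element; assumes person children contain text only.
  static void writeRows(XMLStreamReader reader, Writer out)
      throws XMLStreamException, IOException {
    int position = 0;
    while (reader.hasNext()) {
      if (reader.next() != XMLStreamConstants.START_ELEMENT
          || !reader.getLocalName().equals("person")) {
        continue;
      }
      // Snapshot of a single person; it is discarded once its row is written.
      Map<String, String> person = new HashMap<>();
      while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
        String name = reader.getLocalName();
        person.put(name, reader.getElementText()); // leaves the reader on the end tag
      }
      position++;
      out.write("<tr class=\"" + (position % 2 == 1 ? "odd" : "even") + "\">");
      for (String column : COLUMNS) {
        out.write("<td>" + escape(person.getOrDefault(column, "")) + "</td>");
      }
      out.write("</tr>\n");
    }
  }

  static String escape(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  public static void main(String[] args) throws Exception {
    String xml = "<people xmlns='http://www.nesterovsky-bros.com'>"
        + "<person><Id>1</Id><FirstName>Ann</FirstName></person>"
        + "<person><Id>2</Id><City>A&amp;B</City></person></people>";
    StringWriter out = new StringWriter();
    writeRows(XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)), out);
    System.out.print(out);
  }
}
```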

In the final post about streaming we shall show a simple way to build a WCF service that returns an html stream from our xslt transformation.

The sources can be found at Streaming.zip.

Thursday, 26 July 2012 18:49:51 UTC
.NET | Thinking aloud | Tips and tricks | xslt
# Sunday, 22 July 2012

For some reason neither .NET's XmlSerializer nor DataContractSerializer lets you read serialized data through an XmlReader. These APIs work the other way round, writing data into an XmlWriter. To get the data through an XmlReader, one needs to write it to some destination like a file or a memory stream and then read it back using an XmlReader. This complicates a streaming design considerably.

In fact, the same is true of other .NET APIs.

We think .NET designers preferred XmlWriter to XmlReader in those APIs because an XmlReader implementation is essentially a state machine, while an XmlWriter implementation looks like a regular procedure. It's much harder to write and maintain correct state machine logic by hand than a procedure.
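To make the state machine argument concrete, here is a minimal sketch in Java rather than C# (the names PushPull, write, read, pushed, and pulled are ours, for illustration only). It produces the same tokens twice: once as a push-style procedure, in the spirit of an XmlWriter-based API, and once as a pull-style Iterator, in the spirit of an XmlReader. The pull version has to remember explicitly where it stopped.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

class PushPull {
  // Push style: the producer is an ordinary procedure that hands
  // each token to a consumer, just like writing into an XmlWriter.
  static void write(List<String> items, Consumer<String> out) {
    out.accept("<root>");
    for (String item : items) {
      out.accept("<item>" + item + "</item>");
    }
    out.accept("</root>");
  }

  // Pull style: the caller asks for tokens one at a time, so the
  // producer keeps its position in explicit state fields.
  static Iterator<String> read(List<String> items) {
    return new Iterator<String>() {
      private int state = 0; // 0: root start, 1: items, 2: root end, 3: done
      private int index = 0;

      @Override public boolean hasNext() { return state < 3; }

      @Override public String next() {
        switch (state) {
          case 0:
            state = items.isEmpty() ? 2 : 1;
            return "<root>";
          case 1: {
            String token = "<item>" + items.get(index++) + "</item>";
            if (index == items.size()) state = 2;
            return token;
          }
          case 2:
            state = 3;
            return "</root>";
          default:
            throw new NoSuchElementException();
        }
      }
    };
  }

  static List<String> pushed(List<String> items) {
    List<String> tokens = new ArrayList<>();
    write(items, tokens::add);
    return tokens;
  }

  static List<String> pulled(List<String> items) {
    List<String> tokens = new ArrayList<>();
    read(items).forEachRemaining(tokens::add);
    return tokens;
  }
}
```

Both produce identical tokens; only the pull version needs the hand-written state field.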

If history had gone slightly differently, and yield return, lambdas, and the Enumerable API had appeared before XmlReader and XmlWriter, then, we think, both classes would look different. An xml source would be described by an IEnumerable<XmlEvent> instead of an XmlReader, and an XmlWriter would look like a function receiving an IEnumerable<XmlEvent>. Implementing an XmlReader would mean creating an enumerator, and yield return and the Enumerable API would help to implement it in a procedural way.
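The hypothetical XmlEvent design can be sketched in Java, with java.util.stream.Stream standing in for IEnumerable. XmlEvent, Kind, people(), and write() are illustrative names, not an existing .NET or Java API. The source is composed lazily from pieces, and the "writer" is simply a function that consumes the event sequence.

```java
import java.util.List;
import java.util.stream.Stream;

class XmlEvents {
  // Hypothetical event model; not part of any .NET or Java library.
  enum Kind { START, TEXT, END }

  static final class XmlEvent {
    final Kind kind;
    final String value; // element name for START/END, character data for TEXT

    XmlEvent(Kind kind, String value) {
      this.kind = kind;
      this.value = value;
    }
  }

  static XmlEvent start(String name) { return new XmlEvent(Kind.START, name); }
  static XmlEvent text(String data) { return new XmlEvent(Kind.TEXT, data); }
  static XmlEvent end(String name) { return new XmlEvent(Kind.END, name); }

  // "Reader" side: an xml source is just a lazily composed event sequence.
  static Stream<XmlEvent> people(List<String> names) {
    Stream<XmlEvent> body = names.stream()
        .flatMap(name -> Stream.of(start("person"), text(name), end("person")));
    return Stream.concat(
        Stream.concat(Stream.of(start("persons")), body),
        Stream.of(end("persons")));
  }

  // "Writer" side: a function that receives the event sequence.
  static String write(Stream<XmlEvent> events) {
    StringBuilder out = new StringBuilder();
    events.forEachOrdered(e -> {
      switch (e.kind) {
        case START: out.append('<').append(e.value).append('>'); break;
        case TEXT: out.append(escape(e.value)); break;
        case END: out.append("</").append(e.value).append('>'); break;
      }
    });
    return out.toString();
  }

  // Minimal escaping of character data.
  private static String escape(String data) {
    return data.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }
}
```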

In reality, though, we have to deal with the fact that DataContractSerializer writes data into an XmlWriter. So let's assume we have a project that uses Entity Framework to access the database, with a data class Person and a data access method GetPeople():

[DataContract(Name = "person", Namespace = "http://www.nesterovsky-bros.com")]
public class Person
{
  [DataMember] public int Id { get; set; }
  [DataMember] public string FirstName { get; set; }
  [DataMember] public string LastName { get; set; }
  [DataMember] public string City { get; set; }
  [DataMember] public string Title { get; set; }
  [DataMember] public DateTime BirthDate { get; set; }
  [DataMember] public int Age { get; set; }
}

public static IEnumerable<Person> GetPeople() { ... }

The goal is to expose the result of GetPeople() as an XmlReader. We achieve this in three simple steps:

  1. Define JoinedStream, an input Stream implementation that reads data from an enumeration of streams (IEnumerable<Stream>).
  2. Build xml parts in the form of IEnumerable<Stream>.
  3. Combine parts into final xml stream.

The code is rather simple, so here we quote its essential part:

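using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;
using System.Xml.Schema;

public static class Extensions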
{
  public static Stream JoinStreams(this IEnumerable<Stream> streams, bool closeStreams = true)
  {
    return new JoinedStream(streams, closeStreams);
  }

  public static Stream ToXmlStream<T>(
    this IEnumerable<T> items,
    string rootName = null,
    string rootNamespace = null)
  {
    return items.ToXmlStreamParts<T>(rootName, rootNamespace).
      JoinStreams(false);
  }

  private static IEnumerable<Stream> ToXmlStreamParts<T>(
    this IEnumerable<T> items,
    string rootName = null,
    string rootNamespace = null)
  {
    if (rootName == null)
    {
      rootName = "ArrayOfItems";
    }

    if (rootNamespace == null)
    {
      rootNamespace = "";
    }

    var serializer = new DataContractSerializer(typeof(T));
    var stream = new MemoryStream();
    var writer = XmlDictionaryWriter.CreateTextWriter(stream);

    writer.WriteStartDocument();
    writer.WriteStartElement(rootName, rootNamespace);
    writer.WriteXmlnsAttribute("s", XmlSchema.Namespace);
    writer.WriteXmlnsAttribute("i", XmlSchema.InstanceNamespace);

    foreach(var item in items)
    {
      serializer.WriteObject(writer, item);
      writer.WriteString(" ");

      writer.Flush();
      stream.Position = 0;

      yield return stream;

      stream.Position = 0;
      stream.SetLength(0);
    }

    writer.WriteEndElement();
    writer.WriteEndDocument();

    writer.Flush();
    stream.Position = 0;

    yield return stream;
  }

  private class JoinedStream: Stream
  {
    public JoinedStream(IEnumerable<Stream> streams, bool closeStreams = true)
    ...
  }
}
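The body of JoinedStream is elided above. As a rough illustration of the idea, here is a Java sketch of an equivalent InputStream over an Iterable of streams, with the same closeStreams switch; the class name JoinedInputStream is ours. Java's standard library already offers java.io.SequenceInputStream for this purpose, though it always closes each part once it is consumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

class JoinedInputStream extends InputStream {
  private final Iterator<? extends InputStream> parts;
  private final boolean closeStreams;
  private InputStream current;
  private boolean closed;

  JoinedInputStream(Iterable<? extends InputStream> parts, boolean closeStreams) {
    this.parts = parts.iterator();
    this.closeStreams = closeStreams;
  }

  // Moves to the next part (pulled lazily), closing the finished one if requested.
  private boolean advance() throws IOException {
    if (current != null && closeStreams) {
      current.close();
    }
    current = parts.hasNext() ? parts.next() : null;
    return current != null;
  }

  @Override
  public int read(byte[] buffer, int offset, int length) throws IOException {
    if (closed) throw new IOException("Stream closed");
    if (length == 0) return 0;
    while (true) {
      if (current == null && !advance()) return -1;
      int count = current.read(buffer, offset, length);
      if (count != -1) return count;
      // The current part is exhausted: continue with the next one.
      if (!advance()) return -1;
    }
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    int count = read(one, 0, 1);
    return count == -1 ? -1 : one[0] & 0xff;
  }

  @Override
  public void close() throws IOException {
    if (!closed) {
      closed = true;
      if (current != null && closeStreams) current.close();
      current = null;
    }
  }
}
```

Passing closeStreams = false matters when, as in ToXmlStreamParts above, the producer reuses one buffer stream for every part.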

Usage is even simpler:

// We have a streamed business data.
var people = GetPeople();

// We want to see it as streamed xml data.
using(var stream = people.ToXmlStream("persons", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
  ...
}

We have packed the sample into the project Streaming.zip.

In the next post we're going to revisit streaming processing in xslt.

Sunday, 22 July 2012 20:38:29 UTC
.NET | Thinking aloud | Tips and tricks | xslt
# Tuesday, 08 May 2012

Some time ago we took part in a project where 95% of all sources were xslt 2.0. It was a great experience for us.

The interesting part is that we used xslt in areas where we would never have expected it in the early 2000s. It crunched gigabytes of data offline, whereas earlier we generally saw xslt's place in a browser, or on a server as an engine to render data.

Web applications (both .NET and java) are our focus today, and it has become hard to find a use for xslt or xquery there.

Indeed, the client side now has very strong APIs: jquery, jqueryui, jsview, jqgrid, kendoui, and so on. These libraries and today's browsers cover developers' needs in building manageable applications. In contrast, native support for xslt (at least v2) does not exist in browsers.

The server side at present is seen as a set of web services. These services support both xml and json formats, and implement business logic only. It would be torture to try to write such a frontend in xslt/xquery. Server logic itself often deals with diverse data sources like databases, files (including xml files), and others.

As for databases (we primarily work with SQL Server 2008 R2), we think that all communication should go through stored procedures, which implement all data logic. Clearly, this is no place for xslt. However, those who know sql beyond its basics can confirm that sql is very similar to xquery. Moreover, SQL Server (like other databases) integrates xquery to work with xml data, and we do use it extensively.

Server logic itself uses APIs like LINQ to manipulate different data sources. In fact, we think that one can build a compiler from xquery 3.0 to C# with LINQ. A compiler the other way round would be a whole different story.
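To illustrate how closely FLWOR expressions map onto LINQ-style pipelines, here is a small sketch using Java streams in place of LINQ. The Person class and the query are hypothetical; each xquery clause is noted next to its counterpart.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class Flwor {
  // Hypothetical data type standing in for a <person> element.
  static final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
      this.name = name;
      this.age = age;
    }
  }

  // xquery:
  //   for $p in $people
  //   where $p/age > 30
  //   order by $p/name
  //   return $p/name
  static List<String> namesOver30(List<Person> people) {
    return people.stream()                                  // for $p in $people
        .filter(p -> p.age > 30)                            // where
        .sorted(Comparator.comparing((Person p) -> p.name)) // order by
        .map(p -> p.name)                                   // return
        .collect(Collectors.toList());
  }
}
```

The mapping is nearly clause-for-clause, which is why the xquery-to-LINQ direction looks tractable.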

The net result is that we see little place for xslt and xquery. Well, after all, it's only a personal perspective on the subject. A similar thing has happened to us with C++. As with xslt/xquery, we love C++ very much, and we are fond of C++11, but at present there is no place for C++ in our current projects. That's a pity.

P.S. Another thing that plays against xslt/xquery is the shortage of people who know these languages and thus can support such projects.

Tuesday, 08 May 2012 20:28:51 UTC
Thinking aloud | xslt
# Friday, 23 March 2012

This time we update csharpxom to support C# 5.0 (shipped with .NET 4.5). The additions are the async modifier and the await operator.

They are used to simplify asynchronous programming.

The following example from the msdn:

private async Task<byte[]> GetURLContentsAsync(string url)
{
  var content = new MemoryStream();
  var request = (HttpWebRequest)WebRequest.Create(url);

  using(var response = await request.GetResponseAsync())
  using(var responseStream = response.GetResponseStream())
  {
    await responseStream.CopyToAsync(content);
  }

  return content.ToArray();
}

looks like this in csharpxom:

<method name="GetURLContentsAsync" access="private" async="true">
  <returns>
    <type name="Task" namespace="System.Threading.Tasks">
      <type-arguments>
        <type name="byte" rank="1"/>
      </type-arguments>
    </type>
  </returns>
  <parameters>
    <parameter name="url">
      <type name="string"/>
    </parameter>
  </parameters>
  <block>
    <var name="content">
      <initialize>
        <new-object>
          <type name="MemoryStream" namespace="System.IO"/>
        </new-object>
      </initialize>
    </var>
    <var name="request">
      <initialize>
        <cast>
          <invoke>
            <static-method-ref name="Create">
              <type name="WebRequest" namespace="System.Net"/>
            </static-method-ref>
            <arguments>
              <var-ref name="url"/>
            </arguments>
          </invoke>
          <type name="HttpWebRequest" namespace="System.Net"/>
        </cast>
      </initialize>
    </var>

    <using>
      <resource>
        <var name="response">
          <initialize>
            <await>
              <invoke>
                <method-ref name="GetResponseAsync">
                  <var-ref name="request"/>
                </method-ref>
              </invoke>
            </await>
          </initialize>
        </var>
      </resource>
      <using>
        <resource>
          <var name="responseStream">
            <initialize>
              <invoke>
                <method-ref name="GetResponseStream">
                  <var-ref name="response"/>
                </method-ref>
              </invoke>
            </initialize>
          </var>
        </resource>
        <expression>
          <await>
            <invoke>
              <method-ref name="CopyToAsync">
                <var-ref name="responseStream"/>
              </method-ref>
              <arguments>
                <var-ref name="content"/>
              </arguments>
            </invoke>
          </await>
        </expression>
      </using>
    </using>

    <return>
      <invoke>
        <method-ref name="ToArray">
          <var-ref name="content"/>
        </method-ref>
      </invoke>
    </return>
  </block>
</method>

Friday, 23 March 2012 00:07:35 UTC
.NET | Announce | xslt
# Saturday, 10 December 2011

@michaelhkay Saxon 9.4 is out.

But why doesn't the author state that the HE version is still xslt/xpath 2.0, as neither xslt maps nor function items are supported?

Saturday, 10 December 2011 12:16:28 UTC
Thinking aloud | xslt
# Friday, 28 October 2011

It so happened that we had never worked with jQuery, although we were aware of it.

In the early 2000s we developed a web application that contained rich javascript APIs, including UI components. Later, we actively practiced ASP.NET, and later still JSF.

At present, looking at jQuery more closely, we regret that we did not start using it earlier.

Separation of business logic and presentation is remarkable when one uses JSON web services. In fact, the server part can be seen as a set of web services representing the business logic, plus a set of resources: html, styles, scripts, and others. Neither ASP.NET nor JSF approaches such a consistent separation.

The only trouble, in our opinion, is that jQuery has no standard data binding: a way to bind JSON data to (and from) html controls. The technique that will probably be standardized is called jQuery Templates or JsViews.

Unfortunately, after reading about this binding API, and being in love with Xslt and XQuery, we just want to cry. We don't know what the best solution for the task would be, but what we see looks uncomfortable to us.

Friday, 28 October 2011 22:59:23 UTC
ASP.NET | JSF and Facelets | Thinking aloud | Tips and tricks | xslt
# Thursday, 29 September 2011

A couple of weeks ago, we suggested introducing an enumerator function into XPath (see [F+O30] A enumerator function):

I would like the WG to consider adding a function that turns a sequence into an enumeration of values.

Consider a function like this:  fn:enumerator($items as item()*) as function() as item()?;

alternatively, signature could be:

 fn:enumerator($items as function() as item()*) as function() as item()?;

This function receives a sequence and returns a function item which, upon its Nth call, returns the Nth item of the original sequence. This way, a sequence of items is turned into a function providing an enumeration of the items of the sequence.

As an example consider two functions:

a) t:rand($seed as xs:double) as xs:double* - a function producing a random number sequence;
b) t:work($input as element()) as element() - a function that generates output from its input, and that needs random numbers in the course of execution.

t:work() may contain a code like this:
  let $rand := fn:enumerator(t:rand($seed)),

and later it can call $rand() to get random numbers.

Enumerators will help to compose algorithms where one algorithm communicates with other independent algorithms, thus making code simpler. The most obvious class of enumerators are generators: ordered numbers, unique identifiers, random numbers.

Technically, the function returned from fn:enumerator() is nondeterministic, but its "side effect" is similar to the "side effect" of generate-id() applied to a newly created node (see bug #13747, and bug #13494).

The idea is inspired by a generator function, which returns a new value upon each call.

Such a function can be seen as a stateful object, but our goal is to look at it in a more functional way. So we look at the algorithm as a function that produces a sequence of output, which is purely functional, and at an enumerator that allows iterating over the algorithm's output.

This way, the function that implements an algorithm and the function that uses it can be seen as two threads of functional programs that communicate with each other through messaging.
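A rough Java analog of the proposal, assuming java.util.function.Supplier for the returned function item and Optional.empty() for the empty sequence; enumerator() and rand() are illustrative names, and the linear congruential generator constants are for demonstration only. The random sequence itself stays a pure, lazily evaluated value; only the enumerator wrapping it carries state.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Supplier;
import java.util.stream.Stream;

class Enumerators {
  // Analog of the proposed fn:enumerator(): each call returns the next item,
  // or Optional.empty() (the empty sequence) once the items are exhausted.
  static <T> Supplier<Optional<T>> enumerator(Stream<T> items) {
    Iterator<T> iterator = items.iterator(); // pulls lazily, one item per call
    return () -> iterator.hasNext() ? Optional.of(iterator.next()) : Optional.empty();
  }

  // Analog of t:rand($seed): a pure, infinite, lazily evaluated sequence in [0, 1).
  // The LCG constants are illustrative, not a quality random generator.
  static Stream<Double> rand(long seed) {
    return Stream.iterate(seed, s -> s * 6364136223846793005L + 1442695040888963407L)
        .skip(1)
        .map(s -> (s >>> 11) / (double) (1L << 53));
  }
}
```

A consumer corresponding to t:work() would create `Supplier<Optional<Double>> rand = Enumerators.enumerator(Enumerators.rand(seed));` and call `rand.get()` whenever it needs the next number.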

Honestly, we doubt that the WG will accept it, but it's interesting to watch the discussion.

Thursday, 29 September 2011 11:56:05 UTC
Thinking aloud | xslt
# Wednesday, 14 September 2011

More than a month has passed since we reported a problem to the Saxon forum (see Saxon optimizer bug and Saxon 9.2 generate-id() bug).

The essence of the problem is that we constructed an argumentless function that returns a unique identifier each time it is called. To achieve the effect, we created a temporary node and returned its generate-id() value.

Such a function is nondeterministic, as its result does not depend on its arguments alone. This means that the engine's optimizer is not free to reorder calls to such a function. Yet that's what happens in Saxon 9.2 and Saxon 9.3, where the engine hoists the function call out of the loop, thus producing invalid results.
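Why hoisting breaks such a function can be shown with a small Java sketch; idGenerator(), perIteration(), and hoisted() are hypothetical names, and the supplier stands in for t:generate-id(). Evaluating the call once per iteration gives distinct ids, while evaluating it once outside the loop, as an optimizer that assumes purity might do, repeats a single id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class Hoisting {
  // Like t:generate-id(): takes no arguments, yet returns a fresh value
  // on every call, so it is not a pure function.
  static Supplier<String> idGenerator() {
    AtomicInteger counter = new AtomicInteger();
    return () -> "d" + counter.incrementAndGet();
  }

  // What the stylesheet means: one call per iteration.
  static List<String> perIteration(int count, Supplier<String> generateId) {
    List<String> refs = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      refs.add(generateId.get() + ".s");
    }
    return refs;
  }

  // What a purity-assuming optimizer may do: evaluate once, reuse the result.
  static List<String> hoisted(int count, Supplier<String> generateId) {
    String id = generateId.get();
    List<String> refs = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      refs.add(id + ".s");
    }
    return refs;
  }
}
```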

Michael Kay, the author of the Saxon engine, argued that this is "a gray area of the xslt spec":

If the spec were stricter about defining exactly when you can rely on identity-dependent operations then I would be obliged to follow it, but I think it's probably deliberate that it currently allows implementations some latitude, effectively signalling to users that they should avoid depending on this aspect of the behaviour.

He advised raising a bug in the W3C Bugzilla to resolve the issue. In the end, two related bugs were raised:

  • Bug 13494 - Node uniqueness returned from XSLT function;
  • Bug 13747 - [XPath 3.0] Determinism of expressions returning constructed nodes.

Yesterday, the WG resolved the issue:

The Working Group agreed that default behavior should continue to require these nodes to be constructed with unique IDs. We believe that this is the kind of thing implementations can do with annotations or declaration options, and it would be best to get implementation experience with this before standardizing.

This means that the technique we used to generate unique identifiers is correct and the behaviour is well defined.

The only remaining problem is to wait until Saxon fixes its behaviour accordingly.

Wednesday, 14 September 2011 05:54:56 UTC
Thinking aloud | xslt
# Sunday, 28 August 2011

Recently one of the users of our java yield return annotation kindly informed us about a problem that happened in his environment (see Java's @Yield return annotation update).

Incidentally, we had never noticed the problem before. Along with this issue, we found that the Eclipse compiler changed in Indigo in a way that required us to recompile the source. Well, that's the price you have to pay when you access an internal API.

Updated sources can be found at Yield.zip, and compiled jars at Yield.jar (pre-Indigo), and Yield.3.7.jar (Indigo and probably higher).

See also:

Yield return feature in java
Why @Yield iterator should be Closeable
What you can do with jxom.

Sunday, 28 August 2011 19:11:45 UTC
Announce | Java | xslt
# Wednesday, 27 July 2011

An xslt code that had worked in production for several years failed unexpectedly. That's unusual and unfortunate, but it happens.

We started analyzing the problem, narrowed down the code block, and recreated it in a simple form. Here it is:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/public"
  exclude-result-prefixes="t xs">

<xsl:template match="/" name="main">
  <xsl:variable name="content">
    <root>
      <xsl:for-each select="1 to 3">
        <item/>
      </xsl:for-each>
    </root>
  </xsl:variable>

  <xsl:variable name="result">
    <root>
      <xsl:for-each select="$content/root/item">
        <section-ref name-ref="{t:generate-id()}.s"/>
        <!--
        <xsl:variable name="id" as="xs:string"
          select="t:generate-id()"/>
        <section-ref name-ref="{$id}.s"/>
        -->
      </xsl:for-each>
    </root>
  </xsl:variable>

  <xsl:message select="$result"/>
</xsl:template>

<xsl:function name="t:generate-id" as="xs:string">
  <xsl:variable name="element" as="element()">
    <element/>
  </xsl:variable>

  <xsl:sequence select="generate-id($element)"/>
</xsl:function>

</xsl:stylesheet>

This code performs a transformation and assigns unique values to name-ref attributes. Values generated with the t:generate-id() function are guaranteed to be unique, as the spec states that every node has a unique generate-id() value.

Imagine our surprise when we found that the generated elements all had the same name-ref. We studied the code all over and found no holes in our reasoning or implementation, so our conclusion was: it's Saxon's bug!

Interestingly, if we rewrite the code a little (see the commented part), it starts to work properly, so we suspect Saxon's optimizer.

Well, in the course of development we have found and reported many Saxon bugs, but how did this little beetle manage to hide for so long?

We've verified that the bug exists in the versions 9.2 and 9.3. Here is the bug report: Saxon 9.2 generate-id() bug.

Unfortunately, it has been there for three days already (2011-07-25 to 2011-07-27) without any reaction. We hope this will change soon.

Wednesday, 27 July 2011 20:02:38 UTC
Tips and tricks | xslt
# Thursday, 26 May 2011

We had not updated languages-xom for many months, but now we have found a severe bug in jxom's algorithm for eliminating unreachable code. The marked line was considered unreachable:

check:
  if (condition)
  {
    break check;
  }
  else
  {
    return;
  }

  // due to bug the following was considered unreachable
  expression;
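The rule involved comes from the Java Language Specification: a labeled statement can complete normally if it contains a reachable break that exits it, so the code after it is reachable even though both branches of the if/else end abruptly. A small runnable check (the class name LabeledBreak and the trace strings are ours) compiles and reaches the trailing statement:

```java
class LabeledBreak {
  // Returns a trace showing which path was taken.
  static String run(boolean condition) {
    StringBuilder trace = new StringBuilder();
    check:
    if (condition) {
      trace.append("break;");
      break check; // exits the labeled statement normally
    } else {
      trace.append("return;");
      return trace.toString();
    }
    // Reachable: "break check" lets the labeled if/else complete normally.
    trace.append("after;");
    return trace.toString();
  }
}
```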

Bug is fixed.

The current update also contains other cosmetic fixes.

Please download xslt sources from languages-xom.zip.

Summary

Languages XOM is a set of xml schemas and xslt stylesheets that allows:

  • to define programs in xml form;
  • to perform transformations over code in xml form;
  • to generate sources.

Languages XOM includes:

  • jxom - Java Xml Object model;
  • csharpxom - C# Xml Object Model;
  • cobolxom - COBOL Xml Object Model;
  • sqlxom - SQL Xml Object Model (including several sql dialects);
  • aspx - ASP.NET Object Model;

A proprietary part of languages XOM also includes an XML Object Model for a language named Cool:GEN. In fact, the original purpose of this API was the generation of java/C#/COBOL from Cool:GEN. For more details about Cool:GEN conversion please see here.

Thursday, 26 May 2011 05:15:11 UTC
Announce | Java | xslt
# Saturday, 05 February 2011

We have updated @Yield annotation processor to support better debug info.

Annotation processor can be downloaded from Yield.zip or Yield.jar.

We have also decided to consider jxom's state machine refactoring obsolete, as the @Yield annotation achieves the same effect with clearer code.

JXOM can be downloaded from languages-xom.zip

See also:

Yield return feature in java
Why @Yield iterator should be Closeable
What you can do with jxom.

Saturday, 05 February 2011 21:12:05 UTC
Announce | Java | xslt
# Thursday, 03 February 2011

michaelhkay: Just posted a new internal draft of XSLT 3.0. Moving forward on maps, nested sequences, and JSON support.

We hope they will finally appear there!

See also: Tuples and maps - next try, Tuples and maps - Status: CLOSED, WONTFIX, Tuples and maps in Saxon and other blog posts on our site about immutable maps.

Thursday, 03 February 2011 11:07:49 UTC
Thinking aloud | xslt
# Friday, 14 January 2011

For some reason we never knew about instance initializers in java, while static initializers are well known.

class A
{
  int x;
  static int y;

  // This is an instance initializer.
  {
    x = 1;
  }

  // This is a static initializer.
  static
  {
    y = 2;
  }
}
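A quick way to see when each kind of initializer runs is a sketch that logs the order (InitOrder and its log are illustrative names): the static initializer runs once, when the class is initialized, and the instance initializer runs for every new object, before the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
  static final List<String> log = new ArrayList<>();

  // Static initializer: runs once, when the class is initialized.
  static {
    log.add("static");
  }

  // Instance initializer: runs for every new instance, after the implicit
  // super() call and before the rest of the constructor body.
  {
    log.add("instance");
  }

  InitOrder() {
    log.add("constructor");
  }
}
```

Creating two instances logs static, instance, constructor, instance, constructor.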

Worse, we missed it in the java grammar when we were building jxom, so jxom lacked the feature.

Today we fix this omission and introduce a schema element:

<class-initializer static="boolean">
  <block>
    ...
  </block>
</class-initializer>

It supersedes:

<static>
  <block>
    ...
  </block>
</static>

which supported only static initializers.

Please update languages-xom xslt stylesheets.

P.S. Out of curiosity, did you ever see any use of instance initializers?

Friday, 14 January 2011 21:29:04 UTC
Announce | Java | xslt
# Tuesday, 11 January 2011

We could not resist the temptation to implement the @Yield annotation that we described earlier.

The idea is rather clear, but people say that it's not an easy task to update the sources.

They were right!

The implementation has its price, as we were forced to access internal classes of the JDK's javac compiler. As a result, at present we don't support other compilers, such as EclipseCompiler. We shall look later at what can be done in this area.

At present, annotation processor works perfectly when you run javac either from the command line, from ant, or from other build tool.

Here is an example of how a method is refactored:

@Yield
public static Iterable<Long> fibonachi()
{
  ArrayList<Long> items = new ArrayList<Long>();

  long Ti = 0;
  long Ti1 = 1;

  while(true)
  {
    items.add(Ti);

    long value = Ti + Ti1;

    Ti = Ti1;
    Ti1 = value;
  }
}

And that's how we transform it:

@Yield()
public static Iterable<Long> fibonachi() {
  assert (java.util.ArrayList<Long>)(ArrayList<Long>)null == null : null;

  class $state$ implements java.lang.Iterable<Long>, java.util.Iterator<Long>, java.io.Closeable {

    public java.util.Iterator<Long> iterator() {
      if ($state$id == 0) {
        $state$id = 1;
        return this;
      } else return new $state$();
    }

    public boolean hasNext() {
      if (!$state$nextDefined) {
        $state$hasNext = $state$next();
        $state$nextDefined = true;
      }

      return $state$hasNext;
    }

    public Long next() {
      if (!hasNext()) throw new java.util.NoSuchElementException();

      $state$nextDefined = false;

      return $state$next;
    }

    public void remove() {
      throw new java.lang.UnsupportedOperationException();
    }

    public void close() {
      $state$id = 5;
    }

    private boolean $state$next() {
      while (true) switch ($state$id) {
      case 0:
        $state$id = 1;
      case 1:
        Ti = 0;
        Ti1 = 1;
      case 2:
        if (!true) {
          $state$id = 4;
          break;
        }

        $state$next = Ti;
        $state$id = 3;

        return true;
      case 3:
        value = Ti + Ti1;
        Ti = Ti1;
        Ti1 = value;
        $state$id = 2;

        break;
      case 4:
      case 5:
      default:
        $state$id = 5;

        return false;
      }
    }

    private long Ti;
    private long Ti1;
    private long value;
    private int $state$id;
    private boolean $state$hasNext;
    private boolean $state$nextDefined;
    private Long $state$next;
  }

  return new $state$();
}

Formatting is automatic, sorry, but anyway it's for diagnostics only. You will never see this code.

Interestingly, this implementation very closely mimics the xslt state machine implementation we did back in 2008.

You can download YieldProcessor here. We hope that someone will find our solution very interesting.

Tuesday, 11 January 2011 16:08:41 UTC
Announce | Thinking aloud | Tips and tricks | xslt | Java
# Monday, 20 December 2010

Several times we have wished to see a yield feature in java, and each time we came to the same implementation: infomancers-collections. And each time we turned away dissatisfied and continued with regular iterators.

Why? Well, although it's the best implementation of the feature we have seen, it's still too heavy, as it manipulates java byte code at run-time.

We never understood why it's done this way, given that java has post-compile-time annotation processing.

If we were to implement the yield feature in java, we would create a @Yield annotation and require the method to follow a well-defined code pattern like this:

@Yield
Iterable<String> iterator()
{
  // This is part of pattern.
  ArrayList<String> list = new ArrayList<String>();

  for(int i = 0; i < 10; ++i)
  {
    // list.add() plays the role of yield return.
    list.add(String.valueOf(i));
  }

  // This is part of pattern.
  return list;
}

or

@Yield
Iterator<String> iterator()
{
  // This is part of pattern.
  ArrayList<String> list = new ArrayList<String>();

  for(int i = 0; i < 10; ++i)
  {
    // list.add() plays the role of yield return.
    list.add(String.valueOf(i));
  }

  // This is part of pattern.
  return list.iterator();
}

Note that the code works correctly even if, by mischance, post-compile-time processing does not take place.

At post-compile time we would do all the required refactoring to turn these implementations into state machines, so the runtime would not contain any third-party components.

It's interesting to recall that we have also implemented a similar refactoring in pure xslt.

See What you can do with jxom.

Update: implementation can be found at Yield.zip

Monday, 20 December 2010 16:28:35 UTC
Java | Thinking aloud | Tips and tricks | xslt
# Thursday, 18 November 2010

Michael Kay, the author of the Saxon xslt processor, inspired by the GWT ideas, has decided to compile Saxon HE into javascript. See Compiling Saxon using GWT.

The resulting script is about 1 MB in size.

But we have been thinking lately that it's overkill to bring a whole xslt engine to the client, when it's possible to generate javascript from xslt the same way he builds java from xquery. This would probably require some runtime, but a much smaller one.

Thursday, 18 November 2010 16:19:52 UTC
Tips and tricks | xslt
# Tuesday, 09 November 2010

Search at www.google.fr: An empty sequence is not allowed as the @select attribute of xsl:analyze-string

That's known issue. See Bug 7976.

In xslt 2.0 you should either check the value before using xsl:analyze-string, or wrap it in a string() call.

The problem is addressed in xslt 3.0.

Tuesday, 09 November 2010 10:11:45 UTC
Tips and tricks | xslt
# Sunday, 07 November 2010

michaelhkay: Saxon 9.3 has been out for 8 days: only two bugs so far, one found by me. I think that's a record.

Not necessarily. We, for example, use Saxon HE and have found nothing new in Saxon 9.3, while we expected to see xslt 3.0. Disappointed. No actual reason to migrate.

P.S. We were among the first to find early bugs in previous releases.

Sunday, 07 November 2010 09:07:11 UTC
Thinking aloud | xslt
# Tuesday, 02 November 2010

We're following the W3C's "Bug 9069 - Function to invoke an XSLT transformation".

There, people discuss an xpath API to invoke xslt transformations. The function should look roughly like this:

transform
(
  $node-tree as node()?,
  $stylesheet as item(),
  $parameters as XXX
) as node()

The discussion is spinning around the last argument: $parameters as XXX. Should it be an xml element describing parameters, a function returning values for parameter names, or some new type modelling immutable map?

What is most interesting in this discussion is the leak about plans to introduce a map type:

Comment 7 Michael Kay, 2010-09-14 22:46:58 UTC

We're currently talking about adding an immutable map to XSLT as a new data type (the put operation would return a new map). There appear to be a number of possible efficient implementations. It would be ideally suited for this purpose, because unlike the mechanism used for serialization parameters, the values can be any data type (including nodes), not only strings.

There is hope that maps will finally appear in xslt!
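An immutable map whose put returns a new map is easy to sketch in Java with path copying in a binary search tree. ImmutableMap and its methods are hypothetical names, not the API being discussed by the WG. For brevity the tree is not balanced; a real implementation would add balancing (for example AVL or red-black) to keep operations logarithmic.

```java
class ImmutableMap<K extends Comparable<K>, V> {
  private static final class Node<K, V> {
    final K key;
    final V value;
    final Node<K, V> left;
    final Node<K, V> right;

    Node(K key, V value, Node<K, V> left, Node<K, V> right) {
      this.key = key;
      this.value = value;
      this.left = left;
      this.right = right;
    }
  }

  private final Node<K, V> root;

  private ImmutableMap(Node<K, V> root) { this.root = root; }

  static <K extends Comparable<K>, V> ImmutableMap<K, V> empty() {
    return new ImmutableMap<>(null);
  }

  // Returns a new map; this one is unchanged. Only nodes on the path
  // to the key are copied, all other subtrees are shared.
  ImmutableMap<K, V> put(K key, V value) {
    return new ImmutableMap<>(put(root, key, value));
  }

  private static <K extends Comparable<K>, V> Node<K, V> put(Node<K, V> node, K key, V value) {
    if (node == null) return new Node<>(key, value, null, null);
    int c = key.compareTo(node.key);
    if (c < 0) return new Node<>(node.key, node.value, put(node.left, key, value), node.right);
    if (c > 0) return new Node<>(node.key, node.value, node.left, put(node.right, key, value));
    return new Node<>(key, value, node.left, node.right);
  }

  // Returns the value for the key, or null when absent.
  V get(K key) {
    Node<K, V> node = root;
    while (node != null) {
      int c = key.compareTo(node.key);
      if (c == 0) return node.value;
      node = c < 0 ? node.left : node.right;
    }
    return null;
  }
}
```

Since nodes are never mutated, any version of the map can safely hold values of any type, including nodes, as the quoted comment anticipates.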

See also:
Bug 5630 - [DM] Tuples and maps,
Tuples and maps - Status: CLOSED, WONTFIX,
Map, based on immutable trees,
Maps in exslt2?

Tuesday, 02 November 2010 08:34:52 UTC
Thinking aloud | xslt
# Monday, 01 November 2010

Historically, jxom was developed first, and as such exhibited some imperfections in its xml schema. csharpxom has taken jxom's problems into account.

Unfortunately, we could not easily fix jxom, as a great amount of code already uses it. In this refactoring we tried to be conservative, and changed only the "type" and "import" xml schema elements in java.xsd.

Consider type reference and package import constructs in the old schema:

<!-- import java.util.ArrayList; -->
<import name="java.util.ArrayList"/>

<!-- java.util.ArrayList<java.math.BigDecimal> -->
<type package="java.util">
  <part name="ArrayList">
    <argument>
      <type name="BigDecimal" package="java.math"/>
    </argument>
  </part>
</type>

<!-- my.Parent.Nested -->
<type package="my">
  <part name="Parent"/>
  <part name="Nested"/>
</type>

Here we can observe that:

  • a type is referred to by a qualified name in the import element;
  • a type has two forms: a simple one (see BigDecimal), and another for nested or generic types (see ArrayList).

We have made it more consistent in the updated jxom:

<!-- import java.util.ArrayList; -->
<import>
  <type name="ArrayList" package="java.util"/>
</import>

<!-- java.util.ArrayList<java.math.BigDecimal> -->
<type name="ArrayList" package="java.util">
  <argument>
    <type name="BigDecimal" package="java.math"/>
  </argument>
</type>

<!-- my.Parent.Nested -->
<type name="Nested">
  <type name="Parent" package="my"/>
</type>

We hope that you will not be impacted very much by this fix.

Please refresh Languages XOM from languages-xom.zip.

P.S. We have also included an xml schema and xslt API to generate ASPX (see Xslt serializer for ASPX output). In fact, in our projects we generate aspx documents with embedded csharpxom, and then pass them through a two-stage transformation.

Monday, 01 November 2010 15:48:19 UTC
Announce | xslt
# Friday, 22 October 2010

In the previous post we announced an API to parse COBOL sources into cobolxom.

We used the incremental parser to build a grammar xml tree, and planned to create an xslt transformation to generate cobolxom.

Now, we would like to declare that such xslt is ready.

At present all standard COBOL constructs are supported, but more tests are required. Preprocessor support is still in the todo list.

You may peek at examples of COBOL:

Cobol grammar:

And cobolxom:

While building the grammar-to-cobolxom stylesheet, we asked ourselves whether COBOL parsing could be done entirely in xslt. The answer is yes, so who knows, we might turn this task into a pure xslt one. :-)

Friday, 22 October 2010 13:24:31 UTC
Announce | Incremental Parser | Thinking aloud | xslt
# Monday, 18 October 2010

Recently we've seen a code like this:

<xsl:variable name="a" as="element()?" select="..."/>
<xsl:variable name="b" as="element()?" select="..."/>

<xsl:apply-templates select="$a">
  <xsl:with-param name="b" tunnel="yes" as="element()" select="$b"/>
</xsl:apply-templates>

It fails with an error: "An empty sequence is not allowed as the value of parameter $b".

What is interesting is that the value of $a is an empty sequence, so the code could potentially work, provided the processor evaluated $a first and then decided not to evaluate xsl:with-param at all.

Is the order of evaluation of @select and xsl:with-param specified by the standard, or is it implementation-defined?

We asked this question on the xslt forum, and got the following answer:

The specification leaves this implementation-defined. Since the values of the parameters are the same for every node processed, it's a reasonable strategy for the processor to evaluate the parameters before knowing how many selected nodes there are, though I guess an even better strategy would be to do it lazily when the first selected node is found.

Well, that's the answer we expected. This question will probably induce Michael Kay to introduce a small optimization into Saxon.
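The lazy strategy from the answer can be sketched in Java. This only illustrates the idea and is not Saxon's code. The hypothetical `applyTemplates` method receives the parameter as a `Supplier` and evaluates it at most once, when the first selected item is found. An empty selection therefore never reaches the failing `as="element()"` check, which is modelled here by the hypothetical `required` helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

class LazyParam {
    // Hypothetical analogue of xsl:apply-templates with one xsl:with-param:
    // the parameter is evaluated at most once, and only when an item is selected.
    static <T, P, R> List<R> applyTemplates(
            List<T> selected, Supplier<P> param, BiFunction<T, P, R> template) {
        List<R> result = new ArrayList<>();
        P value = null;
        boolean evaluated = false;

        for (T item : selected) {
            if (!evaluated) {
                value = param.get();
                evaluated = true;
            }
            result.add(template.apply(item, value));
        }
        return result;
    }

    // Mirrors the as="element()" check on $b: an empty value is an error.
    static String required(String value) {
        if (value == null) {
            throw new IllegalStateException(
                "An empty sequence is not allowed as the value of parameter $b");
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> a = List.of(); // $a is empty
        String b = null;            // $b is empty too

        // Nothing is selected, so required(b) is never called and nothing fails.
        List<String> output =
            applyTemplates(a, () -> required(b), (item, param) -> item + param);
        System.out.println(output.size()); // prints 0
    }
}
```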

Monday, 18 October 2010 17:58:51 UTC
Tips and tricks | xslt
# Saturday, 09 October 2010

Some time ago we created an incremental parser. Now that we have decided to load COBOL sources directly into cobolxom (XML Object Model for COBOL), the parser has done the job perfectly.

The good point about the incremental parser is that it handles COBOL's grammar easily.

The whole process looks like this:

  1. the incremental parser, given a COBOL grammar, builds a grammar tree;
  2. we stream this tree into xml;
  3. an xslt transforms the xml from the previous step into cobolxom (TODO).

This is an example of COBOL code:

IDENTIFICATION DIVISION.
PROGRAM-ID. FACTORIAL RECURSIVE.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 NUMB PIC 9(4) VALUE IS 5.
01 FACT PIC 9(8) VALUE IS 0.

LOCAL-STORAGE SECTION.
01 NUM PIC 9(4).

PROCEDURE DIVISION.
  MOVE 'X' TO XXX
  MOVE NUMB TO NUM

  IF NUMB = 0 THEN
    MOVE 1 TO FACT
  ELSE
    SUBTRACT 1 FROM NUMB
    CALL 'FACTORIAL'
    MULTIPLY NUM BY FACT
  END-IF

  DISPLAY NUM '! = ' FACT

  GOBACK.
END PROGRAM FACTORIAL.

And a grammar tree:

<Program>
  <Name data="FACTORIAL"/>
  <Recursive/>
  <DataDivision>
    <WorkingStorageSection>
      <Data>
        <Level data="01"/>
        <Name data="NUMB"/>
        <Picture data="9(4)"/>
        <Value>
          <Numeric data="5"/>
        </Value>
      </Data>
      <Data>
        <Level data="01"/>
        <Name data="FACT"/>
        <Picture data="9(8)"/>
        <Value>
          <Numeric data="0"/>
        </Value>
      </Data>
    </WorkingStorageSection>
    <LocalStorageSection>
      <Data>
        <Level data="01"/>
        <Name data="NUM"/>
        <Picture data="9(4)"/>
      </Data>
    </LocalStorageSection>
  </DataDivision>
  <ProcedureDivision>
    <Sentence>
      <MoveStatement>
        <From>
          <String data="'X'"/>
        </From>
        <To>
          <Identifier>
            <DataName data="XXX"/>
          </Identifier>
        </To>
      </MoveStatement>
      <MoveStatement>
        <From>
          <Identifier>
            <DataName data="NUMB"/>
          </Identifier>
        </From>
        <To>
          <Identifier>
            <DataName data="NUM"/>
          </Identifier>
        </To>
      </MoveStatement>
      <IfStatement>
        <Condition>
          <Relation>
            <Identifier>
              <DataName data="NUMB"/>
            </Identifier>
            <Equal/>
            <Numeric data="0"/>
          </Relation>
        </Condition>
        <Then>
          <MoveStatement>
            <From>
              <Numeric data="1"/>
            </From>
            <To>
              <Identifier>
                <DataName data="FACT"/>
              </Identifier>
            </To>
          </MoveStatement>
        </Then>
        <Else>
          <SubtractStatement>
            <Value>
              <Numeric data="1"/>
            </Value>
            <From>
              <Identifier>
                <DataName data="NUMB"/>
              </Identifier>
            </From>
          </SubtractStatement>
          <CallStatement>
            <Name>
              <String data="'FACTORIAL'"/>
            </Name>
          </CallStatement>
          <MultiplyStatement>
            <Value>
              <Identifier>
                <DataName data="NUM"/>
              </Identifier>
            </Value>
            <By>
              <Identifier>
                <DataName data="FACT"/>
              </Identifier>
            </By>
          </MultiplyStatement>
        </Else>
      </IfStatement>
      <DisplayStatement>
        <Values>
          <Identifier>
            <DataName data="NUM"/>
          </Identifier>
          <String data="'! = '"/>
          <Identifier>
            <DataName data="FACT"/>
          </Identifier>
        </Values>
      </DisplayStatement>
      <GobackStatement/>
    </Sentence>
  </ProcedureDivision>
  <EndName data="FACTORIAL"/>
</Program>
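The second step of the process, streaming the tree into xml, needs no in-memory DOM. Below is a minimal sketch with the standard StAX `XMLStreamWriter`. It assumes a hypothetical `Node` record in place of the incremental parser's actual tree classes. Each node becomes an element, and a token's text goes into a `data` attribute, as in the tree shown.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class GrammarTreeXml {
    // Hypothetical grammar tree node: element name, optional token text, children.
    record Node(String name, String data, List<Node> children) {
        Node(String name) { this(name, null, List.of()); }
        Node(String name, String data) { this(name, data, List.of()); }
        Node(String name, List<Node> children) { this(name, null, children); }
    }

    // Depth-first walk that writes StAX events directly; no DOM is built.
    static void write(XMLStreamWriter writer, Node node) throws XMLStreamException {
        boolean leaf = node.children().isEmpty();

        if (leaf) {
            writer.writeEmptyElement(node.name());
        } else {
            writer.writeStartElement(node.name());
        }

        if (node.data() != null) {
            writer.writeAttribute("data", node.data());
        }

        for (Node child : node.children()) {
            write(writer, child);
        }

        if (!leaf) {
            writer.writeEndElement();
        }
    }

    static String toXml(Node root) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        write(writer, root);
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        Node program = new Node("Program", List.of(
            new Node("Name", "FACTORIAL"),
            new Node("Recursive")));

        System.out.println(toXml(program));
    }
}
```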

The last step, transforming the tree into cobolxom, is on the TODO list.

We have committed the COBOL grammar to the same place at SourceForge as the XQuery grammar. The solution is now under VS 2010.

Saturday, 09 October 2010 08:26:23 UTC
Announce | Incremental Parser | xslt
# Friday, 08 October 2010

Suppose you have a timestamp string and want to check whether it matches one of the following formats, possibly with leading and trailing spaces:

  • YYYY-MM-DD-HH.MM.SS.NNNNNN
  • YYYY-MM-DD-HH.MM.SS
  • YYYY-MM-DD

We decided to use a regex and its capture groups to extract the timestamp parts. This left us with only one option: the xsl:analyze-string instruction. It took a couple more minutes to reach the final solution:

<xsl:variable name="parts" as="xs:string*">
  <xsl:analyze-string select="$value"
    regex="
      ^\s*(\d\d\d\d)-(\d\d)-(\d\d)
      (-(\d\d)\.(\d\d)\.(\d\d)(\.(\d\d\d\d\d\d))?)?\s*$"
    flags="x">
    <xsl:matching-substring>
      <xsl:sequence select="regex-group(1)"/>
      <xsl:sequence select="regex-group(2)"/>
      <xsl:sequence select="regex-group(3)"/>

      <xsl:sequence select="regex-group(5)"/>
      <xsl:sequence select="regex-group(6)"/>
      <xsl:sequence select="regex-group(7)"/>

      <xsl:sequence select="regex-group(9)"/>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:variable>

<xsl:choose>
  <xsl:when test="exists($parts)">
    ...
  </xsl:when>
  <xsl:otherwise>
    ...
  </xsl:otherwise>
</xsl:choose>
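For comparison, here is a sketch of the same check in Java with `java.util.regex`. It reuses the regular expression from the stylesheet. Like `regex-group()`, it returns an empty string for a group that did not participate. The class and method names are ours and are for illustration only. Note that Java's `\d` matches only ASCII digits by default, while in xpath regular expressions `\d` also matches other Unicode decimal digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampParts {
    // Same regular expression as in the stylesheet; groups 4 and 8 are the optional wrappers.
    private static final Pattern TIMESTAMP = Pattern.compile(
        "^\\s*(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)"
        + "(-(\\d\\d)\\.(\\d\\d)\\.(\\d\\d)(\\.(\\d\\d\\d\\d\\d\\d))?)?\\s*$");

    // Hypothetical helper: returns year, month, day, hour, minute, second and
    // fraction, or an empty list when the value does not match.
    static List<String> parts(String value) {
        Matcher matcher = TIMESTAMP.matcher(value);
        List<String> result = new ArrayList<>();

        if (!matcher.matches()) {
            return result;
        }

        for (int group : new int[] {1, 2, 3, 5, 6, 7, 9}) {
            String text = matcher.group(group);
            // As with regex-group(), a group that did not participate yields "".
            result.add(text == null ? "" : text);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parts(" 2010-10-08-17.37.44.123456 "));
        // prints [2010, 10, 08, 17, 37, 44, 123456]
        System.out.println(parts("2010-10-08").size()); // prints 7
    }
}
```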

How would you solve the problem? Is it the best solution?

Friday, 08 October 2010 17:37:44 UTC
Tips and tricks | xslt
# Wednesday, 04 August 2010

We have updated C# XOM (csharpxom) to support C# 4.0 (in fact there are very few changes).

From the grammar perspective this includes:

  • Dynamic types;
  • Named and optional arguments;
  • Covariance and contravariance of generic parameters for interfaces and delegates.

Dynamic type, C#:

dynamic dyn = 1;

C# XOM:

<var name="dyn">
  <type name="dynamic"/>
  <initialize>
    <int value="1"/>
  </initialize>
</var>

Named and Optional Arguments, C#:

int Increment(int value, int increment = 1)
{
  return value + increment;
}

void Test()
{
  // Regular call.
  Increment(7, 1);

  // Call with named parameter.
  Increment(value: 7, increment: 1);
 
  // Call with default.
  Increment(7);
}

C# XOM:

<method name="Increment">
  <returns>
    <type name="int"/>
  </returns>
  <parameters>
    <parameter name="value">
      <type name="int"/>
    </parameter>
    <parameter name="increment">
      <type name="int"/>
      <initialize>
        <int value="1"/>
      </initialize>
    </parameter>
  </parameters>
  <block>
    <return>
      <add>
        <var-ref name="value"/>
        <var-ref name="increment"/>
      </add>
    </return>
  </block>
</method>

<method name="Test">
  <block>
    <expression>
      <comment>Regular call.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <int value="7"/>
          <int value="1"/>
        </arguments>
      </invoke>
    </expression>

    <expression>
      <comment>Call with named parameter.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <argument name="value">
            <int value="7"/>
          </argument>
          <argument name="increment">
            <int value="1"/>
          </argument>
        </arguments>
      </invoke>
    </expression>

    <expression>
      <comment>Call with default.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <int value="7"/>
        </arguments>
      </invoke>
    </expression>
  </block>
</method>

Covariance and contravariance, C#:

public interface Variance<in T, out P, Q>
{
  P X(T t);
}

C# XOM:

<interface access="public" name="Variance">
  <type-parameters>
    <type-parameter name="T" variance="in"/>
    <type-parameter name="P" variance="out"/>
    <type-parameter name="Q"/>
  </type-parameters>
  <method name="X">
    <returns>
      <type name="P"/>
    </returns>
    <parameters>
      <parameter name="t">
        <type name="T"/>
      </parameter>
    </parameters>
  </method>
</interface>

Other cosmetic fixes were also introduced into Java XOM (jxom), COBOL XOM (cobolxom), and SQL XOM (sqlxom).

The new version can be found at languages-xom.zip.

See also: What's New in Visual C# 2010

Wednesday, 04 August 2010 14:00:26 UTC
Announce | xslt
# Thursday, 15 July 2010

We have run into another xslt bug, which depends on several independent circumstances and often behaves differently when observed. That's clearly a Heisenbug.

Xslt designers failed to realize that the syntactic sugar they introduced into xpath could turn into obscure bugs. Well, it's easy to be wise after the event...

To the point.

Suppose you have a sequence consisting of text nodes and elements, and you want to "normalize" it by wrapping each run of adjacent text nodes into a separate element. The following stylesheet is supposed to do the work:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
  exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <xsl:variable name="nodes" as="node()*">
      <xsl:text>Hello, </xsl:text>
      <string value="World"/>
      <xsl:text>! </xsl:text>
      <xsl:text>Well, </xsl:text>
      <string value="hello"/>
      <xsl:text>, if not joking!</xsl:text>
    </xsl:variable>
 
    <result>
      <xsl:sequence select="t:normalize($nodes)"/>
    </result>
  </xsl:template>

  <xsl:function name="t:normalize" as="node()*">
    <xsl:param name="nodes" as="node()*"/>

    <xsl:for-each-group select="$nodes" group-starting-with="*">
      <xsl:variable name="string" as="element()?" select="self::string"/>
      <xsl:variable name="texts" as="node()*"
        select="current-group() except $string"/>

      <xsl:sequence select="$string"/>

      <xsl:if test="exists($texts)">
        <string value="{string-join($texts, '')}"/>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>

We're expecting the following output:

<result>
  <string value="Hello, "/>
  <string value="World"/>
  <string value="! Well, "/>
  <string value="hello"/>
  <string value=", if not joking!"/>
</result>

But often we're getting other results, like:

<result>
  <string value="Hello, "/>
  <string value="World"/>
  <string value="Well, ! "/>
  <string value="hello"/>
  <string value=", if not joking!"/>
</result>

Such output may seriously confuse you, unless you recall the rule for the xpath except operator:

The except operator takes two node sequences as operands and returns a sequence containing all the nodes that occur in the first operand but not in the second operand.

... these operators eliminate duplicate nodes from their result sequences based on node identity. The resulting sequence is returned in document order.

...
The relative order of nodes in distinct trees is stable but implementation-dependent

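In other words, the order of the result sequence may be very different from the order of the original sequence. Here the text nodes are parentless nodes created in a variable, so each of them forms a separate tree, and their relative order is up to the implementation.

In contrast, if we change the $texts definition to: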

<xsl:variable name="texts" as="node()*"
  select="current-group()[not(. is $string)]"/>

then the result becomes stable, though the code is less clear.
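The same normalization can also be written as a single forward pass. The output order then simply follows the input order, and no identity-based set operation is involved. Below is a Java sketch. It assumes a hypothetical two-class node model (`Text` and `StringElement`) in place of xdm nodes.

```java
import java.util.ArrayList;
import java.util.List;

class NormalizeTexts {
    // Hypothetical node model: a text node, or a <string value="..."/> element.
    interface Node {}
    record Text(String text) implements Node {}
    record StringElement(String value) implements Node {}

    // One forward pass: elements are copied, and each run of adjacent text nodes
    // is joined into a new StringElement. Output order is input order by
    // construction; no set operation over node identity is involved.
    static List<StringElement> normalize(List<Node> nodes) {
        List<StringElement> result = new ArrayList<>();
        StringBuilder texts = new StringBuilder();
        boolean pending = false;

        for (Node node : nodes) {
            if (node instanceof StringElement) {
                if (pending) {
                    result.add(new StringElement(texts.toString()));
                    texts.setLength(0);
                    pending = false;
                }
                result.add((StringElement) node);
            } else {
                texts.append(((Text) node).text());
                pending = true;
            }
        }

        if (pending) {
            result.add(new StringElement(texts.toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Text("Hello, "), new StringElement("World"), new Text("! "),
            new Text("Well, "), new StringElement("hello"), new Text(", if not joking!"));

        for (StringElement element : normalize(nodes)) {
            System.out.println("<string value=\"" + element.value() + "\"/>");
        }
    }
}
```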

See also Xslt Heisenbug

Thursday, 15 July 2010 08:22:13 UTC
Thinking aloud | Tips and tricks | xslt
# Tuesday, 22 June 2010

Recently we raised a question about serialization of ASPX output in xslt.

The question went like this:

What's the recommended way of ASPX page generation?
E.g.:

------------------------
 <%@ Page AutoEventWireup="true"
   CodeBehind="CurMainMenuP.aspx.cs"
   EnableSessionState="True"
   Inherits="Currency.CurMainMenuP"
   Language="C#"
   MaintainScrollPositionOnPostback="True"
   MasterPageFile="Screen.Master" %>

<asp:Content ID="Content1" runat="server" ContentPlaceHolderID="Title">CUR_MAIN_MENU_P</asp:Content>

<asp:Content ID="Content2" runat="server" ContentPlaceHolderID="Content">
  <span id="id1222146581" runat="server"
    class="inputField system UpperCase" enableviewstate="false">
    <%# Dialog.Global.TranCode %>
  </span>
  ...
------------------------

Notice the aspx page directives, data binding expressions, and prefixed tag names without namespace declarations.

There was a whole range of expected answers. We, however, wanted to know whether somebody had already dealt with the task and had a ready solution at hand.

In general it seems that the xslt community is very hostile to ASPX, both the format and the technology. Well, let's put this aside.

The task of producing ASPX, which is almost xml, cannot be solved if you stay with a pure xml serializer. Xslt's xsl:character-map does not help at all. In fact it looks like a childish attempt to address the problem, as it does not support character escapes but only grabs characters and substitutes them with strings.

We decided to create an ASPX serializer API that produces the required output text. This way you use <xsl:output method="text"/> to generate ASPX pages.

With this goal in mind we have defined a little xml schema to describe ASPX irregularities in xml form. These are:

  • <xs:element name="declared-prefix"> - to describe known prefixes, which should not be declared;
  • <xs:element name="directive"> - to describe directives like <%@ Page %>;
  • <xs:element name="content"> - a transparent content wrapper;
  • <xs:element name="entity"> - to issue xml entity;
  • <xs:element name="expression"> - to describe aspx expression like <%# Eval("A") %>;
  • <xs:element name="attribute"> - to describe an attribute of the parent element.

This approach greatly simplified ASPX generation for us.
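The point of the text method is that the serializer decides the exact characters of each construct. Below is a minimal Java sketch of what `directive`, `expression`, and a prefixed element might turn into. These helpers are hypothetical and are not the published API. Attribute values are written without escaping, on the assumption that they contain no quotes, and only the `<%# %>` expression form is covered.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AspxText {
    // Hypothetical helpers, not the published API: with method="text" the
    // serializer itself decides the exact characters of each construct.

    // A directive such as <%@ Page ... %>; values are assumed to be quote-free.
    static String directive(String name, Map<String, String> attributes) {
        StringBuilder out = new StringBuilder("<%@ ").append(name);
        appendAttributes(out, attributes);
        return out.append(" %>").toString();
    }

    // A data-binding expression such as <%# Eval("A") %>; the code is copied verbatim.
    static String expression(String code) {
        return "<%# " + code + " %>";
    }

    // A prefixed element such as asp:Content, written without any xmlns declaration.
    static String element(String qname, Map<String, String> attributes, String content) {
        StringBuilder out = new StringBuilder("<").append(qname);
        appendAttributes(out, attributes);
        return out.append('>').append(content)
            .append("</").append(qname).append('>').toString();
    }

    private static void appendAttributes(StringBuilder out, Map<String, String> attributes) {
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            out.append(' ').append(e.getKey())
               .append("=\"").append(e.getValue()).append('"');
        }
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("Language", "C#");
        page.put("MasterPageFile", "Screen.Master");

        System.out.println(directive("Page", page));
        // prints: <%@ Page Language="C#" MasterPageFile="Screen.Master" %>
    }
}
```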

The API includes:

Tuesday, 22 June 2010 10:25:41 UTC
Announce | ASP.NET | Thinking aloud | Tips and tricks | xslt
# Friday, 14 May 2010

We have implemented the report parser in C#. Because things now revolve around C#, the schema definition has changed.

We started from classes defining a report definition tree, annotated these classes for xml serialization, and finally produced an xml schema for such a tree. So, at present, it is not an annotated xml schema but a separate xml schema.

In addition we have defined APIs:

  • to enumerate report data (given a report definition and report data, one can get IEnumerable<ViewValue> to iterate report data in structured form);
  • to read a report through XmlReader, which allows, for example, using a report as input for an xslt transformation;
  • to write a report directly into XmlWriter.

An example of report definition as C# code is: MyReport.cs. The very same report definition but serialized into xml is my-report.xml. A generated xml schema for a report definition is: schema0.xsd.

The good point about this solution is that it's already flexible enough to describe every report layout we have at hand, and it's extendable. Our measurements show that report parsing is extremely fast and has a very small memory footprint due to the forward-only nature of report definitions.

From the design point of view report definition is a view of original text data with view info attached.

At present we have defined the following views:

  • Element - a named view to generate output from a content view;
  • Content - a view to aggregate other views together;
  • Choice - a view to produce output from one of content views;
  • Sequence - a view to sequence input view by key expressions, and to attach an index to each sequence item;
  • Iterator - a view to generate output from input view while some condition is true, and to attach an iteration index to each part of output view;
  • Page - a view to remove page headers and footers in the input view, and to attach an index to each page;
  • Compute - a named view to produce result of evaluation of expression as output view;
  • Data - a named view to produce an output value from some bounds of the input view, and optionally to convert, validate and format the value.

To specify details of definitions there are:

  • expressions to deal with integers: Add, Div, Integer, MatchProperty, Max, Min, Mod, Mul, Neg, Null, Sub, VariableRef, ViewProperty, Case;
  • conditions to deal with booleans: And, EQ, GE, GT, IsMatch, LE, LT, NE, Not, Or.
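To give an idea of how such expression and condition trees evaluate, here is a small Java sketch. It covers only a few of the names listed above and takes variables from a plain map. It uses `IntegerValue` instead of `Integer` to avoid a clash with `java.lang.Integer`. The overall shape is an assumption, not the actual classes from ReportLayout.zip.

```java
import java.util.Map;

class ReportExpressions {
    // Hypothetical evaluation model for a few of the listed expression and
    // condition names; the real report definition classes may differ.
    interface Expression { int eval(Map<String, Integer> variables); }
    interface Condition { boolean test(Map<String, Integer> variables); }

    // "Integer" from the list, renamed to avoid clashing with java.lang.Integer.
    record IntegerValue(int value) implements Expression {
        public int eval(Map<String, Integer> variables) { return value; }
    }

    // Looks up a variable, e.g. a sequence or page index attached by a view.
    record VariableRef(String name) implements Expression {
        public int eval(Map<String, Integer> variables) {
            Integer value = variables.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Unknown variable: " + name);
            }
            return value;
        }
    }

    record Add(Expression left, Expression right) implements Expression {
        public int eval(Map<String, Integer> variables) {
            return left.eval(variables) + right.eval(variables);
        }
    }

    record Mod(Expression left, Expression right) implements Expression {
        public int eval(Map<String, Integer> variables) {
            return left.eval(variables) % right.eval(variables);
        }
    }

    record EQ(Expression left, Expression right) implements Condition {
        public boolean test(Map<String, Integer> variables) {
            return left.eval(variables) == right.eval(variables);
        }
    }

    record GT(Expression left, Expression right) implements Condition {
        public boolean test(Map<String, Integer> variables) {
            return left.eval(variables) > right.eval(variables);
        }
    }

    record And(Condition left, Condition right) implements Condition {
        public boolean test(Map<String, Integer> variables) {
            return left.test(variables) && right.test(variables);
        }
    }

    record Not(Condition condition) implements Condition {
        public boolean test(Map<String, Integer> variables) {
            return !condition.test(variables);
        }
    }

    public static void main(String[] args) {
        // "index > 0 and index mod 2 = 0": every second item of a view.
        Condition evenItem = new And(
            new GT(new VariableRef("index"), new IntegerValue(0)),
            new EQ(new Mod(new VariableRef("index"), new IntegerValue(2)), new IntegerValue(0)));

        System.out.println(evenItem.test(Map.of("index", 4))); // prints true
        System.out.println(evenItem.test(Map.of("index", 3))); // prints false
    }
}
```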

At present there is no specification of report definitions. Creating such a spec for a user without deep knowledge is probably the most complex part. Our current idea is that one should use the xml schema for report definitions (we still need to polish the generated schema) and a schema-aware editor to build report definitions. That's a very robust approach, and it works perfectly with languages xom.

C# sources can be found at: ReportLayout.zip including report definition classes and a sample report.

Friday, 14 May 2010 12:45:42 UTC
Announce | Thinking aloud | xslt
# Sunday, 09 May 2010

We're facing the task of parsing reports produced by legacy applications and converting them into a structured form, e.g. into xml. These xml files can then be processed with up-to-date tools to produce good-looking reports.

The reports at hand vary greatly in structure and in size: from a couple of KB to several GB. The good part is that they mostly have a tabular form, so it's easy to think of a specific parser for each report type.

Our goal is to create an environment where less qualified people could create and manage such parsers, and only rarely need to engage someone to handle nontrivial cases.

Our analysis has shown that it's possible to write such parser in almost any language: xslt, C#, java.

Our approach was to create xml schema annotations that, on one side, define a data structure and, on the other, map the report layout. Then we're able to create an xslt that generates either an xslt, C#, or java parser according to the schema definitions. Because languages xom provides XML Object Model and serialization stylesheets for C# and java, it does not really matter whether we generate xslt or C#/java, as the code will look the same.

The approach we're going to use to describe reports is not as powerful as conventional parsers. Its virtue, however, is simplicity of specification.

Consider a report sample (the data to extract is in bold):

1 TITLE ...                    PAGE:            1
 BUSINESS DATE: 09/30/09   ... RUN DATE: 02/23/10
 CYCLE : ITD      RUN: 001 ... RUN TIME: 09:22:39

        CM         BUS   ...
  CO    NBR  FRM   FUNC  ...
 ----- ----- ----- -----  
 XXX   065   065   CLR   ...
 YYY   ...
...
1 TITLE ...                    PAGE:            2
 BUSINESS DATE: 09/30/09   ... RUN DATE: 02/23/10
 CYCLE : ITD      RUN: 001 ... RUN TIME: 09:22:39

        CM         BUS   ...
  CO    NBR  FRM   FUNC  ...
 ----- ----- ----- -----  
 AAA   NNN   MMM   PPP   ...
 BBB   ...
...

* * * * *  E N D   O F   R E P O R T  * * * * *

We approach the report through a sequence of views (filters). Each view localizes some report data either for subsequent filtering or for extraction of the final data.

Looking at the example, one can build the following views of the report:

  1. View of data before the "E N D   O F   R E P O R T" line.
  2. View of remaining data without page headers and footers.
  3. Views of table rows.
  4. Views of cells.

A sequence of filters allows us to build a pipeline of transformations of original text. This also allows us to generate a clean xslt, C# or java code to parse the data.
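Such a pipeline can be sketched in Java over a list of lines. Everything specific here is an assumption made for the sample above, not the report-mapping.xsd semantics:

  • a page header starts at a line beginning with the '1' carriage-control character and ends at the dashed separator line;
  • rows are the remaining non-blank lines, with the "..." placeholders of the sample skipped;
  • cells are cut at hypothetical fixed column bounds.

```java
import java.util.ArrayList;
import java.util.List;

class ReportViews {
    static final String END_MARKER = "E N D   O F   R E P O R T";

    // Hypothetical column bounds [start, end) for the first four columns of the sample.
    static final int[][] COLUMNS = {{1, 6}, {7, 12}, {13, 18}, {19, 24}};

    // View 1: the lines before the end-of-report line.
    static List<String> beforeEnd(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(END_MARKER)) {
                break;
            }
            result.add(line);
        }
        return result;
    }

    // View 2: the data without page headers (from a '1' line to the dashed line).
    static List<String> withoutHeaders(List<String> lines) {
        List<String> result = new ArrayList<>();
        boolean inHeader = false;
        for (String line : lines) {
            if (line.startsWith("1")) {
                inHeader = true;
            } else if (inHeader) {
                inHeader = !line.trim().startsWith("-----");
            } else {
                result.add(line);
            }
        }
        return result;
    }

    // View 3: table rows; blank lines and "..." placeholders are skipped.
    static List<String> rows(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.equals("...")) {
                result.add(line);
            }
        }
        return result;
    }

    // View 4: cells of a row, cut at the fixed column bounds and trimmed.
    static List<String> cells(String row) {
        List<String> result = new ArrayList<>();
        for (int[] column : COLUMNS) {
            int start = Math.min(column[0], row.length());
            int end = Math.min(column[1], row.length());
            result.add(row.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> report = List.of(
            "1 TITLE ...                    PAGE:            1",
            " BUSINESS DATE: 09/30/09   ... RUN DATE: 02/23/10",
            " ----- ----- ----- -----  ",
            " XXX   065   065   CLR   ...",
            "* * * * *  E N D   O F   R E P O R T  * * * * *");

        for (String row : rows(withoutHeaders(beforeEnd(report)))) {
            System.out.println(cells(row)); // prints [XXX, 065, 065, CLR]
        }
    }
}
```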

At first, our favorite language for such a parser was xslt. Unfortunately, we're dealing with the Saxon xslt implementation, which is not very strong at streaming. Without a couple of extension functions to prevent caching, it tends to cache the whole input in memory, which is not acceptable.

At present we have decided to start with C# code, which is naturally pure C#. :-)

The code is still in development, but for now we would like to share the xml schema annotations describing the report layout, report-mapping.xsd, and a sample report description, test.xsd.

Sunday, 09 May 2010 05:18:57 UTC
Announce | Thinking aloud | xslt
# Wednesday, 05 May 2010

A few small changes to the streaming and name normalization algorithms in jxom and csharpxom almost doubled the generation speed (especially for big files).

We suspect, however, that our xslt code is tuned for the Saxon engine.

It would be nice to know whether anybody has used languages XOM with other engines. Is anyone using it at all (well, at least there are downloads)?

Languages XOM (jxom, csharpxom, cobolxom, sqlxom) can be loaded from: languages-xom.zip

Wednesday, 05 May 2010 06:48:10 UTC
Announce | xslt
# Sunday, 02 May 2010

At times a simple task in xslt looks like a puzzle. Today we have this one.

Given a string and a regular expression, find the position and the length of the first matched substring.

The problem looks so simple that you do not immediately realize that you are going to spend ten minutes trying to solve it in the best way.

Try it yourself before proceeding:














<xsl:variable name="match" as="xs:integer*">
  <xsl:analyze-string select="$line" regex="my-reg-ex">
    <xsl:matching-substring>
      <xsl:sequence select="1, string-length(.)"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:sequence select="0, string-length(.)"/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:variable>

<xsl:choose>
  <xsl:when test="$match[1]">
    <xsl:sequence select="1, $match[2]"/>
  </xsl:when>
  <xsl:when test="$match[3]">
    <xsl:sequence select="$match[2] + 1, $match[4]"/>
  </xsl:when>
</xsl:choose>
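For comparison, the same task in Java is direct, because `Matcher` reports match offsets. The sketch below returns 1-based positions, as xpath functions do; `FirstMatch.find` is a hypothetical helper. Java counts UTF-16 code units, so its results differ from xpath's code-point positions when the string contains characters outside the Basic Multilingual Plane.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatch {
    // Hypothetical helper: returns {position, length} of the first match, with a
    // 1-based position as in xpath, or null when the regular expression does not match.
    static int[] find(String line, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new int[] {matcher.start() + 1, matcher.end() - matcher.start()};
    }

    public static void main(String[] args) {
        int[] match = find("abc123def", "\\d+");
        System.out.println(match[0] + ", " + match[1]); // prints 4, 3
    }
}
```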

Sunday, 02 May 2010 15:35:02 UTC
Tips and tricks | xslt
# Saturday, 01 May 2010

To see that the problem with generator functions in xslt is a bit more complicated, compare two functions.

The first one is quoted from the earlier post:

  <xsl:function name="t:generate" as="xs:integer*">
    <xsl:param name="value" as="xs:integer"/>

    <xsl:sequence select="$value"/>
    <xsl:sequence select="t:generate($value * 2)"/>
  </xsl:function>

It does not work in Saxon: it crashes with an out-of-memory error.

The second one is a slightly modified version of the same function:

  <xsl:function name="t:generate" as="xs:integer*">
    <xsl:param name="value" as="xs:integer"/>

    <xsl:sequence select="$value + 0"/>
    <xsl:sequence select="t:generate($value * 2)"/>
  </xsl:function>

It works without problems. In the first case Saxon decides to cache all of the function's output; in the second case it decides to evaluate the data lazily, on demand.
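In Java the laziness is explicit rather than left to an optimizer. Below is a sketch of the same generator as a `java.util.stream.Stream`. The infinite sequence is safe because nothing is computed until a terminal operation pulls elements, and `limit` bounds how many are pulled. The `Generator` class is only an illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Generator {
    // Illustrative analogue of t:generate: value, value * 2, value * 4, ... without end.
    // Stream.iterate computes each element only when it is requested.
    static Stream<Long> generate(long value) {
        return Stream.iterate(value, v -> v * 2);
    }

    public static void main(String[] args) {
        List<Long> first = generate(1).limit(5).collect(Collectors.toList());
        System.out.println(first); // prints [1, 2, 4, 8, 16]
    }
}
```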

It seems that optimization algorithms implemented in Saxon are so plentiful and complex that at times they fool one another. :-)

See also: Generator functions

Saturday, 01 May 2010 07:18:24 UTC
Thinking aloud | Tips and tricks | xslt
# Friday, 23 April 2010

There are some complications with the streamed tree that we have implemented in Saxon. They are due to the fact that only a view of the input data is available at any time. Whenever you access an element that is not available, you get an exception.

Consider an example. We have a log created with java logging. It looks like this:

<log>
  <record>
    <date>...</date>
    <millis>...</millis>
    <sequence>...</sequence>
    <logger>...</logger>
    <level>INFO</level>
    <class>...</class>
    <method>...</method>
    <thread>...</thread>
    <message>...</message>
  </record>
  <record>
    ...
  </record>
  ...

We would like to write an xslt that returns a page of log as html:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xs t">

  <xsl:param name="start-page" as="xs:integer" select="1"/>
  <xsl:param name="page-size" as="xs:integer" select="50"/>

  <xsl:output method="xhtml" byte-order-mark="yes" indent="yes"/>

  <!-- Entry point. -->
  <xsl:template match="/log">
    <xsl:variable name="start" as="xs:integer"
      select="($start-page - 1) * $page-size + 1"/>
    <xsl:variable name="records" as="element()*"
      select="subsequence(record, $start, $page-size)"/>

    <html>
      <head>
        <title>
          <xsl:text>A log file. Page: </xsl:text>
          <xsl:value-of select="$start-page"/>
        </title>
      </head>
      <body>
        <table border="1">
          <thead>
            <tr>
              <th>Level</th>
              <th>Message</th>
            </tr>
          </thead>
          <tbody>
            <xsl:apply-templates mode="t:record" select="$records"/>
          </tbody>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template mode="t:record" match="record">
    <!-- Make a copy of record to avoid streaming access problems. -->
    <xsl:variable name="log">
      <xsl:copy-of select="."/>
    </xsl:variable>

    <xsl:variable name="level" as="xs:string"
      select="$log/record/level"/>
    <xsl:variable name="message" as="xs:string"
      select="$log/record/message"/>

    <tr>
      <td>
        <xsl:value-of select="$level"/>
      </td>
      <td>
        <xsl:value-of select="$message"/>
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>

This code does not work. Guess why? Yes, it's subsequence(), which is too greedy. It always wants to know what the next node is, so it skips the content of the current node. Algorithmically, this part of Saxon could be rewritten, and it could possibly work better in non-streaming modes as well.

A viable workaround, which does not use subsequence, looks rather nontrivial:

<!-- Entry point. -->
<xsl:template match="/log">
  <xsl:variable name="start" as="xs:integer"
    select="($start-page - 1) * $page-size + 1"/>
  <xsl:variable name="end" as="xs:integer"
    select="$start + $page-size - 1"/>

  <html>
    <head>
      <title>
        <xsl:text>A log file. Page: </xsl:text>
        <xsl:value-of select="$start-page"/>
      </title>
    </head>
    <body>
      <table border="1">
        <thead>
          <tr>
            <th>Level</th>
            <th>Message</th>
          </tr>
        </thead>
        <tbody>
          <xsl:sequence select="
            t:generate-records(record, $start, $end, ())"/>
        </tbody>
      </table>
    </body>
  </html>
</xsl:template>

<xsl:function name="t:generate-records" as="element()*">
  <xsl:param name="records" as="element()*"/>
  <xsl:param name="start" as="xs:integer"/>
  <xsl:param name="end" as="xs:integer?"/>
  <xsl:param name="result" as="element()*"/>

  <xsl:variable name="record" as="element()?" select="$records[$start]"/>

  <xsl:choose>
    <xsl:when test="(exists($end) and ($start > $end)) or empty($record)">
      <xsl:sequence select="$result"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Make a copy of record to avoid streaming access problems. -->
      <xsl:variable name="log">
        <xsl:copy-of select="$record"/>
      </xsl:variable>

      <xsl:variable name="level" as="xs:string"
        select="$log/record/level"/>
      <xsl:variable name="message" as="xs:string"
        select="$log/record/message"/>

      <xsl:variable name="next-result" as="element()*">
        <tr>
          <td>
            <xsl:value-of select="$level"/>
          </td>
          <td>
            <xsl:value-of select="$message"/>
          </td>
        </tr>
      </xsl:variable>

      <xsl:sequence select="
        t:generate-records
        (
          $records,
          $start + 1,
          $end,
          ($result, $next-result)
        )"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

Here we observed the greediness of Saxon, which tried to consume more input earlier than required. In other cases we have seen that it may defer actual data access to the point where the data is no longer available.

So, without tuning Saxon's internal logic, it's possible but not easy to write stylesheets that exploit streaming features.
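The behaviour that streaming needs from paging can be sketched in Java with a pull iterator: take records until the page is full, then stop without asking for the record after the page. This hypothetical helper only illustrates the idea and is not Saxon's implementation. With a real streaming source even `hasNext()` may read ahead, which is why the loop checks whether the page is full before calling it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Paging {
    // Hypothetical helper: returns the items at 1-based positions
    // start .. start + pageSize - 1. Items are pulled one by one, and the loop
    // stops as soon as the page is full, so the item after the page is never requested.
    static <T> List<T> page(Iterator<T> items, int start, int pageSize) {
        List<T> result = new ArrayList<>();
        int position = 0;

        while (result.size() < pageSize && items.hasNext()) {
            T item = items.next();
            position++;
            if (position >= start) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Iterator<Integer> records = List.of(1, 2, 3, 4, 5, 6, 7).iterator();
        System.out.println(page(records, 3, 2)); // prints [3, 4]
        System.out.println(records.next());      // prints 5: nothing beyond the page was consumed
    }
}
```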

P.S. Updated sources are at streamedtree.zip

Friday, 23 April 2010 10:12:38 UTC
Thinking aloud | xslt
# Wednesday, 21 April 2010

When the time came to process big xml log files, we decided to implement a streamable tree in Saxon the very same way it was implemented in .NET eight years ago (see How would we approach to streaming facility in xslt).

Interestingly, the implementation is similar to that of the composable tree. There a node never stores a reference to its parent, while in the streamed tree no references to children are stored. This way only a limited subview of the tree is available at any time. The implementation does not support the preceding and preceding-sibling axes. Also, one cannot navigate to a node that is out of scope.

The implementation is external (there are no changes to Saxon itself). To use it, one needs to create an instance of DocumentInfo, which pulls data from an XMLStreamReader, and pass it as the input to a transformation:

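// transformer is a Saxon javax.xml.transform.Transformer;
// input and output are file names.
Controller controller = (Controller)transformer;
XMLInputFactory factory = XMLInputFactory.newInstance();
StreamSource inputSource = new StreamSource(new File(input));
XMLStreamReader reader = factory.createXMLStreamReader(inputSource);

// StaxBridge adapts the StAX reader to Saxon's pull API.
StaxBridge bridge = new StaxBridge();

bridge.setPipelineConfiguration(
  controller.makePipelineConfiguration());
bridge.setXMLStreamReader(reader);

// DocumentImpl (from streamedtree.zip) is the streamed DocumentInfo,
// which is itself a Source.
Source document = new DocumentImpl(bridge);

transformer.transform(document, new StreamResult(output));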

This helped us to format an xml log file of arbitrary size. An xslt like this can do the work:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xs">

  <xsl:template match="/log">
    <html>
      <head>
        <title>Log</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="message">
   ...
  </xsl:template>

  <xsl:template match="message[@error]">
    ...
  </xsl:template>

  ...

</xsl:stylesheet>

Implementation can be found at: streamedtree.zip

Wednesday, 21 April 2010 07:10:34 UTC
Announce | Thinking aloud | xslt
# Thursday, 15 April 2010

jxom else if (google search)

Google helps with many things, even with retrospective support.

Probably someone was trying to build nested if/then/else jxom elements.

We expected this and have defined a function t:generate-if-statement() in java-optimizer.xslt.

Its signature:

<!--
  Generates if/then/else if ... statements.
    $closure - a series of conditions and blocks.
    $index - current index.
    $result - collected result.
    Returns if/then/else if ... statements.
-->
<xsl:function name="t:generate-if-statement" as="element()">
  <xsl:param name="closure" as="element()*"/>
  <xsl:param name="index" as="xs:integer"/>
  <xsl:param name="result" as="element()?"/>

  ...
</xsl:function>

Usage is like this:

<!-- Generate a sequence of pairs: (condition, scope). -->
<xsl:variable name="branches" as="element()+">
  <xsl:for-each select="...">
    <!-- Generate condition. -->
    <scope>
        <!-- Generate statements. -->
    </scope>
  </xsl:for-each>
</xsl:variable>

<xsl:variable name="else" as="element()?">
  <!-- Generate final else, if any. -->
</xsl:variable>

<!-- This generates if statement. -->
<xsl:sequence
  select="t:generate-if-statement($branches, count($branches) - 1, $else)"/>

P.S. By the way, we like that someone is looking into jxom.

Thursday, 15 April 2010 06:59:01 UTC
Tips and tricks | xslt
# Friday, 09 April 2010

By a generator we mean a function that produces an infinite output sequence for a particular input.

This is a rather theoretical question, as xslt does not allow infinite sequences, but look at this example:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt"
  exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <xsl:variable name="value" as="xs:string" select="'10100101'"/>

    <xsl:variable name="values" as="xs:integer+" select="t:generate(1)"/>

    <!--<xsl:variable name="values" as="xs:integer+">
      <xsl:call-template name="t:generate">
        <xsl:with-param name="value" select="1"/>
      </xsl:call-template>
    </xsl:variable>-->

    <xsl:variable name="integer" as="xs:integer" select="
      sum
      (
        for $index in 1 to string-length($value) return
          $values[$index][substring($value, $index, 1) = '1']
      )"/>

    <xsl:message select="$integer"/>
  </xsl:template>

  <xsl:function name="t:generate" as="xs:integer*">
    <xsl:param name="value" as="xs:integer"/>

    <xsl:sequence select="$value"/>
    <xsl:sequence select="t:generate($value * 2)"/>
  </xsl:function>

  <!--<xsl:template name="t:generate" as="xs:integer*">
    <xsl:param name="value" as="xs:integer"/>

    <xsl:sequence select="$value"/>

    <xsl:call-template name="t:generate">
      <xsl:with-param name="value" select="$value * 2"/>
    </xsl:call-template>
  </xsl:template>-->

</xsl:stylesheet>

Here the logic uses such a generator and decides by itself where to break.

Should such code be valid?

From the algorithmic perspective, the example should rather work, as the generator logic and its use are two separate concerns.
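For comparison, here is a minimal Java sketch of the same idea (our own illustration, not xslt): Stream.iterate yields a lazily evaluated infinite sequence, and only the consumer decides, through limit(), where to stop. The class Generator and its methods are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class Generator {
  // An infinite, lazily evaluated sequence: value, value * 2, value * 4, ...
  static Stream<Long> generate(long value) {
    return Stream.iterate(value, v -> v * 2);
  }

  // Mirrors the stylesheet: sums the generated values at the positions where
  // the bit string holds '1'. Only this consumer decides where to stop.
  static long toInteger(String bits) {
    List<Long> values = generate(1).limit(bits.length()).collect(Collectors.toList());
    long sum = 0;
    for (int i = 0; i < bits.length(); i++) {
      if (bits.charAt(i) == '1') {
        sum += values.get(i);
      }
    }
    return sum;
  }
}
```

As in the stylesheet, Generator.toInteger("10100101") reads the bits least significant first and yields 1 + 4 + 32 + 128 = 165.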

Friday, 09 April 2010 14:38:34 UTC
Thinking aloud | xslt

Lately, after playing a little with Saxon tree models, we thought that the design would be cleaner and the implementation faster if NamePool were implemented differently.

At present, Saxon is very pessimistic about java objects, so it prefers to encode qualified names as integers. The encoding and decoding are done in the NamePool; other parts of the code use these integer values.

The operations performed on these integers are:

  • equality comparison of two such integers, to check whether two qualified or extended names are equal;
  • retrieval of the different parts of a qualified name from the NamePool.

We would design this differently. We would:

  1. create a QualifiedName class to store all name parts.
  2. declare NamePool to create and cache QualifiedName instances.

This way:

  • equality comparison would be a reference comparison of two instances;
  • each part of a qualified name would be available through a trivial getter;
  • contention on such a name pool would be lower.

That's the implementation we would propose: QualifiedName.java, NameCache.java
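The linked files are not reproduced here. Below is a minimal sketch of the idea under our own assumptions: a QualifiedName keyed by prefix, namespace uri and local name, and a NameCache backed by ConcurrentHashMap.computeIfAbsent. Field and method names are illustrative and may differ from QualifiedName.java and NameCache.java; the record requires Java 16+.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Immutable name, intended to be created only through NameCache, so equal
// names are the same object and can be compared with ==.
final class QualifiedName {
  private final String prefix;
  private final String namespaceUri;
  private final String localName;

  QualifiedName(String prefix, String namespaceUri, String localName) {
    this.prefix = Objects.requireNonNull(prefix);
    this.namespaceUri = Objects.requireNonNull(namespaceUri);
    this.localName = Objects.requireNonNull(localName);
  }

  String getPrefix() { return prefix; }
  String getNamespaceUri() { return namespaceUri; }
  String getLocalName() { return localName; }
}

// Creates and caches QualifiedName instances; ConcurrentHashMap avoids
// a single global lock.
final class NameCache {
  private record Key(String prefix, String namespaceUri, String localName) {}

  private final ConcurrentHashMap<Key, QualifiedName> names = new ConcurrentHashMap<>();

  QualifiedName get(String prefix, String namespaceUri, String localName) {
    return names.computeIfAbsent(
      new Key(prefix, namespaceUri, localName),
      key -> new QualifiedName(key.prefix(), key.namespaceUri(), key.localName()));
  }
}
```

Two calls to get() with the same prefix, uri and local name return the same instance, so a == b replaces the integer comparison. This sketch interns by prefix as well; comparing expanded names (uri and local name only) would need a second cache or an explicit equals check.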

Friday, 09 April 2010 13:05:30 UTC
Thinking aloud | xslt
# Thursday, 08 April 2010

Earlier, in the entry "Inline functions in xslt 2.1", we described an implementation of an xml tree model that may share subtrees among different trees.

This way, in code like this:

<xsl:variable name="elements" as="element()*" select="..."/>

<xsl:variable name="result" as="element()">
  <result>
    <xsl:sequence select="$elements"/>
  </result>
</xsl:variable>

the implementation shares the internal representation between $elements and the subtree of $result. From the xslt perspective they look like completely different subtrees with different node identities, which is in accordance with its view of the world.

After a short study we decided to create a research implementation of this tree model in Saxon. It took only a couple of days to introduce minimal changes to the engine, to refactor the linked tree into a new composable tree, and to perform some tests.

In many cases Saxon benefited immediately from this new tree model; in some other cases more tuning is required.

Our tests showed that the new tree performed better than the linked tree, but a little worse than the tiny tree. On the other hand, conventional code patterns obviously avoid subtree copying, assuming it's an expensive operation, so one should rethink some coding practices to benefit from the composable tree.

The implementation can be downloaded at: saxon.composabletree.zip

Thursday, 08 April 2010 06:26:02 UTC
Announce | Thinking aloud | xslt
# Sunday, 04 April 2010

From the web we know that the xslt WG is now thinking about how to make xslt friendlier to huge documents. They will probably introduce some xslt syntax to let implementations prepare for such processing.

They will probably introduce an indicator marking a specific mode for streaming. XPath in this mode will probably be restricted to some subset of it.

The funny part is that we implemented a similar feature back in 2002 in .net. It was called XPathForwardOnlyNavigator.

The implementation stored only a few nodes at a time (the context node and its ancestors), and read data from an XmlReader as needed. Thus one could navigate to ancestor elements, to children, and to following siblings, but never to any previous node. When one tried to reach a node that was no longer available, we threw an exception.

It was simple and not perfect (too restrictive), but it was pluggable into .net's xslt, and allowed processing files of any size.
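The old .net code is not shown here. As a rough analogue, the following Java sketch (our own, built on StAX rather than on the XPathNavigator API) keeps only the current element and its ancestors in memory, moves strictly forward in document order, and throws when asked to go back. The class ForwardOnlyCursor and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Forward-only cursor over a StAX reader. It remembers only the current
// element and its ancestors; everything already read is forgotten.
final class ForwardOnlyCursor {
  private final XMLStreamReader reader;
  private final Deque<String> path = new ArrayDeque<>(); // head = current element

  ForwardOnlyCursor(XMLStreamReader reader) {
    this.reader = reader;
  }

  static ForwardOnlyCursor of(String xml) throws XMLStreamException {
    return new ForwardOnlyCursor(
      XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)));
  }

  // Moves to the next element in document order; false at the end of input.
  boolean next() throws XMLStreamException {
    while (reader.hasNext()) {
      int event = reader.next();
      if (event == XMLStreamConstants.START_ELEMENT) {
        path.push(reader.getLocalName());
        return true;
      }
      if (event == XMLStreamConstants.END_ELEMENT) {
        path.pop();
      }
    }
    return false;
  }

  // Names from the root to the current element: the only nodes kept in memory.
  List<String> ancestorsOrSelf() {
    List<String> names = new ArrayList<>(path);
    Collections.reverse(names);
    return names;
  }

  // Earlier nodes are gone, so going back can only fail.
  void moveToPrevious() {
    throw new UnsupportedOperationException("node is no longer available");
  }
}
```

Memory use is bounded by the document depth, not by its size, which is what makes files of any size processable.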

That old implementation looks very attractive even now, in 2010. We expect that the WG's decisions will not rule out such or similar solutions, and will not force implementers to write an alternative engine for xslt streaming.

Sunday, 04 April 2010 20:53:27 UTC
Thinking aloud | xslt
# Friday, 02 April 2010

Xslt 1.0 was designed with the best intentions. Xslt 2.0 got legacy baggage.

If you're not fully concentrated while translating your algorithms into xslt 2.0, you can fall into a trap, as we did.

Consider this code snippet:

<xsl:variable name="elements" as="element()+">
  <xsl:apply-templates/>
</xsl:variable>

<xsl:variable name="converted-elements" as="element()+"
  select="$elements/t:convert(.)"/>

Looks simple, doesn't it?

Our intention was to get converted elements, which result from some xsl:apply-templates logic.

Well, this code works... but rather sporadically, as the results are often in the wrong order! This bug is very close to what is called a Heisenbug.

So, where is the problem?

Elementary, my dear Watson:

  1. xsl:apply-templates constructs a sequence of rootless elements.
  2. $elements/t:convert(.) converts the elements and sorts the resulting elements in document order.

Here is a tricky part:

The relative order of nodes in distinct trees is stable but implementation-dependent...

Clearly each rootless element belongs to a unique tree.

Once we realized what the problem was, the code was immediately rewritten as:

<xsl:variable name="elements" as="element()+">
  <xsl:apply-templates/>
</xsl:variable>

<xsl:variable name="converted-elements" as="element()+" select="
  for $element in $elements return
    t:convert($element)"/>

P.S. Given the size of our xslt code base, it took half an hour to localize the problem. Now we're about to review all uses of slashes in our xslt. As you like it?

Friday, 02 April 2010 17:53:18 UTC
Thinking aloud | xslt
# Saturday, 27 March 2010

Opinions on xml namespaces

olegtk: @srivatsn Originally the idea was that namespace URI would point to some schema definition. Long abandoned idea.

Not so long ago, I saw good reasoning on the same subject:

Saturday, 27 March 2010 09:49:45 UTC
xslt
# Monday, 22 March 2010

Inline functions in xslt 2.1 often look like a strange aberration. Sure, there are very useful cases when they are delegates of program logic (e.g. comparators and filters), but often (probably more often) we see them used to model data structures.

As an example, suppose you want to model a structure with three properties, say a, b, and c. You implement this by creating functions that wrap and unwrap the data:

declare function local:make-data($a as item(), $b as item(), $c as item())
  as function() as item()+
{
  function() { $a, $b, $c }
};

declare function local:a($data as function() as item()+) as item()
{
  $data()[1]
};

declare function local:b($data as function() as item()+) as item()
{
  $data()[2]
};

declare function local:c($data as function() as item()+) as item()
{
  $data()[3]
};

Clever?

Sure it is! Here we have modeled a structure with the help of a sequence, which we have wrapped into a function item.

Alas, clever is not always good (often it's a bad sign). We just wanted to define a simple structure. What does it have to do with a function?

There is a distance between what we want to express when designing an algorithm, and what we see when looking at the code. The greater the distance, the more effort is required to write and to read the code.

It would be so good to have a simpler way to express such a concept as a structure. Let's dream a little. Suppose you already have a structure, and just want to access its members. An idea that comes to mind is an xpath-like access method:

$data/a, $data/b, $data/c

But wait a second, doesn't $data look very much like an xml element, with its accessors being just node tests? That's correct, so a data constructor may coincide with an element constructor.

Then what are the pros and cons of using xml elements to model structures?

Pros are: the existing xml type system, and sensible-looking code (you immediately see that a structure is being constructed here).

Cons are: xml trees are implemented in a way that does not assume fast (from the performance perspective) composition: when you construct a structure, a copy of the data is made.

But here we observe that "implemented" is a very important word in this context. If an xml tree implementation did not store a reference to the parent node, then subtrees could be composed very efficiently (note that the tree is assumed to be immutable). The parent node could be made available through a tree navigator, which would contain a reference to the node itself and to the parent tree navigator (alternatively, a child-to-parent map could be stored somewhere near the root).
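A minimal Java sketch of that idea, under our own naming (TreeNode and TreeNavigator are hypothetical, not Saxon's NodeInfo): immutable nodes keep only their children, and the parent is known only to the navigator through which a node was reached.

```java
import java.util.List;

// Immutable node without a parent pointer, so one instance can be a child
// in many trees at once: composition does not copy the subtree.
final class TreeNode {
  final String name;
  final List<TreeNode> children;

  TreeNode(String name, TreeNode... children) {
    this.name = name;
    this.children = List.of(children);
  }
}

// A position in a tree: the node plus the navigator it was reached from.
// The parent is known only through the navigation path.
final class TreeNavigator {
  final TreeNode node;
  private final TreeNavigator parent; // null at the root

  private TreeNavigator(TreeNode node, TreeNavigator parent) {
    this.node = node;
    this.parent = parent;
  }

  static TreeNavigator root(TreeNode node) {
    return new TreeNavigator(node, null);
  }

  TreeNavigator child(int index) {
    return new TreeNavigator(node.children.get(index), this);
  }

  TreeNavigator getParent() {
    return parent;
  }
}
```

Here the same TreeNode instance can be a child of two different roots without any copying, while the two navigators report different parents, much like distinct node identities in xslt.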

Such a tree structure would probably help not only in this particular case, but also with other conventional xslt code patterns.

P.S. Saxon probably could implement its NodeInfo this way.

Update: see also Custom tree model.

Monday, 22 March 2010 11:02:07 UTC
Thinking aloud | xslt
# Monday, 15 March 2010

A while ago we proposed introducing maps as built-in types in the xpath/xquery type system: Tuples and maps.

The suggestion was declined (probably our arguments were not convincing). We, however, still think that maps, along with sequences, are primitives required to build sensible (not obscure) algorithms. To show that a map is at times the best way to solve a problem, we refer to a utility function that allocates names in a scope. Its signature looks like this:

<!--
Allocates unique names in the form $prefix{number}?.
Note that prefixes may coincide.
$prefixes - name prefixes.
$names - pool of already allocated names.
$name-max-length - the longest allowable name length.
Returns unique names.
-->
<xsl:function name="t:allocate-names" as="xs:string*">
  <xsl:param name="prefixes" as="xs:string*"/>
  <xsl:param name="names" as="xs:string*"/>
  <xsl:param name="name-max-length" as="xs:integer?"/>

  ...
</xsl:function>

Just try to program it, and you will find yourself coding something like the one defined in cobolxom.

To be efficient, such maps should provide, at worst, logarithmic complexity of operations:

  • access to the map through a key (and/or by index) - complexity is O(LnN);
  • creation of a new map with an added or removed item - complexity is O(LnN);
  • construction of the map from ordered items - complexity is O(N);
  • construction of the map from unordered items - complexity is O(N*LnN);
  • total enumeration of all items - complexity is O(N*LnN).

These performance metrics are met by many functional and procedural implementations of maps. Typical RB and AVL tree based maps satisfy these requirements.

We suggest introducing a map implementation into exslt2 (assuming inline functions are present). As a sketch, we have implemented a pure AVL tree in xslt 2.0:

We do not suggest that an implementation like this should be used; rather, we have in mind some extension function(s) that provide better performance.

What do you think?

Monday, 15 March 2010 13:59:19 UTC
Thinking aloud | xslt
# Sunday, 28 February 2010

The story about the immutable tree would not be complete without an xslt implementation. To make one possible, we need something to approximate tree nodes. Such a construct cannot be implemented well in pure xslt 2.0: it would be either inefficient or not pure.

To isolate the problem, we have used a tuple interface:

  • tuple:ref($items as item()*) as item() - to wrap items into a tuple;
  • tuple:deref($tuple as item()?) as item()* - to unwrap items from a tuple;
  • tuple:is-same($first as item(), $second as item()) as xs:boolean - to test whether two tuples are the same.

and defined an inefficient implementation of it based on xml elements. The rest of the code is a regular AVL algorithm implementation.
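For illustration only, here is what a host-language counterpart of that tuple interface might look like in Java. This is our own sketch, not the xml-element implementation from the post; the class Tuple and its methods are hypothetical names mapped one-to-one onto tuple:ref, tuple:deref and tuple:is-same. It requires Java 10+ for List.copyOf.

```java
import java.util.List;

// Hypothetical host-language tuple: one opaque item wrapping a sequence.
final class Tuple {
  private final List<Object> items;

  private Tuple(List<Object> items) {
    this.items = items;
  }

  // tuple:ref: wraps items into a single item (an immutable copy).
  static Tuple ref(List<?> items) {
    return new Tuple(List.<Object>copyOf(items));
  }

  // tuple:deref: unwraps the items; an absent (null) tuple gives an empty sequence.
  static List<Object> deref(Tuple tuple) {
    return tuple == null ? List.of() : tuple.items;
  }

  // tuple:is-same: identity comparison, not comparison of contents.
  static boolean isSame(Tuple first, Tuple second) {
    return first == second;
  }
}
```

Note that isSame() compares identity, so two tuples wrapping equal items are still different tuples, which mirrors the tuple:is-same contract above.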

We want to stress again that, even assuming there is a good tuple implementation, we would prefer a built-in associative container. Why the heck should you need to include about 1000 lines of code just to use a map?

Source code is:

Sunday, 28 February 2010 19:28:07 UTC
Thinking aloud | xslt

We like Visual Studio very much, and try to adopt new versions early.

Lately our use of VS has centered around xml and xslt. In our opinion, VS 2008 is the best xslt 2 editor we have ever seen, even though it lacks support for xslt 2.0 debugging.

Unfortunately, that is still true now that VS 2010 is almost out: VS 2008 is simply 2 - 3 times faster. You can observe this when working with xslt files like those in languages-xom.zip (1000 - 2000 lines); things just become slow.

We still hope that VS 2010 will make a final effort to outdo what VS 2008 has already delivered.

Sunday, 28 February 2010 18:37:58 UTC
Thinking aloud | xslt
# Thursday, 25 February 2010

While bemoaning the lack of associative containers in the xpath type system, we came up with a good implementation of t:allocate-names(). The implementation can be seen in cobol-names.xslt.

It is based on recursion and on xsl:for-each-group. Its algorithmic worst-case complexity is O(N*LogN*LogL), where N is the number of names, and L is the length of the longest name.

This does not invalidate the idea that associative containers are very desirable: blessed is the one who naturally writes such an implementation. For us, it went the hard way: it took three days to realize that the original algorithm was problematic, and to work out a better one.

In practice this means 2 seconds for the new implementation against 25 minutes for the old one.

Thursday, 25 February 2010 07:19:06 UTC
xslt
# Wednesday, 24 February 2010

Why do we return to this theme again?

Well, it's itching!

In cobolxom there is a utility function that allocates names in a scope. Its signature looks like this:

<!--
  Allocates unique names in the form $prefix{number}?.
  Note that prefixes may coincide.
    $prefixes - name prefixes.
    $names - pool of already allocated names.
    $name-max-length - the longest allowable name length.
    Returns unique names.
-->
<xsl:function name="t:allocate-names" as="xs:string*">
  <xsl:param name="prefixes" as="xs:string*"/>
  <xsl:param name="names" as="xs:string*"/>
  <xsl:param name="name-max-length" as="xs:integer?"/>

  ...
</xsl:function>

We have created several different implementations (all use recursion). Every implementation works fairly well for relatively small input sequences, say N < 100, but we have cases with about 10000 input items. The algorithm's worst-case complexity, in the absence of associative containers, is O(N*N), and be sure that, due to the xslt engine implementation, it's an O-o-o-oh...

If there were associative containers with efficient access (complexity O(LogN)) and efficient construction of an updated container (complexity also O(LogN)), then the implementation would be straightforward, with complexity O(N*LogN).
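To make the contrast concrete, here is a minimal Java sketch of the straightforward approach, with a hash set and a hash map standing in for the associative containers the post wishes for. The exact semantics of t:allocate-names are not spelled out here, so the rules below (keep the prefix when it is free, otherwise append the smallest free number, cut the prefix to honor the length limit) are our assumption, and NameAllocator is a hypothetical name; this is not the cobolxom algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NameAllocator {
  // Assumed semantics: a prefix is used as is when free, otherwise the smallest
  // free number is appended; prefixes are cut to fit maxLength (null = no limit).
  static List<String> allocateNames(List<String> prefixes, Set<String> names, Integer maxLength) {
    Set<String> used = new HashSet<>(names);          // every name taken so far
    Map<String, Integer> nextNumber = new HashMap<>(); // per prefix: first number to try
    List<String> result = new ArrayList<>();

    for (String prefix : prefixes) {
      String name = fit(prefix, "", maxLength);
      if (!used.add(name)) {
        int number = nextNumber.getOrDefault(prefix, 1);
        do {
          name = fit(prefix, Integer.toString(number++), maxLength);
        } while (!used.add(name));
        nextNumber.put(prefix, number);
      }
      result.add(name);
    }
    return result;
  }

  // Cuts the prefix so that prefix + suffix does not exceed maxLength.
  private static String fit(String prefix, String suffix, Integer maxLength) {
    if (maxLength != null && prefix.length() + suffix.length() > maxLength) {
      prefix = prefix.substring(0, Math.max(0, maxLength - suffix.length()));
    }
    return prefix + suffix;
  }
}
```

For example, prefixes ("name", "name", "x") with the pool {"x"} give ("name", "name1", "x1"). With hashing each lookup and insert is expected O(1); a tree-based map would give the logarithmic bounds discussed above.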

Wednesday, 24 February 2010 07:34:07 UTC
Thinking aloud | xslt
# Wednesday, 17 February 2010

The very same simple tasks tend to appear in different languages (e.g. C# Haiku). Now we have to find:

  • integer and fractional part of a decimal;
  • length and precision of a decimal.

These tasks have no trivial solutions in xslt 2.0.

So far we have come up with the following answers:

Fractional part:

<xsl:function name="t:fraction" as="xs:decimal">
  <xsl:param name="value" as="xs:decimal"/>

  <xsl:sequence select="$value mod 1"/>
</xsl:function>

Integer part v1:

<xsl:function name="t:integer" as="xs:decimal">
  <xsl:param name="value" as="xs:decimal"/>

  <xsl:sequence select="$value - t:fraction($value)"/>
</xsl:function>

Integer part v2:

<xsl:function name="t:integer" as="xs:decimal">
  <xsl:param name="value" as="xs:decimal"/>

  <xsl:sequence select="
    if ($value ge 0) then
      floor($value)
    else
      -floor(-$value)"/>
</xsl:function>

Length and precision:

<!--
  Gets a decimal specification as a closure:
    ($length as xs:integer, $precision as xs:integer).
-->
<xsl:function name="t:decimal-spec" as="xs:integer+">
  <xsl:param name="value" as="xs:decimal"/>

  <xsl:variable name="text" as="xs:string" select="
    if ($value lt 0) then
      xs:string(-$value)
    else
      xs:string($value)"/>

  <xsl:variable name="length" as="xs:integer"
    select="string-length($text)"/>
  <xsl:variable name="integer-length" as="xs:integer"
    select="string-length(substring-before($text, '.'))"/>
 
  <xsl:sequence select="
    if ($integer-length) then
      ($length - 1, $length - $integer-length - 1)
    else
      ($length, 0)"/>
</xsl:function>

The last function looks odious. In many other languages its implementation would be considered embarrassing.
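For comparison, here are the same three operations on java.math.BigDecimal. This is a sketch, assuming that length and precision should be counted exactly as t:decimal-spec counts them: from the plain text of the absolute value without trailing fractional zeros, which matches how xs:string renders a decimal. Note that BigDecimal.precision() alone would count 0.5 as one digit, hence the text-based count. Decimals and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class Decimals {
  // Like $value mod 1: the remainder keeps the sign of the dividend.
  static BigDecimal fraction(BigDecimal value) {
    return value.remainder(BigDecimal.ONE);
  }

  // Truncation toward zero, like the floor-based t:integer v2.
  static BigDecimal integer(BigDecimal value) {
    return value.setScale(0, RoundingMode.DOWN);
  }

  // {length, precision} counted the way t:decimal-spec counts them: from the
  // plain text of |value| with trailing fractional zeros removed.
  static int[] decimalSpec(BigDecimal value) {
    BigDecimal abs = value.abs().stripTrailingZeros();
    if (abs.scale() < 0) {
      abs = abs.setScale(0); // e.g. 1E+2 back to 100
    }
    String text = abs.toPlainString();
    int point = text.indexOf('.');
    return point < 0
      ? new int[] { text.length(), 0 }
      : new int[] { text.length() - 1, text.length() - point - 1 };
  }
}
```

For example, decimalSpec(12.340) gives {4, 2} and decimalSpec(-0.5) gives {2, 1}, the same as t:decimal-spec.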

Wednesday, 17 February 2010 07:29:55 UTC
Tips and tricks | xslt
# Wednesday, 27 January 2010

Continuing the post "Ongoing xslt/xquery spec update", we would like to articulate what options regarding associative containers we have in functional languages (e.g. xslt, xquery), assuming that variables are immutable and that the implementation is efficient (in some sense).

There are three common implementation techniques:

  • store data (keys, value pairs) in sorted array, and use binary search to access values by a key;
  • store data in a hash map;
  • store data in a binary tree (usually RB or AVL trees).

The choice of implementation depends considerably on the operations performed on the container. Usually these are:

  1. construction;
  2. value lookup by key;
  3. key enumeration (ordered or not);
  4. container modification (add and remove data into the container);
  5. access elements by index;

Note that in functional programming a modification means the creation of a new container, so there is a division:

  1. If the container's use pattern does not include modification, then probably the simplest solution is to build it as an ordered sequence of pairs and to use binary search to access the data. Alternatively, one could implement the associative container as a hash map.
  2. If modification is essential, then neither an ordered sequence of pairs, nor a hash map, nor a classical tree implementation can be used, as they are either too slow or too memory-hungry, during modification or during access.
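The first option can be sketched in Java as follows: the pairs are sorted once at construction and looked up with binary search, and there is deliberately no way to modify the map. SortedPairs is a hypothetical name, and building it from a java.util.Map (so that keys are unique) is our simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Read-only map: pairs sorted once by key, lookups by binary search in O(LnN).
final class SortedPairs<K extends Comparable<K>, V> {
  private final List<K> keys = new ArrayList<>();
  private final List<V> values = new ArrayList<>();

  // Sorting costs O(N*LnN), or about O(N) when the input is already ordered.
  SortedPairs(Map<K, V> source) {
    List<Map.Entry<K, V>> entries = new ArrayList<>(source.entrySet());
    entries.sort(Map.Entry.comparingByKey());
    for (Map.Entry<K, V> entry : entries) {
      keys.add(entry.getKey());
      values.add(entry.getValue());
    }
  }

  // Returns the value for key, or null when the key is absent.
  V get(K key) {
    int index = Collections.binarySearch(keys, key);
    return index >= 0 ? values.get(index) : null;
  }

  int size() {
    return keys.size();
  }
}
```

Any "modification" would have to build a whole new SortedPairs, which is exactly why this option suits only read-mostly containers.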

On the other hand, to deal with container modifications one can build an implementation that uses "top-down" RB or AVL trees. To see the difference, consider a classical tree structure and its functional variant:

                 Classical                        Functional
Node structure:  node                             node
                   parent                           (no parent)
                   left                             left
                   right                            right
                   other data                       other data
Node reference:  node itself                      node path from the root of a tree
Modification:    either mutable, or requires      O(LnN) nodes are created
                 a completely new tree
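To illustrate the functional column, here is a minimal Java sketch of a persistent AVL map (our own illustration, not the C# implementation referred to below): nodes have no parent pointer, put() copies only the nodes on the path from the root and leaves the old version intact, and subtree sizes give index access in O(LnN). Removal is omitted for brevity; AvlMap and its methods are hypothetical names.

```java
// Immutable AVL map: put() returns a new map sharing all untouched subtrees.
final class AvlMap<K extends Comparable<K>, V> {
  private static final class Node<K, V> {
    final K key; final V value; final Node<K, V> left, right; final int height, size;

    Node(K key, V value, Node<K, V> left, Node<K, V> right) {
      this.key = key; this.value = value; this.left = left; this.right = right;
      this.height = 1 + Math.max(height(left), height(right));
      this.size = 1 + size(left) + size(right);
    }
  }

  private final Node<K, V> root;

  private AvlMap(Node<K, V> root) { this.root = root; }

  static <K extends Comparable<K>, V> AvlMap<K, V> empty() { return new AvlMap<>(null); }

  private static int height(Node<?, ?> n) { return n == null ? 0 : n.height; }
  private static int size(Node<?, ?> n) { return n == null ? 0 : n.size; }

  int size() { return size(root); }

  V get(K key) {
    Node<K, V> n = root;
    while (n != null) {
      int c = key.compareTo(n.key);
      if (c == 0) return n.value;
      n = c < 0 ? n.left : n.right;
    }
    return null;
  }

  // Only the O(LnN) nodes on the search path are re-created.
  AvlMap<K, V> put(K key, V value) { return new AvlMap<>(put(root, key, value)); }

  private static <K extends Comparable<K>, V> Node<K, V> put(Node<K, V> n, K key, V value) {
    if (n == null) return new Node<>(key, value, null, null);
    int c = key.compareTo(n.key);
    if (c == 0) return new Node<>(key, value, n.left, n.right);
    return c < 0
      ? balance(n.key, n.value, put(n.left, key, value), n.right)
      : balance(n.key, n.value, n.left, put(n.right, key, value));
  }

  // Builds a node from (key, value, l, r), rotating when heights differ by 2.
  private static <K, V> Node<K, V> balance(K key, V value, Node<K, V> l, Node<K, V> r) {
    if (height(l) > height(r) + 1) {
      if (height(l.left) >= height(l.right)) { // single right rotation
        return new Node<>(l.key, l.value, l.left, new Node<>(key, value, l.right, r));
      }
      Node<K, V> lr = l.right; // left-right double rotation
      return new Node<>(lr.key, lr.value,
        new Node<>(l.key, l.value, l.left, lr.left), new Node<>(key, value, lr.right, r));
    }
    if (height(r) > height(l) + 1) {
      if (height(r.right) >= height(r.left)) { // single left rotation
        return new Node<>(r.key, r.value, new Node<>(key, value, l, r.left), r.right);
      }
      Node<K, V> rl = r.left; // right-left double rotation
      return new Node<>(rl.key, rl.value,
        new Node<>(key, value, l, rl.left), new Node<>(r.key, r.value, rl.right, r.right));
    }
    return new Node<>(key, value, l, r);
  }

  // Element access by index in key order, using subtree sizes.
  K keyAt(int index) {
    Node<K, V> n = root;
    while (n != null) {
      int leftSize = size(n.left);
      if (index < leftSize) n = n.left;
      else if (index == leftSize) return n.key;
      else { index -= leftSize + 1; n = n.right; }
    }
    throw new IndexOutOfBoundsException();
  }
}
```

After m2 = m.put(k, v), both m and m2 remain valid maps that share all untouched subtrees.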

Here we observe that:

  1. one can implement an efficient map (lookup time no worse than O(LnN)) without modification support, using an ordered array;
  2. one can implement an efficient map with modification support, using an immutable binary tree;
  3. one can implement all these algorithms purely in xslt and xquery (provided that inline functions are supported);
  4. any such implementation will lose to the same implementation written in C++, C#, or java;
  5. the best implementation would probably start with a sorted array and switch to a binary tree above some size threshold.

Here we provide a C# implementation of a functional AVL tree, which also supports element indexing:

Our intention was to show that the usual algorithms for associative containers apply in functional programming; thus a feature-complete functional language must support associative containers, to make development more conscious and to free developers from reinventing basic things that have existed for almost half a century.

Wednesday, 27 January 2010 07:00:55 UTC
Thinking aloud | Tips and tricks | xslt
# Tuesday, 19 January 2010

Several years ago we started a new project. We neither love nor hate any particular language, so the choice of language was pragmatic: xslt 2.0 fitted perfectly.

At present it's a solid volume of xslt code. It exhibits all the virtues of a good project in any other language: clean design, modularity, documentation, and lack of sophistication (good code should not be clever).

At runtime the project deals with xml documents ranging from a few dozen bytes to several megabytes in size, and with xml schemas ranging from simple ones, like tabular data, to rich ones, like xhtml, as well as with untyped data. A pipeline of stylesheets processes gigabytes of data stored in the database and in files.

All the bragging above is needed here to introduce the context for the following couple of lines.

The diversity of load conditions and the big code base put our xslt engine of choice to a good test. The victim is Saxon. In the course of the project we have found and reported many bugs. Some of them are nasty and important, others are bearable. To the credit of Michael Kay (Saxon's owner), all bugs are fixed promptly (see the last one).

Such a project helps us to understand the weak sides of xslt (it sometimes seems that the WG lacks this kind of experience, which should guide them).

Well, it so happened that we're helping the Saxon project. Unintentionally, however! :-)

P.S. About language preferences.

Nowadays we're polishing COBOL generation. To this end we have familiarized ourselves with the language. It's a beautiful language. Its straightforwardness helps to see the evolution of computer languages, and to understand what today's languages timidly try to hide, and why.

Tuesday, 19 January 2010 19:56:04 UTC
Thinking aloud | xslt
# Friday, 15 January 2010

We have updated languages-xom.zip. There are many fixes in cobolxom (well, cobolxom is new, and there will probably be some more bugs). Also, we have included the Xml Object Model for SQL, which in fact appeared along with jxom.

SQL xom supports basic sql syntax, including common table expressions, and two dialects: DB2 and Oracle.

Friday, 15 January 2010 16:20:41 UTC
xslt
# Wednesday, 06 January 2010

Recently W3C published new drafts of xquery 1.1 and xpath 2.1. We have noticed that the committee has decided to introduce inline functions in both xquery and xpath.

That's really good news! This way xquery, xpath and xslt approach Object Oriented Programming the way javascript does with its inline functions.

Now we shall be able to implement tuples (a sequence of items wrapped into a single item), objects with named properties, trees (e.g. RB trees), and associative containers (tree maps, hash maps, sets).

Surely, all this will be in the spirit of functional programming.

The only thing we regret is that the WG did not include built-in implementations of trees and associative containers, as we don't believe that one can create an efficient implementation of these abstractions in either xquery or xslt (asymptotically the results will be good, but the constant factors will be painful).

See also: Tuple and maps

Wednesday, 06 January 2010 13:13:16 UTC
xslt
# Monday, 04 January 2010

We're not sure how things work for others, but for us it turns out that Saxon 9.2 introduces new bugs, works slower and eats much more memory than its predecessor v9.1.

See Memory problem with V9.2.

We hope all this will be fixed soon.

Update: By the way, Saxon 9.2 (as of 2010-01-04) does not like (in fact, despises) small documents, and especially text nodes in those documents. It loves huge in-memory documents, however. :-)

Update 2010-01-05: the case is closed, the fix is committed into svn.

Monday, 04 January 2010 13:39:47 UTC
xslt
# Friday, 01 January 2010

Today, I tried to upgrade our projects to Saxon 9.2. We have a rather big set of stylesheets grinding gigabytes of information. Naturally, we expected at least the same performance from the new version.

But to my puzzlement, a pipeline of transformations failed almost immediately with an error message:

XPTY0018: Cannot mix nodes and atomic values in the result of a path expression

We do agree with this statement in general, but what did it have to do with our stylesheets? And how did everything work in 9.1?

To find the root of the problem I've created a minimal problem reproduction:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="this"
  exclude-result-prefixes="xs t">

  <!-- Entry point. -->
  <xsl:template match="/">
    <xsl:variable name="p" as="element()">
      <p l="1"/>
    </xsl:variable>

    <o l="{$p/t:d(.)}"/>
  </xsl:template>

  <xsl:function name="t:d" as="item()*">
    <xsl:param name="p" as="element()"/>

    <xsl:apply-templates mode="p" select="$p"/>
  </xsl:function>

  <xsl:template match="*" mode="p">
    <xsl:sequence select="concat('0', @l)"/>
  </xsl:template>

</xsl:stylesheet>

Really simple, isn't it? The problem is in a new optimization of the concat() function, introduced in version 9.2. It tries to eliminate string concatenation, and in certain cases emits its arguments directly into the output as text nodes, separating the whole output with some stopper strings. The only problem is that no such optimization is allowed in this particular case (which is rather common, and surely legal, in our stylesheets): the result of <xsl:template match="*" mode="p"> must not be a node, but a value of type xs:string.

Saxon 9.2 has been out for 3 months, at least! So how come such a bug was not discovered earlier?

Update: the fix was committed into svn the next day. That's prompt!

Friday, 01 January 2010 22:17:47 UTC
xslt
# Sunday, 27 December 2009

We've added a new language to the set of Xml Object Model schemas and stylesheets.

The newcomer is COBOL! No jokes. It's not a whim, really. Believe it or not but COBOL is still alive and we need to generate it (mainly different sorts of proxies).

We've used the VS COBOL II grammar Version 1.0.3 as a reference. The implemented grammar is complete, except for preprocessor statements. On the other hand, it defines the COPY and EXEC SQL constructs.

Definitely, it'll take time for the xml schema and the xslt implementation to become mature.

The language XOMs are now:

  • jxom - for java;
  • csharpxom - for C#;
  • cobolxom - for COBOL.

Sources can be found at languages-xom.

Sunday, 27 December 2009 17:00:07 UTC
Announce | xslt
# Monday, 21 December 2009

Given:

  • an xml defining elements and groups;
  • each element belongs to a group or groups;
  • group may belong to another group.

Find:

  • the groups that a given element directly or indirectly belongs to;
  • a function checking whether an element belongs to a group.

Example:

<groups>
  <group name="g1">
    <element ref="e1"/>
    <element ref="e2"/>
    <element ref="e3"/>
    <group ref="g2"/>
  </group>
  <group name="g2">
    <element ref="e5"/>
  </group>
  <group name="g3">
    <element ref="e1"/>
    <element ref="e4"/>
  </group>
</groups>

There are several solutions, depending on how aggressive the optimization is. A moderate one is done through xsl:key. All this resembles recursive common table expressions in SQL.
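One possible answer, sketched in Java rather than xslt (so it is not the xsl:key solution mentioned above): index which groups directly contain each element and which groups reference each group, then walk upward with a worklist, much like a recursive CTE. GroupMembership and its methods are hypothetical names; the set of visited groups also protects against cyclic group references.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class GroupMembership {
  // element -> groups that list it directly (<element ref="..."/>)
  private final Map<String, Set<String>> directGroups = new HashMap<>();
  // group -> groups that include it (<group ref="..."/>)
  private final Map<String, Set<String>> parentGroups = new HashMap<>();

  void addElement(String group, String element) {
    directGroups.computeIfAbsent(element, key -> new HashSet<>()).add(group);
  }

  void addGroupRef(String group, String nestedGroup) {
    parentGroups.computeIfAbsent(nestedGroup, key -> new HashSet<>()).add(group);
  }

  // All groups the element belongs to, directly or through nested groups.
  Set<String> groupsOf(String element) {
    Set<String> result = new TreeSet<>();
    Deque<String> pending = new ArrayDeque<>(directGroups.getOrDefault(element, Set.of()));
    while (!pending.isEmpty()) {
      String group = pending.pop();
      if (result.add(group)) { // each group is expanded only once
        pending.addAll(parentGroups.getOrDefault(group, Set.of()));
      }
    }
    return result;
  }

  boolean belongsTo(String element, String group) {
    return groupsOf(element).contains(group);
  }
}
```

For the sample document this gives {g1, g2} for e5 (through the nested g2) and {g1, g3} for e1.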

Anyone?

Monday, 21 December 2009 17:19:32 UTC
xslt
# Friday, 11 December 2009

A client asked us to produce Excel reports in an ASP.NET application. They gave us Excel templates, and also defined what they want to show.

What are our options?

  • Work with Office COM API;
  • Use Office Open XML SDK (which is a set of pure .NET API);
  • Try to apply xslt somehow;
  • Macro, other?

For us, biased to xslt, it's hard to make a fair choice. To judge, we've tried formalize client's request and to look into future support.

So, we have defined sql stored procedures to provide the data. This way data can be represented either as ADO.NET DataSet, a set of classes, as xml, or in other reasonable format. We do not predict any considerable problem with data representation if client will decide to modify reports in future.

It's not so easy when we think about Excel generation.

Due to ignorance we've thought that Excel is much like xslt in some regard, and that it's possible to provide a tabular data in some form and create Excel template, which will consume the data to form a final output. To some extent it's possible, indeed, but you should start creating macro or vb scripts to achieve acceptable results.

When we've mentioned macroses to the client, they immediately stated that such a solution won't work due to security reasons.

Comparing COM API and Open XML SDK we can see that both provide almost the same level of service for us, except that the later is much more lighter and supports only Open XML format, and the earlier is a heavy API exposing MS Office and supports earlier versions also.

Both solutions have a considerable drawback: it's not easy to create Excel report in C#, and it will be a pain to support such solution if client will ask, say in half a year, to modify something in Excel template or to create one more report.

Thus we've approached to xslt. There we've found two more directions:

  • generate data for Office Open XML;
  • generate xml in format of MS Office 2003.

It's turned out that it's rather untrivial task to generate data for Open XML, and it's not due to the format, which is not xml at all but a zipped folder containing xmls. The problem is in the complex schemas and in many complex relations between files constituting Open XML document. In contrast, MS Office 2003 format allows us to create a single xml file for the spreadsheet.

Choosing between a standard, up-to-date format and an older proprietary one, the latter looks more attractive for development and support.

At present we're inclined to use xslt and to generate files in the MS Office 2003 format. Are there better options?

Friday, 11 December 2009 09:28:32 UTC  #    Comments [4] -
Tips and tricks | xslt
# Saturday, 05 December 2009

Have you ever heard that double numbers may cause rounding errors, and that many financial institutions are very sensitive to them?

Sure you have! We're also aware of this kind of problem, and we thought we had taken care of it. But things are not that simple, as you don't always know what impact the problem can have.

To understand the context, it's enough to say that we're converting (using xslt, by the way) programs written in a CASE tool called Cool:GEN into java and into C#. Originally, Cool:GEN generated COBOL and C programs as deliverables. Clients formally compare COBOL results against java or C# results, and they want them to be as close as possible.

For one particular client it was crucial to get correct results when manipulating numbers with 20-25 digits in total, and with 10 digits after the decimal point.

Clients are definitely right, and we introduced generation options to control how numbers are represented in the java and C# worlds: either as double, or as BigDecimal (in java) and decimal (in C#).

That was our first implementation. Reasonable and clean. Was it enough? - Not at all!

The client reported that java's results (they use java and BigDecimal for every number with a decimal point) were too precise compared to Mainframe (MF) COBOL. This rather unusual complaint puzzled us a little, but the client confirmed that they want results no more precise than those MF produces.

The reason for the difference was that both C# and especially java may store many more decimal digits than are defined for a particular result on MF. On MF, whenever you define a field storing 5 digits after the decimal point, you're sure that exactly 5 digits will be stored. This contrasts sharply with the results we had in java and C#, as both multiplication and division can produce many more digits after the decimal point. The solution was to truncate(!) (not to round) the numbers to the specific precision in property setters.
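A minimal java sketch of such a truncating setter, assuming a hypothetical generated class whose field is declared on MF with 5 digits after the decimal point. RoundingMode.DOWN drops the extra digits (truncating toward zero) instead of rounding them.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical generated class mirroring a COBOL field with 5 decimal digits. */
final class Account {
  /** Number of digits after the decimal point, as declared on MF (assumed). */
  private static final int SCALE = 5;

  private BigDecimal balance = BigDecimal.ZERO.setScale(SCALE);

  BigDecimal getBalance() {
    return balance;
  }

  void setBalance(BigDecimal value) {
    // Truncate, do not round: keep exactly SCALE digits, as the MF field would.
    balance = value.setScale(SCALE, RoundingMode.DOWN);
  }
}
```

For example, setting 1.234567 stores 1.23456, and setting -1.234569 stores -1.23456.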

So, has it resolved the problem? - No, still not!

The client reported that the results were now much better (they coincided with MF, in fact), but there were still several instances where they observed differences in the 9th and 10th digits after the decimal point, and again java's results were more accurate.

No astonishment from us this time, just an analysis of the reason for the difference. It turned out that the previous solution was partial. We were doing a final truncation, but there were still intermediate results, like in a/(b * c) or in a * (b/c).

For intermediate results MF's COBOL has its own, rather nontrivial, formulas (and options) for each operation, defining the number of digits to keep after the decimal point. After we added similar options to the generator, several truncations appeared in the generated code to adjust intermediate results. This way we reached the same accuracy as MF.
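The effect of truncating intermediate results can be sketched in java as follows. The helper names and the fixed scale are hypothetical: real COBOL derives the intermediate scale per operation from its own formulas and compiler options, which this sketch does not reproduce.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical arithmetic helpers that truncate every intermediate result. */
final class TruncatingMath {
  static BigDecimal multiply(BigDecimal a, BigDecimal b, int scale) {
    return a.multiply(b).setScale(scale, RoundingMode.DOWN);
  }

  static BigDecimal divide(BigDecimal a, BigDecimal b, int scale) {
    // With an explicit scale, non-terminating quotients such as 1/3 do not throw.
    return a.divide(b, scale, RoundingMode.DOWN);
  }

  /** a * (b / c), truncating the intermediate quotient as well as the product. */
  static BigDecimal mulOfQuotient(BigDecimal a, BigDecimal b, BigDecimal c, int scale) {
    return multiply(a, divide(b, c, scale), scale);
  }
}
```

With a = 3, b = 1, c = 3 and a scale of 4, the quotient is truncated to 0.3333 and the result is 0.9999 rather than 1, which shows how intermediate truncation alone changes the last digits.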

What have we learned (reiterated)?

  • Simple problems may have a far-reaching impact.
  • More precise is not always better. Client often prefers compatible rather than more accurate results.
Saturday, 05 December 2009 13:17:42 UTC  #    Comments [0] -
Tips and tricks | xslt
# Monday, 29 June 2009

If you have a string variable $value as xs:string and want to know whether it starts with a digit, what's the best way to check it in xpath?

Our answer is: ($value ge '0') and ($value lt ':').
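The same comparison expressed in java, to make the boundaries visible. It assumes xpath's default Unicode codepoint collation, under which ':' (U+003A) is the character right after '9'; String.compareTo compares UTF-16 code units, which agree with codepoints in this ASCII range. The class name is hypothetical.

```java
/** Hypothetical helpers mirroring ($value ge '0') and ($value lt ':'). */
final class DigitPrefix {
  static boolean startsWithDigit(String value) {
    // ':' immediately follows '9', so any string starting with 0-9 falls in ['0', ':').
    return value.compareTo("0") >= 0 && value.compareTo(":") < 0;
  }

  /** The explicit equivalent, for comparison. */
  static boolean startsWithDigitExplicit(String value) {
    return !value.isEmpty() && value.charAt(0) >= '0' && value.charAt(0) <= '9';
  }
}
```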

Looks a little funny (and disturbing).

Monday, 29 June 2009 06:00:28 UTC  #    Comments [0] -
Tips and tricks | xslt
# Wednesday, 24 June 2009

In our project we generate a lot of xml files, which are subject to manual changes and to repeated generation (often with slightly different generation options). Thus the life cycle of such an xml can be described as follows:

  1. generate original xml (version 1)
  2. manual changes (version 2)
  3. next generation (version 3)
  4. manual changes integrated into the new generation (version 4)

If these were regular text files, we could use the diff utility to prepare a patch between versions 1 and 2, and apply it with the patch utility to version 3. Unfortunately, xml has additional semantics compared to plain text. What is an invariant or a simple modification in xml is often a drastic change in text, so diff/patch does not work well for us. We need an xml diff and patch.

The first guess is to google it! Not so simple. We have failed to find a tool or an API that can be used from ant. There are a lot of GUIs that show xml differences and perform manual merges, or that do similar but different things from what we need (like MS's xmldiffpatch).

Please point us to such a program!

Meanwhile, we need to proceed. We don't believe that such a tool can be knocked together in a hurry, as it is a task that is heuristic and mathematical at the same time, requiring careful design and good statistics on the use cases. Our idea is to exploit diff/patch. To achieve the goal, we're going to normalize xmls before diff to remove redundant invariants, and to normalize them after the patch to return them into a readable form. This includes:

  • ordering attributes by their names;
  • replacing insignificant whitespace with line breaks;
  • entering line breaks after element names and before attributes, after an attribute name and before its value, and after an attribute value.
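A minimal java sketch of the pre-diff normalization, using only the JDK's DOM parser. It is an illustration under stated assumptions, not the tool described: it treats every whitespace-only text node as insignificant, keeps comments, and drops processing instructions and the xml declaration. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical normalizer making xml react to edits the way plain text does. */
final class XmlDiffNormalizer {
  static String normalize(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
      StringBuilder out = new StringBuilder();
      write(root, out);
      return out.toString();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Cannot parse xml", e);
    }
  }

  private static void write(Node node, StringBuilder out) {
    switch (node.getNodeType()) {
      case Node.ELEMENT_NODE: {
        Element element = (Element) node;
        // Order attributes by name, so that reordering them produces no diff.
        Map<String, String> sorted = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
          sorted.put(attributes.item(i).getNodeName(), attributes.item(i).getNodeValue());
        }
        // Line break after the element name, after each attribute name, and after each value.
        out.append('<').append(element.getTagName()).append('\n');
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
          out.append(entry.getKey()).append("=\n\"").append(escape(entry.getValue())).append("\"\n");
        }
        if (!element.hasChildNodes()) {
          out.append("/>");
          break;
        }
        out.append('>');
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
          write(child, out);
        }
        out.append("</").append(element.getTagName()).append('>');
        break;
      }
      case Node.TEXT_NODE:
      case Node.CDATA_SECTION_NODE: {
        String text = node.getNodeValue();
        // Assumption: whitespace-only text is insignificant and becomes a single line break.
        out.append(text.trim().isEmpty() ? "\n" : escape(text));
        break;
      }
      case Node.COMMENT_NODE:
        out.append("<!--").append(node.getNodeValue()).append("-->");
        break;
      default:
        // Processing instructions and other node kinds are dropped in this sketch.
        break;
    }
  }

  private static String escape(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
  }
}
```

Two documents that differ only in attribute order or indentation normalize to the same text, so a line-based diff reports nothing for them, while a changed attribute value shows up as a one-line change. The output stays well-formed, because xml allows whitespace around '=' and before the end of a start tag.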

This way we expect to get files that react to modifications similarly to text files.

Wednesday, 24 June 2009 11:40:32 UTC  #    Comments [0] -
Tips and tricks | xslt
# Thursday, 18 June 2009

At present the C# serializer knows how to print comments and do some formatting (we had to create a micro xml serializer within xslt to serialize xml comments). C#'s formatting is not as advanced as java's, but it need not be in the first place, as C# code tends to be neater due to properties and events. Compare:

Java: instance.getItems().get(10).setValue(value);

vs

C#: instance.Items[10].Value = value;

TODO: implement API existing in jxom and missing in C# xom. This includes:

  • name normalization - rewriting tree to make names unique (duplicate names often appear during generation from code templates);
  • namespaces normalization - rewriting tree to elevate type namespaces (during generation, types are usually fully qualified);
  • unreachable code detection - optional feature (in java it's required, as unreachable code is an error, while in C# it's only a warning);
  • compile time expression evaluation - optional feature used in code optimization and in reachability checks;
  • state machine refactoring - not sure, as C# has the yield statement that does a similar thing.

Update can be found at: jxom/C# xom.

June 24 update: name and namespace normalizations are implemented.

Thursday, 18 June 2009 15:11:53 UTC  #    Comments [0] -
Announce | xslt
# Monday, 15 June 2009

Writing a language serializer is as easy a task as riding a bicycle: once you have learned it, creating a new one no longer takes mental effort.

It still requires considerable mechanical effort to write and test things.

Well, this is the first draft of the C# xslt serializer. Archive contains both C# xom and jxom.

Note: comments are not supported yet; nothing is done to format code except line wrapping.

Monday, 15 June 2009 14:51:11 UTC  #    Comments [0] -
Announce | xslt
# Thursday, 28 May 2009

Well, it's not only jxom anymore, but also csharpxom!

Project needs required us to create a C# 3.0 xml schema.

Soon we expect to create an xslt that serializes an xml document in this schema into text. Thanks to the original design, we can reuse the java streamer almost without changes.

A fact: the C# schema is more than twice as big as the java one.

Thursday, 28 May 2009 09:57:02 UTC  #    Comments [4] -
Announce | xslt
# Friday, 08 May 2009

Yesterday, we found an article "Repackaging Saxon". It's about a decision to move away from the Saxon-B/Saxon-SA packaging to a more conventional product line: Home/Professional/Enterprise Editions.

The good news is that Saxon stays open source. That's most important, as the open community spirit will be preserved. On the other hand, the Professional and Enterprise Editions will not be free.

In this regard the most interesting comments are:

John Cowan> I suspect that providing packaging only for $$ (or pounds or euros) won't actually work, because someone else will step in and provide that packaging for free, as the licensing permits.

and response:

Michael Kay> This will be interesting to see. I'm relying partly on the idea that there's a fair degree of trust, and expectation of support, associated with Saxonica's reputation, and that the people who are risking their business on the product might be hesitant to rely on third parties, who won't necessarily be prompt in issuing maintenance releases etc; at the same time, such third parties may serve the needs of the hobbyists who are the real market for the open source version.

and also:

Michael Kay> ...I haven't been able to make a model based on paid services plus free software work for me. It's hard enough to get the services business; when you do get it, it's hard to get enough revenue from it to fund the time spent on developing and supporting the software. Personally, I think the culture of free software has gone too far, and it is now leading to a lack of investment in new software...

Friday, 08 May 2009 09:38:09 UTC  #    Comments [0] -
xslt
# Wednesday, 22 April 2009

Today we saw an article: "The Death of XSLT in Web Frameworks". Read it if you're interested.

Arthur's reaction and mine were the same...

The gist of the article: I haven't read Pasternak, but I condemn him.

Background idea: What's not being hyped is dead.

See also: Web Application Frameworks.

Wednesday, 22 April 2009 08:16:16 UTC  #    Comments [0] -
xslt
# Sunday, 08 March 2009

Recently, we have started looking into a task of creating an interactive parser. A generic one.

Yes, we know there are plenty of them all around; however, the goals we have defined made us build a new implementation.

The goals:

  • Parser must be incremental.
    You should direct what to parse, and when to stop.
    This virtually demands rather "pull" than conventional "push" implementation.
  • Parser must be able to synchronize a tree with text.
    Whenever the underlying text is changed, only a limited part of the tree should be updated.
  • Parser should be able to recover from errors, and continue parsing.
  • Parser should be manageable.
    This is a goal of every program, really.
  • Parser must be fast.
  • A low memory footprint is desired.

What we've implemented (VS2008, C#) and put on SourceForge is called Incremental Parser.

These are the parser's internals:

  • Bookmarks are objects to track text points. We use a binary tree (see Bare binary tree algorithms) to adjust positions of bookmarks when text is changed.
  • Ranges define parsed tree elements. Each range is defined by two bookmarks, and a grammar annotation.
  • There are grammar primitives, which can be composed into a grammar graph.
  • A grammar graph along with ranges form a state machine.
  • Grammar chains are cached, turning parsing into a series of probes of literal tokens and transitions between grammar chains. This caching is done on demand, which results in warming-up effect.
  • Parser itself includes a random access tokenizer, and a queue of ranges pending to be parsed.
  • Parsing is conducted as a cycle of pulling and parsing of pending ranges.
  • Whenever text is changed a closest range is queued for the reparsing.
  • A balance between the amount of parsing and memory consumption can be achieved through the level of detail of the grammar annotation for a range. An active text fragment can be fully annotated, while for other text parts a coarse range can be stored.
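To make the bookmark idea concrete, here is a deliberately naive java sketch that shifts bookmarks when text is inserted or removed. It is not the parser's data structure: the parser keeps bookmarks in a binary tree so that an edit need not touch every bookmark, whereas this hypothetical tracker scans a list in linear time. It also assumes that a bookmark sitting exactly at the insertion point is pushed past the inserted text.

```java
import java.util.ArrayList;
import java.util.List;

/** A bookmark is a mutable text position. */
final class Bookmark {
  int position;

  Bookmark(int position) {
    this.position = position;
  }
}

/** Hypothetical, linear-time stand-in for the parser's binary-tree bookmark store. */
final class BookmarkTracker {
  private final List<Bookmark> bookmarks = new ArrayList<>();

  Bookmark create(int position) {
    Bookmark bookmark = new Bookmark(position);
    bookmarks.add(bookmark);
    return bookmark;
  }

  /** Text of the given length was inserted at offset. */
  void inserted(int offset, int length) {
    for (Bookmark bookmark : bookmarks) {
      if (bookmark.position >= offset) {
        bookmark.position += length;
      }
    }
  }

  /** Text of the given length was removed at offset. */
  void removed(int offset, int length) {
    int end = offset + length;
    for (Bookmark bookmark : bookmarks) {
      if (bookmark.position >= end) {
        bookmark.position -= length;
      } else if (bookmark.position > offset) {
        // The bookmark was inside the removed text: collapse it to the gap.
        bookmark.position = offset;
      }
    }
  }
}
```

A range defined by two such bookmarks keeps tracking its text across edits, which is what lets a change queue only the closest range for reparsing.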

We have defined an xpath-like grammar to test our ideas. See the printed parsed trees to understand what information can be seen from ranges.

Sunday, 08 March 2009 21:00:38 UTC  #    Comments [0] -
Announce | Incremental Parser | xslt
# Thursday, 15 January 2009

A simple demand nowadays: a good IDE.

Almost ten years have passed since xslt appeared, but we're still not pleased with IDEs claiming xslt support. Our expectations are not too high. There are things, however, which must be present in such an IDE.

  1. A notion of a project, and possibly a group of projects. You may think of it as a main xslt including the other xslts participating in the project.
  2. A code completion. A feature providing typing hints for language constructs, includes, prefixes, namespaces, functions, templates, modes, variables, parameters, schema elements, and other (all this should work in a context of the project).
  3. A code refactoring. A means to move parts of code between (or inside) files and projects, rename things (functions, templates, parameters, variables, prefixes, namespaces, and other).
  4. Code validation and run.
  5. Optional debug feature.

We would be grateful if someone pointed us to any such IDE.

Thursday, 15 January 2009 14:41:35 UTC  #    Comments [13] -
Incremental Parser | xslt
# Wednesday, 14 January 2009

Once upon a time, we created a function mimicking the decapitalize() method defined in java.beans.Introspector. Nothing special, indeed. See the source:

/**
 * Utility method to take a string and convert it to normal Java variable
 * name capitalization. This normally means converting the first
 * character from upper case to lower case, but in the (unusual) special
 * case when there is more than one character and both the first and
 * second characters are upper case, we leave it alone.
 * <p>
 * Thus "FooBah" becomes "fooBah" and "X" becomes "x", but "URL" stays
 * as "URL".
 *
 * @param name The string to be decapitalized.
 * @return The decapitalized version of the string.
 */
public static String decapitalize(String name) {
  if (name == null || name.length() == 0) {
    return name;
  }
  if (name.length() > 1 && Character.isUpperCase(name.charAt(1)) &&
    Character.isUpperCase(name.charAt(0))){
    return name;
  }
  char chars[] = name.toCharArray();
  chars[0] = Character.toLowerCase(chars[0]);
  return new String(chars);
}

We typed our implementation right away:

<xsl:function name="t:decapitalize" as="xs:string">
  <xsl:param name="value" as="xs:string?"/>

  <xsl:variable name="c" as="xs:string"
    select="substring($value, 2, 1)"/>

  <xsl:sequence select="
    if ($c = upper-case($c)) then
      $value
    else
      concat
      (
        lower-case(substring($value, 1, 1)),
        substring($value, 2)
      )"/>
</xsl:function>

It worked fine until recently, when it failed: the output differed from its java counterpart.

The input was W9Identifier. Our function naturally returned the same value, while java returned w9Identifier. We were caught by the assumption that $c = upper-case($c) is true only when the character is an upper case letter. That's not correct for digits: upper-case('9') is '9', so a digit passes the test. The correct way is:

<xsl:function name="t:decapitalize" as="xs:string">
  <xsl:param name="value" as="xs:string?"/>

  <xsl:variable name="c" as="xs:string"
    select="substring($value, 2, 1)"/>

  <xsl:sequence select="
    if ($c != lower-case($c)) then
      $value
    else
      concat
      (
        lower-case(substring($value, 1, 1)),
        substring($value, 2)
      )"/>
</xsl:function>
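The difference between the two tests is easy to see in a small java port, assuming Locale.ROOT case mapping as a stand-in for xpath's upper-case() and lower-case(). The class and method names are hypothetical; decapitalize() follows the corrected xslt function above.

```java
import java.util.Locale;

/** Hypothetical java port of the two t:decapitalize() predicates. */
final class Decapitalize {
  /** Original test, $c = upper-case($c): also true for digits and punctuation. */
  static boolean looksUpperFlawed(String c) {
    return c.equals(c.toUpperCase(Locale.ROOT));
  }

  /** Corrected test, $c != lower-case($c): true only if c has a distinct lower case form. */
  static boolean isUpper(String c) {
    return !c.equals(c.toLowerCase(Locale.ROOT));
  }

  /** Port of the corrected xslt function; null plays the role of the empty sequence. */
  static String decapitalize(String value) {
    String v = value == null ? "" : value;
    // substring($value, 2, 1): the second character, or "" if there is none.
    String c = v.length() > 1 ? v.substring(1, 2) : "";
    if (isUpper(c) || v.isEmpty()) {
      return v;
    }
    return v.substring(0, 1).toLowerCase(Locale.ROOT) + v.substring(1);
  }
}
```

With these definitions "W9Identifier" becomes "w9Identifier", while "URL" stays "URL" and "FooBah" becomes "fooBah", matching the java.beans.Introspector documentation quoted above.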

Wednesday, 14 January 2009 15:46:23 UTC  #    Comments [0] -
Tips and tricks | xslt
# Saturday, 03 January 2009

Last year we worked on a project that in essence dealt with the transformation of graphs. Our experience with xslt 1.0 and other available information were promising: xslt 2.0 looked like a perfect match.

We were right: xslt 2.0 fitted the problem very well.

It's easy to learn xslt 2.0/xquery: get acquainted with xml schema; read through the syntax, which is rather concise; look at examples, and start coding. The API you will learn incrementally.

Like other languages, xslt 2.0 is only a medium to express algorithms. As such it fills its role rather well, as well as SQL:2003 and its variations do, and sometimes even better than other programming languages like C++ do.

Compare the expressions "get data satisfying a specific criterion" and "for each data part check a specific condition, and if it is true, add it to the result". These often represent the same idea from two perspectives: human (or math) thinking, and thinking in terms of an execution procedure.
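The two perspectives can be put side by side in java; in xpath the declarative form would simply be $data[. mod 2 = 0]. The data and the criterion (even numbers) are made up for illustration, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical example: the same selection stated in two ways. */
final class TwoPerspectives {
  /** "Get data satisfying a specific criterion." */
  static List<Integer> evensDeclarative(List<Integer> data) {
    return data.stream().filter(n -> n % 2 == 0).collect(Collectors.toList());
  }

  /** "For each data part check a condition, and if it is true, add it to the result." */
  static List<Integer> evensProcedural(List<Integer> data) {
    List<Integer> result = new ArrayList<>();
    for (Integer n : data) {
      if (n % 2 == 0) {
        result.add(n);
      }
    }
    return result;
  }
}
```

Both return [2, 4] for [1, 2, 3, 4]; only the second one forces the reader to follow a sequence of steps in time.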

Both kinds of expressions have their use; however, it happens that we are human beings and more easily perceive natural language notions like subjects, objects, predicates, deduction, induction and so on. I think the reason is that a human's (not positronic) brain grasps ideas, conceptions and images as something static, while an execution procedure demands a notion of time (or at least notions of a sequence and an order) to be comprehended. ("Are you serious?", "Joke!" :-))

There is the other side to this story.

We completed the project design in a relatively short time. A good, scalable design. We needed people who know xslt 2.0 to implement it. It turned out that this was a strong objection against xslt!

Our colleague, xslt guru Oleg Tkachenko, had left our company to pursue his career at Microsoft, and to our disbelief it was impossible to find a person interested in a project involving 85% xslt and 15% other technologies, including java. Even in the java world people prefer routine projects, like a standard swing or web application, to a project demanding creativity.

Possibly it was our mistake to let our company look for developers the standard way: some secretary looked through her sources and inevitably found graduates with so-so java, poor xml and almost zero xslt knowledge. We should have made appeals on xslt forums, especially since the project could easily have been developed by a distributed group.

Finally, we have designed and implemented the project by ourselves but to the present day our managers are calling and suggesting java developers for our project. What a bad joke!

Saturday, 03 January 2009 09:51:05 UTC  #    Comments [0] -
xslt
# Wednesday, 17 December 2008

Just for fun I've created exslt2.xslt and exslt2-test.xslt to model concepts discussed on the EXSLT 2.0 forum. I did nothing special: I used a tuple as a reference, and I also defined f:call() to make an indirect function call.

<?xml version="1.0" encoding="utf-8"?>
<!--
  exslt 2 sketches.
-->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="http://exslt.org/v2"
  xmlns:t="this"
  xmlns:p="private"
  exclude-result-prefixes="xs t f p">

  <xsl:include href="exslt2.xslt"/>

  <xsl:template match="/" name="main">
    <root>
      <xsl:variable name="refs" as="item()*" select="
        for $i in 1 to 20 return
          f:ref(1 to $i)"/>

      <total-items>
        <xsl:sequence select="
          sum
          (
            for $ref in $refs return
              count(f:deref($ref))
          )"/>
      </total-items>

      <sums-per-ref>
        <xsl:for-each select="$refs">
          <xsl:variable name="index" as="xs:integer" select="position()"/>

          <sum
            index="{$index}"
            value="{sum(f:deref(.))}"/>
        </xsl:for-each>
      </sums-per-ref>

      <add>
        <xsl:text>1 + 2 = </xsl:text>
        <xsl:sequence select="f:call(xs:QName('t:add'), (1, 2))"/>
      </add>
    </root>
  </xsl:template>

  <xsl:function name="t:add" as="xs:integer">
    <xsl:param name="arguments" as="xs:integer+"/>

    <xsl:variable name="first" as="xs:integer" select="$arguments[1]"/>
    <xsl:variable name="second" as="xs:integer" select="$arguments[2]"/>

    <xsl:sequence select="$first + $second"/>
  </xsl:function>

</xsl:stylesheet>

Code can be found at saxon.extensions.9.1.zip.

Wednesday, 17 December 2008 13:53:03 UTC  #    Comments [0] -
xslt
# Wednesday, 10 December 2008

We created the Java Xml Object Model (jxom) purely for the purposes of our project. In fact, jxom at present has siblings: xml models for sql dialects. There are also various APIs, like name normalization, refactoring and compile time evaluation.

It turns out that jxom is also good enough for other developers.

The drawback of jxom, however, is its rather complex xml schema. It takes time to understand it. To simplify things we have created (and plan to create more) a couple of examples that let you feel what jxom xml looks like.

The latest version can be downloaded from jxom.zip.

We would be pleased to see more comments on the subject.

Wednesday, 10 December 2008 09:35:26 UTC  #    Comments [0] -
Announce | xslt
# Saturday, 22 November 2008

Recently, while working on a completely different thing, I realized that one may create a "generator": a function returning a different value on each call. I was somewhat puzzled by this conclusion, as I thought xslt functions have no side effects, and that for the same arguments an xslt function returns the same result.

I've confirmed the conclusion on the forum. See Scope of uniqueness of generate-id().

In short:

  • each node has a unique identity;
  • a function, in the course of its work, creates a temporary node and produces a result depending on the identity of that node.

Example:

<xsl:stylesheet version="2.0"
  xmlns:f="data:,f"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <xsl:message select="
    for $i in 1 to 8 return
      f:fun()"/>
</xsl:template>

<xsl:function name="f:fun" as="xs:string">
  <xsl:variable name="x">!</xsl:variable>

  <xsl:sequence select="generate-id($x)"/>
</xsl:function>

</xsl:stylesheet>

The next thought was that if you can create a generator, then it's easy to create a good random number generator (that's a trivial math task).
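One way to turn per-call unique values into pseudo-random numbers is to pass them through a bit-mixing function. The java sketch below applies the SplitMix64 mixing step to a string's hash code. Feeding it generate-id() results is our assumption: their format is processor specific, and distinct strings may share a hash code, so this illustrates the idea rather than a vetted generator. The class name is hypothetical.

```java
/** Hypothetical mixer turning unique per-call strings into numbers in [0, 1). */
final class IdMixer {
  /** SplitMix64 output function: a bijective 64-bit mix. */
  static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  /** Maps a unique string (e.g. a generate-id() result) to a double in [0, 1). */
  static double random(String uniqueId) {
    // SplitMix64 advances its state by this constant; here the id plays the role of the state.
    long state = uniqueId.hashCode() + 0x9E3779B97F4A7C15L;
    // Keep the top 53 bits, which fit exactly into a double's mantissa.
    return (mix64(state) >>> 11) * 0x1.0p-53;
  }
}
```

The same id always yields the same number, so the randomness comes entirely from each call creating a node with a new identity.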

Hey gurus, take a chance!

Saturday, 22 November 2008 08:27:48 UTC  #    Comments [2] -
xslt
# Tuesday, 18 November 2008

Suppose you have constructed a sequence of attributes.

How do you access a value of attribute "a"?

Simple, isn't it? Yet it took us a couple of minutes to find a solution!

<xsl:variable name="attributes" as="attribute()*">
  <xsl:apply-templates mode="t:generate-attributes" select="."/>
</xsl:variable>

<xsl:variable name="value" as="xs:string?"
  select="$attributes[self::attribute(a)]"/>

Tuesday, 18 November 2008 11:41:41 UTC  #    Comments [2] -
Tips and tricks | xslt
# Thursday, 13 November 2008

Problem statement

Our project, containing many different xslt files, generates many different outputs (e.g. code that uses DB2 SQL, Oracle SQL, DAO, or some other flavor of code). This results in indirect calls to handle different generation options; however, to allow the xslt to work we had to create a big main xslt including the stylesheets for each kind of generation. This impacts compilation time.

Alternatives

  1. A big main xslt including everything.
  2. A big main xslt including everything and using "use-when" attribute.
  3. Compose main xslt on the fly.

We were strongly inclined toward the second alternative. Unfortunately, only a limited set of information is available when "use-when" is evaluated. In particular, neither parameters nor documents are available. Using Saxon's extensions one may reach only static variables, or access System.getProperty(). This isn't flexible.

We've decided to try the third alternative.

Solution

We think we have found a nice solution: to create an XsltSource, which receives a list of includes upon construction, and generates an xslt when getReader() is called.

import java.io.Reader;
import java.io.StringReader;

import javax.xml.transform.stream.StreamSource;

/**
 * A source to read generated stylesheet, which includes other stylesheets.
 */
public class XsltSource extends StreamSource
{
  /**
   * Creates an {@link XsltSource} instance.
   */
  public XsltSource()
  {
  }

  /**
   * Creates an {@link XsltSource} instance.
   * @param systemId a system identifier for root xslt.
   */
  public XsltSource(String systemId)
  {
    super(systemId);
  }

  /**
   * Creates an {@link XsltSource} instance.
   * @param systemId a system identifier for root xslt.
   * @param includes a list of includes.
   */
  public XsltSource(String systemId, String[] includes)
  {
    super(systemId);

    this.includes = includes;
  }

  /**
   * Gets stylesheet version.
   * @return a stylesheet version.
   */
  public String getVersion()
  {
    return version;
  }

  /**
   * Sets a stylesheet version.
   * @param value a stylesheet version.
   */
  public void setVersion(String value)
  {
    version = value;
  }

  /**
   * Gets a list of includes.
   * @return a list of includes.
   */
  public String[] getIncludes()
  {
    return includes;
  }

  /**
   * Sets a list of includes.
   * @param value a list of includes.
   */
  public void setIncludes(String[] value)
  {
    includes = value;
  }

  /**
   * Generates an xslt on the fly.
   */
  public Reader getReader()
  {
    String[] includes = getIncludes();

    if (includes == null)
    {
      return super.getReader();
    }

    String version = getVersion();

    if (version == null)
    {
      version = "2.0";
    }

    StringBuilder builder = new StringBuilder(1024);

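    builder.append("<stylesheet version=\"");
    builder.append(escape(version));
    builder.append("\" xmlns=\"http://www.w3.org/1999/XSL/Transform\">");

    for(String include: includes)
    {
      // Escape hrefs so characters like '&' or '"' keep the xml well-formed.
      builder.append("<include href=\"");
      builder.append(escape(include));
      builder.append("\"/>");
    }

    builder.append("</stylesheet>");

    return new StringReader(builder.toString());
  }

  /**
   * Escapes a value for use inside a double-quoted xml attribute.
   * @param value a value to escape.
   * @return an escaped value.
   */
  private static String escape(String value)
  {
    return value.
      replace("&", "&amp;").
      replace("<", "&lt;").
      replace("\"", "&quot;");
  }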

  /**
   * An xslt version. By default 2.0 is used.
   */
  private String version;

  /**
   * A list of includes.
   */
  private String[] includes;
}

To use it one just needs to write:

Source source = new XsltSource(base, stylesheets);
Templates templates = transformerFactory.newTemplates(source);
...

where:

  • base is a base uri for the generated stylesheet; it's used to resolve relative includes;
  • stylesheets is an array of hrefs.

Such an implementation resembles dynamic linking, where separate parts are bound at runtime. We would like to see dynamic modules in the next version of xslt.

Thursday, 13 November 2008 11:26:50 UTC  #    Comments [0] -
Tips and tricks | xslt
# Tuesday, 04 November 2008

Why have we turned our attention to the Saxon implementation?

A considerable part (~75%) of the project we're working on at present is creating xslt(s). These are not stylesheets for page presentation, but rather the project's business logic. To fulfill the project we needed an xslt 2.0 processor. In the current state of affairs I doubt anyone can point to a good alternative to the Saxon implementation.

The open source nature of the SaxonB project and our intrinsic curiosity act like a hook for species such as ourselves.

We want to say that we're rather sceptical observers of code: the code should prove that it has merits. Saxon looks consistent. It doesn't take too much time to grasp its implementation concepts, taking into account that the code routinely follows the xpath/xslt/xquery specifications. These code observations and practice with live xslt tasks helped us form an opinion on Saxon itself. That's why we dare to critique it.

1. Compilation is fused with execution.

Before being executed, an xslt passes through several stages, including the xpath data model and a graph of expressions: objects implementing parts of the runtime logic.

The expression graph is optimized to achieve better runtime performance. The optimization logic is distributed throughout the code, and in particular lives in expression objects. This means that each expression fulfills two roles: runtime execution and optimization.

I would prefer to see smaller and cleaner runtime objects (expressions), with the optimization logic kept separately. On the other hand, I can guess why Michael Kay fused these roles: to ease lazy optimizations (at runtime).

2. Optimizations are xslt 1.0 in origin

This is a legacy. There are two main techniques: cached sequences, and global indices of rooted nodes.

This might be enough in xslt 1.0, but not in 2.0, where there is a diverse set of types, where sequences extend node sets to other types, and where sequences may logically be grouped into pairs, triples, and so on.

The XPath data model operates with sequences only (in the math sense). On the other hand, it defines many set based functions (operators) like $a intersect $b, $a except $b, $a = $b, $a != $b. In these examples XPath sequences are better considered as sets, or maps of items.
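To illustrate the set view of sequences, here is a java sketch of $a intersect $b, $a except $b and the general comparison $a = $b backed by hash sets, which costs roughly linear rather than quadratic time. It is a simplification: xpath's intersect and except operate on nodes and return them in document order, for which the sketch substitutes the order of the first sequence. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical set-based versions of xpath sequence operators. */
final class SequenceSets {
  /** $a intersect $b: distinct items of a that also occur in b, in a's order. */
  static <T> List<T> intersect(List<T> a, List<T> b) {
    Set<T> lookup = new HashSet<>(b);
    Set<T> result = new LinkedHashSet<>();
    for (T item : a) {
      if (lookup.contains(item)) {
        result.add(item);
      }
    }
    return new ArrayList<>(result);
  }

  /** $a except $b: distinct items of a that do not occur in b, in a's order. */
  static <T> List<T> except(List<T> a, List<T> b) {
    Set<T> lookup = new HashSet<>(b);
    Set<T> result = new LinkedHashSet<>();
    for (T item : a) {
      if (!lookup.contains(item)) {
        result.add(item);
      }
    }
    return new ArrayList<>(result);
  }

  /** $a = $b (general comparison): true if some item of a equals some item of b. */
  static <T> boolean generalEquals(List<T> a, List<T> b) {
    Set<T> lookup = new HashSet<>(b);
    for (T item : a) {
      if (lookup.contains(item)) {
        return true;
      }
    }
    return false;
  }
}
```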

Another example: for $i in index-of($names, $name) return $values[$i], where $names as xs:string* and $values as element()*, shows that the pair ($names, $values) is in fact a map, and $names might be implemented as a composition of a sequence and a map of strings to indices.
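A java sketch of that composition: the names stay a sequence, and a map from each name to its positions is built once, so each lookup avoids the linear scan that index-of() implies. The class name is hypothetical, and java positions are 0-based where xpath's are 1-based.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pairing of $names and $values with a name-to-indices map. */
final class NamedValues<V> {
  private final List<V> values;
  private final Map<String, List<Integer>> indicesByName = new HashMap<>();

  NamedValues(List<String> names, List<V> values) {
    this.values = values;
    // Built once: each name maps to all of its (0-based) positions, in order.
    for (int i = 0; i < names.size(); i++) {
      indicesByName.computeIfAbsent(names.get(i), key -> new ArrayList<>()).add(i);
    }
  }

  /** for $i in index-of($names, $name) return $values[$i], without a linear scan. */
  List<V> lookup(String name) {
    List<V> result = new ArrayList<>();
    for (int i : indicesByName.getOrDefault(name, Collections.emptyList())) {
      // Like $values[$i], a position past the end contributes nothing.
      if (i < values.size()) {
        result.add(values.get(i));
      }
    }
    return result;
  }
}
```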

There are other use case examples which lead me to think that Saxon lacks set based operators. Global indices are a poor substitute, which works for rooted trees only.

Again, I can guess why Michael Kay is not implementing these operators: not everyone loads xslt with stressful tasks requiring these features. I think xslt is mostly used to render pages, and one rarely deviates from rooted trees.

In spite of the objections we think that Saxon is a good xslt 2.0 implementation, which unfortunately lacks competitors.

Tuesday, 04 November 2008 11:30:36 UTC  #    Comments [0] -
xslt
# Saturday, 27 September 2008

We are certain that xslt/xquery are the best choice for web application frameworks from the design perspective; in other words, pipeline frameworks that allow the use of xslt/xquery are the preferable way to create web applications.

Advantages are obvious:

  • clear separation of business logic, data, and presentation;

  • richness of the languages, allowing one to implement simple presentation, complex components, and sophisticated data binding;

  • built-in extensibility, allowing communication with business logic written in other languages and/or located at a different site.

It seems that advocating such technologies is like pushing on an open door. There are such frameworks out there: Orbeon Forms, Cocoon, and others. We're not qualified to judge their virtues, however...

Look at the current state of affairs. The main players in this area (well, I have a rather limited vision) push other technologies: JSP/JSF/Facelets and alike in the Java world, and ASP.NET in the .NET world. The closest thing they provide is an xslt servlet/component for generating output.

Their syntax variants and data binding techniques allude to similar paradigms in xslt/xquery:

<select>
  <c:forEach var="option" items="#{bean.options}">
    <option value="#{option.key}">#{option.value}</option>
  </c:forEach>
</select>

On the surface, however, we see frameworks that are much more limited (in design and in application).

And here is a contradiction: how can it be that at present such a good design is not even as popular as its competitors?

Someone may say there is no such problem. You can use whatever you want. You have a choice! Well, he's lucky. From our perspective it's not that simple.

We're creating rather complex web applications. Their nature isn't important in this context; what is important is that there are customers. They are not thoroughly versed in the question, and exactly because of this they prefer technologies promoted by the leaders. Everything seems to convince them: the mainstream, good support, many developers who know the technology.

There is not a single chance to promote anything else.

We believe that the future may change this, but we're building applications now, and cannot wait...

Saturday, 27 September 2008 10:36:06 UTC  #    Comments [3] -
Tips and tricks | xslt
# Tuesday, 16 September 2008

I've uploaded jxom.zip

Now, it contains a state machine generator. See "What you can do with jxom".

The code is in the java-state-machine-generator.xslt. The test is in the java-state-machine-test.xslt.

Tuesday, 16 September 2008 11:02:09 UTC  #    Comments [0] -
xslt
# Friday, 05 September 2008

We're facing the task of converting a java method into a state machine. This is similar to converting a SAX parser, which pushes data, into an XML reader, which pulls data.

The task is formalized as:

  • for a given method containing split markers, create a class permitting iteration;
  • each iteration performs a part of the method's logic.

We have defined rules converting all statements into a state machine, except the synchronized statement. In fact, the logic is rather linear; the least trivial conversion, however, is for the try statement. Consider an example:

public class Test
{
  void method()
    throws Exception
  {
    try
    {
      A();
      B();
    }
    catch(Exception e)
    {
      C(e);
    }
    finally
    {
      D();
    }

    E();
  }

  private void A()
    throws Exception
  {
    // logic A
  }

  private void B()
    throws Exception
  {
    // logic B
  }

  private void C(Exception e)
    throws Exception
  {
    // logic C
  }

  private void D()
    throws Exception
  {
    // logic D
  }

  private void E()
    throws Exception
  {
    // logic E
  }
}

Suppose we want to represent method() as a state machine, with split markers after the calls to methods A(), B(), C(), D(), E(). This is how it looks as a state machine:

Callable<Boolean> methodAsStateMachine()
  throws Exception
{
  return new Callable<Boolean>()
  {
    public Boolean call()
      throws Exception
    {
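      // while(true): "continue" re-dispatches on the new state within the
      // same call ("continue" in a do { } while(false) would exit the loop).
      while(true)
      {
        try
        {
          switch(state)
          {
            case 0:
            {
              A();
              state = 1;

              return true;
            }
            case 1:
            {
              B();
              state = 3;

              return true;
            }
            case 2:
            {
              C(ex);
              state = 3;

              return true;
            }
            case 3:
            {
              D();

              if (currentException != null)
              {
                throw currentException;
              }

              state = 4;

              return true;
            }
            case 4:
            {
              E();
              state = -1;

              return false;
            }
          }

          // The machine has already finished (or failed).
          if (currentException == null)
          {
            currentException = new IllegalStateException();
          }

          break;
        }
        catch(Throwable e)
        {
          currentException = null;

          switch(state)
          {
            case 0:
            case 1:
            {
              // Failure in the try block: an Exception goes to the catch
              // handler; any other Throwable goes directly to finally.
              if (e instanceof Exception)
              {
                ex = (Exception)e;
                state = 2;
              }
              else
              {
                currentException = e;
                state = 3;
              }

              continue;
            }
            case 2:
            {
              // Failure in the catch handler: run finally, then rethrow.
              currentException = e;
              state = 3;

              continue;
            }
          }

          // Failure in finally or after the try statement: stop.
          currentException = e;
          state = -1;

          break;
        }
      }

      return this.<Exception>error();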
    }

    @SuppressWarnings("unchecked")
    private <T extends Throwable> boolean error()
      throws T
    {
      throw (T)currentException;
    }

    private int state = 0;
    private Throwable currentException = null;
    private Exception ex = null;
  };
}

Believe it or not, this transformation can be done purely in xslt 2.0 with the help of jxom (Java xml object model). We shall update jxom.zip once this module is implemented and tested.
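To make the pull model concrete, here is a small hand-written sketch (not jxom output) of the same technique for a shorter method, try { A(); } finally { D(); } E();, with a split marker after each call. The names StateMachineDemo, machine, trace, and failInA are illustrative only; a caller drives the machine by invoking call() until it returns false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hand-written sketch (not jxom output) of the method
//   try { A(); } finally { D(); } E();
// split into one step per call. "trace" records the executed steps.
class StateMachineDemo
{
  static final List<String> trace = new ArrayList<>();

  static Callable<Boolean> machine(final boolean failInA)
  {
    return new Callable<Boolean>()
    {
      private int state = 0;
      private Exception pending = null; // rethrown after the finally step

      public Boolean call() throws Exception
      {
        while (true)
        {
          try
          {
            switch (state)
            {
              case 0: // A()
                trace.add("A");
                if (failInA)
                {
                  throw new Exception("A failed");
                }
                state = 1;
                return true;
              case 1: // finally: D()
                trace.add("D");
                if (pending != null)
                {
                  state = -1;
                  throw pending;
                }
                state = 2;
                return true;
              case 2: // E()
                trace.add("E");
                state = -1;
                return false; // no more steps
              default:
                throw new IllegalStateException("Already finished.");
            }
          }
          catch (Exception e)
          {
            if (state == 0)
            {
              pending = e; // run the finally step before propagating
              state = 1;
              continue;    // re-dispatch within the same call
            }

            throw e;
          }
        }
      }
    };
  }

  public static void main(String[] args) throws Exception
  {
    Callable<Boolean> steps = machine(false);

    while (steps.call())
    {
      // the caller pulls one step at a time
    }

    System.out.println(trace); // [A, D, E]
  }
}
```

Because the loop is while (true), continue re-dispatches on the new state within the same call, so a failure in A() still runs D() before the exception leaves call().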

Friday, 05 September 2008 15:39:50 UTC  #    Comments [0] -
xslt
# Wednesday, 03 September 2008

In xslt one can express seemingly the same thing in different words, like:

  exists($x)
and
  every $y in $x satisfies exists($y)

newbie> Really the same?
expert> Oops... You're right, these are different things!

What's the difference?

Wednesday, 03 September 2008 12:34:06 UTC  #    Comments [0] -
xslt
# Saturday, 30 August 2008

I have already written about tuples and maps in xslt (see Tuples and maps - Status: CLOSED, WONTFIX, and Tuples and maps in Saxon).

Now I want to discuss a use case, and how an xslt processor can detect such a use case and implement it as a map. This way, under certain conditions, sequences could be treated as maps (or as sets).

Use case.

There are two stages:

  • a logic collecting nodes/values satisfying some criteria.
  • process data, and take a special action whenever a node/value is collected on the previous stage.

When we're talking about nodes, the result of the first stage is a sequence $set as node()*. This sequence plays the role of a set of nodes (order is not important).

The second stage is usually an xsl:for-each, an xsl:apply-templates, or something of this kind, which repeatedly checks whether some $node as node()? belongs to $set, using expressions like $node intersect $set or $node except $set.

Although we're still using regular xpath 2.0, we have managed to express a set-based operation. It's up to the xslt processor's optimizer to detect such a use case and treat the sequence as a set. In fact, the detection rule is rather simple.

For expressions $node except $set and $node intersect $set:
  • $set can be treated as a set, as the order of its elements is not important;
  • chances are good that implementing $set as a set (e.g. a hash set) outperforms an implementation using a list or an array.
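Here is a sketch of what such an optimization could amount to, written in java rather than inside a processor: when only membership matters, the sequence is turned into a hash set once and then probed for each node, instead of being scanned linearly every time. Node identity is modelled with an IdentityHashMap-backed set, an assumption standing in for xpath node identity; SetMembershipDemo and its methods are hypothetical names, and the result keeps the order of the probed nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Sketch of the suggested optimization: when only membership matters (as in
// "$node intersect $set"), build an identity-based hash set once and probe it.
class SetMembershipDemo
{
  // Nodes are compared by identity, like xpath node identity ("is").
  static <T> Set<T> asIdentitySet(List<T> sequence)
  {
    Set<T> set = Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    set.addAll(sequence);
    return set;
  }

  // Keeps the nodes that belong to the collected sequence:
  // O(n + m) with the set, instead of O(n * m) with a scan per node.
  static <T> List<T> filterMembers(List<T> nodes, List<T> sequence)
  {
    Set<T> set = asIdentitySet(sequence);
    List<T> result = new ArrayList<>();

    for (T node : nodes)
    {
      if (set.contains(node))
      {
        result.add(node);
      }
    }

    return result;
  }

  public static void main(String[] args)
  {
    String a = new String("a"), b = new String("b"), c = new String("c");
    String otherA = new String("a"); // equal to a, but a different "node"

    System.out.println(filterMembers(List.of(a, b, c, otherA), List.of(c, a))); // [a, c]
  }
}
```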

So what to do? Well, I do not think I'm the smartest kid around, quite the opposite... however, it's worth hinting this idea to xslt implementers (see Suggest optimization). I still do not know whether it was fruitful...

P.S. A very similar use case exists for the function index-of($collection, $item).

Saturday, 30 August 2008 07:44:44 UTC  #    Comments [0] -
xslt
# Tuesday, 12 August 2008

I know we're not the first to create a parser in xslt. However, I still want to share our implementation, as I think it's beautiful.

In our project, which is a conversion from some legacy language to java, we're dealing with dynamic expressions. For example, in the legacy language one can filter a collection using an expression defined by a string: collection.filter("a > 0 and b = 7");

When the expression string is computed at runtime, there is nothing to do except parse it at runtime and perform the filtering dynamically. On the other hand, we have found that in the majority of cases literal strings are used. Thus we have decided to optimize this route like this:

  collection.filter(
    new Filter<T>()
    {
      public boolean filter(T value)
      {
        return (value.getA() > 0) && (value.getB() == 7);
      }
    });

This means that we're converting that expression string into java code on the generation stage.

In xslt (our generator engine), this means that we have to convert a string into an expression tree, like this:

(a > 7 or a= 3) and c * d = 2.2

to

<and>
  <or>
    <gt>
      <identifier>a</identifier>
      <integer>7</integer>
    </gt>
    <eq>
      <identifier>a</identifier>
      <integer>3</integer>
    </eq>
  </or>
  <eq>
    <mul>
      <identifier>c</identifier>
      <identifier>d</identifier>
    </mul>
    <decimal>2.2</decimal>
  </eq>
</and>

Our parser fits naturally into the world of parsers: it uses the xsl:analyze-string instruction to tokenize input and parses tokens according to an expression grammar. During implementation I found a few things that were new to me; I think they are worth mentioning:

  • As the tokenizer is defined as one big regular expression, we have a rather verbose regex attribute on xsl:analyze-string. It was hard to edit such a long line until I found the flags="x" option, which solves the formatting problem:

    The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored.

    This means that I can use spaces to format the regular expression, and \s to specify a space as part of the expression.
  • Saxon 9.1.0.1 has an inefficiency in its implementation of the xsl:analyze-string instruction: whenever the regex is a literal value containing a '{' character (e.g. "\p{{L}}"), it considers the value to be an AVT and delays pattern compilation until runtime, recompiling the pattern every time the instruction is executed.
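As a rough illustration of the same tokenize-then-parse structure, here is a java sketch; it is not a port of expression-parser.xslt, and ExpressionParser and its methods are hypothetical names. Pattern.COMMENTS plays the role of flags="x", and the precedence used (or, then and, then comparison, then *) is an assumption chosen so that the example expression yields the tree shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: regex tokenizer + recursive descent parser producing the xml tree.
// Assumed precedence, lowest first: or, and, comparison (= > <), *, primary.
class ExpressionParser
{
  // Pattern.COMMENTS is java's counterpart of flags="x": whitespace and
  // #-comments inside the pattern are ignored, so the regex can be laid out.
  private static final Pattern TOKEN = Pattern.compile(
      "  (?<ws>\\s+)                          # whitespace, skipped\n"
    + "| (?<decimal>\\d+\\.\\d+)              # decimal literal\n"
    + "| (?<integer>\\d+)                     # integer literal\n"
    + "| (?<name>[\\p{L}_][\\p{L}\\p{Nd}_]*)  # identifier or keyword\n"
    + "| (?<op>[()=<>*])                      # operators, parentheses\n",
    Pattern.COMMENTS);

  private final List<String[]> tokens = new ArrayList<>(); // {kind, text}
  private int pos = 0;

  private ExpressionParser(String text)
  {
    Matcher m = TOKEN.matcher(text);
    for (int i = 0; i < text.length(); i = m.end())
    {
      m.region(i, text.length());
      if (!m.lookingAt()) throw new IllegalArgumentException("Unexpected character at " + i);
      String t = m.group();
      if (m.group("decimal") != null) tokens.add(new String[] { "decimal", t });
      else if (m.group("integer") != null) tokens.add(new String[] { "integer", t });
      else if (m.group("op") != null) tokens.add(new String[] { "op", t });
      else if (m.group("name") != null)
        tokens.add(new String[] { t.equals("and") || t.equals("or") ? "op" : "identifier", t });
      // whitespace yields no token
    }
  }

  public static String parse(String text)
  {
    ExpressionParser parser = new ExpressionParser(text);
    String tree = parser.parseOr();
    if (parser.pos != parser.tokens.size()) throw new IllegalArgumentException("Unexpected token at " + parser.pos);
    return tree;
  }

  // Left-associative chain: next (op next)*
  private String chain(Supplier<String> next, String op, String tag)
  {
    String left = next.get();
    while (accept(op)) left = "<" + tag + ">" + left + next.get() + "</" + tag + ">";
    return left;
  }

  private String parseOr() { return chain(this::parseAnd, "or", "or"); }
  private String parseAnd() { return chain(this::parseComparison, "and", "and"); }
  private String parseMul() { return chain(this::parsePrimary, "*", "mul"); }

  private String parseComparison()
  {
    String left = parseMul();
    String[][] operators = { { "=", "eq" }, { ">", "gt" }, { "<", "lt" } };
    for (String[] op : operators)
    {
      if (accept(op[0])) return "<" + op[1] + ">" + left + parseMul() + "</" + op[1] + ">";
    }
    return left;
  }

  private String parsePrimary()
  {
    if (accept("("))
    {
      String inner = parseOr();
      if (!accept(")")) throw new IllegalArgumentException("')' expected");
      return inner;
    }
    if (pos >= tokens.size() || tokens.get(pos)[0].equals("op")) throw new IllegalArgumentException("Operand expected at token " + pos);
    String[] token = tokens.get(pos++);
    return "<" + token[0] + ">" + token[1] + "</" + token[0] + ">";
  }

  // Consumes the next token if it is the given operator or keyword.
  private boolean accept(String op)
  {
    boolean ok = pos < tokens.size() && tokens.get(pos)[0].equals("op") && tokens.get(pos)[1].equals(op);
    if (ok) pos++;
    return ok;
  }
}
```

For example, ExpressionParser.parse("(a > 7 or a= 3) and c * d = 2.2") returns the tree above as a single string without indentation.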

Use the following link to see the xslt: expression-parser.xslt.
To see how to generate java from xml, follow this link: Xslt for the jxom (Java xml object model), jxom.zip.

Tuesday, 12 August 2008 14:45:54 UTC  #    Comments [2] -
xslt
# Thursday, 31 July 2008

Yesterday I incidentally ran into the problem of a dynamic error during evaluation of a template's match pattern. This reminded me of SFINAE in C++, where a similar principle is applied at compile time to find a matching template.

I think people underestimate the meaning of this behaviour. The effect of dynamic errors occurring during pattern evaluation is described in the specification:

Any dynamic error or type error that occurs during the evaluation of a pattern against a particular node is treated as a recoverable error even if the error would not be recoverable under other circumstances. The optional recovery action is to treat the pattern as not matching that node.

This has far-reaching consequences, like error recovery. To illustrate what I'm talking about, please look at this simple stylesheet that recovers from "Division by zero.":

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xsl:template match="/">
  <xsl:variable name="operator" as="element()+">
    <div divident="10" divisor="0"/>
    <div divident="10" divisor="2"/>
  </xsl:variable>

  <xsl:apply-templates select="$operator"/>
</xsl:template>

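<!-- NaN compares unequal to every value, so "ne $NaN" is true whenever the division itself succeeds. -->
<xsl:param name="NaN" as="xs:double" select="xs:double('NaN')"/>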

<xsl:template
  match="div[(xs:integer(@divident) div xs:integer(@divisor)) ne $NaN]">
  <xsl:message select="xs:integer(@divident) div xs:integer(@divisor)"/>
</xsl:template>

<xsl:template match="div">
  <xsl:message select="'Division by zero.'"/>
</xsl:template>

</xsl:stylesheet>

Here, if there is a division by zero, the pattern of the first template raises an error, so that template does not match and the other template is selected; thus the second template serves as an error handler for the first one. Definitely, one may define much more complex constructs to be handled this way.
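The same recovery rule can be modelled outside xslt. In this java sketch (PatternDispatch, Rule, and DIV_RULES are hypothetical names), rules are tried in priority order, and an exception thrown while evaluating a rule's match condition simply makes the rule not match, so the next rule acts as the error handler. Unlike an xslt processor, which may also choose to report such an error, this sketch always recovers.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch of "a dynamic error in a match means no match": rules are tried in
// priority order; an exception thrown by a match condition skips the rule.
class PatternDispatch
{
  static final class Rule<T>
  {
    final Predicate<T> match;
    final Function<T, String> action;

    Rule(Predicate<T> match, Function<T, String> action)
    {
      this.match = match;
      this.action = action;
    }
  }

  static <T> String apply(T item, List<Rule<T>> rules)
  {
    for (Rule<T> rule : rules)
    {
      boolean matched;

      try
      {
        matched = rule.match.test(item);
      }
      catch (RuntimeException e)
      {
        matched = false; // recovery action: treat the pattern as not matching
      }

      if (matched)
      {
        return rule.action.apply(item);
      }
    }

    throw new IllegalStateException("No rule matches.");
  }

  // Mirrors the stylesheet: an item is {dividend, divisor}. The first condition
  // evaluates the division (ArithmeticException for 0) and is otherwise always
  // true, like "ne $NaN".
  static final List<Rule<int[]>> DIV_RULES = List.of(
    new Rule<int[]>(d -> d[0] / d[1] >= Integer.MIN_VALUE, d -> String.valueOf(d[0] / d[1])),
    new Rule<int[]>(d -> true, d -> "Division by zero."));

  public static void main(String[] args)
  {
    System.out.println(apply(new int[] { 10, 0 }, DIV_RULES)); // Division by zero.
    System.out.println(apply(new int[] { 10, 2 }, DIV_RULES)); // 5
  }
}
```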

I was never a purist (meaning doing everything in xslt); however, this example, along with indirect function calls, shows that xslt is a rather well-equipped language. One just needs to be smart enough to figure out how to do things.

See also: Try/catch block in xslt 2.0 for Saxon 9.

Thursday, 31 July 2008 11:52:21 UTC  #    Comments [0] -
Tips and tricks | xslt
# Monday, 28 July 2008

Among other job activities, we're from time to time asked to check technical skills of job applicants.

Several times we have interviewed people whose skills were far below an acceptable professional level. It's a torment for both sides, I should say.

To ease things, we have designed a small questionnaire (specific to our projects) for job applicants. It's sent to an applicant before the meeting. Even partially answered, this questionnaire is a good filter against the unqualified:

<questionnaire>
  <item>
    <question>
      Please estimate your knowledge in XML Schema (xsd) as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in xslt 2.0/xquery 1.0 as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in xslt 1.0 as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in java as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in c# as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please estimate your knowledge in sql as lacking, bad, good, or perfect.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For logical values A, B, please rewrite logical expression "A and B" using operator "or".
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For logical values A, B, please rewrite logical expression "A = B" using operators "and" and "or".
    </question>
    <answer/>
  </item>
  <item>
    <question>
      There are eight balls, only one of which is heavier than the others.
      What is the minimum number of weighings that reveals the heavier ball?
      Please be suspicious about the "trivial" solution.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      If A results in B, what can one say about the reason for B?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      If only A or B results in C, what can one say about the reason for C?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please define an xml schema for this questionnaire.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Please create a simple stylesheet creating an html table based on this questionnaire.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For a table A with columns B, C, and D, please create an sql query selecting B grouped by C and ordered by D.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      For a sequence of xml elements A with attribute B, please write a stylesheet excerpt creating a sequence of elements D, grouping elements A with the same string value of attribute B, sorted in ascending order of B.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Having a java class A with properties B and C, please sort a collection of A by B in ascending, and by C in descending order.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      What does the following line mean in c#?
      int? x;
    </question>
    <answer/>
  </item>
  <item>
    <question>
      What is a parser?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      How do you issue an error in an xml stylesheet?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      What is a lazy evaluation?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      How do you understand the following sentence?
      For each line of code there should be a comment.
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Have you used any supplemental information to answer these questions?
    </question>
    <answer/>
  </item>
  <item>
    <question>
      Have you independently answered these questions?
    </question>
    <answer/>
  </item>
</questionnaire>

Monday, 28 July 2008 10:54:54 UTC  #    Comments [0] -
Tips and tricks | xslt
# Thursday, 10 July 2008

I've found that the proposal to introduce tuples and maps into the xslt/xquery type system has not found support:

At the joint meeting of the XSL and XQuery Working groups 2008-06-23 it was decided that a change of this nature would be too large for the next "point" release of the Recommendations. The request for new functionality will be considered for a future "main" release.

Boor> *****!

Pessimist> Ah, there won't be tuples and maps in xslt/xquery...

Optimist> Wow, chances are good to see this addition by the year 2018!

Thursday, 10 July 2008 05:54:52 UTC  #    Comments [0] -
xslt
# Thursday, 03 July 2008

Today Michael Kay has announced an update for the Saxon processor. The latest version for now is 9.1.

I've checked our saxon.extensions, and have fixed the incompatibilities.

The source for the new version of Saxon can be found at http://saxon.sourceforge.net/.

New features are discussed at: http://www.saxonica.com/documentation/changes/intro.html

Our extensions can be found at saxon.extensions.9.1.zip.

Thursday, 03 July 2008 13:47:13 UTC  #    Comments [0] -
xslt
# Thursday, 26 June 2008

We are designing a rather complex xslt 2.0 application dealing with semistructured data. We must tolerate errors during processing, as there are cases where the input is not perfectly valid (or the program is not designed or ready for such input).

The most typical error is an unsatisfied expectation about the tree structure, like:
  <xsl:variable name="element" as="element()" select="some-element"/>

Obviously, a dynamic error occurs if the specified element is not present. To concentrate on the primary logic, and to avoid the burden of recovering from illegal (unexpected) cases, we have created a try/catch API. The goals of this API are:

  • to be able to continue processing in case of an error;
  • to report as much useful information related to an error as possible.

Alternatives:

Do not think it was arrogance that led us to create a custom API. No, we were looking for alternatives! Please see the [xsl] saxon:try() discussion:

  • saxon:try() function: a kind of pseudo-function, which explicitly relies on lazy evaluation of its arguments, and ... it's not available in SaxonB;
  • ex:error-safe extension instruction: far from perfect in its implementation quality, and provides no error location.

We had no other choice but to design this feature ourselves. In our defence, one can say that we are using an innovative approach that encapsulates the implementation details behind a template and calls handlers indirectly.

Use:

The try/catch API is designed as a template <xsl:template name="t:try-block"/> that calls a "try" handler and, if required, a "catch" handler, using the <xsl:apply-templates mode="t:call"/> instruction. The caller passes any information to these handlers by means of tunnel parameters.

Handlers must be in the "t:call" mode. The "catch" handler may receive the following error info parameters:

<xsl:param name="error" as="xs:QName"/>
<xsl:param name="error-description" as="xs:string"/>
<xsl:param name="error-location" as="item()*"/>

where $error-location is a sequence of pairs (location as xs:string, context as item())*.

A sample:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/public/"
  exclude-result-prefixes="xs t">

<xsl:include href="try-block.xslt"/>

<xsl:template match="/">
  <result>
    <xsl:for-each select="1 to 10">
      <xsl:call-template name="t:try-block">
        <xsl:with-param name="value" tunnel="yes" select=". - 5"/>
        <xsl:with-param name="try" as="element()">
          <try/>
        </xsl:with-param>
        <xsl:with-param name="catch" as="element()">
          <t:error-handler/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:for-each>
  </result>
</xsl:template>

<xsl:template mode="t:call" match="try">
  <xsl:param name="value" tunnel="yes" as="xs:decimal"/>

  <value>
    <xsl:sequence select="1 div $value"/>
  </value>
</xsl:template>

</xsl:stylesheet>

The sample prints values according to the formula "1/(i - 5)", where "i" is a variable varying from 1 to 10. Clearly, division by zero occurs when "i" is equal to 5.

Please notice how the try/catch API is accessed through <xsl:include href="try-block.xslt"/>. The main logic is executed in <xsl:template mode="t:call" match="try"/>, which receives parameters through tunneling. The default error handler <t:error-handler/> is used to report errors.

Error report:

Error: FOAR0001
Description:
Decimal divide by zero

Location:
1. systemID: "file:///D:/style/try-block-test.xslt", line: 34
2. template mode="t:call" match="element(try, xs:anyType)"
  systemID: "file:///D:/style/try-block-test.xslt", line: 30
  context node:
    /*[1][local-name() = 'try']
3. template mode="t:call"
  match="element({http://www.nesterovsky-bros.com/xslt/private/try-block}try, xs:anyType)"
  systemID: "file:///D:/style/try-block.xslt", line: 53
  context node:
    /*[1][local-name() = 'try']
4. systemID: "file:///D:/style/try-block.xslt", line: 40
5. call-template name="t:try-block"
  systemID: "file:///D:/style/try-block-test.xslt", line: 17
6. for-each
  systemID: "file:///D:/style/try-block-test.xslt", line: 16
  context item: 5
7. template mode="saxon:_defaultMode" match="document-node()"
  systemID: "file:///D:/style/try-block-test.xslt", line: 14
  context node:
    /

Implementation details:

You were not expecting this API to be pure xslt, were you? :-)

Well, you're right, there is an extension function. Its pseudo code is like this:

function tryBlock(tryItems, catchItems)
{
  try
  {
    execute xsl:apply-templates for tryItems.
  }
  catch
  {
    execute xsl:apply-templates for catchItems.
  }
}
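For comparison, the control flow of this pseudocode can be written in plain java. In this sketch the handlers are lambdas rather than templates in mode "t:call", and the catch handler receives only the exception, not the error/description/location parameters; TryBlock, tryBlock, and sample are hypothetical names. BigDecimal division is used because, like xs:decimal division, it raises an error on division by zero, whereas java double division yields Infinity.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of the tryBlock control flow: run the "try" handler; if it fails,
// run the "catch" handler with the error and use its result instead.
class TryBlock
{
  static <T> List<T> tryBlock(Supplier<List<T>> tryHandler,
    Function<Throwable, List<T>> catchHandler)
  {
    try
    {
      return tryHandler.get();
    }
    catch (RuntimeException e)
    {
      return catchHandler.apply(e);
    }
  }

  // The sample: 1 div (i - 5) for i in 1 to 10, with an error item for i = 5.
  static List<String> sample()
  {
    List<String> result = new ArrayList<>();

    for (int i = 1; i <= 10; i++)
    {
      // BigDecimal, like xs:decimal, fails on division by zero.
      BigDecimal value = BigDecimal.valueOf(i - 5);

      List<String> item = tryBlock(
        () -> List.of("<value>" + BigDecimal.ONE.divide(value, MathContext.DECIMAL64) + "</value>"),
        error -> List.of("<error>" + error.getMessage() + "</error>"));

      result.addAll(item);
    }

    return result;
  }

  public static void main(String[] args)
  {
    sample().forEach(System.out::println);
  }
}
```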

 

Finally, get the implementation from saxon.extensions.zip, where you will find the sources of the try/catch and tuples/maps APIs.

Thursday, 26 June 2008 09:18:50 UTC  #    Comments [0] -
Announce | Tips and tricks | xslt
# Tuesday, 17 June 2008

Right now we live in the java world, so all our tasks are (in)directly related to this environment.

We want to store stylesheets as resources of a java application, and at the same time to refer to these stylesheets without jar qualification. In .NET this idea would not arise at all, as there are well-defined boundaries between assemblies, but java uses a rather different approach: whenever you have a resource name, it's up to the ClassLoader to find this resource. To exploit this feature we've created a uri resolver for stylesheet transformations. The protocol we use has the following format: "resource:/resource-path".

For example, to store stylesheets in the META-INF/stylesheets folder we use the uri "resource:/META-INF/stylesheets/java/main.xslt". Relative paths are resolved naturally: the path "../jxom/java-serializer.xslt" in the previously mentioned stylesheet is resolved to "resource:/META-INF/stylesheets/jxom/java-serializer.xslt".
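This relative resolution relies on standard java.net.URI behaviour: a "resource:/..." URI is hierarchical because its path starts with '/', so URI.resolve merges and normalizes paths just as for any other scheme. A quick check (ResourceUriDemo is a hypothetical name):

```java
import java.net.URI;

// Checks how java.net.URI resolves a relative href against a "resource:" base.
class ResourceUriDemo
{
  static URI resolve(String base, String href)
  {
    // "resource:/..." is hierarchical (its path starts with '/'),
    // so resolve() merges and normalizes the paths.
    return URI.create(base).resolve(href);
  }

  public static void main(String[] args)
  {
    URI uri = resolve("resource:/META-INF/stylesheets/java/main.xslt", "../jxom/java-serializer.xslt");

    System.out.println(uri);             // resource:/META-INF/stylesheets/jxom/java-serializer.xslt
    System.out.println(uri.getScheme()); // resource
    System.out.println(uri.getPath());   // /META-INF/stylesheets/jxom/java-serializer.xslt
  }
}
```

Note that a URIResolver is consulted for xsl:include, xsl:import, and document(), not for the principal stylesheet passed to TransformerFactory.newTemplates(). One way to load main.xslt itself from the class path is to call the resolver directly, e.g. new ResourceURIResolver().resolve("resource:/META-INF/stylesheets/java/main.xslt", null), and pass the returned Source to newTemplates().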

We've created a small class ResourceURIResolver. You need to supply an instance of TransformerFactory with this resolver:
  transformerFactory.setURIResolver(new ResourceURIResolver());

The class itself is so small that we quote it here:

import java.io.InputStream;

import java.net.URI;
import java.net.URISyntaxException;

import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;

import javax.xml.transform.stream.StreamSource;

/**
 * This class implements an interface that can be called by the processor
 * to turn a URI used in document(), xsl:import, or xsl:include into a
 * Source object.
 */
public class ResourceURIResolver implements URIResolver
{
  /**
   * Called by the processor when it encounters
   * an xsl:include, xsl:import, or document() function.
   *
   * This resolver supports protocol "resource:".
   * Format of uri is: "resource:/resource-path", where "resource-path" is an
   * argument of a {@link ClassLoader#getResourceAsStream(String)} call.
   * @param href - an href attribute, which may be relative or absolute.
   * @param base - a base URI against which the first argument will be made
   *   absolute if the absolute URI is required.
   * @return a Source object, or null if the href cannot be resolved, and
   *   the processor should try to resolve the URI itself.
   */
  public Source resolve(String href, String base)
    throws TransformerException
  {
    if (href == null)
    {
      return null;
    }

    URI uri;

    try
    {
      if (base == null)
      {
        uri = new URI(href);
      }
      else
      {
        uri = new URI(base).resolve(href);
      }
    }
    catch(URISyntaxException e)
    {
      // Unsupported uri.
      return null;
    }

    if (!"resource".equals(uri.getScheme()))
    {
      return null;
    }

    String resourceName = uri.getPath();

    if ((resourceName == null) || (resourceName.length() == 0))
    {
      return null;
    }

    if (resourceName.charAt(0) == '/')
    {
      resourceName = resourceName.substring(1);
    }

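    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();

    if (classLoader == null)
    {
      // Some threads have no context class loader; fall back to ours.
      classLoader = ResourceURIResolver.class.getClassLoader();
    }

    InputStream stream = classLoader.getResourceAsStream(resourceName);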

    if (stream == null)
    {
      return null;
    }

    return new StreamSource(stream, uri.toString());
  }
}

Tuesday, 17 June 2008 07:57:52 UTC  #    Comments [0] -
Tips and tricks | xslt
# Monday, 09 June 2008

We've uploaded an update for the jxom.

It has turned out that the jxom schema is so powerful that you can perform a great number of manipulations over the xml representation of a java program.

In our case this is the removal of unreachable code, as defined in Sun's specification. We're facing this problem as a result of translation from another, ancient language, which also has a well-defined xml schema.

We have also introduced the ability to annotate jxom elements (see the meta element), which in practice we use to annotate expressions with their types and to perform "compile time" expression evaluation.

You may download the updated jxom at the usual place.

See also: Java Xml Object Model.

Monday, 09 June 2008 06:47:54 UTC  #    Comments [0] -
xslt
# Sunday, 18 May 2008

Recently I proposed adding two new atomic types, tuple and map, to the xpath/xslt/xquery type system (see "Tuples and maps"). Later I implemented a pure xslt approximation of tuples and maps. Now I want to present a java implementation for Saxon.

I've created the TupleValue and MapValue atomic types, and a Collections class exposing the extension function api. This api is easy to use. I'll repeat an example that I showed earlier:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="http://www.nesterovsky-bros.com/xslt/functions/public"
  xmlns:p="http://www.nesterovsky-bros.com/xslt/functions/private"
  xmlns:c="java:com.nesterovskyBros.saxon.Functions"
  exclude-result-prefixes="xs f p c">

<xsl:template match="/">
  <root>
    <xsl:variable name="tuples" as="item()*" select="
      for $i in 1 to 20
        return c:tuple(1 to $i)"/>

    <total-items>
      <xsl:sequence select="
        sum
        (
          for $tuple in $tuples return
            count(c:tuple-items($tuple))
        )"/>
    </total-items>
    <tuples-size>
      <xsl:sequence select="count($tuples)"/>
    </tuples-size>
    <sums-per-tuples>
      <xsl:for-each select="$tuples">
        <xsl:variable name="index" as="xs:integer" select="position()"/>

        <sum index="{$index}" value="{sum(c:tuple-items(.))}"/>
      </xsl:for-each>
    </sums-per-tuples>

    <xsl:variable name="cities" as="element()*">
      <city name="Jerusalem" country="Israel"/>
      <city name="London" country="Great Britain"/>
      <city name="Paris" country="France"/>
      <city name="New York" country="USA"/>
      <city name="Moscow" country="Russia"/>
      <city name="Tel Aviv" country="Israel"/>
      <city name="St. Petersburg" country="Russia"/>
    </xsl:variable>

    <xsl:variable name="map" as="item()" select="
      c:map
      (
        for $city in $cities return
        (
          $city/string(@country),
          $city
        )
      )"/>

    <xsl:for-each select="c:map-keys($map)">
      <xsl:variable name="key" as="xs:string" select="."/>

      <country name="{$key}">
        <xsl:sequence select="c:map-value($map, $key)"/>
      </country>
    </xsl:for-each>
  </root>
</xsl:template>

</xsl:stylesheet>

Download java source.

P.S. I wish this api were integrated into Saxon, as at present java extension functions are called through reflection.
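To make the expected output of the stylesheet easy to check, here is a sketch of the same computations with plain java collections. It does not use the Saxon extension: a tuple is modelled as a List kept as one item, and the map as a LinkedHashMap from country to city names. That values sharing a key (Israel, Russia) are grouped together is an assumption about what c:map-value returns, and key order here is simply insertion order; TupleMapDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain java collections, not the Saxon extension: a "tuple" is a List kept as
// one item, and the "map" groups city names under their country.
class TupleMapDemo
{
  // for $i in 1 to 20 return c:tuple(1 to $i)
  static List<List<Integer>> tuples()
  {
    List<List<Integer>> tuples = new ArrayList<>();

    for (int i = 1; i <= 20; i++)
    {
      tuples.add(IntStream.rangeClosed(1, i).boxed().collect(Collectors.toList()));
    }

    return tuples;
  }

  // sum(for $tuple in $tuples return count(c:tuple-items($tuple)))
  static int totalItems(List<List<Integer>> tuples)
  {
    return tuples.stream().mapToInt(List::size).sum();
  }

  // Assumption: values sharing a key are kept together, keys in insertion order.
  static Map<String, List<String>> citiesByCountry(String[][] cities)
  {
    Map<String, List<String>> map = new LinkedHashMap<>();

    for (String[] city : cities)
    {
      map.computeIfAbsent(city[1], country -> new ArrayList<>()).add(city[0]);
    }

    return map;
  }

  public static void main(String[] args)
  {
    String[][] cities = {
      { "Jerusalem", "Israel" }, { "London", "Great Britain" }, { "Paris", "France" },
      { "New York", "USA" }, { "Moscow", "Russia" }, { "Tel Aviv", "Israel" },
      { "St. Petersburg", "Russia" } };
    Map<String, List<String>> map = citiesByCountry(cities);

    System.out.println(totalItems(tuples())); // 210, i.e. 1 + 2 + ... + 20
    System.out.println(tuples().size());      // 20
    System.out.println(map.keySet());         // [Israel, Great Britain, France, USA, Russia]
    System.out.println(map.get("Israel"));    // [Jerusalem, Tel Aviv]
  }
}
```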

Sunday, 18 May 2008 08:44:09 UTC  #    Comments [2] -
Announce | xslt
# Monday, 12 May 2008

Today I've found another new language (working draft in fact). It's an XML Pipeline Language.

XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.

An XML Pipeline specifies a sequence of operations to be performed on zero or more XML documents. Pipelines generally accept zero or more XML documents as input and produce zero or more XML documents as output. Pipelines are made up of simple steps which perform atomic operations on XML documents and constructs similar to conditionals, iteration, and exception handlers which control which steps are executed.

Experience shows that language invention has been an essential part of the computer industry from the very beginning, however...

I must confess I'm rather reluctant to accept any new language: I was happy with C++, but then all these new languages like Delphi, Java, C#, and so many others started to appear. It's correct to say that there is no efficient universal language; however, I think it's wrong to say that a domain-specific language is required to solve a particular problem in the most efficient way.

And now a question to the point: why do you need a new language for describing operations to be performed on XML documents?

Monday, 12 May 2008 07:11:58 UTC  #    Comments [0] -
xslt
# Friday, 09 May 2008
St. George ribbon

A project I'm currently working on requires me to work with a large number of documents. This includes accessing these documents with the key() function.

I never thought this task would pose any problem, until I discovered that Saxon caches documents loaded with the document() function to preserve their identities:

By default, this function is ·stable·. Two calls on this function return the same document node if the same URI Reference (after resolution to an absolute URI Reference) is supplied to both calls. Thus, the following expression (if it does not raise an error) will always be true:

doc("foo.xml") is doc("foo.xml")

However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of stability. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a stable result or must raise an error: [err:FODC0003].

Saxon provides a saxon:discard-document() function to release documents from cache. The use case is like this:

<xsl:variable name="document" as="document-node()"
   select="saxon:discard-document(document(...))"/>

As you can see, saxon:discard-document() is bound to the place where a document is loaded. In my case this is inefficient, as my code repeatedly accesses documents from different places. To release loaded documents, I need to collect them after the main processing.

Another issue in Saxon is that the processor may keep document references through xsl:key, so saxon:discard-document() does not guarantee that documents will be garbage collected.

To deal with this, I've designed a (Saxon-specific) api to manage document pools:

t:begin-document-pool-scope() as item()
  Begins document pool scope.
    Returns scope id.

t:end-document-pool-scope(scope as item())
  Terminates document pool scope.
    $scope - scope id.

t:put-document-in-pool(document as document-node()) as document-node()
  Puts a document into a current scope of document pool.
    $document - a document to put into the document pool.
    Returns the same document node.

The use case is:

<xsl:variable name="scope" select="t:begin-document-pool-scope()"/>

<xsl:sequence select="t:assert($scope)"/>

...
<xsl:variable name="document" as="document-node()"
  select="t:put-document-in-pool(...)"/>
...

<xsl:sequence select="t:end-document-pool-scope($scope)"/>

Download document-pool.xslt to use this api.

Friday, 09 May 2008 06:58:29 UTC
xslt
# Saturday, 03 May 2008

I have already written about the logical difference between templates and functions. This time I've realized another, technical one. It's related to lazy evaluation, which the language specification permits.

I was arguing as follows:

  • suppose you define a function returning a sequence;
  • at its final step, this function constructs a document using xsl:result-document;
  • the caller invokes this function and uses only the first item of the sequence;
  • lazy evaluation allows the xslt processor to calculate only the first item, and thus to skip creating the output document altogether.

This conclusion looked ridiculous to me, as it means that I cannot rely on documents built with the xsl:result-document instruction actually being created.

To resolve the issue I checked the specification. Someone had already thought of this. This is what the specification says:

[Definition: Each instruction in the stylesheet is evaluated in one of two possible output states: final output state or temporary output state].

[Definition: The first of the two output states is called final output state. This state applies when instructions are writing to a final result tree.]

[Definition: The second of the two output states is called temporary output state. This state applies when instructions are writing to a temporary tree or any other non-final destination.]

The instructions in the initial template are evaluated in final output state. An instruction is evaluated in the same output state as its calling instruction, except that xsl:variable, xsl:param, xsl:with-param, xsl:attribute, xsl:comment, xsl:processing-instruction, xsl:namespace, xsl:value-of, xsl:function, xsl:key, xsl:sort, and xsl:message always evaluate the instructions in their contained sequence constructor in temporary output state.

[ERR XTDE1480] It is a non-recoverable dynamic error to evaluate the xsl:result-document instruction in temporary output state.

As you can see, xsl:function is always evaluated in temporary output state, and cannot contain xsl:result-document, in contrast to xsl:template, which may be evaluated in final output state. This difference dictates the role of templates as "top-level functions" and of functions as standalone algorithms.
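To make the rule concrete, here is a minimal XSLT 2.0 sketch. The namespace http://www.example.com/t, the template name t:write-report and the file name report.xml are hypothetical. The initial template calls a named template, which therefore runs in final output state and may use xsl:result-document. Moving the same call into an xsl:variable (one of the instructions listed above) switches to temporary output state and should raise XTDE1480 once the variable is actually evaluated.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:t="http://www.example.com/t"
  exclude-result-prefixes="t">

  <!-- Initial template: evaluated in final output state. -->
  <xsl:template match="/">
    <xsl:call-template name="t:write-report"/>
    <done/>
  </xsl:template>

  <!--
    Hypothetical named template. It inherits final output state from
    its caller, so xsl:result-document is permitted here.
  -->
  <xsl:template name="t:write-report">
    <xsl:result-document href="report.xml">
      <report/>
    </xsl:result-document>
  </xsl:template>

  <!--
    The same call made in temporary output state fails with XTDE1480
    when the variable is evaluated:

    <xsl:variable name="temp">
      <xsl:call-template name="t:write-report"/>
    </xsl:variable>
  -->
</xsl:stylesheet>
```

Note that the failing variant only fails if the processor evaluates the variable at all, which is the lazy evaluation question again.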

You can find more on the subject in "Lazy evaluation and predicted results".

Saturday, 03 May 2008 16:36:38 UTC
xslt
# Wednesday, 30 April 2008

In the era of parallel processing it's so natural to enroll your favorite programming language in the league of "multithreading supporters". I've seen such appeals before, e.g. "Wide Finder in XSLT --> deriving new requirements for efficiency in XSLT processors."

... I am not aware of any XSLT implementation that provides explicit or implicit support for parallel processing (with the obvious goal to take advantage of the multi-core processors that have almost reached a "prevalent" status today) ...

I think both xslt and xquery are well suited for parallel processing in terms of their type systems. This is because of the "immutable" nature (until recent additions) of the execution state, which prevents many race conditions. The only missing ingredients are indirect function calls and a couple of core functions to queue parallel tasks.

Suppose there is a type to encapsulate a function call (say function-id), and a function accepting a sequence and a function-id. This function calls the function-id for each item of the sequence in parallel, and then combines the final result as if it were computed serially.

Pretty simple, isn't it?

<!--
  This function runs $id function for each item in a sequence.
    $items - items to process.
    $id - function id.
    Returns a sequence of results of calls to $id function.
-->
<xsl:function name="x:queue-tasks" as="item()*">
  <xsl:param name="items" as="item()*"/>
  <xsl:param name="id" as="x:function-id"/>

  <!-- The pseudo code. -->
  <xsl:sequence select="$items/call $id (.)"/>
</xsl:function>
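XSLT 3.0 later added function items, which supply the "indirect function call" ingredient. Below is a minimal sketch of the same contract, assuming an XSLT 3.0 processor; the x namespace URI is hypothetical. The built-in fn:for-each applies a function item to every item and returns the results in input order. It does not request threads: whether the calls run in parallel is left to the processor.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:x="http://www.example.com/x"
  exclude-result-prefixes="xs x">

  <!--
    Calls $f for each item and combines the results in input order.
    $f has no side effects, so the calls are independent of each other.
  -->
  <xsl:function name="x:queue-tasks" as="item()*">
    <xsl:param name="items" as="item()*"/>
    <xsl:param name="f" as="function(item()) as item()*"/>

    <xsl:sequence select="for-each($items, $f)"/>
  </xsl:function>

  <xsl:template match="/">
    <squares>
      <!-- An inline function plays the role of the function-id. -->
      <xsl:value-of select="x:queue-tasks(1 to 5, function($i) { $i * $i })"/>
    </squares>
  </xsl:template>
</xsl:stylesheet>
```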

Wednesday, 30 April 2008 06:59:29 UTC
xslt
# Saturday, 05 April 2008

Yesterday's idea inspired me enough to create a prototype implementation of map and tuple in xslt 2.0.

I definitely wish these were built-in types, treated as atomic values for the purposes of comparison and iteration. That way it would be possible to create highly efficient grouping by several fields at once.

This pure implementation (xslt-tuple.zip) is rather sketchy, however it gives a feel for what can be done with tuples and maps. A good example may say more than many words, so enjoy:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="http://www.nesterovsky-bros.com/xslt/functions"
  exclude-result-prefixes="xs f">

  <xsl:include href="tuple.xslt"/>
  <xsl:include href="map.xslt"/>

  <xsl:template match="/">
    <root>
      <xsl:variable name="tuples" as="item()*" select="
        f:tuple
        (
          for $i in 1 to 10 return
            f:tuple(1 to $i)
        )"/>

      <total-items>
        <xsl:sequence select="count($tuples)"/>
      </total-items>
      <tuples-size>
        <xsl:sequence select="f:tuple-size($tuples)"/>
      </tuples-size>
      <sums-per-tuples>
        <xsl:for-each select="1 to f:tuple-size($tuples)">
          <xsl:variable name="index" as="xs:integer" select="position()"/>

          <sum
            index="{$index}"
            value="{sum(f:tuple-items(f:tuple-item($tuples, $index)))}"/>
        </xsl:for-each>
      </sums-per-tuples>

      <xsl:variable name="cities" as="element()*">
        <city name="Jerusalem" country="Israel"/>
        <city name="London" country="Great Britain"/>
        <city name="Paris" country="France"/>
        <city name="New York" country="USA"/>
        <city name="Moscow" country="Russia"/>
        <city name="Tel Aviv" country="Israel"/>
        <city name="St. Petersburg" country="Russia"/>
      </xsl:variable>

      <xsl:variable name="map" as="item()*" select="
        f:map
        (
          for $city in $cities return
            ($city/string(@country), $city)
        )"/>

      <xsl:for-each select="f:map-keys($map)">
        <xsl:variable name="key" as="xs:string" select="."/>

        <country name="{$key}">
          <xsl:sequence select="f:map-value($map, $key)"/>
        </country>
      </xsl:for-each>
    </root>
  </xsl:template>

</xsl:stylesheet>

Saturday, 05 April 2008 15:49:03 UTC
xslt
# Friday, 04 April 2008

The type system of xslt 2.0 is not complete (see Sequence of sequences in xslt 2.0). You cannot manipulate items as freely as you might wish. The reason is the lack of set-based constructs: xslt 2.0 supports sequences, but not associative maps of items.

If you think that xml can be used as a good approximation of a map, I won't agree with you. Xml is applicable in very specific cases only. The maps I'm thinking of would associate items by reference, like sequences do.

This opens a way to create state objects, to manage sequences of sequences, to build cyclic graphs of items, and so on. Such maps are richer than what the key() function provides right now, and would allow for-each-group to be implemented in xquery.

Such maps can be modeled with several functions, however I wish they were built in:

f:map($items as item()*) as item()
Returns a map from a sequence $items of pairs (key, value).

f:map-items($map as item()) as item()*
Returns a sequence of pairs (key, value) for a map $map.

f:map-keys($map as item()) as item()*
Returns a sequence of keys contained in a map $map.

f:map-values($map as item()) as item()*
Returns a sequence of values contained in a map $map.

f:map-value($map as item(), $key as item()) as item()*
Returns a sequence of values corresponding to a specified key $key contained in a specified map $map.

The other thing I would add is an item tuple. It's like a sequence, however a sequence of tuples is never flattened into a single sequence, but stays a sequence of tuples.

Fortunately it's possible to implement such extension functions.
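For comparison, XSLT 3.0 with XPath 3.1 later made maps built in. The sketch below maps roughly onto the wished-for functions: map:merge plays the role of f:map, map:keys of f:map-keys, and map:get of f:map-value. It is an illustration assuming an XSLT 3.0 processor and a shortened city list, not a port of xslt-tuple.zip. Keys must be atomic values; values may be any sequence, and nodes stored there keep their identity.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  exclude-result-prefixes="xs map">

  <xsl:template match="/">
    <xsl:variable name="cities" as="element()*">
      <city name="Jerusalem" country="Israel"/>
      <city name="London" country="Great Britain"/>
      <city name="Tel Aviv" country="Israel"/>
    </xsl:variable>

    <!-- Country => cities; "combine" keeps every value for a repeated key. -->
    <xsl:variable name="map" as="map(xs:string, element()*)" select="
      map:merge
      (
        for $city in $cities return
          map:entry(string($city/@country), $city),
        map { 'duplicates': 'combine' }
      )"/>

    <root>
      <xsl:for-each select="sort(map:keys($map))">
        <country name="{.}">
          <xsl:sequence select="map:get($map, .)"/>
        </country>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```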

Friday, 04 April 2008 13:49:56 UTC
xslt
# Wednesday, 02 April 2008

xslt 2.0 is a beautiful language, and at the same time it allows constructs that may puzzle anyone.

Look at this valid stylesheet:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="x" as="node()" select="."/>
    <xsl:variable name="x" as="xs:int" select="***"/>

    <xsl:sequence select="$x"/>
  </xsl:template>

</xsl:stylesheet>

Fun, isn't it? :-)

Wednesday, 02 April 2008 05:45:28 UTC
xslt
# Monday, 31 March 2008

Earlier I was thinking about the difference between named templates and functions in xslt 2.0, and did not find a satisfactory criterion for deciding which to use in each case. I was not the first one troubled by this, see stylesheet functions or named templates.

To keep things simple, I deliberately decided to use functions whenever possible, avoid named templates completely, and use matching templates to apply logic depending on context (something like a virtual function). I forgot about the issue until yesterday. To realize the difference one should stop theorizing about it and instead start solving practical xslt tasks; if there is any difference beyond syntax, it will manifest itself somehow.

To make things obvious to those whose programming roots are in a language like C++, I shall compare xsl:function with a free-standing (or static) C++ function, and a named xsl:template with a C++ member function. In C++ you can use free-standing and member functions interchangeably, however if there is one argument (among others) whose state transition the function represents, then it's preferable to define it as a member function. The most important difference between these two kinds of functions is that a member function has a hidden argument, "this", and is able to access its private state.

Please do not think I'm going to compare the template context item in xslt 2.0 with "this" in C++; quite the opposite, I consider the context item a part of the state. I'm arguing, however, about private state that can be passed through a template call chain with tunnel parameters. Think of a call tunneling some state (like options, flags, values), with that state accessed several levels deep in the call hierarchy whenever one needs it. You cannot do this with xsl:function: you cannot pass all the private state through a function call, because you simply do not know about it.
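A minimal XSLT 2.0 sketch of such tunneling (the t namespace URI, the template names and the $verbose option are hypothetical). The entry template sets $verbose once; t:outer neither declares nor forwards it, yet t:inner, two levels down, receives it by name. With xsl:function the same option would have to appear explicitly in every signature along the chain.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.example.com/t"
  exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <!-- The caller sets the option once, as a tunnel parameter. -->
    <xsl:call-template name="t:outer">
      <xsl:with-param name="verbose" tunnel="yes" as="xs:boolean"
        select="true()"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Knows nothing about $verbose, yet passes it on implicitly. -->
  <xsl:template name="t:outer">
    <outer>
      <xsl:call-template name="t:inner"/>
    </outer>
  </xsl:template>

  <!-- Two levels deep, the tunneled value is picked up by name. -->
  <xsl:template name="t:inner">
    <xsl:param name="verbose" tunnel="yes" as="xs:boolean"
      select="false()"/>

    <inner verbose="{$verbose}"/>
  </xsl:template>
</xsl:stylesheet>
```

The result should be <outer><inner verbose="true"/></outer>; without the tunnel parameter t:inner would fall back to its default, false().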

So my answer to the tacit question is:

  •  use xsl:function to perform an independent unit of logic;
  •  use named xsl:template when functionality is achieved cooperatively, and when you may need to share state between different implementation blocks.

After thinking this through, I've noticed that such a distinction does not exist in XQuery 1.0. There is no tunneling there. :-)

Monday, 31 March 2008 06:54:22 UTC
xslt
# Tuesday, 25 March 2008

In the xslt world there is no widespread custom of thinking of stylesheet members as public or private, in contrast to programming languages like C++/java/c#, where access modifiers are essential. The reason lies in the complexity of stylesheets: the smaller the code, the easier it is for a developer to keep all the details in mind. Whenever an xslt program grows, you should modularize it to keep it manageable.

At the point where modules are introduced, one starts thinking of a module's public interface and its implementation details. This separation is especially important for template matching, as you probably won't want to match a private template just because you've forgotten about some template in the implementation of some module.

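To make the public/private member distinction, you can introduce two namespaces in your stylesheet, such as the ones the example below uses:

  xmlns:t="http://www.nesterovsky-bros.com/public"
  xmlns:p="http://www.nesterovsky-bros.com/private/expression.xslt"

For the private namespace you can use a unique name, e.g. the stylesheet name as part of the uri.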

The following example is based on jxom. This stylesheet builds an expression from an expression tree. The public part consists only of the t:get-expression function; all other members are private:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/public"
  xmlns:p="http://www.nesterovsky-bros.com/private/expression.xslt"
  xmlns="http://www.nesterovsky-bros.com/download/jxom.zip"
  xpath-default-namespace="http://www.nesterovsky-bros.com/download/jxom.zip"
  exclude-result-prefixes="xs t p">

  <xsl:output method="text" indent="yes"/>

  <!-- Entry point. -->
  <xsl:template match="/">
    <xsl:variable name="expression" as="element()">
      <lt>
        <sub>
          <mul>
            <var name="b"/>
            <var name="b"/>
          </mul>
          <mul>
            <mul>
              <int>4</int>
              <var name="a"/>
            </mul>
            <var name="c"/>
          </mul>
        </sub>
        <double>0</double>
      </lt>
    </xsl:variable>

    <xsl:value-of select="t:get-expression($expression)" separator=""/>
  </xsl:template>

  <!--
    Gets expression.
      $element - expression element.
      Returns expression tokens.
  -->
  <xsl:function name="t:get-expression" as="item()*">
    <xsl:param name="element" as="element()"/>

    <xsl:apply-templates mode="p:expression" select="$element"/>
  </xsl:function>

  <!--
    Gets binary expression.
      $element - binary expression element.
      $type - expression type.
      Returns expression token sequence.
  -->
  <xsl:function name="p:get-binary-expression" as="item()*">
    <xsl:param name="element" as="element()"/>
    <xsl:param name="type" as="xs:string"/>

    <xsl:sequence select="t:get-expression($element/*[1])"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="t:get-expression($element/*[2])"/>
  </xsl:function>

  <!-- Mode "expression". Default match. -->
  <xsl:template mode="p:expression" match="@*|node()">
    <xsl:sequence select="error(xs:QName('invalid-expression'), name())"/>
  </xsl:template>

  <!-- Mode "expression". or. -->
  <xsl:template mode="p:expression" match="or">
    <xsl:sequence select="p:get-binary-expression(., '||')"/>
  </xsl:template>

  <!-- Mode "expression". and. -->
  <xsl:template mode="p:expression" match="and">
    <xsl:sequence select="p:get-binary-expression(., '&amp;&amp;')"/>
  </xsl:template>

  <!-- Mode "expression". eq. -->
  <xsl:template mode="p:expression" match="eq">
    <xsl:sequence select="p:get-binary-expression(., '==')"/>
  </xsl:template>

  <!-- Mode "expression". ne. -->
  <xsl:template mode="p:expression" match="ne">
    <xsl:sequence select="p:get-binary-expression(., '!=')"/>
  </xsl:template>

  <!-- Mode "expression". le. -->
  <xsl:template mode="p:expression" match="le">
    <xsl:sequence select="p:get-binary-expression(., '&lt;=')"/>
  </xsl:template>

  <!-- Mode "expression". ge. -->
  <xsl:template mode="p:expression" match="ge">
    <xsl:sequence select="p:get-binary-expression(., '>=')"/>
  </xsl:template>

  <!-- Mode "expression". lt. -->
  <xsl:template mode="p:expression" match="lt">
    <xsl:sequence select="p:get-binary-expression(., '&lt;')"/>
  </xsl:template>

  <!-- Mode "expression". gt. -->
  <xsl:template mode="p:expression" match="gt">
    <xsl:sequence select="p:get-binary-expression(., '>')"/>
  </xsl:template>

  <!-- Mode "expression". add. -->
  <xsl:template mode="p:expression" match="add">
    <xsl:sequence select="p:get-binary-expression(., '+')"/>
  </xsl:template>

  <!-- Mode "expression". sub. -->
  <xsl:template mode="p:expression" match="sub">
    <xsl:sequence select="p:get-binary-expression(., '-')"/>
  </xsl:template>

  <!-- Mode "expression". mul. -->
  <xsl:template mode="p:expression" match="mul">
    <xsl:sequence select="p:get-binary-expression(., '*')"/>
  </xsl:template>

  <!-- Mode "expression". div. -->
  <xsl:template mode="p:expression" match="div">
    <xsl:sequence select="p:get-binary-expression(., '/')"/>
  </xsl:template>

  <!-- Mode "expression". neg. -->
  <xsl:template mode="p:expression" match="neg">
    <xsl:sequence select="'-'"/>
    <xsl:sequence select="t:get-expression(*[1])"/>
  </xsl:template>

  <!-- Mode "expression". not. -->
  <xsl:template mode="p:expression" match="not">
    <xsl:sequence select="'!'"/>
    <xsl:sequence select="t:get-expression(*[1])"/>
  </xsl:template>

  <!-- Mode "expression". parens. -->
  <xsl:template mode="p:expression" match="parens">
    <xsl:sequence select="'('"/>
    <xsl:sequence select="t:get-expression(*[1])"/>
    <xsl:sequence select="')'"/>
  </xsl:template>

  <!-- Mode "expression". var. -->
  <xsl:template mode="p:expression" match="var">
    <xsl:sequence select="@name"/>
  </xsl:template>

  <!-- Mode "expression". int, short, byte, long, float, double. -->
  <xsl:template mode="p:expression"
    match="int | short | byte | long | float | double">
    <xsl:sequence select="."/>
  </xsl:template>

 </xsl:stylesheet>

Tuesday, 25 March 2008 06:23:30 UTC
Tips and tricks | xslt
# Monday, 03 March 2008

I often find that whenever I think of something, the idea has already been implemented somewhere.

A good example is xslt/xquery -> java code.

Well, the world is full of smart guys. :-)

Monday, 03 March 2008 18:08:21 UTC
xslt
# Thursday, 28 February 2008

Wow, I've found an article, "Code generation in XSLT 2.0". It dates from 2005.

Well, I was reinventing the wheel. This is a good lesson for me.

I'm going to study the part about SQL Code Generation very carefully, as this is exactly the task I'm facing now.

Thursday, 28 February 2008 04:35:24 UTC
xslt
# Wednesday, 27 February 2008

I've updated jxom.zip.

It contains minor fixes. The most important addition is a line breaker, whose purpose is to split long lines.

Long lines appear when there are verbose comments, or a very long expression that was not categorized as multiline.

It's not perfect, but it looks acceptable.

Now I'm facing the next problem: I need to do for sql a job similar to the one I'm doing for java. Moreover, I need to support several sql dialects. I'm not sure whether it's possible (or worthwhile) to define a single sql-xom.xsd, or whether I should define sql-db2-v9-xom.xsd, sql-sqlserver-2005-xom.xsd, ...

The bad news is that sql grammar is much more complex than java's. I'll probably start with some subset of sql. In any case I am not considering generating sql "directly", as the jxom approach has proven remarkably well suited to this role.

Wednesday, 27 February 2008 13:30:47 UTC
xslt
# Wednesday, 20 February 2008

While building jxom stylesheets I've learned what "good" and "bad" recursion is from saxon's perspective.

I'm using control tokens $t:indent and $t:unindent to control indentation in the sequence of tokens defining java output. To build output lines I need to calculate the total indentation for each line. This can be done with a cumulative sum, counting $t:indent as +1 and $t:unindent as -1.

This task can be formalized as "calculate a cumulative integer sum".

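The first approach I tested is non-recursive: "for $i in 1 to count($items) return sum(subsequence($items, 1, $i))".
It is incredibly slow, as it recomputes every prefix sum from scratch, so the work grows quadratically with the length of the sequence.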

The next try was recursive: calculate and emit results as they are calculated.
This is the "crash fast" method: saxon, indeed, implements it as real recursion and hits the stack limit early.

The last approach employs saxon's ability to detect a particular flavour of tail calls. When a function contains a tail call, and the output on the tail call's code path consists of this tail call only, saxon transforms the construction into a loop. Thus I need to accumulate the result, pass it down the tail call chain, and output it only at the last opportunity.

The following sample shows this technique:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="values" as="xs:integer*" select="1 to 10000"/>

    <result>
      <sum>
        <xsl:value-of select="t:cumulative-integer-sum($values)"/>

        <!-- This call crashes with stack overflow. -->
        <!-- <xsl:value-of select="t:bad-cumulative-integer-sum($values)"/> -->

        <!-- To compare speed, uncomment the following lines. -->
        <!--<xsl:value-of select="sum(t:cumulative-integer-sum($values))"/>-->
        <!--<xsl:value-of select="sum(t:slow-cumulative-integer-sum($values))"/>-->
      </sum>
    </result>
  </xsl:template>

  <!--
    Calculates cumulative sum of integer sequence.
      $items - input integer sequence.
      Returns an integer sequence that is a cumulative sum of original sequence.
  -->
  <xsl:function name="t:cumulative-integer-sum" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>

    <xsl:sequence select="t:cumulative-integer-sum-impl($items, 1, 0, ())"/>
  </xsl:function>

  <!--
    Implementation of the t:cumulative-integer-sum.
      $items - input integer sequence.
      $index - current iteration index.
      $sum - base sum.
      $result - collected result.
      Returns an integer sequence that is a cumulative sum of original sequence.
  -->
  <xsl:function name="t:cumulative-integer-sum-impl" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>
    <xsl:param name="index" as="xs:integer"/>
    <xsl:param name="sum" as="xs:integer"/>
    <xsl:param name="result" as="xs:integer*"/>

    <xsl:variable name="item" as="xs:integer?" select="$items[$index]"/>

    <xsl:choose>
      <xsl:when test="empty($item)">
        <xsl:sequence select="$result"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="value" as="xs:integer" select="$item + $sum"/>
        <xsl:variable name="next" as="xs:integer+" select="$result, $value"/>

        <xsl:sequence select="
          t:cumulative-integer-sum-impl($items, $index + 1, $value, $next)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- "Bad" implementation of the cumulative-integer-sum. -->
  <xsl:function name="t:bad-cumulative-integer-sum" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>

    <xsl:sequence select="t:bad-cumulative-integer-sum-impl($items, 1, 0)"/>
  </xsl:function>

  <!-- "Bad" implementation of the cumulative-integer-sum. -->
  <xsl:function name="t:bad-cumulative-integer-sum-impl" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>
    <xsl:param name="index" as="xs:integer"/>
    <xsl:param name="sum" as="xs:integer"/>

    <xsl:variable name="item" as="xs:integer?" select="$items[$index]"/>

    <xsl:if test="exists($item)">
      <xsl:variable name="value" as="xs:integer" select="$item + $sum"/>
 
      <xsl:sequence select="$value"/>
      <xsl:sequence select="
        t:bad-cumulative-integer-sum-impl($items, $index + 1, $value)"/>
    </xsl:if>
  </xsl:function>

 <!-- Non recursive implementation of the cumulative-integer-sum. -->
 <xsl:function name="t:slow-cumulative-integer-sum" as="xs:integer*">
   <xsl:param name="items" as="xs:integer*"/>

   <xsl:sequence select="
     for $i in 1 to count($items) return
       sum(subsequence($items, 1, $i))"/>
 </xsl:function>

</xsl:stylesheet>
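XSLT 3.0 later introduced xsl:iterate, which makes this pattern explicit instead of relying on the processor's tail call detection. A minimal sketch, assuming an XSLT 3.0 processor; the function name t:iterate-cumulative-integer-sum is hypothetical. Each iteration emits the running total and hands it to the next iteration through $sum, so no recursion is involved and the result does not have to be accumulated in a parameter.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <result>
      <sum>
        <xsl:value-of select="t:iterate-cumulative-integer-sum(1 to 10000)"/>
      </sum>
    </result>
  </xsl:template>

  <!--
    Calculates cumulative sum of integer sequence using xsl:iterate.
      $items - input integer sequence.
      Returns an integer sequence that is a cumulative sum of original sequence.
  -->
  <xsl:function name="t:iterate-cumulative-integer-sum" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>

    <xsl:iterate select="$items">
      <!-- Running total carried from one iteration to the next. -->
      <xsl:param name="sum" as="xs:integer" select="0"/>

      <xsl:variable name="value" as="xs:integer" select="$sum + ."/>

      <!-- Output of each iteration is appended to the result. -->
      <xsl:sequence select="$value"/>

      <xsl:next-iteration>
        <xsl:with-param name="sum" select="$value"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:function>
</xsl:stylesheet>
```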

Wednesday, 20 February 2008 08:59:22 UTC
xslt
# Tuesday, 19 February 2008

Comparing xslt 2.0 with its predecessor I see a great evolution of the language. There are, however, parts of the language that are not as good as they could be.

Look at manipulations of sequences of sequences of items. The xpath 2.0/xquery 1.0 type system treats type quantifiers separately from the type itself. One can declare a variable of type "xs:string", or a variable of type sequence of strings, "xs:string*". Unfortunately it's not possible to declare a sequence of sequences of strings, "xs:string**", as a type can have only one quantifier.

I think this is wrong. People use different tricks to remedy the problem. Typically one builds nodes that contain copies of the items of each sequence. Clearly this is a heavy way to achieve a simple result; moreover, it does not preserve node identity.

In jxom I'm using a different solution to store a sequence of sequences: all sequences are stored in one, separated by a terminator.

A typical example is in the java serializer. After building a method's parameters I have to format them either one (compact) or the other (verbose) way, depending on a decision that can be made only once all parameters are built.

To see how it works, please look at the following xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <!-- Terminator token. -->
  <xsl:variable name="t:terminator" as="xs:QName"
    select="xs:QName('t:terminator')"/>

  <!-- New line. -->
  <xsl:variable name="t:crlf" as="xs:string" select="'&#10;'"/>

  <xsl:template match="/">
    <!--
      We need to manipulate a sequence of sequence of tokens.
      To do this we use $t:terminator to separate sequences.
    -->
    <xsl:variable name="short-items" as="item()*">
      <xsl:sequence select="t:get-param('int', 'a')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'b')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'c')"/>
      <xsl:sequence select="$t:terminator"/>
    </xsl:variable>

    <xsl:variable name="long-items" as="item()*">
      <xsl:sequence select="t:get-param('int', 'a')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'b')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'c')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'd')"/>
      <xsl:sequence select="$t:terminator"/>
    </xsl:variable>

    <result>
      <short>
        <xsl:value-of select="t:format($short-items)" separator=""/>
      </short>
      <long>
        <xsl:value-of select="t:format($long-items)" separator=""/>
      </long>
    </result>
  </xsl:template>

  <!--
    Returns a sequence of tokens that defines a parameter.
      $type - parameter type.
      $name - parameter name.
      Returns sequence of parameter tokens.
  -->
  <xsl:function name="t:get-param" as="item()*">
    <xsl:param name="type" as="xs:string"/>
    <xsl:param name="name" as="xs:string"/>

    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$name"/>
  </xsl:function>

  <!--
    Format sequence of sequence of tokens separated with $t:terminator.
      $tokens - sequence of sequence of tokens to format.
      Returns formatted sequence of tokens.
  -->
  <xsl:function name="t:format" as="item()*">
    <xsl:param name="tokens" as="item()*"/>

    <xsl:variable name="terminators" as="xs:integer+"
      select="0, index-of($tokens, $t:terminator)"/>
    <xsl:variable name="count" as="xs:integer"
      select="count($terminators) - 1"/>
    <xsl:variable name="verbose" as="xs:boolean"
      select="$count > 3"/>

    <xsl:sequence select="
      for $i in 1 to $count return
      (
        subsequence
        (
          $tokens,
          $terminators[$i] + 1,
          $terminators[$i + 1] - $terminators[$i] - 1
        ),
        if ($i = $count) then ()
        else
        (
          ',',
          if ($verbose) then $t:crlf else ' '
        )
      )"/>
  </xsl:function>

</xsl:stylesheet>
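XPath 3.1 later added arrays, which remove the need for a terminator: an array member may be any sequence, and an array is a single item, so a sequence of arrays is never flattened. Below is a minimal sketch of the compact/verbose formatting rewritten that way, assuming an XSLT 3.0 processor; this t:format is a hypothetical rewrite, not the jxom code.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:array="http://www.w3.org/2005/xpath-functions/array"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs array t">

  <xsl:template match="/">
    <!-- Each member of the square array is a whole token sequence. -->
    <xsl:variable name="params" as="array(item()*)" select="
      [
        t:get-param('int', 'a'),
        t:get-param('int', 'b'),
        t:get-param('int', 'c')
      ]"/>

    <result>
      <xsl:value-of select="t:format($params)" separator=""/>
    </result>
  </xsl:template>

  <!-- Same token sequence as t:get-param above: type, space, name. -->
  <xsl:function name="t:get-param" as="item()*">
    <xsl:param name="type" as="xs:string"/>
    <xsl:param name="name" as="xs:string"/>

    <xsl:sequence select="$type, ' ', $name"/>
  </xsl:function>

  <!-- Joins array members with commas; verbose when more than 3 members. -->
  <xsl:function name="t:format" as="item()*">
    <xsl:param name="params" as="array(item()*)"/>

    <xsl:variable name="count" as="xs:integer" select="array:size($params)"/>
    <xsl:variable name="separator" as="xs:string"
      select="if ($count > 3) then '&#10;' else ' '"/>

    <xsl:sequence select="
      for $i in 1 to $count return
      (
        $params($i),
        if ($i = $count) then () else (',', $separator)
      )"/>
  </xsl:function>
</xsl:stylesheet>
```

With three parameters the result should be <result>int a, int b, int c</result>, matching the compact form of the original.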

Tuesday, 19 February 2008 07:54:11 UTC
xslt
# Monday, 18 February 2008

I've updated jxom.zip. Now it supports qualified type name optimization.

I need to mention that this optimization is only possible when imports do not contain wildcard declarations like:

import a.b.*;

The only important thing left to do is a good line breaker.

Monday, 18 February 2008 09:28:34 UTC
xslt

Is it possible to call function indirectly in xslt 2.0?

The answer is yes, however the implementation uses a dull trick: template matching selects a function handler. Template matching is a beautiful thing, though it was definitely not devised to make this trick possible.

The following example defines two functions, t:sum and t:count, that t:test calls indirectly.
Function ids (a.k.a. function pointers) are defined by the $t:sum and $t:count variables.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
  <xsl:variable name="items" as="element()*">
    <value>1</value>
    <value>2</value>
    <value>3</value>
    <value>4</value>
    <value>5</value>
  </xsl:variable>

  <root>
    <sum>
      <xsl:sequence select="t:test($items, $t:sum)"/>
    </sum>
    <count>
      <xsl:sequence select="t:test($items, $t:count)"/>
    </count>
  </root>
</xsl:template>

<!-- Mode "t:function-call". Default match. -->
<xsl:template mode="t:function-call" match="@* | node()">
 <xsl:sequence select="
   error
   (
     xs:QName('invalid-call'),
     concat('Unbound function call. Id: ', name())
   )"/>
</xsl:template>

<!-- Id of the function t:sum. -->
<xsl:variable name="t:sum" as="item()">
 <t:sum/>
</xsl:variable>

<!-- Mode "t:function-call". t:sum handler. -->
<xsl:template mode="t:function-call" match="t:sum">
  <xsl:param name="items" as="element()*"/>

  <xsl:sequence select="t:sum($items)"/>
</xsl:template>

<!--
  Calculates a sum of elements.
    $items - items to sum.
    Returns sum of element values.
-->
<xsl:function name="t:sum" as="xs:integer">
  <xsl:param name="items" as="element()*"/>

  <xsl:sequence select="sum($items/xs:integer(.))"/>
</xsl:function>

<!-- Id of the function t:count. -->
<xsl:variable name="t:count" as="item()">
  <t:count/>
</xsl:variable>

<!-- Mode "t:function-call". t:count handler. -->
<xsl:template mode="t:function-call" match="t:count">
  <xsl:param name="items" as="element()*"/>

  <xsl:sequence select="t:count($items)"/>
</xsl:template>

<!--
  Calculates the number of elements in a sequence.
    $items - items to count.
    Returns count of element values.
-->
<xsl:function name="t:count" as="xs:integer">
 <xsl:param name="items" as="element()*"/>

 <xsl:sequence select="count($items)"/>
</xsl:function>

<!--
  A function that performs indirect call.
    $items - items to pass to an indirect call.
    $function-id - a function id.
    Returns a value calculated in the indirect function.
-->
<xsl:function name="t:test" as="xs:integer">
 <xsl:param name="items" as="element()*"/>
 <xsl:param name="function-id" as="item()"/>

 <xsl:variable name="result" as="xs:integer">
   <xsl:apply-templates mode="t:function-call" select="$function-id">
     <xsl:with-param name="items" select="$items"/>
   </xsl:apply-templates>
 </xsl:variable>

 <xsl:sequence select="$result"/>
</xsl:function>

</xsl:stylesheet>
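In XSLT 3.0 the trick is no longer needed: a named function reference such as t:sum#1 is itself the function id, and it can be called directly. A minimal sketch assuming an XSLT 3.0 processor; the built-in count#1 replaces t:count, since its signature already fits.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="items" as="element()*">
      <value>1</value>
      <value>2</value>
      <value>3</value>
    </xsl:variable>

    <root>
      <!-- t:sum#1 and count#1 are named function references. -->
      <sum><xsl:sequence select="t:test($items, t:sum#1)"/></sum>
      <count><xsl:sequence select="t:test($items, count#1)"/></count>
    </root>
  </xsl:template>

  <!-- Calculates a sum of element values. -->
  <xsl:function name="t:sum" as="xs:integer">
    <xsl:param name="items" as="element()*"/>

    <xsl:sequence select="sum($items/xs:integer(.))"/>
  </xsl:function>

  <!-- Performs an indirect call; no template matching is required. -->
  <xsl:function name="t:test" as="xs:integer">
    <xsl:param name="items" as="element()*"/>
    <xsl:param name="function-id" as="function(element()*) as xs:integer"/>

    <xsl:sequence select="$function-id($items)"/>
  </xsl:function>
</xsl:stylesheet>
```

Here <sum> should contain 6 and <count> should contain 3.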

Monday, 18 February 2008 05:53:46 UTC
xslt
# Saturday, 16 February 2008

Hello again!

To read the first part about jxom, please see the post of 9 February 2008.

I'm back with jxom (java xml object model). I've finally managed to create an xslt that generates java code from a jxom document.

You may ask why it took a whole week to produce it.

There are two answers:
1. My poor talents.
2. I've effectively created two implementations.

My first approach was to generate java text directly from xml. I truly believed that this was the way. I got things tangled up along that path: once you start dealing with indentation, formatting, and reformatting of the text you're generating, you see that things are not that simple. Well, it was a naive approach.

I could have finished it, but at some point I realized that its complexity was not composed of the complexity of its parts; instead, it kept growing. That is not acceptable for such a simple task. The approach is bad. Period.

The alternative I've devised is simple and, in fact, more natural than the naive approach. It is a two-stage generation:
  a) generate a sequence of tokens (the serializer);
  b) generate and then print a sequence of lines (the streamer).
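A minimal Java sketch of stage (a), assuming a hypothetical model that consists only of a class name and a list of `int` field names. Control words are enum constants and literals stay plain `String`s, so a `List<Object>` plays the role of `item()*`. Note that the serializer never tracks an indentation level; it only emits INDENT and UNINDENT:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClassSerializer {
    // Control words; any String in the token list is a literal.
    enum Control { INDENT, UNINDENT, NEW_LINE }

    // Appends several tokens at once.
    private static void emit(List<Object> tokens, Object... items) {
        tokens.addAll(Arrays.asList(items));
    }

    // Stage (a): tokens for "public class <name> { public int <field>; ... }".
    static List<Object> serializeClass(String className, List<String> intFields) {
        List<Object> tokens = new ArrayList<>();
        emit(tokens, "public", " ", "class", " ", className, Control.NEW_LINE);
        emit(tokens, "{", Control.NEW_LINE, Control.INDENT);
        for (int i = 0; i < intFields.size(); i++) {
            if (i > 0) {
                emit(tokens, Control.NEW_LINE); // one field per line
            }
            emit(tokens, "public", " ", "int", " ", intFields.get(i), ";");
        }
        emit(tokens, Control.UNINDENT, Control.NEW_LINE, "}", Control.NEW_LINE);
        return tokens;
    }

    public static void main(String[] args) {
        // Same token sequence as the xsl:sequence example for class A.
        System.out.println(serializeClass("A", List.of("a")));
    }
}
```

For class `A` with field `a` this yields the same token sequence as the `xsl:sequence` example in this post.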

Tokens (item()*) are either control words (xs:QName), or literals (xs:string).

I've defined the following control tokens:

| Token | Description |
|---|---|
| t:indent | Indents the following content. |
| t:unindent | Unindents the following content. |
| t:line-indent | Resets indentation for one line. |
| t:new-line | New-line token; ends the current line. |
| t:terminator | Separates token sequences. |
| t:code | Marks a line as code (the default line type). |
| t:doc | Marks a line as a documentation comment. |
| t:begin-doc | Marks a line as the beginning of a documentation comment. |
| t:end-doc | Marks a line as the end of a documentation comment. |
| t:comment | Marks a line as a comment. |

Thus, the input for the streamer looks like this:

<xsl:sequence select="'public'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'class'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'A'"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="'{'"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="$t:indent"/>
<xsl:sequence select="'public'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'int'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'a'"/>
<xsl:sequence select="';'"/>
<xsl:sequence select="$t:unindent"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="'}'"/>
<xsl:sequence select="$t:new-line"/>

The streamer receives a sequence of tokens and transforms it into a sequence of lines.

One nice property of tokens is that the streamer can easily insert line breaks to keep within the page width. Another convenience is that the code generating tokens does not need to track the indentation level: it just emits the t:indent and t:unindent control tokens to increase and decrease the current indentation.
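To make this concrete, here is a minimal Java sketch of a streamer, assuming Java 11+ (for `String.repeat`, `isBlank` and `stripTrailing`) and tokens represented as a `List<Object>` of enum control words and `String` literals. Only the indent, unindent and new-line controls are handled. The first literal of a line fixes its indentation, and a literal that would overflow `maxWidth` starts a continuation line one level deeper; this simple break rule is an assumption for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

class TokenStreamer {
    // Control words; any String in the token list is a literal.
    enum Control { INDENT, UNINDENT, NEW_LINE }

    // Stage (b): converts tokens into lines, indenting with indentUnit and
    // breaking lines that would grow longer than maxWidth characters.
    static List<String> stream(List<Object> tokens, String indentUnit, int maxWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder(); // current line, indentation included
        int level = 0; // indentation level maintained by INDENT/UNINDENT
        for (Object token : tokens) {
            if (token == Control.INDENT) {
                level++;
            } else if (token == Control.UNINDENT) {
                level--;
            } else if (token == Control.NEW_LINE) {
                lines.add(line.toString());
                line.setLength(0);
            } else {
                String literal = (String) token;
                if (line.length() == 0) {
                    // The first literal of a line fixes the line's indentation.
                    line.append(indentUnit.repeat(level));
                } else if (!literal.isBlank() && line.length() + literal.length() > maxWidth) {
                    // Break before the overflowing literal; continue one level deeper.
                    lines.add(line.toString().stripTrailing());
                    line.setLength(0);
                    line.append(indentUnit.repeat(level + 1));
                }
                line.append(literal);
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString()); // text after the last NEW_LINE
        }
        return lines;
    }

    public static void main(String[] args) {
        // The token sequence of the xsl:sequence example above.
        List<Object> tokens = List.of(
            "public", " ", "class", " ", "A", Control.NEW_LINE,
            "{", Control.NEW_LINE, Control.INDENT,
            "public", " ", "int", " ", "a", ";", Control.UNINDENT, Control.NEW_LINE,
            "}", Control.NEW_LINE);
        stream(tokens, "  ", 80).forEach(System.out::println);
    }
}
```

With a width of 80 the sample tokens yield `public class A`, `{`, `  public int a;` and `}`. With a width of 10 the streamer breaks before `class` and before `int`, without any help from the code that produced the tokens.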

The way the code is built allows it to mimic any code style; I've followed my favorite one. In the future I'll probably add options to control the code style. My todo list still contains several features I want to implement, such as a line breaker to preserve page width, and an optional type qualification optimizer to remove unnecessary type qualifications.
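As an illustration of what a type qualification optimizer might do, here is a hypothetical Java sketch (the names `TypeQualifier` and `shorten` are made up). It handles only top-level types and single-type imports, and it assumes that no type in the unit's own package hides a `java.lang` type of the same name:

```java
import java.util.List;

class TypeQualifier {
    // Returns the simple name of qualifiedName if it can be used unqualified in a
    // compilation unit of package unitPackage with the given single-type imports;
    // otherwise returns qualifiedName unchanged.
    static String shorten(String qualifiedName, String unitPackage, List<String> imports) {
        int dot = qualifiedName.lastIndexOf('.');
        if (dot < 0) {
            return qualifiedName; // already simple, e.g. a primitive type
        }
        String typePackage = qualifiedName.substring(0, dot);
        String simpleName = qualifiedName.substring(dot + 1);
        if (imports.contains(qualifiedName)) {
            return simpleName; // imported explicitly
        }
        for (String imported : imports) {
            if (imported.endsWith("." + simpleName)) {
                return qualifiedName; // another imported type owns this simple name
            }
        }
        if (typePackage.equals(unitPackage) || typePackage.equals("java.lang")) {
            return simpleName; // visible without an import
        }
        return qualifiedName;
    }

    public static void main(String[] args) {
        List<String> imports = List.of("java.util.List");
        System.out.println(shorten("java.util.List", "com.bphx.coolgen.data", imports)); // List
        System.out.println(shorten("java.awt.List", "com.bphx.coolgen.data", imports));  // java.awt.List
    }
}
```

For example, with `import java.util.List;` the name `java.util.List` shortens to `List`, while `java.awt.List` must stay qualified.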

The current implementation can be found in jxom.zip. It contains:

| File | Description |
|---|---|
| java.xsd | jxom xml schema. |
| java-serializer-main.xslt | Transformation entry point. |
| java-serializer.xslt | Generates tokens for top-level constructs. |
| java-serializer-statements.xslt | Generates tokens for statements. |
| java-serializer-expressions.xslt | Generates tokens for expressions. |
| java-streamer.xslt | Converts tokens into lines. |
| DataAdapter.xml | Sample jxom document. |

This was my first experience with xslt 2.0, and I'm very pleased with what it can do. The only missing feature is an indirect function call (which I do not want to model with the dull template-matching approach).

Note that although the xslt I've built is platform independent, I was experimenting with saxon 9. In several places I've relied on an efficient tail call implementation (see t:cumulative-integer-sum); without it, these functions would lead to an xslt stack overflow.
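To show what relying on tail calls means, here is a hypothetical Java sketch of a running-total function. The definition of t:cumulative-integer-sum is not reproduced in this post, so this illustrates only the pattern, not that function. In the recursive form the recursive call is the last action, so a processor with tail-call elimination can run it in constant stack space; the JVM does not eliminate tail calls, which is why the loop form is the safe one in Java:

```java
import java.util.ArrayList;
import java.util.List;

class RunningTotals {
    // Tail-recursive running totals: the recursive call is the last action.
    // A processor with tail-call elimination can run this in constant stack space;
    // the JVM cannot, so a long enough input throws StackOverflowError.
    static List<Integer> recursive(List<Integer> items, int index, int total, List<Integer> result) {
        if (index == items.size()) {
            return result;
        }
        int next = total + items.get(index);
        result.add(next);
        return recursive(items, index + 1, next, result); // tail call
    }

    // The loop that tail-call elimination effectively turns the recursion into.
    static List<Integer> iterative(List<Integer> items) {
        List<Integer> result = new ArrayList<>();
        int total = 0;
        for (int item : items) {
            total += item;
            result.add(total);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4);
        System.out.println(recursive(items, 0, 0, new ArrayList<>())); // [1, 3, 6, 10]
        System.out.println(iterative(items));                          // [1, 3, 6, 10]
    }
}
```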

I'd be pleased to hear your feedback on the subject.

Saturday, 16 February 2008 10:42:16 UTC
Tips and tricks | xslt
# Saturday, 09 February 2008

Hello,

I haven't written for a long time. IMHO: nothing to say? Don't make noise!

Nowadays I'm busy with xslt.

Should I be pleased that the w3c committee has finally delivered xpath 2.0/xslt 2.0/xquery? Some people possibly could not wait until it happened and died in the meantime. Let's be grateful to fate that we have survived!

I'm now working with saxon 9. It's a good implementation, but in my opinion too interpreter-like. I think these languages could be compiled down to machine/vm code the same way c++/java/c# are.

To the point.
I need to generate java code in xslt. I've done this before, but at that time I dealt with relatively simple templates like beans or interfaces. Now I need to generate beans, interfaces, and classes with logic. In fact, I need to cover almost all java 6 features.

I immediately started thinking in terms of a java xml object model (jxom). Thus there will be an xml schema for jxom: the java grammar as xml (am I reinventing the wheel? Please point me to an existing schema!). There will be xslts that generate code according to this schema, and an xslt that serializes jxom documents directly into java.

This two-stage generation is important, as there are essentially two different tasks: generating java code, and serializing it down to text. Moreover, whenever I have a jxom document I can manipulate it! Finally, this lets our team concentrate its efforts, as one only needs to generate a jxom document.

Yesterday I found a java ANTLR grammar and converted it into an xml schema: java.xsd. It is important to have this xml schema defined, even if no one uses it outside an editor, as it makes jxom generation more formal.

The next step, still on my todo list, is to create the xslt serializer.

To get a feel for how jxom looks, I've created it manually for a simple java file:

// $Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $
package com.bphx.coolgen.data;

import java.util.List;

/**
* Encapsulates encyclopedia database access.
*/

public interface DataAdapter
{
  /**
   * Starts data access session for a specified model.
   * @param modelId - a model to open.
   */

  void open(int modelId)
    throws Exception;

  /**
   * Ends data access session.
   */

  void close()
    throws Exception;

  /**
   * Gets current model id.
   * @return current model id.
   */

  int getModelId();

  /**
   * Gets data objects for a specified object type for the current model.
   * @param type - an object type to get data objects for.
   * @return list of data objects.
   */

  List<DataObject> getObjectsForType(short type)
    throws Exception;

  /**
   * Gets a list of data associations for an object id.
   * @param id - object id.
   * @return list of data associations.
   */

  List<DataAssociation> getAssociations(int id)
    throws Exception;

  /**
   * Gets a list of data properties for an object id.
   * @param id - object id.
   * @return list of data properties.
   */

  List<DataProperty> getProperties(int id)
    throws Exception;
}

The corresponding jxom:

<unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.bphx.coolgen.data">
  <comment>$Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $</comment>
  <import package="java.util.List"/>
  <interface access="public" name="DataAdapter">
    <comment doc="true">Encapsulates encyclopedia database access.</comment>
    <method name="open">
      <comment doc="true">
        Starts data access session for a specified model.
        <para type="param" name="modelId">a model to open.</para>
      </comment>
      <parameters>
        <parameter name="modelId"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="close">
      <comment doc="true">Ends data access session.</comment>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getModelId">
      <comment doc="true">
        Gets current model id.
        <para type="return">current model id.</para>
      </comment>
      <returns><type name="int"/></returns>
    </method>
    <method name="getObjectsForType">
      <comment doc="true">
        Gets data objects for a specified object type for the current model.
        <para type="param" name="type">
          an object type to get data objects for.
        </para>
        <para type="return">list of data objects.</para>
      </comment>
      <returns>
        <type>
          <part name="List">
            <typeArgument><type name="DataObject"/></typeArgument>
          </part>
        </type>
      </returns>
      <parameters>
        <parameter name="type"><type name="short"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getAssociations">
      <comment doc="true">
        Gets a list of data associations for an object id.
        <para type="param" name="id">object id.</para>
        <para type="return">list of data associations.</para>
      </comment>
      <returns>
        <type>
          <part name="List">
            <typeArgument><type name="DataAssociation"/></typeArgument>
          </part>
        </type>
      </returns>
      <parameters>
        <parameter name="id"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getProperties">
      <comment doc="true">
        Gets a list of data properties for an object id.
        <para type="param" name="id">object id.</para>
        <para type="return">list of data properties.</para>
      </comment>
      <returns>
        <!-- Compact form of generic type. -->
        <type name="List&lt;DataProperty&gt;"/>
      </returns>
      <parameters>
        <parameter name="id"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
  </interface>
</unit>
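Since jxom is plain xml, any xml tool can read and manipulate it. As a rough illustration (not the jxom serializer, which is written in xslt), here is a Java sketch using the standard DOM API that renders each `method` element as a declaration line. The class and method names are hypothetical, and treating several `<part>` elements as a dotted name is an assumption about the schema:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JxomMethods {
    static final String NS = "http://www.bphx.com/java-1.5/2008-02-07";

    // Child elements of parent with the given local name in the jxom namespace.
    static List<Element> children(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && NS.equals(n.getNamespaceURI()) && name.equals(n.getLocalName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Renders a <type>: the compact form (name attribute) or <part> children.
    // Assumption: several <part> elements form a dotted name.
    static String typeName(Element type) {
        if (type.hasAttribute("name")) {
            return type.getAttribute("name");
        }
        List<String> parts = new ArrayList<>();
        for (Element part : children(type, "part")) {
            List<String> args = new ArrayList<>();
            for (Element arg : children(part, "typeArgument")) {
                args.add(typeName(children(arg, "type").get(0)));
            }
            parts.add(part.getAttribute("name")
                + (args.isEmpty() ? "" : "<" + String.join(", ", args) + ">"));
        }
        return String.join(".", parts);
    }

    // Types listed in wrapper elements such as <returns> or <throws>.
    static List<String> typeNames(Element parent, String wrapper) {
        List<String> names = new ArrayList<>();
        for (Element w : children(parent, wrapper)) {
            for (Element type : children(w, "type")) {
                names.add(typeName(type));
            }
        }
        return names;
    }

    // Renders one <method> as an interface method declaration.
    static String declaration(Element method) {
        List<String> returns = typeNames(method, "returns");
        List<String> parameters = new ArrayList<>();
        for (Element list : children(method, "parameters")) {
            for (Element p : children(list, "parameter")) {
                parameters.add(typeName(children(p, "type").get(0)) + " " + p.getAttribute("name"));
            }
        }
        List<String> exceptions = typeNames(method, "throws");
        return (returns.isEmpty() ? "void" : returns.get(0)) + " " + method.getAttribute("name")
            + "(" + String.join(", ", parameters) + ")"
            + (exceptions.isEmpty() ? "" : " throws " + String.join(", ", exceptions)) + ";";
    }

    // Parses a jxom document and renders all of its methods in document order.
    static List<String> declarations(String jxom) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // jxom elements live in a namespace
        NodeList methods = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(jxom)))
            .getElementsByTagNameNS(NS, "method");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < methods.getLength(); i++) {
            result.add(declaration((Element) methods.item(i)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String jxom = "<unit xmlns='" + NS + "'><interface name='DataAdapter'>"
            + "<method name='getProperties'><returns><type name='List&lt;DataProperty&gt;'/></returns>"
            + "<parameters><parameter name='id'><type name='int'/></parameter></parameters>"
            + "<throws><type name='Exception'/></throws></method></interface></unit>";
        // prints "List<DataProperty> getProperties(int id) throws Exception;"
        declarations(jxom).forEach(System.out::println);
    }
}
```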

To read about the xslt for jxom, please see the post of 16 February 2008.

Saturday, 09 February 2008 17:56:45 UTC
Tips and tricks | xslt
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in any way.

© 2024, Nesterovsky bros