XSLT 2.0 document pool API for Saxon

Friday, 09 May 2008

A project I'm currently working on requires me to manipulate a large number of documents, including accessing them with the key() function.

I never thought this task posed any problem until I discovered that Saxon caches documents loaded with the document() function in order to preserve their identities. The definition of fn:doc() in the XPath 2.0 Functions and Operators specification says:

By default, this function is ·stable·. Two calls on this function return the same document node if the same URI Reference (after resolution to an absolute URI Reference) is supplied to both calls. Thus, the following expression (if it does not raise an error) will always be true:

doc("foo.xml") is doc("foo.xml")

However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of stability. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a stable result or must raise an error: [err:FODC0003].
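To make the scenario concrete, here is a minimal, self-contained XSLT 2.0 sketch of the kind of processing described above: it loads each document from a list of URIs and looks items up with the three-argument form of key(), which searches only within the given document. The key name item-by-id, the item element with its id attribute, and the parameters uris and item-id are hypothetical. Because doc() is stable, Saxon keeps every loaded document in its cache until the transformation ends, so memory use grows with the number of URIs processed.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Hypothetical key: index <item> elements by their id attribute. -->
  <xsl:key name="item-by-id" match="item" use="@id"/>

  <!-- Hypothetical parameters: documents to scan and the id to look up. -->
  <xsl:param name="uris" as="xs:string*" select="()"/>
  <xsl:param name="item-id" as="xs:string" select="'a1'"/>

  <xsl:template name="main">
    <result>
      <xsl:for-each select="$uris">
        <!-- doc() is stable: Saxon caches this document until the
             transformation ends, so it is not released early. -->
        <xsl:variable name="document" as="document-node()" select="doc(.)"/>

        <!-- The third argument restricts the key lookup to $document. -->
        <found uri="{.}"
               count="{count(key('item-by-id', $item-id, $document))}"/>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet starts from the named template main (for example, via Saxon's -it option), so no source document is needed.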

Saxon provides the saxon:discard-document() function to release documents from the cache. It is used like this:

<xsl:variable name="document" as="document-node()"
  select="saxon:discard-document(document(...))"/>

As you can see, saxon:discard-document() is tied to the place where a document is loaded. In my case this is inefficient, as my code repeatedly accesses documents from different places. To release the loaded documents, I need to collect them after the main processing is done.

Another issue in Saxon is that the processor may keep references to documents through xsl:key, so saxon:discard-document() does not guarantee that the documents will be garbage collected.

To deal with this, I've designed a (Saxon-specific) API to manage document pools:

  t:begin-document-pool-scope() as item()

  Begins document pool scope.

    Returns a scope id.

  t:end-document-pool-scope(scope as item())

  Terminates document pool scope.

    $scope - the scope id.

  t:put-document-in-pool(document as document-node()) as document-node()

  Puts a document into the current scope of the document pool.

    $document - a document to put into the document pool.

    Returns the same document node.

The use case is:

      <xsl:variable name="scope" select="t:begin-document-pool-scope()"/>
      <xsl:sequence select="t:assert($scope)"/>

      ...

      <xsl:variable name="document" as="document-node()"
        select="t:put-document-in-pool(...)"/>

      ...

      <xsl:sequence select="t:end-document-pool-scope($scope)"/>

Download document-pool.xslt to use this API.

