A project I'm currently working on requires me to manipulate a large number
of documents. This includes accessing these documents with the key()
function.
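For readers who have not used key() against an external document, here is a minimal sketch of the kind of access I mean. The key name "by-id", the file name "data.xml" and the looked-up value "a1" are made up for illustration; only the three-argument form of key() is standard XSLT 2.0.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical index: every element carrying an id attribute, keyed by that id. -->
  <xsl:key name="by-id" match="*[@id]" use="@id"/>

  <xsl:template match="/">
    <!-- "data.xml" is a placeholder URI, resolved against the stylesheet's base URI. -->
    <xsl:variable name="document" as="document-node()"
      select="document('data.xml')"/>

    <!-- The third argument of key() selects the tree to search. -->
    <xsl:sequence select="key('by-id', 'a1', $document)"/>
  </xsl:template>

</xsl:stylesheet>
```

The processor builds the index for each document the key is used against, which is part of why the lifetime of loaded documents matters here.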
I never thought this task would pose any problem, until I discovered that Saxon
caches documents loaded with the document() function to preserve their identities:
By default, this function is ·stable·.
Two calls on this function return the same document node if the same
URI Reference (after resolution to an absolute URI Reference)
is supplied to both calls. Thus, the following expression
(if it does not raise an error) will always be true:
doc("foo.xml") is doc("foo.xml")
However, for performance reasons, implementations may provide a user option to
evaluate the function without a guarantee of stability. The manner in which any
such option is provided is implementation-defined. If the user has not selected
such an option, a call of the function must either return a stable result or
must raise an error: [err:FODC0003].
Saxon provides a saxon:discard-document() function to release documents from
the cache. It is used like this:
<xsl:variable name="document" as="document-node()"
select="saxon:discard-document(document(...))"/>
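A slightly fuller sketch of this pattern, assuming the documents are processed one at a time and only a small result is kept from each. The "uris" parameter, the "main" template and the output element names are hypothetical; whether saxon:discard-document() is available may depend on the Saxon version and edition in use.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="xs saxon">

  <!-- Hypothetical parameter: URIs of the documents to scan. -->
  <xsl:param name="uris" as="xs:string*"/>

  <xsl:template name="main">
    <totals>
      <xsl:for-each select="$uris">
        <!-- Load the document and ask Saxon not to keep it in the document cache. -->
        <xsl:variable name="document" as="document-node()"
          select="saxon:discard-document(document(.))"/>

        <!-- Keep only a small summary, so nothing else refers to the tree. -->
        <total uri="{.}" elements="{count($document//*)}"/>
      </xsl:for-each>
    </totals>
  </xsl:template>

</xsl:stylesheet>
```

This works well when each document is loaded in exactly one place and used once.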
As you can see, saxon:discard-document() is tied to the place where the document is
loaded. In my case this is inefficient, as my code repeatedly accesses documents
from different places. To release the loaded documents, I need to collect them after
the main processing.
Another issue in Saxon is that the processor may keep document references through
xsl:key, so saxon:discard-document() provides no guarantee that the documents will be
garbage collected.
To deal with this, I've designed a (Saxon-specific) API to manage document pools:
t:begin-document-pool-scope() as item()
  Begins a document pool scope.
  Returns the scope id.

t:end-document-pool-scope(scope as item())
  Terminates a document pool scope.
  $scope - the scope id.

t:put-document-in-pool(document as document-node()) as document-node()
  Puts a document into the current scope of the document pool.
  $document - a document to put into the document pool.
  Returns the same document node.

The use case is:
<xsl:variable name="scope" select="t:begin-document-pool-scope()"/>
<xsl:sequence select="t:assert($scope)"/>
...
<xsl:variable name="document" as="document-node()"
select="t:put-document-in-pool(...)"/>
...
<xsl:sequence
select="t:end-document-pool-scope($scope)"/>
Download document-pool.xslt to use this API.