I know we're not the first who create a parser in xslt.
However I still want to share our implementation, as I think it's beautiful.
In our project, which is conversion from a some legacy language to java, we're
dealing with dynamic expressions. For example in the legacy language one can
filter a collection using an expression defined by a string:
collection.filter("a > 0 and b = 7");
Whenever expression string is calculated there is nothing to do except to parse such
string at runtime and perform filtering dynamically. On the other hand we have
found that in the majority of cases literal strings are used. Thus we have decided to
optimize this route like this:
collection.filter(
new Filter<T>()
{
boolean filter(T value)
{
return (value.getA() > 0) and (value.getB() = 7);
}
});
This means that we're converting that expression string into java code on the
generation stage.
In the xslt - our generator engine - this means that we have to convert a string
into expression tree like this:
(a > 7 or a= 3) and c * d = 2.2
to
<and>
<or>
<gt>
<identifier>a</identifier>
<integer>7</integer>
</gt>
<eq>
<identifier>a</identifier>
<integer>3</integer>
</eq>
</or>
<eq>
<mul>
<identifier>c</identifier>
<identifier>d</identifier>
</mul>
<decimal>2.2</decimal>
</eq>
</and>
Our parser fits naturally to the world of parsers: it uses xsl:analyze-string instruction to tokenize input and
parses tokens according to an expression grammar. During implementation I've
found some new to me things. I think they worth mentioning:
-
As tokenizer is defined as a big regular expression, we have rather verbose
regex attribute over
xsl:analyze-string. It was hard
to edit such a big line until I've found there is
flag="x" option that solves
formatting problems:
The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored.
This means that I can use spaces to format regular expression and
/s to specify space as part of expression.
-
Saxon 9.1.0.1 has inefficiency in implementation of
xsl:analyze-string
instruction, whenever regex contains literal value however with '{' character
(e.g. "\p{{L}}"), as it considers the value to be an AVT and delays pattern
compilation until runtime, which it does every time instruction is executed.
Use following link to see the xslt:
expression-parser.xslt.
To see how to generate java from an xml follow this link:
Xslt for the jxom (Java xml object model), jxom.zip.