Expression parser in XSLT

Tuesday, 12 August 2008

I know we're not the first to write a parser in XSLT. Still, I want to share our implementation, as I think it's beautiful.

In our project, which converts code from a legacy language to Java, we deal with dynamic expressions. For example, in the legacy language one can filter a collection using an expression defined by a string: collection.filter("a > 0 and b = 7");

When the expression string is computed at runtime, there is nothing to do except parse it at runtime and perform the filtering dynamically. On the other hand, we have found that in the majority of cases literal strings are used, so we decided to optimize that route like this:

collection.filter(
  new Filter<T>()
  {
    // Generated from the literal expression "a > 0 and b = 7".
    public boolean filter(T value)
    {
      return (value.getA() > 0) && (value.getB() == 7);
    }
  });

This means that we convert the expression string into Java code at the generation stage.

In XSLT, which is our generator engine, this means that we have to convert a string into an expression tree, like this:

(a > 7 or a= 3) and c * d = 2.2

<and>
  <or>
    <gt>
      <identifier>a</identifier>
      <integer>7</integer>
    </gt>
    <eq>
      <identifier>a</identifier>
      <integer>3</integer>
    </eq>
  </or>
  <eq>
    <mul>
      <identifier>c</identifier>
      <identifier>d</identifier>
    </mul>
    <decimal>2.2</decimal>
  </eq>
</and>
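To make the mapping from string to tree concrete, here is a minimal sketch in Java. It is not our XSLT parser, only an illustration of the same idea: split the input into tokens with one regular expression, then build the tree by precedence levels. The class name ExprParserSketch, the small grammar (or, and, comparison, additive, multiplicative, parentheses) and the element names lt, add, sub and div are my assumptions; only and, or, gt, eq, mul, identifier, integer and decimal come from the tree above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExprParserSketch {
    // One named group per token kind; decimal is tried before integer.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<decimal>\\d+\\.\\d+)|(?<integer>\\d+)"
        + "|(?<identifier>\\p{L}\\w*)|(?<op>[()<>=*/+-]))");
    private static final String[] KINDS = {"decimal", "integer", "identifier", "op"};

    // Operator levels, lowest precedence first (an assumed grammar).
    private static final List<List<String>> LEVELS = List.of(
        List.of("or"), List.of("and"), List.of(">", "<", "="),
        List.of("+", "-"), List.of("*", "/"));

    // Element names; lt, add, sub and div are assumptions.
    private static final Map<String, String> NAMES = Map.of(
        "or", "or", "and", "and", ">", "gt", "<", "lt", "=", "eq",
        "+", "add", "-", "sub", "*", "mul", "/", "div");

    private final List<String[]> tokens = new ArrayList<>(); // {kind, text}
    private int pos = 0;

    private ExprParserSketch(String input) {
        String s = input.trim();
        Matcher m = TOKEN.matcher(s);
        int end = 0;
        while (end < s.length()) {
            m.region(end, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + end);
            }
            for (String kind : KINDS) {
                String text = m.group(kind);
                if (text == null) continue;
                // The keywords "and" and "or" are operators, not identifiers.
                boolean keyword = text.equals("and") || text.equals("or");
                tokens.add(new String[] {keyword ? "op" : kind, text});
            }
            end = m.end();
        }
    }

    // Returns the current token text if it is an operator, else null.
    private String peekOp() {
        return pos < tokens.size() && tokens.get(pos)[0].equals("op")
            ? tokens.get(pos)[1] : null;
    }

    // Parses left-associative binary operators at one precedence level.
    private String binary(int level) {
        if (level == LEVELS.size()) return primary();
        String left = binary(level + 1);
        for (String op = peekOp(); op != null && LEVELS.get(level).contains(op); op = peekOp()) {
            pos++;
            String name = NAMES.get(op);
            left = "<" + name + ">" + left + binary(level + 1) + "</" + name + ">";
        }
        return left;
    }

    // Parses an identifier, a number, or a parenthesized expression.
    private String primary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of expression");
        }
        String[] t = tokens.get(pos++);
        if (!t[0].equals("op")) return "<" + t[0] + ">" + t[1] + "</" + t[0] + ">";
        if (!t[1].equals("(")) throw new IllegalArgumentException("Unexpected token " + t[1]);
        String inner = binary(0);
        if (!")".equals(peekOp())) throw new IllegalArgumentException("Expected )");
        pos++;
        return inner;
    }

    // Converts an expression string into an unindented XML tree.
    static String parse(String input) {
        ExprParserSketch p = new ExprParserSketch(input);
        String xml = p.binary(0);
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token " + p.tokens.get(p.pos)[1]);
        }
        return xml;
    }

    public static void main(String[] args) {
        System.out.println(parse("(a > 7 or a= 3) and c * d = 2.2"));
    }
}
```

For the sample expression this produces the same tree as above, just without indentation. The sketch returns a plain string to stay short; a real generator would rather build nodes.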

Our parser fits naturally into the world of parsers: it uses the xsl:analyze-string instruction to tokenize the input, and then parses the tokens according to an expression grammar. During the implementation I found a few things that were new to me. I think they are worth mentioning:

As the tokenizer is defined as one big regular expression, we have a rather verbose regex attribute on xsl:analyze-string. It was hard to edit such a long line until I found that there is a flags="x" option that solves the formatting problem:

The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored.
This means that I can use spaces to format the regular expression, and \s to specify a space as part of the expression.
Saxon 9.1.0.1 has an inefficiency in its implementation of the xsl:analyze-string instruction. Whenever the regex attribute is a literal value that nevertheless contains a '{' character (e.g. "\p{{L}}"), Saxon treats the value as an AVT and delays pattern compilation until runtime, where it recompiles the pattern every time the instruction is executed.
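The effect of the x flag is easy to try outside XSLT, because Java's java.util.regex has an analogous Pattern.COMMENTS flag. The sketch below is only an analogue, not the regex from our stylesheet: the class name VerboseRegexSketch, the token set and the two-word "not in" operator are hypothetical. Note also that Java's flag additionally allows # comments, which I am not claiming for the XSLT flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VerboseRegexSketch {
    // With Pattern.COMMENTS, layout whitespace in the pattern is ignored,
    // so a significant space has to be written as \s.
    static final Pattern TOKEN = Pattern.compile(
          "  \\d+ \\. \\d+       # decimal, e.g. 2.2\n"
        + "| \\d+                # integer\n"
        + "| not \\s+ in \\b     # hypothetical operator containing a space\n"
        + "| \\p{L} \\w*         # identifier or keyword\n"
        + "| [()<>=*/+-]         # punctuation and operators\n",
        Pattern.COMMENTS);

    // Collects every match; text between matches (whitespace here) is skipped.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(a > 7 or a= 3) and c * d = 2.2"));
        System.out.println(tokenize("x not  in y"));
    }
}
```

Each alternative sits on its own line, which is exactly what made the big tokenizer expression editable for us. The only place where whitespace still matters is the explicit \s inside the hypothetical "not in" operator.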

Use the following link to see the XSLT: expression-parser.xslt.
To see how to generate Java from XML, follow this link: Xslt for the jxom (Java xml object model), jxom.zip.

Tuesday, 12 August 2008 14:45:54 UTC

Comments [2] -
xslt

Thursday, 09 October 2008 05:45:51 UTC

This is really cool.

Oleg Tkachenko

Friday, 10 October 2008 04:02:45 UTC

Probably you'd be interested to read about the LR Parsing Framework of FXSL.

Using it, several parsers have been implemented in pure XSLT (a JSON parser: the f:json-document() function), an XPath 2.0 parser and more...

Cheers,
Dimitre Novatchev

Dimitre Novatchev
