Saturday, 11 December 2010 - Nesterovsky bros

Saturday, 11 December 2010

We have a class Beans used to serialize a list of generic objects into XML. This is done like this:

    public class Call
    {
      public Beans input;
      public Beans output;
      ...
    }

    @XmlJavaTypeAdapter(value = BeanAdapter.class)
    public class Beans
    {
      public List<Object> bean;
    }

Thanks to @XmlJavaTypeAdapter, we're able to write xml in whatever form we want.

When we're serializing a Call instance:

    Call call = ...
    Beans beans = ...;

    call.setInput(beans);

    JAXBContext context = ...;
    Marshaller marshaler = context.createMarshaller();
    ObjectFactory factory = ...;

    marshaler.marshal(factory.createCall(call), result);

things work as expected, meaning that BeanAdapter is used during xml serialization. But if you want to serialize a Beans instance on its own, you start getting problems with the serialization of unknown objects. That's because JAXB does not use BeanAdapter in this case.

We have found a similar case "How to assign an adapter to the root element?", unfortunately with no satisfactory explanation.

That is strange.

Saturday, 11 December 2010 08:48:00 UTC

Java | Thinking aloud

Tuesday, 23 November 2010

Something else

Last Thursday, 18 Nov 2010, we went on a trip to Wadi Kelt with our colleagues from BluePhoenix. See our photo report here.

Tuesday, 23 November 2010 13:23:48 UTC


Thursday, 18 November 2010

Shallow thoughts

Michael Kay, author of the Saxon xslt processor, inspired by the ideas of GWT, has decided to compile Saxon HE into javascript. See Compiling Saxon using GWT.

The resulting script is about 1MB in size.

But lately we have been thinking that it's overkill to bring a whole xslt engine to the client, when it's possible to generate javascript from xslt the same way he builds java from xquery. This would probably require some runtime, but a much smaller one.

Thursday, 18 November 2010 16:19:52 UTC

Tips and tricks | xslt

Tuesday, 09 November 2010

xsl:analyze-string (www.google.fr)

Search at www.google.fr: An empty sequence is not allowed as the @select attribute of xsl:analyze-string

That's a known issue. See Bug 7976.

In xslt 2.0 you should either check the value before using xsl:analyze-string, or wrap it in a string() call.

The problem is addressed in xslt 3.0.

Tuesday, 09 November 2010 10:11:45 UTC

Tips and tricks | xslt

Sunday, 07 November 2010

Saxon 9.3

michaelhkay: Saxon 9.3 has been out for 8 days: only two bugs so far, one found by me. I think that's a record.

Not necessarily. We, for example, as Saxon HE users, have found nothing new in Saxon 9.3, while we expected to see xslt 3.0. Disappointing. There is no real reason to migrate.

P.S. We were among the first to find early bugs in previous releases.

Sunday, 07 November 2010 09:07:11 UTC

Thinking aloud | xslt

Tuesday, 02 November 2010

27 years of service

Reading the individual papers of the C++ WG, you can find the following one:

N3174 | 10-0164 | To move or not to move | Bjarne Stroustrup | 2010-10-17 | 2010-10 | Core

There, Bjarne Stroustrup thinks about issues with implicitly generated copy and move operations in C++.

It's always a pleasure to see how one can deal with a problem fraught with contradictions. To make his case, Bjarne skilfully uses not only rational but also emotional argumentation:

...We may deem this “bad code that deserves to be broken” or “unrealistic”, but this example demonstrates that the problem with a generated move has an exact counterpart for copy (which we have lived with for 27 years)...

...In 1984, I missed the chance to protect us against copy and we have lived with the problems ever since. I should have instituted some rule along the lines “if a class has a destructor, no copy operations are generated” or “if a class has a pointer member, no copy operations are generated.”...

It's impossible to recall these numbers without shivering. :-)

Tuesday, 02 November 2010 10:16:08 UTC

Thinking aloud

w3's bugzilla entry 9069

We're following w3's "Bug 9069 - Function to invoke an XSLT transformation".

There, people argue about an xpath API to invoke xslt transformations. The function should look roughly like this:

transform ( $node-tree as node()?, $stylesheet as item(), $parameters as XXX ) as node()

The discussion revolves around the last argument: $parameters as XXX. Should it be an xml element describing parameters, a function returning values for parameter names, or some new type modelling an immutable map?

What is most interesting in this discussion is the leak about plans to introduce a map type:

Comment 7 Michael Kay, 2010-09-14 22:46:58 UTC

We're currently talking about adding an immutable map to XSLT as a new data type (the put operation would return a new map). There appear to be a number of possible efficient implementations. It would be ideally suited for this purpose, because unlike the mechanism used for serialization parameters, the values can be any data type (including nodes), not only strings.

There is hope that a map will finally appear in xslt!
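To make "the put operation would return a new map" concrete, here is a minimal Java sketch of an immutable map. It is only an illustration of the idea, not the proposed XSLT type: the class name ImmutableMapSketch and its methods are hypothetical, and the naive copy-on-put used here is O(n), whereas the comment above hints at more efficient implementations.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a map whose put() never changes the receiver.
public final class ImmutableMapSketch<K, V> {
    private final Map<K, V> entries;

    private ImmutableMapSketch(Map<K, V> entries) {
        this.entries = entries;
    }

    public static <K, V> ImmutableMapSketch<K, V> empty() {
        return new ImmutableMapSketch<>(Collections.<K, V>emptyMap());
    }

    // Returns a new map containing the extra entry; "this" stays as it was.
    public ImmutableMapSketch<K, V> put(K key, V value) {
        Map<K, V> copy = new HashMap<>(entries);
        copy.put(key, value);
        return new ImmutableMapSketch<>(copy);
    }

    public V get(K key) {
        return entries.get(key);
    }

    public int size() {
        return entries.size();
    }
}
```

Values can be of any type (here Object), which is the property that makes such a map attractive for passing stylesheet parameters.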

Tuesday, 02 November 2010 08:34:52 UTC

Thinking aloud | xslt

Monday, 01 November 2010

Languages xom update

Historically jxom was developed first, and as such exhibited some imperfections in its xml schema. csharpxom took jxom's problems into account.

Unfortunately we could not easily fix jxom, as a great deal of code already uses it. In this refactoring we tried to be conservative, and have changed only the "type" and "import" xml schema elements in java.xsd.

Consider type reference and package import constructs in the old schema:

    <import name="java.util.ArrayList"/>

    <type package="java.util">
      <part name="ArrayList">
        <argument>
          <type name="BigDecimal" package="java.math"/>
        </argument>
      </part>
    </type>

    <type package="my">
      <part name="Parent"/>
      <part name="Nested"/>
    </type>

Here we can observe that:

a type is referred to by a qualified name in the import element;
a type has two forms: a simple one (see BigDecimal), and another for nested or generic types (see ArrayList).

We have made it more consistent in the updated jxom:

    <import>
      <type name="ArrayList" package="java.util"/>
    </import>

    <type name="ArrayList" package="java.util">
      <argument>
        <type name="BigDecimal" package="java.math"/>
      </argument>
    </type>

    <type name="Nested">
      <type name="Parent" package="my"/>
    </type>

We hope that you will not be impacted very much by this fix.

Please refresh Languages XOM from languages-xom.zip.

P.S. We have also included an xml schema and an xslt api to generate ASPX (see Xslt serializer for ASPX output). In our projects we, in fact, generate aspx documents with embedded csharpxom, and then pass them through a two-stage transformation.

Monday, 01 November 2010 15:48:19 UTC

Announce | xslt

Friday, 22 October 2010

Parse COBOL into cobolxom, #2

In the previous post we announced an API to parse a COBOL source into cobolxom.

We used the incremental parser to build a grammar xml tree, and planned to create an xslt transformation to generate cobolxom.

Now, we would like to announce that this xslt is ready.

At present all standard COBOL constructs are supported, but more tests are required. Preprocessor support is still on the todo list.

You may take a look at an example of COBOL:

Cobol grammar:

And cobolxom:

While we were building the grammar-to-cobolxom stylesheet, we asked ourselves whether the COBOL parsing could be done entirely in xslt. The answer is yes, so who knows, we might turn this task into a pure xslt one. :-)

Friday, 22 October 2010 13:24:31 UTC

Announce | Incremental Parser | Thinking aloud | xslt

Monday, 18 October 2010

What's evaluated first: @select or xsl:with-param?

Recently we've seen code like this:

    <xsl:variable name="a" as="element()?" select="..."/>
    <xsl:variable name="b" as="element()?" select="..."/>

    <xsl:apply-templates select="$a">
      <xsl:with-param name="b" tunnel="yes" as="element()" select="$b"/>
    </xsl:apply-templates>

It fails with an error: "An empty sequence is not allowed as the value of parameter $b".

What is interesting is that the value of $a is an empty sequence, so the code could potentially work, provided the processor evaluated $a first and decided not to evaluate xsl:with-param.

Is the order of evaluation of @select and xsl:with-param specified by the standard, or is it implementation-defined?

We asked this question on the xslt forum, and got the following answer:

The specification leaves this implementation-defined. Since the values of the parameters are the same for every node processed, it's a reasonably strategy for the processor to evaluate the parameters before knowing how many selected nodes there are, though I guess an even better strategy would be to do it lazily when the first selected node is found.

Well, that's the expected answer. This question will probably induce Michael Kay to introduce a small optimization into Saxon.

Monday, 18 October 2010 17:58:51 UTC

Tips and tricks | xslt

Saturday, 09 October 2010

Parse COBOL into cobolxom

Some time ago we created an incremental parser, and now that we have decided to load COBOL sources directly into cobolxom (XML Object Model for COBOL), the parser did the job perfectly.

The good thing about the incremental parser is that it easily handles COBOL's grammar.

The whole process looks like this:

the incremental parser, given a COBOL grammar, builds a grammar tree;
we stream this tree into xml;
an xslt transforms the xml from the previous step into cobolxom (TODO).

This is an example of COBOL:

    IDENTIFICATION DIVISION.
    PROGRAM-ID. FACTORIAL RECURSIVE.
    DATA DIVISION.
    WORKING-STORAGE SECTION.
    01 NUMB PIC 9(4) VALUE IS 5.
    01 FACT PIC 9(8) VALUE IS 0.
    LOCAL-STORAGE SECTION.
    01 NUM PIC 9(4).
    PROCEDURE DIVISION.
        MOVE 'X' TO XXX
        MOVE NUMB TO NUM
        IF NUMB = 0 THEN
            MOVE 1 TO FACT
        ELSE
            SUBTRACT 1 FROM NUMB
            CALL 'FACTORIAL'
            MULTIPLY NUM BY FACT
        END-IF
        DISPLAY NUM '! = ' FACT
        GOBACK.
    END PROGRAM FACTORIAL.

And a grammar tree:

The last step, transforming the tree into cobolxom, is on the TODO list.

We have committed the COBOL grammar to the same place at SourceForge as the XQuery grammar. The solution now targets VS 2010.

Saturday, 09 October 2010 08:26:23 UTC

Announce | Incremental Parser | xslt

Friday, 08 October 2010

Getting regex captures in xslt

Suppose you have a timestamp string, and want to check whether it matches one of the following formats, allowing for leading and trailing spaces:

YYYY-MM-DD-HH.MM.SS.NNNNNN
YYYY-MM-DD-HH.MM.SS
YYYY-MM-DD

We decided to use a regex and its capture groups to extract the timestamp parts. This left us with only one option: the xsl:analyze-string instruction. It took a couple more minutes to reach a final solution:

    <xsl:variable name="parts" as="xs:string*">
      <xsl:analyze-string select="$value"
        regex="
          ^\s*(\d\d\d\d)-(\d\d)-(\d\d)
          (-(\d\d)\.(\d\d)\.(\d\d)(\.(\d\d\d\d\d\d))?)?\s*$"
        flags="x">
        <xsl:matching-substring>
          <xsl:sequence select="regex-group(1)"/>
          <xsl:sequence select="regex-group(2)"/>
          <xsl:sequence select="regex-group(3)"/>
          <xsl:sequence select="regex-group(5)"/>
          <xsl:sequence select="regex-group(6)"/>
          <xsl:sequence select="regex-group(7)"/>
          <xsl:sequence select="regex-group(9)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <xsl:choose>
      <xsl:when test="exists($parts)">
        ...
      </xsl:when>
      <xsl:otherwise>
        ...
      </xsl:otherwise>
    </xsl:choose>
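For comparison, the same capture-group approach can be sketched in Java. This is not the xslt solution itself, only an illustration of the regex: the class name TimestampParts and the method parse are hypothetical, Pattern.COMMENTS is assumed to play the role of flags="x", and unmatched optional groups are mapped to empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimestampParts {
    // Pattern.COMMENTS ignores whitespace inside the pattern, like flags="x".
    private static final Pattern TIMESTAMP = Pattern.compile(
        "^\\s*(\\d{4})-(\\d\\d)-(\\d\\d)\n" +
        "(-(\\d\\d)\\.(\\d\\d)\\.(\\d\\d)(\\.(\\d{6}))?)?\\s*$",
        Pattern.COMMENTS);

    // Returns year, month, day, hour, minute, second and fraction,
    // or an empty list when the value does not match any of the formats.
    public static List<String> parse(String value) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = TIMESTAMP.matcher(value);

        if (matcher.matches()) {
            // Groups 4 and 8 are only grouping wrappers, so they are skipped.
            for (int group : new int[] { 1, 2, 3, 5, 6, 7, 9 }) {
                String part = matcher.group(group);

                // An optional group that did not take part in the match is null.
                parts.add(part == null ? "" : part);
            }
        }

        return parts;
    }
}
```

As in the stylesheet, an empty result means "no match", so the caller only needs to test whether the list is empty.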

How would you solve the problem? Is it the best solution?

Friday, 08 October 2010 17:37:44 UTC

Tips and tricks | xslt

Sunday, 05 September 2010

BigDecimal + JAXB => potential interoperability problems

One of our latest tasks was the conversion of data received from a mainframe as an EBCDIC flat file into an XML file in UTF-8 encoding for further processing.

The solution was rather straightforward:

read the source flat file, record-by-record;
serialize each record as an element into target XML file using JAXB.

To read data from the EBCDIC encoded flat file, a good old tool named eXperanto was used. It lets you define C# and/or Java classes that correspond to the records in the source flat file. Thus we were able to read and convert records from EBCDIC to UTF-8.

The next sub-task was to serialize a Java bean to an XML element. The JAXB marshaller was used for this.

Everything was ok until we started to test the implementation on real data.

We realized that some decimal values (BigDecimal fields in Java classes) were serialized in scientific exponential notation. For example: 0.000000365 was serialized as 3.65E-7 and so on.

On the other hand, the target XML was used by another (non-Java) application, which expected to receive decimal data, as defined in the XSD schema (the field types were specified as xs:decimal).

According to the W3C datatypes specification:

"...decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, 100000.00, 210..."

So the result was predictable: the consumer application failed.

A Google search reveals that we are dealing with a well-known bug: "JAXB marshaller returns BigDecimal with scientific notation in JDK 6". It has remained open for a year and a half, since May 2009, marked as "Fix in progress". We've tested our application with Java version 1.6.0_21-b07, JAXB 2.1.

Although this is a rather critical bug that may affect the interoperability of Java applications (e.g. Java web services etc.), its priority was set to just "4-Low".

P.S. as a temporary workaround for this case only(!) we've replaced xs:decimal with xs:double in the XSD schema for the target application.
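The notation mismatch can be reproduced with plain BigDecimal, without JAXB. The sketch below only shows the difference between toString(), which switches to scientific notation for small values, and toPlainString(), which never uses an exponent. We assume the marshaller output corresponds to the former. The class DecimalLexicalForm and the method toXsDecimal are hypothetical names, and wiring such a conversion into JAXB is not shown.

```java
import java.math.BigDecimal;

public class DecimalLexicalForm {
    // Formats a value without an exponent, which is what xs:decimal requires.
    public static String toXsDecimal(BigDecimal value) {
        return value.toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("0.000000365");

        System.out.println(value.toString());   // prints 3.65E-7
        System.out.println(toXsDecimal(value)); // prints 0.000000365
    }
}
```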

Sunday, 05 September 2010 12:58:23 UTC

Java | Tips and tricks

Wednesday, 25 August 2010

String and StringBuilder in .NET 4

By accident we have found that the implementations of String and StringBuilder have been considerably revised, while the public interface has remained the same.

    public sealed class String
    {
      private int m_arrayLength;
      private int m_stringLength;
      private char m_firstChar;
    }

This layout dates back to .NET 1.0.

The VM, in fact, allocates more memory than is declared in the C# class, as &m_firstChar refers to an inline char buffer.

This way the string's buffer length and the string's length were two different values. StringBuilder exploited this fact and stored its content in a private string, which it modified in place.

In .NET 4, string is different:

    public sealed class String
    {
      private int m_stringLength;
      private char m_firstChar;
    }

The memory footprint of such a structure is smaller, but the string's length must always be the same as its buffer length. In fact, the layout of a string is now the same as the layout of a char[].

This modification led to a redesign of the StringBuilder implementation.

Earlier, StringBuilder looked like the following:

    public sealed class StringBuilder
    {
      internal IntPtr m_currentThread;
      internal int m_MaxCapacity;
      internal volatile string m_StringValue;
    }

Notice that m_StringValue is used as storage, and m_currentThread is used to preserve thread affinity of the internal string value.

Now, guys at Microsoft have decided to implement StringBuilder very differently:

    public sealed class StringBuilder
    {
      internal int m_MaxCapacity;
      internal int m_ChunkLength;
      internal int m_ChunkOffset;
      internal char[] m_ChunkChars;
      internal StringBuilder m_ChunkPrevious;
    }

Inspection of this layout immediately reveals the implementation technique. It's a list of chunks. The instance itself is the last chunk (the most recently appended one), and it references the previous chunks.

Characteristics of this design are:

while Length is small, performance is almost the same as it was earlier;
there are no more thread affinity checks;
Append() and ToString() work as fast as in the old version;
Insert() in the middle works faster, as only a chunk has to be split and probably reallocated (copied), instead of the whole string;
random access is fast at the end, O(1), and slows down as you approach the start, O(chunk-count).

Personally, we would select a slightly different design:

    public sealed class StringBuilder
    {
      private struct Chunk
      {
        public int length; // Chunk length.
        public int offset; // Chunk offset.
        public char[] buffer;
      }

      private int m_MaxCapacity;

      // Alternatively, one can use
      // private List<Chunk> chunks;
      private int chunkCount; // Number of used chunks.
      private Chunk[] chunks; // Array of chunks except last.
      private Chunk last; // Last chunk.
      private bool nonHomogenous; // false if all chunks are of the same size.
    }

This design has a better memory footprint, and random access time is O(1) when there were no inserts in the middle (nonHomogenous=false), and O(log(chunkCount)) after such inserts. All other characteristics are the same.
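The random access claim can be illustrated with a small sketch. It is written in Java rather than C#, and it is neither Microsoft's code nor a complete builder: the names ChunkedBuilder and CHUNK_SIZE are hypothetical, a List of chunks stands in for the array plus the separate last chunk, and insert() recomputes chunk offsets linearly to keep the example short. What it does show is index arithmetic while all full chunks have the same size, and a binary search over chunk offsets once an insert has split a chunk.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedBuilder {
    private static final int CHUNK_SIZE = 4; // tiny on purpose, to exercise several chunks

    private static final class Chunk {
        int offset;          // index of the chunk's first char in the whole text
        int length;          // number of chars used in the buffer
        final char[] buffer;

        Chunk(int offset, int capacity) {
            this.offset = offset;
            this.buffer = new char[capacity];
        }
    }

    private final List<Chunk> chunks = new ArrayList<>();
    private int length;
    private boolean nonHomogeneous; // true once an insert has split a chunk

    public ChunkedBuilder append(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            Chunk last = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
            if (last == null || last.length == last.buffer.length) {
                last = new Chunk(length, CHUNK_SIZE);
                chunks.add(last);
            }
            last.buffer[last.length++] = text.charAt(i);
            length++;
        }
        return this;
    }

    public ChunkedBuilder insert(int index, CharSequence text) {
        if (index < 0 || index > length) throw new IndexOutOfBoundsException("index: " + index);
        if (text.length() == 0) return this;
        if (index == length) return append(text);

        // Only the affected chunk is split into head, inserted text and tail.
        int position = findChunk(index);
        Chunk target = chunks.remove(position);
        int split = index - target.offset;
        List<Chunk> parts = new ArrayList<>();
        if (split > 0) parts.add(copyOf(target.buffer, 0, split));
        parts.add(copyOf(text.toString().toCharArray(), 0, text.length()));
        parts.add(copyOf(target.buffer, split, target.length - split));
        chunks.addAll(position, parts);

        // Simplification: offsets after the split are recomputed linearly.
        int offset = target.offset;
        for (int i = position; i < chunks.size(); i++) {
            chunks.get(i).offset = offset;
            offset += chunks.get(i).length;
        }
        length += text.length();
        nonHomogeneous = true;
        return this;
    }

    public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index: " + index);
        Chunk chunk = chunks.get(findChunk(index));
        return chunk.buffer[index - chunk.offset];
    }

    // O(1) while all full chunks have CHUNK_SIZE chars, O(log(chunkCount)) otherwise.
    private int findChunk(int index) {
        if (!nonHomogeneous) return index / CHUNK_SIZE;
        int low = 0;
        int high = chunks.size() - 1;
        while (low < high) { // find the last chunk whose offset is <= index
            int middle = (low + high + 1) >>> 1;
            if (chunks.get(middle).offset <= index) low = middle; else high = middle - 1;
        }
        return low;
    }

    private static Chunk copyOf(char[] source, int from, int count) {
        Chunk chunk = new Chunk(0, count);
        System.arraycopy(source, from, chunk.buffer, 0, count);
        chunk.length = count;
        return chunk;
    }

    @Override
    public String toString() {
        StringBuilder result = new StringBuilder(length);
        for (Chunk chunk : chunks) result.append(chunk.buffer, 0, chunk.length);
        return result.toString();
    }
}
```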

Wednesday, 25 August 2010 09:36:55 UTC

Thinking aloud | Tips and tricks

Visual Studio 2010

Earlier, there was hype about how good VS 2010 is.

When we tried the beta and found that it's noticeably slower than VS 2008, we assumed that the release would do better.

Unfortunately, that was an optimistic assumption.

Comparing VS 2008 and VS 2010, we can confirm that the latter:

eats more memory;
is slower with C# projects (often hangs for long periods and even crashes);
is incapable of working with xslt 2.0 files;
has removed the Shift+Enter key stroke to insert <br/> in the html editor (why?);
has removed the visualizer of the StringBuilder (in the debugger).

Is our hardware too outdated (Lenovo T60 laptops, 2GHz Core Duo, 2GB RAM)? Or is there some other reason?

Wednesday, 25 August 2010 06:52:39 UTC

