Wednesday, February 24, 2010

Ladies and gentlemen of the jury...

Wednesday, February 24, 2010 9:00:36 PM UTC      Comments [7] -

Well, it's itching!

In cobolxom there is an utility function to allocate names in scope. Its signature looks like this:

`<!--  Allocates unique names in the form \$prefix{number}?.  Note that prefixes may coincide.     \$prefixes - a name prefixes.     \$names - allocated names pool.     \$name-max-length - a longest allowable name length.     Returns unique names.--> <xsl:function name="t:allocate-names" as="xs:string*">   <xsl:param name="prefixes" as="xs:string*"/>   <xsl:param name="names" as="xs:string*"/>   <xsl:param name="name-max-length" as="xs:integer?"/> `

We have created several different implementations (all use recursion). Every implementation works fair for relatively small input sequences, say N < 100, but we have cases when there are about 10000 items on input. Algorithm's worst case complexity, in absence of associative containers, is O(N*N), and be sure it's an O-o-o-oh... due to xslt engine implementation.

If there were associative containers with efficient access (complexity is O(LogN)), and construction of updated container (complexity is also O(LogN)) then implementation would be straightforward and had complexity O(N*LogN).

Wednesday, February 24, 2010 7:34:07 AM UTC      Comments [0] -
Thinking aloud | xslt
Wednesday, February 17, 2010

The very same simple tasks tend to appear in different languages (e.g. C# Haiku). Now we have to find:

• integer and fractional part of a decimal;
• length and precision of a decimal.

These tasks have no trivial solutions in xslt 2.0.

At present we have came up with the following answers:

Fractional part:

```<xsl:function name="t:fraction" as="xs:decimal">   <xsl:param name="value" as="xs:decimal"/>   <xsl:sequence select="\$value mod 1"/> </xsl:function>```

Integer part v1:

```<xsl:function name="t:integer" as="xs:decimal">   <xsl:param name="value" as="xs:decimal"/>   <xsl:sequence select="\$value - t:fraction(\$value)"/> </xsl:function>```

Integer part v2:

```<xsl:function name="t:integer" as="xs:decimal">   <xsl:param name="value" as="xs:decimal"/>   <xsl:sequence select="     if (\$value ge 0) then       floor(\$value)     else       -floor(-\$value)"/> </xsl:function>```

Length and precision:

```<!--   Gets a decimal specification as a closure:     (\$length as xs:integer, \$precision as xs:integer). --> <xsl:function name="t:decimal-spec" as="xs:integer+">   <xsl:param name="value" as="xs:decimal"/>   <xsl:variable name="text" as="xs:string" select="     if (\$value lt 0) then       xs:string(-\$value)     else       xs:string(\$value)"/>   <xsl:variable name="length" as="xs:integer"     select="string-length(\$text)"/>   <xsl:variable name="integer-length" as="xs:integer"     select="string-length(substring-before(\$text, '.'))"/>     <xsl:sequence select="     if (\$integer-length) then       (\$length - 1, \$length - \$integer-length - 1)     else       (\$length, 0)"/> </xsl:function>```

The last function looks odious. In many other languages its implementation would be considered as embarrassing.

Wednesday, February 17, 2010 7:29:55 AM UTC      Comments [0] -
Tips and tricks | xslt
Sunday, February 7, 2010

Given:

```public class N {   public readonly N next; }```

What needs to be done to construct a ring of `N`: `n1` refers to `n2`, `n2` to `n3`, ... `nk` to `n1`? Is it possible?

Sunday, February 7, 2010 7:57:08 AM UTC      Comments [2] -
Thinking aloud | Tips and tricks
Saturday, February 6, 2010

To end with immutable trees, at least for now, we've implemented `IDictionary<K, V>`. It's named `Map<K, V>`. Functionally it looks very like `SortedDictionary<K, V>`. there are some differences, however:

• `Map` in contrast to `SortedDictionary` is very cheap on copy.
• Bacause `Map` is based on AVL tree, which is more rigorly balanced than RB tree, so it's a little bit faster asymptotically for lookup than `SortedDictionary`, and a little bit slower on modification.
• Due to the storage structure: node + navigator, `Map` consumes less memory than `SortedDictionary`, and is probably cheaper for GC (simple garbage graphs).
• As AVL tree stores left and right subtree sizes, in contrast to a "color" in RB tree, we able to index data in two ways: with integer index, and with key value.

Sources are:

Update:

It was impossible to withstand temptation to commit some primitive performance comparision. `Map` outperforms `SortedDictionary` both in population and in access. this does not aggree with pure algorithm's theory, but there might be other unaccounted factors: memory consumption, quality of implementation, and so on.

Program.cs is updated with measurements.

Update 2:

More occurate tests show that for some key types `Map`'s faster, for others `SortedDictionary`'s faster. Usually `Map`'s slower during population (mutable AVL tree navigator may fix this). the odd thing is that `Map<string, int>` is faster than `SortedDictionary<string, int>` both for allocaction and for access. See excel report.

Update 3:

Interesing observation. The following table shows maximal and average tree heights for different node sizes in AVL and RB trees after a random population:

AVL RB
Size Max Avg Max Avg
10 4 2.90 5 3.00
50 7 4.94 8 4.94
100 8 5.84 9 5.86
500 11 8.14 14 8.39
1000 12 9.14 16 9.38
5000 15 11.51 18 11.47
10000 16 12.53 20 12.47
50000 19 14.89 23 14.72
100000 20 15.90 25 15.72
500000 25 18.26 28 18.27
1000000 25 19.28 30 19.27

Here, according with theory, the height of AVL tree is shorter than the height of RB tree. But what is most interesting is that the depth of an "average node". This value describes a number of steps required to find a random key. RB tree is very close and often is better than AVL in this regard.

Saturday, February 6, 2010 6:31:13 PM UTC      Comments [0] -
Thinking aloud | Tips and tricks
Wednesday, February 3, 2010

It was obvious as hell from day one of generics that there will appear obscure long names when you will start to parametrize your types. It was the easiest thing in the world to take care of this in advanvce. Alas, C# inherits C++'s bad practices.

Read Associative containers in a functional languages and Program.cs to see what we're talking about.

Briefly, there is a pair (`string`, `int`), which in C# should be declared as:

`System.Collections.Generic.KeyValuePair<string, int>`

Obviously we would like to write it in a short way. These are our attempts, which fail:

1. Introduce generic alias Pair<K, V>:

```using System.Collections.Generic; using Pair<K, V> = KeyValuePair<K, V>;```

2. Introduce type alias for a generic type with specific types.

``` using System.Collections.Generic; using Pair = KeyValuePair<string, int>;```

And this is only one that works:

``` using Pair = System.Collections.Generic.KeyValuePair<string, int>;```

Do you think is it bearable? Well, consider the following:

• There is a generic type `ValueNode<T>`, where `T` should be `Pair`.
• There is a generic type `TreeNavigator<N>`, where `N` is should be `ValueNode<Pair>`.

The declaration looks like this:

``` using Pair = System.Collections.Generic.KeyValuePair<string, int>; using Node = NesterovskyBros.Collections.AVL.ValueNode<   System.Collections.Generic.KeyValuePair<string, int>>; using Navigator = NesterovskyBros.Collections.AVL.TreeNavigator<   NesterovskyBros.Collections.AVL.ValueNode<     System.Collections.Generic.KeyValuePair<string, int>>>;```

Do you still think is it acceptable?

P.S. Legacy thinking led C#'s and java's designers to the use of word "new" for the object construction. It is not required at all. Consider `new Pair("A", 1)` vs `Pair("A", 1)`. C++ prefers second form. C# and java always use the first one.

Wednesday, February 3, 2010 11:59:19 AM UTC      Comments [1] -
Thinking aloud | Tips and tricks
Wednesday, January 27, 2010

Continuing with the post "Ongoing xslt/xquery spec update" we would like to articulate what options regarding associative containers do we have in a functional languages (e.g. xslt, xquery), assuming that variables are immutable and implementation is efficient (in some sense).

There are three common implementation techniques:

• store data (keys, value pairs) in sorted array, and use binary search to access values by a key;
• store data in a hash map;
• store data in a binary tree (usually RB or AVL trees).

Implementation choice considerably depends on operations, which are taken over the container. Usually these are:

1. construction;
2. value lookup by key;
3. key enumeration (ordered or not);
4. container modification (add and remove data into the container);
5. access elements by index;

Note that modification in a functional programming means a creation of a new container, so here is a division:

1. If container's use pattern does not include modification, then probably the simplest solution is to build it as an ordered sequence of pairs, and use binary search to access the data. Alternatively, one could implement associative container as a hash map.
2. If modification is essential then neither ordered sequence of pairs, hash map nor classical tree implementation can be used, as they are either too slow or too greedy for a memory, either during modification or during access.

On the other hand to deal with container's modifications one can build an implementation, which uses "top-down" RB or AVL trees. To see the difference consider a classical tree structure and its functional variant:

Classical Functional
Node structure: ```node   parent   left   right   other data``` ```node     left   right   other data```
Node reference: node itself node path from a root of a tree
Modification: either mutable or requires a completely new tree O(LnN) nodes are created

Here we observe that:

1. one can implement efficient map (lookup time no worse than O(LnN)) with no modification support, using ordered array;
2. one can implement efficient map with support of modification, using immutable binary tree;
3. one can implement all these algorithms purely in xslt and xquery (provided that inline functions are supported);
4. any such imlementation will lose against the same implementation written in C++, C#, java;
5. the best implementation would probably start from sorted array and will switch to binary tree after some size threshold.

Here we provide a C# implementation of a functional AVL tree, which also supports element indexing:

Our intention was to show that the usual algorithms for associative containers apply in functional programming; thus a feature complete functional language must support associative containers to make development more conscious, and to free a developer from inventing basic things existing already for almost a half of century.

Wednesday, January 27, 2010 7:00:55 AM UTC      Comments [0] -
Thinking aloud | Tips and tricks | xslt
Tuesday, January 19, 2010

Several years ago we have started a new project. We do not like neither hate any particular language, thus the decision what language to use was pragmatical: xslt 2.0 fitted perfectly.

At present it's a solid volume of xslt code. It exhibits all the virtues of any other good project in other language: clean design, modularity, documentation, sophisticationless (good code should not be clever).

Runtime profile of the project is that it deals with xml documents with sizes from a few dozens of bytes to several megabytes, and with xml schemas from simple ones like a tabular data, and to rich like xhtml and untyped. Pipeline of stylesheets processes gigabytes of data stored in the database and in files.

All the bragging above is needed here to introduce the context for the following couple of lines.

The diversity of load conditions and a big code base, exposed xslt engine of choice to a good proof test. The victim is Saxon. In the course of project we have found and reported many bugs. Some of them are nasty and important, and others are bearable. To Michael Kay's credit (he's owner of Saxon) all bugs are being fixed promtly (see the last one).

Such project helps us to understand a weak sides of xslt (it seems sometimes they, in WG, lack such experience, which should lead them through).

Well, it has happened so that we're helping to Saxon project. Unintentionally, however!

Nowdays we're polishing a COBOL generation. To this end we have familiarized ourselves with this language. That's the beatiful language. Its straightforwardness helps to see the evolution of computer languages and to understand what and why today's languages try to timidly hide.

Tuesday, January 19, 2010 7:56:04 PM UTC      Comments [0] -
Thinking aloud | xslt
Friday, January 15, 2010

We have updated languages-xom.zip. There are many fixes in cobolxom (well, cobolxom is new, and probably there will be some more bugs). Also we have included Xml Object Model for the SQL, which in fact has appeared along with jxom.

SQL xom supports basic sql syntax including common table expressions, and two dialects for DB2 and Oracle.

Friday, January 15, 2010 4:20:41 PM UTC      Comments [2] -
xslt
Wednesday, January 6, 2010

Recently W3C has published new drafts for xquery 1.1 and for xpath 2.1. We have noticed that committee has decided to introduce inline functions both for the xquery and for the xpath.

That's a really good news! This way xquery, xpath and xslt are being approached the Object Oriented Programming the way of javascript with its inline functions.

Now we shall be able to implement tuples (a sequence of items wrapped into single item), object with named properties, trees (e.g. RB Tree), associative containers (tree maps and hash maps, sets).

Surely, all this will be in the spirit of functional programming.

The only thing we regret about is that the WG did not include built-in implementations for trees and associative containers, as we don't believe that one can create an efficient implementation of these abstractions neither in xquery nor in xslt (asymptotically results will be good, but coefficients will be painful).

Wednesday, January 6, 2010 1:13:16 PM UTC      Comments [0] -
xslt
Monday, January 4, 2010

Not sure how things work for others but for us it turns out that Saxon 9.2 introduces new bugs, works slower and eats much more memory than its ancestor v9.1.

We hope all this will be fixed soon.

Update: By the way, Saxon 9.2 (at the moment 2009-01-04) does not like (despises in fact) small documents and especially text nodes in those documents. It loves huge in memory documents, however.

Update 2009-01-05: case's closed, fix's commited into svn.

Monday, January 4, 2010 1:39:47 PM UTC      Comments [0] -
xslt
Friday, January 1, 2010

Today, I've tried to upgrade our projects to Saxon 9.2. We have a rather big set of stylesheets grinding gigabytes of information. It's obvious that we expected at least the same performance from the new version.

But to my puzzlement a pipeline of transformations failed almost immediately with en error message:

```XPTY0018: Cannot mix nodes and atomic values in the result of a path expression ```

We do agree with this statement in general, but what it had in common with our stylesheets? And how everything was working in 9.1?

To find the root of the problem I've created a minimal problem reproduction:

```<xsl:stylesheet version="2.0"   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"   xmlns:xs="http://www.w3.org/2001/XMLSchema"   xmlns:t="this"   exclude-result-prefixes="xs t">   <!-- Entry point. -->   <xsl:template match="/">     <xsl:variable name="p" as="element()">       <p l="1"/>     </xsl:variable>     <o l="{\$p/t:d(.)}"/>   </xsl:template>   <xsl:function name="t:d" as="item()*">     <xsl:param name="p" as="element()"/>     <xsl:apply-templates mode="p" select="\$p"/>   </xsl:function>   <xsl:template match="*" mode="p">     <xsl:sequence select="concat('0', @l)"/>   </xsl:template> </xsl:stylesheet>```

Really simple, isn't it? The problem is in a new optimization of `concat()` function, introduced in version 9.2. It tries to eliminate string concatenation, and in certain cases emits its arguments directly into the output as text nodes, separating whole output with some stopper strings. The only problem is that no such optimization is allowed in this particular case (which is rather common, and surely legal, in our stylesheets); result of ``` <xsl:template match="p" mode="p">``` should not be a node, but of type `xs:string`.

Saxon 9.2 is here already for 3 month, at lest! Thus, how come that such a bug was not discovered earlier?

Update: the fix is commited into the svn on the next day. That's promptly!

Friday, January 1, 2010 10:17:47 PM UTC      Comments [0] -
xslt
Sunday, December 27, 2009

We've added a new language to the set of Xml Object Model schemas and stylesheets.

The newcomer is COBOL! No jokes. It's not a whim, really. Believe it or not but COBOL is still alive and we need to generate it (mainly different sorts of proxies).

We've used VS COBOL II grammar Version 1.0.3 as a reference. Implemented grammar is complete but without preprocessor statements. On the other hand it defines COPY and EXEC SQL constructs.

Definitely, it'll take a time for the xml schema and xslt implementation to become mature.

Now language XOM is:

• jxom - for java;
• csharpxom - for C#;
• cobolxom - for COBOL.

Sources can be found at languages-xom.

Sunday, December 27, 2009 5:00:07 PM UTC      Comments [0] -
Announce | xslt
Monday, December 21, 2009

Given:

• an xml defining elements and groups;
• each element belongs to a group or groups;
• group may belong to another group.

Find:

• groups, a given element directly or inderectly belongs to;
• a function checking whether an element belongs to a group.

Example:

``` <groups>   <group name="g1">     <element ref="e1"/>     <element ref="e2"/>     <element ref="e3"/>     <group ref="g2"/>   </group>   <group name="g2">     <element ref="e5"/>   </group>   <group name="g3">     <element ref="e1"/>     <element ref="e4"/>   </group> </groups>```

There are several solutions depending on aggresiveness of optimization. A moderate one is done through the xsl:key. All this reminds recursive common table expressions in SQL.

Anyone?

Monday, December 21, 2009 5:19:32 PM UTC      Comments [0] -
xslt
Saturday, December 19, 2009

In spite of the fact that our last projects are being developed in Java, the .NET is definitly our favorite platform.

In a twitter I saw the phrase: "Java the language is a stagnant mess". It's said in favour of C#. It's true that C# significantly affects now even on Java (let's remember generics, jaxb, web services, etc.), but in my opinion, the C# won't be the leading language for worldwide enterprise applications in the nearest future.

One of causes is that the main platform for .NET still is Windows. The situation could be changed by Mono project, but I think there are yet not enough projects on platforms other than Windows.

My guess is confirmed by some of observations that I did as a software engineer of an IT company. Our company performs different software porting projects from legacy programming languages like COBOL, ADSO, Natural etc. into up to date languages like Java, C# etc. It worth to say that clients rarely select to migrate to .NET despite to our advices.

The main reason of such choice, according to most of our clients, is that they want to be platform independent and only Java gives them this choice.

It worth for Microsoft to think about cooperation with Mono in order to make .NET really platform indpendent, otherwise C# will always be a step behind Java despite apparent advantages of C# as a programming language.

Saturday, December 19, 2009 5:08:23 PM UTC      Comments [1] -
Thinking aloud
Archive
 < February 2010 >
SunMonTueWedThuFriSat
31123456
78910111213
14151617181920
21222324252627
28123456
78910111213
Categories
 .NET AI Angular AngularJS Announce ASP.NET Azure BizTalk Server C++ Incremental Parser Java javascript JSF and Facelets kendoui ML.NET SCCBridge SQL Server puzzle Thinking aloud Tips and tricks Window Search xslt
Blogroll
Statistics
Total Posts: 387
This Year: 3
This Month: 0
This Week: 0