RSS 2.0
Sign In
# Thursday, February 12, 2009

Do you agree that binary trees and algorithms that keep trees reasonably balanced are important?

Our answer is yes!

It's interesting enough, however, that you won't easily find these algorithms publicly available.

Though red-black, AVL and other algorithms described in the wikipedia are defined in terms of tree manipulation, all implementations we have seen, deal with trees annotated with keys and values. These implementations really use tree balancing algorithms behind the schene, and expose a commonplace set or map containers to a client. Even C++ Standard Library suffers from this disease.

We think that binary trees are valuable independent concepts, and they worth to be implemented separately, at least because there are other algorithms, except sets and maps, using trees.

And well, we did it in C#! See RedBlackTree.cs.

Consider an example - a simple scheduler, ScheduleBookmark.cs, with operations:

  • schedule an action;
  • remove an action from the schedule;
  • enumerate actions;
  • find a date, an action is scheduled for;
  • find an action (or at least closest one) for a specified date;
  • postpone actions due to delays;

A balanced binary tree allows efficient implementation of such a scheduler. Tree node stores an action, and a time span between parent node and this node. This way:

Operation Steps
schedule an action find place + link node + rebalance tree
remove an action from the schedule unlink node + rebalance tree
enumerate actions navigate tree
find a date, an action is scheduled for find node in tree
find an action for a specified date cumulate time spans up to the tree root
postpone actions due to delays fixup time spans from a node up to the tree root

Compare operation complexities between tree, array, list and map:
Operation Tree Array List Map
schedule an action O(ln(N)) O(N) O(N) O(ln(N))
remove an action from the schedule O(ln(N)) O(N) O(1) O(ln(N))
enumerate actions O(ln(N)) O(1) O(1) O(ln(N))
find a date, an action is scheduled for O(ln(N)) O(1) O(1) O(1)
find an action for a specified date O(ln(N)) O(ln(N)) O(N) O(ln(N))
postpone actions due to delays O(ln(N)) O(N) O(N) O(N*ln(N))

Complexity of each operation for the tree is O(ln(N)). No arrays, lists, or maps achieve similar worst case guaranty.

Finally, the test program is Program.cs, and a whole project (VS2008) is

Thursday, February 12, 2009 1:17:36 PM UTC  #    Comments [0] -
Incremental Parser | Tips and tricks
# Wednesday, February 11, 2009

Could you think of a C# method accepting an ancestor, and forbidding a descendant of a class at compile time?

The answer to this probably is: why do you need such a reptile.

Well, I don't. I didn't meant to create such a method, but generics help a lot!

public class BinaryTreeNode<Node>
  where Node: BinaryTreeNode<Node>
  public Node parent;
  public Node left;
  public Node right;

public class MyNode: BinaryTreeNode<MyNode>
  public int key;

public class MyRoot: MyNode

public class Test
  public void test()
    MyRoot root = new MyRoot();

    // print((MyNode)root); // This works.
    print(root); // This does not work.

  private static void print<T>(T node)
    where T: BinaryTreeNode<T>
    Console.WriteLine("print me");

By the way, BinaryTreeNode is an "abstract" class, as you cannot instantiate it but inherit only.

Wednesday, February 11, 2009 1:59:17 PM UTC  #    Comments [0] -
Incremental Parser | Tips and tricks
# Thursday, January 15, 2009

A simple demand nowdays - a good IDE.

Almost a ten years have passed since xslt has appeared but still, we're not pleased with IDEs claiming xslt support. Our expectaions are not too high. There are things however, which must be present in such an IDE.

  1. A notion of project, and possibly a group of projects. You may think of it as a main xslt including other xslts participationg in the project.
  2. A code completion. A feature providing typing hints for language constructs, includes, prefixes, namespaces, functions, templates, modes, variables, parameters, schema elements, and other (all this should work in a context of the project).
  3. A code refactoring. A means to move parts of code between (or inside) files and projects, rename things (functions, templates, parameters, variables, prefixes, namespaces, and other).
  4. Code validation and run.
  5. Optional debug feature.

We would be grateful if someone had pointed to any such IDE.

Thursday, January 15, 2009 2:41:35 PM UTC  #    Comments [13] -
Incremental Parser | xslt
# Wednesday, January 14, 2009

Once upon a time, we created a function mimicking decapitalize() method defined in java in java.beans.Introspector. Nothing special, indeed. See the source:

 * Utility method to take a string and convert it to normal Java variable
 * name capitalization. This normally means converting the first
 * character from upper case to lower case, but in the (unusual) special
 * case when there is more than one character and both the first and
 * second characters are upper case, we leave it alone.
 * <p>
 * Thus "FooBah" becomes "fooBah" and "X" becomes "x", but "URL" stays
 * as "URL".
 * @param name The string to be decapitalized.
 * @return The decapitalized version of the string.
public static String decapitalize(String name) {
  if (name == null || name.length() == 0) {
    return name;
  if (name.length() > 1 && Character.isUpperCase(name.charAt(1)) &&
    return name;
  char chars[] = name.toCharArray();
  chars[0] = Character.toLowerCase(chars[0]);
  return new String(chars);

We typed implementation immediately:

<xsl:function name="t:decapitalize" as="xs:string">
  <xsl:param name="value" as="xs:string?"/>

  <xsl:variable name="c" as="xs:string"
    select="substring($value, 2, 1)"/>

  <xsl:sequence select="
    if ($c = upper-case($c)) then
        lower-case(substring($value, 1, 1)),
        substring($value, 2)

It worked, alright, until recently, when it has fallen to work, as the output was different from java's counterpart.

The input was W9Identifier. Function naturally returned the same value, while java returned w9Identifier. We has fallen with the assumption that $c = upper-case($c) returns true when character is an upper case letter. That's not correct for numbers. Correct way is:

<xsl:function name="t:decapitalize" as="xs:string">
  <xsl:param name="value" as="xs:string?"/>

  <xsl:variable name="c" as="xs:string"
    select="substring($value, 2, 1)"/>

  <xsl:sequence select="
    if ($c != lower-case($c)) then
        lower-case(substring($value, 1, 1)),
        substring($value, 2)

Wednesday, January 14, 2009 3:46:23 PM UTC  #    Comments [0] -
Tips and tricks | xslt
# Tuesday, January 6, 2009

100% Agree.

> Basically we work for clients and if they ask that we need
> this output from this input then we don't thing about it and
> get the result.

I guess the definition of professionalism is that if you're a professional,
you advise the client when he gets the requirements wrong, and if you're
not, you build whatever rubbish he asks for.

Michael Kay

Tuesday, January 6, 2009 11:22:36 AM UTC  #    Comments [0] -

# Saturday, January 3, 2009

The last year we were working on a project, which in essence dealt with transformation of graphs. Our experience with xslt 1.0, and other available information was promising - xslt 2.0 is a perfect match.

We were right, xslt 2.0 fitted very well to the problem.

It's easy to learn xslt 2.0/xquery: be acquainted with xml schema; read through a syntax, which is rather concise; look at examples, and start coding. API you will learn incrementally.

The same as other languages, xslt 2.0 is only a media to express algorithms. As such it fills its role rather good, as good as SQL:2003 and its variations do, and sometimes even better than other programming languages like C++ do.

Compare expressions "get data satisfying to a specific criteria" and "for each data part check a specific condition, and if it true, add it to the result". These often represent the same idea from two perspectives: human (or math) thinkning; and thinking in terms of execution procedure.

Both kinds of expressions have their use, however it has happened so that we're the human beings and perceive more easily natural language notions like: subjects, objects, predicates, deduction, induction and so on. I think the reason is that a human's (not positronic) brain grasps ideas, conceptions, images as something static, while execution procedure demands a notion of time (or at least notions of a sequence and an order) for the comprehension. ("Are you serious?", "Joke!" :-))

There is the other side to this story.

We have made the project design in relatively short terms. A good scalable design. We needed people who know xslt 2.0 to implement it. It has turned out, this was a strong objection against xslt!

Our fellow, xslt guru, Oleg Tkachenko has left our company to make his career at Microsoft, and to our disbelief it was impossible to find a person who was interested in a project involvong 85% of xslt and 15% of other technologies including java. Even in java world people prefer routine projects, like standard swing or web application, to a project demanding creativeness.

Possibly, it was our mistake, to allow to our company to look for developers the standard way: some secretary was looking through her sources, and inevitably was finding so-so java + poor xml + almost zero xslt knowledge graduates. We had to make appeals on xslt forums especially since the project could be easily developed with a distributed group.

Finally, we have designed and implemented the project by ourselves but to the present day our managers are calling and suggesting java developers for our project. What a bad joke!

Saturday, January 3, 2009 9:51:05 AM UTC  #    Comments [0] -
# Wednesday, December 17, 2008

Just for fun I've created exslt2.xslt and exslt2-test.xslt to model concepts discussed at EXSLT 2.0 forum. I did nothing special but used tuple as reference, and also I've defined f:call() to make function call indirectly.

<?xml version="1.0" encoding="utf-8"?>
  exslt 2 sketches.
<xsl:stylesheet version="2.0"
  exclude-result-prefixes="xs t f">

  <xsl:include href="exslt2.xslt"/>

  <xsl:template match="/" name="main">
      <xsl:variable name="refs" as="item()*" select="
        for $i in 1 to 20 return
          f:ref(1 to $i)"/>

        <xsl:sequence select="
            for $ref in $refs return

        <xsl:for-each select="$refs">
          <xsl:variable name="index" as="xs:integer" select="position()"/>


        <xsl:text>1 + 2 = </xsl:text>
        <xsl:sequence select="f:call(xs:QName('t:add'), (1, 2))"/>

  <xsl:function name="t:add" as="xs:integer">
    <xsl:param name="arguments" as="xs:integer+"/>

    <xsl:variable name="first" as="xs:integer" select="$arguments[1]"/>
    <xsl:variable name="second" as="xs:integer" select="$arguments[2]"/>

    <xsl:sequence select="$first + $second"/>


Code can be found at

Wednesday, December 17, 2008 1:53:03 PM UTC  #    Comments [0] -
# Wednesday, December 10, 2008

We have created Java Xml Object Model purely for purposes of our project. In fact jxom at present has siblings: xml models for sql dialects. There are also different APIs like name normalizations, refactorings, compile time evaluation.

It turns out that jxom is also good enough for other developers.

The drawback of jxom, however, is rather complex xml schema. It takes time to understand it. To simplify things we have created (and planning to create more) a couple of examples allowing to feel how jxom xml looks like.

The latest version can be loaded from

We would be pleased to see more comments on the subject.

Wednesday, December 10, 2008 9:35:26 AM UTC  #    Comments [0] -
Announce | xslt
# Thursday, December 4, 2008

Although in last our projects we're using more Java and XSLT, we always compare Java and .NET features. It's not a secret that in most applications we may find cache solutions used to improve performance. Unlike .NET providing a robust cache solution Java doesn't provide anything standard. Of course Java's adept may find a lot of caching frameworks or just to say: "use HashMap (ArrayList etc.) instead", but this is not the same.

Think about options for Java:
1. Caching frameworks (caching systems). Yes, they do their work. Do it perfectly. Some of them are brought to the state of the art, but there are drawbacks. The crucial one is that for simple data caching one should use a whole framework. This option requires too many efforts to solve a simple problem.

2. Collection classes (HashMap, ArrayList etc.) for caching data. This is very straightforward solution, and very productive. Everyone knows these classes, nothing to configure. One should declare an instance of such class, take care of data access synchronization and everything starts working immediately. An admirable caching solution but for "toy applications", since it solves one problem and introduces another one. If an application works for hours and there are a lot of data to cache, the amount of data grows only and never reduces, so this is the reason why such caching is very quickly surrounded with all sort of rules that somehow reduce its size at run-time. The solution very quickly lost its shine and become not portable, but it's still applicable for some applications.

3. Using Java reference objects for caching data. The most appropriate for cache solution is a java.util.WeekHashMap class. WeakHashMap works exactly like a hash table but uses weak references internally. In practice, entries in the WeakHashMap are reclaimed at any time if they are not refered outside of map. This caching strategy depends on GC's whims and is not entirely reliable, may increase a number of cache misses.

We've decided to create our simple cache with sliding expiration of data.

One may create many cache instances but there is only one global service that tracks expired objects among these instances:

private Cache<String, Object> cache = new Cache<String, Object>();

There is a constructor that specifies an expiration interval in milliseconds for all cached objects:

private Cache<String, Object> cache = new Cache<String, Object>(15 * 60 * 1000)

Access is similar to HashMap:

instance = cache.get("key"); and cache.put("key", instance);

That's all one should know to start use it. Click here to download the Java source of this class. Feel free to use it in your applications.

Thursday, December 4, 2008 12:12:38 PM UTC  #    Comments [2] -
Announce | Tips and tricks
# Saturday, November 22, 2008

Recently, working on completely different thing, I've realized that one may create a "generator", function returning different values per each call. I was somewhat puzzled with this conclusion, as I thought xslt functions have no side effects, and for the same arguments xslt function returns the same result.

I've confirmed the conclusion at the forum. See Scope of uniqueness of generate-id().

In short:

  • each node has an unique identity;
  • function in the course of work creates a temporary node and produces a result depending on identity of that node.


<xsl:stylesheet version="2.0"

<xsl:template match="/">
  <xsl:message select="
    for $i in 1 to 8 return

<xsl:function name="f:fun" as="xs:string">
  <xsl:variable name="x">!</xsl:variable>

  <xsl:sequence select="generate-id($x)"/>


The next thought was that if you may create a generator then it's easy to create a good random number generator (that's a trivial math task).

Hey gurus, take a chance!

Saturday, November 22, 2008 8:27:48 AM UTC  #    Comments [2] -
# Thursday, November 20, 2008

Yesterday I've read of a new Garbage Collection implementation G1. To be honest I was not impressed.

I think Garbage Collection is an evil, or at least its present implementations. I do not believe in algorithms that in their very core assume a centralized execution. On the other hand it's clear it's not in my power to change the status quo. My lot is to give advices mostly incompetent and ignorable.

I'm waiting for the time when someone will reach the idea to bring some parts of GC logic out of runtime scope. This will require more VM  intelligence, however will bear its fruits.

JIT or compiler during a static analysis may prove that some objects being collected may make some of their referring objects unreachable, provided it can prove that referring objects are not reachable through the other means (e.g. private field which is not stored in other places). This is close to the ideas expressed in Muse on value types in java. It's possible to prepare a garbage graph in advance before runtime.

In many cases it's also possible to prove that when method's variable goes out of scope it's not reachable through the other means and may be collected. This allows to implement a stage of automatic garbage collection when objects that are proven to be a garbage be immedeately added to a free memory set.

As an example I'm thinking of java's ArrayList object which stores private array. When ArrayList is reclaimed or resized a reference to the private array is getting lost and memory can be added to the free set immediately.

This mechanics being integrated as the first stage of GC will make it less centralized, as I believe many objects will be collected this way.

Thursday, November 20, 2008 7:54:47 AM UTC  #    Comments [0] -
Tips and tricks
# Tuesday, November 18, 2008

Suppose you have constructed a sequence of attributes.

How do you access a value of attribute "a"?

Simple, isn't it? It has taken a couple of minutes to find a solution!

<xsl:variable name="attributes" as="attribute()*">
  <xsl:apply-templates mode="t:generate-attributes" select="."/>

<xsl:variable name="value" as="xs:string?"

Tuesday, November 18, 2008 11:41:41 AM UTC  #    Comments [2] -
Tips and tricks | xslt
# Thursday, November 13, 2008


Our project, containing many different xslt files, generates many different outputs (e.g: code that uses DB2 SQL, or Oracle SQL, or DAO, or some other flavor of code). This results in usage of indirect calls to handle different generation options, however to allow xslt to work we had to create a big main xslt including stylesheets for each kind of generation. This impacts on a compilation time.


  1. A big main xslt including everything.
  2. A big main xslt including everything and using "use-when" attribute.
  3. Compose main xslt on the fly.

We were eagerly inclined to the second alternative. Unfortunately a limited set of information is available when "use-when" is evaluated. In particular there are neither parameters nor documents available. Using Saxon's extensions one may reach only static variables, or access System.getProperty(). This isn't flexible.

We've decided to try the third alternative.


We think we have found a nice solution: to create XsltSource, which receives a list of includes upon construction, and creates an xslt when getReader() is called.



 * A source to read generated stylesheet, which includes other stylesheets.
public class XsltSource extends StreamSource
   * Creates an {@link XsltSource} instance.
  public XsltSource()

   * Creates an {@link XsltSource} instance.
   * @param systemId a system identifier for root xslt.
  public XsltSource(String systemId)

   * Creates an {@link XsltSource} instance.
   * @param systemId a system identifier for root xslt.
   * @param includes a list of includes.
  public XsltSource(String systemId, String[] includes)

    this.includes = includes;

   * Gets stylesheet version.
   * @return a stylesheet version.
  public String getVersion()
    return version;

   * Sets a stylesheet version.
   * @param value a stylesheet version.
  public void setVersion(String value)
    version = value;

   * Gets a list of includes.
   * @return a list of includes.
  public String[] getIncludes()
    return includes;

   * Sets a list of includes.
   * @param value a list of includes.
  public void setIncludes(String[] value)
    includes = value;

   * Generates an xslt on the fly.
  public Reader getReader()
    String[] includes = getIncludes();

    if (includes == null)
      return super.getReader();

    String version = getVersion();

    if (version == null)
      version = "2.0";

    StringBuilder builder = new StringBuilder(1024);

    builder.append("<stylesheet version=\"");
    builder.append("\" xmlns=\"\">");

    for(String include: includes)
      builder.append("<include href=\"");


    return new StringReader(builder.toString());

   * An xslt version. By default 2.0 is used.
  private String version;

   * A list of includes.
  private String[] includes;

To use it one just needs to write:

Source source = new XsltSource(base, stylesheets);
Templates templates = transformerFactory.newTemplates(source);


  • base is a base uri for the generated stylesheet; it's used to resolve relative includes;
  • stylesheets is an array of hrefs.

Such implementation resembles a dynamic linking when separate parts are bound at runtime. We would like to see dynamic modules in the next version of xslt.

Thursday, November 13, 2008 11:26:50 AM UTC  #    Comments [0] -
Tips and tricks | xslt
# Tuesday, November 4, 2008

Why we've turned our attention to the Saxon implementation?

A considerable part (~75%) of project we're working on at present is creating xslt(s). That's not stylesheets to create page presentations, but rather project's business logic. To fulfill the project we were in need of xslt 2.0 processor. In the current state of affairs I doubt someone can point to a good alternative to the Saxon implementation.

The open source nature of the SaxonB project and intrinsic curiosity act like a hook for such species like ourselves.

We want to say that we're rather sceptical observers of a code: the code should prove it have merits. Saxon looks consistent. It takes not too much time to grasp implementation concepts taking into account that the code routinely follows xpath/xslt/xquery specifications. These code observation and practice with live xslt tasks helped us to form an opinion on the Saxon itself. That's why we dare to critique it.

1. Compilation is fused with execution.

An xslt before being executed passes several stages including xpath data model, and a graph of expressions - objects implementing parts of runtime logic.

Expression graph is optimized to achieve better runtime performace. The optimization logic is distributed throughout the code, and in particular lives in expression objects. This means that expression completes two roles: runtime execution and optimization.

I would prefer to see a smaller and cleaner run time objects (expressions), and optimization logic separately. On the other hand I can guess why Michael Kay fused these roles: to ease lazy optimizations (at runtime).

2. Optimizations are xslt 1.0 by origin

This is like a heritage. There are two main techniques: cached sequences, and global indices of rooted nodes.

This might be enough in xslt 1.0, but in 2.0 where there are diverse set of types, where sequences extend node sets to other types, where sequences may logically be grouped by pairs, tripples, and so on, this is not enough.

XPath data model operates with sequences only (in math sense). On the other hand it defines many set based functions (operators) like: $a intersect $b, $a except $b, $a = $b, $a != $b. In these examples XPath sequences are better to consider as sets, or maps of items.

Other example: for $i in index-of($names, $name) return $values[$i], where $names as xs:string*, $values as element()* shows that a closure of ($names, $values) is in fact a map, and $names might be implemented as a composition of a sequence and a map of strings to indices.

There are other use case examples, which lead me to think that Saxon lacks set based operators. Global indices are poor substitution, which work for rooted trees only.

Again, I guess why Michael Kay is not implementing these operators: not everyone loads xslt with stressful tasks requiring these features. I think xslt is mostly used to render pages, and one rarely deviates from rooted trees.

In spite of the objections we think that Saxon is a good xslt 2.0 implementation, which unfortunately lacks competitors.

Tuesday, November 4, 2008 11:30:36 AM UTC  #    Comments [0] -
# Friday, October 17, 2008

We strongly object against persistence frameworks in their contemporary meaning. This includes a long row of names like Hibernate, Java Persistence API, LINQ, and others.

Consider how one of them describes itself:

...high performance object/relational persistence and query service... lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections... allows you to express queries in its own portable SQL extension...

Sounds good, right?

We think not! Words "own" and "portable" regarding SQL are heard almost like antonyms. When one creates a unified language (a noble rush, opposed to a proprietary one (?)) she will inevitably adds a peer, increasing plurality in the family of languages.

Attempts to create similar layers between data and business logic are not new. This happens throughout the computer history. IDMS, NATURAL, COOL:GEN these are 20-30 years old examples.

Our reasoning (nothing new).

One need to approach to a design (development and maintainance) from different perspectives, thus she will understand the question under the design better, and will estimate skills to accomplish the problem. This will lead to a modularization e.g: business layer, data layer, appearance; and to development (maintainance) roles: program developer, database specialist, appearance speciaist. On a small scale several roles are often fulfilled with one person; this should not mean, however, that these roles are redundant, one just need to try on different roles.

Why does one separate business layer and data layer?

Pragmatic perspective. There are databases, which may accomplish most of data storage tasks in a more efficient way than one may achieve without database. There are two worlds of database specialists and program developers. These two layers and roles are facts of reality.

A desiner's goal is to keep these roles separate:

  • do not force a database specialist to know the business logic details;
  • do not force a program developer to know details on how to organize a storage in more efficient way, or on how to optimize a particular query;

Modularity helps here. Databases are well equipped to solve these tasks: the data layer should expose a database API through stored procedures, functions, and views, while the business layer should use this API to access the database.

With persistence frameworks there are two alterantives:

  1. still use data layer API;
  2. rely on a persistence framework.

When the first case is selected then a framework provides almost no aditional value comparing to traditional database access (jdbc,, an so on).

When one relies on a framework then a data layer interface virtually disappears (in fact a framework substitutes this interface). Database specialist has very little control over tuning the data structure, and optimizing queries, unless she starts digging in the business code but even then she always cannot control queries to the database. Moreover database specialist must learn a proprietary query language.

Result is that a persistence framework erodes a division of responsibilities, complicating development and maintainance.

We often hear a following explanation on why one should use Persistence Frameworks: "It eases database vendor switch". This is the most stupid reason to use Persistence Frameworks! It looks as if they plan to switch vendors once a day.

A design needs to focus on a modularity. This will make code more robust, faster and maintainable. This also eases potential migration process, as the data layer should be migrated only, with minimal (mostly configurational) changes in the business layer.

Friday, October 17, 2008 7:57:28 PM UTC  #    Comments [2] -
Tips and tricks
<February 2009>
Total Posts: 387
This Year: 3
This Month: 0
This Week: 0
Comments: 1371
Locations of visitors to this page
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros
All Content © 2024, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)