The very same simple tasks tend to appear in different languages (e.g.
C# Haiku).
Now we have to find:
- integer and fractional part of a decimal;
- length and precision of a decimal.
These tasks have no trivial solutions in xslt 2.0.
At present we have come up with the following answers:
Fractional part:
<xsl:function name="t:fraction" as="xs:decimal">
<xsl:param name="value" as="xs:decimal"/>
<xsl:sequence select="$value mod 1"/>
</xsl:function>
Integer part v1:
<xsl:function name="t:integer" as="xs:decimal">
<xsl:param name="value" as="xs:decimal"/>
<xsl:sequence select="$value - t:fraction($value)"/>
</xsl:function>
Integer part v2:
<xsl:function name="t:integer" as="xs:decimal">
<xsl:param name="value" as="xs:decimal"/>
<xsl:sequence select="
if ($value ge 0) then
floor($value)
else
-floor(-$value)"/>
</xsl:function>
Length and precision:
<!--
Gets a decimal specification as a closure:
($length as xs:integer, $precision as xs:integer).
-->
<xsl:function
name="t:decimal-spec" as="xs:integer+">
<xsl:param name="value"
as="xs:decimal"/>
<xsl:variable name="text" as="xs:string" select="
if ($value
lt 0) then
xs:string(-$value)
else
xs:string($value)"/>
<xsl:variable
name="length" as="xs:integer"
select="string-length($text)"/>
<xsl:variable
name="integer-length" as="xs:integer"
select="string-length(substring-before($text, '.'))"/>
<xsl:sequence select="
if
($integer-length) then
($length - 1, $length - $integer-length - 1)
else
($length, 0)"/>
</xsl:function>
The last function looks odious. In many other languages such an implementation
would be considered embarrassing.
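For comparison, here is a minimal Java sketch of the same four tasks. The class and method names are hypothetical, and the conventions are those of java.math.BigDecimal, which do not always coincide with the string-based xslt function above (for example, a leading zero or trailing zeros are counted differently).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper class; the names are illustrative only.
class DecimalParts {
  // Integer part: drops the fraction, rounding toward zero.
  static BigDecimal integerPart(BigDecimal value) {
    return value.setScale(0, RoundingMode.DOWN);
  }

  // Fractional part: the remainder keeps the sign of the dividend,
  // like "mod" in xpath.
  static BigDecimal fractionPart(BigDecimal value) {
    return value.remainder(BigDecimal.ONE);
  }

  // Returns "length,precision": the number of digits in the unscaled value
  // and the number of digits after the decimal point.
  static String spec(BigDecimal value) {
    return value.precision() + "," + value.scale();
  }
}
```

Under these assumptions, DecimalParts.spec(new BigDecimal("123.45")) yields "5,2".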
Given:
public class N
{
public readonly N next;
}
What needs to be done to construct a ring of N: n1 refers to n2, n2 to n3, ... nk to n1? Is it possible?
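One possible answer, sketched in Java rather than C# (final plays the role of readonly here). The class name RingNode and both constructors are our assumptions. The trick is that a constructor may pass this to the constructor of the next node before its own field is assigned.

```java
// Hypothetical Java counterpart of the class N above.
final class RingNode {
  final RingNode next;

  // Builds a ring of k >= 1 nodes; this instance is the first node.
  RingNode(int k) {
    if (k < 1) throw new IllegalArgumentException("k must be positive");
    next = (k == 1) ? this : new RingNode(this, k - 2);
  }

  // Creates a node followed by `remaining` more nodes,
  // after which the chain closes on `first`.
  private RingNode(RingNode first, int remaining) {
    next = (remaining == 0) ? first : new RingNode(first, remaining - 1);
  }

  // Counts the nodes visited before returning to this node.
  int ringSize() {
    int n = 1;
    for (RingNode p = next; p != this; p = p.next) n++;
    return n;
  }
}
```

Note that the recursion depth grows with k, and that while the ring is under construction the first node's next is still null, so nothing should read it until the outer constructor returns.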
To end with immutable trees, at least for now, we've implemented IDictionary<K, V>.
It's named Map<K, V>. Functionally it looks very much like SortedDictionary<K, V>.
There are some differences, however:
- Map, in contrast to SortedDictionary, is very cheap to copy.
- Because Map is based on an AVL tree, which is more rigorously balanced
than an RB tree, it's a little bit faster asymptotically for lookup than SortedDictionary,
and a little bit slower on modification.
- Due to the storage structure: node + navigator,
Map consumes less memory than
SortedDictionary, and is probably cheaper for GC (simple garbage
graphs).
- As the AVL tree stores left and right subtree sizes, in contrast to a "color" in
an RB tree, we are able to index data in two ways: with an integer index, and with a key
value.
Sources are:
Update:
It was impossible to withstand the temptation to run a primitive performance
comparison. Map outperforms SortedDictionary both in population and in access.
This does not agree with pure algorithm theory, but there might be other
unaccounted factors: memory consumption, quality of implementation, and so on.
Program.cs is updated with measurements.
Update 2:
More accurate tests show that for some key types Map is faster, for others
SortedDictionary is faster. Usually Map is slower during population (a mutable AVL
tree navigator may fix this). The odd thing is that Map<string, int> is faster
than SortedDictionary<string, int> both for allocation and for access. See the
Excel report.
Update 3:
An interesting observation. The following table shows maximal and
average tree heights for different tree sizes (number of nodes) in AVL and RB trees after a random population:
| Size    | AVL Max | AVL Avg | RB Max | RB Avg |
|---------|---------|---------|--------|--------|
| 10      | 4       | 2.90    | 5      | 3.00   |
| 50      | 7       | 4.94    | 8      | 4.94   |
| 100     | 8       | 5.84    | 9      | 5.86   |
| 500     | 11      | 8.14    | 14     | 8.39   |
| 1000    | 12      | 9.14    | 16     | 9.38   |
| 5000    | 15      | 11.51   | 18     | 11.47  |
| 10000   | 16      | 12.53   | 20     | 12.47  |
| 50000   | 19      | 14.89   | 23     | 14.72  |
| 100000  | 20      | 15.90   | 25     | 15.72  |
| 500000  | 25      | 18.26   | 28     | 18.27  |
| 1000000 | 25      | 19.28   | 30     | 19.27  |
Here, in accordance with theory, the height of an AVL tree is smaller than the height
of an RB tree. But what is most interesting is the depth of an "average
node". This value describes the number of steps required to find a random key. The RB
tree is very close to, and often better than, AVL in this regard.
It was obvious as hell from day one of generics that obscure
long names would appear once you start to parametrize your types. It was the easiest
thing in the world to take care of this in advance. Alas, C# inherits C++'s bad
practices.
Read Associative containers in a functional languages
and
Program.cs to see what we're talking about.
Briefly, there is a pair (string, int), which in C# should be declared as:
System.Collections.Generic.KeyValuePair<string, int>
Obviously we would like to write it in a short way. These are our attempts, which
fail:
1. Introduce generic alias Pair<K, V>:
using System.Collections.Generic;
using Pair<K, V> = KeyValuePair<K, V>;
2. Introduce type alias for a generic type with specific types.
using System.Collections.Generic;
using Pair = KeyValuePair<string, int>;
And this is the only one that works:
using Pair = System.Collections.Generic.KeyValuePair<string, int>;
Do you think it is bearable? Well, consider the following:
- There is a generic type
ValueNode<T>, where T
should be Pair.
- There is a generic type
TreeNavigator<N>, where N should be ValueNode<Pair>.
The declaration looks like this:
using Pair = System.Collections.Generic.KeyValuePair<string, int>;
using Node = NesterovskyBros.Collections.AVL.ValueNode<
System.Collections.Generic.KeyValuePair<string, int>>;
using Navigator = NesterovskyBros.Collections.AVL.TreeNavigator<
NesterovskyBros.Collections.AVL.ValueNode<
System.Collections.Generic.KeyValuePair<string, int>>>;
Do you still think it is acceptable?
P.S. Legacy thinking led C#'s and java's designers to use the word "new" for
object construction. It is not required at all. Consider new Pair("A", 1) vs Pair("A", 1).
C++ prefers the second form. C# and java always use the first one.
Continuing with the post "Ongoing xslt/xquery spec update",
we would like to articulate what options we have regarding associative containers
in functional languages (e.g. xslt, xquery), assuming that variables are
immutable and the implementation is efficient (in some sense).
There are three common implementation techniques:
- store data (key, value pairs) in a sorted array, and use binary search to
access values by key;
- store data in a hash map;
- store data in a binary tree (usually RB or AVL trees).
The implementation choice depends considerably on the operations performed on
the container. Usually these are:
- construction;
- value lookup by key;
- key enumeration (ordered or not);
- container modification (adding data to and removing data from the
container);
- access to elements by index.
Note that modification in functional programming means the creation of a new
container, so here is a division:
- If the container's use pattern does not include modification, then probably the
simplest solution is to build it as an ordered sequence of
pairs, and use binary search to access the data. Alternatively, one could
implement the associative container as a hash map.
- If modification is essential, then neither an ordered sequence of pairs, a hash map,
nor a classical tree implementation can be used, as they are either too slow
or too greedy for memory, either during modification or during access.
On the other hand, to deal with the container's modifications one can build
an implementation which uses "top-down" RB or AVL trees. To see the
difference, consider a classical tree structure and its functional variant:
|                 | Classical                                        | Functional                      |
|-----------------|--------------------------------------------------|---------------------------------|
| Node structure: | node: parent, left, right, other data            | node: left, right, other data   |
| Node reference: | node itself                                      | node path from a root of a tree |
| Modification:   | either mutable or requires a completely new tree | O(LnN) nodes are created        |
Here we observe that:
- one can implement an efficient map (lookup time no worse than O(LnN)) with no
modification support, using an ordered array;
- one can implement an efficient map with support of modification, using an immutable binary tree;
- one can implement all these algorithms purely in xslt and xquery (provided that inline
functions are supported);
- any such implementation will lose against the same implementation
written in C++, C#, java;
- the best implementation would probably start from a sorted array and
switch to a binary tree after some size threshold.
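To illustrate the "Modification" row of the table, here is a minimal Java sketch of an immutable binary search tree with path copying. It is not the AVL implementation mentioned in this post: balancing is omitted for brevity, and the PNode class with its created counter is hypothetical.

```java
// Hypothetical immutable binary search tree (unbalanced, for brevity).
final class PNode {
  final int key;
  final PNode left, right;
  static int created = 0; // counts allocations, for illustration only

  PNode(int key, PNode left, PNode right) {
    this.key = key;
    this.left = left;
    this.right = right;
    created++;
  }

  // Returns a tree containing key. Only nodes on the search path are
  // re-created; every subtree off the path is shared with the old tree.
  static PNode insert(PNode node, int key) {
    if (node == null) return new PNode(key, null, null);
    if (key < node.key) return new PNode(node.key, insert(node.left, key), node.right);
    if (key > node.key) return new PNode(node.key, node.left, insert(node.right, key));
    return node; // the key is already here: reuse this subtree as is
  }

  static boolean contains(PNode node, int key) {
    while (node != null) {
      if (key == node.key) return true;
      node = key < node.key ? node.left : node.right;
    }
    return false;
  }
}
```

The old tree stays valid after insert, so both versions can be used side by side. With a balanced variant the search path, and hence the number of new nodes, is bounded by O(LnN).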
Here we provide a C# implementation of a functional AVL tree, which also supports
element indexing:
Our intention was to show that the usual algorithms for associative
containers apply in functional
programming; thus a feature-complete functional language must support
associative containers to make development more conscious, and to free the
developer from reinventing basic things that have existed for almost half a
century.
A client asked us to produce Excel reports in an ASP.NET
application. They've given us Excel templates, and also defined what they want to show.
What are our options?
- Work with Office COM API;
- Use Office Open XML SDK (which is a pure .NET API);
- Try to apply xslt somehow;
- Macro, other?
For us, biased towards xslt, it's hard to make a fair choice. To judge, we've
tried to formalize the client's request and to look into future support.
So, we have defined sql stored procedures to provide the data. This way data can be
represented as an ADO.NET DataSet, a set of classes, xml, or another reasonable format. We do not
foresee any considerable problem with data representation if the client decides
to modify reports in the future.
It's not so easy when we think about Excel generation.
Due to ignorance we thought that Excel is much like xslt in some regard, and
that it's possible to provide tabular data in some form and create an Excel
template which will consume the data to form the final output. To some extent
it's possible, indeed, but you have to start creating macros or vb scripts to
achieve acceptable results.
When we mentioned macros to the client, they immediately stated that
such a solution won't work due to security reasons.
Comparing the COM API and the Open XML SDK we can see that both provide almost the same
level of service for us, except that the latter is much lighter and supports only the Open XML format, while the former is a heavy
API exposing MS Office and supports earlier versions also.
Both solutions have a considerable drawback: it's not easy to create an Excel
report in C#, and it will be a pain to support such a solution if the client asks,
say in half a year, to modify something in an Excel template or to create one more
report.
Thus we've turned to xslt. There we've found two more directions:
- generate data for Office Open XML;
- generate xml in format of MS Office 2003.
It turned out that it's a rather nontrivial task to generate data for Open XML,
and it's not due to the format, which is not a single xml at all but a zipped folder
containing xmls. The problem is in the complex schemas and in the many complex
relations between the files constituting an Open XML document. In contrast, the MS
Office 2003 format allows us to create a single xml file for the spreadsheet.
Selecting between the standard, up-to-date format and the older proprietary one, the
latter looks more attractive for development and support.
At present our position is to use xslt and to generate files in the MS Office
2003 format. Are there better options?
Did you ever hear that double numbers may cause rounding errors, and that
many financial institutions are very sensitive to those roundings?
Sure you did! We're also aware of this kind of problem, and we thought we'd
taken care of it. But things are not that simple, as you don't always
know what impact the problem can have.
To understand the context it's enough to say that we're converting (using xslt by the way) programs
written in a CASE tool called
Cool:GEN into java and into C#. Originally, Cool:GEN generated COBOL and C
programs as deliverables. Formally, clients compare COBOL results vs java or C#
results, and they want them to be as close as possible.
For one particular client it was crucial to have correct results during
manipulation with numbers with 20-25 digits in total, and with 10 digits after a decimal point.
Clients are definitely right, and we've introduced generation options to control
how to represent numbers in java and C# worlds; either as double or
BigDecimal (in java), and decimal (in C#).
That was our first implementation. Reasonable and clean. Was it enough? Not at
all!
The client reported that java's results (they use java and BigDecimal
for every number with a decimal point) are too precise compared to the Mainframe's
(MF) COBOL. This rather unusual complaint puzzled us a little, but the client
confirmed that they want results no more precise than those the MF produces.
The reason for the difference was that both C# and especially java may
store many more decimal digits than is defined for the particular result on MF.
On MF, whenever you define a field storing 5 digits after the decimal point, you're
sure that exactly 5 digits will be stored. This contrasts very much with the results
we had in java and C#, as both multiplication and division can produce many more
digits after the decimal point. The solution was to truncate(!) (not to round) the
numbers to the specific precision in property setters.
So, has it resolved the problem? No, still not!
The client reported that the results are now much better (they coincide with MF, in fact)
but there are still several instances where they observe differences in the 9th and
10th digits after the decimal point, and again java's results are more accurate.
No astonishment from us this time, just an analysis of the reason for the difference.
It turned out that the previous solution was partial. We were doing a final truncation
but there were still intermediate results, like in a/(b * c), or in a * (b/c).
For the intermediate results MF's COBOL has its own, rather nontrivial, formulas (and
options) per operation, defining the number of digits to keep after the
decimal point. After we added similar options to the generator, several
truncations appeared in the code to adjust intermediate results. This way
we've reached the same accuracy as MF has.
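The effect of intermediate truncation can be sketched in Java with BigDecimal. The class and method names below are hypothetical, and the scales are chosen for illustration only; they are not the COBOL rules, which depend on the operation and on compiler options.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper; not taken from the generator.
class Truncation {
  // Truncates (never rounds) to the given number of digits after the decimal point.
  static BigDecimal truncate(BigDecimal value, int scale) {
    return value.setScale(scale, RoundingMode.DOWN);
  }

  // a * (b / c), truncating only the final result; the quotient keeps 10 digits.
  static BigDecimal finalOnly(BigDecimal a, BigDecimal b, BigDecimal c, int scale) {
    return truncate(a.multiply(b.divide(c, 10, RoundingMode.DOWN)), scale);
  }

  // a * (b / c), truncating the intermediate quotient to the target scale as well.
  static BigDecimal withIntermediate(BigDecimal a, BigDecimal b, BigDecimal c, int scale) {
    return truncate(a.multiply(b.divide(c, scale, RoundingMode.DOWN)), scale);
  }
}
```

For a = 300, b = 1, c = 3 and two digits after the decimal point, finalOnly gives 99.99 while withIntermediate gives 99.00. This is the kind of difference that survives a final truncation alone.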
What have we learned (reiterated)?
- A simple problem may have a far-reaching impact.
- More precise is not always better. A client often prefers compatible rather than
more accurate results.
For some reason C# lacks a decimal truncation function
limiting the result to a specified number of digits after the decimal point. We don't
know the reasoning behind this, but it stimulates thought. The internet
is full of workarounds. A typical answer is like this:
Math.Truncate(2.22977777 * 1000) / 1000; // Returns 2.229
So, we also want to provide our solution to this problem.
public static decimal Truncate(decimal value,
byte decimals)
{
decimal result = decimal.Round(value, decimals);
int c = decimal.Compare(value, result);
bool negative = decimal.Compare(value, 0) < 0;
if (negative ? c <= 0 : c >= 0)
{
return result;
}
return result - new decimal(1, 0, 0, negative, decimals);
}
Definitely, if the function were implemented by the framework it would be much more efficient. We assume, however, that the above is the best implementation that can be done externally.
Natural curiosity led us to the implementation of connection
pooling in Apache Tomcat (org.apache.commons.dbcp).
And what are the results, you ask?
Uneasiness... Uneasiness for all those who use it. Uneasiness due to the
difference between our expectations and the real implementation.
Briefly, the design is as follows:
- wrap every jdbc object;
- cache prepared statements wrappers;
- lookup prepared statement wrappers in the cache before
asking original driver;
- upon close return wrappers into the cache.
It took us a couple of minutes to see that this is a very problematic design, as
it does not address a double close of statements properly (jdbc states that it is
safe to call close() on a closed jdbc object). With Apache's design it's not safe
to touch the object after the close() call, as it has been returned to the pool and
has possibly already been given to another client who requested it.
The correct design would be:
- wrap every jdbc object;
- cache original prepared statements;
- lookup original prepared statement in the cache before asking original
driver, and return wrappers;
- detach wrapper upon close from original object, and put original object
into the cache.
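The design from this list can be sketched without any jdbc types. Everything below is hypothetical: Resource stands in for an original prepared statement, ResourcePool for the cache, and Handle for the wrapper. The sketch is not thread safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for an original jdbc statement; not a real driver type.
interface Resource {
  String run(String input);
}

// Caches ORIGINAL resources and hands out a fresh wrapper on every request.
class ResourcePool {
  private final Deque<Resource> cache = new ArrayDeque<>();
  int created = 0;

  Handle borrow() {
    Resource original = cache.poll();
    if (original == null) {
      created++;
      original = input -> "result:" + input; // stands in for asking the driver
    }
    return new Handle(this, original);
  }

  void giveBack(Resource original) {
    cache.push(original);
  }
}

// Wrapper that detaches from the original on close, so a repeated close is a no-op.
class Handle implements AutoCloseable {
  private final ResourcePool pool;
  private Resource original;

  Handle(ResourcePool pool, Resource original) {
    this.pool = pool;
    this.original = original;
  }

  String run(String input) {
    if (original == null) throw new IllegalStateException("handle is closed");
    return original.run(input);
  }

  @Override
  public void close() {
    if (original != null) {
      pool.giveBack(original);
      original = null; // detach: a stale wrapper can no longer reach the pooled object
    }
  }
}
```

A second close() on the same Handle does nothing, and a stale Handle cannot disturb whoever borrowed the original next.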
A bit later: we've found a confirmation of our doubts on the Apache site: see "JNDI Datasource HOW-TO",
chapter "Common Problems".
Our experience with facelets shows that when you're designing
composition components you often want to add a level of customization. E.g.
generate an
element with or without an id, or define class/style if a value is specified.
Consider for simplicity that you want to encapsulate a check box and pass
several attributes to it. The first version that you will probably think of is something like
this:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:ui="http://java.sun.com/jsf/facelets"
xmlns:c="http://java.sun.com/jstl/core"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:ex="http://www.nesterovsky-bros.com/jsf">
<body>
<!--
Attributes:
id - an optional id;
value - a data binding;
class - an optional element class;
style - an optional element inline style;
onclick - an optional script event handler for onclick event;
onchange - an optional script event handler for onchange event.
-->
<ui:component>
<h:selectBooleanCheckbox
id="#{id}"
value="#{value}"
style="#{style}"
class="#{class}"
onchange="#{onchange}"
onclick="#{onclick}"/>
</ui:component>
</body>
</html>
Be sure, this is not what you have expected. The output will contain all the mentioned
attributes, even those which weren't passed into the component (they will have empty
values). More than that, if you omit "id", you will get an error like: "empty
string is not valid id".
The reason is in the EL! Attributes used in
this example are of type String, thus the result of evaluating a value expression is coerced to String.
Values of attributes that weren't passed in evaluate to null. EL returns ""
when coercing null to String. The interesting thing
is that if EL did not change null, those omitted attributes would not appear in the output.
The second attempt would probably be:
<h:selectBooleanCheckbox value="#{value}">
<c:if test="#{!empty id}">
<f:attribute name="id" value="#{id}"/>
</c:if>
<c:if test="#{!empty onclick}">
<f:attribute name="onclick" value="#{onclick}"/>
</c:if>
<c:if test="#{!empty onchange}">
<f:attribute name="onchange" value="#{onchange}"/>
</c:if>
<c:if test="#{!empty class}">
<f:attribute name="class" value="#{class}"/>
</c:if>
<c:if test="#{!empty style}">
<f:attribute name="style" value="#{style}"/>
</c:if>
</h:selectBooleanCheckbox>
Be sure, this won't work either (it may work, but not as you would expect). The c:if instruction
is evaluated at the stage of building the component tree, and not at the
rendering stage.
To work around the problem you should prevent the null to "" conversion in the EL.
That is, in fact, rather trivial to achieve: the value expression should evaluate to
an object other than a String, whose toString() method returns the required
value.
The final component may look like this:
<h:selectBooleanCheckbox
id="#{ex:object(id)}"
value="#{value}"
style="#{ex:object(style)}"
class="#{ex:object(class)}"
onchange="#{ex:object(onchange)}"
onclick="#{ex:object(onclick)}"/>
where ex:object() is a function defined like this:
public static Object object(final Object value)
{
return new Object()
{
@Override
public String toString()
{
return value == null ? null : value.toString();
}
};
}
A bit later: not everything works as we expected. This approach doesn't work with the validator attribute, whereas it works with the converter attribute. The difference between them is that the first attribute should be a MethodExpression value, whereas the second one is a ValueExpression value. Again, we suffer from the ugly JSF implementation of the UIOutput component.
Recently we have seen a blog entry, "JSF: IDs and clientIds in Facelets", which provided a wrong implementation of the feature.
I'm not sure how useful it is, but here is our approach to the same problem.
At the core is ScopeComponent. The example uses a couple of utility functions defined in Functions. The example itself is found in window.xhtml:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:ui="http://java.sun.com/jsf/facelets"
xmlns:c="http://java.sun.com/jstl/core"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:f="http://java.sun.com/jsf/core"
xmlns:fn="http://java.sun.com/jsp/jstl/functions"
xmlns:ex="http://www.nesterovsky-bros.com/jsf">
<body>
<h:form>
<ui:repeat value="#{ex:sequence(5)}">
<f:subview id="scope" binding="#{ex:scope().value}">
#{scope.id}, #{scope.clientId}
</f:subview>
<f:subview id="script" uniqueId="my-script"
binding="#{ex:scope().value}" myValue="#{2 + 2}">
, #{script.id}, #{script.clientId},
#{script.bindings.myValue.expressionString},
#{ex:value(script.bindings.myValue)},
#{script.attributes.myValue}
</f:subview>
<br/>
</ui:repeat>
</h:form>
</body>
</html>
Update: ex:scope() is made to return a simple bean with property "value".
Another useful example:
<f:subview id="group" binding="#{ex:scope().value}">
<h:inputText id="input" value="#{bean.property}"/>
<script type="text/javascript">
var element = document.getElementById('#{group.clientId}:input');
</script>
</f:subview>
In the section about AJAX, JSF 2.0 spec (final draft) talks about partial requests...
This sounds rather strange. My perception was that AJAX is about partial responses. What is the sense of sending partial requests? Requests are comparatively small anyway! Besides, a partial request may complicate restoring the component tree on the server and make things fragile, but this largely depends on what they mean by these words.
Recently we were disputing (Arthur vs Vladimir) about the
benefits of ValueExpression references in JSF/Facelets.
Such a dispute presents a rather funny picture in itself, when
you're defending one position and after a while you take the opposite
point and start to maintain it. But let's get to the problem.
JSF/Facelets uses Unified Expression Language for data binding, e.g.:
<h:inputText id="name" value="#{customer.name}" />
or
<h:selectBooleanCheckbox id="selected" value="#{customer.selected}" />
In these cases values from the input and the check box are mapped to the properties name and selected of a bean named customer.
Everything is fine except for the case when selected
is not of boolean type (e.g. int). In this case you will have a hard time thinking
about how to adapt the bean property to the jsf component. Basically, you have to
provide a bean adapter, or change the type of the property. The latter is
unfeasible in our case, thus we're choosing a bean adapter. More than that, we have to create a
generic solution for an int to boolean property type adapter. With
this target in mind we may create a function receiving a bean and a property name and
returning another bean with a single property of boolean type:
<h:selectBooleanCheckbox id="selected" value="#{ex:toBoolean(customer, 'selected').value}" />
But thinking further, a question appears: can we pass a ValueExpression by reference into a bean adapter function, and have something like this:
<h:selectBooleanCheckbox id="selected" value="#{ex:toBoolean(byref customer.selected).value}" />
It turns out that it's possible to do this kind of thing. Unfortunately it requires a custom facelets tag, like this:
<ex:ref var="selected"
value="#{customer.selected}"/>
<h:selectBooleanCheckbox id="selected" value="#{ex:toBoolean(selected).value}" />
The implementation of such a tag is really primitive (in fact it mimics the c:set tag
handler except for one line), but still it's an extension on a level we aren't
happy to introduce.
This way we were going in circles considering pros and cons, regretting that el
references aren't native in jsf/facelets, and weren't able to decide whether our
solution is a hack or a neat extension...
P.S. We know that JSF 2.0 provides a solution for h:selectBooleanCheckbox, but there are still cases when a similar technique is required
even there.
We always tacitly assumed that the protected modifier in java
permits member access from the class the member belongs to, or from an instance of
the class's descendant. Very much like C++ defines it, in fact.
In other words, no external client of an instance can directly access a protected member of that instance or of the class the instance belongs to.
It would be very interesting to know how many people live
with such naivete, really!
Well, that's what java states:
The protected modifier specifies that the member can only be accessed within its own package (as with package-private) and, in addition, by a subclass of its class in another package.
If one thinks, just a little, she'll see that this gorgeous definition
is so different from C++'s and so meaningless that they had better have dropped
this modifier altogether.
The hole is so huge that I can easily build an example
showing how to use a protected member of some other class in a perfectly valid
way. Consider:
MyClass.java
package com.mypackage;
import javax.faces.component.Hack;
import javax.faces.component.UIComponentBase;
import javax.faces.event.FacesListener;
public class MyClass
{
public void addFacesListener(
UIComponentBase component,
FacesListener listener)
{
Hack.addFacesListener(component, listener);
}
...
}
Hack.java
package javax.faces.component;
import javax.faces.event.FacesListener;
public class Hack
{
public static void addFacesListener(
UIComponentBase component,
FacesListener listener)
{
component.addFacesListener(listener);
}
}
The example shows how one adds a custom listener to an arbitrary jsf component. Notice that this is not
intended by design, as the method addFacesListener() is protected. But see how easily one can hack this dummy "protected" notion.
Update: for a proper implementation of protected please read about the Manifest file, the part about package sealing.
Just in case you don't know what JSON stands for: it's JavaScript Object Notation.
You may find plenty of JSON implementations in java, so we shall add one more idea. Briefly, it's about plugging JSON into the xml serialization infrastructure, JAXB. Taking into account that JAXB is now an integral part of the java platform itself, the benefit is that you can transparently use the same beans for xml and JSON serialization.
All you need to do is provide a JSON reader and writer behind the XMLStreamReader and XMLStreamWriter interfaces.
In our spare time we shall implement this idea.
If you by chance see lines like the following in your code:
private transient final Type field;
then know that you're in trouble!
The reason is simple, really (provided you're sane and don't put field modifiers without reason). transient assumes that your class is serializable, and that you have a particular field that you don't want to serialize. final states that the field is initialized in the constructor, and does not change its value for the rest of the life cycle.
This way, if you serialize an instance of a class with such a field, and then deserialize it back, you will have the field initialized with null, and no way to put another value there.
P.S. That's what we have found in our code recently:
private transient final Lock sync = new ReentrantLock();
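The effect is easy to reproduce. In this minimal sketch the class Guarded and its methods are hypothetical. It serializes an instance into a byte array and reads it back; standard deserialization of a Serializable class runs neither its constructors nor its field initializers, so the lock is lost.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class reproducing the pattern.
class Guarded implements Serializable {
  private static final long serialVersionUID = 1L;

  private transient final Lock sync = new ReentrantLock();

  boolean hasLock() {
    return sync != null;
  }

  // Serializes the instance to a byte array and reads a copy back.
  static Guarded roundTrip(Guarded source) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(source);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (Guarded) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A fresh instance reports hasLock() as true, while the copy returned by roundTrip reports false.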
Generics in C# look inferior to templates (especially to concepts) in C++;
however, now and then you can build wonderful pieces that a C++ pro would
envy.
Consider a generic converter method: T Convert<T>(object value).
In C++ I would create several template specializations for all supported
conversions. Well, to make things harder, think of a converter provider supporting
the conversion:
public interface IConverterProvider
{
Converter<object, T> Get<T>();
}
That begins to be a puzzle in C++, but C# handles it easily!
My first C# implementation was too naive, and spent too many cycles in
the provider, resolving which converter to use. So, I went on, and created a
sophisticated implementation like this:
private IConverterProvider provider = ...
public T Convert<T>(object value)
{
var converter = provider.Get<T>();
return converter(value);
}
...
public class ConverterProvider: IConverterProvider
{
public Converter<object, T> Get<T>()
{
return Impl<T>.converter;
}
private static class Impl<T>
{
static Impl()
{
// Heavy implementation initializing converters.
converter = ...
}
public static readonly Converter<object, T> converter;
}
}
Go, and do something close in C++!
If you have a string variable $value as xs:string, and want to know whether it starts with a digit, then what's the best way to do it in xpath?
Our answer is: ($value ge '0') and ($value lt ':').
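The expression relies on ':' being the character that immediately follows '9' in Unicode, so every string that begins with a digit sorts at or after "0" and before ":". The same reasoning in Java, where String.compareTo compares code units, looks like this (the class name is hypothetical; in xpath the result also depends on the collation in effect, which we assume here to be the codepoint one):

```java
// Hypothetical helper mirroring ($value ge '0') and ($value lt ':').
class DigitPrefix {
  static boolean startsWithDigit(String value) {
    return value.compareTo("0") >= 0 && value.compareTo(":") < 0;
  }
}
```

An empty string sorts before "0", so it is rejected without a separate check.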
Looks a little funny (and disturbing).
In our project we're generating a lot of xml files, which are subject to manual
changes and repeated generations (often with slightly different generation
options). The life cycle of such an xml can be described as follows:
- generate original xml (version 1)
- manual changes (version 2)
- next generation (version 3)
- manual changes integrated into the new generation (version 4)
If these were regular text files we could use the diff utility to prepare a
patch between versions 1 and 2, and apply it with the patch utility to
version 3. Unfortunately xml has additional semantics compared to plain text. What's an
invariant or a simple modification in xml is often a drastic change in text.
diff/patch does not work well for us. We need an xml diff
and patch.
The first guess is to google it! Not so simple.
We have failed to find a tool or an API that can be used from ant. There are a
lot of GUIs to show xml differences and to perform a manual merge, or tools doing
similar but different things to what we need
(like MS's xmldiffpatch).
Please point us to such a program!
Meantime, we need to proceed. We don't believe that such a tool can be
knocked together in a hurry, as it's a task that is heuristic and mathematical at the same time,
requiring a careful design and good statistics for the use cases. Our idea
is to exploit diff/patch. To achieve the goals we're going to
perform some normalization of xmls before diff to remove redundant
invariants, and a normalization after the patch to return it to a readable form.
This includes:
- ordering attributes by their names;
- replacing insignificant whitespace with line breaks;
- entering line breaks after element names and before attributes, after
an attribute name and before its value, and after an attribute value.
This way we expect to get files reacting to modifications similarly to text
files.
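A minimal sketch of such a normalization in Java, using the standard DOM parser. The class XmlNormalizer is hypothetical and handles only elements, attributes and text; comments, CDATA sections, processing instructions and significant whitespace are ignored here.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical normalizer: one token per line, attributes ordered by name.
class XmlNormalizer {
  static String normalize(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml))).getDocumentElement();
      StringBuilder out = new StringBuilder();
      write(root, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalArgumentException("cannot parse xml", e);
    }
  }

  private static void write(Element element, StringBuilder out) {
    out.append('<').append(element.getTagName()).append('\n');
    // TreeMap orders the attributes by name.
    Map<String, String> sorted = new TreeMap<>();
    NamedNodeMap attributes = element.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
      sorted.put(attributes.item(i).getNodeName(), attributes.item(i).getNodeValue());
    }
    for (Map.Entry<String, String> a : sorted.entrySet()) {
      // Line breaks after the attribute name and after the attribute value.
      out.append(a.getKey()).append("=\n\"").append(escape(a.getValue())).append("\"\n");
    }
    out.append(">\n");
    for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
      if (child instanceof Element) {
        write((Element) child, out);
      } else if (child.getNodeType() == Node.TEXT_NODE
          && !child.getNodeValue().trim().isEmpty()) {
        out.append(escape(child.getNodeValue().trim())).append('\n');
      }
    }
    out.append("</").append(element.getTagName()).append(">\n");
  }

  private static String escape(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
  }
}
```

With this, two documents that differ only in attribute order or in insignificant whitespace produce identical text, and a change to a single attribute value touches a single line.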
Sunny> Look what I have found! Consider this C#:
public class T
{
public T free;
}
public void NewTest()
{
T cache = new T();
Stopwatch timer = new Stopwatch();
timer.Reset();
timer.Start();
for(int i = 0; i < 10000000; ++i)
{
// Get from cache.
T t;
if (cache.free == null)
{
cache.free = new T();
}
t = cache.free;
// Release
cache.free = t;
t = null;
}
timer.Stop();
long cacheTicks = timer.ElapsedTicks;
timer.Reset();
timer.Start();
for(int i = 0; i < 10000000; ++i)
{
new T();
}
timer.Stop();
long newTicks = timer.ElapsedTicks;
Console.WriteLine("cache: {0}, new: {1}", cacheTicks, newTicks);
}
Gloomy> And?
Sunny> Tests show that new T() is almost as fast as
caching! GC's "new" probably has a fast route, where it shifts the free memory border
in an atomic way, thus allocation takes just several cycles.
Gloomy> Well, you're probably right, there is a fast route. I, however,
have a different opinion. To track references, a generational garbage collector
implements a field assignment as a call rather than a mov.
This routine, besides the move itself, marks the touched memory page in a special card
table (who said GC is cheap?); thus, I think, a reference field setter is
almost as slow as the "new" call.
.Net is known for its array covariance. That means that an array can be cast to
an array of its elements' base type:
public class T: B
{
}
T[] tlist = ...
B[] blist = tlist;
This feature comes at a cost:
B b = ...
T t = ...
blist[0] = b; // This effectively is: blist[0] = (T)b;
tlist[0] = t; // This is the same: tlist[0] = (T)t;
We pay the cost of an additional cast, just for nothing. Let this dubious design decision weigh on the .Net/Java inventors.
You can eliminate the cast. Just use an array of structs:
struct S<T>
{
public T t;
}
S<T>[] slist = ...
slist[0].t = t; // Works without cast.
Measurements show that S<T>[] is ~35% faster than T[] on write, and slower (JIT could do better) on read.
Well, an ugly workaround for an ugly design.
P.S. In Java there is no relief...
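For the record, here is how the same store check surfaces in Java. It is a small demonstration, not a measurement; the class name CovarianceDemo is ours. The assignment compiles, and the check against the real element type of the array happens at run time.

```java
/** Demonstrates the run time store check that array covariance requires. */
final class CovarianceDemo
{
  /** Tries to store a value into an array; tells whether the store check accepted it. */
  static boolean tryStore(Object[] array, Object value)
  {
    try
    {
      // Checked at run time against the actual element type of the array.
      array[0] = value;

      return true;
    }
    catch (ArrayStoreException e)
    {
      return false;
    }
  }

  public static void main(String[] args)
  {
    // Legal, as arrays are covariant.
    Object[] objects = new String[1];

    System.out.println(tryStore(objects, "text"));
    System.out.println(tryStore(objects, 42));
  }
}
```

The first store passes the check and the second one ends in ArrayStoreException, which is the price paid on every write to an array of references.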
There is a method Right() in the RB tree implementation:
public int Right(int node)
{
return items[node].right;
}
JIT does not want to inline it, probably because the method may throw:
public int Right(int node)
{
return items[node].right;
00000000 mov eax,dword ptr [ecx+4]
00000003 cmp edx,dword ptr [eax+4]
00000006 jae 00000013
00000008 shl edx,4
0000000b lea eax,[eax+edx+8]
0000000f mov eax,dword ptr [eax+8]
00000012 ret
00000013 call 74C3A62C
00000018 int 3
Too sad.
Early in 2001 we read that .NET's JIT is smart enough to optimize away repeated
bounds checks.
In 2009 we can still verify that this is not the case (no matter how
hard you try).
C#:
private int CharAt(int offset)
{
string text = this.text;
return (uint)offset >= (uint)text.Length ? -1 : text[offset];
}
Disassembly:
private int CharAt(int offset)
{
string text = this.text;
00000000 push ebp
00000001 mov ebp,esp
00000003 mov ecx,dword ptr [ecx+30h]
return (uint)offset >= (uint)text.Length ? -1 : text[offset];
00000006 cmp dword ptr [ecx+8],edx
00000009 jbe 00000017
0000000b cmp edx,dword ptr [ecx+8]
0000000e jae 0000001C
00000010 movzx eax,word ptr [ecx+edx*2+0Ch]
00000015 pop ebp
00000016 ret
00000017 or eax,0FFFFFFFFh
0000001a pop ebp
0000001b ret
0000001c call 74C24C6C
00000021 int 3
P.S. This method is not inlined either (IL length is 25 bytes).
Yesterday I installed IE8.
Looks better here and there.
Today, I'm shocked!
I reopened my web mail and it remembered the session. It keeps session cookies after closing an IE8 instance!
I did not believe my eyes, so I logged into another web application, and then opened another IE8 instance. What do you think? - It shares the session between instances!
That is a serious security problem.
It prevents me from opening two sessions of a web application on my computer.
P.S. We have found that this problem has already been discussed. See IE8 handles sessions/cookies different than IE7 - big trouble for - ...
Someone needs a brain surgery...
Quick solution: run IE8 with -nomerge command line option.
We'd like to return to the binary tree algorithms and spell out what you cannot
do with generics in C#. Well, you can do many things, but with a generalization
penalty.
Consider a binary tree node: Node(Parent, Left, Right). RB, AVL, and
other algorithms attach some private information to this node to perform
balancing.
You can express this idea mathematically (and in C++), but you cannot implement it efficiently in C#.
A more focused example. Consider an RB tree: Node(Parent, Left, Right, Color).
There are a number of ways you may implement the internal structure of the tree.
The algorithms themselves stay the same.
Straightforward implementation:
class Node
{
Node Parent;
Node Left;
Node Right;
bool Color;
}
This implementation allocates nodes in the heap and each node refers to other
nodes.
Node navigator implementation:
class Node
{
Node Left;
Node Right;
bool Color;
}
struct NodeNavigator
{
Node[] nodes;
int index;
}
Node does not refer to the parent. This reduces memory consumption and
simplifies the object graph, which is good for GC. The tree is walked using a node
navigator, which stores the ancestors of the node.
Node as a structure:
struct Node
{
int Parent;
int Left;
int Right;
bool Color; // This might be integrated as highest bit of parent.
}
The tree is stored as an array of nodes. This is a compact and GC-efficient
implementation.
Node as a structure, and with node navigator:
struct Node
{
int Left;
int Right;
bool Color; // This might be integrated as highest bit of left.
}
struct NodeNavigator
{
Tree tree;
int[] nodes;
int index;
}
The tree is stored as an array of nodes, and a navigator is used to walk it. This is the most compact implementation.
Each implementation has its virtues. What the implementations have in common is that
they share the same balancing and navigation algorithms. Storage
differences prevent a single C# implementation. In contrast, C++ allows one to
define a "tree" concept and specializations of this concept, allowing
unified algorithms; all this is done without a performance penalty.
P.S. Java, in this regard, offers almost no alternatives...
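To show where the penalty comes from, here is a sketch of the only unification available in Java (and, in much the same way, in C#): the algorithm is written once against an interface, and each storage layout implements it. All names (TreeStorage, ArrayTreeStorage, TreeAlgorithms) are hypothetical, and only one layout is shown. Every node access goes through an interface call, unless the JIT manages to devirtualize it.

```java
import java.util.Arrays;

/** Storage neutral view of a binary tree; nodes are int handles, -1 means "no node". */
interface TreeStorage
{
  int left(int node);
  int right(int node);
  void setLeft(int node, int child);
  void setRight(int node, int child);
}

/** "Node as a structure" layout: parallel arrays instead of heap allocated nodes. */
final class ArrayTreeStorage implements TreeStorage
{
  private final int[] left;
  private final int[] right;

  ArrayTreeStorage(int capacity)
  {
    left = new int[capacity];
    right = new int[capacity];
    Arrays.fill(left, -1);
    Arrays.fill(right, -1);
  }

  public int left(int node) { return left[node]; }
  public int right(int node) { return right[node]; }
  public void setLeft(int node, int child) { left[node] = child; }
  public void setRight(int node, int child) { right[node] = child; }
}

/** Algorithms written once, for any storage. */
final class TreeAlgorithms
{
  /** Rotates a subtree to the left; returns the new subtree root. */
  static int rotateLeft(TreeStorage tree, int node)
  {
    int pivot = tree.right(node);

    tree.setRight(node, tree.left(pivot));
    tree.setLeft(pivot, node);

    return pivot;
  }

  /** Height of a subtree; an empty subtree has height 0. */
  static int height(TreeStorage tree, int node)
  {
    return node < 0 ? 0 :
      1 + Math.max(height(tree, tree.left(node)), height(tree, tree.right(node)));
  }
}
```

A C++ template would resolve left() and right() at compile time for each specialization; here the indirection stays in the compiled code.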
Do you agree that binary trees and algorithms that keep trees reasonably balanced
are important?
Our answer is yes!
It's interesting enough, however, that you won't easily find these algorithms
publicly available.
Though red-black,
AVL and other algorithms
described in Wikipedia are defined in terms of tree manipulation, all
implementations we have seen deal with trees annotated with keys and values.
These implementations really use tree balancing algorithms behind the scenes,
and expose commonplace set or map containers to a client. Even the
C++ Standard
Library suffers from this disease.
We think that binary trees are valuable independent concepts, and they are worth
implementing separately, at least because there are other algorithms, besides
sets and maps, that use trees.
And well, we did it in C#! See
RedBlackTree.cs.
Consider an example - a simple scheduler,
ScheduleBookmark.cs, with operations:
- schedule an action;
- remove an action from the schedule;
- enumerate actions;
- find a date, an action is scheduled for;
- find an action (or at least closest one) for a specified date;
- postpone actions due to delays;
A balanced binary tree allows an efficient implementation of such a scheduler. A tree
node stores an action, and a time span between the parent node and this node.
This way:
| Operation | Steps |
| schedule an action | find place + link node + rebalance tree |
| remove an action from the schedule | unlink node + rebalance tree |
| enumerate actions | navigate tree |
| find a date, an action is scheduled for | find node in tree |
| find an action for a specified date | cumulate time spans up to the tree root |
| postpone actions due to delays | fixup time spans from a node up to the tree root |
Compare operation complexities between tree, array, list and map:
| Operation | Tree | Array | List | Map |
| schedule an action | O(ln(N)) | O(N) | O(N) | O(ln(N)) |
| remove an action from the schedule | O(ln(N)) | O(N) | O(1) | O(ln(N)) |
| enumerate actions | O(ln(N)) | O(1) | O(1) | O(ln(N)) |
| find a date, an action is scheduled for | O(ln(N)) | O(1) | O(1) | O(1) |
| find an action for a specified date | O(ln(N)) | O(ln(N)) | O(N) | O(ln(N)) |
| postpone actions due to delays | O(ln(N)) | O(N) | O(N) | O(N*ln(N)) |
The complexity of each operation for the tree is O(ln(N)). Neither arrays, lists, nor maps achieve a similar worst case guarantee.
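The idea of storing a time span relative to the parent can be sketched as follows. This is not the code of ScheduleBookmark.cs; it is a minimal Java illustration with a made-up class ScheduleNode, which leaves out the action, the child links and the balancing, and measures the root's span from zero.

```java
/** Hypothetical node that keeps its time as a span relative to its parent. */
final class ScheduleNode
{
  final ScheduleNode parent;

  /** Time span between the parent and this node; for the root, a span from 0. */
  long offset;

  ScheduleNode(ScheduleNode parent, long offset)
  {
    this.parent = parent;
    this.offset = offset;
  }

  /** Absolute time: accumulates spans from this node up to the root. */
  long time()
  {
    long total = 0;

    for (ScheduleNode node = this; node != null; node = node.parent)
    {
      total += node.offset;
    }

    return total;
  }
}
```

In this sketch a change of one node's offset moves the node together with all its descendants, and the cost of time() is the depth of the node, which is O(ln(N)) in a balanced tree.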
Finally, the test program is Program.cs, and the whole project (VS2008) is Tree.zip.
Could you think of a C# method accepting an ancestor, and
forbidding a descendant of a class at compile time?
The answer to this probably is: why do you need such a reptile?
Well, I don't. I didn't mean to create such a method, but generics help a lot!
public class BinaryTreeNode<Node>
where Node: BinaryTreeNode<Node>
{
public Node parent;
public Node left;
public Node right;
}
public class MyNode: BinaryTreeNode<MyNode>
{
public int key;
}
public class MyRoot: MyNode
{
}
public class Test
{
public void test()
{
MyRoot root = new MyRoot();
// print((MyNode)root); // This works.
print(root); // This does not work.
}
private static void print<T>(T node)
where T: BinaryTreeNode<T>
{
Console.WriteLine("print me");
}
}
By the way, BinaryTreeNode is effectively an "abstract" class: you cannot instantiate it, you can only inherit from it.
Once upon a time, we created a function mimicking the
decapitalize() method defined in Java's java.beans.Introspector. Nothing
special, indeed. See the source:
/**
* Utility method to take a string and convert it to normal Java variable
* name capitalization. This normally means converting the first
* character from upper case to lower case, but in the (unusual) special
* case when there is more than one character and both the first and
* second characters are upper case, we leave it alone.
* <p>
* Thus "FooBah" becomes "fooBah" and "X" becomes "x", but "URL" stays
* as "URL".
*
* @param name The string to be decapitalized.
* @return The decapitalized version of the string.
*/
public static String decapitalize(String name) {
if (name == null || name.length() == 0) {
return name;
}
if (name.length() > 1 && Character.isUpperCase(name.charAt(1)) &&
Character.isUpperCase(name.charAt(0))){
return name;
}
char chars[] = name.toCharArray();
chars[0] = Character.toLowerCase(chars[0]);
return new String(chars);
}
We typed an implementation immediately:
<xsl:function name="t:decapitalize" as="xs:string">
<xsl:param name="value" as="xs:string?"/>
<xsl:variable name="c" as="xs:string"
select="substring($value, 2, 1)"/>
<xsl:sequence select="
if ($c = upper-case($c)) then
$value
else
concat
(
lower-case(substring($value, 1, 1)),
substring($value, 2)
)"/>
</xsl:function>
It worked all right until recently, when it failed: its output
differed from that of its Java counterpart.
The input was W9Identifier. The function naturally returned the same value, while
Java returned w9Identifier. We had wrongly assumed that
$c = upper-case($c) is true only when the character is an upper case letter. That's
not correct for digits, which upper-case() leaves unchanged. The correct way is:
<xsl:function name="t:decapitalize" as="xs:string">
<xsl:param name="value" as="xs:string?"/>
<xsl:variable name="c" as="xs:string"
select="substring($value, 2, 1)"/>
<xsl:sequence select="
if ($c != lower-case($c)) then
$value
else
concat
(
lower-case(substring($value, 1, 1)),
substring($value, 2)
)"/>
</xsl:function>
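The difference between the two tests is easy to reproduce in Java. The following sketch (class and method names are ours) puts the flawed and the corrected predicates side by side and mirrors the logic of the corrected t:decapitalize; Locale.ROOT is our choice, made to keep the result independent of the default locale.

```java
import java.util.Locale;

/** Hypothetical mirror of the two xslt tests discussed above. */
final class Decapitalize
{
  /** The flawed test: true for anything that upper casing leaves unchanged, digits included. */
  static boolean naiveIsUpper(String c)
  {
    return c.equals(c.toUpperCase(Locale.ROOT));
  }

  /** The corrected test: true only if lower casing changes the character. */
  static boolean isUpper(String c)
  {
    return !c.equals(c.toLowerCase(Locale.ROOT));
  }

  /** Leaves the value alone if its second character is upper case; lowers the first one otherwise. */
  static String decapitalize(String value)
  {
    if (value == null || value.isEmpty())
    {
      return value;
    }

    String second = value.length() > 1 ? value.substring(1, 2) : "";

    return isUpper(second) ? value :
      value.substring(0, 1).toLowerCase(Locale.ROOT) + value.substring(1);
  }
}
```

Here naiveIsUpper("9") is true, which is exactly the trap we fell into, while isUpper("9") is false.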
Although in our latest projects we're using more Java and XSLT, we always compare Java and .NET features. It's no secret that most applications use some cache solution to improve performance. Unlike .NET, which provides a robust cache solution, Java doesn't provide anything standard. Of course a Java adept may find a lot of caching frameworks or just say: "use HashMap (ArrayList etc.) instead", but this is not the same.
Think about the options for Java:
1. Caching frameworks (caching systems). Yes, they do their work, and do it perfectly. Some of them are state of the art, but there are drawbacks. The crucial one is that for simple data caching one has to use a whole framework. This option requires too much effort to solve a simple problem.
2. Collection classes (HashMap, ArrayList etc.) for caching data. This is a very straightforward and productive solution. Everyone knows these classes, and there is nothing to configure. One declares an instance of such a class, takes care of data access synchronization, and everything starts working immediately. An admirable caching solution, but only for "toy applications", since it solves one problem and introduces another. If an application works for hours and there is a lot of data
to cache, the amount of data only grows and never shrinks, which is why such caching is very quickly surrounded with all sorts of rules that somehow reduce its size at run-time. The solution very quickly loses its shine and becomes non-portable, but it's still applicable for some applications.
3. Using Java reference objects for caching data. The most appropriate class for a cache is java.util.WeakHashMap. WeakHashMap works like a hash table but holds its keys through weak references. In practice, an entry in a WeakHashMap may be reclaimed at any time once its key is no longer referenced outside of the map. This caching strategy
depends on the GC's whims, is not entirely reliable, and may increase the number of cache misses.
We've decided to create our own simple cache with sliding expiration of data.
One may create many cache instances, but there is only one global service that tracks expired objects among these instances:
private Cache<String, Object> cache = new Cache<String, Object>();
There is a constructor that specifies an expiration interval in milliseconds for all cached objects:
private Cache<String, Object> cache = new Cache<String, Object>(15 * 60 * 1000);
Access is similar to HashMap:
instance = cache.get("key"); and cache.put("key", instance);
That's all one needs to know to start using it. Click here to download the Java source of this class. Feel free to use it in your applications.
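For readers who just want the idea of sliding expiration, here is a minimal sketch. It is not the downloadable Cache class: the name SlidingCache, the injected clock and the explicit evictExpired() method (standing in for the global tracking service) are our assumptions, made to keep the example short and testable.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical cache where every hit restarts the expiration countdown of an entry. */
final class SlidingCache<K, V>
{
  private static final class Entry<V>
  {
    final V value;
    volatile long lastAccess;

    Entry(V value, long now)
    {
      this.value = value;
      this.lastAccess = now;
    }
  }

  private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long expirationMillis;
  private final LongSupplier clock;

  /** The clock is injected; System::currentTimeMillis is a natural choice in production. */
  SlidingCache(long expirationMillis, LongSupplier clock)
  {
    this.expirationMillis = expirationMillis;
    this.clock = clock;
  }

  V get(K key)
  {
    Entry<V> entry = entries.get(key);

    if (entry == null)
    {
      return null;
    }

    long now = clock.getAsLong();

    if (now - entry.lastAccess >= expirationMillis)
    {
      entries.remove(key, entry);

      return null;
    }

    // Sliding expiration: an access postpones the expiration.
    entry.lastAccess = now;

    return entry.value;
  }

  void put(K key, V value)
  {
    entries.put(key, new Entry<>(value, clock.getAsLong()));
  }

  /** Drops expired entries; returns the number of entries removed. */
  int evictExpired()
  {
    long now = clock.getAsLong();
    int removed = 0;

    for (Iterator<Entry<V>> i = entries.values().iterator(); i.hasNext();)
    {
      if (now - i.next().lastAccess >= expirationMillis)
      {
        i.remove();
        ++removed;
      }
    }

    return removed;
  }

  int size()
  {
    return entries.size();
  }
}
```

An instance for real use might be created as new SlidingCache<String, Object>(15 * 60 * 1000, System::currentTimeMillis), with evictExpired() called from a timer.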
Yesterday I read about a new Garbage Collection implementation,
G1.
To be honest, I was not impressed.
I think Garbage Collection is an evil, or at least its present implementations are.
I do not believe in algorithms that at their very core assume centralized
execution.
On the other hand, it's clear it's not in my power to change the status quo. My
lot is to give advice, mostly incompetent and ignorable.
I'm waiting for the time when someone arrives at the idea of bringing some parts of
GC logic out of the runtime scope. This will require more VM intelligence,
but it will bear fruit.
JIT or a compiler, during static analysis, may prove that collecting some objects
makes some of the objects they refer to unreachable, provided it can
prove that the referred objects are not reachable through other means (e.g.
a private field which is not stored in other places). This is close to the ideas
expressed in
Muse on value types in java. It's possible to prepare a garbage graph in
advance, before runtime.
In many cases it's also possible to prove that when a method's variable goes out
of scope, its value is not reachable through other means and may be collected. This
allows implementing a stage of automatic garbage collection in which objects that
are proven to be garbage are immediately added to the free memory set.
As an example I'm thinking of Java's ArrayList object, which stores a private
array. When an ArrayList is reclaimed or resized, the reference to the private array
is lost and its memory can be added to the free set immediately.
This mechanism, integrated as the first stage of GC, will make it less
centralized, as I believe many objects will be collected this way.
Suppose you have constructed a sequence of attributes.
How do you access the value of attribute "a"?
Simple, isn't it? Yet it took a couple of minutes to find a solution!
<xsl:variable name="attributes" as="attribute()*">
<xsl:apply-templates mode="t:generate-attributes" select="."/>
</xsl:variable>
<xsl:variable name="value" as="xs:string?"
select="$attributes[self::attribute(a)]"/>
Saying
Our project, containing many different xslt files, generates many different
outputs (e.g. code that uses DB2 SQL, or Oracle SQL, or DAO, or some
other flavor of code). This results in the use of
indirect calls to handle different generation options; however, to allow xslt
to work we had to create a big main xslt including stylesheets for each kind of
generation. This hurts compilation time.
Alternatives
- A big main xslt including everything.
- A big main xslt including everything and using "use-when" attribute.
- Compose main xslt on the fly.
We were strongly inclined to the second alternative. Unfortunately, only a limited set of information is available when "use-when" is evaluated. In
particular, neither parameters nor documents are available. Using
Saxon's extensions one may reach only static variables, or access
System.getProperty(). This isn't flexible.
We've decided to try the third alternative.
Solution
We think we have found a nice solution: to create XsltSource,
which receives a list of includes upon construction, and creates an xslt
when getReader() is called.
import java.io.Reader;
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
/**
* A source to read generated stylesheet, which includes other stylesheets.
*/
public class XsltSource extends StreamSource
{
/**
* Creates an {@link XsltSource} instance.
*/
public XsltSource()
{
}
/**
* Creates an {@link XsltSource} instance.
* @param systemId a system identifier for root xslt.
*/
public XsltSource(String systemId)
{
super(systemId);
}
/**
* Creates an {@link XsltSource} instance.
* @param systemId a system identifier for root xslt.
* @param includes a list of includes.
*/
public XsltSource(String systemId, String[] includes)
{
super(systemId);
this.includes = includes;
}
/**
* Gets stylesheet version.
* @return a stylesheet version.
*/
public String getVersion()
{
return version;
}
/**
* Sets a stylesheet version.
* @param value a stylesheet version.
*/
public void setVersion(String value)
{
version = value;
}
/**
* Gets a list of includes.
* @return a list of includes.
*/
public String[] getIncludes()
{
return includes;
}
/**
* Sets a list of includes.
* @param value a list of includes.
*/
public void setIncludes(String[] value)
{
includes = value;
}
/**
* Generates an xslt on the fly.
*/
public Reader getReader()
{
String[] includes = getIncludes();
if (includes == null)
{
return super.getReader();
}
String version = getVersion();
if (version == null)
{
version = "2.0";
}
StringBuilder builder = new StringBuilder(1024);
builder.append("<stylesheet version=\"");
builder.append(version);
builder.append("\" xmlns=\"http://www.w3.org/1999/XSL/Transform\">");
for(String include: includes)
{
builder.append("<include href=\"");
builder.append(include);
builder.append("\"/>");
}
builder.append("</stylesheet>");
return new StringReader(builder.toString());
}
/**
* An xslt version. By default 2.0 is used.
*/
private String version;
/**
* A list of includes.
*/
private String[] includes;
}
To use it one just needs to write:
Source source = new XsltSource(base, stylesheets);
Templates templates = transformerFactory.newTemplates(source);
...
where:
base is a base uri for the generated stylesheet; it's used to
resolve relative includes;
stylesheets is an array of hrefs.
Such an implementation resembles dynamic linking, where separate parts are bound at
runtime. We would like to see dynamic modules in the next version of xslt.
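One caveat about the listing: getReader() pastes the version and the hrefs into the generated xslt verbatim, so an href containing & or a quote would produce a malformed stylesheet. A possible hardening is sketched below; WrapperStylesheet and its methods are hypothetical names, not a part of XsltSource.

```java
/** Hypothetical generator of the wrapper stylesheet with escaped attribute values. */
final class WrapperStylesheet
{
  /** Builds a stylesheet that only includes the given hrefs. */
  static String build(String version, String... includes)
  {
    StringBuilder builder = new StringBuilder(256);

    builder.append("<stylesheet version=\"").append(escape(version)).
      append("\" xmlns=\"http://www.w3.org/1999/XSL/Transform\">");

    for (String include : includes)
    {
      builder.append("<include href=\"").append(escape(include)).append("\"/>");
    }

    return builder.append("</stylesheet>").toString();
  }

  /** Escapes characters that are special inside a double quoted attribute value. */
  static String escape(String value)
  {
    return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
  }
}
```

The result can be wrapped into a StringReader just as getReader() does.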
We strongly object to persistence frameworks in their contemporary meaning.
This includes a long list of names like Hibernate, Java Persistence API, LINQ,
and others.
Consider how one of them describes itself:
...high performance object/relational persistence and query service... lets you
develop persistent classes following object-oriented idiom - including
association, inheritance, polymorphism, composition, and collections... allows you to express queries in its own portable SQL extension...
Sounds good, right?
We think not! The words "own" and "portable" applied to SQL sound
almost like antonyms. When one creates a unified language (a noble impulse, as opposed to a
proprietary one (?)) she inevitably adds one more peer, increasing
plurality in the family of languages.
Attempts to create similar layers between data and business logic are not new.
This has happened throughout computer history. IDMS, NATURAL, COOL:GEN are
20-30 year old examples.
Our reasoning (nothing new).
One needs to approach a design (development and maintenance) from different
perspectives; this way she will better understand the problem being designed, and
will better estimate the skills needed to solve it. This leads to
modularization, e.g. business layer, data layer, appearance; and to development
(maintenance) roles: program developer, database specialist, appearance
specialist. On a small scale several roles are often filled by one person;
this does not mean, however, that these roles are redundant; one just needs to
try on different roles.
Why does one separate business layer and data layer?
Pragmatic perspective. There are databases, which accomplish most data
storage tasks more efficiently than one can without a database.
There are two worlds: database specialists and program developers. These two
layers and roles are facts of reality.
A designer's goal is to keep these roles separate:
- do not force a database specialist to know the business logic details;
- do not force a program developer to know the details of how to organize storage
more efficiently, or how to optimize a particular query.
Modularity helps here. Databases are well equipped to solve these tasks: the data
layer should expose a database API through stored procedures, functions, and
views, while the business layer should use this API to access the database.
With persistence frameworks there are two alternatives:
- still use the data layer API;
- rely on a persistence framework.
When the first case is selected, a framework provides almost no additional
value compared to traditional database access (jdbc, ado.net, and so on).
When one relies on a framework, the data layer interface virtually disappears
(in fact the framework substitutes for this interface). A database specialist has very
little control over tuning the data structure and optimizing queries, unless
she starts digging in the business code, but even then she cannot always control
the queries to the database. Moreover, the database specialist must learn a proprietary
query language.
The result is that a persistence framework erodes the division of responsibilities,
complicating development and maintenance.
We often hear the following explanation of why one should use Persistence
Frameworks: "It eases a database vendor switch". This is the most stupid reason to use
Persistence Frameworks! It looks as if they plan to switch vendors once a
day.
A design needs to focus on modularity. This will make code more robust, faster
and maintainable. This also eases a potential migration, as only the data layer
has to be migrated, with minimal (mostly configurational) changes in the
business layer.
We are certain xslt/xquery are the best basis for web application frameworks from the
design perspective; or, in other words, pipeline frameworks allowing the use of
xslt/xquery are the preferable way to create web applications.
Advantages are obvious:
- clear separation of business logic, data, and presentation;
- richness of languages, allowing one to implement simple presentation, complex
components, and sophisticated data binding;
- built-in extensibility, allowing communication with business logic written in
other languages and/or located at a different site.
It seems that agitating for such technologies is like forcing an open
door. There are such frameworks out there:
Orbeon Forms, Cocoon, and others.
We're not qualified to judge their virtues, however...
Look at the current state of affairs. The main players in this area (well, I
have a rather limited vision) push other technologies: JSP/JSF/Facelets and
the like in the Java world, and ASP.NET in the .NET world. The closest thing they
provide is an xslt servlet/component that generates output.
Their syntax variants and their data binding techniques allude to similar
paradigms in xslt/xquery:
<select>
<c:forEach var="option" items="#{bean.options}">
<option value="#{option.key}">#{option.value}</option>
</c:forEach>
</select>
On the surface, however, we see frameworks much more limited in design and in
application.
And here is a contradiction: how can it be that such a good design is at present
not at least as popular as its competitors?
Someone may say there is no such problem. You can use whatever you want. You
have a choice! Well, he's lucky. From our perspective it's not that simple.
We're creating rather complex web applications. Their nature isn't important in
this context, but what is important is that there are customers. They are not
thoroughly enlightened on the question, and exactly because of this they prefer
technologies proposed by the leaders. It seems everything convinces them: the
mainstream, good support, many developers who know the technology.
There is not a single chance to promote anything else.
We believe that the future may change this state, but we're creating at present,
and cannot wait...
Java has no value types: objects allocated in place, in contrast to objects
referred to by a pointer into the heap. This, in my opinion, has a negative impact on
program design and on performance.
Incidentally, I've thought of a use case which can be treated as a value
type by jvm implementations. Consider an example:
class A
{
private final B b = new B();
}
An implementation may lay out class A in such a way that field b holds the content of
an instance of class B itself rather than a pointer to an instance of class B. This way we
save a pointer and a heap allocation of the B instance. Another example:
class C
{
C(int size)
{
values = new D[size];
for(int i = 0; i < values.length; i++)
{
values[i] = new D();
}
}
private final D[] values;
}
Here the field values is never null, and each item of the array contains a non-null
value. Assuming these conditions hold for the whole life cycle, and values is
not passed by reference, we can consider values an array of value types.
The use case conditions are the following:
- a field contains a non-null value;
- the field value is an instance of the field type and not of a
descendant type;
- if the field is an array, then all elements of the array are
initialized with instances of the element type, and not of a descendant type;
- the field or an element of the array can be assigned through the
operator new only (field = new T(), array[i] = new T());
- the array field is not passed by reference
(Arrays.sort(array) never happens).
JIT is allowed to interpret a field as a
value type provided it proves these conditions.
Later...
There is another use case to detect value types:
- a method variable never contains null, and
- that variable is never stored in any field, and
- no synchronization is used on the instance held in the variable, and
- a value is assigned to the variable through the operator new only.
A variable can be laid out directly on the stack, provided the preceding conditions are satisfied.
P.S. Although .NET has built-in value types, it may use the very same technique to optimize reference types.
Yesterday, incidentally, I ran into a problem of a dynamic error during the evaluation of a template's match.
This reminded me of
SFINAE in C++. There the principle is applied at compile time to find a
matching template.
I think people underestimate the meaning of this behaviour. The effect of
dynamic errors occurring during pattern evaluation is described in the
specification:
Any dynamic error or type error that occurs during the evaluation of a pattern against a particular node is treated as a recoverable error even if the error would not be recoverable under other circumstances. The optional recovery action is to treat the pattern as not matching that node.
This has far-reaching consequences, like error recovery. To illustrate what I'm talking about, please look at this simple stylesheet that recovers from "Division by zero.":
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:variable name="operator" as="element()+">
<div divident="10" divisor="0"/>
<div divident="10" divisor="2"/>
</xsl:variable>
<xsl:apply-templates select="$operator"/>
</xsl:template>
<xsl:param name="NaN" as="xs:double" select="1.0 div 0"/>
<xsl:template
match="div[(xs:integer(@divident) div xs:integer(@divisor)) ne $NaN]">
<xsl:message select="xs:integer(@divident) div xs:integer(@divisor)"/>
</xsl:template>
<xsl:template match="div">
<xsl:message select="'Division by zero.'"/>
</xsl:template>
</xsl:stylesheet>
Here, if there is a division by zero the first template is not matched and the other
template is selected; thus the second template serves as an error handler for the
first one. Definitely, one may define much more complex constructs to be
handled this way.
I never was a purist (meaning doing everything in xslt), however this example,
along with the
indirect function call, shows that xslt is a rather well-equipped language. One just
needs to be smart enough to understand how to do things.
See also: Try/catch block in xslt 2.0 for Saxon 9.
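The same principle, a failed match being just "no match", can be mimicked outside of xslt. Here is a sketch in Java with a hypothetical Dispatcher class of our own: rules are tried in order, and a predicate that throws is treated as not matching, so a later rule plays the role of the error handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical rule dispatcher that treats an error in a match as "no match". */
final class Dispatcher<T, R>
{
  private static final class Rule<T, R>
  {
    final Predicate<T> match;
    final Function<T, R> action;

    Rule(Predicate<T> match, Function<T, R> action)
    {
      this.match = match;
      this.action = action;
    }
  }

  private final List<Rule<T, R>> rules = new ArrayList<>();

  /** Adds a rule; earlier rules have a higher priority. */
  Dispatcher<T, R> rule(Predicate<T> match, Function<T, R> action)
  {
    rules.add(new Rule<>(match, action));

    return this;
  }

  /** Applies the first rule whose match succeeds without an error. */
  R apply(T input)
  {
    for (Rule<T, R> rule : rules)
    {
      boolean matched;

      try
      {
        matched = rule.match.test(input);
      }
      catch (RuntimeException e)
      {
        // A failing pattern simply does not match.
        matched = false;
      }

      if (matched)
      {
        return rule.action.apply(input);
      }
    }

    throw new IllegalArgumentException("No rule matches the input.");
  }
}
```

With a first rule that divides two integers in its match and a catch-all second rule, an input with a zero divisor falls through to the second rule, just like the div element in the stylesheet above.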
Among other job activities, we're from time to time asked to check the technical skills of job applicants.
Several times we have interviewed people who were far below an
acceptable professional level. It's a torment for both sides, I should say.
To ease things we have designed a small
questionnaire (specific to our projects) for job applicants. It's sent to an applicant before the
meeting. Even partially answered, this
questionnaire constitutes a good filter against laymen:
<questionnaire>
  <item>
    <question>Please estimate your knowledge in XML Schema (xsd) as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in xslt 2.0/xquery 1.0 as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in xslt 1.0 as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in java as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in c# as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in sql as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>For logical values A, B, please rewrite logical expression "A and B" using operator "or".</question>
    <answer/>
  </item>
  <item>
    <question>For logical values A, B, please rewrite logical expression "A = B" using operators "and" and "or".</question>
    <answer/>
  </item>
  <item>
    <question>There are eight balls, with only one heavier than the others. What is the minimum number of weighings that reveals the heavier ball? Please be suspicious about the "trivial" solution.</question>
    <answer/>
  </item>
  <item>
    <question>If A results in B. What may one say about the reason of B?</question>
    <answer/>
  </item>
  <item>
    <question>If only A or B result in C. What may one say about the reason of C?</question>
    <answer/>
  </item>
  <item>
    <question>Please define an xml schema for this questionnaire.</question>
    <answer/>
  </item>
  <item>
    <question>Please create a simple stylesheet creating an html table based on this questionnaire.</question>
    <answer/>
  </item>
  <item>
    <question>For a table A with columns B, C, and D, please create an sql query selecting B grouped by C and ordered by D.</question>
    <answer/>
  </item>
  <item>
    <question>For a sequence of xml elements A with attribute B, please write a stylesheet excerpt creating a sequence of elements D, grouping elements A with the same string value of attribute B, sorted in ascending order of B.</question>
    <answer/>
  </item>
  <item>
    <question>Having a java class A with properties B and C, please sort a collection of A for B in ascending, and C in descending order.</question>
    <answer/>
  </item>
  <item>
    <question>What does the following line mean in c#? int? x;</question>
    <answer/>
  </item>
  <item>
    <question>What is a parser?</question>
    <answer/>
  </item>
  <item>
    <question>How to issue an error in the xml stylesheet?</question>
    <answer/>
  </item>
  <item>
    <question>What is a lazy evaluation?</question>
    <answer/>
  </item>
  <item>
    <question>How do you understand the following sentence? For each line of code there should be a comment.</question>
    <answer/>
  </item>
  <item>
    <question>Have you used any supplemental information to answer these questions?</question>
    <answer/>
  </item>
  <item>
    <question>Have you independently answered these questions?</question>
    <answer/>
  </item>
</questionnaire>
We are designing a rather complex xslt 2.0 application dealing with semistructured
data. We must tolerate errors during processing, as there are cases where an
input is not perfectly valid (or the program is not designed or ready to get
such an input).
The most typical error is an unsatisfied expectation of tree structure like:
<xsl:variable name="element" as="element()" select="some-element"/>
Obviously, a dynamic error occurs if the specified element is not present. To
concentrate on the primary logic, and to avoid the burden of recovering from illegal
(unexpected) cases, we have created a try/catch API. The goals of this API are:
- to be able to continue processing in case of an error;
- to report as much useful information about the error as possible.
Alternatives:
Do not think it was arrogance that drove us to create a custom API. No, we
were looking for alternatives! Please see the
[xsl] saxon:try() discussion:
- the saxon:try() function is a kind of pseudo function, which explicitly relies on lazy evaluation of its arguments, and ... it's not available in SaxonB;
- the ex:error-safe extension instruction is far from perfect in its implementation quality, and provides no error location.
We had no other way except to design this feature ourselves. In our defence one
can say that we use an innovative approach that encapsulates the details of the
implementation behind a template and calls handlers indirectly.
Use:
The try/catch API is designed as a template
<xsl:template name="t:try-block"/> that calls a "try" handler and, if
required, a "catch" handler using the
<xsl:apply-templates mode="t:call"/> instruction. The caller passes any
information to these handlers by means of tunnel parameters.
Handlers must be in the "t:call" mode. The "catch" handler
may receive the following error info parameters:
<xsl:param name="error" as="xs:QName"/>
<xsl:param name="error-description" as="xs:string"/>
<xsl:param name="error-location" as="item()*"/>
where $error-location is a sequence of pairs (location as
xs:string, context as item())*.
A sample:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/public/"
exclude-result-prefixes="xs t">
<xsl:include href="try-block.xslt"/>
<xsl:template match="/">
  <result>
    <xsl:for-each select="1 to 10">
      <xsl:call-template name="t:try-block">
        <xsl:with-param name="value" tunnel="yes" select=". - 5"/>
        <xsl:with-param name="try" as="element()">
          <try/>
        </xsl:with-param>
        <xsl:with-param name="catch" as="element()">
          <t:error-handler/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:for-each>
  </result>
</xsl:template>
<xsl:template mode="t:call" match="try">
  <xsl:param name="value" tunnel="yes" as="xs:decimal"/>
  <value>
    <xsl:sequence select="1 div $value"/>
  </value>
</xsl:template>
</xsl:stylesheet>
The sample prints values according to the formula "1/(i - 5)", where "i" is a
variable varying from 1 to 10. Clearly, division by zero occurs when "i" is equal
to 5.
Please notice that the try/catch API is accessed through
<xsl:include href="try-block.xslt"/>. The main logic is
executed in
<xsl:template mode="t:call" match="try"/>, which
receives parameters using tunneling. A default error handler
<t:error-handler/> is used to report errors.
Error report:
Error: FOAR0001
Description:
Decimal divide by zero
Location:
1. systemID: "file:///D:/style/try-block-test.xslt", line: 34
2. template mode="t:call"
match="element(try, xs:anyType)"
systemID: "file:///D:/style/try-block-test.xslt", line: 30
context node:
/*[1][local-name() = 'try']
3. template mode="t:call"
match="element({http://www.nesterovsky-bros.com/xslt/private/try-block}try, xs:anyType)"
systemID: "file:///D:/style/try-block.xslt", line: 53
context node:
/*[1][local-name() = 'try']
4. systemID: "file:///D:/style/try-block.xslt", line: 40
5. call-template name="t:try-block"
systemID: "file:///D:/style/try-block-test.xslt", line: 17
6. for-each
systemID: "file:///D:/style/try-block-test.xslt", line: 16
context item: 5
7. template mode="saxon:_defaultMode"
match="document-node()"
systemID: "file:///D:/style/try-block-test.xslt", line: 14
context node:
/
Implementation details:
You were not expecting this API to be pure xslt, were you?
Well, you're right, there is an extension function. Its pseudo code looks like
this:
function tryBlock(tryItems, catchItems)
{
try
{
execute xsl:apply-templates for tryItems.
}
catch
{
execute xsl:apply-templates for catchItems.
}
}
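The same control flow can be sketched in plain Java. This is only an illustration of the pseudo code above, not the actual Saxon extension: the class name TryBlockSketch and the use of Supplier and Function as stand-ins for the "try" and "catch" handlers are assumptions made for the example. Integer division is used here only because it throws on division by zero.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class TryBlockSketch
{
  /**
   * Runs the "try" handler; if it throws, passes the error to the "catch"
   * handler. Both handlers are hypothetical stand-ins for the
   * xsl:apply-templates calls of the real extension.
   */
  public static <T> T tryBlock(
    Supplier<T> tryHandler,
    Function<RuntimeException, T> catchHandler)
  {
    try
    {
      return tryHandler.get();
    }
    catch(RuntimeException e)
    {
      return catchHandler.apply(e);
    }
  }

  public static void main(String[] args)
  {
    // Mirrors the stylesheet sample: evaluate 1/(i - 5) for i from 1 to 10.
    for(int i = 1; i <= 10; i++)
    {
      final int value = i - 5;

      String result = tryBlock(
        () -> String.valueOf(1 / value),   // throws ArithmeticException at i = 5
        e -> "error: " + e.getMessage());

      System.out.println(result);
    }
  }
}
```

As in the stylesheet sample, the loop continues after the failing iteration, and the error is reported by the handler instead of aborting the whole run.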
One last thing. Please get the implementation from
saxon.extensions.zip. There you will find the sources of the try/catch and
tuples/maps APIs.
Right now we live in the java world, thus all our tasks are (in)directly
related to this environment.
We want to store stylesheets as resources of a java application, and at
the same time to point to these stylesheets without jar qualification. In .NET this idea would not
arise at all, as there are well defined boundaries between assemblies, but java uses
a rather different approach. Whenever you have a resource name, it's up to the
ClassLoader to find this resource. To exploit this feature we've created
a uri resolver for the stylesheet
transformation. The protocol we use has the following format: "resource:/resource-path".
For example, to store stylesheets in the
META-INF/stylesheets folder we use the uri "resource:/META-INF/stylesheets/java/main.xslt".
Relative paths are resolved naturally. The path "../jxom/java-serializer.xslt"
in the previously mentioned stylesheet is resolved to "resource:/META-INF/stylesheets/jxom/java-serializer.xslt".
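The relative resolution described above is just what java.net.URI does for hierarchical uris, so it can be checked in isolation. The following is a small sketch; the class name ResourceUriDemo and its helper method are made up for the illustration.

```java
import java.net.URI;

public class ResourceUriDemo
{
  /** Resolves href against base, the same way a uri resolver may do it. */
  public static String resolve(String base, String href)
  {
    return URI.create(base).resolve(href).toString();
  }

  public static void main(String[] args)
  {
    // prints "resource:/META-INF/stylesheets/jxom/java-serializer.xslt"
    System.out.println(
      resolve(
        "resource:/META-INF/stylesheets/java/main.xslt",
        "../jxom/java-serializer.xslt"));
  }
}
```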
We've created a small class ResourceURIResolver. You need to
supply an instance of TransformerFactory with this resolver:
transformerFactory.setURIResolver(new ResourceURIResolver());
The class itself is so small that we quote it here:
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;
/**
* This class implements an interface that can be called by the processor
* to turn a URI used in document(), xsl:import, or xsl:include into a
* Source object.
*/
public class ResourceURIResolver implements URIResolver
{
/**
* Called by the processor when it encounters
* an xsl:include, xsl:import, or document() function.
*
* This resolver supports protocol "resource:".
* Format of uri is: "resource:/resource-path", where "resource-path" is an
* argument of a {@link ClassLoader#getResourceAsStream(String)} call.
* @param href - an href attribute, which may be relative or absolute.
* @param base - a base URI against which the first argument will be made
* absolute if the absolute URI is required.
* @return a Source object, or null if the href cannot be resolved, and
* the processor should try to resolve the URI itself.
*/
public Source resolve(String href, String base)
throws TransformerException
{
if (href == null)
{
return null;
}
URI uri;
try
{
if (base == null)
{
uri = new URI(href);
}
else
{
uri = new URI(base).resolve(href);
}
}
catch(URISyntaxException e)
{
// Unsupported uri.
return null;
}
if (!"resource".equals(uri.getScheme()))
{
return null;
}
String resourceName = uri.getPath();
if ((resourceName == null) || (resourceName.length() == 0))
{
return null;
}
if (resourceName.charAt(0) == '/')
{
resourceName = resourceName.substring(1);
}
ClassLoader classLoader =
Thread.currentThread().getContextClassLoader();
InputStream stream =
classLoader.getResourceAsStream(resourceName);
if (stream == null)
{
return null;
}
return new StreamSource(stream, uri.toString());
}
}
The project we're working on requires us to generate a java web application from some ancient language. The code being converted is transformed into java classes
(thanks to
jxom), and
the presentation is converted into JSF (facelets) pages.
By the way, long before the java (.net) platforms were conceived, there were
languages and environments worked out so well that contemporary client-server
paradigms (like JSF, ASP.NET, and so on) are just their isomorphisms.
The problem we were dealing with recently is JSF databinding for bean properties
of types java.sql.Date, java.sql.Time, java.sql.Timestamp.
At some point in the design we decided that these types are the most natural
representation of data in the original language, as the program's activity is
tightly connected to the database. Later on it became clear that JSF
databinding does not like these types at all. We had to decide either to fall
back and use java.util.Date as the bean property type, or to do something with
databinding.
It was not clear which way was best until we found an elegant solution,
namely: to create an ELResolver to handle bean properties of these types. The solution
works because custom el resolvers are applied before standard resolvers (except
the implicit one).
The class
DateELResolver is a rather simple extension of
BeanELResolver. To use it you only need to register it in faces-config.xml:
<faces-config version="1.2"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-facesconfig_1_2.xsd">
<application>
<el-resolver>com.nesterovskyBros.jsf.DateELResolver</el-resolver>
</application>
</faces-config>
Does the WebSphere MQ library for .NET support a connection pool? This is a question asked by many .NET developers who deal with IBM WebSphere MQ and write multithreaded applications. Unfortunately, the answer is NO… The .NET version supports only individual connection types.
I have compared the two MQ libraries, the one for Java and the one for .NET, and found that most of the classes have the same declarations, with one difference that is crucial for me. Unlike the .NET library, the Java MQ library provides several classes implementing MQ connection pooling. There is nothing similar in the .NET library.
There are a few common workarounds for this annoying restriction. One of them (recommended by IBM in their “MQ using .NET”) is to keep one open MQ connection per thread. Unfortunately, this approach does not work for ASP.NET applications (including web services).
The good news is that starting from service pack 5 for MQ 5.3, and of course in MQ 6.xx, sharing of MQ connections in blocking mode is supported:
“The implementation of WebSphere MQ .NET ensures that, for a given connection (MQQueueManager object instance), all access to the target WebSphere MQ queue manager is synchronized. The default behavior is that a thread that wants to issue a call to a queue manager is blocked until all other calls in progress for that connection are complete.”
This allows creating an MQ connection (note that an MQQueueManager object is a wrapper for an MQ connection) in one thread and using it exclusively in another thread without side effects caused by multithreading.
Taking this feature into account, I’ve created a simple MQ connection pool. It’s easy to use. The main class MQPoolManager has only two static methods:
public static MQQueueManager Get(string QueueManagerName, string ChannelName, string ConnectionName);
and
public static void Release(ref MQQueueManager queueManager);
The method Get returns an MQ queue manager (either an existing one from the pool or a newly created one), and Release returns it to the connection pool. Internally, MQPoolManager tracks expired connections and performs finalization when needed.
So, you may use one MQ connection pool per application domain without additional effort or big changes in existing applications.
By the way, this approach has allowed us to considerably improve the performance of the MQ part in one of our projects.
Later on...
To clarify the use of MQPoolManager, I've decided to show the following code snippet:
MQQueueManager queueManager = MQPoolManager.Get(QueueManagerName, ChannelName, ConnectionName);
try
{
// TODO: some work with MQ here
}
finally
{
MQPoolManager.Release(ref queueManager);
}
// at this point the queueManager is null
In the xslt world there is no widespread custom of thinking of stylesheet members
as public or private, in contrast to other programming languages like
C++/java/c# where access modifiers are essential. The reason lies in the complexity of
stylesheets: the smaller the code, the easier it is for a developer to keep all the details
in mind. Whenever an xslt program grows you should modularize
it to keep it manageable.
At the point where modules are introduced one starts thinking of the public
interface of a module and its implementation details. This separation is
especially important for template matching, as you probably won't want to
match a private template just because you've forgotten about some template in
the implementation of some module.
To distinguish public and private members you can introduce two namespaces in
your stylesheet: one for the public interface and one for the implementation details.
For the private namespace you can use a unique name, e.g. the stylesheet name as
part of the uri.
The following example is based on
jxom. This stylesheet builds an expression from an expression tree. The public part
consists only of the t:get-expression function; other members are private:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/public"
xmlns:p="http://www.nesterovsky-bros.com/private/expression.xslt"
xmlns="http://www.nesterovsky-bros.com/download/jxom.zip"
xpath-default-namespace="http://www.nesterovsky-bros.com/download/jxom.zip"
exclude-result-prefixes="xs t p">
<xsl:output method="text" indent="yes"/>
<!--
Entry point. -->
<xsl:template match="/">
<xsl:variable name="expression"
as="element()">
<lt>
<sub>
<mul>
<var name="b"/>
<var name="b"/>
</mul>
<mul>
<mul>
<int>4</int>
<var name="a"/>
</mul>
<var name="c"/>
</mul>
</sub>
<double>0</double>
</lt>
</xsl:variable>
<xsl:value-of
select="t:get-expression($expression)" separator=""/>
</xsl:template>
<!--
Gets
expression.
$element - expression element.
Returns expression tokens.
-->
<xsl:function name="t:get-expression" as="item()*">
<xsl:param name="element"
as="element()"/>
<xsl:apply-templates mode="p:expression" select="$element"/>
</xsl:function>
<!--
Gets binary expression.
$element - assignment expression.
$type - expression type.
Returns expression token sequence.
-->
<xsl:function
name="p:get-binary-expression" as="item()*">
<xsl:param name="element"
as="element()"/>
<xsl:param name="type" as="xs:string"/>
<xsl:sequence
select="t:get-expression($element/*[1])"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="$type"/>
<xsl:sequence select="' '"/>
<xsl:sequence
select="t:get-expression($element/*[2])"/>
</xsl:function>
<!-- Mode
"expression". Empty match. -->
<xsl:template mode="p:expression"
match="@*|node()">
<xsl:sequence select="error(xs:QName('invalid-expression'),
name())"/>
</xsl:template>
<!-- Mode "expression". or. -->
<xsl:template
mode="p:expression" match="or">
<xsl:sequence select="p:get-binary-expression(.,
'||')"/>
</xsl:template>
<!-- Mode "expression". and. -->
<xsl:template
mode="p:expression" match="and">
<xsl:sequence
select="p:get-binary-expression(., '&amp;&amp;')"/>
</xsl:template>
<!-- Mode
"expression". eq. -->
<xsl:template mode="p:expression" match="eq">
<xsl:sequence select="p:get-binary-expression(., '==')"/>
</xsl:template>
<!--
Mode "expression". ne. -->
<xsl:template mode="p:expression" match="ne">
<xsl:sequence select="p:get-binary-expression(., '!=')"/>
</xsl:template>
<!--
Mode "expression". le. -->
<xsl:template mode="p:expression" match="le">
<xsl:sequence select="p:get-binary-expression(., '&lt;=')"/>
</xsl:template>
<!--
Mode "expression". ge. -->
<xsl:template mode="p:expression" match="ge">
<xsl:sequence select="p:get-binary-expression(., '>=')"/>
</xsl:template>
<!--
Mode "expression". lt. -->
<xsl:template mode="p:expression" match="lt">
<xsl:sequence select="p:get-binary-expression(., '&lt;')"/>
</xsl:template>
<!--
Mode "expression". gt. -->
<xsl:template mode="p:expression" match="gt">
<xsl:sequence select="p:get-binary-expression(., '>')"/>
</xsl:template>
<!--
Mode "expression". add. -->
<xsl:template mode="p:expression" match="add">
<xsl:sequence select="p:get-binary-expression(., '+')"/>
</xsl:template>
<!--
Mode "expression". sub. -->
<xsl:template mode="p:expression" match="sub">
<xsl:sequence select="p:get-binary-expression(., '-')"/>
</xsl:template>
<!--
Mode "expression". mul. -->
<xsl:template mode="p:expression" match="mul">
<xsl:sequence select="p:get-binary-expression(., '*')"/>
</xsl:template>
<!--
Mode "expression". div. -->
<xsl:template mode="p:expression" match="div">
<xsl:sequence select="p:get-binary-expression(., '/')"/>
</xsl:template>
<!--
Mode "expression". neg. -->
<xsl:template mode="p:expression" match="neg">
<xsl:sequence select="'-'"/>
<xsl:sequence select="t:get-expression(*[1])"/>
</xsl:template>
<!-- Mode "expression". not. -->
<xsl:template
mode="p:expression" match="not">
<xsl:sequence select="'!'"/>
<xsl:sequence
select="t:get-expression(*[1])"/>
</xsl:template>
<!-- Mode "expression".
parens. -->
<xsl:template mode="p:expression" match="parens">
<xsl:sequence
select="'('"/>
<xsl:sequence select="t:get-expression(*[1])"/>
<xsl:sequence
select="')'"/>
</xsl:template>
<!-- Mode "expression". var. -->
<xsl:template
mode="p:expression" match="var">
<xsl:sequence select="@name"/>
</xsl:template>
<!-- Mode "expression". int, short, byte, long, float, double. -->
<xsl:template
mode="p:expression"
match="int | short | byte | long | float | double">
<xsl:sequence select="."/>
</xsl:template>
</xsl:stylesheet>
Hello again!
For the first part about jxom, please read the earlier post.
I'm back with jxom (Java xml object model). I've finally managed to create an xslt that generates java code from a jxom document.
You may ask why it took as long as a week to produce it.
There are two answers: 1. My poor talents. 2. I've virtually created two implementations.
My first approach was to generate java text directly from xml. I truly believed that this was the way. I screwed things up along that way: once you start to deal with indentation, formatting and reformatting of the text you're generating, you see that things are not that simple. Well, it was a naive approach.
I could have finished it, however at some point I realized that its complexity is not composed of the complexity of its parts, but grows more and more. This is not acceptable for such a simple task. The approach is bad. Period.
The alternative I've devised is simple and in fact more natural than the naive approach. It is a two stage generation: a) generate a sequence of tokens - serializer; b) generate and then print a sequence of lines - streamer.
Tokens (item()*) are either control words (xs:QName), or literals (xs:string).
I've defined the following control tokens:
| Token | Description |
| t:indent | indents following content. |
| t:unindent | unindents following content. |
| t:line-indent | resets indentation for one line. |
| t:new-line | new line token. |
| t:terminator | separates token sequences. |
| t:code | marks line as code (default line type). |
| t:doc | marks line as documentation comment. |
| t:begin-doc | marks line as begin of documentation comment. |
| t:end-doc | marks line as end of documentation comment. |
| t:comment | marks line as comment. |
Thus an input for the streamer looks like:
<xsl:sequence select="'public'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'class'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'A'"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="'{'"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="$t:indent"/>
<xsl:sequence select="'public'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'int'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'a'"/>
<xsl:sequence select="';'"/>
<xsl:sequence select="$t:unindent"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="'}'"/>
<xsl:sequence select="$t:new-line"/>
The streamer receives a sequence of tokens and transforms it into a sequence of lines.
One beautiful thing about tokens is that the streamer can easily perform line breaks in order to keep the page width; another convenient thing is that the code generating tokens does not have to track the indentation level, as it just uses the t:indent and t:unindent control tokens to increase and decrease the current indentation.
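To make the idea concrete, here is a minimal sketch of such a streamer written in Java rather than xslt. It is not the jxom implementation: the class StreamerSketch, the Control enum (covering only three of the control tokens) and the two-space indentation step are assumptions made for the example. It requires Java 11 or later because of String.repeat.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamerSketch
{
  /** A subset of the control tokens; names are illustrative. */
  public enum Control { INDENT, UNINDENT, NEW_LINE }

  /** Converts tokens (Control values or String literals) into lines. */
  public static List<String> toLines(List<Object> tokens)
  {
    List<String> lines = new ArrayList<>();
    StringBuilder line = new StringBuilder();
    int level = 0;

    for(Object token: tokens)
    {
      if (token == Control.INDENT)
      {
        level++;
      }
      else if (token == Control.UNINDENT)
      {
        level = Math.max(0, level - 1);
      }
      else if (token == Control.NEW_LINE)
      {
        lines.add(line.toString());
        line.setLength(0);
      }
      else
      {
        // The indentation in effect when a line starts applies to that line.
        if (line.length() == 0)
        {
          line.append("  ".repeat(level));
        }

        line.append(token);
      }
    }

    // Flush a trailing line that was not closed with NEW_LINE.
    if (line.length() > 0)
    {
      lines.add(line.toString());
    }

    return lines;
  }

  public static void main(String[] args)
  {
    // The same token sequence as in the xslt fragment for class A.
    List<Object> tokens = List.of(
      "public", " ", "class", " ", "A", Control.NEW_LINE,
      "{", Control.NEW_LINE,
      Control.INDENT,
      "public", " ", "int", " ", "a", ";",
      Control.UNINDENT, Control.NEW_LINE,
      "}", Control.NEW_LINE);

    toLines(tokens).forEach(System.out::println);
  }
}
```

The producer of tokens never computes an indentation string itself; it only brackets nested content with the indent and unindent tokens, which is the convenience described above.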
The way the code is built allows mimicking any code style. I've followed my favorite one. In the future I'll probably add options controlling code style. My todo list still has several features I want to implement, such as a line breaker to preserve page width, and a type qualification optimizer (optional feature) to reduce unnecessary type qualifications.
The current implementation can be found at jxom.zip. It contains:
| File | Description |
| java.xsd | jxom xml schema. |
| java-serializer-main.xslt | transformation entry point. |
| java-serializer.xslt | generates tokens for top level constructs. |
| java-serializer-statements.xslt | generates tokens for statements. |
| java-serializer-expressions.xslt | generates tokens for expressions. |
| java-streamer.xslt | converts tokens into lines. |
| DataAdapter.xml | sample jxom document. |
This was my first experience with xslt 2.0. I feel very pleased with what it can do. The only feature I miss is an indirect function call (which I do not want to model with the dull template matching approach).
Note that although the xslt I've built is platform independent, I was experimenting with saxon 9. Several times I've relied on its efficient tail call implementation (see t:cumulative-integer-sum); without it such code would lead to an xslt stack overflow.
I shall be pleased to see your feedback on the subject.
Hello,
I have not written for a long time. IMHO: nothing to say? - do not make noise!
Nowadays I'm busy with xslt.
Should I be pleased that the w3c committee has finally delivered xpath 2.0/xslt 2.0/xquery? There possibly were people who failed to wait until this happened, and who died. Let's be grateful to fate that we have survived!
I'm working now with saxon 9. It's a good implementation, however too interpreter-like in my opinion. I think these languages could be compiled down to machine/vm code the same way as c++/java/c# are.
To the point. I need to generate java code in xslt. I've done this earlier; that time I dealt with relatively simple templates like beans or interfaces. Now I need to generate beans, interfaces, and classes with logic. In fact I should cover almost all java 6 features.
I immediately started thinking in terms of a java xml object model (jxom). Thus there will be an xml schema of jxom (Am I reinventing the wheel? Please point me to an existing schema!) - java grammar as xml. There will be xslts that generate code according to this schema, and an xslt that will serialize jxom documents directly into java.
This two stage generation is important as there are essentially two different tasks: generating java code, and serializing it down to a text format. Moreover, whenever I have a jxom document I can manipulate it! And finally this will allow our team to concentrate its efforts, as one only has to generate a jxom document.
Yesterday I found a java ANTLR grammar and converted it into an xml schema: java.xsd. It is important to have this xml schema defined, even if no one uses it except in an editor, as it makes jxom generation more formal.
The next step is to create the xslt serializer, which is in the todo list.
To get a feel for how jxom looks I've created it manually for a simple java file:
// $Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $
package com.bphx.coolgen.data;
import java.util.List;
/**
 * Encapsulates encyclopedia database access.
 */
public interface DataAdapter
{
  /**
   * Starts data access session for a specified model.
   * @param modelId - a model to open.
   */
  void open(int modelId) throws Exception;
  /**
   * Ends data access session.
   */
  void close() throws Exception;
  /**
   * Gets current model id.
   * @return current model id.
   */
  int getModelId();
  /**
   * Gets data objects for a specified object type for the current model.
   * @param type - an object type to get data objects for.
   * @return list of data objects.
   */
  List<DataObject> getObjectsForType(short type) throws Exception;
  /**
   * Gets a list of data associations for an object id.
   * @param id - object id.
   * @return list of data associations.
   */
  List<DataAssociation> getAssociations(int id) throws Exception;
  /**
   * Gets a list of data properties for an object id.
   * @param id - object id.
   * @return list of data properties.
   */
  List<DataProperty> getProperties(int id) throws Exception;
}
jxom:
<unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.bphx.coolgen.data">
  <comment>$Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $</comment>
  <import package="java.util.List"/>
  <interface access="public" name="DataAdapter">
    <comment doc="true">Encapsulates encyclopedia database access.</comment>
    <method name="open">
      <comment doc="true">
        Starts data access session for a specified model.
        <para type="param" name="modelId">a model to open.</para>
      </comment>
      <parameters>
        <parameter name="modelId"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="close">
      <comment doc="true">Ends data access session.</comment>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getModelId">
      <comment doc="true">
        Gets current model id.
        <para type="return">current model id.</para>
      </comment>
      <returns><type name="int"/></returns>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getObjectsForType">
      <comment doc="true">
        Gets data objects for a specified object type for the current model.
        <para type="param" name="type">
          an object type to get data objects for.
        </para>
        <para type="return">list of data objects.</para>
      </comment>
      <returns>
        <type>
          <part name="List">
            <typeArgument><type name="DataObject"/></typeArgument>
          </part>
        </type>
      </returns>
      <parameters>
        <parameter name="type"><type name="short"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getAssociations">
      <comment doc="true">
        Gets a list of data associations for an object id.
        <para type="param" name="id">object id.</para>
        <para type="return">list of data associations.</para>
      </comment>
      <returns>
        <type>
          <part name="List">
            <typeArgument><type name="DataAssociation"/></typeArgument>
          </part>
        </type>
      </returns>
      <parameters>
        <parameter name="id"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getProperties">
      <comment doc="true">
        Gets a list of data properties for an object id.
        <para type="param" name="id">object id.</para>
        <para type="return">list of data properties.</para>
      </comment>
      <returns>
        <!-- Compact form of generic type. -->
        <type name="List&lt;DataProperty>"/>
      </returns>
      <parameters>
        <parameter name="id"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
  </interface>
</unit>
To read about xslt for jxom please follow this link.
C++ Standard Library Issues List, Issue 254
I have been tracking this issue for several years, and have my own unpretentious opinion. To make my arguments clear I'll quote the issue description here.
254. Exception types in clause 19 are constructed from std::string
Section: 19.1 [std.exceptions], 27.4.2.1.1 [ios::failure] Status: Tentatively Ready Submitter: Dave Abrahams Date: 2000-08-01
Discussion:
Many of the standard exception types which implementations are required to throw are constructed with a const std::string& parameter. For example: 19.1.5 Class out_of_range [lib.out.of.range]
namespace std {
class out_of_range : public logic_error {
public:
explicit out_of_range(const string& what_arg);
};
}
1 The class out_of_range defines the type of objects thrown as exceptions
to report an argument value not in its expected range.
out_of_range(const string& what_arg);
Effects:
Constructs an object of class out_of_range.
Postcondition:
strcmp(what(), what_arg.c_str()) == 0.
There are at least two problems with this:
1. A program which is low on memory may end up throwing std::bad_alloc instead of out_of_range because memory runs out while constructing the exception object.
2. An obvious implementation which stores a std::string data member may end up invoking terminate() during exception unwinding because the exception object allocates memory (or rather fails to) as it is being copied.
There may be no cure for (1) other than changing the interface to out_of_range, though one could reasonably argue that (1) is not a defect. Personally I don't care that much if out-of-memory is reported when I only have 20 bytes left, in the case when out_of_range would have been reported. People who use exception-specifications might care a lot, though.
There is a cure for (2), but it isn't completely obvious. I think a note for implementors should be made in the standard. Avoiding possible termination in this case shouldn't be left up to chance. The cure is to use a reference-counted "string" implementation in the exception object. I am not necessarily referring to a std::string here; any simple reference-counting scheme for a NTBS would do.
Further discussion, in email:
...I'm not so concerned about (1). After all, a library implementation can add const char* constructors as an extension, and users don't need to avail themselves of the standard exceptions, though this is a lame position to be forced into. FWIW, std::exception and std::bad_alloc don't require a temporary basic_string.
...I don't think the fixed-size buffer is a solution to the problem, strictly speaking, because you can't satisfy the postcondition strcmp(what(), what_arg.c_str()) == 0 For all values of what_arg (i.e. very long values). That means that the only truly conforming solution requires a dynamic allocation.
Further discussion, from Redmond:
The most important progress we made at the Redmond meeting was realizing that there are two separable issues here: the const string& constructor, and the copy constructor. If a user writes something like throw std::out_of_range("foo"), the const string& constructor is invoked before anything gets thrown. The copy constructor is potentially invoked during stack unwinding.
The copy constructor is a more serious problem, because failure during stack unwinding invokes terminate. The copy constructor must be nothrow. Curaçao: Howard thinks this requirement may already be present.
The fundamental problem is that it's difficult to get the nothrow requirement to work well with the requirement that the exception objects store a string of unbounded size, particularly if you also try to make the const string& constructor nothrow. Options discussed include:
- Limit the size of a string that exception objects are required to throw: change the postconditions of 19.1.2 [domain.error] paragraph 3 and 19.1.6 [runtime.error] paragraph 3 to something like this: "strncmp(what(), what_arg.c_str(), N) == 0, where N is an implementation defined constant no smaller than 256".
- Allow the const string& constructor to throw, but not the copy constructor. It's the implementor's responsibility to get it right. (An implementor might use a simple refcount class.)
- Compromise between the two: an implementation is not allowed to throw if the string's length is less than some N, but, if it doesn't throw, the string must compare equal to the argument.
- Add a new constructor that takes a const char*
(Not all of these options are mutually exclusive.)
...
To be honest, I do not understand their (committee members') decisions. It seems they are trying to conceal themselves from the problem virtually proposing to store character buffer in the exception object. In fact the problem is more general, and is related to any exception types that store some data, and which can throw during copy construction. How to avoid problems during copy construction? Well, do not perform activity that can lead to an exception. If copying data can throw, then do not copy it! Thus we have to share data between exception objects.
This logic brought me to a safe exception type design. E.g. exception object should keep refcounted handle to a data object that is shared between type instances.
The only question is: why didn't they even consider this approach?
In one of our latest projects (GUI on .NET 2.0) we've felt all the power of .NET globalization, but an annoying thing happened too...
In our case the annoying thing was the sharing of UI culture info between the main (UI) thread and all auxiliary threads (threads from ThreadPool, manually created threads etc.). It seems we've fallen into a .NET globalization pitfall.
We assumed that at least all asynchronous delegate calls use the same UI culture info as the main thread. This is a common mistake and, what's more annoying, there is not a single line in the MSDN documentation about this issue.
Let's look closer at this issue. Our application starts on a computer with English regional settings ("en-En"), and during startup we change the UI culture info to the one specified in the configuration file:
// set the culture from the config file
try
{
  Thread.CurrentThread.CurrentUICulture =
    new CultureInfo(Settings.Default.CultureName);
}
catch
{
  // use the default UI culture info
}
Thus, all the screens of this GUI application are displayed according to the specified culture. There are also localized strings stored in resource files that are used for log messages, exception messages etc., and these can be displayed from within different threads (e.g. asynchronous delegate calls).
So, while the application is running and all screens are displayed according to the specified culture, all the exceptions from auxiliary threads are still in English. This happens because the threads for asynchronous calls are taken from ThreadPool, and all these threads were created using the default culture.
Conclusion: take care of CurrentUICulture in different threads yourself, and be careful - there are still pitfalls along the way...
Return a table of numbers from 0 up to some value. I face this recurring task once every several years. Such periodicity prompts me to invent a solution once again, but using contemporary features.
November 18:
This time I have managed to solve the task in a single select:
declare @count int;
set @count = 1000;
with numbers(value) as
(
  select 0
  union all
  select value * 2 + 1 from numbers where value < @count / 2
  union all
  select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
  row_number() over(order by U.V) value
from
  numbers
  cross apply
  (select 1 V) U;
Do you have a better solution?
We're building a .NET 2.0 GUI application. Part of the project is localization. Following the advice of MSDN, we have created *.resx files and sent them to a foreign team that performs the localization using the WinRes tool.
Several of our user controls contained a SplitContainer control. We never thought this could present a problem. Unfortunately, it does!
When you try to open the resx of such a user control, you get:
Error - Failed to load the resource due to the following error: System.MissingMethodException: Constructor on type 'System.Windows.Forms.SplitterPanel' not found.
We started digging into WinRes.exe (thanks to .NET Reflector) and found the solution: we had to name the split container so that its parent's name comes before (in ascending sort order) the name of the split container itself.
Say you have a form "MyForm" and a split container "ASplitContainer"; then you should rename the split container to, say, "_ASplitContainer". In this case the resources are stored as:
| Name | Parent Name |
| MyForm | |
| _ASplitContainer | MyForm |
| _ASplitContainer.Panel1 | _ASplitContainer |
| _ASplitContainer.Panel2 | _ASplitContainer |
This makes WinRes happy. 
Today we spent some time looking for samples of web-services in RPC/encoded style, and we found a great site: http://www.xmethods.com/. This site contains a lot of web-service samples in Document/literal and RPC/encoded styles. We think this link will be useful for both developers and testers.
Yesterday we ran into the following problem: how to retrieve the session object from within a Java web-service? The crucial point of the problem was that we generate our web-service automatically from a Java bean, and this web-service runs under WebSphere v5.1.1.
After spending some time looking for an acceptable solution, we found that it's possible either to implement “session substitution” using an EJB SessionBean or somehow to retrieve the HttpSession instance.
The first approach has a lot of advantages over the second one, but it requires implementing a bunch of EJB objects (the session bean itself, the home object etc.). The second approach solves our problem only for web-services over HTTP, and no more, but... it requires changing only a few lines in the Java bean code. This second approach is based on implementing the javax.xml.rpc.server.ServiceLifecycle interface in our Java bean. For details take a look at the following article: “Web services programming tips and tricks: Build stateful sessions in JAX-RPC applications”.
Actually, only two additional methods, init() and destroy(), were implemented. The init() method retrieves (during initialization) a ServletEndpointContext instance, which is stored in a private field of the bean. Later, ServletEndpointContext.getHttpSession() is called in order to get the HttpSession. So easy, so quick - we were simply pleased.