Friday, 22 October 2010 - Nesterovsky bros

In the previous post we announced an API to parse COBOL source into cobolxom.

We used the incremental parser to build a grammar xml tree, and planned to create an xslt transformation to generate cobolxom from it.

Now we can announce that this xslt is ready.

At present all standard COBOL constructs are supported, but more tests are required. Preprocessor support is still on the todo list.

You may peek at an example of COBOL:

Cobol grammar:

And cobolxom:

While building the grammar-to-cobolxom stylesheet, we asked ourselves whether COBOL parsing could be done entirely in xslt. The answer is yes, so who knows, we might yet turn this task into a pure xslt one. :-)

Friday, 22 October 2010 13:24:31 UTC

Comments [0] -
Announce | Incremental Parser | Thinking aloud | xslt

What's evaluated first: @select or xsl:with-param?

Recently we've seen code like this:

<xsl:variable name="a" as="element()?" select="..."/>
<xsl:variable name="b" as="element()?" select="..."/>
<xsl:apply-templates select="$a">
  <xsl:with-param name="b" tunnel="yes" as="element()" select="$b"/>
</xsl:apply-templates>

It fails with an error: "An empty sequence is not allowed as the value of parameter $b".

What is interesting is that the value of $a is an empty sequence, so the code could potentially work, provided the processor evaluated $a first and decided not to evaluate xsl:with-param at all.

Is the order of evaluation of @select and xsl:with-param specified by the standard, or is it implementation-defined?

We asked this question on the xslt forum, and got the following answer:

The specification leaves this implementation-defined. Since the values of the parameters are the same for every node processed, it's a reasonably strategy for the processor to evaluate the parameters before knowing how many selected nodes there are, though I guess an even better strategy would be to do it lazily when the first selected node is found.
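The lazy strategy mentioned in the answer can be illustrated outside of xslt. The following is only a sketch in Java and says nothing about how Saxon or any other processor is implemented; the names LazyParams, applyTemplates and failingParam are hypothetical. The parameter is passed as a Supplier and evaluated at most once, when the first selected node is found, so an empty selection never triggers the failing evaluation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class LazyParams {
    /**
     * "Applies templates" to the selected nodes. The parameter is evaluated
     * lazily: at most once, and only when the first node is found.
     */
    static List<String> applyTemplates(List<String> selected, Supplier<String> param) {
        List<String> results = new ArrayList<>();
        String value = null;
        boolean evaluated = false;

        for (String node : selected) {
            if (!evaluated) {
                value = param.get(); // may throw, like a failed "as" check
                evaluated = true;
            }
            // Stand-in for instantiating a template with the parameter.
            results.add(node + ":" + value);
        }
        return results;
    }

    /** Stands in for a parameter whose required type rejects an empty sequence. */
    static String failingParam() {
        throw new IllegalStateException(
            "An empty sequence is not allowed as the value of parameter $b");
    }
}
```

With an empty selection the supplier is never called; an eager implementation would have failed before looking at the selection.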

Well, that's the expected answer. This question will probably induce Michael Kay to introduce a small optimization into Saxon.

Monday, 18 October 2010 17:58:51 UTC

Comments [0] -
Tips and tricks | xslt

Parse COBOL into cobolxom

Some time ago we created an incremental parser, and now that we have decided to load COBOL sources directly into cobolxom (XML Object Model for COBOL), the parser did the job perfectly.

The good thing about the incremental parser is that it easily handles COBOL's grammar.

The whole process looks like this:

the incremental parser, given a COBOL grammar, builds a grammar tree;
we stream this tree into xml;
an xslt transforms the xml from the previous step into cobolxom (TODO).

This is an example of COBOL:

IDENTIFICATION DIVISION.
PROGRAM-ID. FACTORIAL RECURSIVE.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 NUMB PIC 9(4) VALUE IS 5.
01 FACT PIC 9(8) VALUE IS 0.
LOCAL-STORAGE SECTION.
01 NUM PIC 9(4).
PROCEDURE DIVISION.
  MOVE 'X' TO XXX
  MOVE NUMB TO NUM
  IF NUMB = 0 THEN
    MOVE 1 TO FACT
  ELSE
    SUBTRACT 1 FROM NUMB
    CALL 'FACTORIAL'
    MULTIPLY NUM BY FACT
  END-IF
  DISPLAY NUM '! = ' FACT
  GOBACK.
END PROGRAM FACTORIAL.

And a grammar tree:

The last step, transforming the tree into cobolxom, is on the TODO list.

We have committed the COBOL grammar to the same place at SourceForge as the XQuery grammar. The solution is now a VS 2010 one.

Saturday, 09 October 2010 08:26:23 UTC

Comments [0] -
Announce | Incremental Parser | xslt

Getting regex captures in xslt

Suppose you have a timestamp string, and want to check whether it matches one of the following formats, allowing leading and trailing spaces:

YYYY-MM-DD-HH.MM.SS.NNNNNN
YYYY-MM-DD-HH.MM.SS
YYYY-MM-DD

We decided to use a regex and its capture groups to extract the timestamp parts. This left us with only one option: the xsl:analyze-string instruction. It took a couple more minutes to reach the final solution:

<xsl:variable name="parts" as="xs:string*">
  <xsl:analyze-string select="$value"
    regex="
      ^\s*(\d\d\d\d)-(\d\d)-(\d\d)
      (-(\d\d)\.(\d\d)\.(\d\d)(\.(\d\d\d\d\d\d))?)?\s*$"
    flags="x">
    <xsl:matching-substring>
      <xsl:sequence select="regex-group(1)"/>
      <xsl:sequence select="regex-group(2)"/>
      <xsl:sequence select="regex-group(3)"/>
      <xsl:sequence select="regex-group(5)"/>
      <xsl:sequence select="regex-group(6)"/>
      <xsl:sequence select="regex-group(7)"/>
      <xsl:sequence select="regex-group(9)"/>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:variable>
<xsl:choose>
  <xsl:when test="exists($parts)">
    ...
  </xsl:when>
  <xsl:otherwise>
    ...
  </xsl:otherwise>
</xsl:choose>
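For comparison, here is the same expression with the same group numbering in Java. This is only a sketch: the class TimestampParts and the method parse are hypothetical names, and Pattern.COMMENTS is used as a rough counterpart of flags="x". Groups 4 and 8 merely wrap the optional time and fraction, which is why they are skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimestampParts {
    // Same expression as in the stylesheet; whitespace inside the pattern
    // is ignored in COMMENTS mode.
    private static final Pattern TIMESTAMP = Pattern.compile(
        "^\\s*(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)\n" +
        "(-(\\d\\d)\\.(\\d\\d)\\.(\\d\\d)(\\.(\\d\\d\\d\\d\\d\\d))?)?\\s*$",
        Pattern.COMMENTS);

    // Groups that hold timestamp parts.
    private static final int[] GROUPS = { 1, 2, 3, 5, 6, 7, 9 };

    /** Returns the seven parts, or null when the value does not match. */
    static String[] parse(String value) {
        Matcher matcher = TIMESTAMP.matcher(value);
        if (!matcher.matches()) {
            return null;
        }
        String[] parts = new String[GROUPS.length];
        for (int i = 0; i < GROUPS.length; i++) {
            String part = matcher.group(GROUPS[i]);
            // Java reports a group that did not participate as null;
            // regex-group() in xslt returns a zero-length string instead.
            parts[i] = part == null ? "" : part;
        }
        return parts;
    }
}
```

A null result here plays the role of the empty $parts sequence tested with exists($parts) in the stylesheet.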

How would you solve the problem? Is it the best solution?

Friday, 08 October 2010 17:37:44 UTC

Comments [0] -
Tips and tricks | xslt

BigDecimal + JAXB => potential interoperability problems

One of our latest tasks was the conversion of data received from a mainframe as an EBCDIC flat file into an XML file in UTF-8 encoding for further processing.

The solution was rather straightforward:

read the source flat file, record-by-record;
serialize each record as an element into target XML file using JAXB.

For reading data from the EBCDIC-encoded flat file, a good old tool named eXperanto was used. It allows defining C# and/or Java classes that match the records in the source flat file. Thus we were able to read and convert records from EBCDIC to UTF-8.

The next sub-task was to serialize a Java bean to an XML element. A JAXB marshaller was used for this.

Everything was ok until we started to test the implementation on real data.

We realized that some decimal values (BigDecimal fields in Java classes) were serialized in scientific notation. For example, 0.000000365 was serialized as 3.65E-7, and so on.

On the other hand, the target XML was consumed by another (non-Java) application, which expected to receive decimal data as defined in the XSD schema (the field types were specified as xs:decimal).

According to the W3C datatypes specification:

"...decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, 100000.00, 210..."

So the result was predictable: the consumer application failed.

A Google search reveals that we are dealing with a well-known bug: "JAXB marshaller returns BigDecimal with scientific notation in JDK 6". It has remained open for a year and a half, since May 2009, marked as "Fix in progress". We've tested our application with Java version 1.6.0_21-b07, JAXB 2.1.

Although this is a rather critical bug that may affect the interoperability of Java applications (e.g. Java web services), its priority was set to just "4-Low".

P.S. As a temporary workaround, for this case only(!), we've replaced xs:decimal with xs:double in the XSD schema for the target application.
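The difference comes down to two BigDecimal methods: toString() switches to scientific notation for small values, while toPlainString() never uses an exponent. A minimal sketch follows; the class DecimalLexical and the method toXsDecimal are hypothetical names, and plugging such a conversion into JAXB (for example through an XmlAdapter) is an assumption we have not tested here.

```java
import java.math.BigDecimal;

final class DecimalLexical {
    /** Formats a value without an exponent, as xs:decimal requires. */
    static String toXsDecimal(BigDecimal value) {
        return value.toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("0.000000365");

        System.out.println(value.toString());   // 3.65E-7
        System.out.println(toXsDecimal(value)); // 0.000000365
    }
}
```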

Sunday, 05 September 2010 12:58:23 UTC

Comments [0] -
Java | Tips and tricks

String and StringBuilder in .NET 4

By accident we have found that the implementations of String and StringBuilder have been considerably revised, while the public interface has remained the same.

public sealed class String
{
  private int m_arrayLength;
  private int m_stringLength;
  private char m_firstChar;
}

This layout dates back to .NET 1.0.

The VM, in fact, allocates more memory than is defined in the C# class, as &m_firstChar refers to an inline char buffer.

This way a string's buffer length and its length were two different values. StringBuilder exploited this fact: it stored its content in a private string, which it modified in place.

In .NET 4, string is different:

public sealed class String
{
  private int m_stringLength;
  private char m_firstChar;
}

The memory footprint of such a structure is smaller, but the string's length must always be the same as the length of its buffer. In fact the layout of string is now the same as the layout of char[].

This modification led to a redesign of the StringBuilder implementation.

Earlier, StringBuilder looked like the following:

public sealed class StringBuilder
{
  internal IntPtr m_currentThread;
  internal int m_MaxCapacity;
  internal volatile string m_StringValue;
}

Notice that m_StringValue is used as storage, and m_currentThread is used to preserve thread affinity of the internal string value.

Now the guys at Microsoft have decided to implement StringBuilder very differently:

public sealed class StringBuilder
{
  internal int m_MaxCapacity;
  internal int m_ChunkLength;
  internal int m_ChunkOffset;
  internal char[] m_ChunkChars;
  internal StringBuilder m_ChunkPrevious;
}

Inspection of this layout immediately reveals the implementation technique. It's a list of chunks. The instance itself holds the last (most recently appended) chunk, and links to the previous chunks through m_ChunkPrevious.

Characteristics of this design are:

while Length is small, performance is almost the same as it was earlier;
there are no more thread affinity checks;
Append() and ToString() work as fast as in the old version;
Insert() in the middle works faster, as only one chunk has to be split and possibly reallocated (copied), instead of the whole string;
random access is fast at the end, O(1), and slows down as you approach the start, O(chunk-count).
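The random access behaviour is easy to see in a toy model of such a chunk list. The following Java sketch is not the .NET source: ChunkedBuilder and the tiny CHUNK_SIZE are assumptions made for illustration, and only the field roles mirror the layout above. Note how charAt() walks back through the previous chunks until it reaches the one that covers the index.

```java
/** Toy chunk-list builder; the instance itself is the last chunk. */
final class ChunkedBuilder {
    private static final int CHUNK_SIZE = 4; // tiny, to force several chunks

    private char[] chunkChars = new char[CHUNK_SIZE]; // storage of this chunk
    private int chunkLength;                          // chars used in this chunk
    private int chunkOffset;                          // chars in all previous chunks
    private ChunkedBuilder previous;                  // previous chunk, if any

    ChunkedBuilder append(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (chunkLength == chunkChars.length) {
                // Move the full chunk into a new node linked as previous,
                // and start a fresh chunk in this instance.
                ChunkedBuilder full = new ChunkedBuilder();
                full.chunkChars = chunkChars;
                full.chunkLength = chunkLength;
                full.chunkOffset = chunkOffset;
                full.previous = previous;
                previous = full;
                chunkOffset += chunkLength;
                chunkChars = new char[CHUNK_SIZE];
                chunkLength = 0;
            }
            chunkChars[chunkLength++] = s.charAt(i);
        }
        return this;
    }

    int length() {
        return chunkOffset + chunkLength;
    }

    /** O(1) near the end, O(chunk-count) near the start. */
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        ChunkedBuilder chunk = this;
        while (index < chunk.chunkOffset) {
            chunk = chunk.previous;
        }
        return chunk.chunkChars[index - chunk.chunkOffset];
    }

    @Override
    public String toString() {
        char[] result = new char[length()];
        for (ChunkedBuilder c = this; c != null; c = c.previous) {
            System.arraycopy(c.chunkChars, 0, result, c.chunkOffset, c.chunkLength);
        }
        return new String(result);
    }
}
```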

Personally, we would select a slightly different design:

public sealed class StringBuilder
{
  private struct Chunk
  {
    public int length; // Chunk length.
    public int offset; // Chunk offset.
    public char[] buffer;
  }
  private int m_MaxCapacity;
  // Alternatively, one can use
  // private List<Chunk> chunks;
  private int chunkCount; // Number of used chunks.
  private Chunk[] chunks; // Array of chunks except last.
  private Chunk last; // Last chunk.
  private bool nonHomogenous; // false if all chunks are of the same size.
}

This design has a better memory footprint, and its random access time is O(1) when there were no inserts in the middle (nonHomogenous = false), and O(log(chunkCount)) after such inserts. All other characteristics are the same.

Wednesday, 25 August 2010 09:36:55 UTC

Comments [0] -
Thinking aloud | Tips and tricks

Visual Studio 2010

Earlier, there was hype about how good VS 2010 is.

When we tried the beta and found that it was noticeably slower than VS 2008, we assumed that the release would do better.

Unfortunately, that was an optimistic assumption.

Comparing VS 2008 and VS 2010, we can confirm that the latter:

eats more memory;
is slower with C# projects (it often hangs for long periods and even crashes);
is unable to work with xslt 2.0 files;
has removed the Shift+Enter keystroke to insert <br/> in the html editor (why?);
has removed the StringBuilder visualizer (in the debugger).

Is our hardware too outdated (Lenovo T60 laptops, 2GHz Core Duo, 2GB RAM)? Or is there another reason?

Wednesday, 25 August 2010 06:52:39 UTC

Comments [0] -

C# XOM Update

We have updated C# XOM (csharpxom) to support C# 4.0 (in fact there are very few changes).

From the grammar perspective this includes:

Dynamic types;
Named and optional arguments;
Covariance and contravariance of generic parameters for interfaces and delegates.

Dynamic type, C#:

dynamic dyn = 1;

C# XOM:

<var name="dyn">
  <type name="dynamic"/>
  <initialize>
    <int value="1"/>
  </initialize>
</var>

Named and Optional Arguments, C#:

int Increment(int value, int increment = 1)
{
  return value + increment;
}
void Test()
{
  // Regular call.
  Increment(7, 1);
  // Call with named parameter.
  Increment(value: 7, increment: 1);
  // Call with default.
  Increment(7);
}

C# XOM:

<method name="Increment">
  <returns>
    <type name="int"/>
  </returns>
  <parameters>
    <parameter name="value">
      <type name="int"/>
    </parameter>
    <parameter name="increment">
      <type name="int"/>
      <initialize>
        <int value="1"/>
      </initialize>
    </parameter>
  </parameters>
  <block>
    <return>
      <add>
        <var-ref name="value"/>
        <var-ref name="increment"/>
      </add>
    </return>
  </block>
</method>
<method name="Test">
  <block>
    <expression>
      <comment>Regular call.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <int value="7"/>
          <int value="1"/>
        </arguments>
      </invoke>
    </expression>
    <expression>
      <comment>Call with named parameter.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <argument name="value">
            <int value="7"/>
          </argument>
          <argument name="increment">
            <int value="1"/>
          </argument>
        </arguments>
      </invoke>
    </expression>
    <expression>
      <comment>Call with default.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <int value="7"/>
        </arguments>
      </invoke>
    </expression>
  </block>
</method>

Covariance and contravariance, C#:

public interface Variance<in T, out P, Q>
{
  P X(T t);
}

C# XOM:

<interface access="public" name="Variance">
  <type-parameters>
    <type-parameter name="T" variance="in"/>
    <type-parameter name="P" variance="out"/>
    <type-parameter name="Q"/>
  </type-parameters>
  <method name="X">
    <returns>
      <type name="P"/>
    </returns>
    <parameters>
      <parameter name="t">
        <type name="T"/>
      </parameter>
    </parameters>
  </method>
</interface>

Other cosmetic fixes were also introduced into Java XOM (jxom), COBOL XOM (cobolxom), and into sql XOM (sqlxom).

The new version can be found at languages-xom.zip.

See also Xslt Heisenbug

Thursday, 15 July 2010 08:22:13 UTC

Comments [0] -
Thinking aloud | Tips and tricks | xslt

ASP.NET two way databinding and OnClientInit in DataBindExtender

It does not matter that DataBindExtender looks unusual in ASP.NET. It turns out to be so handy that we no longer consider built-in data binding an option.

After a short try, you understand that people tried very hard and invented many controls and methods like ObjectDataSource, FormView, Eval(), and Bind(), with an outcome that is very specific and limited.

In contrast, DataBindExtender:

performs two-way or one-way data binding of any business data property to any control property;
converts a value before it is passed to the control, or into the business data;
validates the value.

See an example:

<asp:TextBox id=Field8 EnableViewState="false" runat="server"></asp:TextBox>
<bphx:DataBindExtender runat='server'
  EnableViewState='false'
  TargetControlID='Field8'
  ControlProperty='Text'
  DataSource='<%# Import.ClearingMemberFirm %>'
  DataMember='Id'
  Converter='<%# Converters.AsString("XXXXX", false) %>'
  Validator='<%# (extender, value) => Functions.CheckID(value as string) %>'/>

Here, besides regular two-way data binding of the property Import.ClearingMemberFirm.Id to the property Field8.Text, we format (and parse) the value with Converters.AsString("XXXXX", false), and finally validate the input value with a lambda function (extender, value) => Functions.CheckID(value as string).

DataBindExtender also works well in template controls like asp:Repeater, asp:GridView, and so on. Since your business data is available, you may reduce the size of the ViewState with EnableViewState='false'. This way DataBindExtender brings page development closer to the MVC pattern.

Recently, we have found that it's also useful to have a way to run javascript during the page load (e.g. to attach a client-side event handler, or to register a component). DataBindExtender provides this with the OnClientInit property, which is javascript to run on the client, where this refers to a DOM element:

... OnClientInit='$addHandler(this, "change", function() { handleEvent(event, "Field8"); } );'/>

This allows us to attach an onchange javascript event handler to the asp:TextBox.

So, for the time being, we're very satisfied with what we can achieve with DataBindExtender. It's more than JSF allows, and much stronger and neater than what ASP.NET provides.

The sources can be found at DataBindExtender.cs

Sunday, 11 July 2010 07:07:03 UTC

Comments [4] -
ASP.NET | Thinking aloud | Tips and tricks

Praise to C#'s var

Lately, we have found that we've become accustomed to declaring C# local variables using var:

var exitStateName = exitState == null ? "" : exitState.Name;
var rules = Environment.NavigationRules;
var rule = rules[caller.Name];
var flow = rule.NavigationCases[procedure.OriginExitState];

This makes code cleaner, and in the presence of a good IDE still lets you figure out types very easily.

We, however, found that there tend to be exceptions in our use of var. E.g. for some reason most boolean locals in our code tend to remain explicit (a matter of taste?):

bool succeed = false;
try
{
  ...
  succeed = true;
}
finally
{
  if (!succeed)
  {
    ...
  }
}

Also, the explicit type often survives in for, but not in foreach:

for(int i = 0; i < sourceDataMapping.Length; ++i)
{
  ...
}
foreach(var property in properties)
{
  ...
}

In addition, var has some limitations, as one cannot easily initialize such a local with null. Of the following, we prefer the first approach:

IWindowContext context = null;
var context = (IWindowContext)null;
var context = null as IWindowContext;
var context = default(IWindowContext);

We might need to settle on a consistent code style for var. It might be like this:

Numeric, boolean and string locals should use an explicit type;
Try to avoid locals initialized with null, or without an initializer, or use an explicit type if such a variable cannot be avoided;
Use var in all other cases.

Another code style could be like this:

For consistency, completely avoid the var keyword.

Monday, 05 July 2010 09:09:26 UTC

Comments [0] -
Thinking aloud | Tips and tricks

Xslt serializer for ASPX output

Recently we raised a question about serialization of ASPX output in xslt.

The question went like this:

What's the recommended way of ASPX page generation?
E.g.:

------------------------
<%@ Page AutoEventWireup="true"
   CodeBehind="CurMainMenuP.aspx.cs"
   EnableSessionState="True"
   Inherits="Currency.CurMainMenuP"
   Language="C#"
   MaintainScrollPositionOnPostback="True"
   MasterPageFile="Screen.Master" %>

<asp:Content ID="Content1" runat="server" ContentPlaceHolderID="Title">CUR_MAIN_MENU_P</asp:Content>

<asp:Content ID="Content2" runat="server" ContentPlaceHolderID="Content">
<span id="id1222146581" runat="server"
    class="inputField system UpperCase" enableviewstate="false">
    <%# Dialog.Global.TranCode %>
</span>
...
------------------------

Notice the aspx page directives, data binding expressions, and prefixed tag names without namespace declarations.

There was a whole range of expected answers. We, however, were looking for somebody who had already dealt with the task and had a ready solution at hand.

In general it seems that the xslt community is very angry about ASPX: both the format and the technology. Well, let's put this aside.

The task of producing ASPX, which is almost xml, is not solvable while you stay with a pure xml serializer. Xslt's xsl:character-map does not help at all. In fact it looks like a childish attempt to address the problem, as it does not support character escapes but only grabs characters and substitutes them with strings.

We have decided to create an ASPX serializer API that produces the required output text. This way you use <xsl:output method="text"/> to generate ASPX pages.

With this goal in mind we have defined a little xml schema to describe ASPX irregularities in xml form. These are:

<xs:element name="declared-prefix"> - to describe known prefixes, which should not be declared;
<xs:element name="directive"> - to describe directives like <%@ Page %>;
<xs:element name="content"> - a transparent content wrapper;
<xs:element name="entity"> - to issue xml entity;
<xs:element name="expression"> - to describe aspx expression like <%# Eval("A") %>;
<xs:element name="attribute"> - to describe an attribute of the parent element.

This approach greatly simplified the ASPX generation process for us.

The API includes:

aspx.xsd - an xml schema for the ASPX elements;
aspx-serializer.xslt - a serializer API;
aspx-test.xslt - a test stylesheet (any xml input is used).

Tuesday, 22 June 2010 10:25:41 UTC

Comments [0] -
Announce | ASP.NET | Thinking aloud | Tips and tricks | xslt

Validators in ASP.NET

In previous posts we were crying about problems with JSF to ASP.NET migration. Let's point to another one.

Consider that you have an input field, whose value should be validated:

Here we have an input control whose value is bound to the Import.AaControlAttributes.UserEnteredTrancode property. What is missing is value validation. Somewhere we have a function that can tell whether the value is valid. It should be called like this: Functions.IsTransactionCodeValid(value).

Staying within standard components we can use a custom validator on the page:

<asp:CustomValidator runat="server"
  ControlToValidate="id1222146409"
  OnServerValidate="ValidateTransaction"
  ErrorMessage="Invalid transaction code."/>

and add the following code-behind:

protected void ValidateTransaction(object source, ServerValidateEventArgs args)
{
  args.IsValid = Functions.IsTransactionCodeValid(args.Value);
}

This approach works; however, it pollutes the code-behind with many very similar methods. The problem is that in most cases the validation rules are not a property of the page but of the data model. That's why page validation methods just forward the check elsewhere.

While thinking about how to simplify the code, we came up with a shorter and more concise way to express validators, namely lambda functions. To that end we have introduced a Validator property of type ValueValidator on DataBindExtender, where:

/// <summary>A delegate to validate values.</summary>
/// <param name="extender">An extender instance.</param>
/// <param name="value">A value to validate.</param>
/// <returns>true for valid value, and false otherwise.</returns>
public delegate bool ValueValidator(DataBindExtender extender, object value);
/// <summary>An optional data member validator.</summary>
public virtual ValueValidator Validator { get; set; }

With this new property the page markup looks like this:

<input type="text" runat="server" ID="id1222146409" maxlength="4"/>
<bphx:DataBindExtender runat="server"
  TargetControlID="id1222146409"
  ControlProperty="Value"
  DataSource="<%# Import.AaControlAttributes %>"
  DataMember="UserEnteredTrancode"
  Validator='<%# (extender, value) => Functions.IsTransactionCodeValid(value as string) %>'
  ErrorMessage="Invalid transaction code."/>

This is almost like an event handler, however it allowed us to call data model validation logic without unnecessary code-behind.

The updated DataBindExtender can be found at DataBindExtender.cs.

Tuesday, 15 June 2010 06:36:44 UTC

Comments [0] -
ASP.NET | Thinking aloud | Tips and tricks

Data binding script injection in ASP.NET

Being well behind the latest news and traps of ASP.NET, we readily fall into each problem. :-)

This time it's a script injection during data binding.

In JSF there is a component to output data called h:outputText. Its use is like this:

<span jsfc="h:outputText" value="#{myBean.myProperty}"/>

The output is a span element with the data-bound value embedded into its content. The natural alternative in ASP.NET seems to be an asp:Label control:

<asp:Label runat="server" Text='<%# Eval("MyProperty") %>'/>

This almost works, except that h:outputText escapes data (you may override this by specifying the attribute escape="false"), while asp:Label never escapes data.

This looks like a very serious omission in ASP.NET (in fact very close to a security hole). What are the chances that, when creating a new page that uses data binding, you will remember to fix the code the wizard created for you and change it to:

<asp:Label runat="server" Text='<%# Server.HtmlEncode(Convert.ToString(Eval("MyProperty"))) %>'/>

Eh? Think what happens if MyProperty returns text that looks like a script (e.g.: <script>alert(1)</script>), while you just wanted to output a label.

To address the issue we've also introduced an Escape property into DataBindExtender. So at present we have code like this:

<asp:Label runat="server" ID="MyLabel"/>
<bphx:DataBindExtender runat="server"
  TargetControlID="MyLabel"
  ControlProperty="Text"
  ReadOnly="true"
  Escape="true"
  DataSource="<%# MyBean %>"
  DataMember="MyProperty"/>
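What escaping amounts to can be sketched in a few lines. The sketch is in Java (the JSF side of the migration) and is only an illustration: HtmlEscape and escape are hypothetical names, not part of DataBindExtender, JSF or ASP.NET, and the set of replaced characters is a common minimal choice rather than what any of those frameworks does exactly.

```java
final class HtmlEscape {
    /** Replaces characters that are significant in html with entities. */
    static String escape(String text) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': result.append("&amp;"); break;
                case '<': result.append("&lt;"); break;
                case '>': result.append("&gt;"); break;
                case '"': result.append("&quot;"); break;
                case '\'': result.append("&#39;"); break;
                default: result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The script is rendered as text instead of being executed.
        System.out.println(escape("<script>alert(1)</script>"));
        // prints: &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```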

Thursday, 10 June 2010 13:06:19 UTC

Comments [0] -
ASP.NET | Thinking aloud | Tips and tricks

A DataBindExtender: two way databinding in ASP.NET

After struggling with ASP.NET data binding we found no other way but to introduce our little extender control to address the issue.

We tried to be minimalistic: to introduce two-way data binding and to support data conversion. The extender control (called DataBindExtender) has the following page syntax:

<asp:TextBox id=TextBox1 runat="server"></asp:TextBox>
<cc1:DataBindExtender runat="server"
  DataSource="<%# Data %>"
  DataMember="ID"
  TargetControlID="TextBox1"
  ControlProperty="Text" />

Two-way data binding is provided by a DataSource object (notice the data binding on this property) and a DataMember property on the one side, and TargetControlID and ControlProperty on the other. DataBindExtender has a Converter property of type TypeConverter to support custom converters.

DataBindExtender is based on the AjaxControlToolkit.ExtenderControlBase class and implements System.Web.UI.IValidator. ExtenderControlBase makes implementation of extenders extremely easy, while IValidator plugs naturally into page validation (Validate method, Validators collection, ValidationSummary control).

The good thing about extenders is that they are not visible in the designer, while their properties are exposed on the extended control itself. The disadvantage is that an extender requires the Ajax Control Toolkit, and also a ScriptManager component on the page.

To simplify its use, DataBindExtender gets data from the control and puts the value into the data source in the Validate method, and puts data into the control in the OnPreRender method; thus no specific action is required to perform data binding.

Source for the DataBindExtender is DataBindExtender.cs.

Saturday, 05 June 2010 11:22:03 UTC

Comments [0] -
ASP.NET | Thinking aloud | Tips and tricks