Tuesday, 11 February 2014 - Nesterovsky bros

Tuesday, 11 February 2014

These are initial positions for this writing:

SQL Server allows to execute dynamic SQL.
Dynamic SQL is useful and often unavoidable, e.g. when you have to filter or order data in a way that you cannot code efficiently in advance.
Dynamic SQL has proven to be a dangerous area, as with improper use it can open hole in a security.

In general nothing stops you from building and then excuting of SQL string. Our goal, however, is to define rules that make work with dynamic SQL is more managable and verifiable.

Here we outline these rules, and then give some examples and tips.

Rule #1. Isolate dynamic SQL

Put all logic related to building of dynamic SQL into a separate function.
We usually define a separate scheme Dynamic, and define functions like Dynamic.GetSQL_XXX(params).
This makes it simple to perform code review.

Rule #2. Xml as parameters

Use xml type to pass parameters to a function that builds dynamic SQL.
In many cases dynamic SQL depends on variable number of parameters (like a list of values to check against).
Xml fits here to represent structured information.
On a client (e.g. in C# or java) you can define a class with all parameters, populate an instance and serialize it to an xml.

Rule #3. XQuery as template language

Use XQuery to define SQL template and to generate SQL tree from the input parameters.
Here is an example of such XQuery:

@data.query(' <sql> select T.* from Data.Ticket T where { for $ticketID in data/ticketID return <sql>(T.TicketID = <int>{$ticketID}</int>) and </sql> } (1 = 1) </sql>')

You can see that output is an xml with sql element to represent literal SQL, and int element to represent integer literal.

In fact whole output schema can be defined like this:

<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="sql"/> <xs:element name="name"/> <xs:element name="string" nillable="true"/> <xs:element name="int" nillable="true"/> <xs:element name="decimal" nillable="true"/> <xs:element name="date" nillable="true"/> <xs:element name="time" nillable="true"/> <xs:element name="datetime" nillable="true"/> </xs:schema>

where sql is to represent literal content, name to represent a name, and other elements to represent different literal values.

Rule #4. Escape literals

Use function Dynamic.ToSQL(@template) to build final SQL text.
Here we quote the definition:

-- Builds a text of SQL function for an sql template. create function Dynamic.ToSQL ( -- SQL template. @template xml ) returns nvarchar(max) with returns null on null input as begin return ( select case when N.Node.exist('*[xs:boolean(@xsi:nil)]') = 1 then 'null' when N.Node.exist('self::int') = 1 then isnull(N.Node.value('xs:int(.)', 'nvarchar(max)'), '# int #') when N.Node.exist('self::string') = 1 then 'N''' + replace ( N.Node.value('.', 'nvarchar(max)'), '''', '''''' ) + '''' when N.Node.exist('self::name') = 1 then isnull ( quotename(N.Node.value('.', 'nvarchar(128)'), '['), '# name #' ) when N.Node.exist('self::datetime') = 1 then isnull ( 'convert(datetime2, ''' + N.Node.value('xs:dateTime(.)', 'nvarchar(128)') + ''', 126)', '# datetime #' ) when N.Node.exist('self::date') = 1 then isnull ( 'convert(date, ''' + N.Node.value('xs:date(.)', 'nvarchar(128)') + ''', 126)', '# date #' ) when N.Node.exist('self::time') = 1 then isnull ( 'convert(time, ''' + N.Node.value('xs:time(.)', 'nvarchar(128)') + ''', 114)', '# time #' ) when N.Node.exist('self::decimal') = 1 then isnull ( N.Node.value('xs:decimal(.)', 'nvarchar(128)'), '# decimal #' ) when N.Node.exist('self::*') = 1 then '# invalid template #' else N.Node.value('.', 'nvarchar(max)') end from @template.nodes('//sql/node()[not(self::sql)]') N(Node) for xml path(''), type ).value('.', 'nvarchar(max)'); end;

Now, we want to stress that this function plays an important role in prevention of the SQL injection, as it escapes literals from the SQL tree.

Rule #5 (optional). Collect data

Use SQL to collect additional data required to build dynamic SQL. Here is an example of how we get a Ticket by StatusID, while on input we receive a StatusName:

create function Dynamic.GetSQL_GetTicketByStatus(@data xml) returns nvarchar(max) as begin set @data = ( select @data, ( select T.StatusID from @data.nodes('/data/status') N(Node) inner join Metadata.Status T on T.StatusName = Node.value('.', 'nvarchar(128)') for xml auto, type, elements ) for xml path('') ); return Dynamic.ToSQL ( @data.query (' <sql> select T.* from Data.Ticket T where T.Status in ({ for $status in /T/StatusID return <sql><int>{$status}</int>,</sql> } null) </sql> ') ); end;

Notice code in red that collects some more data before calling XQuery.

Rule #6. Execute

The final step is to call dynamic SQL.
This is done like this:

-- build declare @sql nvarchar(max) = Dynamic.GetSQL_GetTicket(@data); -- execute execute sp_executesql @sql -- {, N'@parameter_name data_type [ OUT | OUTPUT ][ ,...n ]' } -- { , [ @param1 = ] 'value1' [ ,...n ] } with result sets ( ( TicketID int not null, CreatedAt datetime2 not null, Summary nvarchar(256) null, Status int, Severity int, DeadLineAt datetime2 null ) );

Notice that the use of dynamic SQL does not prevent static parameters.
Notice also that with result sets clause is used to specify output.

Example. Tickets system

Let's assume you're dealing with a tickets system (like Bugzilla), and you have a table Data.Ticket to describe tickets. Assume that DDL for this table is like this:

create table Data.Ticket ( TicketID bigint not null primary key, CreatedAt datetime2 not null, Summary nvarchar(128) null, Status int not null, UpdatedAt datetime2(7) not null )

Suppose you have to build C# code to search different tickets, where Entity Framework is used to access the database.
Search should be done by a range of CreatedAt, a range of UpdatedAt, Summary, or by different Status values. It should be possible to order results in different ways.

We start out solution from the C# and define classes for a request:

public enum Direction { Asc, Desc } public struct Order { public string Field { get; set; } public Direction Direction {get; set; } } public class DateRange { public DateTime? From { get; set; } // This property is to omit From element if value is null. // See rules for xml serialization. public bool FromSpecified { get { return From != null; } } public DateTime? To { get; set; } public bool ToSpecified { get { return To != null; } } } public class TicketsRequest { public DateRange CreatedAt { get; set; } public string Summary { get; set; } public DateRange UpdatedAt { get; set; } [XmlElement] public Order[] Order { get; set; } [XmlElement] public int[] Status { get; set; } }

Notice that we're going to use XmlSerializer to convert request to xml and then to pass parameter into EF's model. Here is utility method to perform such conversion:

public static string ToXmlString<T>(T value) { if (value == null) { return null; } var serializer = new XmlSerializer(typeof(T)); var builder = new StringBuilder(); var writer = XmlWriter.Create( builder, new XmlWriterSettings { OmitXmlDeclaration = true, Indent = false }); serializer.Serialize(writer, value); writer.Flush(); return builder.ToString(); }

Now we proceed to the database and define a procedure that runs the search:

-- Gets tickets. create procedure Data.GetTickets ( -- A query parameters. @params xml ) as begin set nocount on; -- This is for EF to guess type of result. if (1 = 0) begin select TicketID, CreatedAt, Summary, Status, UpdatedAt from Data.Ticket; end; declare @sql nvarchar(max) = Dynamic.GetSQL_GetTickets(@params); execute sp_executesql @sql with result sets ( ( TicketID int not null, CreatedAt datetime2 not null, Summary nvarchar(256) null, Status int, UpdatedAt datetime2 null ) ); end;

Switch back to C#, import the Data.GetTickets into the EF model, and create a search method:

public IEnumerable<Ticket> GetTickets(TicketsRequest request) { var model = new Model(); return model.GetTickets(ToXmlString(request)); }

The last ingredient is Dynamic.GetSQL_GetTickets() function.

create function Dynamic.GetSQL_GetTickets(@data xml) returns nvarchar(max) as begin return Dynamic.ToSQL ( @data.query(' <sql> select T.TicketID, T.CreatedAt, T.Summary, T.Status, T.UpdatedAt from Data.Ticket T where { for $range in */CreatedAt return ( for $date in $range/From return <sql> (T.CreatedAt >= <datetime>{$date}</datetime>) and </sql>, for $date in $range/To return <sql> (<datetime>{$date}</datetime> > T.CreatedAt) and </sql> ), for $range in */UpdatedAt return ( for $date in $range/From return <sql> (T.UpdatedAt >= <datetime>{$date}</datetime>) and </sql>, for $date in $range/To return <sql> (<datetime>{$date}</datetime> > T.UpdatedAt) and </sql> ), for $summary in */Summary return <sql> (T.Summary like <string>{$summary}</string>) and </sql>, if (*/Status) then <sql> T.Status in ({ for $status in */Status return <sql><int>{$status}</int>, </sql> } null) and </sql> else () } (1 = 1) order by { for $order in */Order [ Field = ("TicketID", "CreatedAt", "Summary", "UpdatedAt", "Status") ] return <sql> <name>{$order/Field}</name> {" desc"[$order[Direction = "Desc"]]}, </sql> } (select null) </sql> ') ); end;

SQL text from Dynamic.GetSQL_GetTickets()

Consider now SQL text produced by this function. For an input:

<TicketsRequest> <CreatedAt> <From>2014-01-01T00:00:00</From> </CreatedAt> <Summary>hello%</Summary> <Order> <Field>Status</Field> <Direction>Desc</Direction> </Order> <Status>1</Status> <Status>3</Status> </TicketsRequest>

the output is:

select T.TicketID, T.CreatedAt, T.Summary, T.Status, T.UpdatedAt from Data.Ticket T where (T.CreatedAt >= convert(datetime2, '2014-01-01T00:00:00', 126)) and (T.Summary like N'hello%') and T.Status in (1, 3, null) and (1 = 1) order by [Status] desc, (select null)

Though the text is not formatted as we would like, it's perfectly valid SQL.

Tips for building XQuery templates

What is called XQuery in SQL Server is in fact a very limited subset of XQuery 1.0. Microsoft clearly states this fact. What is trivial in XQuery is often impossible or ugly in XQuery of SQL Server.

Nevertheless XQuery in SQL Server works rather well as SQL template language. To make it most efficient, however, you should learn several tips.

Tip #1. Where clause

In template you might want to build a where clause:

<sql> select ... where { if (...) then <sql>...</sql> else () } </sql>

and it might happen that for a certain input a condition under where might collapse, and you will be left with where keyword without a real condition, which is wrong. A simple work around is to always add some true condition under ther where like this:

<sql> select ... where { if (...) then <sql>... and </sql> else () } (1 = 1) </sql>

Tip #2. "in" expression

If you want to generate "in" expression like this:

value in (item1, item2,...)

then you might find that it's much easier generate equivalent a code like this:

value in (item1, item2,..., null).

Here is a XQuery to generate such template:

value in ({ for $item in ... return <sql><int>{$item}</int>, </sql> } null) and

Tip #3. Order by

You can conclude an order by clause built from a data with a dummy expression like this:

order by { for $item in ... return <sql> <name>{$item/Field}</name> {" desc"[$item/Direction = "Desc"]}, </sql> } (select null)

Alternatively you can use first column from a clustered index.

Tip #4. Group by

In a group by clause we cannot introduce terminator expression as it was with order by, so a code is a less trivial:

{ let $items := ... return if ($items) then <sql> group by <name>{$items[1]}</name> { for $item in $items[position() > 1] return <sql>, <name>{$item}</name></sql> } </sql> else () }

In fact similar logic may work with order by.

Tip #5. Escape literals

It's crusial not to introduce SQL injection while building SQL. Thus use:

<int>{...}</int> - for literal int;
<decimal>{...}</decimal> - for literal decimal;
<string>{...}</string> - for literal string;
<datetime>{...}</datetime> - for literal datetime2;
<date>{...}</date> - for literal date;
<time>{...}</time> - for literal time;
<name>{...}</name> - for a name to quote.

Note that you can use xsi:nil, so <int xsi:nil="true"/> means null.

If you generate a field name from an input data then it worth to validate it against a list of available names.

Tip #6. Validate input.

It worth to define xml schema for an input xml, and to validate parameters against it.
This makes code more secure, and also adds a documentation.

Tip #7. Don't abuse dynamic SQL

There are not too many cases when you need a dynamic SQL. Usually SQL engine knows how to build a good execution plan. If your query contains optional conditions then you can write it a way that SQL Server can optimize, e.g.:

select * from T where ((@name is null) or (Name = @name)) and ((@date is null) or (Date = @date)) option(recompile)

Tuesday, 11 February 2014 09:48:07 UTC

Comments [0] -
SQL Server puzzle | Thinking aloud | Tips and tricks

Wednesday, 22 January 2014

Shared state

Consider how would you implement Style object in the HTML DOM?

These are some characteristics of that object:

It has a long list of properties, e.g. in IE 11 there are more than 300 properties over a style object.
Any specific instance usually have only several properties assigned.
Reads of properties are much more frequent than writes. In fact style often stays unchanged after initialization.
DOM contains many style instances (often thousands).
The number of distinct instances in terms of values of properties is moderate (usually dozens).

Here is how would we approached to such an object.

1. Styles are sparse objects, thus there is no point to implement plain class with all those properties, as it's wasteful.

We would rather use two techniques to keep style's state:

A dictionary of properties with their values;
An aggregation of objects, where all properies are grouped into families, each group is defined by a separate type, and a style's state is an aggregation of that groups.

A current style of an element is an aggregation of styles of ancestor element. It can either by dynamic or be fused into a single style instance.

2. Make style's state immutable, and share all these states among all style instances.

In this implementation property write turns into a state transition operation: state = set(state, property, value). Thus no state is modified but replaced with other state that corresponds to a required change.

If state is seen as a dictionary then API may look like this :

public class State<K, V> { // Gets shared dictionary for an input dictionary. public IDictionary<K, V> Get(IDictionary<K, V> dictionary); // Gets a shared dictionary for an input dictionary with key set to a value. public IDictionary<K, V> Set(IDictionary<K, V> dictionary, K key, V value); // Gets a shared dictionary for an input dictionary. public IDictionary<K, V> Remove(IDictionary<K, V> dictionary, K key); // Gets typed value. public T Get<T>(IDictionary<K, V> dictionary, K key) where T: V { V value; if ((dictionary == null) || !dictionary.TryGetValue(key, out value)) { return default(T); } return (T)value; } // Sets or removes a typed value. // dictionary can be null. // null returned if output dictionary would be empty. public IDictionary<K, V> Set<T>(IDictionary<K, V> dictionary, K key, T value) where T : V { return value == null ? Remove(dictionary, key) : Set(dictionary, key, (V)value); } }

States can be cached. Provided the cache keeps states in a weak way, no unsued state will be stored for a long time. We may use weak table of dictionary to dictionary WeakTable<Dictionary<K, V>, Dictionary<K, V>> as a storage for such a cache. All required API is described in the WeakTable and Hash Code of Dictionary posts.

3. Style can be implemented as a structure with shared state as a storage. Here is a scetch:

[Serializable] public struct Style { // All properties. public enum Property { Background, BorderColor, BorderStyle, Color, FontFamily, FontSize, // ... } public int? Background { get { return states.Get<int?>(state, Property.Background); } set { state = states.Set(state, Property.Background, value); } } public int? BorderColor { get { return states.Get<int?>(state, Property.BorderColor); } set { state = states.Set(state, Property.BorderColor, value); } } public string BorderStyle { get { return states.Get<string>(state, Property.BorderStyle); } set { state = states.Set(state, Property.BorderStyle, value); } } public int? Color { get { return states.Get<int?>(state, Property.Color); } set { state = states.Set(state, Property.Color, value); } } public string FontFamily { get { return states.Get<string>(state, Property.FontFamily); } set { state = states.Set(state, Property.FontFamily, value); } } public double? FontSize { get { return states.Get<double?>(state, Property.FontSize); } set { state = states.Set(state, Property.FontSize, value); } } // ... [OnDeserialized] private void OnDeserialized(StreamingContext context) { state = states.Get(state); } // A state. private IDictionary<Property, object> state; // A states cache. private static readonly State<Property, object> states = new State<Property, object>(); }

Note that:

default state is a null dictionary;
states are application wide shared.

The following link is our implementation of State<K, V> class: State.cs.

Here we have outlined the idea of shared state object, and how it can be applied to sparse mostly immutable objects. We used HTML style as an example of such an object. Shared state object may work in many other areas, but for it to shine its use case should fit to the task.

Wednesday, 22 January 2014 19:43:25 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks

Monday, 13 January 2014

Hash Code of Dictionary

Dealing recently with some task (the same that inspired us to implement WeakTable), we were in a position to use a dictionary as a key in another dictionary.

What are the rules for the class to be used as key:

key should be immutable;
key should implement a GetHashCode() method;
key should implement a Equals() method.

The first requirement is usually implemented as a documentation contract like this:

As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value.

The third requirement about equals is trivially implemented as a method:

public bool Equals(IDictionary<K, V> x, IDictionary<K, V> y) { if (x == y) { return true; } if ((x == null) || (y == null) || (x.Count != y.Count)) { return false; } foreach(var entry in x) { V value; if (!y.TryGetValue(entry.Key, out value) || !valueComparer.Equals(entry.Value, value)) { return false; } } return true; }

But how would you implement hash code?

We argued like this.

1. Let's consider the dictionary as a sparse array of values with only populated items that correspond to key hash codes.

2. Hash code is constructed using some fair algorithm. E.g like that used in java to calculate string's hash code:

n-1 h(s) = SUM (s[i]*p^(n-1-i)) mod m, where m = 2^31 i=0

In our case:

n can be arbitrary large int value, so in fact it's 2^32;
items are enumerated in unknown order;
there is only limited set of items, so most s[i] are zeros.

As result we cannot use recurrent function to calculate a power p^k mod m. Fortunately one can build fast exponentiation arguing like this:

32/s - 1 p^k = p^ SUM 2^((s*i)*k[i]) mod m, where s some int: 1, 2, 4, 8, 16, or 32. i=0

Thus

32/s - 1 p^k = PRODUCT (p^(2^(s*i)))^k[i] mod m i=0

If s = 1 then k[i] is either 1 or 0 (a bit), and there is 32 different p^(2^i) mod m values, which can be precalculated.

On the other hand, if we select s = 8 we can write the formula as:

p^k = p^k[0] * (p^(2^8))^k[1] * (p^(2^16))^k[2] * (p^(2^24))^k[3] mod m

where k[i] is a 8 bit value (byte).

Precalculating all values p^n, (p^(2^8))^n, (p^(2^16))^n, (p^(2^24))^n for n in 0 to 255 we reach the formula with 4 multiplications and with 1024 precalculated values.

Here is the whole utility to calculate hash factors:

/// <summary> /// Hash utilities. /// </summary> public class Hash { /// <summary> /// Returns a P^value mod 2^31, where P is hash base. /// </summary> /// <param name="value">A value to get hash factor for.</param> /// <returns>A hash factor value.</returns> public static int GetHashFactor(int value) { return factors[(uint)value & 0xff] * factors[(((uint)value >> 8) & 0xff) | 0x100] * factors[(((uint)value >> 16) & 0xff) | 0x200] * factors[(((uint)value >> 24) & 0xff) | 0x300]; } /// <summary> /// Initializes hash factors. /// </summary> static Hash() { var values = new int[4 * 256]; var value = P; var current = 1; var i = 0; do { values[i++] = current; current *= value; } while(i < 256); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 512); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 768); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 1024); factors = values; } /// <summary> /// A base to calculate hash factors. /// </summary> public const int P = 1103515245; /// <summary> /// Hash factors. /// </summary> private static readonly int[] factors; }

With this API hash code for a dictionary is a trivial operation:

public int GetHashCode(IDictionary<K, V> dictionary) { if (dictionary == null) { return 0; } var result = 0; foreach(var entry in dictionary) { if ((entry.Key == null) || (entry.Value == null)) { continue; } result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) * valueComparer.GetHashCode(entry.Value); } return result; }

And finally, here is a reference to a class DictionaryEqualityComparer<K, V>: IEqualityComparer<IDictionary<K, V>> that allows a dictionary to be a key in another dictionary.

Update

We have commited some tests, and have found that with suffiently "good" implementation of GetHashCode() of key or value we achieve results almost of the same quality, as the results of the algorithm we have outlined above with much simpler and straightforward algorithm like this:

public int GetHashCode(IDictionary<K, V> dictionary) { if (dictionary == null) { return 0; } var result = 0; foreach(var entry in dictionary) { if ((entry.Key == null) || (entry.Value == null)) { continue; } var k = entry.Key.GetHashCode(); var v = entry.Value.GetHashCode(); k = (k << 5) + k; v = (v << (k >> 3)) + v; result += k ^ v; //result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) * // valueComparer.GetHashCode(entry.Value); } return result; }

It was worth to blog about this just to find out that we have outwitted ourselves, and finally to reach to a trivial hash code implementation for the dictionary.

Monday, 13 January 2014 20:33:31 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks

Wednesday, 08 January 2014

WeakTable

Dealing recently with some task, we were in a position to use a weak dictionary in the .NET. Instinctively we assumed that it should exist somewhere in the standard library. We definitely knew that there is a WeakReference class to for a single instance. We also knew that there is WeakHashMap in java, and that it's based on java's WeakReference.

So, we were surprised to find that there is no such thing out of the box in .NET.

We have found that java's and .NET's weak references are different. In java weak references whose targets are GCed can be automatically put into a queue, which can be used to build clean up logic to remove dead keys from weak hash map. There is nothing similar in .NET, where weak reference just silently loses it's value.

Internet is full with custom implementations of weak dictionaries in .NET.

.NET 4.5 finally defines a class ConditionalWeakTable<TKey, TValue>, which solves the problem in case when you need to match keys by instance identity.

Unfortunately in our case we needed to match keys using key's GetHashCode() and Equals(). So, ConditionalWeakTable<TKey, TValue> did not directly work, but then we found a way to make it work for us.

Here is a quote from the definition:

A ConditionalWeakTable<TKey, TValue> object is a dictionary that binds a managed object, which is represented by a key, to its attached property, which is represented by a value. The object's keys are the individual instances of the TKey class to which the property is attached, and its values are the property values that are assigned to the corresponding objects.

...in the ConditionalWeakTable<TKey, TValue> class, adding a key/value pair to the table does not ensure that the key will persist, even if it can be reached directly from a value stored in the table... Instead, ConditionalWeakTable<TKey, TValue> automatically removes the key/value entry as soon as no other references to a key exist outside the table.

This property of ConditionalWeakTable<TKey, TValue> has helped us to build a way to get a notification when the key is being finalized, which is the missed ingredient in .NET's weak references.

Assume you have an instance key of type Key. To get a notification you should define a class Finalizer that will call some handler when it's finalized, and you should bind key and a finalizer instance using weak table.

The code looks like this:

public class Finalizer<K> where K: class { public static void Bind(K key, Action<K> handler) { var finalizer = table.GetValue(key, k => new Finalizer<K> { key = k }); finalizer.Handler += handler; } public static void Unbind(K key, Action<K> handler) { Finalizer finalizer; if (table.TryGetValue(key, out finalizer)) { finalizer.Handler -= handler; } } ~Finalizer() { var handler = Handler; if (handler != null) { handler(key); } } private event Action<K> Handler; private K key; private static readonly ConditionalWeakTable<K, Finalizer> table = new ConditionalWeakTable<K, Finalizer>(); } Key key = ... Finalizer.Bind(key, k => { /* clean up. */ });

Using this approach we have created a class WeakTable<K, V> modeled after ConditionalWeakTable<TKey, TValue>.

So, this is our take in the problem: WeakTable.cs.

Wednesday, 08 January 2014 21:57:16 UTC

Comments [0] -
.NET | Java | Thinking aloud | Tips and tricks

Friday, 27 December 2013

Bidi Visual Converter

Oftentimes we deal with Hebrew in .NET.

The task we face again and again is attempt to convert a Hebrew text from visual to logical representation.

The latest demand of such task was when we processed content extracted from PDF. It's turned out that PDF stores content as graphic primitives, and as result text is stored visually (often each letter is kept separately).

We solved the task more than a decade ago, by calling Uniscribe API.

The function by itself is a small wrapper around that API, so in .NET 1.0 we were using managed C++, several years later we have switched to C++/CLI.

But now after many .NET releases, and with 32/64 versions we can see that C++ is only a guest in .NET world.

To run C++ in .NET you have to install VC runtime libraries adjusted to a specific .NET version. This turns C++ support in .NET into not a trivial task.

So, we have finally decided to define C# interop for the Uniscribe API, and recreate that function in pure C#:

namespace NesterovskyBros.Bidi { /// <summary> /// An utility to convert visual string to logical. /// <summary> public static class BidiConverter { /// <summary> /// Converts visual string to logical. /// </summary> /// <param name="value">A value to convert.</param> /// <param name="rtl">A base direction.</param> /// <param name="direction"> /// true for visual to logical, and false for logical to visual. /// </param> /// <returns>Converted string.</returns> public static string Convert(string value, bool rtl, bool direction);

You can download this project from BidiVisualConverter.zip.

Friday, 27 December 2013 20:36:20 UTC

Comments [0] -
.NET | Tips and tricks

Monday, 11 November 2013

Library resources in JSF

Before to start we have to confess that afer many years of experience we sincerely dislike JSF technology, as we think it's outdated compared to html 5 + REST.

We have a JSF 2.2 application, which is configured to track session through url. In this case Session ID is stored in url and not in cookies, as there may be many sessions opened per a client.

At the same time application uses libraries that expose scripts and css resources. This resources are referred to like this:

<link rel="stylesheet" type="text/css" jsfc="h:outputStylesheet" library="css" name="library-name.css"/> <script type="text/javascript" jsfc="h:outputScript" name="library-name.js" library="scripts" target="head"></script>

At runtime this is rendered as:

<link type="text/css" rel="stylesheet" href="/App/javax.faces.resource/library-name.css.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=css" /> <script type="text/javascript" src="/App/javax.faces.resource/library-name.js.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=scripts"></script>

You can see that Session ID is a part of url path, which prevents resource caching on a client.

It's not clear whether it's what JSF spec dictates or it's Oracle's Reference Implementation detail. We're certain, however, that it's too wasteful in heavy loaded environment, so we have tried to resolve the problem.

From JSF sources we have found that h:outputStylesheet, h:outputScript, and h:outputLink all use ExternalContext.encodeResourceURL() method to build markup url.

So, here is a solution: to provide custom wrapper for the ExternalContext.

This is done in two steps:

create a factory class;
register a factory in faces-config.xml;

1. Factory is a simple class but unfortunately it's implementation specific:

package com.nesterovskyBros.jsf; import javax.faces.FacesException; import javax.faces.context.ExternalContext; import javax.faces.context.ExternalContextWrapper; import com.sun.faces.context.ExternalContextFactoryImpl; /** * {@link ExternalContextFactory} to prevent session id in resource urls. */ public class ExternalContextFactory extends ExternalContextFactoryImpl { /** * {@inheritDoc} */ @Override public ExternalContext getExternalContext( Object context, Object request, Object response) throws FacesException { final ExternalContext externalContext = super.getExternalContext(context, request, response); return new ExternalContextWrapper() { @Override public ExternalContext getWrapped() { return externalContext; } @Override public String encodeResourceURL(String url) { return shouldEncode(url) ? super.encodeResourceURL(url) : url; } private boolean shouldEncode(String url) { // Decide here whether you want to encode url. // E.g. in case of h:outputLink you may want to have session id in url, // so your decision is based on some marker (like &session=1) in url. return false; } }; } }

2. Registration is just three lines in faces-config.xml:

<factory> <external-context-factory>com.nesterovskyBros.jsf.ExternalContextFactory</external-context-factory> </factory>

After that change at runtime we have:

<link type="text/css" rel="stylesheet" href="/App/javax.faces.resource/library-name.css.jsf?ln=css" /> <script type="text/javascript" src="/App/javax.faces.resource/library-name.js.jsf?ln=scripts"></script>

Monday, 11 November 2013 13:08:53 UTC

Comments [0] -
Java | JSF and Facelets | Tips and tricks

Monday, 04 November 2013

Windows: so far so worse

A day ago we had installed on our laptops Win 8.1. We thought it will be better than Win 8, but now we afraid of next release of Windows... So far, so worse.

ICS became unusable: in Windows 7 we've shared Internet connection from USB 3G dongle between our two computers, in Windows 8 we've succeeded to share only file system access, but not Internet. In Windows 8.1 neither Internet nor file system are accessible...

Monday, 04 November 2013 11:20:28 UTC

Comments [0] -
Thinking aloud

Monday, 14 October 2013

String comparision: SQL vs COBOL

Till recently we were living in simple world of string comparisons in SQL style, and now everything has changed.

From the university years we knew that strings in SQL are compared by first trimming traling spaces, and then comparing in C style.

Well, the picture was a little more complex, as collations were involved (national, case sensivity), and as different SQL vendors implemented it differently.

Next,
we're dealing with programs converted from COBOL, which we originally thought follow SQL rules when strings are compared.

Here is where the problem has started.

Once we have found that java program has branched differently than original COBOL, and the reason was that the COBOL and java compared two strings differently:

COBOL: "A\n" < "A";
Java: "A\n" > "A"

We have looked into COBOL Language Reference and found the rules:

Operands of equal size

Characters in corresponding positions of the two operands are compared, beginning with the leftmost character and continuing through the rightmost character.

If all pairs of characters through the last pair test as equal, the operands are considered as equal.

If a pair of unequal characters is encountered, the characters are tested to determine their relative positions in the collating sequence. The operand that contains the character higher in the sequence is considered the greater operand.

Operands of unequal size

If the operands are of unequal size, the comparison is made as though the shorter operand were extended to the right with enough spaces to make the operands equal in size.

You can see that strings must not be trimmed but padded with spaces to the longer string, and only then they are compared. This subtle difference has significant impact for characters below the space.

So, here we've found that COBOL and SQL comparisons are different.

But then we have questioned how really SQL beheaves?

We've tested comparisons in SQL Server and DB2, and have seen that our understanding of SQL comparison holds. It works as if trimming spaces, and then comparing.

But again we have looked into SQL-92 definition, and that's what we see there:

8.2 <comparison predicate>
3) The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.

So, what we see is that SQL-92 rules are very close to COBOL rules, but then we reach the question: how come that at least SQL Server and DB2 implement string comparison differently than SQL-92 dictates?

Update: we have found that both SQL Server and DB2 have their string collation defined in a way that <space> is less than any other character. So the following is always true: '[' + char(13) + ']' > '[ ]'.

Monday, 14 October 2013 20:23:11 UTC

Comments [0] -
Java | SQL Server puzzle | Thinking aloud | Tips and tricks

Friday, 09 August 2013

KendoUI's guid function

Earlier we have written a post KendoUI's slowest function and now, we want to point to the next slow function, which is kendo.guid(). It's used to assign uid to each observable object, and also in couple of other places.

Here is its source:

guid: function() { var id = "", i, random; for (i = 0; i < 32; i++) { random = math.random() * 16 | 0; if (i == 8 || i == 12 || i == 16 || i == 20) { id += "-"; } id += (i == 12 ? 4 : (i == 16 ? (random & 3 | 8) : random)).toString(16); } return id; }

KendoUI people have decided to define uid as a string in format of Globally unique identifier. We think there is no reason to have such a complex value; it's enough to have counter to generate uid values. As KendoUI relies on the string type of uid, so we have defined a patch like this:

var guid = 0 kendo.guid = function() { return ++guid + "" }

Consider now a test case. It's almost identical to that in previous post:

<!DOCTYPE html> <html> <head> <title>Test</title> <script src="scripts/jquery/jquery.min..js"></script> <script src="scripts/kendo/kendo.web.min.js"></script> <link href="styles/kendo.common.min.css" rel="stylesheet" /> <link href="styles/kendo.default.min.css" rel="stylesheet" /> <link href="styles/style.css" rel="stylesheet" /> <script> var model; function init() { var source = []; for(var i = 0; i < 1000; ++i) { source.push({ text: "value " + i, value: "" + i }); } model = kendo.observable( { value: "1", source: new kendo.data.DataSource( { data: source }) }); model.source.read(); } function patch() { var base = kendo.data.binders.widget.source.fn._ns; var result; var guid = 0; kendo.guid = function() { return ++guid + ""; }; kendo.data.binders.widget.source.fn._ns = function(ns) { return ns ? base.call(this, ns) : (result || (result = base.call(this, ns))); } } function test() { init(); kendo.bind("#view", model); } patch(); </script> </head> <body> <p> <button onclick="test()">Click to start test</button> </p> <p id="view"> Select: <input data-role="dropdownlist" data-bind="value: value, source: source" data-text-field="text" data-value-field="value"/> </p> </body> </html>

Now, we can compare performance with and without that patch.

Here is a run statistics without patch:

Level	Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
1	onclick	1	270.73	100	270.73
1.1	test	1	269.73	99.63	269.73
1.1.1	init	1	117.07	43.24	117.07
1.1.1.1	guid	1,001	72.05	26.61	0.07
1.1.2	bind	1	152.65	56.39	152.65

and with patch:

Level	Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
1	onclick	1	172.64	100	172.64
1.1	test	1	171.65	99.42	171.65
1.1.1	init	1	62.04	35.94	62.04
1.1.1.1	guid	1,001	1	0.58	0
1.1.2	bind	1	109.6	63.49	109.6

Note that statistics were collected for IE 10.
An example can be found at slow2.html.

Friday, 09 August 2013 14:24:16 UTC

Comments [0] -
javascript | kendoui | Tips and tricks

Tuesday, 30 July 2013

KendoUI's slowest function

What is the slowest function in kendo? Or better, which function has most negative performance impact in kendo.

Recently, we were dealing with a simple page, which was too slow, as data binding took more than second.

The page contained a dropdown list, with ~1000 options. To understand the reason we have run this page under the IE's built-in javascript profiler, and ...

there, we have found that almost half of the time is taken by a function (call it X), which receives nothing and returns always the same result!

But, let's now see a minimal example that demostrates the problem:

<!DOCTYPE html> <html> <head> <title>Test</title> <script src="scripts/jquery/jquery.js"></script> <script src="scripts/kendo/kendo.web.min.js"></script> <link href="styles/kendo.common.min.css" rel="stylesheet" /> <link href="styles/kendo.default.min.css" rel="stylesheet" /> <link href="styles/style.css" rel="stylesheet" /> <script> var model; function init() { var source = []; for(var i = 0; i < 1000; ++i) { source.push({ text: "value " + i, value: "" + i }); } model = kendo.observable( { value: "1", source: new kendo.data.DataSource( { data: source }) }); model.source.read(); } function test() { kendo.bind("#view", model); } init(); </script> </head> <body> <p> <button onclick="test()">Click to start test</button> </p> <p id="view"> Select: <input data-role="dropdownlist" data-bind="value: value, source: source" data-text-field="text" data-value-field="value"/> </p> </body> </html>

There are two parts: initialization, and a test itself that starts upon button click.

In the initialization part we have defined a model, containing a datasource.

The test part performs data binding.

Now, here is a run statistics:

Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
test	1	456.05	100	456.05
X	1,000	200.14	43.89	0.2

So, X is fast by itself, but it run 1000 times, and took about 44% of all time.

And now, to the function. It's kendo.data.binders.widget.source.fn._ns.

Here is its code:

_ns: function(ns) { ns = ns || kendo.ui; var all = [ kendo.ui, kendo.dataviz.ui, kendo.mobile.ui ]; all.splice($.inArray(ns, all), 1); all.unshift(ns); return kendo.rolesFromNamespaces(all); }

We can see that on each call it receives a parameter undefined, and always returns an array with the same content. Not sure why Kendo UI team made it so complex, but one can easily device a simple patch that optimizes this code path.

function patch() { var base = kendo.data.binders.widget.source.fn._ns; var result; kendo.data.binders.widget.source.fn._ns = function(ns) { return ns ? base.call(this, ns) : (result || (result = base.call(this, ns))); } }

With this patch applied, the profiler shows:

Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
test	1	253.03	100	253.03
_ns	1,000	6	2.37	0.01

Execution time dropped considerably, and _ns() loses its title of most time consuming function!

An example can be found at slow.html.

Tuesday, 30 July 2013 20:47:11 UTC

Comments [0] -
javascript | kendoui

Sunday, 28 July 2013

Visual to logical conversion in Java

Time after time we run into the same problem on different platforms, with different languages. The problem's name is "Visual to Logical conversion for right-to-left or bidirectional text". The problem is usually due to legacy code, which stores texts in visual order from left to right. In case of English it's ok, but with Hebrew this means that texts are partially reversed.

It worth to note that we've solved the same task with Windows API for native and .NET applications more than 10 years ago.

On the other hand, for Java, we yet didn't see any acceptable standalone solution. To remedy this omission, we publish here our solution to this problem.

package com.nesterovskyBros.text;

import java.text.Bidi;

/**
 * Utility that uses {@link Bidi} class. 
 */
public class BidiUtils
{
  /**
   * Implements visual to logical order converter.
   * 
   * @author <a href="http://www.nesterovsky-bros.com">Nesterovsky bros</a>
   *
   * @param text an input text in visual order to convert.
   * @return a String value in logical order.
   */
  public static String visualToLogical(String text)
  {
    if ((text == null) || (text.length() == 0))
    {
        return text;
    }
  
    Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
  
    if (bidi.isLeftToRight())
    {
        return text;
    }
  
    int count = bidi.getRunCount();
    byte[] levels = new byte[count];
    Integer[] runs = new Integer[count];
  
    for (int i = 0; i < count; i++)
    {
       levels[i] = (byte)bidi.getRunLevel(i);
       runs[i] = i;
    }
  
    Bidi.reorderVisually(levels, 0, runs, 0, count);

    StringBuilder result = new StringBuilder();

    for (int i = 0; i < count; i++)
    {
       int index = runs[i];
       int start = bidi.getRunStart(index);
       int end = bidi.getRunLimit(index);
       int level = levels[index];
  
       if ((level & 1) != 0)
       {
         for (; --end >= start;)
         {
           result.append(text.charAt(end));
         }
       }
       else
       {
         result.append(text, start, end);
       }
    }

    return result.toString();
  }
}

This method utilizes embeded Bidi's algorithm, see class java.text.Bidi.

Be aware that there is no perfect algorithm that covers all possible cases, since BIDI was written for an opposite task, but our implementation based on Bidi.reorderVisually is usually acceptable.

Here is an JUnit test for this method:

package com.nesterovskyBors.text;

import static org.junit.Assert.*;

import org.junit.Test;

import com.nesterovskyBros.text.BidiUtils;

public class BidiUtilsTests
{

  @Test
  public void testsVisualToLogical()
  {
    String text = "123 יתימאה ןחבמ";
    String actual = BidiUtils.visualToLogical(text);
    String expected = "מבחן האמיתי 123";
  
    assertEquals(expected, actual);
    
    text = "תירבע English תירבע בוש";
    actual = BidiUtils.visualToLogical(text);
    expected = "שוב עברית English עברית";
    
    assertEquals(expected, actual);
  }
}

Sunday, 28 July 2013 15:44:17 UTC

Comments [0] -
Java | Tips and tricks

Sunday, 14 July 2013

On Garbage Collectors

Recently we've seen an article Why mobile web apps are slow.

While title mentions web apps, the the criticism is directed purely to javascript language. The answer presented there is twofold:

Raw javascript performance is ~5 times slower than performance of native code solving the same task.
This accounts for the most modern implementations that exploit JIT. Author does not expect that this proportion will be significatly changed in javascript's favor in the near future.
Garbage Collection, which is essential part of javascript, does not work well in constrainted environment of mobile devices.
Here author quotes many references that show that:
- for GC to work on peer with non-GC application, it needs to have ~6 - 8 times size of memory than an application needs;
- at the same time for hardware reasons, mobile devices cannot provide such amount of memory;
- on the other hand with rise of CPU performance, GC pressure rises even faster.

In the end author, while saying about some attempts to change the state, has no final verdict, whether there can be anything done to remedy the problem.

Having roots in C++, we're GC unbelievers. But you know, who will ask your opinion on that, while there are all those modern languages that try to abstract from memory and implicitly or explicitly assume GC: java, C#, javascript, xslt, xquery, and so on.

There always is a solution to avoid GC completely, like C++ and other (e.g. Microsoft's C++/CX, or Apple's ARC) do. But, assuming you're entered GC world, what can be done with it? How do you make it more predictable, less greedy, and probably more fast?

Our arguments are like this.

How does native code manage object graphs?

Today's solution is reference counting along with weak references to break cycles in graph.

Can be GC based on this?

Yes.

In fact upcoming spec of javascript contains weak references. So, provided a developer accurately defines relations in an object graph, one may immediately achieve the same efficiency as native solution.

If one does not use weak references consistently then object cycles can be created, so there can appear a graph that is not referenced anywhere in a program. This graph can be collected with classical GC that scans object roots.

Classical GC part can be used as a debug mode leak detector, and will collect graph cycles at runtime.

Thus, we claim that a hybrid memory management: reference counting with weak references plus classical GC is possible; it will be equal to a native memory management when weak references are properly used, and it will be close to classical GC without use of weak references.

This solution gives a rule to mitigate GC drawbacks: just use weak references in appropriate points, and you can continue to live in GC world, where GC is only a fallback.

Sunday, 14 July 2013 12:20:29 UTC

Comments [0] -
javascript | Thinking aloud

Friday, 12 July 2013

Prises to RESTful

RESTful is well known and established archetecture. It does not need our approvals or recomendations.

It's something different that we want to express. We'd like to thank all those people who formulated those ideas.

While one can argue that REST has "always" existed along with the web, from our experience we can see that most of web applications we have created up to the late 200x were stateful.

We can see that many web applications should support the application (session) state. And here REST has given a rule:

client stores a session state;
server does not store a session, and is represented as set of services;
if there is a state that cannot be stored on a client (e.g. due to security reasons), then server should use caches, and be able to reconstruct that state upon cache miss.

Now is a good time for the proliferation of the REST, as even weakest clients (browsers) can implement its requirements.

REST allowed us to drastically simplify development and support, to unload server, and to build better web applications.

We compare web applications we have written a decade ago using ASP.NET, and in last 3-4 years using RESTful ideas both in java and in .NET. Clearly, the later have better performance, but from the support standpoint the most appealing is that you can instantly upgrade the application without impacting users, as there are no sessions on the server. For the same reason you should not puzzle over whether you should use in-process sessions and session stickness with load balancing server, or out-of-process sessions.

Things became simpler:

server now is pure logic through services (WCF, Web API, JAX-RS);
client is gui - jquery, kendoui or other;
aspx/jsf pages gone completely;

Friday, 12 July 2013 13:25:28 UTC

Comments [0] -
Thinking aloud

Sunday, 07 July 2013

Compile KendoUI templates

Earlier, in the article How To: Load KendoUI Templates from External Files, we were talking about the way to combine project's templates into a single file using Text Templates. Now, we would like to suggest the next step.

KendoUI defines text templates that it knows to transform into functions, at runtime obviously. Thus a template like this:

<tr> <td data-bind=" text: name"></td> <td>#: kendo.toString(get("price"), "C") #</td> <td data-bind="text: unitsInStock"></td> <td><button class="k-button" data-bind="click: deleteProduct"> Delete</button></td> </tr>

is transformed into a function:

function(data) { var o,e=kendo.htmlEncode; with(data) { o='<tr><td data-bind="text: name"></td><td>'+ e( kendo.toString(get("price"), "C") )+ '</td><td data-bind="text: unitsInStock"></td>' + '<td><button class="k-button" ' + 'data-bind="click: deleteProduct">Delete</button></td></tr>'; } return o; }

The transformation is done through a sequence of of regex replaces.

Now, what's the fastest javascript template engine?

Right! That, which does not work at runtime. :-)

What we thought is that we can generate those functions at compile time rather than defining templates.

We have updated templates.tt to generate template functions, and optionally to generate <script> tags that call those functions. This way, for an input footer.tmpl.html:

<!DOCTYPE html> <html> <head> <title>Test</title> <base href="/" /> <link href="styles/kendo.common.min.css" rel="stylesheet" /> <link href="styles/kendo.default.min.css" rel="stylesheet" /> </head> <body> <table data-template-id="view"> <tr> <td>Products count: #: total() #</td> <td>Total price: #: totalPrice() #</td> <td colspan="2">Units in stock: #: totalUnitsInStock() #</td> </tr> </table> </body> </html>

templates.js will look like this:

nesterovskyBros.templates= { ... "footer-view":function(data) { var o,e=kendo.htmlEncode; with(data) { ... } return o; }, ... }; document.write('<script id="footer-view-template" type="text/x-kendo-template">#=nesterovskyBros.templates["footer-view"](data)#</script>');

To get template function at runtime you simply refer to nesterovskyBros.templates["footer-view"].

template.tt now allows you to specify:

scope - a javascript scope for tempate functions, e.g. "nesterovskyBros.templates";
data-script attribute over each template (default is true) to prevent generation of <script> tag;
data-with-block attribute (default is true) to prevent with(data) {...} statement in javascript.

See a sample application that shows how nicely KendoUI UserControls work with those compiled templates.

Sunday, 07 July 2013 18:54:57 UTC

Comments [2] -
javascript | kendoui | Thinking aloud

Wednesday, 03 July 2013

Derived xml schemas

Awhile ago we have created a set of xml schemas and xslt to represent different languages as xml, and to generate source from those xmls. This way we know to represent and generate: java, c#, cobol, and several sql dialects (read about languages xom on this site).

Here, we'd like to expose a nuisance we had with sql dialects schema.

Our goal was to define a basic sql schema, and dialect extensions. This way we assumed to express general and dialect specific constructs. So, lets consider an example.

General:

-- Select one row select * from A

DB2:

select * from A fetch first row only

T-SQL:

select top 1 * from A

Oracle:

select * from A where rownum = 1

All these queries have common core syntax, while at the same time have dialect specific means to express intention to return first row only.

Down to the xml schema basic select statement looks like this:

<xs:complexType name="select-statement"> <xs:complexContent> <xs:extension base="full-select-statement"> <xs:sequence> <xs:element name="columns" type="columns-clause"> <xs:element name="from" type="from-clause" minOccurs="0"> <xs:element name="where" type="unary-expression" minOccurs="0"/> <xs:element name="group-by" type="expression-list" minOccurs="0"/> <xs:element name="having" type="unary-expression" minOccurs="0"/> <xs:element name="order-by" type="order-by-clause" minOccurs="0"/> </xs:sequence> <xs:attribute name="specification" type="query-specification" use="optional" default="all"/> </xs:extension> </xs:complexContent> </xs:complexType>

Here all is relatively clear. The generic select looks like:

<sql:select> <sql:columns> <sql:column wildcard="true"/> </sql:columns> <sql:from> <sql:table name="A"/> </sql:from> </sql:select>

But how would you define dialect specifics?

E.g. for T-SQL we would like to see a markup:

<sql:select> <tsql:top> <sql:number value="1"/> </tsql:top> <sql:columns> <sql:column wildcard="true"/> </sql:columns> <sql:from> <sql:table name="A"/> </sql:from> </sql:select>

While for DB2 there should be:

<sql:select> <sql:columns> <sql:column wildcard="true"/> </sql:columns> <sql:from> <sql:table name="A"/> </sql:from> <db2:fetch-first rows="1"/> </sql:select>

So, again the quesions are:

how to define basic sql schema with goal to extend it in direction of DB2 or T-SQL?
how to define an xslt sql serializer that will be also extendable?

Though we have tried several solutions to that problem, none is satisfactory enough. To allow extensions we have defined that all elements in sql schema are based on sql-element, which allows extensions:

<xs:complexType name="sql-element" abstract="true"> <xs:sequence> <xs:element ref="extension" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> <xs:element name="extension" type="extension"/> <xs:complexType name="extension" abstract="true"> <xs:complexContent> <xs:extension base="sql-element"/> </xs:complexContent> </xs:complexType> ... <xs:element name="top" type="top-extension" substitutionGroup="sql:extension"/> <xs:complexType name="top-extension"> <xs:complexContent> <xs:extension base="sql:extension"> <xs:sequence> <xs:element ref="sql:expression"/> </xs:sequence> <xs:attribute name="percent" type="xs:boolean" use="optional" default="false"/> </xs:extension> </xs:complexContent> </xs:complexType>