ThinkingAloud - Nesterovsky bros

Introduction

We migrate code from the mainframe (MF) to Azure. The tool we use produces plain, good, functionally equivalent C# code. But it turns out that's not enough!

So, what's the problem?

The converted code is very slow, especially for batch processing: where MF completes a job in, say, 30 minutes, the converted code takes 8 hours.

At this point usually someone appears and whispers in the ear:

Look, those old technologies are proven by time. It's worth sticking to good old COBOL, or better yet Assembler, if you want to do the real thing.

We're curious though: why is there a difference?

It turns out the issue lies in the difference in network topology between the MF and Azure solutions. On MF all programs, the database and file storage sit in virtually a single box, so network latency is negligible.

It's quite common to see chatty programs on MF that issue a lot of small SQL queries.

In Azure, programs, database and file storage are different services, almost certainly sitting in different physical boxes. You should be thankful if they are co-located in a single datacenter. So network latency immediately becomes a factor. Even if it adds just 1 millisecond per SQL roundtrip, it adds up in loops and turns into a showstopper.

There is no easy workaround on the hardware level.

People advise writing programs differently: "Tune applications and databases for performance in Azure SQL Database".

That's good advice for new development but discouraging for a migration done by a tool.

So, what is the way forward?

Well, there is one. While accepting Azure's weak sides, we can exploit its strong sides.

Parallel refactoring

Before continuing, let's consider code that demonstrates the problem:

  public void CreateReport(StringWriter writer)
  {
    var index = 0;

    foreach(var transaction in dataService.
      GetTransactions().
      OrderBy(item => (item.At, item.SourceAccountId)))
    {
      var sourceAccount = dataService.GetAccount(transaction.SourceAccountId);
      var targetAccount = transaction.TargetAccountId != null ?
        dataService.GetAccount(transaction.TargetAccountId) : null;

      ++index;

      if (index % 100 == 0)
      { 
        Console.WriteLine(index);
      }

      writer.WriteLine($"{index},{transaction.Id},{
        transaction.At},{transaction.Type},{transaction.Amount},{
        transaction.SourceAccountId},{sourceAccount?.Name},{
        transaction.TargetAccountId},{targetAccount?.Name}");
    }
  }

This loop queries transactions and, for each transaction, issues two more small queries to get the source and target accounts. The results are printed into a report.

If we assume a query latency of just 1 millisecond and run such code for 100K transactions, we easily arrive at 200+ seconds of execution.
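The estimate is plain arithmetic. A small sketch makes the assumptions explicit: two lookups per transaction at 1 ms each, and, for comparison, an idealized overlap of 100 iterations at a time. The overlap figure is an assumption of this sketch, not a measurement.

```csharp
using System;

// Back-of-the-envelope estimate; all figures are assumptions, not measurements.
const int transactions = 100_000;
const int lookupsPerTransaction = 2;            // source and target account
var latency = TimeSpan.FromMilliseconds(1);     // assumed cost of one SQL roundtrip

// Serial: every roundtrip waits for the previous one to finish.
var serial = latency * (transactions * lookupsPerTransaction);

// Idealized overlap of 100 iterations at a time
// (ignores scheduling overhead and database limits).
var overlapped = serial / 100;

Console.WriteLine($"serial: {serial.TotalSeconds} s, overlapped: {overlapped.TotalSeconds} s");
// serial: 200 s, overlapped: 2 s
```

The roundtrips themselves do not get cheaper; only the waiting overlaps.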

Reality turns out to be much worse. The program spends most of its lifetime waiting for database results, and an iteration doesn't start until all the work of the previous iterations is complete.

We can do better even without trying to rewrite all the code! Let's articulate our goals:

To make code fast.
To leave code recognizable.

The idea is to form two processing pipelines:

(a) one that processes data in parallel, out of order;
(b) another that processes data serially, in the original order.

Each pipeline may post sub-tasks to the other, so (a) runs its tasks in parallel and unordered, while (b) runs its tasks as if everything were running serially.

So, the parallel plan looks like this:

Queue a parallel sub-task (a) for each transaction.
A parallel sub-task in (a) reads the source and target accounts, and queues a serial sub-task (b), passing it the transaction and accounts.
The serial sub-task (b) increments the index and writes a report record.
Wait for all tasks to complete.

To reduce the burden of managing task pipelines we use Dataflow (Task Parallel Library), and encapsulate everything in a small wrapper.
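To see the two pipelines in isolation, here is a minimal sketch built directly on Dataflow blocks. It is not the wrapper from our project: the fake lookup, the degree of parallelism of 8 and the names lookup and report are assumptions made for illustration, and it assumes a recent .NET where System.Threading.Tasks.Dataflow is available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// (a) Parallel stage: up to 8 lookups in flight at once. EnsureOrdered (the default)
// makes results leave the block in the order the inputs were posted.
var lookup = new TransformBlock<int, (int Id, string Name)>(
  async id =>
  {
    await Task.Delay(Random.Shared.Next(1, 20)); // stand-in for a database roundtrip
    return (id, $"account-{id}");
  },
  new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8, EnsureOrdered = true });

var lines = new List<string>();
var index = 0;

// (b) Serial stage: an ActionBlock processes one item at a time by default,
// so index and lines need no locking.
var report = new ActionBlock<(int Id, string Name)>(item =>
{
  ++index;
  lines.Add($"{index},{item.Id},{item.Name}");
});

lookup.LinkTo(report, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var id in Enumerable.Range(1, 50))
{
  lookup.Post(id);
}

lookup.Complete();          // no more input
await report.Completion;    // wait until the serial stage has drained

Console.WriteLine(string.Join(Environment.NewLine, lines.Take(3)));
// 1,1,account-1
// 2,2,account-2
// 3,3,account-3
```

Even though the simulated lookups finish in random order, the report lines come out in the posting order, which is the property the serial pipeline relies on.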

Consider the refactored code:

  public void CreateReport(StringWriter writer)
  {
    using var parallel = new Parallel(options.Value.Parallelism); //     ⬅️ 1
    var index = 0;

    parallel.ForEachAsync( //     ⬅️ 2
      dataService.
        GetTransactions().
        OrderBy(item => (item.At, item.SourceAccountId)),
      transaction => //     ⬅️ 3
      {
        var sourceAccount = dataService.GetAccount(transaction.SourceAccountId);
        var targetAccount = transaction.TargetAccountId != null ?
          dataService.GetAccount(transaction.TargetAccountId) : null;

        parallel.PostSync(  //     ⬅️ 4
          (transaction, sourceAccount, targetAccount),  //     ⬅️ 5
          data =>
          {
            var (transaction, sourceAccount, targetAccount) = data; //     ⬅️ 6

            ++index;

            if (index % 100 == 0)
            {
              Console.WriteLine(index);
            }

            writer.WriteLine($"{index},{transaction.Id},{
              transaction.At},{transaction.Type},{transaction.Amount},{
              transaction.SourceAccountId},{sourceAccount?.Name},{
              transaction.TargetAccountId},{targetAccount?.Name}");
          });
      });
  }

Consider the ⬅️ points:

1. We create the Parallel utility class, passing the requested degree of parallelism.
2. We iterate transactions using parallel.ForEachAsync(), which queues a parallel sub-task for each transaction and then waits until all tasks are complete.
3. Each parallel sub-task receives a transaction. It may be called from a different thread.
4. Having received the required accounts, we queue a sub-task for synchronous execution using parallel.PostSync(), and
5. pass to it the data collected in the parallel sub-task: the transaction and accounts.
6. We deconstruct the passed data into variables, and then proceed with the serial logic.

What we achieve with this refactoring:

The top-level query that brings transactions is executed and iterated serially.
But each iteration body runs in parallel. By default we allow up to 100 parallel executions. These parallel sub-tasks do not wait on each other, so their waiting times do not add up.
Sync sub-tasks are queued and executed in the order of their serial appearance, so the increments and report records are subject neither to race conditions nor to reordering of output records.

We think that such refactored code is still recognizable.

As for performance, this is what the log shows:

Serial test
100
...
Execution time: 00:01:33.8152540

Parallel test
100
...
Execution time: 00:00:05.8705468

Reference

Please take a look at the project to understand the implementation details, in particular the Parallel class, which implements the API to post parallel and serial tasks, run loops and more.

Please continue reading on GitHub.

Friday, 26 January 2024 13:27:16 UTC

.NET | Thinking aloud | Tips and tricks

Using Bing chat as a tool.

As we wrote some time ago, we started to use AI and ML APIs in our applications, but this time we've tried to use Bing chat as a tool.

The initial task was to take a sequence of integer numbers and return a list of integer pairs, where the first number is a range's start and the second is the count of values in the range. For example, for the sequence 1, 2, 3, 5, 8, 9 we'd like to get the following result: (1, 3), (5, 1), (8, 2).

We both started to solve this problem. My brother started to play with SSMS, using his deep knowledge of SQL and analytic functions, whereas I started to play with Bing chat. Looking ahead, I can say that we both solved the problem using almost the same SQL statement at almost the same time.

Here are my prompts and Bing's answers:

Prompt:: Act as database analyst and software engineer. Given the table (T) with a numeric field (N), which values are orderd ranges of integers. In a range each next value is greater than previous one on 1. When value differs from the previous one more than on 1 then in belongs to another range. Generate select statement that returns rows with two fields. The first field (R) is a number from which a range starts and the second field (C) is count elements of that range:
Answer:: Sure, I can help with that. Here's a SQL query that should do the trick. This query uses a self-join to identify the start of each range, and then groups by the start of the range to count the number of elements.

```sql
SELECT
    start_of_range AS R,
    COUNT(*) AS C
FROM (
    SELECT
        N,
        N - ROW_NUMBER() OVER (ORDER BY N) AS start_of_range
    FROM T
) subquery
GROUP BY start_of_range
ORDER BY start_of_range;
```

In this query, `start_of_range` is calculated as the difference between the current value of `N` and its row number (when ordered by `N`). This difference will be the same for all `N` in the same range. Then we group by `start_of_range` to get the count of elements in each range. The outer query selects the start of each range (`R`) and the count of elements in that range (`C`).

Me, scratching the head, hmmm...
Question:: This is a bit incorrect SQL select. For example for values 1, 2, 3, 5, 8, 9 I'd like to get the following results: 1, 3 5, 1 8, 2 What a SQL select shall I use to get such results?
Answer:: I apologize for the confusion earlier. You're right, the previous query doesn't handle the scenario you've described correctly. Here's a corrected SQL query that should give you the desired output. This query uses a window function to compare each row with the previous one and identify the start of a new range. Then it groups by the start of the range to count the elements.

```sql
WITH ranges AS (
    SELECT
        N,
        CASE
            WHEN N - LAG(N, 1, N) OVER (ORDER BY N) > 1 THEN 1
            ELSE 0
        END AS is_start_of_range
    FROM T
),
range_groups AS (
    SELECT
        N,
        SUM(is_start_of_range) OVER (ORDER BY N) AS range_group
    FROM ranges
)
SELECT
    MIN(N) AS R,
    COUNT(*) AS C
FROM range_groups
GROUP BY range_group
ORDER BY R;
```

In this query, `is_start_of_range` is a flag that indicates the start of a new range (when the difference between the current and previous `N` is more than 1). `range_group` is a running total of `is_start_of_range` that serves as a unique identifier for each range. The outer query then groups by `range_group` to get the start (`R`) and count (`C`) of each range. I hope this helps! Let me know if you have any other questions.

That answer satisfied me more, and I checked it with the following example:


        with T as
        (
            select 10 N
            union all
            select 11 N
            union all
            select 30 N
            union all
            select 31 N
            union all
            select 32 N
            union all
            select 50 N
            union all
            select 80 N
            union all
            select 81 N
            union all
            select 82 N
            union all
            select 83 N
            union all
            select 90 N
        ),
        ranges AS (
            SELECT
                N,
                CASE
                    WHEN N - LAG(N, 1, N) OVER (ORDER BY N) > 1 THEN 1
                    ELSE 0
                END AS is_start_of_range
            FROM T
        ),
        range_groups AS (
            SELECT
                N,
                SUM(is_start_of_range) OVER (ORDER BY N) AS range_group
            FROM ranges
        )
        SELECT
            MIN(N) AS R,
            COUNT(*) AS C
        FROM range_groups
        GROUP BY range_group
        ORDER BY R;

The result was:


        R	C
        ------------
        10	2
        30	3
        50	1
        80	4
        90	1

Conclusion:
Nowadays, using AI as a tool, you may raise your performance to an expert level and, maybe, learn something new...

P.S. Don't fool yourself that AI can replace a developer (see the first answer of the chat), but together they can be much stronger than separately.

P.P.S. Our colleague gave us another interesting solution. She used an additional field and only ROW_NUMBER(), avoiding the LAG and running-sum analytic functions, and reached the same result:


        with T as
        (
            select 10 ID, 1 N
            union all
            select 11 ID, 1 N
            union all
            select 30 ID, 1 N
            union all
            select 31 ID, 1 N
            union all
            select 32 ID, 1 N
            union all
            select 50 ID, 1 N
            union all
            select 80 ID, 1 N
            union all
            select 81 ID, 1 N
            union all
            select 82 ID, 1 N
            union all
            select 83 ID, 1 N
            union all
            select 90 ID, 1 N
        ),
        Groups AS (
            SELECT
                ID,
                N,
                -- the difference stays constant within a run of consecutive IDs
                ROW_NUMBER() OVER (ORDER BY ID) - ID AS GroupNumber
            FROM
                T
        )
        SELECT
            MIN(ID) AS R,
            SUM(N) AS C
        FROM
            Groups
        GROUP BY
            GroupNumber
        ORDER BY
            R;
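
Why does subtracting the row number work, and why did the first Bing answer return a wrong R? The sketch below replays the trick in C#, with LINQ standing in for SQL. It assumes the values are distinct integers; the variable names are ours. The difference between a value and its 1-based position is the same for every member of a consecutive run, so it can serve as a grouping key, but it is only a key: the start of the range still has to be taken as the minimum of the group.

```csharp
using System;
using System.Linq;

int[] values = { 1, 2, 3, 5, 8, 9 };

var ranges = values
  .OrderBy(n => n)
  // ROW_NUMBER() is 1-based, hence i + 1.
  .Select((n, i) => (Value: n, Key: n - (i + 1)))
  // Keys for this input: 0, 0, 0, 1, 3, 3
  .GroupBy(x => x.Key)
  // Reporting the key as R reproduces the first, incorrect answer;
  // the minimum of the group is the real start of the range.
  .Select(g => (R: g.Min(x => x.Value), C: g.Count()))
  .OrderBy(r => r.R)
  .ToArray();

foreach (var (r, c) in ranges)
{
  Console.WriteLine($"{r}, {c}");
}
// 1, 3
// 5, 1
// 8, 2
```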

Monday, 01 January 2024 14:02:01 UTC

AI | SQL Server puzzle | Thinking aloud | Tips and tricks

Vector database

Many years ago we implemented an Akinator-like engine purely within SQL Server.

Today we use exactly the same technique to implement a vector database.

Please see our GitHub repo: vector-database.

Thursday, 22 June 2023 22:01:45 UTC

Announce | SQL Server puzzle | Thinking aloud

ChatGPT

For the last few days we have been playing with the OpenAI API and, out of pure interest, asked it for a few slogans for our team. As input we fed it the info from our "About us" page. One of the first slogans we got caught our eye:

Excellence Through Experience: Nesterovsky Bros.

ChatGPT is not a bad copywriter at all...

Thursday, 23 March 2023 11:30:03 UTC

Thinking aloud

xml vs json

XSLT is often thought of as a tool that takes input xml and runs a transformation to produce html or some xml on output. Our use case is more complex, and is closer to batch data mining of big data. Our transformation pipelines often take an hour or more to run, even with SSD disks and with CPU cores fully loaded with work.

So, we're looking for performance opportunities, and xml vs json might be promising.

Here are our hypotheses:

json is lighter than xml to serialize and deserialize;
json stored as map(*), array(*) and other item() values is lighter than node() at runtime; in particular, a subtree copy costs nothing in json;
templates with match patterns can be implemented efficiently with maps;
there is an incremental way forward from the use of xml to the use of json.

If it pays off we might switch from xml to json everywhere, even though it takes a development effort.

But to proceed we need to conduct an experiment to measure the processing speed of xml vs json in xslt.

Now our task is to find a small, isolated, representative sample to prove or reject our hypotheses.

It's better to start off with some existing transformation, and change it from using xml to json.

The question is whether there is such a candidate.

Saturday, 16 April 2022 19:03:04 UTC

Thinking aloud | xslt

Bad code

A couple of days ago, while integrating with someone's C# library, we had to debug it, as something went wrong. The code is big and obscure, but for integration purposes it's rather simple: you just create and call a class, that's all. Yet something just did not work. We had to prove that it was not our fault, as the other side was uncooperative and would not run a joint debug session to resolve the problem.

To simplify the matter as much as possible here is the case:

var input = ...
var x = new X();
var output = x.Execute(input);

You pass correct input, and get correct output. Simple, right? But it did not work! So, we delved into the foreign code, and this is what we have seen:

class X: Y
{
  public Output Execute(Input input)
  {
    return Perform(input);
  }

  protected override Output Run(Input input)
  { 
    ...

    return output;
  }
}

class Y: Z
{
  ...
}

class Z
{
  protected Output Perform(Input input)
  {
    return Run(input);
  }
        
  protected virtual Output Run(Input input)
  {
    return null;
  }
}

You see, the flow is still simple, right? We call X.Execute(), it calls Z.Perform(), which in turn calls the overridden X.Run() that returns the result.

But to our puzzlement we got null on output, as if Z.Run() was called!

We stepped through the code in the debugger and confirmed that Z.Perform() calls Z.Run(), even though the "this" instance is of type X.

How can it be? It's nonsense! Yet, no overridden method was ever called.

No matter how much scrutiny we applied to the sources of X and Z, it just did not work.

We verified that the signature of X.Run() matches the signature of Z.Run(), so it overrides the method.

Then what do we see here?

And then enlightenment came! Yes, X.Run() overrides the method, but what method?

We looked closely at class Y and, bingo, we saw the following there:

class Y: Z
{
  ...

  protected virtual Output Run(Input input)
  {
    return null;
  }
      
  ...
}

So, X.Run() overrides Y.Run() and not Z.Run()!

As far as .NET is concerned, Y.Run() and Z.Run() are two independent virtual methods, where Y.Run() in addition hides Z.Run().

The IDE even issued a warning that it's better to declare Y.Run() as:

protected new virtual Output Run(Input input)
{
  return null;
}

So, someone's code was plainly wrong: Y.Run() had to use override rather than virtual.
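
The effect is easy to reproduce in isolation. The sketch below is a simplified stand-in for the story, not the library's code: methods return strings instead of Output, take no input, and the FixedY and FixedX classes are added by us to show the corrected hierarchy.

```csharp
using System;

Console.WriteLine(new X().Execute());      // Z.Run: the override in X is never reached
Console.WriteLine(new FixedX().Execute()); // FixedX.Run

class Z
{
  protected string Perform() => Run();     // dispatches through Z's virtual slot
  protected virtual string Run() => "Z.Run";
}

class Y : Z
{
  // A second, independent virtual method that merely hides Z.Run().
  // The compiler warns here that the member hides the inherited one.
  protected virtual string Run() => "Y.Run";
}

class X : Y
{
  public string Execute() => Perform();
  protected override string Run() => "X.Run"; // overrides Y.Run(), not Z.Run()
}

class FixedY : Z
{
  protected override string Run() => "FixedY.Run"; // stays in Z's virtual slot
}

class FixedX : FixedY
{
  public string Execute() => Perform();
  protected override string Run() => "FixedX.Run";
}
```

Z.Perform() only knows about the virtual slot introduced by Z, and nothing below Y can override that slot once Y has hidden it.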

We won, right?

Well, it's hard to call it a win.

We spent an hour looking at someone's ugly code just to prove we're still sane.

So, what is the conclusion of this story?

We think here it is:

be cautious while looking at someone's code;
look at IDE warnings, don't disregard them, and try to resolve all of them in your code base;
try to write simple code.

Saturday, 15 January 2022 13:55:08 UTC

.NET | Thinking aloud

CosmosDB with EF Core

Lately we have been spending a great deal of time working with Azure's CosmosDB.

This is how it's defined:

"It is schema-agnostic, horizontally scalable, and generally classified as a NoSQL database."

This quote, rather unconfident in itself, is clarified further down as:

"The SQL API lets clients create, update and delete containers and items. Items can be queried with a read-only, JSON-friendly SQL dialect."

To be honest this SQL API made us favor CosmosDB.

So, we started a development with CosmosDB as a data storage.

The next development lesson, which we learned the hard way, is to refrain from clever techniques.

The lesson is simple: after you finish a project, provided it's not a toy, you hand it over to the people who will support it. You should think about those future developers before you insert some cleverness into your code.

With this common sense in mind we selected EF Core as the library that would serve as an interface between C# and the database.

Initially all went well, until we needed to store a list of strings as a document property and found it's not possible.

Why? - was a naive question.

The answer puzzled us a lot: because a string is not an "Entity" (whatever that means), and EF is a framework of entities.

You can dispute this argument as long as you like; it just does not work. It is especially bad if you need to store a class that you do not directly control, e.g. structures returned from other services.

The next pothole with EF appeared when we tried to run an innocent query that joins data within a document: e.g. a document contains items, and you want to query some items from some documents.

Guess what?

Right, EF Core does not support it.

Why?

Because!

Later we found that many other constructs and functions that you easily use in the SQL dialect of CosmosDB are not possible or not supported in EF Core.

We were very upset with those crutches and came to the conclusion that EF Core harms more than helps when you work with CosmosDB.

We went on and looked at how to work directly with the CosmosDB client, and found that it has all the features ready:

you may give it SQL and bind parameters;
you may convert result items to objects;
you may create, delete, update and query data;

So, do we need EF Core?

We answered, no.

This does not mean we reject the value of EF Core, but here our conclusion was that this API layer just complicated things instead of making them simpler. It might be that EF Core for CosmosDB is not mature enough at this time.

Wednesday, 20 October 2021 16:11:39 UTC

Azure | Thinking aloud

Binary serialization in .NET 5.0

Recently we have found that BinaryFormatter.Serialize and BinaryFormatter.Deserialize methods are marked as obsolete in .NET 5.0, and are declared dangerous:

The BinaryFormatter type is dangerous and is not recommended for data processing. Applications should stop using BinaryFormatter as soon as possible, even if they believe the data they're processing to be trustworthy. BinaryFormatter is insecure and can't be made secure.

See BinaryFormatter security guide for more details.

That guide, along with its links, goes on to expand on the problems BinaryFormatter poses. The scheme of dangerous use cases we have seen so far is like this:

two different sides communicate with each other;
one side supplies input in BinaryFormatter's format;
the other side reads the input using BinaryFormatter and instantiates classes.

A threat arises when the two sides cannot trust each other or cannot establish a trusted communication channel. In these cases malicious input can be supplied to the side reading the data, which might lead to unexpected code execution, denial of service, data exposure and other bad consequences.

Arguing like this, today's maintainers of .NET concluded that it's better to tear BinaryFormatter and similar APIs out of the framework.

Note that they don't claim that BinaryFormatter itself, or the Reflection API that it uses, is the core of the problem. They blame communication.

Spelling out clearly what the concerns are could help everyone to better understand how to address them. In the area of communication security there are a lot of ready solutions, like:

use signatures to avoid tampering with the data;
use encryption to avoid spying on the data;
use access rights to prevent even access to the data;
use secure communication channels.
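
As an illustration of the first item only, here is a minimal sketch of signing a serialized payload with an HMAC and checking the signature before the bytes are handed to any deserializer. The shared key, the payload and the helper names Sign and Verify are assumptions of this sketch; key distribution and storage are out of scope. It assumes .NET 6 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Computes an HMAC-SHA256 signature of the payload with a shared secret.
static byte[] Sign(byte[] secret, byte[] data) => HMACSHA256.HashData(secret, data);

// Compares signatures in constant time to avoid leaking timing information.
static bool Verify(byte[] secret, byte[] data, byte[] expected) =>
  CryptographicOperations.FixedTimeEquals(Sign(secret, data), expected);

// Hypothetical shared secret; in practice it would come from a key store.
byte[] key = RandomNumberGenerator.GetBytes(32);

// Stand-in for the bytes produced by a serializer.
byte[] payload = Encoding.UTF8.GetBytes("serialized object bytes");
byte[] signature = Sign(key, payload);

bool untouched = Verify(key, payload, signature);
Console.WriteLine(untouched); // True

payload[0] ^= 0xFF;           // simulate tampering in transit
bool tampered = Verify(key, payload, signature);
Console.WriteLine(tampered);  // False
```

The reader deserializes only when the check passes. This establishes that the data came from a holder of the key; it says nothing about whether that holder is itself trustworthy.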

We can surely state that without applying these solutions any other serialization format is just as unreliable and is subject to the same vulnerabilities.

After all, it looks like an attempt to throw the baby out with the bath water. The good news is that, thanks to the now modular structure of the .NET runtime, we're able to access binary serialization libraries, which are (and will be) available in NuGet repositories. So it's a futile effort to erase this useful API.

Tuesday, 02 February 2021 12:39:37 UTC

.NET | Thinking aloud

Collecting public data from Internet

Earlier we wrote that we've recently gotten a few tasks related to Machine Learning. A prerequisite for such tasks is to collect and prepare the input data. Usually the required data is scattered across public sites; some of it is in plain text format (or close to it), but the rest is accessible only as the output of public applications. To obtain the required data from such sites you have to navigate through pages, which often requires keeping state between navigations.

To implement this task you need some kind of crawler/scraper of websites. Fortunately, there are a lot of frameworks, libraries and tools in C# (and in other languages too) that allow you to do this (visit this or this site to see the most popular of them), for example:

ScrapySharp
ABot
HtmlAgilityPack
DotnetSpider

There are pros and cons to using these libraries. The most crucial con is a lack of support for rich UIs based on heavy client-side scripts and client-side state, since not all such libraries implement full browser emulation, and some of them do not even support JavaScript execution. So they are suitable for gathering information from simple web pages, but no library makes it easy to navigate to some page of a web application that keeps rich client-side state. Even the best of them, like ScrapySharp, require heavy programming to achieve the result.

Then, suddenly, we recalled that for several years already we have been using Selenium and web drivers to automate web tests for AngularJS/Angular projects. After a short discussion we came to the conclusion that there is no big difference between testing a web application and collecting data, since one of the testing stages is collecting actual results (data) from the tested page, and our tests usually consist of chains of actions performed on consecutively visited pages.

This way we came to the idea of using the WebDriver API implemented by the Selenium project. There are implementations of this API in different languages, C# included.

Using WebDriver we easily implement cumbersome navigation of a complex web application and can collect the required data. Moreover, WebDriver can run in headless (screenless) mode. Some of its features allow you to take snapshots of the virtual screen and to store the HTML source that results from JavaScript execution. These features are very useful during run-time troubleshooting. To create a complex web application navigation we need only a bit more knowledge than a usual user of the web application: we need to identify page elements somehow, for example by CSS selectors or by the id of HTML elements (as we do for tests). All the rest, like cookies, view state (if any), values of hidden fields and some JavaScript events, is handled transparently in this case.

Although one may say that the approach with Selenium is rather fat, it ought to be mentioned that it is rather scalable. You may either run several threads with a different WebDriver instance in each thread, or run several processes simultaneously.

However, besides the pros there are cons in the solution with Selenium. They appear when you decide to publish it, e.g. to an Azure environment. Note that the approach with Selenium requires a browser on the server. There is also a problem with Azure itself, as it's Microsoft's platform and Selenium is a product of their main competitor Google... So, some issues aren't technical. The only possible solution is to use the PaaS approach instead of SaaS, but in this case you have to support everything by yourself...

The other problem is that if your application crawls rather aggressively, either the servers you gather data from or your own host might ban it. So be gentle, play nice, and implement delays between requests.
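
A delay between requests can be as simple as the sketch below. The CrawlPolitelyAsync helper, the fetch delegate and the 200 ms pause are hypothetical; the delegate stands in for whatever actually loads a page (a WebDriver navigation, an HttpClient call), so the sketch itself touches no network.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Visits pages one at a time, pausing after each request.
static async Task<List<string>> CrawlPolitelyAsync(
  IEnumerable<string> urls,
  Func<string, Task<string>> fetch,  // hypothetical page loader supplied by the caller
  TimeSpan delay)
{
  var pages = new List<string>();

  foreach (var url in urls)
  {
    pages.Add(await fetch(url));
    await Task.Delay(delay);         // be gentle: wait before the next request
  }

  return pages;
}

var watch = Stopwatch.StartNew();

var result = await CrawlPolitelyAsync(
  new[] { "https://example.com/a", "https://example.com/b" },
  url => Task.FromResult($"<html>{url}</html>"), // fake fetcher for the demo
  TimeSpan.FromMilliseconds(200));

Console.WriteLine($"{result.Count} pages in {watch.ElapsedMilliseconds} ms");
```

A fixed pause is the crudest policy; a site's terms or its robots.txt may call for something more conservative.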

Also, take into account that implementing any crawler may raise legal problems, since not all websites allow you to pull anything you want. Many sites have terms & conditions that define rules for the site's users (which your crawler should follow too); otherwise legal action may be taken against the users (or against the owners, in the case of a crawler). There is a very interesting article that describes many pitfalls of implementing your own crawler.

To summarize everything we said earlier, the Selenium project can be used in many scenarios, and one of them is to create a powerful crawler.

Friday, 01 January 2021 14:34:37 UTC

ML.NET | Thinking aloud | Tips and tricks

Big Angular applications

While doing Cool:GEN migrations to Java and C# we produce rather big Angular applications.

Everything is fine: the server runs REST APIs, and the client is an Angular application with a component per window, dialog or screen. The only problem is with the word big.

We observe that enterprises that used Cool:GEN to develop their workflows come to the migration stage with applications containing thousands of windows. In simple cases, after assessment, clients are able to split their monolithic workflow into a set of independent applications. But even then we are dealing with Angular applications containing hundreds to many thousands of components.

Now, let's look at the Angular world. Best practices advise (and actually almost force) you to use Ahead Of Time, Ivy compilation of all components and their templates.

A naive attempt to build such a monolithic Angular application will most surely fail. Angular uses nodejs for the build, and the chances of nodejs running out of memory during ng build are close to 100%.

You may fight this by throwing a capable build machine at it, with 16 or better 32GB of RAM, and instructing nodejs to use all of it.

Well, it's a rather poor, brute-force way of dealing with scale problems, but it works.

The next hurdle you run into is time. We know it might take days just to build it.

You may ask why?

Well, Angular is doing its best to validate templates!

Unfortunately the only viable workaround is to switch this nice feature off for such a big project.

With such a setup you're able to build the Angular project in just 20-30 minutes!

Well, this is big progress if you compare a complete failure with something that at least passes the build.

But what's next?

Sure, there are further problems:

scripts, both development and production, are of nonsensical size: several dozen megabytes for production, and an even higher number for development;
ng serve eats even more memory and builds even longer, making a nightmare out of development and support of such an application;
startup of such an application, if it starts at all, is very slow.

So, what can be done? How can we create a manageable Angular application containing that many components?

Angular advises lazy-loading feature modules.

That's reasonable advice!

We can create coarse-grained modules containing subsets of components, or fine-grained modules containing one component.

So, does it help?

Yes, but it does not solve all problems:

ng build and ng serve are still very slow;
build produces multiple small scripts that are loaded on demand, so at least application works in browser.

Yet another important problem is still there: we have multiple strictly separated server REST controllers with components that represent them.

On the server side we have Java or C# REST controllers hosting business logic. They are spread over multiple projects, probably managed by separate teams, and probably kept in separate Git repositories (or whatever). On the client side we have a fat Angular project storing everything in a single source repository.

This is not going to work from a management perspective.

So the next step is to try to split the fat Angular project into multiple small projects. Let's say we shall have 100 small Angular libraries combined by a master project.

This is not going to work either, due to the nature of npm projects, as it will require terabytes of node_modules folders cloned for each library, and many hours to build each of them.

It seems there is room for improvement in the npm area. There is no point in making dedicated copies of node_modules. It's much easier to have a local cache of artifacts.

So, what is the direction? How do we create a big Angular project after all?

What we have here is:

a big enterprise-level application;
it is modular, but modules must work together to form the desired workflow;
different modules are supported by different teams (both server and client side);
client Angular components correspond to REST controllers on the server;
all pages use the same styles and the same set of UI controls.

Looking from this perspective all development can be seen as:

development and support of unified styles and ui components that must be reused through the application;
development of server side and REST controllers that implement business logic;
development of templates of components (note that components themselves do nothing except expose their templates).

Studying this design suggests an important and independent role for templates, just like in AngularJS!

In contrast, in Angular templates are only a tool used by components. It's not obvious how to use thousands of templates without first building thousands of components; nor is it obvious how to dynamically host those templates (routes do not help here).

Though not obvious, it's still possible, though it requires a somewhat lower-level API than tutorials suggest. The ingredients are:

use of Just In Time (in contrast to Ahead Of Time) compilation, and use of View Engine (in contrast to Ivy);
use of ViewContainerRef to host components dynamically;
dynamic components and modules that you can create on demand using templates loaded e.g. through HttpClient.

To keep things short, we shall show an example of dynamic components in the next article.

Here we emphasize that such a design allows us to create a small Angular application that builds in under 20 seconds, with component templates served along with the REST controllers and stored in the same Git repository.

So if, say, you have a server subproject exposing a REST controller in the form api/area/MyComponent, then its template may be exposed as the resource page/area/MyComponent. Templates are loaded and compiled on demand at runtime, thus making the application light. At the same time templates may be cached in the browser cache, thus reducing the number of roundtrips to the server.

Tuesday, 29 December 2020 18:15:25 UTC

Angular | AngularJS | Thinking aloud

Groups in SQL

Recently a colleague turned to us and asked for help with a complex query.

It turned out that the complex part was to understand what he wanted to achieve.

After listening to him we formulated the task in our own words and confirmed that this was what he wanted.

So, here is the task in our formulation:

Assume you have events.
Each event acts upon one or more accounts.
Find all events that act on the same set of accounts.
Note that we deal with multiple millions of events and accounts.

Data is defined like this:

create table dbo.Event
(
  EventID bigint not null,
  AccountID varchar(18) not null,
  primary key(EventID, AccountID)
);

The requested query turned out to be very simple, yet not as simple as one would think, as it has to account for a big amount of data:

with D as
(
  select * from dbo.Event
),
S as
(
  select
    EventID,
    count(*) Items,
    checksum_agg(checksum(AccountID)) Hash
  from
    D
  group by
    EventID
)
select
  S1.EventID, S2.EventID
from
  S S1
  inner join
  S S2
  on
    S1.EventID < S2.EventID and
    S1.Items = S2.Items and
    S1.Hash = S2.Hash and
    not exists
    (
      select AccountID from D where EventID = S1.EventID
      except
      select AccountID from D where EventID = S2.EventID
    );

The idea is to:

calculate a hash derived from list of accounts for each group;
join groups with the same hash;
verify that matched groups fit perfectly.
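
To make the three steps concrete outside of SQL, here is a minimal in-memory sketch in Java. It is only an illustration, not the query above: the class and method names are made up, and XOR of String.hashCode() is our assumption standing in for an order-independent aggregate such as checksum_agg(checksum(AccountID)).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of the original query.
final class EventGroups
{
  /**
   * Returns pairs {e1, e2}, e1 < e2, of events acting on equal account sets.
   * The input maps an event id to the set of its account ids.
   */
  static List<long[]> matchingPairs(Map<Long, Set<String>> events)
  {
    // Ascending event ids, so that pairs come out as e1 < e2.
    Map<Long, Set<String>> sorted = new TreeMap<>(events);

    // Step 1: bucket events by (item count, order-independent hash).
    Map<List<Integer>, List<Long>> buckets = new HashMap<>();

    for (Map.Entry<Long, Set<String>> entry : sorted.entrySet())
    {
      int hash = 0;

      for (String account : entry.getValue())
      {
        hash ^= account.hashCode(); // XOR does not depend on order
      }

      buckets.
        computeIfAbsent(
          List.of(entry.getValue().size(), hash),
          key -> new ArrayList<>()).
        add(entry.getKey());
    }

    // Steps 2 and 3: compare events only within a bucket, then verify
    // that the sets are really equal (hashes may collide).
    List<long[]> pairs = new ArrayList<>();

    for (List<Long> bucket : buckets.values())
    {
      for (int i = 0; i < bucket.size(); i++)
      {
        for (int j = i + 1; j < bucket.size(); j++)
        {
          long e1 = bucket.get(i);
          long e2 = bucket.get(j);

          if (events.get(e1).equals(events.get(e2)))
          {
            pairs.add(new long[] { e1, e2 });
          }
        }
      }
    }

    return pairs;
  }
}
```

The point of the bucket key is the same as in the join condition: the expensive set comparison runs only for candidates that already agree on count and hash.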

An even simpler solution that does not use hashes is not scalable, as its running time is worse than O(N^2), where N is the number of events. Its time is unacceptable already for N ~1e4, to say nothing of N ~1e7.

At this point our colleague was already satisfied, as he got the result in a couple of minutes for a task that he could not even formalize as SQL.

But we felt it could be even better.

We looked at statistics:

with D as
(
  select * from dbo.Event
),
S as
(
  select
    EventID,
    count(*) Items
  from
    D
  group by
    EventID
)
select
  Items, count(*) EventCount
from
  S
group by
  Items
order by
  EventCount desc;

and saw that most of the events, about 90%, deal with a single account, and all the others with two or more (some of them act upon a big number of accounts).

The nature of the dataset hinted at a more verbose but faster query:

with D as
(
  select * from dbo.Event
),
S as
(
  select
    EventID,
    min(AccountID) AccountID,
    count(*) Items,
    checksum_agg(checksum(AccountID)) Hash
  from
    D
  group by
    EventID
)
select
  S1.EventID, S2.EventID
from
  S S1
  inner join
  S S2
  on
    S1.EventID < S2.EventID and
    S1.Items = 1 and
    S2.Items = 1 and
    S1.AccountID = S2.AccountID
union all
select
  S1.EventID, S2.EventID
from
  S S1
  inner join
  S S2
  on
    S1.EventID < S2.EventID and
    S1.Items > 1 and
    S2.Items > 1 and
    S1.Items = S2.Items and
    S1.Hash = S2.Hash and
    not exists
    (
      select AccountID from D where EventID = S1.EventID
      except
      select AccountID from D where EventID = S2.EventID
    );

This query produced results in twenty seconds instead of a couple of minutes for a dataset with ~1e7 rows.

Wednesday, 05 August 2020 07:44:07 UTC

SQL Server puzzle | Thinking aloud | Tips and tricks

Algorithm for Biconnected components

While working on an algorithm to trace biconnected components for the Graph API in XSLT we realized that we had implemented it unconventionally.

The pseudocode in Wikipedia is:

GetArticulationPoints(i, d)
    visited[i] := true
    depth[i] := d
    low[i] := d
    childCount := 0
    isArticulation := false

    for each ni in adj[i] do
        if not visited[ni] then
            parent[ni] := i
            GetArticulationPoints(ni, d + 1)
            childCount := childCount + 1
            if low[ni] ≥ depth[i] then
                isArticulation := true
            low[i] := Min (low[i], low[ni])
        else if ni ≠ parent[i] then
            low[i] := Min (low[i], depth[ni])
    if (parent[i] ≠ null and isArticulation) or (parent[i] = null and childCount > 1) then
        Output i as articulation point

That algorithm is based on the fact that a connected graph can be represented as a tree of biconnected components. The vertices that join the components in such a tree are called articulation points. The implementation deals with the depth of each vertex, and with a lowpoint parameter that is also related to vertex depth during Depth-First-Search.

Out of interest we approached the problem from a different perspective. A vertex is an articulation point if it has neighbors that cannot be combined into a path not containing this vertex. Like the classical algorithm, we use Depth-First-Search to navigate the graph, but in contrast we collect cycles that pass through each vertex. If during the back pass of Depth-First-Search we find no cycle from a "child" to an "ancestor" then the vertex is necessarily an articulation point.

Here is pseudocode:

GetArticulationPoints(v, p) -> result
    index = index + 1
    visited[v] = index 
    result = index
    articulation = p = null ? -1 : 0

    for each n in neighbors of v except p do
        if visited[n] = 0 then
            nresult = GetArticulationPoints(n, v)
            result = min(result, nresult)

            if nresult >= visited[v] then
                articulation = articulation + 1
        else
            result = min(result, visited[n])

    if articulation > 0 then
        Output v as articulation point
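
For readers who prefer running code, the pseudocode above can be sketched in Java as follows. This is our illustration only: vertices are assumed to be numbered 0..n-1, the graph is assumed to be a simple undirected graph given as adjacency lists, and the class name is made up. Recursion depth equals the depth of the search, so very deep graphs would need an explicit stack.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class illustrating the pseudocode above.
final class ArticulationPoints
{
  private final List<List<Integer>> adjacency;
  private final int[] visited;            // 0 means "not visited yet"
  private final Set<Integer> points = new TreeSet<>();
  private int index;

  private ArticulationPoints(List<List<Integer>> adjacency)
  {
    this.adjacency = adjacency;
    this.visited = new int[adjacency.size()];
  }

  /** Returns articulation points of an undirected graph. */
  static Set<Integer> find(List<List<Integer>> adjacency)
  {
    ArticulationPoints state = new ArticulationPoints(adjacency);

    for (int v = 0; v < adjacency.size(); v++)
    {
      if (state.visited[v] == 0)
      {
        state.visit(v, -1); // -1 plays the role of a null parent
      }
    }

    return state.points;
  }

  /** Returns the smallest visit index reachable from v's subtree. */
  private int visit(int v, int parent)
  {
    visited[v] = ++index;

    int result = visited[v];
    // A root needs two independent children to be an articulation point.
    int articulation = parent < 0 ? -1 : 0;

    for (int n : adjacency.get(v))
    {
      if (n == parent)
      {
        continue;
      }

      if (visited[n] == 0)
      {
        int nresult = visit(n, v);

        result = Math.min(result, nresult);

        // No cycle from this child's subtree reaches above v.
        if (nresult >= visited[v])
        {
          articulation++;
        }
      }
      else
      {
        result = Math.min(result, visited[n]);
      }
    }

    if (articulation > 0)
    {
      points.add(v);
    }

    return result;
  }
}
```

For a path 0 - 1 - 2 - 3 the inner vertices 1 and 2 are reported, while a triangle yields no articulation points.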

The complexity of the two algorithms is the same.

What is interesting is that we see no obvious way to transform one algorithm into the other, other than starting again from graph theory.

More is on Wiki.

Sunday, 24 May 2020 12:15:02 UTC

Thinking aloud | xslt

On XSLT 4

Michael Kay's "A Proposal for XSLT 4.0" has sparked our interest in what could be added or changed in XSLT. So we decided to implement a Graph API purely in XSLT. Our goal was:

to prove that it's possible to provide an efficient implementation of different graph algorithms in XSLT;
to build the Graph API in such a way that an engine could provide native implementations of graph algorithms;
to find through experiments what could be added to XSLT as a language.

At present we can confirm that the first two goals are reachable; and experiments have shown that XSLT could do more to help make programs better, e.g. we have seen that the language could simplify coding of loops.

Graph algorithms are often expressed with while loops, e.g. "Dijkstra's algorithm" has:

while Q is not empty:
    u ← vertex in Q with min dist[u]

The body is executed while the condition is satisfied, but the condition is affected by the body itself.

In xslt 3.0 we did this with simple recursion:

<xsl:template name="f:while" as="item()*">
  <xsl:param name="condition" as="function(item()*) as xs:boolean"/>
  <xsl:param name="action" as="function(item()*) as item()*"/>
  <xsl:param name="next" as="function(item()*, item()*) as item()*"/>
  <xsl:param name="state" as="item()*"/>

  <xsl:if test="$condition($state)">
    <xsl:variable name="items" as="item()*" select="$action($state)"/>

    <xsl:sequence select="$items"/>

    <xsl:call-template name="f:while">
      <xsl:with-param name="condition" select="$condition"/>
      <xsl:with-param name="action" select="$action"/>
      <xsl:with-param name="next" select="$next"/>
      <xsl:with-param name="state" select="$next($state, $items)"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
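
For comparison, the same contract (condition, action, next, state) can be sketched in Java, where an ordinary loop replaces the recursion. The class and method names are our own, chosen only to mirror the parameters of the template above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical Java counterpart of the f:while template.
final class Loops
{
  /**
   * While condition holds for the state: runs action, collects its items,
   * and computes the next state from the current state and those items.
   */
  static <S, T> List<T> whileLoop(
    Predicate<S> condition,
    Function<S, List<T>> action,
    BiFunction<S, List<T>, S> next,
    S state)
  {
    List<T> result = new ArrayList<>();

    while (condition.test(state))
    {
      List<T> items = action.apply(state);

      result.addAll(items);
      state = next.apply(state, items);
    }

    return result;
  }
}
```

A call such as Loops.<Integer, Integer>whileLoop(s -> s <= 3, s -> List.of(s * s), (s, items) -> s + 1, 1) collects the squares of 1, 2 and 3.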

But here is the point. It could be done in a more comprehensible way, e.g. by letting xsl:iterate without select loop until xsl:break is reached.

<xsl:iterate>
  <xsl:param name="name" as="..." select="..."/>
  
  <xsl:if test="...">
    <xsl:break/>
  </xsl:if>

  ...
</xsl:iterate>

So, what we propose is to make xsl:iterate/@select optional, and to change the behavior of the processor when the attribute is missing from a compilation error to a valid behavior. This should not affect any existing valid XSLT 3.0 program.

Tuesday, 19 May 2020 07:00:25 UTC

Thinking aloud | xslt

Graphs in XSLT

Recently we read the article "A Proposal for XSLT 4.0", and thought it worthwhile to suggest one more idea. We wrote a message to Michael Kay, the author of the proposal. Here it is:

A&V
Historically xslt, xquery and xpath have dealt with trees. Nowadays it has become much more common to process graphs. Many tasks can be formulated in terms of graphs, and in particular any task processing trees is also a graph task.

I suggest taking a deeper look in this direction.

As an inspiration I may suggest looking at "P1709R2: Graph Library" - the C++ proposal.

Michael Kay
I have for many years found it frustrating that XML is confined to hierarchic relationships (things like IDREF and XLink are clumsy workarounds); also the fact that the arbitrary division of data into "documents" plays such a decisive role: documents should only exist in the serialized representation of the model, not in the model itself.

I started my career working with the Codasyl-defined network data model. It's a fine and very flexible data model; its downfall was the (DOM-like) procedural navigation language. So I've often wondered what one could do trying to re-invent the Codasyl model in a more modern idiom, coupling it with an XPath-like declarative access language extended to handle networks (graphs) rather than hierarchies.

I've no idea how close a reinvention of Codasyl would be to some of the modern graph data models; it would be interesting to see. The other interesting aspect of this is whether you can make it work for schema-less data.

But I don't think that would be an incremental evolution of XSLT; I think it would be something completely new.

A&V
I was not so radical in my thoughts.

Even the C++ API is not so radical, as it does not impose hard requirements on the internal graph representation but rather defines a template API that works both with third party representations (they even mention Fortran) and with several built-in implementations that use standard vectors.

Their strong point is in the algorithms provided as part of the library and not in the graph's internal structure (I think the authors of that paper have not structured it in the best way). E.g. in the second part they list graph algorithms: Depth First Search (DFS); Breadth First Search (BFS); Topological Sort (TopoSort); Shortest Paths Algorithms; Dijkstra Algorithms; and so on.

If we try to map it to the xpath world then a graph on the API level might be represented as a user function or as a map of user functions.

On the storage level a user may implement a graph using a sequence of maps or a map of maps, or even using xdm elements.

So, my approach is evolutionary. In fact I suggest a pure API that could even be implemented now.

Michael Kay
Yes, there's certainly scope for graph-oriented functions such as closure($origin, $function) and is-reachable($origin, $function) and find-path($origin, $destination, $function) where we use the existing data model, treating any item as a node in a graph, and representing the arcs using functions. There are a few complications, e.g. what's the identity comparison between arbitrary items, but it can probably be done.
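
To illustrate the idea of arcs represented by a function, here is a small Java sketch of closure and is-reachable. It is not any existing XPath API: the names are borrowed from the message above, identity of items is assumed to be plain equals()/hashCode(), and including the origin in its own closure is our assumption.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helpers: any item is a vertex, arcs are given by a function.
final class GraphFunctions
{
  /** All items reachable from origin (origin included), in discovery order. */
  static <T> Set<T> closure(T origin, Function<T, Collection<T>> arcs)
  {
    Set<T> seen = new LinkedHashSet<>();
    Deque<T> pending = new ArrayDeque<>();

    seen.add(origin);
    pending.add(origin);

    while (!pending.isEmpty())
    {
      for (T next : arcs.apply(pending.remove()))
      {
        // Set.add() returns false for an item already seen,
        // which is where the identity comparison comes in.
        if (seen.add(next))
        {
          pending.add(next);
        }
      }
    }

    return seen;
  }

  /** True if destination can be reached from origin by following arcs. */
  static <T> boolean isReachable(
    T origin,
    T destination,
    Function<T, Collection<T>> arcs)
  {
    return closure(origin, arcs).contains(destination);
  }
}
```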

A&V
> There are a few complications, e.g. what's the identity comparison between arbitrary items, but it can probably be done.

One approach to address this is through the definition of a graph API, e.g. to define a graph as a map (interface analogy) of functions, with equality functions, if required:
map
{
  vertices: function(),
  edges: function(),
  value: function(vertex),
  in-vertex: function(edge),
  out-vertex: function(edge),
  edges: function(vertex),
  is-in-vertex: function(edge, vertex),
  is-out-vertex: function(edge, vertex)
  ...
}
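
Taking the interface analogy literally, the map above could look like this in Java. This is a sketch under assumptions: the method names mirror the map keys, sameVertex is a made-up name for the optional equality function, and we assume that in-vertex is the vertex an edge starts from and out-vertex the one it leads to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java rendering of the "map of functions" above.
interface Graph<V, E>
{
  List<V> vertices();
  List<E> edges();
  V inVertex(E edge);   // assumed: the vertex the edge starts from
  V outVertex(E edge);  // assumed: the vertex the edge leads to

  // Equality hook: the graph itself decides when two vertices are the same.
  default boolean sameVertex(V first, V second)
  {
    return first.equals(second);
  }

  // Edges touching a vertex, derived from the functions above.
  default List<E> edges(V vertex)
  {
    List<E> result = new ArrayList<>();

    for (E edge : edges())
    {
      if (sameVertex(inVertex(edge), vertex) ||
        sameVertex(outVertex(edge), vertex))
      {
        result.add(edge);
      }
    }

    return result;
  }
}

// One possible storage: vertices are strings, an edge is a {from, to} pair.
final class PairGraph implements Graph<String, String[]>
{
  private final List<String> vertices;
  private final List<String[]> edges;

  PairGraph(List<String> vertices, List<String[]> edges)
  {
    this.vertices = vertices;
    this.edges = edges;
  }

  public List<String> vertices() { return vertices; }
  public List<String[]> edges() { return edges; }
  public String inVertex(String[] edge) { return edge[0]; }
  public String outVertex(String[] edge) { return edge[1]; }
}
```

Algorithms written against Graph do not care whether the storage is a list of pairs, a map of maps, or something an engine provides natively.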

Not sure how far this will go but who knows.

Tuesday, 12 May 2020 06:08:51 UTC

Thinking aloud | xslt

COBOL - raise from the dead or RIP?

It's no secret that the COVID-19 epidemic will change our world significantly. It affects economics and public services, especially the social services of our countries. We saw this in our country, and now the same happens in the US. Probably the same thing happens in all countries all over the world that suffer from COVID-19.

Nowadays such services are usually exposed online, but nobody expected such an extreme load on them. And they start to crumble under that load. Programs start to crash... and somebody has to fix them. It's just a temporary technical inconvenience when there is staff familiar with such programs and technologies, but what about the situation when programs and technologies are obsolete? When the staff that may support them are about to retire due to age, when the knowledge is almost lost... It's very scary when such applications rule very important spheres of our life such as social services, finance, medicine, defence etc. Something similar happened in the US, so the US government asked IBM for help with their systems written in COBOL.

Probably, in the short term, they'll close the gaps, but taking into account that the epidemic won't dissolve in a month, there is a risk of staying in the same hole... There are two ways to solve this issue in the long term:

to make COBOL a widely used programming language and to teach enough programmers to use it. This is exactly what IBM tries to do, see this article, but this is a way to nowhere, since the language is not too popular among software developers.
to migrate such applications to new technologies and new platforms (e.g. Java or C# on UNIX/Windows). In this case organizations obtain scalable applications and the ability to find people who can fix and modernize such applications step by step, in spare time, without losing existing functionality. This is what our company Advanced may provide. And we are not alone. There are many companies that can implement such a migration with a high level of quality.

And many professionals (even those that deal with COBOL on a day-to-day basis) think that only the 2nd way is viable. Let's see what will happen... For more about the issue, see here.

Wednesday, 15 April 2020 06:00:51 UTC

Thinking aloud

Angular vs AngularJS. Practical comparison.

People compare these two technologies, and it seems an established fact that Angular is the evolutionarily more advanced framework. We're not going to contradict this; on the contrary, we agree with it, but it's better for an opinion to be grounded on facts that one can evaluate and verify.

Fortunately we got a chance to make such a comparison.

We support conversions of Cool:GEN (a legacy CASE tool with roots in the 80s) to Java or C#. In its time Cool:GEN made it possible to greatly automate enterprise development using Mainframes as the server side and Mainframe terminals or Win32 GUIs as clients.

The legacy of this tool is probably hundreds of business and database models, and millions of programs generated in COBOL on Mainframes and in C or Java on Windows and Linux. All this runs to this day in many economic sectors.

Usually the client is some enterprise that has invested a lot into design, development and support of its business model using Cool:GEN, but now most such clients are trying not to lose this legacy but to convert it into something that keeps pace with today's technologies.

As the original technology is sound, it is possible to map it to today's Java or C# on the server, REST or SOAP as a transport, and Angular, AngularJS or some other framework on the client. Such automatic conversion is the essence of our conversion efforts.

To understand the scope, consider a typical enterprise client that has 2-3 thousand windows backed by 20-30 thousand programs.

Now, consider that the conversion is done. On output, among other things, we produce a clean Java or C# web application with REST and SOAP interfaces, and an Angular or AngularJS web client that encapsulates those 2-3 thousand windows.

Each window definition is rather small, 5-10 KB in html form, but the whole mass of windows takes 10-30 MB, which is not small any more.

For AngularJS we generate just those html templates, but for Angular we need to generate a separate component for each window, which includes a typescript class, a template and a style.

While the amount of generated resources for AngularJS stays within those 10-30 MB, generated Angular takes at least 5-10 MB more.

The next step is build.

AngularJS builds a distribution that includes all used libraries and a set of templates, and it takes something like a minute of CPU time. The produced output is about 300 KB of minified script and those 10-30 MB of templates (multiple files of 5-10 KB each).

Angular (here we talk about version 9) builds a distribution that includes all used libraries and a set of compiled components that are to be loaded lazily on demand. With the out-of-the-box Angular builder, which performs tree shaking, the build takes days. With tree shaking off it takes 40 minutes. This is the first notable difference. The produced output for ES2015 (latest javascript) is about 1 MB, plus 15-100 KB per compiled component. This is the second notable difference, which already impacts the end user rather than the developer.

The third difference is in the end user experience. Though we have built equivalent Angular and AngularJS frontends, we observe that the load time of Angular is higher. This cannot be ascribed only to bigger file sizes. It seems internal initialization also takes more time for Angular.

So, our experience in this particular test shows that Angular has more room to improve, in particular in compile time, bundle size, runtime speed and simplicity of dynamic loading (we have strong cases when template compilation is not the best approach).

Saturday, 04 April 2020 12:37:15 UTC

AngularJS | Java | Thinking aloud | Tips and tricks

Theory vs practice

We were asked to help with a search service in one enterprise. We were told that their SharePoint portal does not serve their needs. The main complaints were about the quality of search results.

They had decided to implement an external index of SharePoint content, using Elastic, and to expose a custom search API within the enterprise.

We questioned their conclusions, asked why they thought Elastic would give much better results, and asked whether they had tried to figure out why SharePoint did not give the desired results.

The answers did not convince us, though we joined the project.

What do you think? Elastic did not help at all, though they hoped very much that its query language would help to rank results in such a way that matching documents would be found. After all, they thought it was a problem of ranking of results.

Here we started our analysis. We took a specific document that must be found but is never returned from search.

It turned out to be a well known problem; at least we dealt with a closely related one in the past. There are two ingredients here:

documents that have low chances to be found are PDFs;
we live in Israel, so most texts are Hebrew, which means words are written from right to left, while some other texts from left to right. See Bi-directional text.

Traditionally PDF documents are produced in a way that only distantly resembles the logical structure of the original content. E.g., paragraphs of text are often represented as unrelated runs of text lines, or as sets of text runs representing single words, or even independent characters. Needless to say, an additional complication is that Hebrew text is often stored in visual order (from left to right, as if "hello" were stored as "olleh" and just printed from right to left). Another common feature of PDFs is custom fonts with non-canonical mappings, or images with glyphs of letters.

You can implement these tricks in other document formats, but for some reason PDF is the only format we have seen that uses these techniques regularly and intensively.

At this point we realized that it is not the fault of the search engine that it cannot find the document, but a feature of the PDF, which exposes its text to a crawler in a form that cannot be used for search. In fact, a PDF viewer cannot search in such documents either: when you try to find some text that you clearly see in the document opened in a PDF viewer, you often find nothing.

A question. What should you do when no PDF text extractor can give you the correct text, yet the text is there when you look at the document in a PDF viewer?

We decided it was time to go in the direction of image recognition. Thankfully, nowadays it's a matter of available processing resources.

Our goal was:

Have images of each PDF page. This task is immediately solved with Apache PDFBox (A Java PDF Library) - it's time to say that this is a Java project.
Run Optical Character Recognition (OCR) over the images, and get the extracted texts. This is perfectly done by tesseract-ocr/tesseract, and thanks to its Java wrapper bytedeco/javacpp-presets we can call this C++ API directly from Java.

The only small nuisance of tesseract is that it does not expose table recognition info, but we can easily overcome it (we solved this task in the past), as along with each text run tesseract exposes its position.
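
To show how positions alone can be enough to recover table rows, here is a minimal Java sketch. It is not the code from our project and it calls no tesseract API: the Run class is a made-up holder for a recognized text and its bounding box, and the rule "a run belongs to the current row if its vertical center lies above the row's bottom" is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: rebuilds table rows from positioned text runs.
final class TableRows
{
  /** A recognized text with the top-left corner and height of its box. */
  static final class Run
  {
    final String text;
    final int x;
    final int y;
    final int height;

    Run(String text, int x, int y, int height)
    {
      this.text = text;
      this.x = x;
      this.y = y;
      this.height = height;
    }
  }

  /** Groups runs into rows (top to bottom), cells ordered left to right. */
  static List<List<String>> rows(List<Run> runs)
  {
    List<Run> sorted = new ArrayList<>(runs);

    sorted.sort(Comparator.comparingInt((Run run) -> run.y));

    List<List<Run>> rows = new ArrayList<>();
    int bottom = 0;

    for (Run run : sorted)
    {
      int center = run.y + run.height / 2;

      // Start a new row when the run lies below the current row band.
      if (rows.isEmpty() || center > bottom)
      {
        rows.add(new ArrayList<>());
        bottom = run.y + run.height;
      }

      rows.get(rows.size() - 1).add(run);
    }

    List<List<String>> result = new ArrayList<>();

    for (List<Run> row : rows)
    {
      // For right-to-left tables the order would have to be reversed.
      row.sort(Comparator.comparingInt((Run run) -> run.x));

      List<String> cells = new ArrayList<>();

      for (Run run : row)
      {
        cells.add(run.text);
      }

      result.add(cells);
    }

    return result;
  }
}
```

Real pages need more care (skewed scans, cells spanning several lines), but the principle stays the same: geometry substitutes for the missing table structure.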

What are the results of running such a program?

Full success! It works with high quality of recognition. Indeed, there is no physical noise to impact quality.
Slow speed - up to several seconds per recognition per page.
Scalable solution. Slow speed can be compensated by almost unlimited theoretical scalability.

So, what is the lesson we have taken from this experience?

Well, you should question yourself, and test and verify ideas on the ground before building any theories that will lead you in a completely wrong direction. In the end people started to realize there was no need to blame SharePoint, to throw it away, and to spend a great deal of time and money just to prove that the problem was in a different place.

A sample source code can be found at App.java

Friday, 26 July 2019 16:38:11 UTC

C++ | Java | Thinking aloud | Tips and tricks

Scheduling algorithm for xsl:for-each/@saxon:threads=N

This story started half a year ago when Michael Kay, the author of the Saxon XSLT processor, was dealing with performance in a multithreaded environment. See Bug #3958.

The problem is like this.

Given XSLT:

<xsl:stylesheet exclude-result-prefixes="#all" 
  version="3.0" 
  xmlns:saxon="http://saxon.sf.net/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:template name="main">
    <xsl:for-each saxon:threads="4" select="1 to 10">
      <xsl:choose>
        <xsl:when test=". eq 1">
          <!-- Will take 10 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/10')?url"/>
        </xsl:when>
        <xsl:when test=". eq 5">
          <!-- Will take 9 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/9')?url"/>
        </xsl:when>
        <xsl:when test=". eq 10">
          <!-- Will take 8 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/8')?url"/>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
    <xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Implement the engine so as to achieve the best performance of parallel for-each.

A naive implementation that distributes iterations evenly among threads will run into unfair load on the threads, so some load-balancing is required. That was the case in Saxon EE.

Michael Kay was trying to find the most elegant way to implement this and wrote the comment:

I can't help feeling that the answer to this must lie in using the Streams machinery, and Spliterators in particular. I've spent another hour or so reading all about Spliterators, and I have to confess I really don't understand the paradigm. If someone can enlighten me, please go ahead...

We decided to take the challenge and to model the expected behavior using Streams. Here is our go:

import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.function.Consumer;
import java.util.function.Function;

public class Streams
{
  public static class Item<T>
  {
    public Item(int index, T data)
    {
      this.index = index;
      this.data = data;
    }
    
    int index;
    T data;
  }

  public static void main(String[] args)
  {
    run(
      "Sequential",
      input(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel ordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel unordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      false);    
  }
  
  private static void run(
    String description,
    Stream<Item<String>> input,
    Function<Item<String>, String[]> action,
    Consumer<String[]> output,
    boolean ordered)
  {
    System.out.println(description);
    
    long start = System.currentTimeMillis();
   
    if (ordered)
    {
      input.map(action).forEachOrdered(output);
    }
    else
    {
      input.map(action).forEach(output);
    }
    
    long end = System.currentTimeMillis();
    
    System.out.println("Execution time: " + (end - start) + "ms.");
    System.out.println();
  }
  
  private static Stream<Item<String>> input()
  {
    return IntStream.range(0, 10).
      mapToObj(i -> new Item<String>(i + 1, "Data " + (i + 1)));
  }
  
  private static String[] action(Item<String> item)
  {
    switch(item.index)
    {
      case 1:
      {
        sleep(10);
        
        break;
      }
      case 5:
      {
        sleep(9);
        
        break;
      }
      case 10:
      {
        sleep(8);
        
        break;
      }
      default:
      {
        sleep(1);
        
        break;
      }
    }
    
    String[] result = { "data:", item.data, "index:", item.index + "" };
    
    return result;
  }
  
  private synchronized static void output(String[] value)
  {
    boolean first = true;
    
    for(String item: value)
    {
      if (first)
      {
        first = false;
      }
      else
      {
        System.out.print(' ');
      }
    
      System.out.print(item);
    }

    System.out.println();
  }
  
  private static void sleep(int seconds)
  {
    try
    {
      Thread.sleep(seconds * 1000);
    }
    catch(InterruptedException e)
    {
      throw new IllegalStateException(e);
    }
  }
}

We model three cases:

"Sequential"

slowest, single threaded execution with output:

data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 34009ms.

"Parallel ordered"

fast, multithread execution preserving order, with output:

data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 10019ms.

"Parallel unordered"

fastest, multithread execution not preserving order, with output:

data: Data 6 index: 6
data: Data 2 index: 2
data: Data 4 index: 4
data: Data 3 index: 3
data: Data 9 index: 9
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 5 index: 5
data: Data 10 index: 10
data: Data 1 index: 1
Execution time: 10001ms.
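
The Streams model above leaves the number of threads to the common pool. As a complement, here is a minimal sketch of how an explicit limit, like saxon:threads="4", could be modeled with a fixed thread pool while preserving order. This is our illustration and not Saxon's implementation; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical model of an ordered for-each limited to N threads.
final class BoundedForEach
{
  /** Applies action to every item on at most `threads` threads. */
  static <T, R> List<R> run(List<T> items, int threads, Function<T, R> action)
  {
    ExecutorService pool = Executors.newFixedThreadPool(threads);

    try
    {
      // One task per item: an idle thread takes the next task from the
      // shared queue, so a slow item occupies one thread only.
      List<Future<R>> futures = new ArrayList<>();

      for (T item : items)
      {
        Callable<R> task = () -> action.apply(item);

        futures.add(pool.submit(task));
      }

      List<R> results = new ArrayList<>();

      for (Future<R> future : futures)
      {
        results.add(future.get()); // waits in input order
      }

      return results;
    }
    catch (InterruptedException | ExecutionException e)
    {
      throw new IllegalStateException(e);
    }
    finally
    {
      pool.shutdown();
    }
  }
}
```

Load balancing comes for free from the shared task queue, in contrast to assigning a fixed slice of iterations to each thread in advance.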

What we can add in conclusion is that an xslt engine could try to decide automatically which approach to use, as many SQL engines do, rather than force the developer to go into low level engine details.

Sunday, 24 March 2019 07:52:02 UTC

Java | Thinking aloud | xslt

XPath through evolution

Recently we observed how we solved the same task in different versions of XPath: 2.0, 3.0, and 3.1.

Say you have a sequence $items, and you want to call some function over each item of the sequence, and to return the combined result.

In XPath 2.0 this was solved like this:

for $item in $items return
  f:func($item)

In XPath 3.0 this was solved like this:

$items!f:func(.)

And now with XPath 3.1 that defined an arrow operator => we attempted to write something as simple as:

$items=>f:func()

That definitely does not work, as it is the same as f:func($items).

Next attempt was:

$items!=>f:func()

That does not even compile.

So, finally, working expression using => looks like this:

$items!(.=>f:func())

This looks like a step back compared to the XPath 3.0 variant.

More than that, the XPath grammar of the arrow operator forbids the use of predicates, axes or mapping operators, so these won't compile:

$items!(.=>f:func()[1])

$items!(.=>f:func()!something)

Our conclusion is that the arrow operator is a rather confusing addition to XPath.

Saturday, 03 November 2018 20:59:28 UTC

Thinking aloud | xslt

Xslt Streamability

Xslt 3.0 defines a feature called streamability: a technique to write xslt code that is able to handle arbitrarily sized inputs.

This contrasts with conventional xslt code (and xslt engines) where inputs are completely loaded in memory.

To make code streamable a developer should declare her code as such, and the code should pass streamability analysis.

The goal is to define a subset of xslt/xpath operations that allows processing the input in one pass.

In a simple case it's indeed a simple task to verify that code is streamable, but the more complex your code is, the less trivial it is to ascertain that it is streamable.

On the forums we have seen a lot of discussions where experts were trying to figure out whether a particular xslt is streamable. At times it's a remarkably non-trivial task!

This, in our opinion, clearly shows that the feature is largely a failed attempt to inscribe an optimization technique into the xslt spec.

The place of such an optimization is in the implementation space, and not in the spec. An engine should attempt such an optimization and fall back to the traditional implementation when it is not possible.

The latest such example is: Getting SXST0060 "No streamable path found in expression" when trying to push a map with grounded nodes to a template of a streamable mode, where both the xslt code author and the engine developers are not sure that the code is streamable in the first place.

By the way, besides streamability there is another optimization technique that works probably in all SQL engines. When data does not fit into memory, the engine may spill it to disk, thus trading memory pressure for disk access. So, why hasn't such a technique found its way into the Xslt or SQL specs?

Tuesday, 02 October 2018 12:50:22 UTC

Thinking aloud | xslt

C++ Exceptions

For more than 25 years a discussion about exceptions has continued in the C++ community. In our opinion this can only be compared with the math community and its open problems, like Hilbert's 23 problems dating from 1900.

In essence the C++ exception discussion is about the efficiency of exceptions vs status codes. This problem is not so acute in other languages (like Java or C#) because those languages postulate different goals.

C++ designers have introduced a zero-overhead principle for any language feature, which is:

If you don’t use some feature you don’t pay for it.
If you do use it you cannot (reasonably) write it more efficiently by hand.

Exceptions, compared to status codes, do not meet this demand. This led to the fragmentation of the C++ community, where many big projects and code styles ban exceptions partially or completely.

Make no mistake: all this time people were trying to make exceptions better, and have found techniques to make them space and run time efficient to some extent, but still, plain old status codes outperform them both in speed (especially in predictability of the timing of error handling logic) and in code size.

We guess the solution has finally been found after a quarter of a century of discussion!

WG paper: Zero-overhead deterministic exceptions: Throwing values by Herb Sutter. This "paper aims to extend C++’s exception model to let functions declare that they throw a statically specified type by value. This lets the exception handling implementation be exactly as efficient and deterministic as a local return by value, with zero dynamic or non-local overheads."

In other words, the author suggests to:

extend the exception model (in fact, implement an additional one);
make exceptions as fast and as predictable as status codes (which virtually means designating a status code as a new kind of exception).

Here are the author's arguments:

A status code is usually just a number, and handling an error amounts to performing some if or switch statements.
Handling errors with status codes is predictable in time and visible in the code, though it burdens the logic very much.
Exceptions give a clear separation between control flow and error handling.
Exceptions are mostly invisible in the code, so it's not always clear how much they add to code size and how they impact performance.
Statistics show that exceptions add 15 to 30% to the size of code, even if no exceptions are used (a violation of the zero-overhead principle).
Exceptions require Run Time Type Information about types, and have no predictable memory (stack or heap) footprint (a violation of the zero-overhead principle).

What the author suggests might seem too radical, but at present it's the only viable solution to reestablish the zero-overhead principle and to reunite the two C++ camps.

Here is the approach.

Clarify what should be considered as an exception.
1. Contract violation.
  Are contract violations, like invalid argument values or broken postconditions (invariants that do not hold), exceptions or programmer's bugs?
  
  If the latter, then it's best to terminate, as you cannot correctly recover from a bug.
2. Virtual Machine fault.
  What can a user program do about a stack overflow?
  
  The best, according to the author, is to terminate.
3. OOM - Out Of Memory error.
  What is the best way to deal with OOM during dynamic allocation?
  
  At present there are two operators:
  - new - to allocate memory dynamically and throw bad_alloc on failure.
  - new(nothrow) - to allocate memory dynamically and return nullptr on failure.
  Herb Sutter suggests changing the behavior of new to terminate on failure (it is very hard to handle bad_alloc properly anyway), while new(nothrow) will still allow building code that can deal with OOM.
4. Partial success.
  This should never be reported as an error, and status codes should be used to report the state.
5. Error condition distinct from any type of success.
  This is where exceptions should be used.
Statistics show that with such a separation more than 90% of what currently is an exception will not be an exception any more, so no hidden exception logic is required: a program either works or terminates.
Refactor exception.
Redefine what an exception object is and how it is propagated.

It should be a thin value type. At minimum it needs to contain an error code. The suggested size is up to a couple of pointers.

A compiler should be able to cheaply allocate and copy it on the stack or even in the processor's registers.

Such a simple exception type resolves problems with the lifetime of the exception object, and makes exception handling as light as checking status codes.

An exception should be propagated through the return channel, so it's like a new calling convention that defines either a function result or an error outcome.

It's not our intention to quote the whole paper here. If you're interested, please read it. Here we want to outline our concerns.

Exception payload.
The paper emphasizes that the exception type should be small.

So, what to do with an exception payload, if any (at least an error message, if it's not a static text)?

If this is not resolved, developers will start to invent custom mechanisms like GetLastErrorMessage().

And what to do with aggregate exceptions?

We think this must be clearly addressed.
Implementation shift.
We can accept that most of the current exceptions will terminate.

But consider now some container that serves requests, like a web container or a database.

It may be built from multiple components and serve multiple requests concurrently. If one request terminates, we don't want the whole container to terminate.

If the terminate handler is called, we cannot rely on the state of the application. At the least we can expect heap leaks and unreleased resources.

So, we either want to be able to release heap and other resources per request, or we want to go down with the whole process and let the OS deal with it.

In the latter case we need to design such containers differently: as a set of cooperating processes; the OS should allow spinning up processes more easily.
VM with exceptions.
There are Virtual Machines that allow an exception to be thrown on each instruction (like JVM or CLI).

In this case you cannot tell that code will never throw an exception, as it can, out of the blue!

Even on x86 you can get a page fault on memory access, which can be translated into an exception.

So, it's still a question whether the terminate() solution is sound in every case, and whether a compiler can optimize out exception handling if it proves statically that no exception can be thrown.

Monday, 06 August 2018 12:00:10 UTC

C++ | Thinking aloud

SQL Server vs DB2

We often deal with different SQL databases, in particular DB2, Oracle, and SQL Server, and this is what we have found lately.

Our client has reported a problem with an SQL insert into DB2:

the subject table has a small number of columns but a large number of rows;
the insert should attempt to insert a row but tolerate a duplicate.

The prototype table is like this:

create table Link(FromID int, ToID int, primary key(FromID, ToID));

DB2 SQL insert is like this:

insert into Link(FromID, ToID)
values(1, 2)
except
select FromID, ToID from Link;

The idea is to have an empty row set to insert if there is a duplicate.

The SQL Server variant looks like this:

insert into Link(FromID, ToID)
select 1, 2
except
select FromID, ToID from Link;

The client reported ridiculously slow performance of this SQL, due to a table scan used to calculate the result of the except operator.

Out of interest we performed the same experiment with SQL Server, and found that the execution plan is optimal: an index seek is used to check for duplicates.

The only reasonable way of dealing with this peculiarity of DB2, apart from trying to insert and handling the duplicate exception, was to qualify the select with a where clause:

insert into Link(FromID, ToID)
values(1, 2)
except
select FromID, ToID from Link where FromID = 1 and ToID = 2;

We think DB2 could do better.

Saturday, 14 April 2018 19:38:20 UTC

SQL Server puzzle | Thinking aloud

Some fun with java switch

Today we wanted to write some code in Java that performs one action or another depending on a condition. At the same time, if the first action fails, we wanted to fall back to the other action.

We've written it like this:

switch(boolean_expression)
{
  case true:
  {
    try
    {
      // Some actions.
      break;
    }
    catch(Exception e)
    {
      // Fall back to false route. 
    }
  }
  case false:
  {
    // Other actions.
    break;
  }
}

The fun part is that it's not valid Java code.

Why?

The answer can be found in spec: 14.11. The switch Statement

The type of the Expression must be char, byte, short, int, Character, Byte, Short, Integer, String, or an enum type (§8.9), or a compile-time error occurs.

But why?

Who knows...

Sure, there are workarounds, even with switch, but it's just an unjustified restriction...
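
One such workaround can be sketched without switch at all. This is only a minimal sketch: the class SwitchFallback and the method choose() are hypothetical names introduced for illustration, and we assume that each route can be expressed as a Supplier.

```java
import java.util.function.Supplier;

final class SwitchFallback {
  /**
   * Runs the "true" route when the condition holds, and falls back to the
   * "false" route when the condition is false or the "true" route throws.
   */
  static <T> T choose(boolean condition, Supplier<T> whenTrue, Supplier<T> whenFalse) {
    if (condition) {
      try {
        return whenTrue.get();
      } catch (Exception e) {
        // Fall back to the false route.
      }
    }

    return whenFalse.get();
  }

  public static void main(String[] args) {
    String ok = choose(true, () -> "true route", () -> "false route");
    String failed = choose(
      true,
      () -> { throw new IllegalStateException("true route failed"); },
      () -> "false route");
    String other = choose(false, () -> "true route", () -> "false route");

    System.out.println(ok + ", " + failed + ", " + other);
  }
}
```

The fall-through that the switch was supposed to give is expressed here by simply leaving the if block, so no boolean switch is required.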

Tuesday, 22 August 2017 15:18:29 UTC

Java | Thinking aloud

Saxon HE map and array types.

We have found that Saxon HE 9.7.0-18 has finally exposed partial support for map and array item types. So, now you can encapsulate your data in a sequence of maps or arrays, rather than keeping a single flat sequence and treating odd and even elements specially.

Basic example is:

<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="t"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  exclude-result-prefixes="xs t map">

  <xsl:template match="/">
    <xsl:variable name="map" as="map(xs:string, xs:string)" select="
      map 
      {
        'Su': 'Sunday',
        'Mo': 'Monday',
        'Tu': 'Tuesday',
        'We': 'Wednesday',
        'Th': 'Thursday',
        'Fr': 'Friday',
        'Sa': 'Saturday'
      }"/>
      
    <xsl:message select="map:keys($map)"/>
  </xsl:template>  

</xsl:stylesheet>

A list of map functions can be found at http://www.w3.org/2005/xpath-functions/map/, though not all of them are available, as Saxon HE still does not allow inline functions.

P.S. From the development perspective it's a great pity that Saxon HE is so limited: basically to XSLT 2.0 plus some selected parts of 3.0.

Tuesday, 16 May 2017 06:20:48 UTC

Thinking aloud | xslt

View on tunnel parameters in XSLT

Lately we do not program in XSLT too often but rather in java, C#, SQL and javascript, but from time to time we have tasks in XSLT.

People claim that those languages are too different and use this argument to explain why XSLT is only a niche language. We, on the other hand, often spot similarities between them.

So, what is it in other languages that corresponds to tunnel parameters in XSLT?

To get an answer we recalled how they work in XSLT. You:

define a template with parameters marked as tunnel="yes";
use these parameters the same way as regular parameters;
pass template parameters down to other templates marking them as tunnel="yes";

The important difference between regular and tunnel parameters is that tunnel parameters are implicitly passed down the call chain of templates. This means that you:

define your API that is expected to receive some parameter;
pass these parameters somewhere high in the stack, or override them later in the stack chain;
do not bother to propagate them (you might not even know all of the tunnel parameters passed, so encapsulation is in action);

As a result we have a template with some parameters passed explicitly, and some others receiving values from somewhere, usually not from the direct caller. It’s possible to say that these tunnel parameters are injected into a template call. This closely resembles injection APIs in other languages, where you configure that some parameters are prepared for you by some container rather than by the direct caller.

Now that we have expressed this idea it seems so obvious, but before we thought of it we did not realize that tunnel parameters in XSLT and Dependency Injection in other languages are the same thing.
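
To make the resemblance more tangible, here is a minimal Java sketch of a value that is bound high in the call stack and consumed deep inside it, without intermediate callers passing it along. The classes Tunnel and Report are hypothetical and serve illustration only; the sketch uses a plain ThreadLocal as an ambient context, which is a simplification and not a real DI container.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class Report {
  // Knows nothing about "language" and does not propagate it.
  static String page() {
    return "[" + section() + "]";
  }

  // Receives "language" from whoever bound it higher in the stack.
  static String section() {
    return "language=" + Tunnel.get("language");
  }

  public static void main(String[] args) {
    String outer = Tunnel.<String>with("language", "en", Report::page);
    String overridden = Tunnel.<String>with("language", "en",
      () -> Tunnel.<String>with("language", "fr", Report::page));

    System.out.println(outer + " " + overridden);
  }
}

/** Hypothetical ambient context that plays the role of tunnel parameters. */
final class Tunnel {
  private static final ThreadLocal<Map<String, Object>> CURRENT =
    ThreadLocal.withInitial(HashMap::new);

  /** Binds a value for the duration of body, like xsl:with-param tunnel="yes". */
  static <T> T with(String name, Object value, Supplier<T> body) {
    Map<String, Object> outer = CURRENT.get();
    Map<String, Object> inner = new HashMap<>(outer);

    inner.put(name, value);
    CURRENT.set(inner);

    try {
      return body.get();
    } finally {
      // Restore the bindings of the caller.
      CURRENT.set(outer);
    }
  }

  /** Reads a value bound somewhere up the call chain, like xsl:param tunnel="yes". */
  static Object get(String name) {
    return CURRENT.get().get(name);
  }
}
```

Note that page() plays the role of an intermediate template: it neither declares nor forwards the parameter, yet section() still sees the value, and an inner binding overrides an outer one.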

Sunday, 26 March 2017 04:21:36 UTC

Thinking aloud | xslt

Dependency Injection

Our genuine love is C++. Unfortunately clients don't always share our preferences, so we are mostly occupied with C#, Java and JavaScript. Nevertheless, we closely watch the evolution of C++. It has become more mature in the latest specs.

Recently we wondered how we would deal with dependency injection in C++. What we found only strengthened our commitment to C++.

Parameter packs introduced in C++11 allow a trivial implementation of constructor injection, while std::type_index, std::type_info and std::any give service containers.

In fact there are many DI implementations out there. The one we refer to here is Boost.DI. It's not standard, nor can we claim it's the best, but it's a good example of how this concept can be implemented.

So, consider their example as seen in Java with CDI, in C# with .NET Core injection, and in C++:

Java:

@Dependent
public class Renderer 
{
  @Inject @Device
  private int device;
};

@Dependent
public class View 
{
  @Inject @Title
  private String title;
  @Inject
  private Renderer renderer;
};

@Dependent
public class Model {};

@Dependent
public class Controller 
{
  @Inject
  private Model model;
  @Inject
  private View view;
};

@Dependent
public class User {};

@Dependent
public class App 
{
  @Inject
  private Controller controller;
  @Inject
  private User user;
};

...
  Provider<App> provider = ...

  App app = provider.get();

C#:

public class RendererOptions
{
  public int Device { get; set; }
}

public class ViewOptions
{
  public string Title { get; set; }
}

public class Renderer
{
  public Renderer(IOptions<RendererOptions> options)
  {
    // IOptions<T> exposes the configured instance through Value.
    Device = options.Value.Device;
  }

  public int Device { get; set; }
}

public class View
{
  public View(IOptions<ViewOptions> options, Renderer renderer)
  {
    Title = options.Value.Title;
    Renderer = renderer;
  }

  public string Title { get; set; }
  public Renderer Renderer { get; set; }
}

public class Model {}

public class Controller 
{
  public Controller(Model model, View view) 
  {
    Model = model;
    View = view;
  }

  public Model Model { get; set; }
  public View View { get; set; }
};

public class User {};

public class App 
{
  public App(Controller controller, User user) 
  {
    Controller = controller;
    User = user;
  }

  public Controller Controller { get; set; }
  public User User { get; set; }
};

...
  IServiceProvider serviceProvider = ...

  serviceProvider.GetService<App>();

C++:

#include <string>

#include <boost/di.hpp>

namespace di = boost::di;

struct renderer 
{
  int device;
};

class view 
{
public:
  view(std::string title, const renderer&) {}
};

class model {};

class controller 
{
public:
  controller(model&, view&) {}
};

class user {};

class app 
{
public:
  app(controller&, user&) {}
};

int main()
{
  /**
   * renderer renderer_;
   * view view_{"", renderer_};
   * model model_;
   * controller controller_{model_, view_};
   * user user_;
   * app app_{controller_, user_};
   */

  auto injector = di::make_injector();
  injector.create<app>();
}

What is different between these DI flavors?

Not too much from the perspective of the final task achieved.

In Java we used member injection, with qualifiers to inject scalars.

In C# we used constructor injection with the Options pattern to inject scalars.

In C++ we used constructor injection with direct constants injected.

All technologies have their own API to initialize the DI container but, again, while the API is different, the idea is the same.

So, the expressiveness of C++ matches that of Java and C#.

Deeper analysis shows that Java's CDI is more feature-rich than the DI of C# and C++ but, personally, we consider it an advantage of C# and C++ that they have such a light DI.

At the same time there is an important difference between C++ on one side and Java and C# on the other.

While both Java and C# are bound to use reflection (C# in theory could use on-the-fly code generation to avoid reflection), C++'s DI constructs and injects services natively.

What does it mean for the user?

Well, a lot! Both in Java and in C# you would not want to use DI in a performance-critical part of code (e.g. in a tight loop), while it's OK in C++ due to the near-zero performance impact of DI. This may result in more modular and performant code in C++.

Thursday, 06 October 2016 11:27:42 UTC

.NET | C++ | Java | Thinking aloud

ASP.NET Core Session

While reading about ASP.NET Core Session and analyzing the differences from previous versions of ASP.NET, we bumped into a problem...

At Managing Application State they note:

Session is non-locking, so if two requests both attempt to modify the contents of session, the last one will win. Further, Session is implemented as a coherent session, which means that all of the contents are stored together. This means that if two requests are modifying different parts of the session (different keys), they may still impact each other.

This is different from previous versions of ASP.NET, where session was blocking: if you had multiple concurrent requests to the session, all requests were synchronized. So, you could keep a consistent state.

In ASP.NET Core you have no built-in means to keep the state of the session consistent. Even assurances that the session is coherent do not help in any way.

Your options are:

build your own synchronization to deal with this problem (e.g. around the database);
decree that your application cannot handle concurrent requests to the same session, so a client should not attempt it, otherwise the behaviour is undefined.

Wednesday, 28 September 2016 19:22:15 UTC

.NET | ASP.NET | Thinking aloud

Dao for programmers.

8 Ways to Become a Better Coder is a good article. Read it and apply it to yourself. Never mind what your occupation is: replace "coder" with your profession. It suits everybody who wants to be the best.

Monday, 29 February 2016 18:33:04 UTC

Thinking aloud

Pull visitor pattern

The visitor pattern is often used to separate an operation from the object graph it operates on. Here we assume that the reader is familiar with the subject.

The idea is like this:

The operation over the object graph is implemented as a type called Visitor.
The Visitor defines methods for each type of object in the graph, which are called during traversal of the graph.
Traversal of the graph is implemented by a type called Traverser, or by the Visitor, or by each object type in the graph.

The implementation should collect, aggregate or perform other actions while visiting objects in the graph, so that at the end of the visit the purpose of the operation is complete.

Such an implementation is push-like: you create an operation object and call a method that gets an object graph on input and returns the operation result on output.

In the past we often dealt with big graphs (usually virtual graphs backed by a database or a file system).

Also, having strong experience in XSLT, we see that the visitor pattern in OOP maps directly onto the xsl:template and xsl:apply-templates technique.

Another thought was that in XML processing there are two camps:

SAX (push-like) - those who process XML in callbacks, which is very similar to the visitor pattern; and
XML Reader (pull-like) - those who pull XML components from a source, and then iterate and process them.

As with SAX vs XML Reader or, more generally, push vs pull processing models, there is no single best one. One or the other is preferable in particular circumstances. E.g. a pull-like component fits into a transformation pipeline where one pull component has another as its source; another example is when one needs to process two sources at once, which is nontrivial with the push-like model. On the other hand, push processing fits better into the Reduce part of the MapReduce pattern, where you need to accumulate results from a source.

So, our idea was to complement the classic push-like visitor pattern with an example of a pull-like implementation.

For the demonstration we have selected the Java language and the simplest boolean expression calculator.

Please follow GitHub nesterovsky-bros/VisitorPattern to see the detailed explanation.
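
To give a feeling of the pull-like style, here is a minimal sketch of a boolean expression calculator. It is not the code from the GitHub project: the types Expr, PostOrder and PullCalculator are hypothetical names chosen for this illustration. The traversal is exposed as an Iterator, so the caller drives the loop instead of receiving callbacks.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Evaluates a boolean expression by pulling nodes from a traversal. */
final class PullCalculator {
  static boolean evaluate(Expr root) {
    Deque<Boolean> values = new ArrayDeque<>();
    Iterator<Expr> nodes = new PostOrder(root);

    // The caller drives the loop; operands always arrive before their operator.
    while (nodes.hasNext()) {
      Expr node = nodes.next();

      switch (node.kind) {
        case CONST: values.push(node.value); break;
        case NOT: values.push(!values.pop()); break;
        case AND: values.push(values.pop() & values.pop()); break;
        case OR: values.push(values.pop() | values.pop()); break;
      }
    }

    return values.pop();
  }

  public static void main(String[] args) {
    // (false or true) and not false
    Expr expr = Expr.of(Expr.Kind.AND,
      Expr.of(Expr.Kind.OR, Expr.constant(false), Expr.constant(true)),
      Expr.of(Expr.Kind.NOT, Expr.constant(false)));

    System.out.println(evaluate(expr));
  }
}

/** Hypothetical expression node: a constant, a negation, or a binary AND/OR. */
final class Expr {
  enum Kind { CONST, NOT, AND, OR }

  final Kind kind;
  final boolean value;   // meaningful for CONST only
  final List<Expr> args; // operands of NOT, AND and OR

  private Expr(Kind kind, boolean value, List<Expr> args) {
    this.kind = kind;
    this.value = value;
    this.args = args;
  }

  static Expr constant(boolean value) {
    return new Expr(Kind.CONST, value, Collections.<Expr>emptyList());
  }

  static Expr of(Kind kind, Expr... args) {
    return new Expr(kind, false, Arrays.asList(args));
  }
}

/** Pull-style traversal: yields nodes in post-order, one per next() call. */
final class PostOrder implements Iterator<Expr> {
  private static final class Frame {
    final Expr node;
    int next; // index of the next operand to descend into

    Frame(Expr node) { this.node = node; }
  }

  private final Deque<Frame> stack = new ArrayDeque<>();

  PostOrder(Expr root) { stack.push(new Frame(root)); }

  @Override
  public boolean hasNext() { return !stack.isEmpty(); }

  @Override
  public Expr next() {
    if (stack.isEmpty()) throw new NoSuchElementException();

    // Descend until the top frame has no operands left, then emit its node.
    while (stack.peek().next < stack.peek().node.args.size()) {
      Frame top = stack.peek();

      stack.push(new Frame(top.node.args.get(top.next++)));
    }

    return stack.pop().node;
  }
}
```

Because PostOrder is an ordinary Iterator, it can serve as a source for another pull component, or be advanced side by side with a second traversal, which are the cases where the pull model is convenient.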

Tuesday, 09 February 2016 12:37:10 UTC

Java | Thinking aloud | xslt

Error during transformation in Saxon 9.7 - Continue

Essence of the problem (see Error during transformation in Saxon 9.7, thread on forum):

An XPath engine may arbitrarily reorder predicates whose expressions do not depend on the context position.
While the XPath expression $N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")] cannot raise an error if it's evaluated from left to right, the expression with reordered predicates $N[xs:date(@x) gt xs:date("2000-01-01")][@x castable as xs:date] may raise an error when @x is not an xs:date.

To avoid a potential problem one should rewrite the expression like this: $N[if (@x castable as xs:date) then xs:date(@x) gt xs:date("2000-01-01") else false()].

Please note that the following rewrite will not work: $N[(@x castable as xs:date) and (xs:date(@x) gt xs:date("2000-01-01"))], as the arguments of an and expression can be evaluated in any order, and an error that occurs during evaluation of any argument may be propagated.
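
The effect of reordering can be mimicked outside of XPath. The following Java sketch is only an analogy: the class PredicateOrder and its methods are hypothetical, integers stand in for dates, and the "reordering" is done by hand by swapping two filters, whereas in XPath it is the engine that is allowed to do it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PredicateOrder {
  /** Plays the role of "@x castable as xs:integer". */
  static boolean castable(String s) {
    try {
      Integer.parseInt(s);

      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  /** Plays the role of "xs:integer(@x) gt 2000"; throws on a non-number. */
  static boolean greater(String s) {
    return Integer.parseInt(s) > 2000;
  }

  /** Two consecutive predicates, applied in the given order. */
  static List<String> filter(
    List<String> items,
    Predicate<String> first,
    Predicate<String> second) {
    return items.stream().
      filter(first).
      filter(second).
      collect(Collectors.toList());
  }

  /** A single guarded predicate: the order of the two checks is fixed. */
  static List<String> guarded(List<String> items) {
    return items.stream().
      filter(s -> castable(s) ? greater(s) : false).
      collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> items = Arrays.asList("1999", "c", "2016");

    // Written order: the castable check protects the conversion.
    System.out.println(filter(items, PredicateOrder::castable, PredicateOrder::greater));

    // Guarded form: safe regardless of how filters are arranged.
    System.out.println(guarded(items));

    try {
      // Reordered predicates: the conversion meets "c" before the check.
      filter(items, PredicateOrder::greater, PredicateOrder::castable);
    } catch (NumberFormatException e) {
      System.out.println("reordered predicates failed: " + e.getMessage());
    }
  }
}
```

The guarded form corresponds to the if ... then ... else rewrite above: both checks live in one predicate, so there is nothing left to reorder.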

With these facts in hand we faced the task of checking our code base and fixing possible problems.

A search has found ~450 instances of XPath expressions that use two or more consecutive predicates. Careful analysis narrowed this down to ~20 instances that should be rewritten. But then, all of a sudden, we decided to run an experiment: what if we split an XPath expression into two subexpressions? Can the error still resurface?

Consider:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:variable name="a" as="element()*" select="$elements[self::d or self::e]"/>
    <xsl:variable name="b" as="element()*" select="$a[xs:integer(@value) = 1]"/>

    <xsl:sequence select="$b"/>
  </xsl:template>

</xsl:stylesheet>

As we expected, Saxon 9.7 internally assembles a final XPath with two predicates and reorders them. As a result we get an error:

Error at char 20 in xsl:variable/@select on line 8 column 81 of Saxon9.7-filter_speculation.xslt: FORG0001: Cannot convert string "c" to an integer

This turn of events greatly complicates the code review we have to perform.

Michael Kay's answer to this example:

I think your argument that the reordering is inappropriate when the expression is written using variables is very powerful. I shall raise the question with my WG colleagues.

In fact we think that either reordering of predicates is inappropriate, or (a weaker option that still allows reordering) an error during evaluation of a predicate expression should be treated as false(). This is what is done in XSLT patterns. Other solutions make XPath less intuitive.

In other words, we should use XPath (the language) to express ideas, and the engine should implement them correctly and efficiently. So, we should not be forced to rewrite an expression to please an implementation.

Monday, 04 January 2016 10:07:12 UTC

Thinking aloud | xslt

Error during transformation in Saxon 9.7

On December 30 we opened a thread in the Saxon help forum that shows a stylesheet generating an error. This is the stylesheet:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:sequence select="$elements[self::d or self::e][xs:integer(@value) = 1]"/>
  </xsl:template>

</xsl:stylesheet>

We get an error:

Error at char 47 in xsl:sequence/@select on line 7 column 83 of Saxon9.7-filter_speculation.xslt:
  FORG0001: Cannot convert string "c" to an integer
Exception in thread "main" ; SystemID: .../Saxon9.7-filter_speculation.xslt; Line#: 7; Column#: 47
ValidationException: Cannot convert string "c" to an integer
  at ...

It's interesting that the error happens in Saxon 9.7 but not in earlier versions.

The answer we got was expected but disheartening:

The XPath specification (section 2.3.4, Errors and Optimization) explicitly allows the predicates of a filter expression to be reordered by an optimizer. See this example, which is very similar to yours:
The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.
$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]

Following the spec, Michael Kay advises us to rewrite the XPath:

$elements[self::d or self::e][xs:integer(@value) = 1]

like this:

$elements[if (self::d or self::e) then xs:integer(@value) = 1 else false()]

Such subtleties make XPath hard to reason about and to teach. We doubt many people will spot the difference immediately.

We think that if such an optimization was so important to the spec writers, then they should have changed the filter rules to treat failed predicates as false(). This would avoid any obscure differences between these two otherwise equal expressions. In fact, something similar already exists with templates, where a failed evaluation of a pattern is treated as a non-match.

Saturday, 02 January 2016 21:32:16 UTC

Thinking aloud | xslt

Akinator like engine

A colleague has approached us with a question on how the Akinator engine may work.

To our shame we had never heard of this amazing game before. To fill the gap we immediately started to play it, and identified it as a troubleshooting solver.

It took us a couple of minutes to come up with a brilliant solution: "We just need to google and find the engine on the internet". :-)

Unfortunately, this led nowhere, as Akinator itself is not open source, and no other good-quality open source solutions are available.

After another hour we got two more ideas:

The task should fit into SQL;
The task is a good candidate for a neural network.

In fact, the first might be required to train the second, so we decided to formalize the problem in terms of SQL, while still keeping a neural network in mind.

With this goal we have created a GitHub project. Please see the algorithm and its implementation at github.com/nesterovsky-bros/KB.

Wednesday, 16 December 2015 12:33:41 UTC

Announce | SQL Server puzzle | Thinking aloud

Database async trap

Though ADO.NET and ORM frameworks like Entity Framework and Dapper support the async pattern, you should remember that database drivers (at least all we know about) do not support concurrent commands running against a single connection.

To see what we mean, consider a bug we have recently identified. Consider the code:

await Task.WhenAll(
  newImages.
    Select(
      async image =>
      {
        // Load data from url.
        image.Content = await HttpUtils.ReadData(image.Url);

        // Insert image into the database.
        image.ImageID = await context.InsertImage(image);
      }));

The code runs multiple tasks to read images and to write them into a database.

The framework decides to run all these tasks in parallel. HttpUtils.ReadData() has no problem with parallel execution, while context.InsertImage() does not run well in parallel and is subject to race conditions.

To work around the problem we had to use an async variant of a critical section. So the fixed code looks like this:

using(var semaphore = new SemaphoreSlim(1))
{
  await Task.WhenAll(
    newImages.
      Select(
        async image =>
        {
          // Load data from url.
          image.Content = await HttpUtils.ReadData(image.Url);

          await semaphore.WaitAsync();

          try
          {
            // Insert image into the database.
            image.ImageID = await context.InsertImage(image);
          }
          finally
          {
            semaphore.Release();
          }
        }));
}

So, in the async world we still have to care about race conditions.

Monday, 27 July 2015 06:44:45 UTC

.NET | Thinking aloud | Tips and tricks

Multisite transclusion in Angularjs

Having strong experience in ASP.NET and JSF, we found angular's transclusion concept obscure and counterintuitive. It took a while for both of us to grasp the ideas of transclude described in the Developer Guide. We suspect that this is due to bad design: a bad design leads to bad wording.

The other consequence of the bad design is that transclusion is limited to one template per directive, which limits the use of the feature.

Consider:

A directive my-page that encapsulates a page with a menu and content.
my-page uses templateUrl: my-page.html to render the page.
my-page.html defines two sites where the menu and the page content have to be embedded.
Two content fragments are passed to my-page to fill the content sites.

Unfortunately, you cannot immediately implement this design in angularjs. On the other hand, ASP.NET's Master Pages and JSF's ui:composition readily solve this task.

Here is one of JSF's approaches:

Define page template my-page.xhtml:
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:ui="http://java.sun.com/jsf/facelets"
  xmlns:h="http://java.sun.com/jsf/html">
  <h:body>
    <table>
      <tr>
        <td><ui:insert name="menu"/></td>
      </tr>
      <tr>
        <td><ui:insert name="content"/></td>
      </tr>
    </table>
  </h:body>
</html>
Use ui:composition tag to pass parts to the template:
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:ui="http://java.sun.com/jsf/facelets"
  xmlns:h="http://java.sun.com/jsf/html">
  <h:body>
    <ui:composition template="my-page.xhtml">
      <ui:define name="content">
        My Content
      </ui:define>
      <ui:define name="menu">
        <a href="#file">File</a>
        <a href="#edit">Edit</a>
        <a href="#view">View</a>
      </ui:define>
    </ui:composition>
  </h:body>
</html>

We have decided to model angular directives after JSF, and have defined three simple directives: ui-template, ui-insert, ui-define (see angularjs-api/template/ui-lib.js).

To define a template one writes the following markup (see angularjs-api/template/my-page.html):

<table ui-template>
  <tr>
    <td ui-insert="menu"></td>
  </tr>
  <tr>
    <td ui-insert="content"></td>
  </tr>
</table>

and declares a directive (see angularjs-api/template/my-page.js):

var myPage =
{
  templateUrl: "my-page.html",
  transclude: true
};

angular.module("app").
  directive("myPage", function() { return myPage; });

and finally, to instantiate the directive one needs to write (see angularjs-api/template/sample.html):

<my-page>
  <div ui-define="content">
    My content
  </div>
  <div ui-define="menu">
    <a href="#file">File</a>
    <a href="#edit">Edit</a>
    <a href="#view">View</a>
  </div>
</my-page>

The working sample can be seen through rawgit: sample.html

The other sample that integrates with routing can be found at sample-routing.html

Monday, 04 May 2015 13:07:53 UTC

AngularJS | javascript | Thinking aloud

Farewell Entity Framework

Farewell Entity Framework and hello Dapper!

For many years we were using Entity Framework. It's still very popular, and it is Microsoft's primary Object-Relational Mapper library.

Clearly, the decision is subjective, but here are our arguments.

We know and love SQL, and think that in its domain it holds a strong position. What SQL leaves out of scope is a bridge between itself and other languages. That's where an ORM should help.

We strongly believe that no ORM library should try to hide SQL behind the object language itself. We believe in a separation of roles in development. Database design and the Data Access Layer should be separated from the client's logic. Thus, we strive, if possible, to encapsulate data access through SQL functions and stored procedures.

Entity Framework, in contrast, tries to factor out SQL, giving the client the perspective of an object graph. Initially it looks promising, but in the end a developer has to remember that any object query is mapped back to SQL. Without keeping this in mind, either a query won't compile, or performance will be poor.

E.g. this query will probably fail to build SQL, as no Regex can be mapped to SQL:

var result = context.Content.
  Where(data => Regex.IsMatch(data.Content, pattern)).
  ToArray();

This query might be slow if no suitable SQL index is defined:

var result = context.Content.
  Where(data => data.Field == value).
  ToArray();

Thus none of EF's goals is achieved completely: the power of SQL is limited, and the Data Access Layer is often fused into the rest of the client's logic.

We think that Entity Framework is an over-engineered library, which tries to be more than an ORM. Its generality often bumps into the limits of SQL support in EF: SQL dialects, types, operators, functions, and so on. One can observe that for years people have been asking for support of xml, hierarchyid, geometry/geography types, full text search, and so on. This state cannot be different, as EF will never be able, and does not aim, to support all SQL features.

EF has both design-time and runtime parts. Each database vendor has to implement its own EF adapter for EF to play well with that database. This cooperation is not always smooth. E.g. see Database first create entity framework 6.1.1 model using system.data.sqlite 1.0.93.

At some point the cost of dealing with EF became too high for us, so we started to look into alternatives: from plain ADO.NET to a lighter ORM library.

To our delight we immediately found Dapper - a simple object mapper for .NET. It provides simple extensions to the IDbConnection interface to deal with the mapping of query parameters to object properties, and of query results to plain types. Here are some examples:

  // Get a customer. Single() is the LINQ operator from System.Linq.
  var customer = connection.
    Query<Customer>(
      "select * from Customers where CustomerId = @id",
      new { id = customerID }).
    Single();

  // Insert a value.
  connection.Execute(
    "insert into MyTable(A, B) values(@a, @b)",
    new { a = 2, b = 3 });

So, Dapper leaves you with plain SQL, which we consider an advantage.

Besides being minimalistic compared to EF, Dapper claims performance close to pure hand-written ADO.NET. Indeed, it builds dynamic methods to populate parameters and to create row instances, so reflection is used during the warm-up period only.

Sunday, 30 November 2014 12:47:46 UTC

.NET | Thinking aloud

A Proposal to Add 2D Graphics Rendering and Display to C++

Among the latest C++ proposals the most ambiguous is N4021.

The goal of that proposal is "to define a 2D drawing API for the C++ programming language".

The motivation goes like this:

Today, computer graphics are pervasive in modern life, and are even replacing console-style I/O for basic user interaction on many platforms. For example, a simple cout << "Hello, world!" statement doesn’t do anything useful on many tablets and smartphones. We feel that C++ programmers should have a simple, standard way of displaying 2D graphics to users.

The authors compare several public and proprietary APIs and select the cairo graphics library as a base.

Reflecting on the starting point they write:

Taken as a whole, starting from cairo allows for the creation of a 2D C++ drawing library that is already known to be portable, implementable, and useful without the need to spend years drafting, implementing, and testing a library to make sure that it meets those criteria.
...
An alternative design would be to create a new API via a synthesis of existing 2D APIs. This has the benefit of being able to avoid any perceived design flaws that existing APIs suffer from. Unfortunately this would not have implementation and usage experience. Further, doing so would not provide any guarantee that design flaws would not creep in.

What follows is a discussion of the best way to transform that C library into a std-style C++ API.

Our thoughts on this proposal are threefold:

This proposal seems a decade or two late.
The C++ standard should be modular to support basic and optional features.
We feel that programmers will not be satisfied with bare 2D graphics. It's not enough nowadays.

Indeed, appeals to create a standard C++ API for UI are as old as the C++ standardization process. It's clear why the committee has not produced such an API yet: it is a bureaucracy that can only approve an API. In fact it's the role of the community to invent and implement libraries that may make their way into the standard. Without consensus in the community no standard will reflect such an API.

On the other hand, the C++ spec is at present too fat. Probably not many people are satisfied with the pace of its evolution. Any big chunk of new API makes progress even slower. The C++ spec should go through a refactoring and be split into core(s) and libraries, allowing each part to progress individually. This would simplify both specification and implementation. After that refactoring an API could be added or deprecated much more easily. In fact implementations have always been like this. It's the spec that tries to be a monolith.

As for a new 2D graphics API, it looks like an idea from the late 90s. We think that today's programmers (at least several samples :-) ) want to deal with an industry standard UI API, and not to start from basic drawing. Looking around we observe that html 5 is such a de-facto standard. Take into account that it supports rich layout, svg, canvas, user input; in addition it's good for GPU optimization. Even if you want to deal with simple graphics, you can build svg markup or draw on the canvas.

So, what we would rather see in the C++ spec is an html binding API (both for DOM and Javascript).

Just think of a standard C++ program that uses an html engine as its UI!

Sunday, 17 August 2014 08:56:08 UTC

C++ | Thinking aloud

.NET Cache API

Looking at Guava Cache we think its API is more convenient than .NET's Cache API.

Just consider:

.NET has getters, and setters of objects by string keys.
You should provide caching policy with each setter.
Guava cache operates with typed storage of Key to Value.
Provides a value factory and a caching policy in advance at cache construction.

Guava's advantage is based on the idea that homogeneous storage assumes a uniform way of creating values and a uniform caching policy. Thus a great part of the logic is factored out into cache initialization.

We have decided to create a simple adapter of the MemoryCache to achieve the same goal. Here is a result of such an experiment:

  using System;
  using System.Runtime.Caching;
  using System.Threading.Tasks;

  public class Cache<K, V>
    where V: class
  {
    /// <summary>
    /// A cache builder.
    /// </summary>
    public struct Builder
    {
      /// <summary>
      /// A memory cache. If not specified then MemoryCache.Default is used.
      /// </summary>
      public MemoryCache MemoryCache;

      /// <summary>
      /// An expiration value.
      /// Alternatively CachePolicyFunc can be used.
      /// </summary>
      public TimeSpan Expiration;

      /// <summary>
      /// Indicates whether to use sliding (true), or absolute (false)
      /// expiration.
      /// Alternatively CachePolicyFunc can be used.
      /// </summary>
      public bool Sliding;

      /// <summary>
      /// Optional function to get caching policy.
      /// Alternatively Expiration and Sliding property can be used.
      /// </summary>
      public Func<V, CacheItemPolicy> CachePolicyFunc;

      /// <summary>
      /// Optional value validator.
      /// </summary>
      public Func<V, bool> Validator;

      /// <summary>
      /// A value factory.
      /// Alternatively FactoryAsync can be used.
      /// </summary>
      public Func<K, V> Factory;

      /// <summary>
      /// Async value factory.
      /// Alternatively Factory can be used.
      /// </summary>
      public Func<K, Task<V>> FactoryAsync;

      /// <summary>
      /// A key to string converter.
      /// </summary>
      public Func<K, string> KeyFunc;

      /// <summary>
      /// Converts builder to a Cache<K, V> instance.
      /// </summary>
      /// <param name="builder">A builder to convert.</param>
      /// <returns>A Cache<K, V> instance.</returns>
      public static implicit operator Cache<K, V>(Builder builder)
      {
        return new Cache<K, V>(builder);
      }
    }

    /// <summary>
    /// Creates a cache from a cache builder.
    /// </summary>
    /// <param name="builder">A cache builder instance.</param>
    public Cache(Builder builder)
    {
      if ((builder.Factory == null) && (builder.FactoryAsync == null))
      {
        throw new ArgumentException("builder.Factory");
      }

      if (builder.MemoryCache == null)
      {
        builder.MemoryCache = MemoryCache.Default;
      }

      this.builder = builder;
    }

    /// <summary>
    /// Cached value by key.
    /// </summary>
    /// <param name="key">A key.</param>
    /// <returns>A cached value.</returns>
    public V this[K key]
    {
      get { return Get(key); }
      set { Set(key, value); }
    }

    /// <summary>
    /// Sets a value for a key.
    /// </summary>
    /// <param name="key">A key to set.</param>
    /// <param name="value">A value to set.</param>
    public void Set(K key, V value)
    {
      SetImpl(GetKey(key), IsValid(value) ? value : null);
    }

    /// <summary>
    /// Gets a value for a key.
    /// </summary>
    /// <param name="key">A key to get value for.</param>
    /// <returns>A value instance.</returns>
    public V Get(K key)
    {
      var keyValue = GetKey(key);
      var value = builder.MemoryCache.Get(keyValue) as V;

      if (!IsValid(value))
      {
        value = CreateValue(key);
        SetImpl(keyValue, value);
      }

      return value;
    }

    /// <summary>
    /// Gets a task to return an async value.
    /// </summary>
    /// <param name="key">A key.</param>
    /// <returns>A cached value.</returns>
    public async Task<V> GetAsync(K key)
    {
      var keyValue = GetKey(key);
      var value = builder.MemoryCache.Get(keyValue) as V;

      if (!IsValid(value))
      {
        value = await CreateValueAsync(key);
        SetImpl(keyValue, value);
      }

      return value;
    }

    /// <summary>
    /// Gets string key value for a key.
    /// </summary>
    /// <param name="key">A key.</param>
    /// <returns>A string key value.</returns>
    protected string GetKey(K key)
    {
      return builder.KeyFunc != null ? builder.KeyFunc(key) :
        key == null ? null : key.ToString();
    }

    /// <summary>
    /// Creates a value for a key.
    /// </summary>
    /// <param name="key">A key to create value for.</param>
    /// <returns>A value instance.</returns>
    protected V CreateValue(K key)
    {
      return builder.Factory != null ? builder.Factory(key) :
        builder.FactoryAsync(key).Result;
    }

    /// <summary>
    /// Creates a task for value for a key.
    /// </summary>
    /// <param name="key">A key to create value for.</param>
    /// <returns>A task for a value instance.</returns>
    protected Task<V> CreateValueAsync(K key)
    {
      return builder.FactoryAsync != null ? builder.FactoryAsync(key) :
        Task.FromResult(builder.Factory(key));
    }

    /// <summary>
    /// Validates the value.
    /// </summary>
    /// <param name="value">A value to validate.</param>
    /// <returns>
    /// true if value is valid for a cache, and false otherwise.
    /// </returns>
    protected bool IsValid(V value)
    {
      return (value != null) &&
        ((builder.Validator == null) || builder.Validator(value));
    }

    /// <summary>
    /// Set implementation.
    /// </summary>
    /// <param name="key">A key to set value for.</param>
    /// <param name="value">A value to set.</param>
    /// <returns>A set value.</returns>
    private V SetImpl(string key, V value)
    {
      if (value == null)
      {
        builder.MemoryCache.Remove(key);
      }
      else
      {
        builder.MemoryCache.Set(
          key,
          value,
          builder.CachePolicyFunc != null ? builder.CachePolicyFunc(value) :
          builder.Sliding ?
            new CacheItemPolicy { SlidingExpiration = builder.Expiration } :
            new CacheItemPolicy
            {
              AbsoluteExpiration = DateTime.Now + builder.Expiration
            });
      }

      return value;
    }

    /// <summary>
    /// Cache builder.
    /// </summary>
    private Builder builder;
  }

The use consists of initialization:

  Cache<MyKey, MyValue> MyValues = new Cache<MyKey, MyValue>.Builder
  {
    KeyFunc = key => ...key to string value...,
    Factory = key => ...create a value for a key...,
    Expiration = new TimeSpan(0, 3, 0),
    Sliding = true
  };

and a trivial cache access:

var value = MyValues[key];

This contrasts with MemoryCache coding pattern:

  MemoryCache cache = MemoryCache.Default;
  ...

  var keyAsString = ...key to string value...
  var value = cache.Get(keyAsString) as MyValue;

  if (value == null)
  {
    value = ...create a value for a key...
    cache.Set(keyAsString, value, ...caching policy...);
  }

Monday, 28 July 2014 05:36:06 UTC

.NET | Thinking aloud | Tips and tricks

IEqualityComparer in LINQ

The Enumerable class contains many overloads with an IEqualityComparer<T> argument. The most notable methods are:

Contains;
Distinct;
Except;
GroupBy;
Intersect;
Join;
ToDictionary;
ToLookup;
Union.

Recently we dealt with a simple case:

  source.
    Select(
      item =>
        new Word
        {
          Text = ...,
          LangID = ...,
          Properties = ...
          ...
        }).
    Distinct(equality comparer by Text and LangID);

In other words: how do you produce an enumeration of distinct words from an enumeration of words, where two words are considered equal if their Text and LangID are equal?

It turns out that it's cumbersome to implement the IEqualityComparer<T> interface (as any other interface in C#); at least it's nothing close to the conciseness of lambda functions.

Here we've decided to step into framework space and to introduce an API to define simple equality comparers for a class.

We start from the use case:

  var wordComparer = KeyEqualityComparer.Null<Word>().
    ThenBy(item => item.Text).
    ThenBy(item => item.LangID);
  ...
  source.Select(...).Distinct(wordComparer);

And then proceed to the API:

  namespace NesterovskyBros.Linq
  {
    using System;
    using System.Collections;
    using System.Collections.Generic;

    /// <summary>
    /// Equality comparer extensions.
    /// </summary>
    public static class KeyEqualityComparer
    {
      /// <summary>
      /// Gets null as equality comparer for a type.
      /// </summary>
      /// <typeparam name="T">A type.</typeparam>
      /// <returns>
      /// null as equality comparer for a type.
      /// </returns>
      public static IEqualityComparer<T> Null<T>()
      {
        return null;
      }

      /// <summary>
      /// Creates an equality comparer for an enumeration item.
      /// </summary>
      /// <typeparam name="T">A type.</typeparam>
      /// <typeparam name="K">A key type.</typeparam>
      /// <param name="source">Source items.</param>
      /// <param name="keyFunc">A key function.</param>
      /// <returns>
      /// An equality comparer that compares items by key.
      /// </returns>
      public static IEqualityComparer<T> EqualityComparerBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keyFunc)
      {
        return new KeyEqualityComparer<T, K>(keyFunc);
      }

      /// <summary>
      /// Creates an equality comparer that uses this comparer as a base.
      /// </summary>
      /// <typeparam name="T">A type.</typeparam>
      /// <typeparam name="K">A key type.</typeparam>
      /// <param name="equalityComparer">A base equality comparer.</param>
      /// <param name="keyFunc">A key function.</param>
      /// <returns>
      /// An equality comparer that uses this comparer as a base.
      /// </returns>
      public static KeyEqualityComparer<T, K> ThenBy<T, K>(
        this IEqualityComparer<T> equalityComparer,
        Func<T, K> keyFunc)
      {
        return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
      }
    }

    /// <summary>
    /// Equality comparer that uses a function to extract a comparison key.
    /// </summary>
    /// <typeparam name="T">A type.</typeparam>
    /// <typeparam name="K">A key type.</typeparam>
    public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
    {
      /// <summary>
      /// Creates an equality comparer.
      /// </summary>
      /// <param name="keyFunc">A key function.</param>
      /// <param name="equalityComparer">A base equality comparer.</param>
      public KeyEqualityComparer(
        Func<T, K> keyFunc,
        IEqualityComparer<T> equalityComparer = null)
      {
        KeyFunc = keyFunc;
        EqualityComparer = equalityComparer;
      }

      /// <summary>
      /// Determines whether the specified objects are equal.
      /// </summary>
      /// <param name="x">The first object of type T to compare.</param>
      /// <param name="y">The second object of type T to compare.</param>
      /// <returns>
      /// true if the specified objects are equal; otherwise, false.
      /// </returns>
      public bool Equals(T x, T y)
      {
        return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
          EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
      }

      /// <summary>
      /// Returns a hash code for the specified object.
      /// </summary>
      /// <param name="obj">
      /// The value for which a hash code is to be returned.
      /// </param>
      /// <returns>A hash code for the specified object.</returns>
      public int GetHashCode(T obj)
      {
        var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));

        if (EqualityComparer != null)
        {
          var hash2 = EqualityComparer.GetHashCode(obj);

          hash ^= (hash2 << 5) + hash2;
        }

        return hash;
      }

      /// <summary>
      /// A key function.
      /// </summary>
      public readonly Func<T, K> KeyFunc;

      /// <summary>
      /// Optional base equality comparer.
      /// </summary>
      public readonly IEqualityComparer<T> EqualityComparer;
    }
  }

So, now you can easily build simple equality comparers, to cache them or to instantiate them on the fly. These comparers are usually based on property values or on some function of the source values.

Extensions in nesterovskyBros.data.Model

As with kendo.data.Model there is a fields Object - a set of key/value pairs to configure the model fields, but fields have some more options:
- fields.fieldName.serializable Boolean - indicates whether the field appears in an object returned by model.toJSON(). Default is true.
- fields.fieldName.updateDirty Boolean - indicates whether a change of the property should trigger a dirty field change. Default is true.
When a model defines a field and there is a prototype function with the same name, this function is used to get and set the field value.
When a property is changed through the model.set() method, a dirty change event is triggered (provided that fields.fieldName.updateDirty !== false). This helps to build a dependency graph on that property.
When a model instance is constructed, the data passed in are validated, and nullable and default values are set.

Model example

Here is an example of a model:

  nesterovskyBros.data.ProductModel = nesterovskyBros.data.Model.define(
  {
    fields:
    {
      name: { type: "string", defaultValue: "Product Name" },
      price: { type: "number", defaultValue: 10 },
      unitsInStockValue: { type: "number", defaultValue: 10, serializable: false },
      unitsInStock: { type: "string" }
    },

    unitsInStock: function(value)
    {
      if (value === undefined)
      {
        var count = this.get("unitsInStockValue");

        // Names are 1-based, so that 1 maps to "one" as in the setter below.
        return ["one", "two", "three", "four"][count - 1] || (count + "");
      }
      else
      {
        this.set(
          "unitsInStockValue",
          ({one: 1, two: 2, three: 3, four: 4 })[value] || value);
      }
    }
  });

Notice that:

unitsInStock property is implemented as a function - this helps to map model values to presentation values.
when you call model.toJSON() or JSON.stringify(), you will see only name, price and unitsInStock values in the result - this helps to get the model's state and to store it somewhere (e.g. in sessionStorage).
in the code:
  var model = new nesterovskyBros.data.ProductModel({ price: "7", unitsInStock: "one" });
the following is true:
  (typeof(model.price) == "number") && (model.price == 7) && (model.name == "Product Name") && (model.unitsInStockValue == 1)
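
To make the extensions above more tangible, here is a minimal sketch that imitates them without KendoUI. It is not the code from controls.js: defineModel and everything inside it are hypothetical names introduced for illustration, and dirty events, validation and nullable handling are left out. It only shows a field backed by a prototype function of the same name, type conversion on set, and toJSON() skipping fields marked with serializable: false.

```javascript
// Hypothetical stand-in for nesterovskyBros.data.Model.define(); it only
// imitates the behaviour described above and is not the real implementation.
function defineModel(proto) {
  const fields = proto.fields;

  // A field is function-backed when the prototype has a function of the same name.
  const accessor = name =>
    typeof proto[name] === "function" && name in fields ? proto[name] : null;

  // Convert a raw value according to the declared field type.
  const parse = (name, value) =>
    fields[name] && fields[name].type === "number" ? Number(value) : value;

  class Model {
    constructor(data = {}) {
      this.values = {};

      // Default values are applied first, then the data passed in.
      for (const [name, field] of Object.entries(fields)) {
        if ("defaultValue" in field) this.set(name, field.defaultValue);
      }

      for (const [name, value] of Object.entries(data)) this.set(name, value);
    }

    get(name) {
      const fn = accessor(name);
      return fn ? fn.call(this) : this.values[name];
    }

    set(name, value) {
      const fn = accessor(name);
      if (fn) fn.call(this, value);
      else this.values[name] = parse(name, value);
    }

    // Only fields not marked with serializable: false are reported.
    toJSON() {
      const result = {};
      for (const [name, field] of Object.entries(fields)) {
        if (field.serializable !== false) result[name] = this.get(name);
      }
      return result;
    }
  }

  return Model;
}

const names = ["one", "two", "three", "four"];

const ProductModel = defineModel({
  fields: {
    name: { type: "string", defaultValue: "Product Name" },
    price: { type: "number", defaultValue: 10 },
    unitsInStockValue: { type: "number", defaultValue: 10, serializable: false },
    unitsInStock: { type: "string" }
  },
  // Getter when called without a value, setter otherwise.
  unitsInStock(value) {
    if (value === undefined) {
      const count = this.get("unitsInStockValue");
      return names[count - 1] || String(count);
    }
    const index = names.indexOf(value);
    this.set("unitsInStockValue", index >= 0 ? index + 1 : value);
  }
});

const model = new ProductModel({ price: "7", unitsInStock: "one" });
const json = JSON.stringify(model);
```

Here json contains name, price and unitsInStock only, while unitsInStockValue stays internal to the model.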

As with UserControl, the implementation is defined in controls.js. The sample page is the same index.html.

Wednesday, 03 April 2013 20:37:49 UTC

javascript | Thinking aloud | Tips and tricks

Windows 8 + Lenovo Yoga-13 + external SD card

Two weeks ago we got new Lenovo 13" laptops (Yoga-13 with touch screens and Windows 8 Pro on board).

The first impression was WOW! Touch screens! Windows 8! Now we'll try our hand on that new (for us) API. So new, so cool...

A day later. What a shit this new UI is. Where are my desktop, "Start" button, all the programs... After googling we understood - we're not alone.

A few more days later. We recognized that our SSD won't live a long life with our projects: we generate several GB of output a day. Thus we decided to buy external SD cards - an additional 64GB, class 10. That's enough for us. No sooner said than done. After several attempts to copy our projects from the hard drive to the SD card (~9GB of sources) we strongly believe that such a vigorous mix (Lenovo + Win 8 + external SD card) won't survive. Windows 8 hangs when the display turns off (in the middle of a data copy, after an hour of work). What a .... of .... this Windows 8, Lenovo and SD cards all together.

Thursday, 28 March 2013 22:39:55 UTC

Thinking aloud

KendoUI User control

Developing with KendoUI we try to formalize tasks. With this in mind we would like to have user controls.

We define a user control as follows:

It is a javascript class that extends Widget.
It offers a way to reuse UI.
It allows one to define a model and a template with UI and data binding.

Unfortunately, KendoUI does not have such an API, though one can easily define it; so we have defined our version.

Here we review our solution. We have taken a grid KendoUI example and converted it into a user control.

User control on the page

See index.html

  <!DOCTYPE html>
  <html>
  <head>
    <title>Test</title>

    <!-- (1) Templates that define layouts. -->
    <script src="scripts/templates.js"></script>

    <script src="scripts/jquery/jquery.js"></script>
    <script src="scripts/kendo/kendo.web.min.js"></script>

    <!-- (2) Definition of the UserControl widget. -->
    <script src="scripts/controls.js"></script>

    <!-- (3) Confirm dialog user control. -->
    <script src="scripts/controls/confirm.js"></script>

    <!-- (4) Products user control. -->
    <script src="scripts/controls/products.js"></script>

    <link href="styles/kendo.common.min.css" rel="stylesheet" />
    <link href="styles/kendo.default.min.css" rel="stylesheet" />
    <script>
      $(function ()
      {
        // (5) Bind the page.
        kendo.bind(
          document.body,
          // (6) Model as a datasource.
          { source: [new nesterovskyBros.data.ProductsModel] });
      });
    </script>
  </head>
  <body>
    <!-- (7) Use of Products user control. -->
    <div data-role="products" data-bind="source: source"></div>
  </body>
  </html>

That's what we see here:

1. Templates that define layouts. See "How To: Load KendoUI Templates from External Files", and templates.tt.
2. Definition of the UserControl widget.
3. Confirm dialog user control (we shall mention it later).
4. Products user control.
5. Data binding that instantiates page controls.
6. Model is passed to a user control through the dataSource.
7. Use of Products user control. Notice that "data-role" defines control type, "source" refers to the model.

User Control declaration

Declaration consists of a view and a model.

View is html with data binding. See products.tmpl.html

We build our project using Visual Studio, so templates packaging is done with templates.tt. This transformation converts products template into a tag:

<script id="products-template" type="text/x-kendo-template">

thus the template can be referred to by a utility function: nesterovskyBros.template("products-template").

Model inherits kendo.data.Model. Here is how it looks:

  // (1) Define a ProductsModel class.
  nesterovskyBros.data.ProductsModel = kendo.data.Model.define(
  {
    // (2) Model properties.
    fields:
    {
      productName: { type: "string", defaultValue: "Product Name" },
      productPrice: { type: "number", defaultValue: 10 },
      productUnitsInStock: { type: "number", defaultValue: 10 },
      products: { type: "default", defaultValue: [] }
    },

    // (3) Model methods.
    addProduct: function () { ... },
    deleteProduct: function (e) { ... },
    ...
  });

  // (4) Register user control.
  nesterovskyBros.ui.Products = nesterovskyBros.defineControl(
  {
    name: "Products",
    model: nesterovskyBros.data.ProductsModel
  });

That's what we have here:

1. We define a model that inherits KendoUI Model.
2. We define model fields.
3. We define model methods.
4. We register the user control with a nesterovskyBros.defineControl(proto) call, where:
- proto.name - defines user control name;
- proto.model - defines model type;
- proto.template - defines optional template. If not specified, a template is retrieved from $("#" + proto.name.toLowerCase() + "-template").html().
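
The template fallback rule can be sketched in isolation. The helper below is hypothetical (resolveTemplate is not a function of controls.js), and the lookup argument merely stands in for the jQuery call $(selector).html(), so that the rule can be tried without a page:

```javascript
// Hypothetical helper, not part of controls.js: picks the template for a control.
// `lookup` stands in for the jQuery call $(selector).html().
function resolveTemplate(proto, lookup) {
  // An explicitly specified template wins.
  if (proto.template != null) {
    return proto.template;
  }

  // Otherwise the template element is found by convention from the control name.
  return lookup("#" + proto.name.toLowerCase() + "-template");
}

// A fake page with a single template element, keyed by selector.
const page = new Map([["#products-template", "<div>products view</div>"]]);
const lookup = selector => page.get(selector);

const byConvention = resolveTemplate({ name: "Products" }, lookup);
const explicit = resolveTemplate(
  { name: "Products", template: "<p>custom</p>" },
  lookup);
```

With this convention a control named "Products" picks up the products-template script tag produced by templates.tt unless proto.template says otherwise.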

UserControl API

Now, what remains is the API of the UserControl. See controls.js.

UserControl defines the following events:
- change - triggered when the data source is changed;
- dataBound - triggered when the widget is data bound;
- dataBinding - triggered before widget data binding;
- save - used to notify the user to save model state.
UserControl defines the following options:
- autoBind (default false) - autoBind data source;
- template (default $.noop) - user control template.
UserControl defines a dataSource field and a setDataSource() method.
UserControl defines a rebind() method to manually rebuild the widget's view from the template and model.
UserControl sets/deletes model.owner, which is a function returning the user control widget, when a model is bound/unbound to the widget.
When UserControl binds/unbinds a model, the model.refresh method is called, if any.
You usually define your control with a call to nesterovskyBros.defineControl(proto). See above.
There is also a convenience method to build a dialog based on a user control: nesterovskyBros.defineDialog(options), where
- options.name - a user control name (used in the data-role);
- options.model - a model type;
- options.windowOptions - window options.
This method returns a function that receives a user control model, and returns a dialog (kendo.ui.Window) based on the user control.
Dialog has a model() function that returns an instance of the model.
Model has a dialog() function that returns an instance of the dialog.
Dialog and model have a result() function that returns an instance of a deferred object used to track dialog completion.
An example of a user control dialog is confirm.js and confirm.tmpl.html. The use is in the products.js deleteProduct():

  deleteProduct: function(e)
  {
    var that = this;

    return nesterovskyBros.dialog.confirm(
      {
        title: "Please confirm",
        message: "Do you want to delete the record?",
        confirm: "Yes",
        cancel: "No"
      }).
      open().
      center().
      result().
      then(
        function(confirmed)
        {
          if (!confirmed)
          {
            return;
          }

          ...
        });
  }
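
The relation between a dialog, its model and result() can be sketched without KendoUI. This is an assumption-laden illustration rather than the code behind nesterovskyBros.defineDialog(): createDialog and close() are hypothetical names, open() and center() are chainable no-ops, and a native Promise is used in place of the deferred object mentioned above.

```javascript
// Hypothetical dialog stand-in, not the real kendo.ui.Window based dialog.
// result() exposes a promise that is settled when close(value) is called.
function createDialog(model) {
  let resolve;
  const completion = new Promise(r => { resolve = r; });

  const dialog = {
    model: () => model,
    open() { return dialog; },
    center() { return dialog; },
    result: () => completion,
    close(value) { resolve(value); return dialog; }
  };

  // Model and dialog refer to each other and share the same result.
  model.dialog = () => dialog;
  model.result = () => completion;

  return dialog;
}

const dialog = createDialog({ title: "Please confirm" });

// The caller chains on result() in the same way deleteProduct() does.
const outcome = dialog.
  open().
  center().
  result().
  then(confirmed => confirmed ? "deleted" : "kept");

// Later, e.g. from a button handler, the dialog is closed with an answer.
dialog.close(true);
```

The point of the design is that the caller does not need to subscribe to dialog events: it receives one value once the dialog is completed.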

Last

User controls, along with a technique to manage and cache templates, allow us to build robust web applications. As added value, building an SPA has become a trivial task.
