RSS 2.0
Sign In
# Thursday, July 10, 2014

Enumerable class contains many overloads with IEqualityComparable<T> argument. Most notable methods are:

  • Contains;
  • Distinct;
  • Except;
  • GroupBy;
  • Intersect;
  • Join;
  • ToDictionary;
  • ToLookup;
  • Union.

Recently we dealt with simple case:

source.
  Select(
    item =>
      new Word
      {
        Text = ...,
        LangID = ...,
        Properties = ...
        ...
      }).
  Distinct(equality comparer by Text and LangID);

In other words how do you produce a enumeration of distinct words from a enumeration of words, where two words are qualified equal if their Text and LangID are equal?

It turns out it's cumbersome to implement IEqualityComparer<T> interface (and any other interface in C#), at least it's nothing close to a conciseness of lambda functions.

Here we've decided to step in into framework space and to introduce an API to define simple equality comparers for a class.

We start from the use case:

var wordComparer = KeyEqualityComparer.Null<Word>().
  ThenBy(item => item.Text).
  ThenBy(item => item.LangID);

...
source.Select(...).Distinct(wordComparer);

And then proceed to the API:

namespace NesterovskyBros.Linq
{
  using System;
  using System.Collections;
  using System.Collections.Generic;

  /// <summary>
  /// A equality comparer extensions.
  /// </summary>
  public static class KeyEqualityComparer
  {
    /// <summary>
    /// Gets null as equality comparer for a type.
    /// </summary>
    /// <typeparam name="T">A type.</typeparam>
    /// <returns>
    /// null as equality comparer for a type.
    /// </returns>
    public static IEqualityComparer<T> Null<T>()
    {
      return null;
    }

    /// <summary>
    /// Creates an equality comparer for a enumeration item.
    /// </summary>
    /// <typeparam name="T">A type.</typeparam>
    /// <param name="source">A source items.</param>
    /// <param name="keyFunc">A key function.</param>
    /// <returns>
    /// null as equality comparer for a type.
    /// </returns>
    public static IEqualityComparer<T> EqualityComparerBy<T, K>(
      this IEnumerable<T> source,
      Func<T, K> keyFunc)
    {
      return new KeyEqualityComparer<T, K>(keyFunc);
    }

    /// <summary>
    /// Creates an equality comparer that uses this comparer as a base.
    /// </summary>
    /// <typeparam name="T">A type.</typeparam>
    /// <typeparam name="K">A key type.</typeparam>
    /// <param name="equalityComparer">A base equality comparer.</param>
    /// <param name="keyFunc">A key function.</param>
    /// <returns>
    /// An equality comparer that uses this comparer as a base.
    /// </returns>
    public static KeyEqualityComparer<T, K> ThenBy<T, K>(
      this IEqualityComparer<T> equalityComparer,
      Func<T, K> keyFunc)
    {
      return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
    }
  }

  /// <summary>
  /// Equality comparer that uses a function to extract a comparision key.
  /// </summary>
  /// <typeparam name="T">A type.</typeparam>
  /// <typeparam name="K">A key type.</typeparam>
  public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
  {
    /// <summary>
    /// Creates an equality comparer.
    /// </summary>
    /// <param name="keyFunc">A key function.</param>
    /// <param name="equalityComparer">A base equality comparer.</param>
    public KeyEqualityComparer(
      Func<T, K> keyFunc,
      IEqualityComparer<T> equalityComparer = null)
    {
      KeyFunc = keyFunc;
      EqualityComparer = equalityComparer;
    }

    /// </summary>
    /// <param name="x">The first object of type T to compare.</param>
    /// <param name="y">The second object of type T to compare.</param>
    /// <returns>
    /// true if the specified objects are equal; otherwise, false.
    /// </returns>
    public bool Equals(T x, T y)
    {
      return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
        EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
    }

    /// <summary>
    /// Returns a hash code for the specified object.
    /// </summary>
    /// <param name="obj">
    /// The value for which a hash code is to be returned.
    /// </param>
    /// <returns>A hash code for the specified object.</returns>
    public int GetHashCode(T obj)
    {
      var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));

      if (EqualityComparer != null)
      {
        var hash2 = EqualityComparer.GetHashCode(obj);

        hash ^= (hash2 << 5) + hash2;
      }

      return hash;
    }

    /// <summary>
    /// A key function.
    /// </summary>
    public readonly Func<T, K> KeyFunc;

    /// <summary>
    /// Optional base equality comparer.
    /// </summary>
    public readonly IEqualityComparer<T> EqualityComparer;
  }
}

So, now you can easily build simple equality comparers to cache them or instantiate on the fly. This comparers are usually related to property values or their function of source values.

See also LINQ extensions

Thursday, July 10, 2014 8:31:42 PM UTC  #    Comments [0] -
.NET | Thinking aloud
# Monday, June 30, 2014

Dasha Nesterovsky

Picture of Dasha

Monday, June 30, 2014 1:01:25 PM UTC  #    Comments [0] -
Announce
# Saturday, June 28, 2014

This is a small post about refactoring lock statements in async methods.

Before refactoring we had a code like this:

lock(sync)
{
  result = methodToRefactorIntoAsync();
}

...

private object sync = new object();

Lock is bound to a thread, thus no way you to use it in async code. As an alternative you may use SemaphoreSlim class:

await sync.WaitAsync(cancellationToken);

try
{
  result = await methodAsync(cancellationToken);
}
finally
{
  sync.Release();
}

...

private SemaphoreSlim sync = new SemaphoreSlim(1, 1);

Saturday, June 28, 2014 11:56:32 AM UTC  #    Comments [0] -
.NET | Thinking aloud
# Friday, June 27, 2014

What will you do if you have async Web API method that runs on server for a some time but your client is dropped?

There are two solutions:

  1. Run method to the end and allow to a framework to deal with disconnect;
  2. Try to be notified about client's drop and to break early.

The first approach is simplest but might result in some overconsumption of server resources. The other method requires you to check client status from time to time.

Fortunatelly, ASP.NET provides a HttpResponse.ClientDisconnectedToken property, which is limited to IIS 7.5+ in integrated mode, but still fits our needs. So, you should request ClientDisconnectedToken, if any, and implement your async code using that token.

The following extension function gets that token:

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Threading;
using System.Web;

public static class HttpApiExtensions
{
  public static CancellationToken GetCancellationToken(
    this HttpRequestMessage request)
  {
    CancellationToken cancellationToken;
    object value;
    var key = typeof(HttpApiExtensions).Namespace + ":CancellationToken";

    if (request.Properties.TryGetValue(key, out value))
    {
      return (CancellationToken)value;
    }

    var httpContext = HttpContext.Current;

    if (httpContext != null)
    {
      var httpResponse = httpContext.Response;

      if (httpResponse != null)
      {
        try
        {
          cancellationToken = httpResponse.ClientDisconnectedToken;
        }
        catch
        {
          // Do not support cancellation.
        }
      }
    }

    request.Properties[key] = cancellationToken;

    return cancellationToken;
  }
}

And here is a Web API WordCount service described in the previous post:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class ValuesController: ApiController
{
  public async Task<int> GetWordCount([FromUri(Name = "url")] string[] urls)
  {
    var cancellationToken = Request.GetCancellationToken();

    using(var client = new HttpClient())
    {
      return (await Task.WhenAll(
        urls.Select(url => WordCountAsync(client, url, cancellationToken)))).Sum();
    }
  }

  public static async Task<int> WordCountAsync(
    HttpClient client,
    string url,
    CancellationToken cancellationToken)
  {
    string content = await (await client.GetAsync(url, cancellationToken)).
      Content.ReadAsStringAsync();

    return WordCount(content);
  }

  private static int WordCount(string text)
  {
    var count = 0;
    var space = true;

    for (var i = 0; i < text.Length; ++i)
    {
      if (space != char.IsWhiteSpace(text[i]))
      {
        space = !space;

        if (!space)
        {
          ++count;
        }
      }
    }

    return count;
  }
}

Though is simple there is a nuisance. You should pass cancellation token here and there, which adds to a pollution from async.

Friday, June 27, 2014 2:42:44 PM UTC  #    Comments [0] -
.NET | Thinking aloud
# Wednesday, June 25, 2014

Though parallel and async algorithms solve different tasks, they converge in some cases. And it's not always immediately clear what's the best.

Consider the following task: get a total word count contained in a given a set of urls.

At first we've solved it as a parallel task: indeed this fits to MapReduce pattern when you get urls' contents to count the number of words in  parallel (Map), and then sum word counts per each url to get final result (Reduce). But then we decided that the very same MapReduce algorithm can be implemented with async.

This is a parallel word count:

public static int ParallelWordCount(IEnumerable<string> urls)
{
  var result = 0;

  Parallel.ForEach(
    urls,
    url =>
    {
      string content;

      using(var client = new WebClient())
      {
        content = client.DownloadString(url);
      }

      var count = WordCount(content);

      Interlocked.Add(ref result, count);
    });

  return result;
}

Here is async word count:

public static async Task<int> WordCountAsync(IEnumerable<string> urls)
{
  return (await Task.WhenAll(urls.Select(url => WordCountAsync(url)))).Sum();
}

public static async Task<int> WordCountAsync(string url)
{
  string content;

  using(var client = new WebClient())
  {
    content = await client.DownloadStringTaskAsync(url);
  }

  return WordCount(content);
}

And this is an implementation of word count for a text (it's less important for this discussion):

public static int WordCount(string text)
{
  var count = 0;
  var space = true;

  for(var i = 0; i < text.Length; ++i)
  {
    if (space != char.IsWhiteSpace(text[i]))
    {
      space = !space;

      if (!space)
      {
        ++count;
      }
    }
  }

  return count;
}

Our impressions are:

  1. The parallel version is contained in one method, while the async one is implemeneted with two methods.

    This is due to the fact that C# compiler fails to generate async labmda function. We attribute this to Microsoft who leads and implements C# spec. Features should be composable. If one can implement a method as a lambda function, and one can implement a method as async then one should be able to implement a method as an async lambda function.

  2. Both parallel and async versions are using thread pool to run their logic.

  3. While both implementations follow MapReduce pattern, we can see that async version is much more scaleable. It's because of parallel threads stay blocked while waiting for an http response. On the other hand async tasks are not bound to any thread and are just not running while waiting for I/O.

This sample helped us to answer the question as to when to use parallel and when async. The simple answer goes like this:

  • if your logic is only CPU bound then use parallel API;
  • otherwise use async API (this accounts I/O waits).
Wednesday, June 25, 2014 12:51:28 PM UTC  #    Comments [5] -
.NET | Thinking aloud
# Monday, June 23, 2014

Not a long ago C# has introduced special language constructs to simplify asynchronous programming. It seems C++1x will follow async trend. But only recently when frameworks like ASP.NET Web API and Entity Framework started to catch up we've felt how it's to program with async and await keywords.

At first glance it seems it's a pure pleasure to write async methods:

private async Task SumPageSizesAsync()
{
  // To use the HttpClient type in desktop apps, you must include a using directive and add a
  // reference for the System.Net.Http namespace.
  HttpClient client = new HttpClient();
  // . . .

  byte[] urlContents = await client.GetByteArrayAsync(url);
  // . . .
}

To dereference a Task<T> into T you just write await task_t_expression, mark your method with async specifier, and adjust output type (if not void) to Task or Task<Result>. Compiler applies its magic to convert your code into an asynchronous state machine.

We liked this feature and immediately have started to use it. But, as we said, async/await has shined in full when frameworks made it a core element, and at that point we have started to see that while  async/await solve the task, they does not abstract the developer from implementation details, as a code gets considerably polluted.

Consider a method with pollution marked:

public static async Task<UserAuthorization> GetAuthorizationAsync(string accessToken)
{
  var key = "oauth2:" + accessToken;
  var authorization = cache.Get<UserAuthorization>(key);

  if (authorization != null)
  {
    return authorization;
  }

  using(var model = new ModelContainer())
  {
    authorization =
      (await model.UserAuthorizations.
        Where(item => item.AccessToken == accessToken).
        ToListAsync()).
      FirstOrDefault();
  }

  if (authorization == null)
  {
    authorization = await ValidateAsync(accessToken);
  }

  cache.Set(key, cache.ShortDelay, authorization);

  return authorization;
}

The more you use async, the more pollution you will see in your code. Ideally we would like to see the method quoted without any marked parts.

Monday, June 23, 2014 6:15:55 AM UTC  #    Comments [0] -
.NET | Thinking aloud
# Thursday, June 12, 2014

We needed to have oauth2 authorization in angularjs project.

Internet search on the subject immediately brings large amout of solutions like:

But unfortunatelly:

  • provider specific libraries have too different set of APIs, which requires another umbrella library to allow for application to accept several providers;
  • angular-oauth - supports Google only, and does not work in IE 11 with default security settings;
  • oauth.io looks attractive but adds additional level of indirection server, and is free for a basic plan only.

However there is a problem with all those approaches.

Let's assume that you have properly implemented client side authorization, and finally have gotten an access_token.

Now, you access your server with that access_token. What is your first step on the server?

Right! Your should validate it against oauth2 provider.

So, while client side authorization, among other things, included a validation of your token, you have to perform the validation on the server once more.

At this point we felt that we need to implement our oauth2 API. :-)

It can be found at nesterovsky-bros/oauth2.

This is the readme from that project:

Here we implement oauth2 authorization within angularjs.

Authentication is done as follows:

  1. Open oauth2 provider login/grant screen.
  2. Redirect to the oauth2 callback screen with access token.
  3. Verify of the access token against provider.
  4. Get some basic profile.

A base javascript class OAuth2 implements these steps.

There are following implementations that authorize against specific providers:

OAuth2Server - implements authorization through known providers, but calls server side to validate access token. This way, the server side can establish a user's session.

The file Config.json contains endpoints and request parameters per supported provider.

Note: You should register a client_id for each provider.

Note: user_id and access_tokens are unique only in a scope of access provider, thus a session is identified by Provider + access_token, and a user is identified by Provider + user_id.

The use case can be found in test.js E.g. authorization against OAuth2Server is done like this:

var login = new OAuth2Server(provider);

token = login.authorize();

token.$promise.then(
  function()
  {
    // token contains populated data.
  },
  function(error)
  {
    if (error)
    {
      // handle an error
    }
  });

Authorization token contains:

  • a promise to handle authorization outcome.
  • cancelToken (a Deferred object) to cancel authorization in progress.

Whole sample is implemented as VS project. All scripts are build with app.tt, that combines content of Scripts/app int app.js.

Server side is implemented with ASP.NET Web API. Authorization controllers are:

Thursday, June 12, 2014 2:59:42 PM UTC  #    Comments [0] -
.NET | AngularJS | javascript
# Sunday, May 25, 2014

After several years of experience with KendoUI we turned our attention to AngularJS. As many other libraries it has its strong and weak sides. There are many resources describing what AngularJS is, and what it is not. Our approach to study AngularJS was through an attempt to integrate it into an existing KendoUI web application.

It's rather straightforward to convert model from KendoUI into AngularJS, as logically both frameworks are equal in this regard. But tactically KendoUI implements model-view binding very differently than AngularJS does. KendoUI binds model to view immediately per each model field, where AngularJS delays a binding of each model field and performs whole model binding in one go. Angular's approach is more performant, and even more appealing to a developer, though the problem is that the time it takes to make whole model binding is proportional to a size (number of objects and properties) of model. This means that if you have a relatively big model you will experience tangible halts in browser's UI while a javascript updating view/model is running.

AngularJS advices some workaround, which in essence is to avoid big model. The problem is that a couple of thousands or even several hundrends of objects and properties are already considered big model. So, you should immediately plan your model, and view to avoid any potential impact. This seriously distracts from the task your're solving.

The idea that your UI will halt for the time proportional to the size of your whole model looks flawed in our opinion. KendoUI knows no such a problem. That's the reason why our KendoUI to AngularJS transition experience was not smooth.

Our analysis of AngularJS sources shows that the issue could be resolved provided model to view binding (it's called digest in that library) was asynchronous.

To verify our ideas we have created a branch nesterovsky-bros/angular.js where we implemented required refactorings. It includes:

  • API based on existing deferred/promise to write algorithms in async way, and
  • refactored digest logic.

At the end we have proposed to integrate our changes into the main branch: Make $digest async.

We're not sure whether our proposition will be integrated (rather no than yes). Nevertheless what we have come with is an interesting extension of deferred object that we neither have seen in AngularJS nor in JQuery, so later we will quote that API from q.js and scheduler.js.

Sunday, May 25, 2014 8:02:41 AM UTC  #    Comments [2] -
AngularJS | javascript | kendoui | Thinking aloud
# Monday, April 28, 2014

Among proposed new features (other than Maps and Arrays) in XPath 3.1 we like Arrow operator (=>).

It's defined like this:

[Definition: An arrow operator is a postfix operator that applies a function to an item, using the item as the first argument to the function.] If $i is an item and f() is a function, then $i=>f() is equivalent to f($i), and $i=>f($j) is equivalent to f($i, $j).

This syntax is particularly helpful when conventional function call syntax is unreadable, e.g. when applying multiple functions to an item. For instance, the following expression is difficult to read due to the nesting of parentheses, and invites syntax errors due to unbalanced parentheses:

tokenize((normalize-unicode(upper-case($string))),"\s+")

Many people consider the following expression easier to read, and it is much easier to see that the parentheses are balanced:

$string=>upper-case()=>normalize-unicode()=>tokenize("\s+")

 What it looks like?

Right! It's like extension functions in C#.

Monday, April 28, 2014 6:20:27 AM UTC  #    Comments [0] -
xslt
# Thursday, March 13, 2014

MSDN topic CREATE TRIGGER (Transact-SQL) contains (at 2014-03-13) a polite advice we like very much. Here it is:

create trigger note

Thursday, March 13, 2014 1:34:10 PM UTC  #    Comments [0] -

# Monday, March 10, 2014

In the article "Error handling in WCF based web applications" we've shown a custom error handler for RESTful service based on WCF. This time we shall do the same for Web API 2.1 service.

Web API 2.1 provides an elegant way to implementat custom error handlers/loggers, see the following article. Web API permits many error loggers followed by a single error handler for all uncaught exceptions. A default error handler knows to output an error both in XML and JSON formats depending on requested MIME type.

In our projects we use unique error reference IDs. This feature allows to an end-user to refer to any error that has happened during the application life time and pass such error ID to the technical support for further investigations. Thus, error details passed to the client-side contain an ErrorID field. An error logger generates ErrorID and passes it over to an error handler for serialization.

Let's look at our error handling implementation for a Web API application.

The first part is an implementation of IExceptionLogger interface. It assigns ErrorID and logs all errors:

/// Defines a global logger for unhandled exceptions.
public class GlobalExceptionLogger : ExceptionLogger
{
  /// Writes log record to the database synchronously.
  public override void Log(ExceptionLoggerContext context)
  {
    try
    {
      var request = context.Request;
      var exception = context.Exception;

      var id = LogError(
        request.RequestUri.ToString(),
        context.RequestContext == null ?
          null : context.RequestContext.Principal.Identity.Name,
        request.ToString(),
        exception.Message,
        exception.StackTrace);

      // associates retrieved error ID with the current exception
      exception.Data["NesterovskyBros:id"] = id;
    }
    catch
    {
      // logger shouldn't throw an exception!!!
    }
  }

  // in the real life this method may store all relevant info into a database.
  private long LogError(
    string address,
    string userid,
    string request,
    string message,
    string stackTrace)
  {
    ...
  }
}

The second part is the implementation of IExceptionHandler:

/// Defines a global handler for unhandled exceptions.
public class GlobalExceptionHandler : ExceptionHandler
{
  /// This core method should implement custom error handling, if any.
  /// It determines how an exception will be serialized for client-side processing.
  public override void Handle(ExceptionHandlerContext context)
  {
    var requestContext = context.RequestContext;
    var config = requestContext.Configuration;

    context.Result = new ErrorResult(
      context.Exception,
      requestContext == null ? false : requestContext.IncludeErrorDetail,
      config.Services.GetContentNegotiator(),
      context.Request,
      config.Formatters);
  }

  /// An implementation of IHttpActionResult interface.
  private class ErrorResult : ExceptionResult
  {
    public ErrorResult(
      Exception exception,
      bool includeErrorDetail,
      IContentNegotiator negotiator,
      HttpRequestMessage request,
      IEnumerable<MediaTypeFormatter> formatters) :
      base(exception, includeErrorDetail, negotiator, request, formatters)
    {
    }

    /// Creates an HttpResponseMessage instance asynchronously.
    /// This method determines how a HttpResponseMessage content will look like.
    public override Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
    {
      var content = new HttpError(Exception, IncludeErrorDetail);
   
      // define an additional content field with name "ErrorID"
      content.Add("ErrorID", Exception.Data["NesterovskyBros:id"] as long?);

      var result =
        ContentNegotiator.Negotiate(typeof(HttpError), Request, Formatters);

      var message = new HttpResponseMessage
      {
        RequestMessage = Request,
        StatusCode = result == null ?
          HttpStatusCode.NotAcceptable : HttpStatusCode.InternalServerError
      };

      if (result != null)
      {
        try
        {
          // serializes the HttpError instance either to JSON or to XML
          // depend on requested by the client MIME type.
          message.Content = new ObjectContent<HttpError>(
            content,
            result.Formatter,
            result.MediaType);
        }
        catch
        {
          message.Dispose();

          throw;
        }
      }

      return Task.FromResult(message);
    }
  }
}

Last, but not least part of this solution is registration and configuration of the error logger/handler:

/// WebApi congiguation.
public static class WebApiConfig
{
  public static void Register(HttpConfiguration config)
  {
    ...

    // register the exception logger and handler
    config.Services.Add(typeof(IExceptionLogger), new GlobalExceptionLogger());
    config.Services.Replace(typeof(IExceptionHandler), new GlobalExceptionHandler());

    // set error detail policy according with value from Web.config
    var customErrors =
      (CustomErrorsSection)ConfigurationManager.GetSection("system.web/customErrors");

    if (customErrors != null)
    {
      switch (customErrors.Mode)
      {
        case CustomErrorsMode.RemoteOnly:
        {
         config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.LocalOnly;

         break;
        }
        case CustomErrorsMode.On:
        {
          config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Never;

          break;
        }
        case CustomErrorsMode.Off:
        {
          config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Always;

          break;
        }
        default:
        {
          config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Default;

          break;
        }
      }
    }
  }
}

The client-side error handler remain almost untouched. The implementation details you may find in /Scripts/api/api.js and Scripts/controls/error.js files.

You may download the demo project here. Feel free to use this solution in your .NET projects.

Monday, March 10, 2014 10:56:24 AM UTC  #    Comments [0] -
.NET | ASP.NET | javascript | kendoui | Tips and tricks
# Friday, March 7, 2014

From time to time we run into tasks that we would like to solve in LINQ style but unfortunately it either cannot be done or a solution is not efficient.

Note that by LINQ style we do not mean C# query expressions (we have a strong distaste for that syntax) but extension methods defined in System.Linq.Enumerable and other classes.

Here we quote several extension methods that are good for a general use:

1. Select with predicate. This is shorthand of items.Where(...).Select(...):

/// <summary>
/// Projects each element of a sequence into a new form.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="where">A predicate to filter elements.</param>
/// <param name="selector">A result element selector.</param>
/// <returns>A target sequence.</returns>
public static IEnumerable<R> Select<T, R>(
  this IEnumerable<T> source,
  Func<T, bool> where,
  Func<T, R> selector)
{
  return source.Where(where).Select(selector);
}

2. Select with predicate with source element index passed both into the predicate and into the selector. This one you cannot trivially implement in LINQ:

/// <summary>
/// Projects each element of a sequence into a new form.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="where">A predicate to filter elements.</param>
/// <param name="selector">A result element selector.</param>
/// <returns>A target sequence.</returns>
public static IEnumerable<R> Select<T, R>(
  this IEnumerable<T> source,
  Func<T, int, bool> where,
  Func<T, int, R> selector)
{
  var index = 0;

  foreach(var value in source)
  {
    if (where(value, index))
    {
      yield return selector(value, index);
    }

    ++index;
  }
}

3. A function with output element as projection of a window of input elements. Such function can be used to get finite difference (operation opposite to a cumulative sum).

/// <summary>
/// Projects a window of source elements in a source sequence into target sequence.
/// Thus
///   target[i] =
///     selector(source[i], source[i - 1], ... source[i - window + 1])
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicate whether to produce target if the number of source elements
/// preceeding the current is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicate whether to produce target if the number of source elements
/// following current is less than the window size.
/// </param>
/// <param name="selector">
/// A selector that derives target element.
/// On input it receives:
///   an array of source elements stored in round-robing fashon;
///   an index of the first element;
///   a number of elements in the array to count.
/// </param>
/// <returns>Returns a sequence of target elements.</returns>
public static IEnumerable<R> Window<T, R>(
  this IEnumerable<T> source,
  int window,
  bool lookbehind,
  bool lookahead,
  Func<T[], int, int, R> selector)
{
  var buffer = new T[window];
  var index = 0;
  var count = 0;

  foreach(var value in source)
  {
    if (count < window)
    {
      buffer[count++] = value;

      if (lookbehind || (count == window))
      {
        yield return selector(buffer, 0, count);
      }
    }
    else
    {
      buffer[index] = value;
      index = index + 1 == window ? 0 : index + 1;

      yield return selector(buffer, index, count);
    }
  }

  if (lookahead)
  {
    while(--count > 0)
    {
      index = index + 1 == window ? 0 : index + 1;

      yield return selector(buffer, index, count);
    }
  }
}

This way a finite difference looks like this:

var diff = input.Window(
  2,
  false,
  false,
  (buffer, index, count) => buffer[index ^ 1] - buffer[index]);

4. A specialization of Window method that returns a enumeration of windows:

/// <summary>
/// Projects a window of source elements in a source sequence into a
/// sequence of window arrays.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicate whether to produce target if the number of source elements
/// preceeding the current is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicate whether to produce target if the number of source elements
/// following current is less than the window size.
/// </param>
/// <returns>Returns a sequence of windows.</returns>
public static IEnumerable<T[]> Window<T>(
  this IEnumerable<T> source,
  int window,
  bool lookbehind,
  bool lookahead)
{
  return source.Window(
    window,
    lookbehind,
    lookahead,
    (buffer, index, count) =>
    {
      var result = new T[count];

      for(var i = 0; i < count; ++i)
      {
        result[i] = buffer[index];
        index = index + 1 == buffer.Length ? 0 : index + 1;
      }

     return result;
   });
}

Friday, March 7, 2014 2:11:46 PM UTC  #    Comments [3] -
.NET | Tips and tricks
# Monday, February 24, 2014

While writing an article Dealing with dynamic SQL in SQL Server we have crushed on a trivial C# code.

 Consider a declaration:

public enum Direction
{
  Asc,
  Desc
}

public struct Order
{
  public string Field { get; set; }
  public Direction Direction { get; set; }
}

public class DateRange
{
  public DateTime? From { get; set; }
  public DateTime? To { get; set; }
}

public class Request
{
  public DateRange CreatedAt { get; set; }
  public string Summary { get; set; }
  [XmlElement]
  public int[] State { get; set; }
  public DateRange UpdatedAt { get; set; }
  [XmlElement]
  public Order[] Order { get; set; }
}

and a code:

  ...
  var request = new Request
  {
    CreatedAt = { From = new DateTime(2014, 1, 1) },
    Order = new[]
    {
      new Order { Field = "CreatedAt" }
    }
  };
  ...

It was hard to believe that this simple variable declaration throws NullReferenceException.

In fact we could not realize what the reason is until we have decompiled that code and poked into IL.

It appears that the problem is with the line:

CreatedAt = { From = new DateTime(2014, 1, 1) },

It's implemented like this:

var range = request.CreatedAt;

range.From = new DateTime(2014, 1, 1) ;
range.To = null;

In other words it assumes the instance stored in property CreatedAt is not null, and it reinitializes all its properties.

In contrast the following works as expected (no exception is thrown):

CreatedAt = new { From = new DateTime(2014, 1, 1) },

as it creates a new instance and assigns it to a property.

We think that this is very error prone, so if it's how C# is assumed to be then it's the problem of the spec, otherwise it's the problem of Microsoft implementation of C# (at least in VS 2010 - 2013).

Monday, February 24, 2014 8:48:35 AM UTC  #    Comments [4] -

# Tuesday, February 11, 2014

These are initial positions for this writing:

  • SQL Server allows to execute dynamic SQL.
  • Dynamic SQL is useful and often unavoidable, e.g. when you have to filter or order data in a way that you cannot code efficiently in advance.
  • Dynamic SQL has proven to be a dangerous area, as with improper use it can open hole in a security.

In general nothing stops you from building and then excuting of SQL string. Our goal, however, is to define rules that make work with dynamic SQL is more managable and verifiable.

Here we outline these rules, and then give some examples and tips.

Rule #1. Isolate dynamic SQL

Put all logic related to building of dynamic SQL into a separate function.
We usually define a separate scheme Dynamic, and define functions like Dynamic.GetSQL_XXX(params).
This makes it simple to perform code review.

Rule #2. Xml as parameters

Use xml type to pass parameters to a function that builds dynamic SQL.
In many cases dynamic SQL depends on variable number of parameters (like a list of values to check against).
Xml fits here to represent structured information.
On a client (e.g. in C# or java) you can define a class with all parameters, populate an instance and serialize it to an xml.

Rule #3. XQuery as template language

Use XQuery to define SQL template and to generate SQL tree from the input parameters.
Here is an example of such XQuery:

@data.query('
<sql>
select
  T.*
from
  Data.Ticket T
where
{
  for $ticketID in data/ticketID return
    <sql>(T.TicketID = <int>{$ticketID}</int>) and </sql>
}
(1 = 1)
</sql>')

You can see that output is an xml with sql element to represent literal SQL, and int element to represent integer literal.

In fact whole output schema can be defined like this:

<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="sql"/>
  <xs:element name="name"/>
  <xs:element name="string" nillable="true"/>
  <xs:element name="int" nillable="true"/>
  <xs:element name="decimal" nillable="true"/>
  <xs:element name="date" nillable="true"/>
  <xs:element name="time" nillable="true"/>
  <xs:element name="datetime" nillable="true"/>
</xs:schema>

where sql is to represent literal content, name to represent a name, and other elements to represent different literal values.

Rule #4. Escape literals

Use function Dynamic.ToSQL(@template) to build final SQL text.
Here we quote the definition:

-- Builds a text of SQL function for an sql template.
create function Dynamic.ToSQL
(
  -- SQL template.
  @template xml
)
returns nvarchar(max)
with returns null on null input
as
begin
  return
  (
    select
      case
        when N.Node.exist('*[xs:boolean(@xsi:nil)]') = 1 then
          'null'

        when N.Node.exist('self::int') = 1 then
          isnull(N.Node.value('xs:int(.)', 'nvarchar(max)'), '# int #')

        when N.Node.exist('self::string') = 1 then
          'N''' +
          replace
          (
            N.Node.value('.', 'nvarchar(max)'),
            '''',
            ''''''
          ) +
          ''''

        when N.Node.exist('self::name') = 1 then
          isnull
          (
            quotename(N.Node.value('.', 'nvarchar(128)'), '['),
            '# name #'
          )

        when N.Node.exist('self::datetime') = 1 then
          isnull
          (
            'convert(datetime2, ''' +
            N.Node.value('xs:dateTime(.)', 'nvarchar(128)') +
            ''', 126)',
            '# datetime #'
          )

        when N.Node.exist('self::date') = 1 then
          isnull
          (
            'convert(date, ''' +
            N.Node.value('xs:date(.)', 'nvarchar(128)') +
            ''', 126)',
            '# date #'
          )

        when N.Node.exist('self::time') = 1 then
          isnull
          (
            'convert(time, ''' +
            N.Node.value('xs:time(.)', 'nvarchar(128)') +
            ''', 114)',
            '# time #'
          )

        when N.Node.exist('self::decimal') = 1 then
          isnull
          (
            N.Node.value('xs:decimal(.)', 'nvarchar(128)'),
            '# decimal #'
          )

        when N.Node.exist('self::*') = 1 then
          '# invalid template #'

        else
          N.Node.value('.', 'nvarchar(max)')
      end
    from
      @template.nodes('//sql/node()[not(self::sql)]') N(Node)
    for xml path(''), type
  ).value('.', 'nvarchar(max)');
end;

Now, we want to stress that this function plays an important role in prevention of the SQL injection, as it escapes literals from the SQL tree.

Rule #5 (optional). Collect data

Use SQL to collect additional data required to build dynamic SQL. Here is an example of how we get a Ticket by StatusID, while on input we receive a StatusName:

create function Dynamic.GetSQL_GetTicketByStatus(@data xml)
returns nvarchar(max)
as
begin
  set @data =
    (
      select
        @data,
        (
          select
            T.StatusID
          from
            @data.nodes('/data/status') N(Node)
            inner join
            Metadata.Status T
            on
              T.StatusName = Node.value('.', 'nvarchar(128)')
            for xml auto, type, elements
        )
      for xml path('')
    );

  return Dynamic.ToSQL
  (
    @data.query
    ('
<sql>
select
  T.*
from
  Data.Ticket T
where
  T.Status in ({ for $status in /T/StatusID return <sql><int>{$status}</int>,</sql> } null)
</sql>
    ')
  );
end;

Notice code in red that collects some more data before calling XQuery.

Rule #6. Execute

The final step is to call dynamic SQL.
This is done like this:

-- build
declare @sql nvarchar(max) = Dynamic.GetSQL_GetTicket(@data);

-- execute
execute sp_executesql
  @sql
  -- {, N'@parameter_name data_type [ OUT | OUTPUT ][ ,...n ]' }
  -- { , [ @param1 = ] 'value1' [ ,...n ] }
with result sets
(
  (
    TicketID int not null,
    CreatedAt datetime2 not null,
    Summary nvarchar(256) null,
    Status int,
    Severity int,
    DeadLineAt datetime2 null
  )
);

Notice that the use of dynamic SQL does not prevent static parameters.
Notice also that with result sets clause is used to specify output.

Example. Tickets system

Let's assume you're dealing with a tickets system (like Bugzilla), and you have a table Data.Ticket to describe tickets. Assume that DDL for this table is like this:

create table Data.Ticket
(
  TicketID bigint not null primary key,
  CreatedAt datetime2 not null,
  Summary nvarchar(128) null,
  Status int not null,
  UpdatedAt datetime2(7) not null
)

Suppose you have to build C# code to search different tickets, where Entity Framework is used to access the database.
Search should be done by a range of CreatedAt, a range of UpdatedAt, Summary, or by different Status values. It should be possible to order results in different ways.

We start out solution from the C# and define classes for a request:

public enum Direction
{
  Asc,
  Desc
}

public struct Order
{
  public string Field { get; set; }
  public Direction Direction {get; set; }
}

public class DateRange
{
  public DateTime? From { get; set; }

  // This property is to omit From element if value is null.
  // See rules for xml serialization.
  public bool FromSpecified { get { return From != null; } }

  public DateTime? To { get; set; }
  public bool ToSpecified { get { return To != null; } }
}

public class TicketsRequest
{
  public DateRange CreatedAt { get; set; }
  public string Summary { get; set; }
  public DateRange UpdatedAt { get; set; }
  [XmlElement]
  public Order[] Order { get; set; }
  [XmlElement]
  public int[] Status { get; set; }
}

Notice that we're going to use XmlSerializer to convert request to xml and then to pass parameter into EF's model. Here is utility method to perform such conversion:

public static string ToXmlString<T>(T value)
{
  if (value == null)
  {
    return null;
  }

  var serializer = new XmlSerializer(typeof(T));
  var builder = new StringBuilder();

  var writer = XmlWriter.Create(
    builder,
    new XmlWriterSettings
    {
      OmitXmlDeclaration = true,
      Indent = false
    });

  serializer.Serialize(writer, value);
  writer.Flush();

  return builder.ToString();
}

Now we proceed to the database and define a procedure that runs the search:

-- Gets tickets.
create procedure Data.GetTickets
(
  -- A query parameters.
  @params xml
)
as
begin
  set nocount on;

  -- This is for EF to guess type of result.
  if (1 = 0)
  begin
    select
      TicketID,
      CreatedAt,
      Summary,
      Status,
      UpdatedAt
    from
      Data.Ticket;
  end;

  declare @sql nvarchar(max) = Dynamic.GetSQL_GetTickets(@params);

  execute sp_executesql @sql
  with result sets
  (
    (
      TicketID int not null,
      CreatedAt datetime2 not null,
      Summary nvarchar(256) null,
      Status int,
      UpdatedAt datetime2 null
    )
  );
end;

Switch back to C#, import the Data.GetTickets into the EF model, and create a search method:

public IEnumerable<Ticket> GetTickets(TicketsRequest request)
{
  var model = new Model();

  return model.GetTickets(ToXmlString(request));
}

The last ingredient is Dynamic.GetSQL_GetTickets() function.

create function Dynamic.GetSQL_GetTickets(@data xml)
returns nvarchar(max)
as
begin
  return Dynamic.ToSQL
  (
    @data.query('
<sql>
select
  T.TicketID,
  T.CreatedAt,
  T.Summary,
  T.Status,
  T.UpdatedAt
from
  Data.Ticket T
where
{
  for $range in */CreatedAt return
  (
    for $date in $range/From return
    <sql>
      (T.CreatedAt >= <datetime>{$date}</datetime>) and
    </sql>,

    for $date in $range/To return
    <sql>
      (<datetime>{$date}</datetime> > T.CreatedAt) and
    </sql>
  ),

  for $range in */UpdatedAt return
  (
    for $date in $range/From return
    <sql>
      (T.UpdatedAt >= <datetime>{$date}</datetime>) and
    </sql>,

    for $date in $range/To return
    <sql>
      (<datetime>{$date}</datetime> > T.UpdatedAt) and
    </sql>
  ),

  for $summary in */Summary return
  <sql>
    (T.Summary like <string>{$summary}</string>) and
  </sql>,

  if (*/Status) then
  <sql>
    T.Status in
      ({
        for $status in */Status return
          <sql><int>{$status}</int>, </sql>
      } null) and
  </sql>
  else ()
}
(1 = 1)
order by
{
  for $order in
    */Order
    [
      Field = ("TicketID", "CreatedAt", "Summary", "UpdatedAt", "Status")
    ]
  return
  <sql>
    <name>{$order/Field}</name>
    {" desc"[$order[Direction = "Desc"]]},
  </sql>
}
(select null)
</sql>
    ')
  );
end;

SQL text from Dynamic.GetSQL_GetTickets()

Consider now SQL text produced by this function. For an input:

<TicketsRequest>
  <CreatedAt>
    <From>2014-01-01T00:00:00</From>
  </CreatedAt>
  <Summary>hello%</Summary>
  <Order>
    <Field>Status</Field>
    <Direction>Desc</Direction>
  </Order>
  <Status>1</Status>
  <Status>3</Status>
</TicketsRequest>

the output is:

select
  T.TicketID,
  T.CreatedAt,
  T.Summary,
  T.Status,
  T.UpdatedAt
from
  Data.Ticket T
where

      (T.CreatedAt >= convert(datetime2, '2014-01-01T00:00:00', 126)) and

    (T.Summary like N'hello%') and

    T.Status in
      (1, 3, null) and

  (1 = 1)
order by
[Status] desc,

  (select null)

Though the text is not formatted as we would like, it's perfectly valid SQL.

Tips for building XQuery templates

What is called XQuery in SQL Server is in fact a very limited subset of XQuery 1.0. Microsoft clearly states this fact. What is trivial in XQuery is often impossible or ugly in XQuery of SQL Server.

Nevertheless XQuery in SQL Server works rather well as SQL template language. To make it most efficient, however, you should learn several tips.

Tip #1. Where clause

In template you might want to build a where clause:

<sql>
select
...
where
{
  if (...) then
    <sql>...</sql>
  else ()
}
</sql>

and it might happen that for a certain input a condition under where might collapse, and you will be left with where keyword without a real condition, which is wrong. A simple work around is to always add some true condition under ther where like this:

<sql>
select
...
where
{
  if (...) then
    <sql>... and </sql>
  else ()
} (1 = 1)
</sql>

Tip #2. "in" expression

If you want to generate "in" expression like this:

value in (item1, item2,...)

then you might find that it's much easier generate equivalent a code like this:

value in (item1, item2,..., null).

Here is a XQuery to generate such template:

value in
  ({
    for $item in ... return
      <sql><int>{$item}</int>, </sql>
  } null) and

Tip #3. Order by

You can conclude an order by clause built from a data with a dummy expression like this:

order by
{
  for $item in ... return
    <sql>
      <name>{$item/Field}</name>
      {" desc"[$item/Direction = "Desc"]},
    </sql>
} (select null)

Alternatively you can use first column from a clustered index.

Tip #4. Group by

In a group by clause we cannot introduce terminator expression as it was with order by, so a code is a less trivial:

{
  let $items := ... return

  if ($items) then
    <sql>
      group by <name>{$items[1]}</name>
      {
        for $item in $items[position() > 1] return
          <sql>, <name>{$item}</name></sql>
      }
    </sql>
  else ()
}

In fact similar logic may work with order by.

Tip #5. Escape literals

It's crusial not to introduce SQL injection while building SQL. Thus use:

<int>{...}</int> - for literal int;
<decimal>{...}</decimal> - for literal decimal;
<string>{...}</string> - for literal string;
<datetime>{...}</datetime> - for literal datetime2;
<date>{...}</date> - for literal date;
<time>{...}</time> - for literal time;
<name>{...}</name> - for a name to quote.

Note that you can use xsi:nil, so <int xsi:nil="true"/> means null.

If you generate a field name from an input data then it worth to validate it against a list of available names.

Tip #6. Validate input.

It worth to define xml schema for an input xml, and to validate parameters against it.
This makes code more secure, and also adds a documentation.

Tip #7. Don't abuse dynamic SQL

There are not too many cases when you need a dynamic SQL. Usually SQL engine knows how to build a good execution plan. If your query contains optional conditions then you can write it a way that SQL Server can optimize, e.g.:

select
  *
from
  T
where
  ((@name is null) or (Name = @name)) and
  ((@date is null) or (Date = @date))
option(recompile)

Tuesday, February 11, 2014 9:48:07 AM UTC  #    Comments [0] -
SQL Server puzzle | Thinking aloud | Tips and tricks
# Wednesday, January 22, 2014

Consider how would you implement Style object in the HTML DOM?

These are some characteristics of that object:

  • It has a long list of properties, e.g. in IE 11 there are more than 300 properties over a style object.
  • Any specific instance usually have only several properties assigned.
  • Reads of properties are much more frequent than writes. In fact style often stays unchanged after initialization.
  • DOM contains many style instances (often thousands).
  • The number of distinct instances in terms of values of properties is moderate (usually dozens).

Here is how would we approached to such an object.

1. Styles are sparse objects, thus there is no point to implement plain class with all those properties, as it's wasteful.

We would rather use two techniques to keep style's state:

  • A dictionary of properties with their values;
  • An aggregation of objects, where all properies are grouped into families, each group is defined by a separate type, and a style's state is an aggregation of that groups.

A current style of an element is an aggregation of styles of ancestor element. It can either by dynamic or be fused into a single style instance.

2. Make style's state immutable, and share all these states among all style instances.

In this implementation property write turns into a state transition operation: state = set(state, property, value). Thus no state is modified but replaced with other state that corresponds to a required change.

If state is seen as a dictionary then API may look like this :

public class State<K, V>
{
  // Gets shared dictionary for an input dictionary.
  public IDictionary<K, V> Get(IDictionary<K, V> dictionary);

  // Gets a shared dictionary for an input dictionary with key set to a value.
  public IDictionary<K, V> Set(IDictionary<K, V> dictionary, K key, V value);

  // Gets a shared dictionary for an input dictionary.
  public IDictionary<K, V> Remove(IDictionary<K, V> dictionary, K key);

  // Gets typed value.
  public T Get<T>(IDictionary<K, V> dictionary, K key)
    where T: V
  {
    V value;

    if ((dictionary == null) || !dictionary.TryGetValue(key, out value))
    {
      return default(T);
    }

    return (T)value;
  }

  // Sets or removes a typed value.
  // dictionary can be null.
  // null returned if output dictionary would be empty.
  public IDictionary<K, V> Set<T>(IDictionary<K, V> dictionary, K key, T value)
    where T : V
  {
    return value == null ? Remove(dictionary, key) :
      Set(dictionary, key, (V)value);
  }
}

States can be cached. Provided the cache keeps states in a weak way, no unsued state will be stored for a long time. We may use weak table of dictionary to dictionary WeakTable<Dictionary<K, V>, Dictionary<K, V>> as a storage for such a cache. All required API is described in the WeakTable and Hash Code of Dictionary posts.

3. Style can be implemented as a structure with shared state as a storage. Here is a scetch:

[Serializable]
public struct Style
{
  // All properties.
  public enum Property
  {
    Background,
    BorderColor,
    BorderStyle,
    Color,
    FontFamily,
    FontSize,
    // ...
  }

  public int? Background
  {
    get { return states.Get<int?>(state, Property.Background); }
    set { state = states.Set(state, Property.Background, value); }
  }

  public int? BorderColor
  {
    get { return states.Get<int?>(state, Property.BorderColor); }
    set { state = states.Set(state, Property.BorderColor, value); }
  }

  public string BorderStyle
  {
    get { return states.Get<string>(state, Property.BorderStyle); }
    set { state = states.Set(state, Property.BorderStyle, value); }
  }

  public int? Color
  {
    get { return states.Get<int?>(state, Property.Color); }
    set { state = states.Set(state, Property.Color, value); }
  }

  public string FontFamily
  {
    get { return states.Get<string>(state, Property.FontFamily); }
    set { state = states.Set(state, Property.FontFamily, value); }
  }

  public double? FontSize
  {
    get { return states.Get<double?>(state, Property.FontSize); }
    set { state = states.Set(state, Property.FontSize, value); }
  }

  // ...

  [OnDeserialized]
  private void OnDeserialized(StreamingContext context)
  {
    state = states.Get(state);
  }

  // A state.
  private IDictionary<Property, object> state;

  // A states cache.
  private static readonly State<Property, object> states =
    new State<Property, object>();
}

Note that:

  •  default state is a null dictionary;
  • states are application wide shared.

The following link is our implementation of State<K, V> class: State.cs.

 

Here we have outlined the idea of shared state object, and how it can be applied to sparse mostly immutable objects. We used HTML style as an example of such an object. Shared state object may work in many other areas, but for it to shine its use case should fit to the task.

Wednesday, January 22, 2014 7:43:25 PM UTC  #    Comments [0] -
.NET | Thinking aloud | Tips and tricks
Archive
<July 2014>
SunMonTueWedThuFriSat
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789
Statistics
Total Posts: 387
This Year: 3
This Month: 0
This Week: 0
Comments: 1552
Locations of visitors to this page
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros
All Content © 2024, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)