# Wednesday, February 01, 2012

A customer has a table with data stored by date, and asked us to present the data from this table as sequential date ranges.

The task sounded trivial, but it took us half a day to write such a select.

For simplicity, consider a table of integer numbers, and try to build a select that returns the low and high bounds of each continuous range of values.

So, for an input like this:

declare @values table
(
  value int not null primary key
);

insert into @values(value)
select  1 union all select  2 union all select  3 union all
select  5 union all select  6 union all
select  8 union all
select 10 union all
select 12 union all select 13 union all select 14;

You will get the following output:

low  high
---- ----
1    3
5    6
8    8
10   10
12   14

The logic of the algorithm is as follows:

  1. get a low bound of each range (a value without value - 1 in the source);
  2. get a high bound of each range (a value without value + 1 in the source);
  3. combine low and high bounds.

Following this logic we built at least three different queries; the shortest one is:

with source as
(
  select * from @values
)
select
  l.value low,
  min(h.value) high
from
  source l
  inner join
  source h
  on
    (l.value - 1 not in (select value from source)) and
    (h.value + 1 not in (select value from source)) and
    (h.value >= l.value)
group by
  l.value;

execution plan

Looking at this query, it's hard to understand why it took so long to write such simple code...
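For comparison, here is a sketch of a different, widely used formulation of the same problem. It is not one of the three queries mentioned above. It relies on the fact that within a run of consecutive integers the difference value - row_number() stays constant, so that difference can serve as a grouping key. The alias grp is our illustrative name, and the query assumes unique values, which the primary key guarantees here.

declare @values table
(
  value int not null primary key
);

insert into @values(value)
select  1 union all select  2 union all select  3 union all
select  5 union all select  6 union all
select  8 union all
select 10 union all
select 12 union all select 13 union all select 14;

with source as
(
  select
    value,
    -- Constant within each run of consecutive values.
    value - row_number() over(order by value) grp
  from
    @values
)
select
  min(value) low,
  max(value) high
from
  source
group by
  grp
order by
  low;

For the sample data this returns the same five ranges as the query above.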

Wednesday, February 01, 2012 8:34:09 PM UTC
SQL Server puzzle | Tips and tricks
# Thursday, January 19, 2012

While looking at some SQL we realized that it could be considerably optimized.

Consider a table source like this:

with Data(ID, Type, SubType) as
(
  select 1, 'A', 'X'
  union all
  select 2, 'A', 'Y'
  union all
  select 3, 'A', 'Y'
  union all
  select 4, 'B', 'Z'
  union all
  select 5, 'B', 'Z'
  union all
  select 6, 'C', 'X'
  union all
  select 7, 'C', 'X'
  union all
  select 8, 'C', 'Z'
  union all
  select 9, 'C', 'X'
  union all
  select 10, 'C', 'X'
)

Suppose you want to group the data by type, to calculate the number of elements in each group, and to display the sub type if all rows in a group have the same sub type.

Earlier we wrote code like this:

select
  Type,
  case when count(distinct SubType) = 1 then min(SubType) end SubType,
  count(*) C
from
  Data
group by
  Type;

Namely, we select min(SubType) provided that there is a single distinct SubType; otherwise null is shown. That works perfectly, but algorithmically count(distinct SubType) = 1 needs to build a set of distinct values for each group just to ask for the size of this set. That is expensive!

What we want can be expressed differently: if min(SubType) and max(SubType) are the same then we display that value, otherwise we show null.

That's the new version:

select
  Type,
  case when min(SubType) = max(SubType) then min(SubType) end SubType,
  count(*) C
from
  Data
group by
  Type;

Such a simple rewrite has radically simplified the execution plan:

Execution plans

Another bizarre problem we have discovered is that SQL Server 2008 R2 simply does not support the following:

select
  count(distinct SubType) over(partition by Type)
from
  Data

That's really strange, but it's a known bug (see Microsoft Connect).
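A common workaround, shown here only as a sketch, emulates the distinct count with two dense_rank() calls: the ascending rank of a value plus its descending rank minus one equals the number of distinct values in the partition. The alias DistinctSubTypes and the shortened sample data are ours. Note that, unlike count(distinct ...), dense_rank() treats null as a value of its own, so the sketch assumes that SubType contains no nulls.

with Data(ID, Type, SubType) as
(
  select 1, 'A', 'X'
  union all
  select 2, 'A', 'Y'
  union all
  select 3, 'B', 'Z'
)
select
  ID,
  Type,
  SubType,
  -- Ascending rank + descending rank - 1 = number of distinct values.
  dense_rank() over(partition by Type order by SubType) +
    dense_rank() over(partition by Type order by SubType desc) - 1
    DistinctSubTypes
from
  Data;

Here both rows of type A get 2, and the row of type B gets 1.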

Thursday, January 19, 2012 9:12:11 PM UTC
SQL Server puzzle | Tips and tricks
# Friday, January 13, 2012

A database we support for a client contains multi-billion row tables. Many users query the data from that database, and it is constantly populated with new data.

Every day we load several million rows of new data. Such loads can lock tables for a considerable time, so our loading procedures collect new data into intermediate tables and insert it into the final destination in chunks, usually after work hours.

SQL Server offers the READ_COMMITTED_SNAPSHOT database option (available since SQL Server 2005). This feature trades locks for an increased tempdb size (to store row versions) and possible performance degradation during a transaction.
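For reference, a minimal sketch of how the option is switched on and verified. SalesDb is a placeholder database name, not the name of the database discussed here.

-- Changing the option needs exclusive access to the database;
-- "with rollback immediate" rolls back open transactions of other sessions.
alter database SalesDb
  set read_committed_snapshot on
  with rollback immediate;

-- Verify the setting.
select
  name,
  is_read_committed_snapshot_on
from
  sys.databases
where
  name = N'SalesDb';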

When we switched the database to that option we did not notice any considerable performance change. Encouraged, we decided to increase the size of the chunks of data we insert at once.

Earlier we had found that when we inserted no more than 1000 rows at once, users did not notice any impact, but with bigger chunk sizes users started to complain about performance degradation. This was probably due to lock escalation.

Now, with chunks of 10000 or even 100000 rows, we have found that no queries became slower, while the load process became several times faster.
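To illustrate what loading in chunks can look like, here is a sketch under stated assumptions; it is not our actual loading procedure. dbo.Staging and dbo.Target are hypothetical tables with the same columns (ID, Value), and the session is assumed to run in the default autocommit mode with no outer transaction.

-- The "output ... into" clause requires, among other things, that
-- dbo.Target has no enabled triggers and takes no part in foreign keys.
declare @chunkSize int = 10000;
declare @rows int = 1;

while @rows > 0
begin
  -- Each statement is its own transaction, so locks are held
  -- only for the duration of one chunk.
  delete top (@chunkSize)
  from
    dbo.Staging
  output
    deleted.ID,
    deleted.Value
  into dbo.Target(ID, Value);

  set @rows = @@rowcount;
end;

The chunk size is the single knob to tune: smaller chunks hold locks for a shorter time, bigger chunks reduce the per-statement overhead.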

We were ready to pay for increased tempdb and transaction log size to gain performance, but in our case we did not approach the limits assigned by the DBA. Another gain is that we can easily load data at any time, which keeps the data we store more up to date.

Friday, January 13, 2012 1:43:56 PM UTC
SQL Server puzzle | Thinking aloud | Tips and tricks
# Saturday, December 03, 2011

Recently, we found and reported a bug in SQL Server 2008 (see SQL Server 2008 with(recompile), and also Microsoft Connect).

The people responsible for evaluating the bug closed it as "By Design". In our opinion, this strange resolution reflects only on those people.

Well, we shall try once more (see Microsoft Connect). We have posted another trivial demonstration of the bug, where we show that option(recompile) is not used, which leads to a table scan (nothing worse can happen for a huge table).

Saturday, December 03, 2011 3:06:44 PM UTC
SQL Server puzzle | Thinking aloud
# Friday, November 18, 2011

Recently we introduced a stored procedure into production and found that it performs incredibly slowly.

Our reasoning and tests in the development environment did not manifest any problem at all.

In essence, the procedure executes a SELECT and returns a status in a single output variable. The procedure receives several input parameters, and the SELECT statement uses the option(recompile) query hint to optimize the plan for the specific parameter values.

We analyzed the execution plan of the procedure and found that it works as if the option(recompile) hint was not specified. Without that hint the SELECT failed to use an index seek and used an index scan instead.

What we found later is that the same SELECT performs very well when it produces a result set instead of reading the result into a variable.

We think that this is a bug in SQL Server 2008 R2 (and in SQL Server 2008).

To demonstrate the problem you can run this test:

-- Setup
create table dbo.Items
(
  Item int not null primary key
);
go

insert into dbo.Items
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
go

create procedure dbo.GetMaxItem
(
  @odd bit = null,
  @result int output
)
as
begin
  set nocount on;

  with Items as
  (
    select * from dbo.Items where @odd is null
    union all
    select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1)
    union all
    select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0)
  )
  select @result = max(Item) from Items
  option(recompile);
end;
go

create procedure dbo.GetMaxItem2
(
  @odd bit = null,
  @result int output
)
as
begin
  set nocount on;

  declare @results table
  (
    Item int
  );

  with Items as
  (
    select * from dbo.Items where @odd is null
    union all
    select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1)
    union all
    select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0)
  )
  insert into @results
  select max(Item) from Items
  option(recompile);

  select @result = Item from @results;
end;
go

Test with output into a variable:

declare @result1 int;

execute dbo.GetMaxItem @odd = null, @result = @result1 output

Execution plan of dbo.GetMaxItem

Test without output directly into a variable:

declare @result2 int;

execute dbo.GetMaxItem2 @odd = null, @result = @result2 output

Execution plan of dbo.GetMaxItem2

Now you can see the difference: the first execution plan uses startup expressions, while the second one optimizes away the execution branches that are not really used. In our case this was crucial, as the difference in execution time was minutes (and more in the future) versus a split second.
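If you want to reproduce the comparison without the graphical plan viewer, one option is to ask the server for the actual plans as xml. This is just a sketch of that approach; it uses the two procedures defined above.

set statistics xml on;
go

declare @result1 int;

execute dbo.GetMaxItem @odd = null, @result = @result1 output;

declare @result2 int;

execute dbo.GetMaxItem2 @odd = null, @result = @result2 output;
go

set statistics xml off;
go

Each executed statement then returns an additional result set containing its actual plan.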

See also Microsoft Connect Entry.

Friday, November 18, 2011 2:49:50 PM UTC
SQL Server puzzle | Tips and tricks
# Tuesday, April 26, 2011

Earlier, we described an approach to calling Windows Search from SQL Server 2008. But it has turned out that our problem is more complicated...

It all started with the initial task:

  • to allow free text search in a store of huge xml files;
  • files should be compressed, so these are *.xml.gz;
  • search results should be addressable to a fragment within xml.

Later we shall describe how we solved this task; for now it's enough to say that we have implemented a Protocol Handler for Windows Search named '.xml-gz:'. This way an original file stored, say, at 'file:///c:/store/data.xml-gz' is seen as a container by Windows Search:

  • .xml-gz:///file:c:/store/data.xml-gz/id1.xml
  • .xml-gz:///file:c:/store/data.xml-gz/id2.xml
  • ...

With this in place, a search query should look like this:

select System.ItemUrl from SystemIndex where scope='.xml-gz:' and contains(...)

Everything worked during testing: we succeeded in issuing Windows Search selects from SQL Server and joining the results with other sql queries.

But later on, when we considered the runtime environment, we saw that our design would not work. The reason is simple: Windows Search will run on a computer different from those where the SQL Servers run. So, the search query should look like this:

select System.ItemUrl from Computer.SystemIndex where scope='.xml-gz:' and contains(...)

Here we ran into a limitation of the current (Windows Search 4) implementation: remote search works for shared folders only, so a query may only look like this:

select System.ItemUrl from Computer.SystemIndex where scope='file://Computer/share/' and contains(...)

Notice that the search restricts the scope to the file protocol, so a remote search will never return our results. The only way to search in our scope is to perform a local search.

We considered the following approaches to resolve the issue.

The simplest one would be to access the Search protocol on a remote computer using the connection string "Provider=Search.CollatorDSO;Data Source=Computer" and to use local queries. This does not work, as the provider simply disregards the Data Source parameter.

Another attempt was to use the MS Remote OLE DB provider. We tried hard to configure it, but it always returned an obscure error; moreover, it is deprecated (Microsoft says it will be removed in the future).

So, we decided to forward request manually:

  • SQL Server calls a web service (through a CLR function);
  • Web service queries Windows Search locally.

Here we considered WCF Data Services and a custom web service.

The advantage of WCF Data Services is that it is a technology with ambitions of becoming a standard, but creating an implementation that talks to the Windows Search SQL dialect is a rather complex task. So we decided to build a primitive http handler that receives the query as a parameter. That is trivial to implement and streams results well.

So, that's our http handler (WindowsSearch.ashx):

<%@ WebHandler Language="C#" Class="WindowsSearch" %>

using System;
using System.Web;
using System.Xml;
using System.Text;
using System.Data.OleDb;

/// <summary>
/// A Windows Search request handler.
/// </summary>
public class WindowsSearch: IHttpHandler
{
  /// <summary>
  /// Handles the request.
  /// </summary>
  /// <param name="context">A request context.</param>
  public void ProcessRequest(HttpContext context)
  {
    var request = context.Request;
    var query = request.Params["query"];
    var response = context.Response;

    response.ContentType = "text/xml";
    response.ContentEncoding = Encoding.UTF8;

    var writer = XmlWriter.Create(response.Output);

    writer.WriteStartDocument();
    writer.WriteStartElement("resultset");

    if (!string.IsNullOrEmpty(query))
    {
      using(var connection = new OleDbConnection(provider))
      using(var command = new OleDbCommand(query, connection))
      {
        connection.Open();

        using(var reader = command.ExecuteReader())
        {
          string[] names = null;

          while(reader.Read())
          {
            if (names == null)
            {
              names = new string[reader.FieldCount];

              for (int i = 0; i < names.Length; ++i)
              {
                names[i] = XmlConvert.EncodeLocalName(reader.GetName(i));
              }
            }

            writer.WriteStartElement("row");

            for(int i = 0; i < names.Length; ++i)
            {
              writer.WriteElementString(
                names[i],
                Convert.ToString(reader[i]));
            }

            writer.WriteEndElement();
          }
        }
      }
    }

    writer.WriteEndElement();
    writer.WriteEndDocument();

    writer.Flush();
  }

  /// <summary>
  /// Indicates that a handler is reusable.
  /// </summary>
  public bool IsReusable { get { return true; } }

  /// <summary>
  /// A connection string.
  /// </summary>
  private const string provider =
    "Provider=Search.CollatorDSO;" +
    "Extended Properties='Application=Windows';" +
    "OLE DB Services=-4";
}

And a SQL CLR function looks like this:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Net;
using System.IO;
using System.Xml;

/// <summary>
/// A user defined function.
/// </summary>
public class UserDefinedFunctions
{
  /// <summary>
  /// A Windows Search returning result as xml strings.
  /// </summary>
  /// <param name="url">A search url.</param>
  /// <param name="userName">A user name for a web request.</param>
  /// <param name="password">A password for a web request.</param>
  /// <param name="query">A Windows Search SQL.</param>
  /// <returns>A result rows.</returns>
  [SqlFunction(
    IsDeterministic = false,
    Name = "WindowsSearch",
    FillRowMethodName = "FillWindowsSearch",
    TableDefinition = "value nvarchar(max)")]
  public static IEnumerable Search(
    string url,
    string userName,
    string password,
    string query)
  {
    return SearchEnumerator(url, userName, password, query);
  }

  /// <summary>
  /// A filler of WindowsSearch function.
  /// </summary>
  /// <param name="value">A value returned from the enumerator.</param>
  /// <param name="row">An output value.</param>
  public static void FillWindowsSearch(object value, out string row)
  {
    row = (string)value;
  }

  /// <summary>
  /// Gets a search row enumerator.
  /// </summary>
  /// <param name="url">A search url.</param>
  /// <param name="userName">A user name for a web request.</param>
  /// <param name="password">A password for a web request.</param>
  /// <param name="query">A Windows Search SQL.</param>
  /// <returns>A result rows.</returns>
  private static IEnumerable<string> SearchEnumerator(
    string url,
    string userName,
    string password,
    string query)
  {
    if (string.IsNullOrEmpty(url))
    {
      throw new ArgumentException("url");
    }

    if (string.IsNullOrEmpty(query))
    {
      throw new ArgumentException("query");
    }

    var requestUrl = url + "?query=" + Uri.EscapeDataString(query);

    var request = WebRequest.Create(requestUrl);

    request.Credentials = string.IsNullOrEmpty(userName) ?
      CredentialCache.DefaultCredentials :
      new NetworkCredential(userName, password);

    using(var response = request.GetResponse())
    using(var stream = response.GetResponseStream())
    using(var reader = XmlReader.Create(stream))
    {
      bool read = true;

      while(!read || reader.Read())
      {
        if ((reader.Depth == 1) && reader.IsStartElement())
        {
          // Note that ReadInnerXml() advances the reader similar to Read().
          yield return reader.ReadInnerXml();

          read = false;
        }
        else
        {
          read = true;
        }
      }
    }
  }
}
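Before the function can be called, the assembly and the function have to be registered in the database. Visual Studio can deploy them for you; a manual registration might look like the sketch below. The assembly name WindowsSearchClient and the file path are placeholders, and the parameter types are our assumption.

-- external_access is needed because the function issues web requests.
-- It also requires a signed assembly or a trustworthy database.
create assembly WindowsSearchClient
from 'C:\Assemblies\WindowsSearchClient.dll'
with permission_set = external_access;
go

create function dbo.WindowsSearch
(
  @url nvarchar(4000),
  @userName nvarchar(4000),
  @password nvarchar(4000),
  @query nvarchar(max)
)
returns table
(
  value nvarchar(max)
)
as external name WindowsSearchClient.UserDefinedFunctions.Search;
go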

And finally, to call this service from SQL Server you write a query like this:

with search as
(
  select
    cast(value as xml) value
  from
    dbo.WindowsSearch
    (
      N'http://machine/WindowsSearchService/WindowsSearch.ashx',
      null,
      null,
      N'
        select
          "System.ItemUrl"
        from
          SystemIndex
        where
          scope=''.xml-gz:'' and contains(''...'')'
    )
)
select
  value.value('/System.ItemUrl[1]', 'nvarchar(max)')
from
  search

The design is not trivial, but it works.

After dealing with all these problems some questions remain unanswered:

  • Why doesn't SQL Server allow querying Windows Search directly?
  • Why doesn't the Windows Search OLE DB provider support the "Data Source" parameter?
  • Why doesn't Windows Search support custom protocols during remote search?
  • Why doesn't SQL Server support web requests/web services natively?

Tuesday, April 26, 2011 8:26:10 AM UTC
SQL Server puzzle | Thinking aloud | Tips and tricks | Window Search
# Monday, March 07, 2011

Let's assume you're loading data into a table using BULK INSERT from a tab-separated file. Among other columns you have a varchar field that may contain any character. The content of such a field is escaped with the usual scheme:

  • '\' as '\\';
  • char(10) as '\n';
  • char(13) as '\r';
  • char(9) as '\t'.

But now, after loading, you want to unescape the content. How would you do it?

Notice that:

  • '\t' should be converted to a char(9);
  • '\\t' should be converted to a '\t';
  • '\\\t' should be converted to a '\' + char(9);

It might be that you're smart and will immediately think of the correct algorithm, but it took us a while to come up with a neat solution:

declare @value varchar(max);

set @value = ...

-- This unescapes the value
set @value =
  replace
  (
    replace
    (
      replace
      (
        replace
        (
          replace(@value, '\\', '\ '),
          '\n',
          char(10)
        ),
        '\r',
        char(13)
      ),
      '\t',
      char(9)
    ),
    '\ ',
    '\'
  );
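A quick check of the three cases listed above can be sketched as follows. The table variable @tests and the column aliases are illustrative names; the outermost replace only makes the tab character visible in the output as <TAB>.

declare @tests table
(
  escaped varchar(max) not null
);

insert into @tests(escaped)
values ('a\tb'), ('a\\tb'), ('a\\\tb');

select
  escaped,
  replace
  (
    -- The same chain of replaces as above.
    replace
    (
      replace
      (
        replace
        (
          replace(replace(escaped, '\\', '\ '), '\n', char(10)),
          '\r',
          char(13)
        ),
        '\t',
        char(9)
      ),
      '\ ',
      '\'
    ),
    char(9),
    '<TAB>'
  ) unescaped
from
  @tests;

The three rows come back as a<TAB>b, a\tb and a\<TAB>b respectively. The trick works because a properly escaped value never contains a single backslash followed by a space, so '\ ' is safe as a temporary marker.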

 

Do you know a better way?

Monday, March 07, 2011 9:01:24 PM UTC
SQL Server puzzle | Tips and tricks
# Friday, March 04, 2011

We were trying to query Windows Search from SQL Server 2008.

The documentation states that Windows Search is exposed as an OLE DB data source. This suggested that we could simply query the results like this:

SELECT
  *
FROM
  OPENROWSET(
    'Search.CollatorDSO.1',
    'Application=Windows',
    'SELECT "System.ItemName", "System.FileName" FROM SystemIndex');

But no, such a select never works. Instead it returns obscure error messages:

OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" returned message "Command was not prepared.".
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" reported an error. Command was not prepared.
Msg 7350, Level 16, State 2, Line 1
Cannot get the column information from OLE DB provider "Search.CollatorDSO.1" for linked server "(null)".

Microsoft is silent about the reasons for such behaviour. People have come to the conclusion that the problem is in SQL Server, as one can query search results through OleDbConnection without problems.

This is very unfortunate, as it rules out many use cases.

As a workaround we have defined a CLR function that wraps the Windows Search call and returns rows as xml fragments. So now the query looks like this:

select
  value.value('System.ItemName[1]', 'nvarchar(max)') ItemName,
  value.value('System.FileName[1]', 'nvarchar(max)') FileName
from
  dbo.WindowsSearch('SELECT "System.ItemName", "System.FileName" FROM SystemIndex')

Notice how we decompose xml fragment back to fields with the value() function.

The C# function looks like this:

using System;
using System.Collections;
using System.IO;
using System.Xml;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Data.OleDb;

using Microsoft.SqlServer.Server;

public class UserDefinedFunctions
{
  [SqlFunction(
    FillRowMethodName = "FillSearch",
    TableDefinition="value xml")]
  public static IEnumerator WindowsSearch(SqlString query)
  {
    const string provider =
      "Provider=Search.CollatorDSO;" +
      "Extended Properties='Application=Windows';" +
      "OLE DB Services=-4";

    var settings = new XmlWriterSettings
    {
      Indent = false,
      CloseOutput = false,
      ConformanceLevel = ConformanceLevel.Fragment,
      OmitXmlDeclaration = true
    };

    string[] names = null;

    using(var connection = new OleDbConnection(provider))
    using(var command = new OleDbCommand(query.Value, connection))
    {
      connection.Open();

      using(var reader = command.ExecuteReader())
      {
        while(reader.Read())
        {
          if (names == null)
          {
            names = new string[reader.FieldCount];

            for (int i = 0; i < names.Length; ++i)
            {
              names[i] = XmlConvert.EncodeLocalName(reader.GetName(i));
            }
          }

          var stream = new MemoryStream();
          var writer = XmlWriter.Create(stream, settings);

          for(int i = 0; i < names.Length; ++i)
          {
            writer.WriteElementString(names[i], Convert.ToString(reader[i]));
          }

          writer.Close();

          yield return new SqlXml(stream);
        }
      }
    }
  }

  public static void FillSearch(object value, out SqlXml row)
  {
    row = (SqlXml)value;
  }
}

Notes:

  • Notice the use of "OLE DB Services=-4" in the provider string to avoid transaction enlistment (required in SQL Server 2008).
  • The permission level of the project that defines this extension function should be set to unsafe (see Project Properties/Database in Visual Studio), otherwise the use of OLE DB is not allowed.
  • SQL Server should be configured to allow CLR functions, see Server/Facets/Surface Area Configuration/ClrIntegrationEnabled in Microsoft SQL Server Management Studio.
  • The assembly should either be signed or the database should be marked as trustworthy, see Database/Facets/Trustworthy in Microsoft SQL Server Management Studio.
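
The last two settings can also be changed from a script instead of Management Studio. A sketch, where SearchDb is a placeholder database name:

-- Allow CLR code to run on the server.
exec sp_configure 'clr enabled', 1;
reconfigure;
go

-- Marking a database as trustworthy has security implications;
-- signing the assembly is the stricter alternative.
alter database SearchDb set trustworthy on;
go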

Friday, March 04, 2011 9:22:49 AM UTC
SQL Server puzzle | Thinking aloud | Tips and tricks | Window Search
# Tuesday, February 27, 2007

It's now time to explore a CLR implementation of the Numbers and Split functions in SQL Server.

I've created a simple C# assembly that defines two table-valued functions, Numbers_CLR and Split_CLR. Note that I had to fix the autogenerated sql function declaration in order to replace nvarchar(4000) with nvarchar(max):

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Diagnostics;

public class UserDefinedFunctions
{
  [SqlFunction]
  public static long GetTimestamp()
  {
    return Stopwatch.GetTimestamp();
  }

  [SqlFunction]
  public static long GetFrequency()
  {
    return Stopwatch.Frequency;
  }

  [SqlFunction(
    Name = "Numbers_CLR",
    FillRowMethodName = "NumbersFillRow",
    IsPrecise = true,
    IsDeterministic = true,
    DataAccess = DataAccessKind.None,
    TableDefinition = "value int")]
  public static IEnumerator NumbersInit(int count)
  {
    for (int i = 0; i < count; i++)
    { 
      yield return i;
    }
  }

  public static void NumbersFillRow(Object obj, out int value)
  {
    value = (int)obj;
  }

  [SqlFunction(
    Name = "Split_CLR",
    FillRowMethodName = "SplitFillRow",
    IsPrecise = true,
    IsDeterministic = true,
    DataAccess = DataAccessKind.None,
    TableDefinition = "value nvarchar(max)")]
  public static IEnumerator SplitInit(string value, string splitter)
  {
    if (string.IsNullOrEmpty(value))
      yield break;

    if (string.IsNullOrEmpty(splitter))
      splitter = ",";

    for(int i = 0; i < value.Length; )
    {
      int next = value.IndexOf(splitter, i);

      if (next == -1)
      {
        yield return value.Substring(i);

        break;
      }
      else
      {
        yield return value.Substring(i, next - i);

        i = next + splitter.Length;
      }
    }
  }

  public static void SplitFillRow(Object obj, out string value)
  {
    value = (string)obj;
  }
};
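
The GetTimestamp and GetFrequency helpers exist to measure durations with a high resolution timer. The test script itself is not shown here, so the following is only a sketch of how a single measurement could be taken. It assumes that the helpers are deployed as dbo.GetTimestamp and dbo.GetFrequency, and the variable names are illustrative.

declare @count int;
declare @rows int;
declare @start bigint;
declare @end bigint;

set @count = 1024;
set @start = dbo.GetTimestamp();

-- Consume all rows so that the whole sequence is generated.
select @rows = count(*) from dbo.Numbers_CLR(@count);

set @end = dbo.GetTimestamp();

select
  @rows length,
  -- Timer ticks converted to milliseconds.
  (@end - @start) * 1000.0 / dbo.GetFrequency() duration;

Split_CLR is called in the same way, for example: select value from dbo.Split_CLR(N'a,b,c', N',');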

These are the results of testing different variants of the numbers function for different numbers of rows to return (length):

i    description    length   duration   msPerNumber
---- -------------- -------- ---------- -----------
0    Numbers        1        0.0964     0.0964
0    Numbers_CTE    1        0.2319     0.2319
0    Numbers_Table  1        0.1710     0.1710
0    Numbers_CLR    1        0.1729     0.1729
1    Numbers        2        0.0615     0.0307
1    Numbers_CTE    2        0.1327     0.0663
1    Numbers_Table  2        0.0816     0.0408
1    Numbers_CLR    2        0.1078     0.0539
2    Numbers        4        0.0598     0.0149
2    Numbers_CTE    4        0.1609     0.0402
2    Numbers_Table  4        0.0810     0.0203
2    Numbers_CLR    4        0.1092     0.0273
3    Numbers        8        0.0598     0.0075
3    Numbers_CTE    8        0.2308     0.0288
3    Numbers_Table  8        0.0813     0.0102
3    Numbers_CLR    8        0.1129     0.0141
4    Numbers        16       0.0598     0.0037
4    Numbers_CTE    16       0.3724     0.0233
4    Numbers_Table  16       0.0827     0.0052
4    Numbers_CLR    16       0.1198     0.0075
5    Numbers        32       0.0606     0.0019
5    Numbers_CTE    32       0.6473     0.0202
5    Numbers_Table  32       0.0852     0.0027
5    Numbers_CLR    32       0.1347     0.0042
6    Numbers        64       0.0615     0.0010
6    Numbers_CTE    64       1.1926     0.0186
6    Numbers_Table  64       0.0886     0.0014
6    Numbers_CLR    64       0.1648     0.0026
7    Numbers        128      0.0637     0.0005
7    Numbers_CTE    128      2.2886     0.0179
7    Numbers_Table  128      0.0978     0.0008
7    Numbers_CLR    128      0.2204     0.0017
8    Numbers        256      0.0679     0.0003
8    Numbers_CTE    256      4.9774     0.0194
8    Numbers_Table  256      0.1243     0.0005
8    Numbers_CLR    256      0.3486     0.0014
9    Numbers        512      0.0785     0.0002
9    Numbers_CTE    512      8.8983     0.0174
9    Numbers_Table  512      0.1523     0.0003
9    Numbers_CLR    512      0.5635     0.0011
10   Numbers        1024     0.0958     0.0001
10   Numbers_CTE    1024     17.8679    0.0174
10   Numbers_Table  1024     0.2453     0.0002
10   Numbers_CLR    1024     1.0504     0.0010
11   Numbers        2048     0.1324     0.0001
11   Numbers_CTE    2048     35.8185    0.0175
11   Numbers_Table  2048     0.3811     0.0002
11   Numbers_CLR    2048     1.9206     0.0009
12   Numbers        4096     0.1992     0.0000
12   Numbers_CTE    4096     70.9478    0.0173
12   Numbers_Table  4096     0.6772     0.0002
12   Numbers_CLR    4096     3.6921     0.0009
13   Numbers        8192     0.3361     0.0000
13   Numbers_CTE    8192     143.3364   0.0175
13   Numbers_Table  8192     1.2809     0.0002
13   Numbers_CLR    8192     7.3931     0.0009
14   Numbers        16384    0.6099     0.0000
14   Numbers_CTE    16384    286.7471   0.0175
14   Numbers_Table  16384    2.4579     0.0002
14   Numbers_CLR    16384    14.4731    0.0009
15   Numbers        32768    1.1546     0.0000
15   Numbers_CTE    32768    573.6626   0.0175
15   Numbers_Table  32768    4.7919     0.0001
15   Numbers_CLR    32768    29.0313    0.0009
16   Numbers        65536    2.3103     0.0000
16   Numbers_CTE    65536    1144.4052  0.0175
16   Numbers_Table  65536    9.5132     0.0001
16   Numbers_CLR    65536    57.7154    0.0009
17   Numbers        131072   4.4265     0.0000
17   Numbers_CTE    131072   2314.5917  0.0177
17   Numbers_Table  131072   18.9130    0.0001
17   Numbers_CLR    131072   116.4268   0.0009
18   Numbers        262144   8.7860     0.0000
18   Numbers_CTE    262144   4662.7233  0.0178
18   Numbers_Table  262144   38.3024    0.0001
18   Numbers_CLR    262144   230.1522   0.0009
19   Numbers        524288   18.4638    0.0000
19   Numbers_CTE    524288   9182.8146  0.0175
19   Numbers_Table  524288   83.4575    0.0002
19   Numbers_CLR    524288   468.0195   0.0009

These are the results of testing different variants of the split function for different lengths of the string (strLength):

i    description    strLength duration   msPerChar
---- -------------- --------- ---------- ----------
0    Split          1         0.1442     0.1442
0    Split_CTE      1         0.2665     0.2665
0    Split_Table    1         0.2090     0.2090
0    Split_CLR      1         0.1964     0.1964
1    Split          2         0.0902     0.0451
1    Split_CTE      2         0.1788     0.0894
1    Split_Table    2         0.1087     0.0543
1    Split_CLR      2         0.1056     0.0528
2    Split          4         0.0933     0.0233
2    Split_CTE      4         0.2618     0.0654
2    Split_Table    4         0.1162     0.0291
2    Split_CLR      4         0.1143     0.0286
3    Split          8         0.1092     0.0137
3    Split_CTE      8         0.4408     0.0551
3    Split_Table    8         0.1344     0.0168
3    Split_CLR      8         0.1324     0.0166
4    Split          16        0.1422     0.0089
4    Split_CTE      16        0.7990     0.0499
4    Split_Table    16        0.1715     0.0107
4    Split_CLR      16        0.1687     0.0105
5    Split          32        0.2090     0.0065
5    Split_CTE      32        1.4924     0.0466
5    Split_Table    32        0.2458     0.0077
5    Split_CLR      32        0.4582     0.0143
6    Split          64        0.3464     0.0054
6    Split_CTE      64        2.9129     0.0455
6    Split_Table    64        0.3947     0.0062
6    Split_CLR      64        0.3880     0.0061
7    Split          128       0.6101     0.0048
7    Split_CTE      128       5.7348     0.0448
7    Split_Table    128       0.6898     0.0054
7    Split_CLR      128       0.6825     0.0053
8    Split          256       1.1504     0.0045
8    Split_CTE      256       11.5610    0.0452
8    Split_Table    256       1.3044     0.0051
8    Split_CLR      256       1.2901     0.0050
9    Split          512       2.2430     0.0044
9    Split_CTE      512       23.3854    0.0457
9    Split_Table    512       2.4992     0.0049
9    Split_CLR      512       2.4838     0.0049
10   Split          1024      4.5048     0.0044
10   Split_CTE      1024      45.7030    0.0446
10   Split_Table    1024      4.8886     0.0048
10   Split_CLR      1024      4.8601     0.0047
11   Split          2048      8.8229     0.0043
11   Split_CTE      2048      92.6160    0.0452
11   Split_Table    2048      9.7381     0.0048
11   Split_CLR      2048      9.8848     0.0048
12   Split          4096      17.6285    0.0043
12   Split_CTE      4096      184.3265   0.0450
12   Split_Table    4096      19.4092    0.0047
12   Split_CLR      4096      19.3849    0.0047
13   Split          8192      36.5924    0.0045
13   Split_CTE      8192      393.8663   0.0481
13   Split_Table    8192      39.3296    0.0048
13   Split_CLR      8192      38.9569    0.0048
14   Split          16384     70.7693    0.0043
14   Split_CTE      16384     740.2636   0.0452
14   Split_Table    16384     77.6300    0.0047
14   Split_CLR      16384     77.6878    0.0047
15   Split          32768     141.4202   0.0043
15   Split_CTE      32768     1481.5788  0.0452
15   Split_Table    32768     155.0163   0.0047
15   Split_CLR      32768     155.5904   0.0047
16   Split          65536     282.8597   0.0043
16   Split_CTE      65536     3098.3636  0.0473
16   Split_Table    65536     315.7588   0.0048
16   Split_CLR      65536     316.1782   0.0048
17   Split          131072    574.3652   0.0044
17   Split_CTE      131072    6021.9827  0.0459
17   Split_Table    131072    630.6880   0.0048
17   Split_CLR      131072    650.8676   0.0050
18   Split          262144    5526.9491  0.0211
18   Split_CTE      262144    17645.2219 0.0673
18   Split_Table    262144    5807.3244  0.0222
18   Split_CLR      262144    5759.6946  0.0220
19   Split          524288    11006.3019 0.0210
19   Split_CTE      524288    35093.2482 0.0669
19   Split_Table    524288    11585.3233 0.0221
19   Split_CLR      524288    11550.8323 0.0220

The results are:

  1. The recursive common table expression shows the worst timing.
  2. Split_CLR is on a par with Split_Table; however, Numbers_Table is better than Numbers_CLR.
  3. Split and Numbers, based on unrolled recursion, show the best timing (most of the time).

The End. :-)

Tuesday, February 27, 2007 1:40:04 PM UTC  #    Comments [0] -
SQL Server puzzle
# Friday, February 23, 2007

Well, several days have passed, but for some reason I've started to feel uncomfortable about the Numbers function. It's all because of the poor recursive CTE implementation. I have decided to unroll the cycle. The new version, however, isn't beautiful, but it provides far better performance than the previous implementation:

/*
  Returns numbers table.
  Table has the following structure: table(value int not null);
  value is an integer that runs from 1 to the specified count.
*/

create function dbo.Numbers
(    
  /* Number of rows to return. */
  @count int
)
returns table
as
return
  with Number4(Value) as
  (
    select 0 union all select 0 union all 
    select 0 union all select 0 union all
    select 0 union all select 0 union all 
    select 0 union all select 0 union all
    select 0 union all select 0 union all 
    select 0 union all select 0 union all
    select 0 union all select 0 union all 
    select 0 union all select 0
  ),
  Number8(Value) as
  (
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4 union all 
    select 0 from Number4 union all select 0 from Number4
  ),
  Number32(Value) as
  (
    select 0 from Number8 N1, Number8 N2, Number8 N3, Number8 N4
  )
  select top(@count) row_number() over(order by Value) Value from Number32;

The performance achieved is on a par with a numbers table. The estimated number of rows is precise whenever a constant is passed as the parameter.
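
As a quick usage sketch (assuming the dbo.Numbers function above has been created; the start date and the TheDate alias are arbitrary choices made for illustration), the function can drive set-based generation of other sequences, for example a run of consecutive dates:

```sql
-- Assumes dbo.Numbers(@count) from this post exists.

-- The numbers 1..5.
select Value from dbo.Numbers(5) order by Value;

-- Seven consecutive days starting from an arbitrary, illustrative date.
-- row_number() returns bigint, so the value is cast to int for dateadd.
select
  dateadd(day, cast(N.Value as int) - 1, cast('20070201' as datetime)) TheDate
from
  dbo.Numbers(7) N
order by
  N.Value;
```

Because the argument is a constant here, this is the situation in which the optimizer can estimate the row count precisely.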

What is the moral? There is room for improvement in the recursive CTE implementation.

Next day

Guess what? Yes! :-) There is also the CLR, which allows us to create one more implementation of the numbers and split functions. In the next entry I'll show it, along with a performance comparison of the different approaches.

Friday, February 23, 2007 12:21:31 AM UTC  #    Comments [0] -
SQL Server puzzle
# Tuesday, February 20, 2007

This task has already been discussed many times. SQL Server 2005 allows us to create an inline function that splits a delimited string. The logic of such a function is self-explanatory, which also hints that SQL syntax has become better:

/*
  Returns numbers table.
  Table has the following structure: table(value int not null);
  value is an integer that runs from 1 to the specified count.
*/

create function dbo.Numbers
(    
  /* Number of rows to return. */
  @count int
)
returns table
as
return
with numbers(value) as
(
  select 0
  union all
  select value * 2 + 1 from numbers where value < @count / 2
  union all
  select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
  row_number() over(order by U.v) value
from
  numbers cross apply (select 0 v) U;

/*
  Splits string using split character.
  Returns a table that contains split positions and split values:
  table(Pos, Value)
*/

create function dbo.Split
(
  /* A string to split. */
  @value nvarchar(max),
  /* An optional split character.*/
  @splitChar nvarchar(max) = N','
)
returns table
as
return
with Bound(Pos) as
(
  select
    Value
  from
    dbo.Numbers(len(@value))
  where
    (Value = 1) or
    (substring(@value, Value - 1, len(@splitChar)) = @splitChar)
),
Word(Pos, Value) as
(
  select
    Bound.Pos,
    substring
    (
      @value,
      Bound.Pos,
      case when Splitter.Pos > 0
        then Splitter.Pos
        else len(@value) + 1
      end - Bound.Pos
    )
  from
    Bound
    cross apply
    (select charindex(@splitChar, @value, Pos) Pos) Splitter
)
select Pos, Value from Word;

Test:

declare @s nvarchar(max);

set @s = N'ALFKI,BONAP,CACTU,FRANK';

select Value from dbo.Split(@s, default) order by Pos;
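
To make the roles of Pos and Value concrete, here is the same test returning both columns, with the rows obtained by tracing the function by hand. It is only a sketch and assumes dbo.Numbers and dbo.Split were created exactly as listed in this post:

```sql
-- Assumes dbo.Numbers and dbo.Split from this post exist.
declare @s nvarchar(max);

set @s = N'ALFKI,BONAP,CACTU,FRANK';

-- Pos is the 1-based position at which each item starts in @s.
select Pos, Value from dbo.Split(@s, default) order by Pos;

-- Rows expected from tracing the function by hand:
-- Pos  Value
-- 1    ALFKI
-- 7    BONAP
-- 13   CACTU
-- 19   FRANK
```

Bound finds position 1 and every position that follows a comma (7, 13, 19); Word then cuts from each of those positions up to the next comma, or to the end of the string when no comma remains.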

See also: Arrays and Lists in SQL Server, Numbers table in SQL Server 2005, Parade of numbers

Tuesday, February 20, 2007 1:10:06 PM UTC  #    Comments [0] -
SQL Server puzzle
# Wednesday, February 07, 2007

SQL Server 2005 has built-in partitioning. As a result, I have been given the task of porting a database from SQL Server 2000 to 2005 and replacing the old-style partitions with the new ones. It seems reasonable, but before modifying a production database, which is about 5 TB in size, I tested a small one.

Switching the data is the easy part. I also need to test all the related stored procedures. At this point I found shortcomings that are closely tied to the nature of the partitions.

In a select statement SQL Server 2005 iterates over partitions; in contrast, SQL Server 2000 rolls out a partitioned view and embeds the partition tables into the execution plan. The performance difference can be dramatic (as in the case I'm dealing with).

Suppose you are to get the 'top N' rows of an ordered set of data from several partitions. SQL Server 2000 can perform operations on each partition (to get an ordered result per partition), then merge the results and return the 'top N' rows. However, if the execution plan just iterates over the partitions and applies the same operations to each one sequentially, the result is only semi-ordered, and a sort operator is required to get the 'top N' rows. This is the case with SQL Server 2005.

The problem is that SQL Server 2005 never uses a merge operator to combine the results!
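
The "top N per partition, then merge" idea can be written out by hand. The following is a minimal, self-contained sketch; the table variables @p1 and @p2 and their rows are hypothetical stand-ins for two partitions, not objects from the real database:

```sql
-- Hypothetical miniature "partitions"; names and data are for illustration only.
declare @p1 table (A smalldatetime not null, B int not null primary key);
declare @p2 table (A smalldatetime not null, B int not null primary key);

insert into @p1(A, B)
select '20061229', 6 union all
select '20061230', 1 union all
select '20061231', 4;

insert into @p2(A, B)
select '20070102', 2 union all
select '20070103', 3 union all
select '20070104', 5;

-- Take the top 3 rows of each partition separately (cheap when an index on B exists),
-- then take the top 3 of the combined candidates.
select top (3)
  A, B
from
(
  select A, B from (select top (3) A, B from @p1 order by B) P1
  union all
  select A, B from (select top (3) A, B from @p2 order by B) P2
) T
order by
  B;
-- Returns the rows with B = 1, 2, 3.
```

Only N rows per partition ever reach the final ordering step, which is why this shape is so much cheaper than sorting everything.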

To illustrate the problem let's define two partitioned tables:

create partition function [test](smalldatetime) as range left for values (N'2007-01-01', N'2007-02-01')
go

create partition scheme [testScheme] as partition [test] to ([primary], [primary], [primary])
go

CREATE TABLE [dbo].[Test2000_12](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2000_12] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
)
GO

CREATE NONCLUSTERED INDEX [IX_Test2000_12] ON [dbo].[Test2000_12]
(
    [B] ASC,
    [A] ASC
)
GO

CREATE TABLE [dbo].[Test2000_01](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2000_01] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
)
GO

CREATE NONCLUSTERED INDEX [IX_Test2000_01] ON [dbo].[Test2000_01]
(
    [B] ASC,
    [A] ASC
)
GO

CREATE TABLE [dbo].[Test2000_02](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2000_02] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
)
GO

CREATE NONCLUSTERED INDEX [IX_Test2000_02] ON [dbo].[Test2000_02]
(
    [B] ASC,
    [A] ASC
)
GO

CREATE TABLE [dbo].[Test2005](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2005] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
) ON [testScheme]([A])
GO

CREATE NONCLUSTERED INDEX [IX_Test2005] ON [dbo].[Test2005]
(
    [B] ASC,
    [A] ASC
) ON [testScheme]([A])
GO

ALTER TABLE [dbo].[Test2000_01] WITH CHECK ADD CONSTRAINT [CK_Test2000_01] CHECK (([A]>='2007-01-01' AND [A]<'2007-02-01'))
GO
ALTER TABLE [dbo].[Test2000_01] CHECK CONSTRAINT [CK_Test2000_01]
GO

ALTER TABLE [dbo].[Test2000_02] WITH CHECK ADD CONSTRAINT [CK_Test2000_02] CHECK (([A]>='2007-02-01'))
GO
ALTER TABLE [dbo].[Test2000_02] CHECK CONSTRAINT [CK_Test2000_02]
GO

ALTER TABLE [dbo].[Test2000_12] WITH CHECK ADD CONSTRAINT [CK_Test2000_12] CHECK (([A]<'2007-01-01'))
GO
ALTER TABLE [dbo].[Test2000_12] CHECK CONSTRAINT [CK_Test2000_12]
GO

create view [dbo].[test2000] as
select * from dbo.test2000_12
union all
select * from dbo.test2000_01
union all
select * from dbo.test2000_02
go


/*
Returns numbers table.
Table has the following structure: table(value int not null);
value is an integer that runs from 1 to the specified count.
*/

create FUNCTION dbo.[Numbers]
(    
/* Number of rows to return. */
@count int
)
RETURNS TABLE
AS
RETURN
with numbers(value) as
(
select 0
union all
select value * 2 + 1 from numbers where value < @count / 2
union all
select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
row_number() over(order by U.v) value
from
numbers cross apply (select 0 v) U

Populate tables:

insert into dbo.Test2005
select
cast(N'2006-01-01' as smalldatetime) + 0.001 * N.Value,
N.Value,
N'Value' + cast(N.Value as nvarchar(16))
from
dbo.Numbers(500000) N
go

insert into dbo.Test2000
select
cast(N'2006-01-01' as smalldatetime) + 0.001 * N.Value,
N.Value,
N'Value' + cast(N.Value as nvarchar(16))
from
dbo.Numbers(500000) N
go

Perform a test:

select top 20
A, B
from
dbo.Test2005
--where
--(A between '2006-01-10' and '2007-01-10')
order by
B

select top 20
A, B
from
dbo.Test2000
--where
--(A between '2006-01-10' and '2007-01-10')
order by
B
--option(merge union)

The difference is obvious once you open the execution plans. In the first case the estimated subtree cost is 17.4099; in the second it is 0.0455385.

SQL Server cannot efficiently use the index on columns (B, A). The problem presented here can appear in any select that occasionally accesses two partitions but regularly uses only one, provided it uses a secondary index. In fact, this covers about 30% of all selects in my database.

Next day

I've meditated a little more and devised a centaur: I can define a partitioned view over the partitioned table. Thus I can use either the view or the table, depending on whether I want to roll the partitions out or iterate over them.

create view [dbo].[Test2005_View] as
select * from dbo.Test2005 where $partition.test(A) = 1
union all
select * from dbo.Test2005 where $partition.test(A) = 2
union all
select * from dbo.Test2005 where $partition.test(A) = 3

The following select runs the same way as with SQL Server 2000 partitions:

select top 20
A, B
from
dbo.Test2005_View
-- dbo.Test2005
order by
B

Wednesday, February 07, 2007 6:32:54 PM UTC  #    Comments [0] -
SQL Server puzzle
# Friday, November 17, 2006

I need to log actions into a log table from my stored procedure, which is called in the context of some transaction. I need the records in the log table no matter what happens (in fact, it's even more important to get them there if the operation fails).

begin transaction
...
execute some_proc
...
if (...)
commit transaction
else
rollback transaction

some_proc:

...

insert into log...

insert ...
update ...

insert into log...

...

How to do this?

November 25

I've found two approaches:

  • table variables, which do not participate in transactions;
  • remote queries, which do not participate in local transactions;
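
The first approach can be sketched in a few lines. The names @log and Message are hypothetical and used only for illustration:

```sql
-- Hypothetical names (@log, Message); a sketch of the table variable approach.
declare @log table
(
  ID int identity(1,1) not null,
  Message nvarchar(max) not null
);

begin transaction;

insert into @log(Message) values(N'step 1 started');

-- ... inserts and updates that turn out to fail ...

rollback transaction;

-- The rollback does not undo changes to a table variable,
-- so the row is still here and can be copied to a permanent log table.
select ID, Message from @log;
```

Note that a table variable lives only as long as the batch or procedure that declares it, so the rows have to be copied out by code that still has the variable in scope after the rollback.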

The second way is more reliable, though not the fastest. The idea is to execute the query on the same server as if it were a linked server.

Suppose you have a log table:

create table System.Log
(
  ID int identity(1,1) not null,
  Date datetime not null default getdate(),
  Type int null,
  Value nvarchar(max) null
);

To add a log record, define a stored procedure:

create procedure System.WriteLog
(
  @type int,
  @message nvarchar(max)
)
as
begin
  set nocount on;

  execute(
    'insert into dbname.System.Log(Type, Value) values(?, ?)',
    @type,
    @message)
    as user = 'user_name'
    at same_server_name;
end

Whenever you call System.WriteLog in the context of a local transaction, the records are inserted into the System.Log table in a separate transaction.

Friday, November 17, 2006 1:35:05 PM UTC  #    Comments [0] -
SQL Server puzzle
# Saturday, November 04, 2006

My next SQL puzzle (thanks to the fabulous XQuery support in SQL Server 2005) is how to reconstruct xml from a hierarchy table. This is the reverse of "Load xml into the table".

Suppose you have:

  select Parent, Node, Name from Data

where
  (Parent, Node) - defines xml hierarchy, and
  Name - xml element name.
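
For experimenting, a small Data table might look as follows. This is only a sketch: the post names the columns but not their types, so the definition is an assumption, as is the convention that nodes are numbered in document order and that the root has Parent = 0.

```sql
-- Hypothetical definition; the post only names the columns.
create table Data
(
  Parent int not null,
  Node int not null primary key,
  Name nvarchar(128) not null
);

-- Rows describing:
-- <document><header/><activity><title/><row/></activity></document>
insert into Data(Parent, Node, Name)
select 0, 1, N'document' union all
select 1, 2, N'header' union all
select 1, 3, N'activity' union all
select 3, 4, N'title' union all
select 3, 5, N'row';
```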

How would you restore the original xml?


November 8, 2006 To my anonymous reader:

declare @content nvarchar(max);

set @content = '';

with Tree(Node, Parent, Name) as
(
  /* Source tree */
  select Node, Parent, Name from Data
),
Leaf(Node) as
(
  select Node from Tree
  except
  select Parent from Tree
),
NodeHeir(Node, Ancestor) as
(
  select Node, Parent from Tree
  union all
  select
    H.Node, T.Parent
  from
    Tree T inner join NodeHeir H on H.Ancestor = T.Node
),
ParentDescendants(Node, Descendats) as
(
  select
    Ancestor, count(Ancestor)
  from
    NodeHeir
  where
    Ancestor > 0
  group by
    Ancestor
),
Line(Row, Node, Text) as
(
  select
    O.Row, T.Node, O.Text
  from
    ParentDescendants D
    inner join
    Tree T
    on D.Node = T.Node
    cross apply
    (
      select D.Node * 2 - 1 Row, '<' + T.Name + '>' Text
      union all
      select (D.Node + D.Descendats) * 2, '</' + T.Name + '>'
    ) O
  union all
  select
    D.Node * 2 - 1, T.Node, '<' + T.Name + '/>'
  from
    Leaf D inner join Tree T on D.Node = T.Node
)
select top(cast(0x7fffffff as int))
  @content = @content + Text
from
  Line
order by
  Row asc, Node desc
option(maxrecursion 128);

select cast(@content as xml);

Saturday, November 04, 2006 9:50:22 AM UTC  #    Comments [0] -
SQL Server puzzle
# Friday, October 27, 2006

Say you need to load a table from an xml document, and this table defines some hierarchy. Believe it or not, this is not a case where it's better to store the xml itself in the table.

Let's presume the table has:

  • Node - document node id;
  • Parent - parent node id;
  • Name - node name.

The following defines a sample xml document we shall work with:

declare @content xml;

set @content = '
<document>
  <header/>
  <activity>
    <title/>
    <row/>
    <row/>
    <row/>
    <row/>
    <total/>
  </activity>
  <activity>
    <title/>
    <row/>
    <total/>
  </activity>
  <activity>
    <title/>
    <row/>
    <total/>
  </activity>
  <activity>
    <title/>
    <row/>
    <row/>
    <row/>
    <total/>
  </activity>
</document>';

How would you solve this task?

I spent a whole day building an acceptable solution. This is probably because I'm not an SQL guru. I've found answers using cursors, openxml, pure xquery, and finally a hybrid of xquery and sql ranking functions.

The last one is fast, and its running time depends linearly on the xml size. It relies on document order: the first child element immediately follows its parent, so a parent's number is the smallest number among its children minus one.

with NodeGroup(ParentGroup, Node, Name) as
(
  select 
    dense_rank() over(order by P.Node),
    row_number() over(order by N.Node),
    N.Node.value('local-name(.)', 'nvarchar(max)')
  from 
    @content.nodes('//*') N(Node) 
    cross apply
    Node.nodes('..') P(Node)
),
Node(Parent, Node, Name) as
(
  select 
    min(Node) over(partition by ParentGroup) - 1, Node, Name
  from 
    NodeGroup 
)
select * from Node order by Node;
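
To see what the query produces, here is the same select applied to a much smaller document, chosen only for illustration. The expected rows come from tracing the query by hand and assume, as the original query does, that ordering by the nodes() column follows document order:

```sql
declare @content xml;

set @content = '<a><b/><c><d/></c></a>';

with NodeGroup(ParentGroup, Node, Name) as
(
  select
    dense_rank() over(order by P.Node),  -- one group per distinct parent
    row_number() over(order by N.Node),  -- element number in document order
    N.Node.value('local-name(.)', 'nvarchar(max)')
  from
    @content.nodes('//*') N(Node)
    cross apply
    Node.nodes('..') P(Node)
),
Node(Parent, Node, Name) as
(
  select
    -- The first child directly follows its parent in document order.
    min(Node) over(partition by ParentGroup) - 1, Node, Name
  from
    NodeGroup
)
select * from Node order by Node;

-- Rows expected from tracing by hand (Parent, Node, Name):
-- 0  1  a
-- 1  2  b
-- 1  3  c
-- 3  4  d
```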

Is there a better way? Anyone?

Friday, October 27, 2006 12:23:25 PM UTC  #    Comments [0] -
SQL Server puzzle
# Monday, October 02, 2006

Return a table of numbers from 0 up to some value. I face this recurring task once every several years. Such periodicity prompts me to invent a solution once again, but using contemporary features.

November 18:

This time I have managed to solve the task in one select:

declare @count int;

set @count = 1000;

with numbers(value) as
(
  select 0
  union all
  select value * 2 + 1 from numbers where value < @count / 2
  union all
  select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
  row_number() over(order by U.V) value
from
  numbers cross apply (select 1 V) U;
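
To see why this works, it helps to look at the raw values the recursive part generates before row_number() renumbers them. The sketch below uses a small count chosen for illustration:

```sql
declare @count int;

set @count = 7;

with numbers(value) as
(
  select 0
  union all
  select value * 2 + 1 from numbers where value < @count / 2
  union all
  select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select value from numbers order by value;

-- Traced by hand: 0 spawns 1 and 2; 1 spawns 3 and 4; 2 spawns 5 and 6.
-- The result is the seven values 0..6, each produced exactly once.
```

Each value v spawns 2v + 1 and 2v + 2, much like the child indexes of a binary heap, so the recursion depth grows roughly as the base-2 logarithm of the count and stays well within the default maxrecursion limit of 100.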

Do you have a better solution?

Monday, October 02, 2006 7:27:51 AM UTC  #    Comments [0] -
SQL Server puzzle | Tips and tricks
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2012, Nesterovsky bros