Monday, 12 March 2007 - Nesterovsky bros

Monday, 12 March 2007

C++ Standard Library Issues List, Issue 254

C++ Standard Library Issues List, Issue 254

I'm tracking this issue already for the several years, and have my unpretentious opinion. To make my arguments clear I'll bring the issue description here.

254. Exception types in clause 19 are constructed from `std::string`

Section: 19.1 [std.exceptions], 27.4.2.1.1 [ios::failure] Status: Tentatively Ready Submitter: Dave Abrahams Date: 2000-08-01

Discussion:

Many of the standard exception types which implementations are required to throw are constructed with a const std::string& parameter. For example:

     19.1.5  Class out_of_range                          [lib.out.of.range]
     namespace std {
       class out_of_range : public logic_error {
       public:
         explicit out_of_range(const string& what_arg);
       };
     }

   1 The class out_of_range defines the type of objects  thrown  as  excep-
     tions to report an argument value not in its expected range.

     out_of_range(const string& what_arg);

     Effects:
       Constructs an object of class out_of_range.
     Postcondition:
       strcmp(what(), what_arg.c_str()) == 0.

There are at least two problems with this:

A program which is low on memory may end up throwing std::bad_alloc instead of out_of_range because memory runs out while constructing the exception object.
An obvious implementation which stores a std::string data member may end up invoking terminate() during exception unwinding because the exception object allocates memory (or rather fails to) as it is being copied.

There may be no cure for (1) other than changing the interface to out_of_range, though one could reasonably argue that (1) is not a defect. Personally I don't care that much if out-of-memory is reported when I only have 20 bytes left, in the case when out_of_range would have been reported. People who use exception-specifications might care a lot, though.

There is a cure for (2), but it isn't completely obvious. I think a note for implementors should be made in the standard. Avoiding possible termination in this case shouldn't be left up to chance. The cure is to use a reference-counted "string" implementation in the exception object. I am not necessarily referring to a std::string here; any simple reference-counting scheme for a NTBS would do.

Further discussion, in email:

...I'm not so concerned about (1). After all, a library implementation can add const char* constructors as an extension, and users don't need to avail themselves of the standard exceptions, though this is a lame position to be forced into. FWIW, std::exception and std::bad_alloc don't require a temporary basic_string.

...I don't think the fixed-size buffer is a solution to the problem, strictly speaking, because you can't satisfy the postcondition
strcmp(what(), what_arg.c_str()) == 0
For all values of what_arg (i.e. very long values). That means that the only truly conforming solution requires a dynamic allocation.

Further discussion, from Redmond:

The most important progress we made at the Redmond meeting was realizing that there are two separable issues here: the const string& constructor, and the copy constructor. If a user writes something like throw std::out_of_range("foo"), the const string& constructor is invoked before anything gets thrown. The copy constructor is potentially invoked during stack unwinding.

The copy constructor is a more serious problem, becuase failure during stack unwinding invokes terminate. The copy constructor must be nothrow. Curaçao: Howard thinks this requirement may already be present.

The fundamental problem is that it's difficult to get the nothrow requirement to work well with the requirement that the exception objects store a string of unbounded size, particularly if you also try to make the const string& constructor nothrow. Options discussed include:

Limit the size of a string that exception objects are required to throw: change the postconditions of 19.1.2 [domain.error] paragraph 3 and 19.1.6 [runtime.error] paragraph 3 to something like this: "strncmp(what(), what_arg._str(), N) == 0, where N is an implementation defined constant no smaller than 256".
Allow the const string& constructor to throw, but not the copy constructor. It's the implementor's responsibility to get it right. (An implementor might use a simple refcount class.)
Compromise between the two: an implementation is not allowed to throw if the string's length is less than some N, but, if it doesn't throw, the string must compare equal to the argument.
Add a new constructor that takes a const char*

(Not all of these options are mutually exclusive.)

...

To be honest, I do not understand their (committee members') decisions. It seems they are trying to conceal themselves from the problem virtually proposing to store character buffer in the exception object. In fact the problem is more general, and is related to any exception types that store some data, and which can throw during copy construction. How to avoid problems during copy construction? Well, do not perform activity that can lead to an exception. If copying data can throw, then do not copy it! Thus we have to share data between exception objects.

This logic brought me to a safe exception type design. E.g. exception object should keep refcounted handle to a data object that is shared between type instances.

The only question is: why didn't they even consider this way?

Monday, 12 March 2007 09:52:09 UTC

Comments [0] -
Tips and tricks

Tuesday, 27 February 2007

CLR Numbers

It's now the time to explore CLR implementation of the Numbers and Split functions in the SQL Server.

I've created a simple C# assembly that defines two table valued functions Numbers_CLR and Split_CLR. Note that I had to fix autogenerated sql function declaration in order to replace nvarchar(4000) with nvarchar(max):

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Diagnostics;

public class UserDefinedFunctions
{
[SqlFunction]
public static long GetTimestamp()
{
    return Stopwatch.GetTimestamp();
}

[SqlFunction]
public static long GetFrequency()
{
    return Stopwatch.Frequency;
}

[SqlFunction(
    Name="Numbers_CLR",
    FillRowMethodName = "NumbersFillRow",
    IsPrecise = true,
    IsDeterministic = true,
    DataAccess = DataAccessKind.None,
    TableDefinition = "value int")]
public static IEnumerator NumbersInit(int count)
{
    for (int i = 0; i < count; i++)
    {
      yield return i;
    }
}

public static void NumbersFillRow(Object obj, out int value)
{
    value = (int)obj;
}

[SqlFunction(
    Name = "Split_CLR",
    FillRowMethodName = "SplitFillRow",
    IsPrecise = true,
    IsDeterministic = true,
    DataAccess = DataAccessKind.None,
    TableDefinition = "value nvarchar(max)")]
public static IEnumerator SplitInit(string value, string splitter)
{
    if (string.IsNullOrEmpty(value))
      yield break;

    if (string.IsNullOrEmpty(splitter))
      splitter = ",";

    for(int i = 0; i < value.Length; )
    {
      int next = value.IndexOf(splitter, i);

      if (next == -1)
      {
        yield return value.Substring(i);

        break;
      }
      else
      {
        yield return value.Substring(i, next - i);

        i = next + splitter.Length;
      }
    }
}

public static void SplitFillRow(Object obj, out string value)
{
    value = (string)obj;
}
};

These are results of the test of differents variants of the numbers function for different numbers of lines to return (length):

i    description    length   duration   msPerNumber
---- -------------- -------- ---------- -----------
0    Numbers        1        0.0964     0.0964
0    Numbers_CTE    1        0.2319     0.2319
0    Numbers_Table  1        0.1710     0.1710
0    Numbers_CLR    1        0.1729     0.1729
1    Numbers        2        0.0615     0.0307
1    Numbers_CTE    2        0.1327     0.0663
1    Numbers_Table  2        0.0816     0.0408
1    Numbers_CLR    2        0.1078     0.0539
2    Numbers        4        0.0598     0.0149
2    Numbers_CTE    4        0.1609     0.0402
2    Numbers_Table  4        0.0810     0.0203
2    Numbers_CLR    4        0.1092     0.0273
3    Numbers        8        0.0598     0.0075
3    Numbers_CTE    8        0.2308     0.0288
3    Numbers_Table  8        0.0813     0.0102
3    Numbers_CLR    8        0.1129     0.0141
4    Numbers        16       0.0598     0.0037
4    Numbers_CTE    16       0.3724     0.0233
4    Numbers_Table  16       0.0827     0.0052
4    Numbers_CLR    16       0.1198     0.0075
5    Numbers        32       0.0606     0.0019
5    Numbers_CTE    32       0.6473     0.0202
5    Numbers_Table  32       0.0852     0.0027
5    Numbers_CLR    32       0.1347     0.0042
6    Numbers        64       0.0615     0.0010
6    Numbers_CTE    64       1.1926     0.0186
6    Numbers_Table  64       0.0886     0.0014
6    Numbers_CLR    64       0.1648     0.0026
7    Numbers        128      0.0637     0.0005
7    Numbers_CTE    128      2.2886     0.0179
7    Numbers_Table  128      0.0978     0.0008
7    Numbers_CLR    128      0.2204     0.0017
8    Numbers        256      0.0679     0.0003
8    Numbers_CTE    256      4.9774     0.0194
8    Numbers_Table  256      0.1243     0.0005
8    Numbers_CLR    256      0.3486     0.0014
9    Numbers        512      0.0785     0.0002
9    Numbers_CTE    512      8.8983     0.0174
9    Numbers_Table  512      0.1523     0.0003
9    Numbers_CLR    512      0.5635     0.0011
10   Numbers        1024     0.0958     0.0001
10   Numbers_CTE    1024     17.8679    0.0174
10   Numbers_Table  1024     0.2453     0.0002
10   Numbers_CLR    1024     1.0504     0.0010
11   Numbers        2048     0.1324     0.0001
11   Numbers_CTE    2048     35.8185    0.0175
11   Numbers_Table  2048     0.3811     0.0002
11   Numbers_CLR    2048     1.9206     0.0009
12   Numbers        4096     0.1992     0.0000
12   Numbers_CTE    4096     70.9478    0.0173
12   Numbers_Table  4096     0.6772     0.0002
12   Numbers_CLR    4096     3.6921     0.0009
13   Numbers        8192     0.3361     0.0000
13   Numbers_CTE    8192     143.3364   0.0175
13   Numbers_Table  8192     1.2809     0.0002
13   Numbers_CLR    8192     7.3931     0.0009
14   Numbers        16384    0.6099     0.0000
14   Numbers_CTE    16384    286.7471   0.0175
14   Numbers_Table  16384    2.4579     0.0002
14   Numbers_CLR    16384    14.4731    0.0009
15   Numbers        32768    1.1546     0.0000
15   Numbers_CTE    32768    573.6626   0.0175
15   Numbers_Table  32768    4.7919     0.0001
15   Numbers_CLR    32768    29.0313    0.0009
16   Numbers        65536    2.3103     0.0000
16   Numbers_CTE    65536    1144.4052  0.0175
16   Numbers_Table  65536    9.5132     0.0001
16   Numbers_CLR    65536    57.7154    0.0009
17   Numbers        131072   4.4265     0.0000
17   Numbers_CTE    131072   2314.5917  0.0177
17   Numbers_Table  131072   18.9130    0.0001
17   Numbers_CLR    131072   116.4268   0.0009
18   Numbers        262144   8.7860     0.0000
18   Numbers_CTE    262144   4662.7233  0.0178
18   Numbers_Table  262144   38.3024    0.0001
18   Numbers_CLR    262144   230.1522   0.0009
19   Numbers        524288   18.4638    0.0000
19   Numbers_CTE    524288   9182.8146  0.0175
19   Numbers_Table  524288   83.4575    0.0002
19   Numbers_CLR    524288   468.0195   0.0009

These are results of the test of differents variants of the split function for different length of the string (length):

i    description    strLength duration   msPerChar
---- -------------- --------- ---------- ----------
0    Split          1         0.1442     0.1442
0    Split_CTE      1         0.2665     0.2665
0    Split_Table    1         0.2090     0.2090
0    Split_CLR      1         0.1964     0.1964
1    Split          2         0.0902     0.0451
1    Split_CTE      2         0.1788     0.0894
1    Split_Table    2         0.1087     0.0543
1    Split_CLR      2         0.1056     0.0528
2    Split          4         0.0933     0.0233
2    Split_CTE      4         0.2618     0.0654
2    Split_Table    4         0.1162     0.0291
2    Split_CLR      4         0.1143     0.0286
3    Split          8         0.1092     0.0137
3    Split_CTE      8         0.4408     0.0551
3    Split_Table    8         0.1344     0.0168
3    Split_CLR      8         0.1324     0.0166
4    Split          16        0.1422     0.0089
4    Split_CTE      16        0.7990     0.0499
4    Split_Table    16        0.1715     0.0107
4    Split_CLR      16        0.1687     0.0105
5    Split          32        0.2090     0.0065
5    Split_CTE      32        1.4924     0.0466
5    Split_Table    32        0.2458     0.0077
5    Split_CLR      32        0.4582     0.0143
6    Split          64        0.3464     0.0054
6    Split_CTE      64        2.9129     0.0455
6    Split_Table    64        0.3947     0.0062
6    Split_CLR      64        0.3880     0.0061
7    Split          128       0.6101     0.0048
7    Split_CTE      128       5.7348     0.0448
7    Split_Table    128       0.6898     0.0054
7    Split_CLR      128       0.6825     0.0053
8    Split          256       1.1504     0.0045
8    Split_CTE      256       11.5610    0.0452
8    Split_Table    256       1.3044     0.0051
8    Split_CLR      256       1.2901     0.0050
9    Split          512       2.2430     0.0044
9    Split_CTE      512       23.3854    0.0457
9    Split_Table    512       2.4992     0.0049
9    Split_CLR      512       2.4838     0.0049
10   Split          1024      4.5048     0.0044
10   Split_CTE      1024      45.7030    0.0446
10   Split_Table    1024      4.8886     0.0048
10   Split_CLR      1024      4.8601     0.0047
11   Split          2048      8.8229     0.0043
11   Split_CTE      2048      92.6160    0.0452
11   Split_Table    2048      9.7381     0.0048
11   Split_CLR      2048      9.8848     0.0048
12   Split          4096      17.6285    0.0043
12   Split_CTE      4096      184.3265   0.0450
12   Split_Table    4096      19.4092    0.0047
12   Split_CLR      4096      19.3849    0.0047
13   Split          8192      36.5924    0.0045
13   Split_CTE      8192      393.8663   0.0481
13   Split_Table    8192      39.3296    0.0048
13   Split_CLR      8192      38.9569    0.0048
14   Split          16384     70.7693    0.0043
14   Split_CTE      16384     740.2636   0.0452
14   Split_Table    16384     77.6300    0.0047
14   Split_CLR      16384     77.6878    0.0047
15   Split          32768     141.4202   0.0043
15   Split_CTE      32768     1481.5788  0.0452
15   Split_Table    32768     155.0163   0.0047
15   Split_CLR      32768     155.5904   0.0047
16   Split          65536     282.8597   0.0043
16   Split_CTE      65536     3098.3636  0.0473
16   Split_Table    65536     315.7588   0.0048
16   Split_CLR      65536     316.1782   0.0048
17   Split          131072    574.3652   0.0044
17   Split_CTE      131072    6021.9827  0.0459
17   Split_Table    131072    630.6880   0.0048
17   Split_CLR      131072    650.8676   0.0050
18   Split          262144    5526.9491  0.0211
18   Split_CTE      262144    17645.2219 0.0673
18   Split_Table    262144    5807.3244  0.0222
18   Split_CLR      262144    5759.6946  0.0220
19   Split          524288    11006.3019 0.0210
19   Split_CTE      524288    35093.2482 0.0669
19   Split_Table    524288    11585.3233 0.0221
19   Split_CLR      524288    11550.8323 0.0220

The results are:

Recursive common table expression shows the worst timing.
Split_CLR is on the pair with Split_Table, however Numbers_Table is better than Numbers_CLR.
Split and Numbers based on unrolled recursion show the best timing (most of the time).

The End. :-)

Tuesday, 27 February 2007 13:40:04 UTC

Comments [0] -
SQL Server puzzle

Friday, 23 February 2007

Parade of numbers

Well, several days have passed but for a some reason I've started to feel uncomfortable about Numbers function. It's all because of poor recursive CTE implementation. I have decided to unroll the cycle. The new version hovewer isn't a beautiful but is providing much more superior performance comparing with previous implementation:

/*
Returns numbers table.
Table has a following structure: table(value int not null);
value is an integer number that contains numbers from 1 to a specified value.
*/
create function dbo.Numbers
(
/* Number of rows to return. */
@count int
)
returns table
as
return
with Number4(Value) as
(
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0 union all
    select 0 union all select 0
),
Number8(Value) as
(
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4 union all
    select 0 from Number4 union all select 0 from Number4
),
Number32(Value) as
(
    select 0 from Number8 N1, Number8 N2, Number8 N3, Number8 N4
)
select top(@count) row_number() over(order by Value) Value from Number32;

The performance achieved is on pair with numbers table. Estimated number of rows is precise whenever we pass constant as parameter.

What is the moral? - There is a space for the enhancements in the recursive CTE.

Next day

Guess what? - Yes! :-) there is also the CLR, which allows to create one more implementation of the numbers and split functions. In the next entry I'll show it, and performance comparison of different approaches.

Friday, 23 February 2007 00:21:31 UTC

Comments [0] -
SQL Server puzzle

Tuesday, 20 February 2007

Split string in the SQL Server 2005

This task is already discussed many times. SQL Server 2005 allows to create an inline function that splits such a string. The logic of such a function is self explanatory, which also hints that SQL syntax became better:

/*
Returns numbers table.
Table has a following structure: table(value int not null);
value is an integer number that contains numbers from 0 to a specified value.
*/
create function dbo.Numbers
(
/* Number of rows to return. */
@count int
)
returns table
as
return
with numbers(value) as
(
select 0
union all
select value * 2 + 1 from numbers where value < @count / 2
union all
select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
row_number() over(order by U.v) value
from
numbers cross apply (select 0 v) U;

/*
Splits string using split character.
Returns a table that contains split positions and split values:
table(Pos, Value)
*/
create function dbo.Split
(
/* A string to split. */
@value nvarchar(max),
/* An optional split character.*/
@splitChar nvarchar(max) = N','
)
returns table
as
return
with Bound(Pos) as
(
select
    Value
from
    dbo.Numbers(len(@value))
where
    (Value = 1) or
    (substring(@value, Value - 1, len(@splitChar)) = @splitChar)
),
Word(Pos, Value) as
(
select
    Bound.Pos,
    substring
    (
      @value,
      Bound.Pos,
      case when Splitter.Pos > 0
        then Splitter.Pos
        else len(@value) + 1
      end - Bound.Pos
    )
from
    Bound
    cross apply
    (select charindex(@splitChar, @value, Pos) Pos) Splitter
)
select Pos, Value from Word;

Test:

declare @s nvarchar(max);

set @s = N'ALFKI,BONAP,CACTU,FRANK';

select Value from System.Split(@s, default) order by Pos;

Tuesday, 20 February 2007 13:10:06 UTC

Comments [0] -
SQL Server puzzle

Wednesday, 07 February 2007

SQL Server's 2005 partitions: are they best?

SQL Server 2005 has got built-in partitions. As result, I have been given a task to port a database from SQL Server 2000 to 2005, and replace old style partitions with new one. It seems reasonable, but before modifying a production database, which is about 5TB in size, I've tested a small one.

Switch the data - it's an easy part. I need also to test all related stored procedures. At this point I've found shortcomings, which tightly related to a nature of the partitions.

In select statement SQL Server 2005 iterates over partitions, in contrast SQL Server 2000 rolls out partition view and embeds partition tables into an execution plan. The performance difference can be dramatic (the case I'm dealing with).

Suppose you are to get 'top N' rows of ordered set of data from several partitions. SQL Server 2000 can perform operations on partitions (to get ordered result per partition), and then merge them, and return 'top N' rows. However, if execution plan just iterates partitions and applies the same operations to each partition in sequential manner the result will be semiordered. To get 'top N' rows the sort operator is required. This is the case of SQL Server 2005.

The problem is that the SQL Server 2005 never uses merge operator to combine results!

To illustrate the problem let's define two partitioned tables:

create partition function [test](smalldatetime) as range left for values (N'2007-01-01', N'2007-02-01')
go

create partition scheme [testScheme] as partition [test] to [primary], [primary], [primary])
go

CREATE TABLE [dbo].[Test2000_12](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2000_12] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
)
GO

CREATE NONCLUSTERED INDEX [IX_Test2000_12] ON [dbo].[Test2000_12]
(
    [B] ASC,
    [A] ASC
)
GO

CREATE TABLE [dbo].[Test2000_01](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2000_01] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
)
GO

CREATE NONCLUSTERED INDEX [IX_Test2000_01] ON [dbo].[Test2000_01]
(
    [B] ASC,
    [A] ASC
)
GO

CREATE TABLE [dbo].[Test2000_02](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2000_02] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
)
GO

CREATE NONCLUSTERED INDEX [IX_Test2000_02] ON [dbo].[Test2000_02]
(
    [B] ASC,
    [A] ASC
)
GO

CREATE TABLE [dbo].[Test2005](
    [A] [smalldatetime] NOT NULL,
    [B] [int] NOT NULL,
    [C] [nvarchar](50) NULL,
CONSTRAINT [PK_Test2005] PRIMARY KEY CLUSTERED
(
    [A] ASC,
    [B] ASC
)
) ON [testScheme]([A])
GO

CREATE NONCLUSTERED INDEX [IX_Test2005] ON [dbo].[Test2005]
(
    [B] ASC,
    [A] ASC
) ON [testScheme]([A])
GO

ALTER TABLE [dbo].[Test2000_01] WITH CHECK ADD CONSTRAINT [CK_Test2000_01] CHECK (([A]>='2007-01-01' AND [A]<'2007-02-01'))
GO
ALTER TABLE [dbo].[Test2000_01] CHECK CONSTRAINT [CK_Test2000_01]
GO

ALTER TABLE [dbo].[Test2000_02] WITH CHECK ADD CONSTRAINT [CK_Test2000_02] CHECK (([A]>='2007-02-01'))
GO
ALTER TABLE [dbo].[Test2000_02] CHECK CONSTRAINT [CK_Test2000_02]
GO

ALTER TABLE [dbo].[Test2000_12] WITH CHECK ADD CONSTRAINT [CK_Test2000_12] CHECK (([A]<'2007-01-01'))
GO
ALTER TABLE [dbo].[Test2000_12] CHECK CONSTRAINT [CK_Test2000_12]
GO

create view [dbo].[test2000] as
select * from dbo.test2000_12
union all
select * from dbo.test2000_01
union all
select * from dbo.test2000_02
go

/*
Returns numbers table.
Table has a following structure: table(value int not null);
value is an integer number that contains numbers from 0 to a specified value.
*/
create FUNCTION dbo.[Numbers]
(
/* Number of rows to return. */
@count int
)
RETURNS TABLE
AS
RETURN
with numbers(value) as
(
select 0
union all
select value * 2 + 1 from numbers where value < @count / 2
union all
select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
row_number() over(order by U.v) value
from
numbers cross apply (select 0 v) U

Pupulate tables:

insert into dbo.Test2005
select
cast(N'2006-01-01' as smalldatetime) + 0.001 * N.Value,
N.Value,
N'Value' + cast(N.Value as nvarchar(16))
from
dbo.Numbers(500000) N
go

insert into dbo.Test2000
select
cast(N'2006-01-01' as smalldatetime) + 0.001 * N.Value,
N.Value,
N'Value' + cast(N.Value as nvarchar(16))
from
dbo.Numbers(500000) N
go

Perform a test:

select top 20
A, B
from
dbo.Test2005
--where
--(A between '2006-01-10' and '2007-01-10')
order by
B

select top 20
A, B
from
dbo.Test2000
--where
--(A between '2006-01-10' and '2007-01-10')
order by
B
--option(merge union)

The difference is obvious if you will open execution plan. In the first case estimated subtree cost is: 17.4099; in the second: 0.0455385.

SQL server cannot efficiently use index on columns (B, A). The problem presented here can appear in any select that occasionally accesses two partitions, but regulary uses only one, provided it uses a secondary index. In fact this covers about 30% of all selects in my database.

Next day

I've meditated a little bit more and devised a centaur: I can define a partition view over partition table. Thus I can use either this view or table depending on what I'm trying to achieve either iterate partitions or roll them out.

create view [dbo].[Test2005_View] as
select * from dbo.Test2005 where $partition.test(A) = 1
union all
select * from dbo.Test2005 where $partition.test(A) = 2
union all
select * from dbo.Test2005 where $partition.test(A) = 3

The following select is running the same way as SQL Server 2000 partitions:

select top 20
A, B
from
dbo.Test2005_View
-- dbo.Test2005
order by
B

Wednesday, 07 February 2007 18:32:54 UTC

Comments [0] -
SQL Server puzzle

Thursday, 23 November 2006

Sharing UI culture info between threads.

In one of our latest projects (GUI on .NET 2.0) we've felt all the power of .NET globalization, but an annoying thing happened too...

In our case such an annoying thing was sharing of UI culture info between main (UI) thread and all auxiliary threads (threads from ThreadPool, manually created threads etc.). It seems we've fallen into a .NET globalization pitfall.

We guessed that the same as main thread UI culture info for, at least, all asynchronous delegates' calls is used. This is a common mistake, and what's more annoying, there is no a single line in MSDN documentation about this issue. :-S

Let's look closer at this issue. Our application starts on computer with English regional settings ("en-En"), and during application starting we are changing UI culture info to one specified in configuration file:

	// set the culture from the config file
	try
	{
	  Thread.CurrentThread.CurrentUICulture =
              new CultureInfo(Settings.Default.CultureName);
	}
	catch
	{
	   // use the default UI culture info
	}

Thus, all the screens of this GUI application will be displayed according with the specified culture. There are also localized strings stored in resource files that are used as log, exception messages etc., which can be displayed from within different threads (e.g. asynchronous delegates' calls).

So, when application is running and even all screens are displayed according with the specified culture, all the exceptions from auxiliary threads still in English. :'( This happened since threads for asynchronous calls are pulled out from ThreadPool, and all these threads were created using default culture.

Conclusion
Take care about CurrentUICulture in different threads by yourself, and be careful - there are still pitfalls on this way...

Thursday, 23 November 2006 10:55:10 UTC

Comments [0] -
Tips and tricks

Friday, 17 November 2006

Write log records during a transaction

I need to log actions into log table in my stored procedure, which is called in context of some transaction. The records in the log table I need no matter what happens (no, it's even more important to get them there if operation fails).

begin transaction
...
execute some_proc
...
if (...)
commit transaction
else
rollback transaction

some_proc:

...

insert into log...

insert ...
update ...

insert into log...

...

How to do this?

November 25

I've found two approaches:

table variables, which do not participate into transactions;
remote queries, which do not participate into local transactions;

The second way is more reliable, however not the fastest one. The idea is to execute query on the same sever as if it's a linked server.

Suppose you have a log table:

create table System.Log
(
ID int identity(1,1) not null,
Date datetime not null default getdate(),
Type int null,
Value nvarchar(max) null
);

To add log record you shall define a stored procedure:

create procedure System.WriteLog
(
@type int,
@message nvarchar(max)
)
as
begin
set nocount on;

execute(
    'insert into dbname.System.Log(Type, Value) values(?, ?)',
    @type,
    @message)
    as user = 'user_name'
    at same_server_name;
end

Whenever you're calling System.WriteLog in context of local transaction the records are inserted into the System.Log table in a separate transaction.

Friday, 17 November 2006 13:35:05 UTC

Comments [0] -
SQL Server puzzle

Sunday, 12 November 2006

Перечитывая Ницше

Читая его впервые, студентом, видишь его вещающим оракулом.
Перечитывая, замечаешь, что он в значительной степени критик
прошлого и настоящего.

Какая ирония - он ведь предупреждал о вреде истории для творца!
Кто изменилcя - я или Ницше?

Sunday, 12 November 2006 10:39:39 UTC

Comments [2] -

Saturday, 04 November 2006

Reconstruct xml from a table

My next SQL puzzle (thanks to fabulous XQuery support in SQL Server 2005) is how to reconstruct xml from the hierarchy table. This is reverse to the "Load xml into the table".

Suppose you have:

  select Parent, Node, Name from Data

where
(Parent, Node) - defines xml hierarchy, and
Name - xml element name.

How would you restore original xml?

November 8, 2006 To my anonymous reader:

declare @content nvarchar(max);

set @content = '';

with Tree(Node, Parent, Name) as
(
/* Source tree */
select Node, Parent, Name from Data
),
Leaf(Node) as
(
select Node from Tree
except
select Parent from Tree
),
NodeHeir(Node, Ancestor) as
(
select Node, Parent from Tree
union all
select
    H.Node, T.Parent
from
    Tree T inner join NodeHeir H on H.Ancestor = T.Node
),
ParentDescendants(Node, Descendats) as
(
select
    Ancestor, count(Ancestor)
from
    NodeHeir
where
    Ancestor > 0
group by
    Ancestor
),
Line(Row, Node, Text) as
(
select
    O.Row, T.Node, O.Text
from
    ParentDescendants D
    inner join
    Tree T
    on D.Node = T.Node
    cross apply
    (
      select D.Node * 2 - 1 Row, '<' + T.Name + '>' Text
      union all
      select (D.Node + D.Descendats) * 2, '</' + T.Name + '>'
    ) O
  union all
select
    D.Node * 2 - 1, T.Node, '<' + T.Name + '/>'
from
    Leaf D inner join Tree T on D.Node = T.Node
)
select top(cast(0x7fffffff as int))
@content = @content + Text
from
Line
order by
Row asc, Node desc
option(maxrecursion 128);

select cast(@content as xml);

Saturday, 04 November 2006 09:50:22 UTC

Comments [0] -
SQL Server puzzle

Thursday, 02 November 2006

DasBlog Engine

Well, I like DasBlog Engine, however it does not allow to add new comments in our blog. This is unfortunate. :-S

In the activity log I regulary see errors related to the CAPTCHA component. For now I have switched it off. I believe we'll start getting comments at least one per year. :-)

Thursday, 02 November 2006 14:03:15 UTC

Comments [0] -

Friday, 27 October 2006

Load xml into the table

Say you need to load a table from an xml document, and this table defines some hierarchy. Believe me or not, but this is not that case when its better to store xml in the table.

Let's presume the table has:

Node - document node id;
Parent - parent node id;
Name - node name.

The following defines a sample xml document we shall work with:

declare @content xml;

set @content = '
<document>
  <header/>
  <activity>
    <title/>
    <row/>
    <row/>
    <row/>
    <row/>
    <total/>
  </activity>
  <activity>
    <title/>
    <row/>
    <total/>
  </activity>
  <activity>
    <title/>
    <row/>
    <total/>
  </activity>
  <activity>
    <title/>
    <row/>
    <row/>
    <row/>
    <total/>
  </activity>
</document>';

How would you solved this task?

I've been spending a whole day building acceptable solution. This is probably because I'm not an SQL guru. I've found answers using cursors, openxml, pure xquery, and finally hybrid of xquery and sql ranking functions.

The last is fast, and has linear dependency of working time to xml size.

with NodeGroup(ParentGroup, Node, Name) as
(
  select 
    dense_rank() over(order by P.Node),
    row_number() over(order by N.Node),
    N.Node.value('local-name(.)', 'nvarchar(max)')
  from 
    @content.nodes('//*') N(Node) 
    cross apply
    Node.nodes('..') P(Node)
),
Node(Parent, Node, Name) as
(
  select 
    min(Node) over(partition by ParentGroup) - 1, Node, Name
  from 
    NodeGroup 
)
select * from Node order by Node;

Is there a better way? Anyone?

Friday, 27 October 2006 12:23:25 UTC

Comments [0] -
SQL Server puzzle

Monday, 02 October 2006

Numbers table in SQL Server 2005

Return a table of numbers from 0 up to a some value. I'm facing this recurring task once in several years. Such periodicity induces me to invent solution once again but using contemporary features.

November 18:

This time I have succeeded to solve the task in one select:

declare @count int;

set @count = 1000;

with numbers(value) as
(
select 0
union all
select value * 2 + 1 from numbers where value < @count / 2
union all
select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select
row_number() over(order by U.V) value
from
numbers cross apply (select 1 V) U;

Do you have a better solution?

Monday, 02 October 2006 07:27:51 UTC

Comments [0] -
SQL Server puzzle | Tips and tricks

Saturday, 19 August 2006

What unifies C++ and Java

Do you think they are different? I think not too much.

Language, for a creative programmer, is a matter to express his virtues, and a hammer that brings a good salary for a skilled labourer.

Each new generation of programmers tries to prove itself. But how?

Well, C++ programmers invent their strings and smart pointers, and Java adepts (as they cannot create strings) design their springs and rubies.

It's probably all right - there is no dominance, but when I'm reading docs of someone's last pearl, I'm experiencing deja vu. On the whole, all looks as a chaotic movement.

Other time I think - how it's interesting to build something when you should not design the brick, even if you know that bricks aren't perfect.

Saturday, 19 August 2006 14:34:42 UTC

Comments [0] -

Monday, 14 August 2006

Don't allow to ignorants to come near the xsl

I've been given a task to fix several xsls (in fact many big xsls) that worked with msxml and stoped to work with .NET. At first I thought it will be easy stuff, indeed both implementations are compatible as both implement http://www.w3.org/1999/XSL/Transform.

Well, I was wrong. After a 10 minutes I've been abusing that ignorant who has written xsls. More over, I was wondering how msxml could accept that shit.

So, come to the point. I had following xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0"
   xmlns:msxsl="urn:schemas-microsoft-com:xslt"
   xmlns:user="http://mycompany.com/mynamespace">

<xsl:template match="/">
<HTML dir="rtl">
...
<BODY dir="rtl">
...
<TD width="68" dir="ltr" align="middle">
<FONT size="1" face="David">
<xsl:variable name="DegB" select="//*[@ZihuyMuzar='27171']" />
</FONT>
</TD>
...
<TD height="19" dir="ltr" align="middle">
<FONT size="2" face="David">
<xsl:value-of select="$DegB/*/Degem[1]/@*[3]" />
%
</FONT>
</TD>
...

I don't want to talk about "virtues" of "html" that's produced by this alleged "xsl", however about xsl itself. To my amazement msxml sees $DegB, which is declared and dies in different scope. At first I thought I was wrong: "Must be scope is defined defferently then I thought?", but no. OK, I said to myself I can fix that. I've created another xsl that elevates xsl:variable declarations to a scope where they are visible to xsl:value-of.

But that wasn't my main head ache. Some genius has decided to use third, forth, and so on attribute. What does this mean in the god's sake? How one could rely on this? I'll kill him if I'll find him! There was thousands of such @*[3]. Even if I'll see the original xml how can I be sure that msxml and .NET handle attribute collections in the same order?

There was no other way, but check this assumption. I've verified that both implementations store attributes in xml source order. This easied my pains.

To clarify implementation details I have digged in XmlDocument/XPathDocument implementations in .NET 1.1 and 2.0. I was curious how they store a set of attributes. It's interesting to know that they has decided to keep only ordered list of attributes in either implementation. This means that ordered attribute access is fast, and named access leads to list scan. In my opinion it's dubious solution. Probably the idea behind is that there is in average a few attributes to scan when one uses named access. In my case there were up to 100 attributes per element.

Conclusion?
1. Don't allow to ignorants to come near the xsl.
2. Don't design xmls that use many attributes when you're planning to use .NET xslt.

Monday, 14 August 2006 13:42:06 UTC

Comments [0] -

Wednesday, 21 June 2006

Useful BizTalk functoids

Recently we were creating a BizTalk 2006 project. A map was used to normalize input data, where numbers were stored with group separators like "15,000,000.00" and text (Hebrew in our case) was stored visually like "ןודנול סלפ קנב סדיולל".

We do need to store the output data the xml way, this means numbers as "15000000.00" and Hebrew text in logical form "ללוידס בנק פלס לונדון". Well, it's understood that there are no standard functoids that deal with bidi, as there are no too many people that know about the problem in the first place. However we thought at least that there will not be problems with removing of "," in numbers.

BizTalk 2006 does not provide functoids to solve either of these tasks! To answer our needs we have designed two custom functoids.

"Replace string":
Returns a string with text Replaced using a regular expression or search string.
First parameter is a string where to Replace.
Second parameter is a string or regular expression pattern in the format /pattern/flags to Replace.
Third parameter is a string or regular expression pattern that Replaces all found matches.

"Logical to visual converter":
Converts an input "logical" string into a "visual" string.
First parameter is a string to convert.
Optional second parameter is a start embedding level (LTR or RTL).

Wednesday, 21 June 2006 11:01:49 UTC

Comments [0] -
BizTalk Server

Navigation