Till recently we were living in simple world of string comparisons in SQL style,
and now everything has changed.
From the university years we knew that strings in SQL are compared by first
trimming traling spaces, and then comparing in C style.
Well, the picture was a little more complex, as collations were involved
(national, case sensivity), and as different SQL vendors implemented it
Next, we're dealing with programs converted from COBOL, which we originally
thought follow SQL rules when strings are compared.
Here is where the problem has started.
Once we have found that java program has branched differently than original
COBOL, and the reason was that the COBOL and java compared two strings
"A\n" < "A"
"A\n" > "A"
We have looked into
COBOL Language Reference and found the rules:
If all pairs of characters through the last pair test as equal, the operands are
considered as equal.
If a pair of unequal characters is encountered, the characters are tested to
determine their relative positions in the collating sequence. The operand that
contains the character higher in the sequence is considered the greater operand.
You can see that strings must not be trimmed but padded with spaces
to the longer string, and only then they are compared. This subtle difference
has significant impact for characters below the space.
So, here we've found that COBOL and SQL comparisons are different.
But then we have questioned how really SQL beheaves?
We've tested comparisons in SQL Server and DB2, and have seen that our
understanding of SQL comparison holds. It works as if trimming spaces, and then
But again we have looked into SQL-92 definition, and that's what we see there:
8.2 <comparison predicate>
3) The comparison of two character strings is determined as follows:
a) If the length
in characters of X is not equal to the length
characters of Y, then the shorter string is effectively
replaced, for the purposes of comparison, with a copy of
itself that has been extended to the length of the longer
string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If
has the NO PAD attribute, then the pad character is an
implementation-dependent character different from any character
in the character set of X and Y that collates less
any string under CS. Otherwise, the pad character is a
So, what we see is that SQL-92 rules are very close to COBOL rules, but then we
reach the question: how come that at least SQL Server and DB2 implement string
comparison differently than SQL-92 dictates?
Update: we have found that both SQL Server and DB2 have their string collation defined in a way that <space> is less than any other character.
So the following is always true: '[' + char(13) + ']' > '[ ]'.
'[' + char(13) + ']' > '[ ]'
a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u