Wednesday, July 21, 2010

SQL Server strange string lengths


I just thought I would document some very unintuitive behavior of SQL Server to watch out for. It's well known that SQL Server is not well-suited to string processing. This is one of the reasons why.

The LEN built-in function returns the length of a given string, however, it has quirks about the way it handles whitespace:


Command Result
SELECT LEN('abc') 3
SELECT LEN('') 0
SELECT LEN(NULL) NULL
SELECT LEN(' ') --one space 0
SELECT LEN('       ') --multiple spaces 0
SELECT LEN(' ') --one tab 1
SELECT LEN('
') --one newline
2
SELECT LEN('ABC ')3
SELECT LEN(' ABC')4


I tested these behaviors in SQL Server 2005.

I think the reason for this is because old string types like char are fixed-width, using spaces to fill in unused portions of the string. For example, if I wanted to store 'Hello world' into a char(20), SQL Server would actually store 'Hello world         '. Then, when you want the length of the string, you just want the length of the part before all the trailing spaces. However, I'm not sure why this is still the dominant behavior, since nvarchar and varchar are much more commonly used these days than char, and they don't use all the trailing spaces.

As the the newline, it uses two characters: carriage return (13) and newline (10).

No comments: