Thursday, December 09, 2004

zVarchar example [SQL Server 2K5 .Net UDT]

zVarchar example

"As mentioned in a previous entry, I decided to try to build a compressed text data type as an example of a UDT. So here it is. I built a datatype called zVarchar. To keep things as a reasonable demo, I didn't incorporate complex compression, just a simple run length encoding scheme and built it in VB so most should be able to read it. But it's interesting. If you want to try it, start a new SQL Server project in VS2005, add a User Defined Type and include the code below. After building and deploying it, you can do the following:

CREATE TABLE GregTest(
RecID int IDENTITY(1,1),
TextValue zVarchar
)

I then tried storing values in it. On small varchar values, the overhead of the UDT meant the data was actually larger than a varchar. But, I tried some examples with large varchar values (around 1000 characters per row and about 40% compressible) and got good results. My table using the zVarchar data type was around 50% of the size of a table using varchar, with 10,000 rows in the table. To keep it interesting, I added some additional functions to allow access to the compressed value, the length (both compressed and uncompressed), etc.

You can then execute queries like:
SELECT TextValue.CompressedLength() FROM GregTest

..."


Interesting... I need to think about this. This seems like a good exception to the "SQL 2K5 .Net UDT's should only be scalar values" rule.

For my purposes, I'd need to make sure the data travels over the wire compressed. And since the .Net Assembly needs to be deployed to any client using the .Net UDT, it should be do'able.

I don't want the SQL Server taking the compress/decompress hit. And also the fewer bytes on the wire the better. Think HTTP compression....

But then again, is the .Net UDT overhead really needed? In my case, does it add enough value or is it just cool?

I can see instances where I would need the data decompress on the SQL Server (to be used in set based queries, etc). While a .Net SP could do the compress/decompress, I like the idea of the data being self contained.

Also I can see having different compression schemes for different data (text vs images, etc). With a UDT, you could query it to see what kind of compression was used (RLE/gzip/etc) for its data.

Again, I need to think about this.

No comments: