Monday, January 05, 2009

SQL Server VARBINARY vs FILESTREAM – The code and performance trade offs saving the right sized files to the right datatype.

CodeProject - How to store and fetch binary data into a file stream column

“File streams were introduced in SQL Server 2008. They offer the capability to store binary data to the database but outside the normal database files. Earlier, varbinary used to be stored inside database files, which had many side-effects. Because SQL Server stores data in blocks which are arranged as extents, the data in earlier varbinary columns had to conform to the block structure (although a bit different from normal data blocks).

In SQL Server 2008, if a varbinary column is defined to be stored as a file stream, the binary data is stored in a special folder structure which is managed by the SQL Server engine. The only thing that remains inside the database is a pointer to the file along with a mandatory GUID column in order to use the file stream from the Win32 client.

File stream data can be used from the .NET Framework using the traditional SqlParameter, but there is also a specialized class called SqlFileStream which can be used with .NET Framework 3.5 SP1 or later. This class provides mechanisms, for example, for seeking a specific position from the data.

With a small amount of binary data, it's not efficient to use a file stream. This is because it needs extra overhead like file creation and handling. These operations are not needed when predefined database files are used. However, with larger files, file streams are quite efficient. The following charts show the elapsed times for upload in milliseconds using different techniques, on my development box. The key specifications for the computer used were:

  • SQL Server and client application on the same machine
  • Processor: Intel Core2 Duo, 1.8 MHz
  • 4 GB physical memory
  • Database files on drive C:
  • Files uploaded from drive E:
  • Drives C: and E: on separate physical SATA disk drives

The charts show average upload times for:

  • 100 KB file repeated 3 times for each measurement
  • 1 MB file repeated 3 times for each measurement
  • 10 MB file repeated 3 times for each measurement


What I liked about this article is that not only is the sample a complete soup-to-nuts, SQL Create all the way through to sample code, but it also provides a harness for performance comparison using storage different techniques (as well as data run charts ;).

It makes clear, what is becoming more commonly known, that FILESTREAM is good for larger files and VARBINARY for smaller ones. But of course, YMMV. Also make sure you listen to the Run As Radio show I link to below, Run to Run As Radio for a great SQL Server 2008 FILESTREAM show, for some great FILESTREAM configuration/performance tips.


Related Past Post XRef:
SQL Server 2008 FileStream and VB.Net – The Sample
SQL Server FileStream Whitepaper – A DBA/IT focused FILESTREAM paper
Run to Run As Radio for a great SQL Server 2008 FILESTREAM show
SQL Server 2008 FILESTREAM Attribute from Start to C#
SQL Server 2008, the FILESTREAM Attribute and Partitioning - Apparently not as easy as it looks (yet at least)...
A future world without the SQL Server Image/Text/NText data types. Now's the time to start planning for that future...
SQL Server 2008 FILESTREAM - Writing a file to a FILESTREAM column
More SQL Server 2008 FileStream Fun
Playing with the SQL Server 2008 FileStream Attribute


Prakash said...

Nice article. I was actually looking for how to on FileStream and ended up seeing this article on performance analysis of filestream. very helpful really. This is what I used for start :

Anonymous said...

where is the chart ?

Greg said...

@Anonymous, click through to the original Code Project article (link's right there at the top of the post...)