Tuesday, July 24, 2012

Concatenating row values in T-SQL - One problem, many solutions and here's one page with many of them....

Data Management & Exchange - Transact SQL - Concatenating row values in T-SQL

  • Introduction
  • A core issue
  • Considerations
  • Concatenating values when the number of items is small and known upfront
  • Concatenating values when the number of items are not known
    • Recursive CTE method
    • The blackbox XML methods
    • Using Common Language Runtime
    • Scalar UDF with recursion
    • Table valued UDF with a WHILE loop
    • Dynamic SQL
    • The Cursor approach
  • Non-reliable approaches
    • Scalar UDF with t-SQL update extension
    • Scalar UDF with variable concatenation in SELECT
  • Conclusion
  • References
  • Acknowledgements

Introduction

Many a time, SQL programmers are faced with a requirement to generate report-like resultsets right off of a Transact SQL query. In most cases, the requirement arises from the fact that there are no sufficient tools or in-house expertise to develop tools that can extract the data as a resultset and massage the data in the desired display format. Quite often folks are confused about the potential of breaking relational fundamentals say like First Normal Form or the scalar nature of typed values. (Talking about 1NF violations in a language like SQL which lacks sufficient domain support, allows NULLs and supports duplicates is somewhat ironic to begin with, but that is a topic which requires detailed explanations.)

Concatenating column values or expressions from multiple rows are usually best done in a client side application language, since the string manipulation capabilities of Transact SQL and SQL based DBMSs are somewhat limited. However, you can do these using different approaches in Transact SQL, but avoiding such methods for long term solutions is your best bet.

A core issue
Even though SQL in general deviates considerably from the relational model, its reliance on certain core aspects of relational foundations makes SQL functional and powerful. One such core aspect is the set based nature of SQL expressions (well, multi-sets to be exact, but for the given context let us ignore the issue of duplication). The primary idea is that tables are unordered and hence the resultsets of any query that does not have an explicit ORDER BY clause is unordered as well. In other words, the rows in a resultset of a query do not have a prescribed position, unless it is explicitly specified in the query expression.

On the other hand, a concatenated list is an ordered structure. Each element in the list has a specific position. In fact, concatenation itself is an order-utilizing operation in the sense that values can be prefixed or post fixed to an existing list. So approaches that are loosely called “concatenating row values”, “aggregate concatenation” etc. would have to make sure that some kind of an order, either explicit or implicit, should be specified prior to concatenating the row values. If such an ordering criteria is not provided, the concatenated string would be arbitrary in nature.

Considerations
Generally, requests for row value concatenations often comes in two basic flavors, when the number of rows is known and small (typically less than 10) and when the number of rows is unknown and potentially large. It may be better to look at each of them separately.

In some cases, all the programmer wants is just the list of values from a set of rows. There is no grouping or logical partitioning of values like the list of email addresses separated by a semicolon or some such. In such situations, the approaches can be the same except the join conditions may vary. Minor variations of the examples list on this page illustrate such solutions as well.

...

image"

This is something I've been having to do quite a bit recently, but just today came across this page. While it is a little dated (seems SQL 2005'ish) it still contains a good bit of relevant information (especially for me since I have to support SQL 2005... ;). I partially appreciated the many different approaches mentioned as there's many kind of hammers for this nail. Which one have I been using? The  XML PATH with STUFF has been getting the job done for me, but after reading through the other options, my mind is very open to the other approaches...

1 comment:

David said...

Thanks for sharing! The article reminds me of a couple of articles by a SQL Server MVP (Erland Sommarskog) that have been INVALUABLE to me over the years whenever I have to do evil things in T-SQL:

The Curse and Blessings of Dynamic SQL

How to Share Data between Stored Procedures

I appreciate all the resources you find for us!

- David