Saturday, August 04, 2012

Creating the Complex Test Databases Series

SamLester - Creating Complex Test Databases - Intro

As a very brief intro, I have worked as a tester in SQL Server for the past 10+ years on many different features. Along the way, we develop and test features and release them to the public, only to discover that some customers inevitably encounter bugs when they run those features against their own databases. How can this happen when we have amazing PMs and developers, devoted and talented test teams, and thousands of automated test cases per feature? The answer often lies in the incredible complexity of customer databases running on SQL Server, and in how those databases have evolved from small beginnings into very complex systems over the years. As testers, we have a few different options for mitigating this problem and representing "all possible databases" in our testing, but given this complexity, it is impossible to test every possible permutation. In practice, we use all of the options below to some extent and are constantly working on improving each of them.

Some options are to:

  • Acquire real customer databases - these are often the best test databases, but they pose many acquisition challenges due to size, network transfer, security, PII, NDAs, etc. We often work with internal Microsoft product teams who run large-scale database applications to leverage their DBs. (Dreaming out loud: I'd love to work with customers to figure out a way to get more "privacy scrubbed" customer DBs into our test environment. Microsoft products get better coverage, the customer applications are guaranteed to work, and we all win. I'll blog more on this later, but send me a message if you're interested in working together to get your scrubbed databases into our test bed.)
  • Programmatically write tools that can create many permutations of databases with various objects, properties, relationships, etc., then feed different inputs into these tools to create different test databases. We have had pretty good success with this model, as we're able to apply smart testing techniques and create test databases that uncover great bugs.
  • Maintain a database of "interesting" syntax and write automated data-driven test cases based on this object-level syntax. As we encounter a new bug, we distill it down to the problematic syntax and add that to our existing syntax database.
  • Handcraft complex databases with very specific requirements based on the testing needed for a particular feature/sign-off.

The last option (handcrafted databases) is often our last resort, but it is the most effective method for ensuring that specific features work for specific test cases. For a feature we are working on, our dev, test, and PM teams recently spent some time coming up with a list of "complex" databases that we do not have in our test environment but would like to add. Over the next few blog posts, I'll cover some of these interesting databases and the techniques I used to create them. Here are a few of the DBs we had in mind:

SamLester - Creating Complex Test Databases - One Table for each of the 2,397 supported Collations

As a follow-up to my post on complex test databases, this article will cover one of the more interesting test DBs I recently created.

Goal: Create a database that includes one table for each supported collation. Each table contains a single column whose column-level collation is one of the collations supported by SQL Server 2012 (nearly 2,400 collations in all).

The first step is to determine where we can find the exhaustive list of supported collations. The answer comes from the built-in table-valued function fn_helpcollations, which returns the list of collations supported by SQL Server 2012. Once we have this list, we need to determine how to leverage it to create one table for each collation. If we were to do this manually, we would write out the following CREATE TABLE statements:

create table T1 (c1 nvarchar(50) collate Albanian_100_BIN)
create table T2 (c1 nvarchar(50) collate Albanian_100_BIN2)
create table T3 (c1 nvarchar(50) collate Albanian_100_CI_AI)
...
create table T2395 (c1 nvarchar(50) collate Yakut_100_CS_AS_KS)
create table T2396 (c1 nvarchar(50) collate Yakut_100_CS_AS_KS_WS)
create table T2397 (c1 nvarchar(50) collate Yakut_100_CS_AS_WS)
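As a quick sanity check (a minimal sketch, assuming SQL Server 2012), the number of statements we need matches the function's row count:

-- Count the supported collations: 2,397 on SQL Server 2012.
SELECT COUNT(*) AS collation_count
FROM sys.fn_helpcollations();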

The repetition of these statements makes them good candidates for scripting with T-SQL. By leveraging the ROW_NUMBER function as the numeric identifier in each table name, we're able to put together the following statement:

..."

SamLester - Creating Complex Test Databases - Creating a Database with 1 Billion Random Rows

"As part of my series on creating databases with interesting characteristics for testing purposes, today we'll create a database containing a large number of tables, each with a large number of rows inserted.

Goal: Create a database that contains 1,000 tables. Each table has 5 integer columns and should contain 1,000,000 random rows of data.

1,000 tables x 1,000,000 rows/table = 1,000,000,000 rows of data

One billion rows? Really? Yes, 1 billion rows! And random data, please.

We can break this task into two steps. The first is to create the 1,000 tables, which we can easily accomplish through a T-SQL script. The second is to populate the tables with data; for that, we will leverage the Data Generation tool in Visual Studio 2010.

..."

A co-worker mentioned a while ago that his team needed something like this: safe yet complex data at production scale. While this series isn't a perfect fit for that problem yet, it looks like it might evolve into one. Worth shooting off an email to him anyway... :)
