Tuesday, October 04, 2011

Microsoft SQL Server Connector for Apache Hadoop RTW

Microsoft Downloads - Microsoft SQL Server Connector for Apache Hadoop

"Microsoft SQL Server Connector for Apache Hadoop (SQL Server-Hadoop Connector) RTM is a Sqoop-based connector that facilitates efficient data transfer between SQL Server 2008 R2 and Hadoop. Sqoop supports several databases.

Version: 1.0
Date Published: 10/4/2011

Language: English

  • Microsoft SQL Server-Hadoop Connector User Guide.pdf, 878 KB
  • SQL Server Connector for Apache Hadoop MSLT.pdf, 220 KB
  • sqoop-sqlserver-1.0.tar.gz, 1.0 MB
  • THIRDPARTYNOTICES FOR HADOOP-BASED CONNECTORS.txt, 33 KB

The Microsoft SQL Server Connector for Apache Hadoop extends JDBC-based Sqoop connectivity to facilitate data transfer between SQL Server and Hadoop, and also supports the JDBC features described in the Sqoop User Guide on the Cloudera website. In addition, this connector provides support for the nchar and nvarchar data types.

With SQL Server-Hadoop Connector, you can import data from:

  • tables in SQL Server to delimited text files on HDFS
  • tables in SQL Server to SequenceFiles on HDFS
  • tables in SQL Server to tables in Hive*
  • results of queries executed on SQL Server to delimited text files on HDFS
  • results of queries executed on SQL Server to SequenceFiles on HDFS
  • results of queries executed on SQL Server to tables in Hive*

Note: importing data from SQL Server into HBase is not supported in this release.

With SQL Server-Hadoop Connector, you can export data from:

  • delimited text files on HDFS to SQL Server
  • SequenceFiles on HDFS to SQL Server
  • Hive tables* to tables in SQL Server

* Hive is a data warehouse infrastructure built on top of Hadoop (http://wiki.apache.org/hadoop/Hive). We recommend using the hive-0.7.0-cdh3u0 version of Cloudera Hive.

Sqoop is an open source connectivity framework that facilitates transfer between multiple Relational Database Management Systems (RDBMS) and HDFS. Sqoop uses MapReduce programs to import and export data; the imports and exports are performed in parallel with fault tolerance.

The source and target files used by Sqoop can be delimited text files (for example, with commas or tabs separating each field), or binary SequenceFiles containing serialized record data. Please refer to section 7.2.7 in the Sqoop User Guide for more details on supported file types. For information on the SequenceFile format, please refer to the Hadoop API page.

Supported Operating Systems: Linux, Windows Server 2008 R2

Linux (for Hadoop setup) and Windows (with SQL Server 2008 R2 installed). Both are required to use the SQL Server-Hadoop Connector.

..."

I don't Hadoop yet, but it's starting to get some interest and eyeballs in my day job's field, so I want to keep an eye on it...
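
For a rough feel of what the import and export cases in the quote look like in practice, here is a minimal sketch that assembles Sqoop command lines in Python. It uses only stock Sqoop 1 options (--connect, --table, --query, --target-dir, --split-by, --as-sequencefile, --hive-import, --export-dir, --input-fields-terminated-by); I have not checked it against the 1.0 connector's User Guide, so treat it as an assumption about how the connector is driven. The host, database, user, table and path names are all made up for illustration.

```python
import shlex
from typing import List, Optional


def sqlserver_jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Build a JDBC URL in the form the Microsoft SQL Server JDBC driver accepts."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"


def sqoop_import_args(connect: str, username: str, target_dir: str,
                      table: Optional[str] = None, query: Optional[str] = None,
                      split_by: Optional[str] = None,
                      as_sequencefile: bool = False,
                      hive_import: bool = False) -> List[str]:
    """Assemble a 'sqoop import' command for a table or a free-form query."""
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table or query")
    if as_sequencefile and hive_import:
        # Sqoop's Hive import works from delimited text, not SequenceFiles.
        raise ValueError("--hive-import cannot be combined with --as-sequencefile")

    # -P makes Sqoop prompt for the password instead of taking it as an argument.
    args = ["sqoop", "import", "--connect", connect, "--username", username, "-P"]
    if table is not None:
        args += ["--table", table]
    else:
        # Sqoop replaces $CONDITIONS with a per-mapper range predicate.
        if "$CONDITIONS" not in query:
            raise ValueError("a free-form query must contain $CONDITIONS")
        args += ["--query", query]
    args += ["--target-dir", target_dir]

    if split_by is not None:
        args += ["--split-by", split_by]
    elif query is not None:
        # Without a split column, a free-form query has to run on one mapper.
        args += ["-m", "1"]

    if as_sequencefile:
        args.append("--as-sequencefile")
    if hive_import:
        args.append("--hive-import")
    return args


def sqoop_export_args(connect: str, username: str, table: str, export_dir: str,
                      field_delimiter: Optional[str] = None) -> List[str]:
    """Assemble a 'sqoop export' command that loads HDFS files into a table."""
    args = ["sqoop", "export", "--connect", connect, "--username", username, "-P",
            "--table", table, "--export-dir", export_dir]
    if field_delimiter is not None:
        args += ["--input-fields-terminated-by", field_delimiter]
    return args


if __name__ == "__main__":
    # Hypothetical server, database, user and paths.
    url = sqlserver_jdbc_url("sqlhost.example.com", "SalesDb")
    print(shlex.join(sqoop_import_args(url, "etl_user", "/data/orders",
                                       table="Orders", as_sequencefile=True)))
    print(shlex.join(sqoop_import_args(
        url, "etl_user", "/data/big_orders",
        query="SELECT * FROM Orders WHERE Total > 100 AND $CONDITIONS",
        split_by="OrderId")))
    print(shlex.join(sqoop_export_args(url, "etl_user", "OrdersCopy",
                                       "/data/orders_text", field_delimiter=",")))
```

Building the arguments as a list and quoting them with shlex.join keeps the $CONDITIONS placeholder from being expanded by the shell when the printed command is pasted into a terminal.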

 

Related Past Post XRef:
Microsoft SQL Server Connector for Apache Hadoop CTP1
Do you Hadoop? Angel has your links, news and resources round-up...

1 comment:

Greg Duncan said...

Note: Looks like the SQL Server Connector is now part of the official Sqoop distro and can now be found as part of Sqoop on the Sqoop site, http://sqoop.apache.org/

So there’s no longer a separate, standalone connector just for SQL Server; it is instead part of the main Sqoop project…