Thursday, July 11, 2013

A little Hadoop, HDInsight, Mahout, some .Net and a little StackOverflow and you have...

Amazedsaint's Tech Journal - Building A Recommendation Engine - Machine Learning Using Windows Azure HDInsight, Hadoop And Mahout

Feel like helping someone today?

Let us help the Stack Exchange guys suggest questions that a user can answer, based on his answering history, much like the way Amazon suggests products based on your previous purchase history. If you don’t know what Stack Exchange does: they run a number of Q&A sites, including the massively popular Stack Overflow.

Our objective here is to see how we can analyze the past answers of a user to predict questions that he may answer in the future. Stack Exchange’s current recommendation logic may well work better than ours, but that won’t prevent us from helping them, for our own learning purposes.

We’ll be doing the following tasks.

  • Extracting the required information from Stack Exchange data set
  • Using the required information to build a Recommender
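To make these two tasks concrete, here is a minimal Python sketch of the idea, not the Mahout job itself. It assumes the extraction step has already reduced the data set to (user id, question id) pairs, one per answer, and it uses plain co-occurrence counts as the similarity measure. The sample data and the function names `build_cooccurrence` and `recommend` are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: one (user_id, question_id) pair per answer.
answers = [
    (1, 101), (1, 102),
    (2, 101), (2, 102), (2, 103),
    (3, 102), (3, 103),
    (4, 101), (4, 104),
]

def build_cooccurrence(pairs):
    """Count how often two questions were answered by the same user."""
    by_user = defaultdict(set)
    for user, question in pairs:
        by_user[user].add(question)

    cooc = defaultdict(lambda: defaultdict(int))
    for questions in by_user.values():
        for a, b in combinations(sorted(questions), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return by_user, cooc

def recommend(user, by_user, cooc, top_n=3):
    """Rank questions the user has not answered by how often they
    co-occur with the questions the user has answered."""
    seen = by_user.get(user, set())
    scores = defaultdict(int)
    for q in seen:
        for other, count in cooc[q].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken by question id for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in ranked[:top_n]]

by_user, cooc = build_cooccurrence(answers)
print(recommend(1, by_user, cooc))  # [103, 104]
```

User 1 answered questions 101 and 102. Question 103 was answered alongside those by other users three times in total and question 104 once, so 103 ranks first. A production recommender would normally use a weighted similarity measure and run distributed over far more data, but the input and output have the same shape.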

But let us start with the basics. If you are totally new to Apache Hadoop and Hadoop on Azure, I recommend reading these introductory articles before you begin, where I explain HDInsight and the MapReduce model in a bit more detail.


Conclusion

In this example, we did a lot of manual work to upload the required input files to HDFS and to trigger the Recommender Job. In fact, you could automate this entire workflow by leveraging the Hadoop For Azure SDK. But that is for another post, stay tuned. Real-life analysis involves much more, including writing mappers and reducers for extracting and dumping data to HDFS, automating the creation of Hive tables, performing operations using HiveQL or Pig, etc. However, we just examined the steps involved in doing something meaningful with Azure, Hadoop and Mahout.

You may also access this data in your mobile app or ASP.NET web application, either by using Sqoop to export it to SQL Server, or by loading it into a Hive table as I explained earlier. Happy Coding and Machine Learning!! Also, if you are interested in scenarios where you could tie your existing applications to HDInsight to build end-to-end workflows, get in touch with me.


Just the article I've been looking for. It provides a nice start to finish view of playing with HDInsight and Mahout, which is something I was pulling my hair out over a few months ago...
