Analysing large scale data with Apache Hadoop
Apache Hadoop is a Java-based framework for large-scale, distributed batch processing on commodity hardware. Its biggest advantage is its ability to scale to hundreds or thousands of computers: Hadoop is designed to distribute large amounts of work efficiently across a set of machines.
This talk will introduce Hadoop along with MapReduce and HDFS. It will discuss scenarios where Hadoop fits as a robust solution and will include a case study from a project where Hadoop was used for bulk inserts and large-scale data analytics.
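For readers who want a feel for the MapReduce programming model before the talk, the sketch below is the canonical word-count job written against Hadoop's standard org.apache.hadoop.mapreduce API. It is not taken from the talk or the case study; the class name and the input/output paths (passed on the command line) are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combine locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (placeholder)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Hadoop splits the input across the cluster, runs the mapper on each split in parallel, shuffles the intermediate (word, count) pairs by key, and runs the reducer to produce the final counts.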
- What is Hadoop?
- Why Hadoop?
- What is MapReduce?
- HDFS Architecture Overview (see the client-side sketch after this list)
- Demo of a use case from a real project
- Who is using Hadoop?
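As a small companion to the HDFS overview, here is an assumed, minimal client-side sketch that reads a file through Hadoop's FileSystem API. The NameNode URI and the file path are placeholders, not details from the project shown in the demo.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; in practice this comes from core-site.xml.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

    // The NameNode resolves the path to block locations; the data itself is
    // streamed directly from the DataNodes that hold the replicas.
    Path file = new Path("/data/sample.txt"); // placeholder path
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(file)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    fs.close();
  }
}
```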
This demo-driven presentation will help the audience see the power of Hadoop when it comes to processing terabytes of data on commodity hardware.
Speaker: Salil Kalia has 7 years of experience with various Java-based platforms (including mobile, desktop, and web application development). In his recent projects, he has used various bleeding-edge technologies, including Hadoop, with which he has processed thousands of gigabytes of data on the Amazon cloud.
* Do check out the 7th Annual IndicThreads Conference that will be held in December in Pune, India.
"Analysing large scale data with Apache Hadoop" was presented at the IndicThreads Conference, Delhi NCR, India (July 2012). Presentation – Slides