Simplified Digital and Analytics: Hadoop Simplified

Saturday, 17 August 2013

Hadoop Simplified

Today we live in the age of Big data.

Data volumes have outgrown the storage & processing capabilities of a single machine, and the variety of data formats that need to be analyzed has increased tremendously.

This brings two fundamental challenges:

How to store and work with huge volumes & variety of data
How to analyze these vast data points & use them for competitive advantage

Hadoop fills this gap by addressing both challenges. It is based on Google's research papers on the Google File System and MapReduce, and it was created by Doug Cutting, who named the framework after his son's yellow stuffed toy elephant.

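So what is Hadoop? It is a framework made up of:

HDFS – the Hadoop Distributed File System, which stores data across the cluster
A distributed computation tier based on the MapReduce programming model
Low-cost commodity servers connected together in a cluster
A master node running the NameNode, which manages the file system namespace and tracks where every data block is stored
DataNodes on the worker machines, which store the data blocks; the same machines also process the data
A JobTracker to schedule and monitor jobs, and TaskTrackers on the worker nodes to run the individual map and reduce tasks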
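To make the MapReduce programming model listed above concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only simulates the map, shuffle/sort and reduce phases in memory; real Hadoop jobs are typically written against the Java MapReduce API (or run through Hadoop Streaming) and are spread across many TaskTrackers. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, each input split would be mapped on a different node in parallel;
    # here everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "MapReduce processes big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'hadoop': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

The split into three functions mirrors the framework's contract: the developer writes only the map and reduce logic, while Hadoop handles the shuffle, parallelism and failures.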

Let us see why Hadoop has become so popular now.

Over the last decade, growing computation needs were met by scaling up a single machine, adding more processors and RAM, but this approach has physical limits.
As data grew beyond these capabilities, an alternative was needed to handle storage requirements such as eBay (10 PB), Facebook (30 PB), Yahoo (170 PB) and JPMC (150 PB), which keep increasing.
With a typical disk transfer rate of 75 MB/s, reading such huge volumes through a single machine takes far too long.
Scalability was limited by the physical size of the machine, and fault tolerance was limited or absent.
Additionally, organizations increasingly need to analyze data in many different formats, which traditional databases handle poorly.
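A back-of-the-envelope calculation shows why a single disk cannot keep up. This sketch assumes the 75 MB/s rate above, decimal units (1 PB = 10^15 bytes) and, hypothetically, that a read is spread perfectly evenly over N disks with no network or CPU bottleneck:

```python
def read_time_seconds(total_bytes: float, rate_bytes_per_s: float, disks: int = 1) -> float:
    """Time to scan `total_bytes` if the read is split evenly across `disks` disks."""
    return total_bytes / (rate_bytes_per_s * disks)


PB = 10**15          # one petabyte, decimal
RATE = 75 * 10**6    # 75 MB/s per disk, decimal megabytes

for disks in (1, 1000):
    seconds = read_time_seconds(PB, RATE, disks)
    print(f"{disks:>5} disk(s): {seconds / 3600:,.1f} hours ({seconds / 86400:,.1f} days)")
```

Under these assumptions, scanning 1 PB takes about 3,704 hours (roughly 154 days) on one disk, but under 4 hours when 1,000 disks read their share in parallel, which is the idea behind spreading data across a cluster.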

How does Hadoop address these challenges?

Data is split into blocks of 64 or 128 MB, and each block is stored on three machines by default (the replication factor is configurable) to ensure data availability & reliability
Many machines in the cluster work in parallel, and computation is moved to the nodes where the data resides, for faster crunching of data
If any machine fails, its work is automatically reassigned to other machines, and its lost block copies are re-created from the remaining replicas
MapReduce breaks a complex job into smaller map and reduce tasks that are executed in parallel
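The block-and-replica scheme described above can be pictured with a toy model of the NameNode's bookkeeping. This is an illustration only: the round-robin placement is a simplifying assumption (real HDFS uses a rack-aware placement policy), and the node names and the helpers `split_into_blocks`, `place_replicas` and `handle_node_failure` are hypothetical.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128   # 64 MB was the Hadoop 1.x default; 128 MB is a common setting
REPLICATION = 3       # default HDFS replication factor


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    count = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * count
    if count and file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb
    return sizes


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def handle_node_failure(placement: Dict[int, List[str]], failed: str, nodes: List[str]) -> None:
    """Replace the failed node in every block's replica list with another live node."""
    live = [n for n in nodes if n != failed]
    for holders in placement.values():
        if failed in holders:
            holders.remove(failed)
            # choose the first live node that does not already hold this block
            spare = next(n for n in live if n not in holders)
            holders.append(spare)


nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(300)          # a 300 MB file -> [128, 128, 44]
placement = place_replicas(len(blocks), nodes)
handle_node_failure(placement, "node2", nodes)
print(blocks, placement)
```

After `node2` fails, every block still has three copies on distinct live nodes, which is the property that lets the cluster keep working without manual intervention.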

The benefits of using Hadoop as a Big data platform are:

Cheap storage – commodity servers lower the cost per terabyte
Virtually unlimited scalability – new nodes can be added without changing existing data, so capacity grows with the cluster and data rarely has to be archived to make room
Speed of processing – massively parallel processing reduces processing time
Flexibility – no schema is required when data is loaded, so any format can be stored, structured or unstructured (audio, video, text, CSV, PDF, images, logs, clickstream data, social media)
Fault tolerance – when a node fails, other nodes automatically take over its work
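Because HDFS stores every block three times by default, the raw disk capacity of a cluster is larger than the amount of data it can actually hold. A rough sizing sketch follows; the node count, disk size and server price are purely hypothetical inputs, not figures from this article, and it ignores space reserved for temporary and intermediate data:

```python
def usable_capacity_tb(nodes: int, tb_per_node: float, replication: int = 3) -> float:
    """Data volume the cluster can hold once every block is stored `replication` times."""
    return nodes * tb_per_node / replication


def cost_per_usable_tb(nodes: int, cost_per_node: float, tb_per_node: float,
                       replication: int = 3) -> float:
    """Total hardware cost divided by usable (not raw) capacity."""
    return nodes * cost_per_node / usable_capacity_tb(nodes, tb_per_node, replication)


# Hypothetical cluster: 100 commodity servers, 12 TB of disk each, $5,000 per server.
print(usable_capacity_tb(100, 12))        # 400.0 TB usable out of 1,200 TB raw
print(cost_per_usable_tb(100, 5000, 12))  # 1250.0 dollars per usable TB
```

Capacity grows linearly with the node count: in this model, adding 10 more identical servers adds 40 TB of usable space.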

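Over time, many products & components have been added around Hadoop, so it is now referred to as an ecosystem:

Hive – SQL-like query interface (HiveQL) for data stored in Hadoop
Pig – a high-level data-flow language (Pig Latin) for data transformation, comparable in purpose to commercial ETL tools such as Ab Initio and Informatica
HBase – column-oriented (wide-column) NoSQL database that runs on top of HDFS
Flume – collection and ingestion of streaming event data, such as logs, into HDFS
Sqoop – bulk data transfer between relational databases (RDBMS) and HDFS
ZooKeeper – a centralized coordination service (configuration, naming, synchronization) for distributed components in the Hadoop ecosystem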
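HBase's data model can be pictured as a sparse, sorted map: row key → "family:qualifier" column → timestamp → value. The toy Python class below models only that shape (no persistence, regions or HDFS storage); the class and method names are hypothetical and do not mirror the real HBase client API.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple


class ToyWideColumnTable:
    """In-memory picture of the HBase data model: row -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Dict[int, str]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        # Each write stores a new timestamped version instead of overwriting in place.
        self.rows[row][column][ts] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, Dict[int, str]]]]:
        """Yield (row key, columns) for rows in [start, stop), in sorted row-key order."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, dict(self.rows[key])


t = ToyWideColumnTable()
t.put("user1", "info:name", "Asha", ts=1)
t.put("user1", "info:name", "Asha K", ts=2)   # newer version of the same cell
t.put("user2", "info:city", "Pune", ts=1)     # rows need not share the same columns
print(t.get("user1", "info:name"))            # Asha K
```

Rows only store the columns they actually have, which is why this layout suits sparse, semi-structured data.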

New products keep being added all the time by companies such as Cloudera, Hortonworks and Yahoo.

How some of the world's leading organizations are using Hadoop:

Chevron collects large amounts of seismic data to find where it can get more oil resources
JPMC uses it to store more than 150 PB of data, including over 3.5 billion user log-ins, for credit scoring & fraud detection
eBay uses it for real-time analysis and search of 9 PB of data, covering 97 million active buyers and over 200 million items, to drive cross-selling
Nokia uses it to store data from phones and service logs, analyzing how people interact with apps and their usage patterns to address customer churn
Walmart uses it to analyze customer behavior across over 200 million customer visits a week
UC Irvine Health hospitals store 9 million patient records spanning 22 years to build patient surveillance algorithms
Manufacturers are using it for warranty analytics

Hadoop may not replace existing data warehouses, but its price/performance ratio is making it the first choice for Big data platforms.

37 comments:

Anonymous, 17 August 2013 at 20:28
Very informative and helpful.

Unknown, 18 August 2013 at 00:47
Thank you so much for informative and useful article. It really helps me.

Hanu, 18 August 2013 at 14:20
Thank you so much. It is easy to understand why and What is hadoop.

Unknown, 18 August 2013 at 14:27
Really Simplied version on Big Data ! Very Informative !

Rai_abhinav, 19 August 2013 at 09:55
thanks.....you made Hadoop look easy :)

Chander Sharma, 19 August 2013 at 12:00
Hi Sandeep,
Its a great article for the beginners like us.
Doug Cutting child's toy change the world.
Hadoop is a greet framework and your article helped in clearing our concepts.
Thanks,
Chander Sharma

Anonymous, 19 August 2013 at 22:51
Your article provides a clear and concise summary.
Thank you.

Anonymous, 20 August 2013 at 01:43
Wow ... Have been hearing about Big Data and Hadoop , this article opens the door to Big Data World.

Anonymous, 20 August 2013 at 23:50
Nice article gives simplified high level understanding of Hadoop.
I find there are Lot of similarities between Hadoop and TERADATA database in terms of managing huge data and scalablity and Fault tolerance.

-- Aditya

Anonymous, 21 August 2013 at 09:29
Good Articles to read .WOW !!!

Anonymous, 23 August 2013 at 12:27
Good article and easy to understand Hadoop...

Thanks for sharing this.

---Amit

Anonymous, 6 September 2013 at 21:41
Clear and precise information.Thank you for sharing

Anonymous, 23 October 2013 at 21:01
Thank you for this.

Unknown, 1 April 2016 at 10:25
Lucid and vivid with illustrations! Thanks, Sandeep :-)