Saturday, 17 August 2013

Hadoop Simplified


Today we live in the age of big data.

Data volumes have outgrown the storage and processing capabilities of a single machine, and the variety of data formats that need to be analyzed has increased tremendously.
  
This brings two fundamental challenges:
  • How to store and work with huge volumes and varieties of data
  • How to analyze these vast data points and use them for competitive advantage

Hadoop fills this gap by addressing both challenges. It is based on research papers from Google (on the Google File System and MapReduce) and was created by Doug Cutting, who named the framework after his son’s yellow stuffed toy elephant.

So what is Hadoop? It is a framework made up of:
  • HDFS – the Hadoop Distributed File System
  • A distributed computation tier based on the MapReduce programming model
  • Low-cost commodity servers connected together into a cluster
  • A master node running the NameNode, which keeps track of where each block of data is stored
  • DataNodes to store and process the data
  • JobTracker and TaskTracker to schedule, manage and monitor the jobs

Let us see why Hadoop has become so popular.
  • Over the last decade, data computations were scaled by increasing the computing power of a single machine, adding processors and RAM, but this approach has physical limits
  • As data grew beyond these capabilities, an alternative was needed to handle the storage requirements of companies such as eBay (10 PB), Facebook (30 PB), Yahoo (170 PB) and JPMC (150 PB), and these volumes keep increasing
  • With a typical disk transfer rate of 75 MB/sec, it was impractical to process such huge volumes on a single machine
  • Scalability was limited by physical size, with little or no fault tolerance
  • Additionally, organizations keep adding data in various formats for analysis, which traditional databases are not designed to handle
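
To put the transfer-rate point in numbers, here is a rough back-of-envelope sketch in Python. It takes the 75 MB/sec rate and the 10 PB eBay figure from the list above; the decimal unit conversion, the 10,000-disk cluster and the assumption of perfectly even parallel reads are illustrative simplifications, not measurements.

```python
# Back-of-envelope scan times. All figures are illustrative assumptions.
MB_PER_SEC = 75            # single-disk transfer rate quoted in the post
PB_IN_MB = 1_000_000_000   # 1 PB = 10^9 MB (decimal units)


def scan_seconds(petabytes: float, disks: int = 1) -> float:
    """Seconds to read the data once, spread evenly over `disks` disks read in parallel."""
    return petabytes * PB_IN_MB / (MB_PER_SEC * disks)


def as_days(seconds: float) -> float:
    return seconds / 86_400


single = as_days(scan_seconds(10))             # 10 PB read from one disk
parallel = as_days(scan_seconds(10, 10_000))   # hypothetical 10,000 disks

print(f"1 disk: {single:,.0f} days")               # 1 disk: 1,543 days
print(f"10,000 disks: {parallel * 24:.1f} hours")  # 10,000 disks: 3.7 hours
```

Reading the data once from a single disk would take more than four years, while spreading the same read over many disks brings it down to hours. That gap is the motivation for a cluster.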

How does Hadoop address these challenges?
  • Data is split into blocks of 64 or 128 MB, and each block is stored on 3 machines by default to ensure data availability and reliability
  • Many machines connected in a cluster work in parallel for faster crunching of data
  • If any one machine fails, its work is automatically assigned to another
  • MapReduce breaks complex tasks into smaller chunks to be executed in parallel
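
The MapReduce idea can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce steps using the classic word-count example, not Hadoop's actual API; the function names, the sample lines and the tiny block size are all made up for the sketch.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(lines, block_size):
    """Stand-in for HDFS block splitting: group lines into fixed-size chunks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    # In a real cluster each block would be mapped on a different node;
    # here the blocks are simply processed one after another.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(chain.from_iterable(mapped)))


sample = [
    "big data needs big storage",
    "hadoop stores big data",
    "hadoop processes data",
]
counts = word_count(sample)
print(counts["big"], counts["data"], counts["hadoop"])  # 3 3 2
```

Because each block is mapped independently of the others, the map step can run on as many machines as there are blocks, and only the grouped intermediate results need to be brought together for the reduce step.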

Benefits of using Hadoop as a big data platform are:
  • Cheap storage – commodity servers decrease the cost per terabyte
  • Virtually unlimited scalability – new nodes can be added without any changes to existing data, which makes it possible to process growing amounts of data, so no archival is necessary
  • Speed of processing – massive parallel processing reduces processing time
  • Flexibility – schema-less; can store any data format, structured or unstructured (audio, video, text, CSV, PDF, images, logs, clickstream data, social media)
  • Fault tolerance – any node failure is covered by another node automatically
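
The fault-tolerance point can be illustrated with a toy model of block placement. This is a minimal sketch that assumes 128 MB blocks and 3 replicas as described earlier. The round-robin placement and the node names are hypothetical simplifications; real HDFS placement also takes racks into account.

```python
import math

BLOCK_MB = 128   # block size mentioned in the post
REPLICAS = 3     # copies kept of every block


def place_blocks(file_mb, nodes):
    """Assign each block to REPLICAS distinct nodes, round-robin (toy policy)."""
    n_blocks = math.ceil(file_mb / BLOCK_MB)
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(REPLICAS)]
        for b in range(n_blocks)
    }


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live node."""
    return [
        b for b, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(1000, nodes)   # 1000 MB file -> 8 blocks
survivors = readable_blocks(placement, failed={"node2", "node3"})
print(len(placement), len(survivors))   # 8 8
```

With three copies of every block on different machines, losing any two nodes in this model still leaves every block readable. Data only becomes unavailable if all three holders of the same block fail at once.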

Over time, multiple products and components have been added to Hadoop, so it is now called an ecosystem.
  • Hive – SQL-like query interface
  • Pig – data flow language, comparable to commercial tools such as Ab Initio and Informatica
  • HBase – column-oriented database on top of HDFS
  • Flume – near-real-time collection of streaming data, such as logs and credit card transactions
  • Sqoop – bulk data transfer between an RDBMS and HDFS
  • ZooKeeper – coordination service for the distributed components (configuration, naming, synchronization)

More such products are being added all the time by companies such as Cloudera, Hortonworks and Yahoo.

How some of the world leaders are using Hadoop:
  • Chevron collects large amounts of seismic data to locate additional oil resources
  • JPMC uses it to store more than 150 PB of data and over 3.5 billion user log-ins for credit scoring and fraud detection
  • eBay uses it for real-time analysis and search of 9 PB of data, covering 97 million active buyers and over 200 million items, for cross-selling
  • Nokia uses it to store data from phones and service logs, to analyze how people interact with apps and to study usage patterns that help address customer churn
  • Walmart uses it to analyze customer behavior across more than 200 million customer visits a week
  • UC Irvine Health hospitals store 9 million patient records spanning 22 years to build patient surveillance algorithms
  • Manufacturers are using it for warranty analytics

Hadoop may not replace existing data warehouses, but it is becoming the number one choice as a big data platform because of its price/performance ratio.


53 comments:

  1. Very informative and helpful.

  2. Thank you so much for informative and useful article. It really helps me.

  3. Thank you so much. It is easy to understand why and What is hadoop.

  4. Really Simplied version on Big Data ! Very Informative !

  5. thanks.....you made Hadoop look easy :)

  6. Hi Sandeep,
    Its a great article for the beginners like us.
    Doug Cutting child's toy change the world.
    Hadoop is a greet framework and your article helped in clearing our concepts.
    Thanks,
    Chander Sharma

  7. Your article provides a clear and concise summary.
    Thank you.

  8. Wow ... Have been hearing about Big Data and Hadoop , this article opens the door to Big Data World.

  9. Nice article gives simplified high level understanding of Hadoop.
    I find there are Lot of similarities between Hadoop and TERADATA database in terms of managing huge data and scalablity and Fault tolerance.

    -- Aditya

  10. Good Articles to read .WOW !!!

  11. The information which you have provided is very good and easily understood.
    It is very useful who is looking for hadoop Online Training.

  12. Good article and easy to understand Hadoop...

    Thanks for sharing this.

    ---Amit

  13. Clear and precise information.Thank you for sharing

  14. Thank you for this.

  25. Lucid and vivid with illustrations! Thanks, Sandeep :-)

