Saturday, 17 August 2013

Hadoop Simplified


Today we live in the age of Big data.

Data volumes have outgrown the storage & processing capabilities of a single machine, and the variety of data formats that need to be analyzed has increased tremendously.
  
This brings two fundamental challenges:
  • How to store and work with huge volumes & varieties of data
  • How to analyze these vast data points & use them for competitive advantage

Hadoop fills this gap by addressing both challenges. Hadoop is based on research papers from Google & was created by Doug Cutting, who named the framework after his son’s yellow stuffed toy elephant.

So what is Hadoop? It is a framework made up of:
  • HDFS – the Hadoop Distributed File System
  • A distributed computation tier based on the MapReduce programming model
  • A cluster of low-cost commodity servers connected together, on which it all runs
  • A master node running the NameNode, which manages file system metadata and tracks where the data is stored
  • DataNodes to store & process the data
  • A JobTracker & TaskTrackers to schedule, manage & monitor the jobs

Let us see why Hadoop has become so popular.
  • Over the last decade, most data computation was scaled up by increasing the power of a single machine, adding processors & RAM, but this approach has physical limits
  • As data grew beyond these capabilities, an alternative was required to handle storage requirements such as those of eBay (10 PB), Facebook (30 PB), Yahoo (170 PB) and JPMC (150 PB), which keep increasing
  • With a typical disk transfer rate of 75 MB/s, it was impractical to process such huge volumes of data on a single machine
  • Scalability was limited by physical size, and there was little or no fault tolerance
  • Additionally, organizations are collecting data in various formats for analysis, which traditional databases are not designed to handle
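
To put the 75 MB/s figure above in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 PB = 10^15 bytes), purely sequential reads, and data spread perfectly evenly over the disks. The function name `scan_hours` and the 1,000-disk cluster are illustrative assumptions, not figures from any of the companies listed.

```python
PB = 10**15  # assumption: decimal petabyte


def scan_hours(data_bytes, mb_per_sec=75.0, disks=1):
    """Hours needed to read data_bytes once, split evenly across `disks` disks."""
    bytes_per_sec = mb_per_sec * 1_000_000 * disks
    return data_bytes / bytes_per_sec / 3600


# Reading 10 PB once from a single disk versus 1,000 disks in parallel.
single_disk = scan_hours(10 * PB)
thousand_disks = scan_hours(10 * PB, disks=1000)
print(f"1 disk: {single_disk:,.0f} h, 1000 disks: {thousand_disks:,.1f} h")
```

Under these assumptions a single disk needs roughly 37,000 hours (over four years) for one pass over 10 PB, while 1,000 disks reading in parallel need about 37 hours. Real clusters lose some of this to network and coordination overhead.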

How does Hadoop address these challenges?
  • Data is split into blocks of 64 or 128 MB, and each block is stored on 3 machines by default to ensure data availability & reliability
  • Many machines connected in a cluster work in parallel for faster crunching of data
  • If any one machine fails, its work is automatically reassigned to another
  • MapReduce breaks complex tasks into smaller chunks to be executed in parallel
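
The ideas in the list above can be sketched in a few lines of plain Python. This is a single-process illustration of the MapReduce pattern, not Hadoop's actual API: the helper names (`block_count`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the "blocks" are just short strings standing in for 64 or 128 MB chunks of a file.

```python
import math
from collections import defaultdict


def block_count(file_bytes, block_bytes=128 * 1024 * 1024):
    """Number of blocks a file of file_bytes is split into."""
    return math.ceil(file_bytes / block_bytes)


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    # On a real cluster each chunk would be mapped on a different node;
    # here the chunks are processed one after another.
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big", "data cluster"])
print(counts)
print(block_count(1024**3))  # blocks for a 1 GiB file at 128 MiB per block
```

Because each chunk is mapped independently, the map step can run wherever the block is stored. Only the grouped intermediate pairs need to move between machines.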

The benefits of using Hadoop as a Big data platform are:
  • Cheap storage – commodity servers to decrease the cost per terabyte
  • Virtually unlimited scalability – new nodes can be added without any changes to existing data, giving the ability to process any amount of data, so no archival is necessary
  • Speed of processing – massive parallel processing to reduce processing time
  • Flexibility – schema-less, can store any data format – structured & unstructured (audio, video, texts, csv, pdf, images, logs, clickstream data, social media)
  • Fault tolerant – any node failure is covered by another node automatically

Later, multiple products & components were added to Hadoop, so it is now called an ecosystem.
  • Hive – SQL-like interface
  • Pig – data flow scripting language, comparable to commercial tools such as Ab Initio and Informatica
  • HBase – column-oriented database on top of HDFS
  • Flume – collection and ingestion of streaming event data such as credit card transactions and logs
  • Sqoop – bulk data transfer between relational databases (RDBMS) and HDFS
  • ZooKeeper – coordination service for distributed applications (configuration, naming, synchronization)

More such products are being added all the time by companies such as Cloudera, Hortonworks and Yahoo.

How some of the world leaders are using Hadoop:
  • Chevron collects large amounts of seismic data to find where it can get more oil resources
  • JPMC uses it to store more than 150 PB of data and over 3.5 billion user log-ins for credit scoring and fraud detection
  • eBay uses it for real-time analysis and search of 9 PB of data, with 97 million active buyers and over 200 million items, for cross-selling
  • Nokia uses it to store data from phones and service logs, to analyze how people interact with apps and to use those usage patterns to address customer churn
  • Walmart uses it to analyze customer behavior across more than 200 million customer visits a week
  • UC Irvine Health hospitals store 9 million patient records spanning 22 years to build patient surveillance algorithms
  • Manufacturers are using it for warranty analytics

Hadoop may not replace existing data warehouses, but its price/performance ratio is making it the number one choice as a Big data platform.


Saturday, 11 May 2013

Big data Analytics in Retail


Industry leaders such as Wal-Mart, Axa, Citibank, Humana, GE and several others are exploring how Big Data analytics can be used to better understand customer needs, pinpoint risk, improve marketing, enhance the customer experience, combat fraud, and drive profitability.

Companies are seeking ways to rebuild their customer relationships in this time of extremely high customer expectations.

The retail industry is among the early adopters and innovative users of big data. Retailers have faced the challenge of handling huge amounts of data since the 1970s, when barcodes were first introduced to scan products at the point of sale (POS). All sorts of supply chain data followed in the 1980s and 1990s, and more recently RFID and other sources such as surveillance video cameras have started sending huge volumes of data to data centers. These have challenged retailers to capture, store, cleanse & analyze all the data they collect.

Further flooding the data centers are consumers’ interactions with social media & the internet, which generate billions of data points that can be measured via clicks, page views, time spent per page and the path traversed from landing to conversion.
Big data analytics is helping retailers collect and analyze this fine-grained shopper visit data to optimize page designs and placements and to tailor promotional messages.

A McKinsey report says that using big data analytics can raise operating margins by as much as 60%.

Some of the questions Retailers have are:
·        How to drive critical decisions around market segmentation, personalization & merchandising?
·        How to avoid lost revenues due to stock-outs, lower online sales per visit and lower visit-to-buy ratios?

Here is a glimpse of what retailers can do with big data analytics:
Customer:

·        Enhancing customer experience across all the channels such as calls, emails, campaigns, catalogs, mobile offers, brick & mortar stores
·        Customer sentiment analysis to know the market pulse and market dynamics
·        Call center data analysis for customer feedback
·        Build loyalty programs based on purchase data & customer segmentation
·        Staffing optimization based on weather forecasts & promotional campaigns for better customer experience

Merchandising:
·        Optimizing the product placements and layouts based on video data
·        Price optimization based on competitor pricing
·        Market basket analysis for revenue growth
·        Optimizing seasonal markdowns
·        Store analysis for best location & better effectiveness
·        Improve in-store sales by combining past data with current economic, weather & season/holiday data
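
As a small illustration of the market basket analysis mentioned in the list above, the sketch below counts how often pairs of products appear together in the same transaction and reports that as "support". It is only a minimal pure-Python example under stated assumptions: the baskets are made-up sample data, and `pair_support` is a hypothetical helper, not part of any retail analytics product.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Fraction of baskets in which each pair of items appears together."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("bread", "milk") and ("milk", "bread")
        # are counted as the same pair, once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


# Made-up transactions, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]
support = pair_support(baskets)

# Print the pairs from most to least frequently bought together.
for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, value)
```

Pairs with high support are candidates for joint placement or bundled offers. At retail scale the same pair counting is typically distributed across a cluster rather than run in one process.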

Marketing:
·        Consumer segmentation, cross-selling
·        Campaign analytics to channel advertising dollars into the optimal medium for the highest ROI
·        Sentiment analysis from social media, call centers, surveys, blogs, product reviews
·        Identify new product, service & market opportunities by real-time monitoring of these customer sentiments
·        Location-based personalized offers on smartphones, tablets
·        Web log analytics for customer behavior analysis & next best offer

Supply Chain:
·        Inventory optimization to avoid stock-outs
·        Demand-driven forecasting fueled by structured and unstructured data
·        Route optimization for cost reductions
·        Warehouse space optimization
·        Vendor performance analysis for better competitive prices

Ultimately, the goal of big data analytics is to develop an effective omni-channel experience that integrates many different supply chain factors, including supplier effectiveness, warehouse optimization, and inventory/logistics optimization, for real-time customer engagement.
Big data analytics provides the required ammunition & tools to accelerate growth, boost profits, control risks and meet regulatory & competitive demands.

Saturday, 12 January 2013

Enhancing Customer experience with Big Data Analytics




The era of Big Data is upon us.

Today, most organizations are still struggling to unlock the full value of this Big data that is available to them. From internet to mobile and social, the amount of customer data is continuously growing.

A company’s ability to extract value from big data through smart analytics will be the key to its business success.

Big data is defined by:
  • Volume - volumes too large for traditional databases to manage
  • Variety - a combination of internal structured business data (CRM, ERP, POS and other internal system data) and external unstructured data (social media data, feedback surveys, audio, video, streaming data, call center data, images)
  • Velocity - the enormous speed at which data comes into the organization

Channel-based marketing is now the lowest priority. The increased amount of data available at the individual customer level has allowed companies to do personalized marketing. But all this customer data out there is worthless if you can’t process it & turn it into actionable intelligence.

With Big data platforms helping in collection, integration, and transformation of large volumes of data, companies can conduct complex and varied analysis on much larger data sets and reduce the time to action and reaction to customer needs.

Organizations can now influence the entire customer lifecycle by being well prepared for each interaction, shaping the interaction in real time as it happens, and driving improvements across all channels for the next interaction.

Listening to the data as a signal from customers and working to personalize their experience creates value for the customer as well as the business.

Some examples of enhancing the customer experience using Big data Analytics:
  • Retail giants are using Big data to personalize offers and enhance the customer experience
  • Healthcare companies are using it to improve patient care in hospitals
  • Banks are using it for cross-selling and up-selling and for bringing new products to market
  • Insurance companies are making tailor-made policies for their customers in real time
  • Manufacturing companies are using Big data to improve their products and predict failures in their product lines ahead of time, so that every customer interaction goes smoothly

Using Big data to address customer concerns before they become problems is extremely important to ensure that customers stay loyal and profitable to the company.

Customers expect to have the best possible experience from their vendors/service providers. They want to be recognized as individuals and not as a part of a segment.

