Saturday, 29 April 2017

5 ways to improve the accuracy of machine learning models!!

Today we are in the digital age: every business is using big data and machine learning to effectively target users with messaging in a language they really understand, and to push offers, deals and ads that appeal to them across a range of channels.

With exponential growth in data from people and the internet of things, a key to survival is to use machine learning to make that data more meaningful and more relevant, enriching the customer experience.

Machine learning can also wreak havoc on a business if improperly implemented. Before embracing this technology, enterprises should be aware of the ways machine learning can fall flat. Data scientists have to take extreme care while developing these models so that they generate the right insights for the business to consume.

Here are 5 ways to improve the accuracy and predictive ability of a machine learning model and ensure it produces better results.

·       Ensure that you have a variety of data that covers almost all scenarios and is not biased toward any one situation. In the early Pokémon Go days there were reports that it surfaced locations mainly in white neighborhoods. That's because the creators of the algorithm failed to provide a diverse training set and didn't spend time in other neighborhoods. Instead of working with limited data, ask for more data; that will improve the accuracy of the model.

·       Often the data received has missing values. Data scientists have to treat outliers and missing values properly to increase accuracy. There are multiple methods to do that: impute the mean, median or mode for continuous variables, and use a dedicated class for categorical variables. For outliers, either delete them or apply transformations.
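The imputation rule above can be sketched in a few lines of Python. This is a minimal illustration on a made-up list-of-dicts dataset (the "age" and "city" columns are hypothetical), not a production pipeline:

```python
# Minimal missing-value treatment: mean imputation for a continuous
# column, a dedicated "Missing" class for a categorical column.
from statistics import mean

rows = [
    {"age": 34, "city": "Pune"},
    {"age": None, "city": "Mumbai"},
    {"age": 29, "city": None},
    {"age": 41, "city": "Pune"},
]

# Continuous variable: impute the mean of the observed values.
ages = [r["age"] for r in rows if r["age"] is not None]
age_fill = mean(ages)

for r in rows:
    if r["age"] is None:
        r["age"] = age_fill          # mean imputation
    if r["city"] is None:
        r["city"] = "Missing"        # categorical: use a class
```

Median or mode imputation works the same way via `statistics.median` and `statistics.mode`.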

·       Finding the right variables or features that have maximum impact on the outcome is one of the key aspects. This comes from better domain knowledge and visualizations. It's imperative to consider as many relevant variables and potential outcomes as possible before deploying a machine learning algorithm.
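One simple way to shortlist impactful variables is a correlation filter: rank each candidate feature by its absolute correlation with the outcome. The toy data and the 0.5 cutoff below are purely illustrative:

```python
# Filter-based feature selection: keep features whose absolute
# Pearson correlation with the target clears a threshold.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

features = {
    "price":    [10, 12, 15, 18, 20],
    "ad_spend": [5, 4, 6, 5, 7],
    "noise":    [3, 1, 4, 1, 5],
}
sales = [100, 92, 80, 70, 65]

selected = [name for name, col in features.items()
            if abs(pearson(col, sales)) >= 0.5]
```

Here "price" and "ad_spend" correlate strongly with sales while "noise" does not, so it gets dropped. In practice this filter is only a first pass; domain knowledge should confirm the shortlist.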

·       Ensembling combines multiple models to improve accuracy, using techniques like bagging and boosting. An ensemble can deliver better predictive performance than any single constituent model. Random forests are a popular example of ensembling.

·       Re-validate the model at a proper frequency. It is necessary to score the model with new data daily, weekly or monthly, depending on how the data changes. If required, rebuild the models periodically with different techniques to challenge the model currently in production.

There are more ways, but the ones mentioned above are the foundational steps to ensure model accuracy.

Machine learning puts a superpower in the hands of organizations, but as the Spider-Man movie put it, "With great power comes great responsibility" - so use it properly.


Saturday, 22 April 2017

Beyond SMAC – Digital twister of disruption!!

Have you seen the 1996 movie Twister, about tornadoes disrupting neighborhoods? A group of people were shown trying to perfect a device called Dorothy, packed with hundreds of sensors, to be released into the center of a twister so that proper data could be collected to build a more advanced warning system and save people.

Today the same analogy applies: digital is disrupting every business, and if you stand still and don't adapt you will become a digital dinosaur. Everyone wants that advance warning of what is coming.

Even if your business is doing well right now, you never know who will disrupt you tomorrow.

We have seen these waves of disruption and technology innovation: the mainframe era, the minicomputer era, the personal computer and client-server era, and the internet era. Then came the 5th wave, the SMAC era, comprising Social, Mobile, Analytics and Cloud technologies.

Gone are the days when we waited for vacations to meet family and friends by travelling to our native place or abroad. Today we interact with each other on social media - Facebook, WhatsApp, Instagram, Snapchat and so on - rather than in person.

Mobile enablement has given us anytime, anywhere, any-device interaction with consumers. We stare at our smartphone screens more than 200 times a day.

Analytics came in to power hyper-personalization in every interaction and to send relevant offers and communications to customers. Descriptive analytics gave the power to know what is happening to the business right now, while predictive analytics gave insight into what may happen. Going further, prescriptive analytics gave the foresight of what actions to take to make things happen.

Cloud gave organizations the ability to quickly scale up at lower cost as computing requirements grow, with secure private clouds.

Today we are in the 6th wave of disruption beyond the SMAC era - Digital Transformation - bringing Big Data, the Internet of Things, APIs, microservices, robotics, 3D printing, augmented reality/virtual reality, wearables, drones, beacons and blockchain.

Big Data platforms allow us to store the tons of data generated everywhere and use it for competitive edge.

The Internet of Things allows machines, computers and smart devices to communicate with each other and helps us carry out various tasks remotely.

APIs are getting a lot of attention as they are easy, lightweight, can be plugged into virtually any system and are highly customizable, ensuring data flows between disparate systems.

Microservices are small, modular services that are independently developed and deployable.

Robotics is bringing a wave of intelligent automation with the help of cognitive computing.

3D printing, or additive manufacturing, is taking several industries, such as medical, military, engineering and manufacturing, by storm.

Augmented reality / virtual reality is changing travel, real estate and education.

Wearables such as smart watches, health trackers and Google Glass can deliver real-time updates, ensure better health, and enable hands-free process optimization in areas like item picking in a warehouse.

Drones have come out of the military zone and are now available for common use. Amazon and Domino's are using them for delivery, while insurance and agriculture are using them for aerial surveys.

Beacons are revolutionizing the customer experience with in-store analytics, proximity marketing, indoor navigation and contactless payments.

The new kid on the block is blockchain, and the finance industry is all set to take advantage of it.

As products and services become more digitized, traditional business processes, business models and even whole businesses are getting disrupted.

The only way to survive this twister is to get closer to your customers by offering a radically different way of doing business that’s faster, simpler and cheaper.

Saturday, 15 April 2017

A to Z of Analytics

Analytics has taken the world by storm, and it is the powerhouse for all the digital transformation happening in every industry.

Today everybody is generating tons of data: we as consumers leave digital footprints on social media, IoT sensors generate millions of records, and mobile phones are in use from morning until we sleep. All these varied data formats are stored in Big Data platforms. But merely storing this data is not going to take us anywhere unless analytics is applied to it. Hence it is extremely important to close the loop with analytics insights.

Here is my version of A to Z for Analytics:

Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. BMW, Tesla and Google are using AI for self-driving cars. AI should be used to solve tough real-world problems, from climate modeling to disease analysis, for the betterment of humanity.

Boosting and Bagging: techniques used to generate more accurate models by ensembling multiple models together.

CRISP-DM: the Cross-Industry Standard Process for Data Mining. It was developed by a consortium of companies including SPSS, Teradata, Daimler and NCR Corporation in 1997 to bring order to the development of analytics models. The six major steps are business understanding, data understanding, data preparation, modeling, evaluation and deployment.

Data preparation: in analytics deployments, more than 60% of the time is spent on data preparation. The usual rule is garbage in, garbage out. Hence it is important to cleanse and normalize the data and make it available for consumption by the model.

Ensembling: the technique of combining two or more algorithms to get more robust predictions. It is like combining the marks from all our exams to arrive at a final overall score. Random Forest is one such example, combining multiple decision trees.

Feature selection: simply put, selecting only those features or variables from the data which really make sense, and removing non-relevant variables. This lifts model accuracy.

Gini Coefficient: used to measure the predictive power of a model, typically in credit scoring tools, to find out who will repay and who will default on a loan.
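For a scoring model, the Gini coefficient can be computed from the AUC via Gini = 2 * AUC - 1. A small sketch with made-up scores and default labels (1 = defaulted, 0 = repaid; higher score should mean higher risk):

```python
# Gini from AUC: AUC is the fraction of (defaulter, repayer) pairs
# the model ranks correctly; Gini rescales that to [-1, 1].
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]

gini = 2 * auc(scores, labels) - 1
```

A Gini of 0 means the model ranks no better than chance; 1 means it separates defaulters from repayers perfectly.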

Histogram: a graphical representation of the distribution of a set of numeric data, usually a vertical bar graph, used in the exploratory analytics and data preparation steps.

Independent Variable: the variable that is changed or controlled in an experiment to test its effect on the dependent variable, like the effect of increasing the price on sales.

Jubatus: an online machine learning library covering classification, regression, recommendation (nearest neighbor search), graph mining, anomaly detection and clustering.

KNN: the k-nearest-neighbors algorithm, used in machine learning for classification problems based on the distance or similarity between data points.
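KNN is simple enough to sketch in full: find the k training points nearest the query and take a majority vote of their labels. The 2-D points and color labels below are invented for illustration:

```python
# Bare-bones k-nearest-neighbors classifier using squared
# Euclidean distance and a majority vote over the k closest points.
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]

train = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
         ((5.0, 5.0), "blue"), ((5.5, 4.5), "blue"), ((4.8, 5.2), "blue")]
```

A query near (1, 1) gets "red"; one near (5, 5) gets "blue". Note that KNN has no training step at all; all the work happens at prediction time.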

Lift Chart: widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. It also tells you how much response to expect from the new target base.

Model: there are more than 50 modeling techniques, such as regressions, decision trees, SVMs, GLMs and neural networks, present in technology platforms like SAS Enterprise Miner, IBM SPSS or R. They are broadly categorized under supervised and unsupervised methods into classification, clustering and association rules.

Neural Networks: typically organized in layers made up of nodes, mimicking the way the brain learns. Today deep learning is an emerging field based on deep neural networks.
 
Optimization: the use of simulation techniques to identify scenarios which will produce the best results within available constraints, e.g. sale price optimization, or identifying the optimal inventory for maximum fulfillment while avoiding stock-outs.
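The sale-price example can be sketched as a tiny grid search. The linear demand curve (demand = 200 - 10 * price) is an assumption invented for this illustration:

```python
# Price optimization sketch: search the price grid for the point
# that maximizes revenue under an assumed linear demand model.
def revenue(price):
    demand = max(0, 200 - 10 * price)   # assumed demand curve
    return price * demand

best_price = max(range(1, 21), key=revenue)
```

Real optimization problems add constraints (inventory, capacity) and use solvers rather than brute-force grids, but the shape of the problem, an objective maximized over a feasible region, is the same.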

PMML: Predictive Model Markup Language, an XML-based file format developed by the Data Mining Group to transfer models between various technology platforms.

Quartile: dividing the sorted output of a model into four equal groups for further action.
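Splitting sorted model scores into quartiles takes only a couple of lines. The scores are invented, and for brevity this sketch assumes the number of scores divides evenly by four:

```python
# Split scores into four equal-sized groups, best scores first,
# e.g. to contact the top quartile and ignore the bottom one.
def quartiles(values):
    ordered = sorted(values, reverse=True)
    size = len(ordered) // 4
    return [ordered[i * size:(i + 1) * size] for i in range(4)]

scores = [0.91, 0.15, 0.64, 0.78, 0.42, 0.55, 0.30, 0.88]
q1, q2, q3, q4 = quartiles(scores)
```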

R: today universities and even corporates use R for statistical model building. It is freely available, and there are licensed distributions like Microsoft R. More than 7,000 packages are now at the disposal of data scientists.

Sentiment Analytics: the process of determining whether information or a service provided by a business leads to positive, negative or neutral human feelings or opinions. Consumer product companies measure sentiment 24/7 and adjust their marketing strategies accordingly.

Text Analytics: used to discover and extract meaningful patterns and relationships from text collections on social media sites such as Facebook, Twitter, LinkedIn and blogs, as well as from call center scripts.

Unsupervised Learning: algorithms that receive only input data and are expected to find patterns in it. Clustering and association algorithms like k-means and apriori are the best examples.
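k-means itself fits in a dozen lines. This compact sketch works on 1-D data with k = 2 and fixed starting centers, purely to show the assign-then-recompute loop:

```python
# Minimal k-means: assign each point to its nearest center, then
# move each center to the mean of its assigned points; repeat.
from statistics import mean

def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centers = [mean(members) if members else c
                   for c, members in clusters.items()]
    return sorted(centers)

points = [1, 2, 3, 10, 11, 12]
centers = kmeans_1d(points, [1, 10])
```

No labels are given anywhere; the two groups around 2 and 11 emerge from the data alone, which is exactly what "unsupervised" means.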

Visualization: the method of enhanced exploratory data analysis and of presenting modeling results with highly interactive statistical graphics. Any model output has to be presented to senior management in the most compelling way. Tableau, QlikView and Spotfire are leading visualization tools.

What-If Analysis: a method to simulate business scenario questions such as: if we increased our marketing budget by 20%, what would be the impact on sales? Monte Carlo simulation is a very popular technique here.
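The budget question can be sketched as a Monte Carlo simulation. Everything about the model below, the baseline of 1000 sales and the noisy uplift of about 0.5 sales per budget unit, is an assumption made up for the example:

```python
# Monte Carlo what-if: estimate expected sales under the current
# budget and under a 20% larger one by averaging many noisy runs.
import random

random.seed(7)

def simulate_sales(budget, runs=2000):
    totals = []
    for _ in range(runs):
        # Baseline sales plus a noisy uplift per budget unit.
        uplift = sum(random.gauss(0.5, 0.2) for _ in range(budget))
        totals.append(1000 + uplift)
    return sum(totals) / runs

base = simulate_sales(100)
boosted = simulate_sales(120)   # what if the budget rises 20%?
```

Averaging thousands of random runs turns a vague "what if" into an expected value with a quantifiable spread, which is the whole appeal of Monte Carlo methods.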

What do you think should come under X, Y and Z?
