Analytics has taken the world by storm, and it is the powerhouse behind the digital transformation happening in every industry.
Today everybody is generating tons of data: we as consumers leave digital footprints on social media, IoT sensors generate millions of records, and mobile phones are in use from the moment we wake until we sleep. All these varieties of data formats are stored on big data platforms. But merely storing this data will not take us anywhere unless analytics is applied to it. Hence it is extremely important to close the loop with analytics insights.
Here is my version of an A to Z for Analytics:
Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. BMW, Tesla, and Google are all using AI for self-driving cars. AI should be used to solve tough real-world problems, from climate modeling to disease analysis, for the betterment of humanity.
Boosting and Bagging: These are techniques for generating more accurate models by ensembling multiple models together. Bagging trains many models in parallel on bootstrapped samples of the data and averages their outputs, while boosting trains models sequentially, with each new model focusing on the errors of the previous ones.
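Here is a minimal sketch of the two ideas side by side; scikit-learn and the synthetic data are my choices for illustration, not part of the original post:

```python
# Bagging vs. boosting on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Bagging: decision trees trained in parallel on bootstrap samples, then averaged.
bagging = BaggingClassifier(n_estimators=100, random_state=42)
# Boosting: trees trained one after another, each correcting earlier errors.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=42)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```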
CRISP-DM: The Cross-Industry Standard Process for Data Mining was developed in 1997 by a consortium of companies including SPSS, Teradata, Daimler, and NCR to bring order to developing analytics models. Its six major steps are business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Data preparation: In analytics deployments, more than 60% of the time is spent on data preparation. The usual rule is garbage in, garbage out, so it is important to cleanse and normalize the data and make it available for consumption by the model.
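A small, hypothetical cleansing-and-normalization sketch with pandas; the column names and values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [34, None, 29, 120, 41],
    "income": [52000, 61000, None, 58000, 75000],
})

df = df.drop_duplicates()
df["age"] = df["age"].fillna(df["age"].median())    # impute missing values
df = df[df["age"].between(0, 100)]                  # drop impossible outliers
df["income"] = df["income"].fillna(df["income"].median())
# Min-max normalization so features share a common 0-1 scale.
df["income_norm"] = (df["income"] - df["income"].min()) / (
    df["income"].max() - df["income"].min()
)
print(df)
```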
Ensembling: This is the technique of combining two or more algorithms to get more robust predictions. It is like combining all the marks we obtain in individual exams to arrive at a final overall score. Random Forest, which combines multiple decision trees, is one such example.
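A minimal sketch of combining two different algorithms in a voting ensemble; again, scikit-learn and the synthetic data are my assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities, like averaging exam marks
)
print(cross_val_score(ensemble, X, y, cv=5).mean())
```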
Feature selection: Simply put, this means keeping only those features or variables in the data which really make sense and removing the non-relevant ones. This uplifts model accuracy.
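One common way to do this, sketched here with scikit-learn's SelectKBest (my choice of method, on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=1)
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
X_selected = selector.transform(X)          # non-relevant columns removed
print(selector.get_support(indices=True))   # indices of the retained features
print(X_selected.shape)                      # 20 features reduced to 5
```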
Gini Coefficient: This is used to measure the predictive power of a model, typically in credit scoring tools, to find out who will repay and who will default on a loan. It is directly related to the area under the ROC curve: Gini = 2 × AUC − 1.
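A tiny worked example of that relationship; the labels and risk scores below are made up for illustration:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                   # 1 = defaulted on the loan
scores = [0.1, 0.3, 0.7, 0.8, 0.2, 0.6, 0.4, 0.9]   # model risk scores

auc = roc_auc_score(y_true, scores)
gini = 2 * auc - 1
print(f"AUC = {auc:.2f}, Gini = {gini:.2f}")
```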
Histogram: This is a graphical representation of the distribution of a set of numeric data, usually a vertical bar graph, used in the exploratory analytics and data preparation steps.
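For instance, a quick exploratory histogram with matplotlib (my choice of plotting library, on synthetic data):

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.random.normal(loc=50, scale=10, size=1000)  # synthetic data
plt.hist(values, bins=30, edgecolor="black")
plt.xlabel("value")
plt.ylabel("frequency")
plt.title("Distribution of a numeric variable")
plt.show()
```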
Independent Variable: This is the variable that is changed or controlled in a scientific experiment to test its effect on the dependent variable, for example the effect of increasing price on sales.
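Estimating that price-on-sales effect with a simple linear regression; all the numbers here are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

price = np.array([[10], [12], [14], [16], [18], [20]])  # independent variable
sales = np.array([200, 185, 170, 150, 140, 120])        # dependent variable

model = LinearRegression().fit(price, sales)
print(f"each $1 price increase changes sales by {model.coef_[0]:.1f} units")
```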
Jubatus: This is an online machine learning library covering classification, regression, recommendation (nearest neighbor search), graph mining, anomaly detection, and clustering.
KNN: The k-nearest neighbors algorithm is a machine learning method used for classification problems, based on the distance or similarity between data points.
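A minimal KNN sketch with scikit-learn on the bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test point is labeled by majority vote of its 5 closest training points.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))
```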
Lift Chart: These are widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. They also tell you how much response you can expect from the new target base.
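A sketch of computing lift by decile from model scores; the scores and responses are simulated, not real campaign data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"score": rng.random(1000)})
# Simulate responses that are more likely at higher scores.
df["responded"] = rng.random(1000) < df["score"] * 0.3

df["decile"] = pd.qcut(df["score"], 10, labels=False) + 1  # 1 = lowest scores
overall_rate = df["responded"].mean()
lift = df.groupby("decile")["responded"].mean() / overall_rate
print(lift.sort_index(ascending=False))  # top deciles should show lift > 1
```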
Model: There are more than 50 modeling techniques, such as regressions, decision trees, SVMs, GLMs, and neural networks, present in technology platforms like SAS Enterprise Miner, IBM SPSS, and R. They are broadly categorized under supervised and unsupervised methods into classification, clustering, and association rules.
Neural Networks: These are typically organized in layers made up of nodes and mimic the way the brain learns. Deep learning, an emerging field today, is based on deep neural networks.
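A tiny feed-forward network sketch, using scikit-learn's MLPClassifier for brevity (my choice; deep learning work would more likely use a dedicated framework):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of 32 nodes each.
net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print(net.score(X_test, y_test))
```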
Optimization: The use of simulation techniques to identify the scenarios that will produce the best results within the available constraints, e.g. sale price optimization, or identifying the optimal inventory for maximum fulfillment while avoiding stockouts.
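A sketch of sale price optimization with SciPy; the linear demand curve and the price bounds are assumptions made up for the example:

```python
from scipy.optimize import minimize_scalar

def negative_revenue(price):
    demand = 1000 - 40 * price   # assumed demand curve
    return -(price * demand)     # minimize the negative to maximize revenue

# Constrain the price to a feasible range of $5-$20.
result = minimize_scalar(negative_revenue, bounds=(5, 20), method="bounded")
print(f"optimal price: ${result.x:.2f}, revenue: ${-result.fun:,.0f}")
```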
PMML: The Predictive Model Markup Language is an XML-based file format developed by the Data Mining Group to transfer models between technology platforms.
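For example, a scikit-learn model can be exported to PMML, assuming the third-party sklearn2pmml package (pip install sklearn2pmml; it requires Java):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier())])
pipeline.fit(X, y)
sklearn2pmml(pipeline, "model.pmml")  # the PMML file can be loaded elsewhere
```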
Quartile: Dividing the sorted output of a model into four equal groups for further action.
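With pandas this is a one-liner; the scores below are invented:

```python
import pandas as pd

df = pd.DataFrame({"score": [0.12, 0.35, 0.47, 0.51, 0.66, 0.72, 0.83, 0.91]})
df["quartile"] = pd.qcut(df["score"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
print(df)
```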
R: Today every university, and even corporations, are using R for statistical model building. It is freely available, and there are licensed versions like Microsoft R. More than 7,000 packages are now at the disposal of data scientists.
Sentiment Analytics: This is the process of determining whether information or a service provided by a business leads to positive, negative, or neutral human feelings or opinions. Consumer product companies measure sentiment 24/7 and adjust their marketing strategies accordingly.
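A minimal polarity-scoring sketch, assuming the third-party TextBlob package (pip install textblob); the review texts and the cutoffs are made up:

```python
from textblob import TextBlob

reviews = [
    "This product is fantastic, I love it!",
    "Terrible service, I want a refund.",
    "It arrived on Tuesday.",
]
for text in reviews:
    # polarity runs from -1 (negative) to +1 (positive)
    polarity = TextBlob(text).sentiment.polarity
    label = ("positive" if polarity > 0.1
             else "negative" if polarity < -0.1 else "neutral")
    print(f"{label:8s} ({polarity:+.2f})  {text}")
```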
Text Analytics: It is used to discover & extract
meaningful patterns and relationships from the text collection from social
media site such as Facebook, Twitter, Linked-in, Blogs, Call center scripts.
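A common first step is turning raw text into numeric features, sketched here with scikit-learn's TF-IDF vectorizer on invented documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "customer called about billing issue",
    "billing error resolved for customer",
    "new product launch on social media",
]
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())  # the extracted vocabulary
print(matrix.toarray().round(2))           # one weighted row per document
```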
Unsupervised Learning: These are algorithms that receive only input data and are expected to find patterns in it. Clustering and association algorithms like k-means and Apriori are the best examples.
Visualization: This is the method of enhanced exploratory data analysis and of presenting modeling results with highly interactive statistical graphics. Any model output has to be presented to senior management in the most compelling way. Tableau, QlikView, and Spotfire are leading visualization tools.
What-If analysis: This is the method of simulating various business scenario questions, such as: what if we increased our marketing budget by 20%, what would the impact on sales be? Monte Carlo simulation is a very popular approach here.
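A Monte Carlo sketch of exactly that budget question; every number here (the baseline, the uplift range) is an assumption invented for the example:

```python
import numpy as np

rng = np.random.default_rng(42)
baseline_sales = 1_000_000

# Assume the extra spend lifts sales by 2-10%, but we are unsure where in
# that range, so we sample the uplift 10,000 times.
uplift = rng.uniform(0.02, 0.10, size=10_000)
simulated_sales = baseline_sales * (1 + uplift)

print(f"expected sales: {simulated_sales.mean():,.0f}")
print(f"5th-95th percentile: {np.percentile(simulated_sales, [5, 95]).round(0)}")
```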
What do you think should come for X, Y, Z?