Sunday 7 August 2016

How to evaluate Data Science models?

In today’s digital age, insights from data science are extremely important for delivering the best customer experience.

Data scientists use various techniques such as regression, SVM, neural networks, nearest neighbor, Naive Bayes, decision trees and ensemble models.

These algorithms help to identify previously unrecognized patterns and trends hidden within vast amounts of structured and unstructured information. These patterns are used to create predictive models that try to forecast future behavior.

These models have many practical business applications: hospitals use them to predict which patients are at risk, banks use them to decide which customers to approve for loans, and marketers use them to determine which leads to target with campaigns.

But how do you determine whether the predictive models you create are accurate, meaningful representations that will prove valuable to your organization?

There are various methods used by data scientists to measure the accuracy of the model:
  • Lift Charts & Gain Charts: These are widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. They also tell you how much response you can expect from the new target base.
  • ROC Curve: The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies.
  • Gini coefficient: This is the ratio of the area between the ROC curve and the diagonal line to the area of the triangle above the diagonal. It equals 2 × AUC - 1, where AUC is the area under the ROC curve.
  • Cross Validation: Splitting the data into parts, where one part is used for "training" your model and the held-out part is used to make predictions. In k-fold cross validation the split is repeated so that each part serves once as the held-out set. This lets you test the model on data that was "not seen" by it previously and check how it might behave with external data.
  • Confusion Matrix: A table showing the number of predictions for each class compared to the number of instances that actually belong to each class. This is very useful for getting an overview of the types of mistakes the algorithm made. From this table you can derive the accuracy, true positive and false positive rates, sensitivity and specificity of the model.
  • Root Mean Squared Error: This is the square root of the mean squared error on the test set, expressed in the units of the output variable. It gives you an idea of how far off a given prediction typically is, with large errors weighted more heavily. It is most popular in regression techniques.
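The classification measures in the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the labels and scores are made up, the 0.5 cut-off and the top-30% slice are arbitrary choices, and the helper names (confusion_counts, roc_auc, cumulative_gain) are hypothetical, not taken from any library.

```python
# Illustrative data only: 1 = responded, 0 = did not respond (made-up values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the chance that a random
    positive gets a higher score than a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cumulative_gain(y_true, scores, fraction):
    """Share of all positives found in the top `fraction` of records
    when records are ranked by score, highest first."""
    ranked = [t for _, t in sorted(zip(scores, y_true),
                                   key=lambda pair: -pair[0])]
    k = int(round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Confusion matrix at an assumed cut-off of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # 1 - false positive rate

# ROC-based measures do not depend on a single cut-off.
auc = roc_auc(y_true, scores)
gini = 2 * auc - 1

# Gain and lift for the top 30% of scored customers.
gain_top30 = cumulative_gain(y_true, scores, 0.3)
lift_top30 = gain_top30 / 0.3

print("confusion (tp, fp, tn, fn):", (tp, fp, tn, fn))
print("accuracy:", accuracy)
print("sensitivity:", sensitivity, "specificity:", specificity)
print("AUC:", auc, "Gini:", gini)
print("gain in top 30%:", gain_top30, "lift:", lift_top30)
```

A lift above 1 in the top slice means the model finds responders faster than random targeting would, which is what a campaign team reads off a lift chart.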
In general, the assessment you use should closely match the business objectives. Using the right metric can have more influence on your model's performance than the algorithm you use.
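For regression problems, cross validation and Root Mean Squared Error are often used together. The sketch below assumes a single-feature straight-line model fitted by least squares and made-up data; the helper names (fit_line, rmse, k_fold_rmse) are hypothetical. Folds are formed by taking every k-th record, which assumes the rows are in no meaningful order; in practice you would usually shuffle first.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(actual, predicted):
    """Square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_rmse(xs, ys, k):
    """Train on k-1 folds, score RMSE on the held-out fold, repeat k times."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in train_idx],
                        [ys[i] for i in train_idx])
        preds = [a + b * xs[i] for i in test_idx]
        fold_scores.append(rmse([ys[i] for i in test_idx], preds))
    return fold_scores


# Made-up data that roughly follows y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.3, 12.9, 15.2, 16.8, 19.1]

scores_per_fold = k_fold_rmse(xs, ys, k=5)
print("RMSE per fold:", scores_per_fold)
print("mean RMSE:", sum(scores_per_fold) / len(scores_per_fold))
```

Averaging the per-fold scores gives a steadier estimate of out-of-sample error than a single train/test split, at the cost of fitting the model k times.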

Huge numbers of data points are generated by the Internet of Things, mobiles, social media and all the omni-channels used for customer interactions. Merely storing this data is useless unless data scientists use it to generate insights that drive the next actions.


