NewGenApps Blog posts

6 tips on successful Data Mining

Written by Anurag | Feb 24, 2016 6:30:00 PM

Here are a few tips to be keep in mind in order to make complete use of effective data mining techniques:

1. Choose your projects carefully

Try focusing on projects which are clearly aligned with important business verticals like finding cross-selling opportunities, customer loyalty, or detecting fraud. This helps in playing safe as far as the success of the project goes.

2. Collect as much data as possible from multiple sources

While using data mining, one needs to model customer behavior patterns which generates valuable sets of data for very influential as well as susceptible to being influenced customers.

3. Ensure a clean sampling strategy

Even if you have a quite powerful analytics platform, that could model for a huge population, you should try focusing on smaller, simpler subsets of data. Focusing on a clear, concise sampling strategy is the key to successful data mining.

4. Practice ‘throwaway modeling’

The first step of a modeling process is to identify the best predictors from the large set of variables available to you. Throwaway modeling implies throwing in all the information, testing various models and then drilling down on a selection process. Following such practices in the initial part of your project, gives a definite jump in productivity later.

5. A holdout sample should always be used

A holdout sample is used as a reference sample to judge whether the model you are working upon has the ability to predict future scores. This is based upon a sample of observations withheld from estimation to yield a predictive model. Preparing a handout sample ensures that a model just for point-of-sale is not built which is based upon a defined set of data only. Hence, it provides a robust way of building up a model.

6. Refresh the models frequently

The predictive model you have come up with will not always fit your real world data. Models need to be fed in with fresh data every month, week, daily, or even hourly. Therefore, the iteration and scoring frequency is very critical to gain the predictive validity of your model.