What is Data Mining?

Data mining is the process of discovering interesting patterns in data.

The steps are:

  • Data cleaning – remove noise and inconsistent data, e.g., delete rows with missing values
  • Data integration – combine data from multiple sources, e.g., merge weather data with our manually recorded data
  • Data selection – select the data relevant to the analysis task at hand, e.g., keep only temperature and time of day
  • Data transformation – transform data into forms appropriate for mining, e.g., convert "sunny" and "not sunny" to 1 and 0
  • Knowledge discovery – apply intelligent methods to extract data patterns, e.g., we don’t need a jacket in the afternoon
  • Pattern evaluation – identify the truly interesting patterns, e.g., we can save money by not buying a jacket
  • Knowledge presentation – present the mined knowledge to users, e.g., in a dashboard
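To make the steps above concrete, here is a minimal Python sketch that runs the jacket example through each stage in order. Everything in it is an assumption for illustration: the records and weather readings are made-up toy data, and the function names (clean, integrate, select, transform, discover, evaluate, present) are hypothetical, not part of any library. "Interesting" is assumed to mean a pattern that held every time it was observed and was seen at least twice.

```python
from collections import defaultdict

# Hypothetical, made-up manual observations (None marks a missing value).
manual = [
    {"day": 1, "time": "morning", "jacket": 1},
    {"day": 1, "time": "afternoon", "jacket": 0},
    {"day": 2, "time": "morning", "jacket": 1},
    {"day": 2, "time": "afternoon", "jacket": None},
    {"day": 3, "time": "morning", "jacket": 1},
    {"day": 3, "time": "afternoon", "jacket": 0},
]

# Hypothetical external weather source, keyed by (day, time).
weather = {
    (1, "morning"): {"temperature": 9, "sky": "not sunny", "wind": 12},
    (1, "afternoon"): {"temperature": 19, "sky": "sunny", "wind": 8},
    (2, "morning"): {"temperature": 7, "sky": "not sunny", "wind": 15},
    (2, "afternoon"): {"temperature": 17, "sky": "sunny", "wind": 10},
    (3, "morning"): {"temperature": 10, "sky": "sunny", "wind": 5},
    (3, "afternoon"): {"temperature": 21, "sky": "sunny", "wind": 4},
}

def clean(rows):
    """Data cleaning: delete rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(rows, source):
    """Data integration: add the matching weather fields to each row."""
    return [{**r, **source[(r["day"], r["time"])]} for r in rows]

def select(rows, columns=("time", "temperature", "sky", "jacket")):
    """Data selection: keep only the columns relevant to the question."""
    return [{c: r[c] for c in columns} for r in rows]

def transform(rows):
    """Data transformation: encode 'sunny' / 'not sunny' as 1 / 0."""
    return [{**r, "sky": 1 if r["sky"] == "sunny" else 0} for r in rows]

def discover(rows):
    """Knowledge discovery: fraction of observations with a jacket, per time of day."""
    counts = defaultdict(lambda: [0, 0])  # time -> [jacket count, total rows]
    for r in rows:
        counts[r["time"]][0] += r["jacket"]
        counts[r["time"]][1] += 1
    return {t: (worn / total, total) for t, (worn, total) in counts.items()}

def evaluate(patterns, min_support=2):
    """Pattern evaluation: keep clear-cut patterns seen at least min_support times."""
    return {t: rate for t, (rate, n) in patterns.items()
            if n >= min_support and rate in (0.0, 1.0)}

def present(patterns):
    """Knowledge presentation: turn patterns into readable statements."""
    return [f"{t}: jacket {'needed' if rate == 1.0 else 'not needed'}"
            for t, rate in sorted(patterns.items())]

rows = transform(select(integrate(clean(manual), weather)))
report = present(evaluate(discover(rows)))
for line in report:
    print(line)
```

On this toy data the script prints two lines, "afternoon: jacket not needed" and "morning: jacket needed". Plain Python is used so each step stays visible; in practice a data-frame library would usually handle cleaning, merging and selecting.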

Your brain works like a walking model: it runs through all of these steps and makes a decision almost instantly.

So this is not too difficult for a human!

But humans have a limitation: they can only weigh a few variables at a time. Computers, however, can make decisions based on thousands or even millions of variables at once. And unlike humans, computers can store and recall far more data than a person can keep in mind.

Machine Learning, a family of techniques widely used in data mining, is used to teach computers to make decisions from data.
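As a small illustration of a computer learning a decision from data, the sketch below fits a decision stump (a one-level decision tree) in plain Python. The labelled examples are hypothetical toy data, and for simplicity the learned rule only takes the form "need a jacket when feature <= threshold". The loop over features works the same whether there are three variables or thousands, which is where computers pull ahead of people.

```python
# Hypothetical labelled examples: features -> jacket needed (1) or not (0).
examples = [
    ({"temperature": 7, "afternoon": 0, "sunny": 0}, 1),
    ({"temperature": 9, "afternoon": 0, "sunny": 0}, 1),
    ({"temperature": 10, "afternoon": 0, "sunny": 1}, 1),
    ({"temperature": 17, "afternoon": 1, "sunny": 1}, 0),
    ({"temperature": 19, "afternoon": 1, "sunny": 1}, 0),
    ({"temperature": 21, "afternoon": 1, "sunny": 1}, 0),
]

def fit_stump(data):
    """Try every feature and every observed threshold; keep the rule
    with the fewest training errors. Rule: predict 1 when feature <= threshold."""
    best = None  # (feature, threshold, errors)
    for feature in data[0][0]:
        for threshold in sorted({x[feature] for x, _ in data}):
            errors = sum((1 if x[feature] <= threshold else 0) != y for x, y in data)
            if best is None or errors < best[2]:
                best = (feature, threshold, errors)
    return best

def predict(rule, x):
    """Apply a learned (feature, threshold, errors) rule to one new example."""
    feature, threshold, _ = rule
    return 1 if x[feature] <= threshold else 0

rule = fit_stump(examples)
print(rule)
print(predict(rule, {"temperature": 12, "afternoon": 1, "sunny": 1}))
```

On this data the learned rule is ("temperature", 10, 0): predict a jacket at a temperature of 10 or below, with zero errors on the training examples, so the new afternoon example at 12 is predicted as 0 (no jacket).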