Data mining is about discovering interesting patterns from data.
The steps are:
- Data cleaning – remove noise and inconsistent data, e.g. delete rows with missing values
- Data integration – combine data from multiple sources, e.g. join weather data with our manually recorded data
- Data selection – select the data relevant to the analysis task at hand, e.g. keep only temperature and time of day
- Data transformation – transform data into forms appropriate for mining, e.g. convert "sunny" and "not sunny" to 1 and 0
- Knowledge discovery – apply intelligent methods to extract data patterns, e.g. we don’t need a jacket in the afternoon
- Pattern evaluation – identify the truly interesting patterns, e.g. the one that saves money by not buying a jacket
- Knowledge presentation – present the mined knowledge to users, e.g. in a dashboard
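To make the steps above concrete, here is a minimal Python sketch that runs the jacket example through all seven of them. Everything in it is an assumption made for illustration: the sample records, the field names (`hour`, `temp_c`, `sky`, `wore_jacket`), the function names, and the 20% cut-off used to decide which patterns count as interesting. Real projects would typically use a data library and far more data.

```python
# Hypothetical hand-recorded observations (illustrative data only)
manual_log = [
    {"hour": 8, "wore_jacket": True},
    {"hour": 9, "wore_jacket": True},
    {"hour": 14, "wore_jacket": False},
    {"hour": 15, "wore_jacket": None},  # missing value
    {"hour": 16, "wore_jacket": False},
]

# Hypothetical weather data from a second source, keyed by hour
weather = {
    8: {"temp_c": 9, "sky": "not sunny", "humidity": 80},
    9: {"temp_c": 11, "sky": "not sunny", "humidity": 75},
    14: {"temp_c": 21, "sky": "sunny", "humidity": 50},
    15: {"temp_c": 22, "sky": "sunny", "humidity": 48},
    16: {"temp_c": 20, "sky": "sunny", "humidity": 52},
}

def clean(rows):
    # Data cleaning: drop rows with a missing value
    return [r for r in rows if r["wore_jacket"] is not None]

def integrate(rows, weather):
    # Data integration: attach the weather record for the same hour
    return [{**r, **weather[r["hour"]]} for r in rows if r["hour"] in weather]

def select(rows):
    # Data selection: keep only the fields relevant to the question
    keep = ("hour", "temp_c", "sky", "wore_jacket")
    return [{k: r[k] for k in keep} for r in rows]

def transform(rows):
    # Data transformation: encode categories as numbers, bucket the hour
    return [
        {
            "period": "afternoon" if r["hour"] >= 12 else "morning",
            "temp_c": r["temp_c"],
            "sunny": 1 if r["sky"] == "sunny" else 0,
            "wore_jacket": 1 if r["wore_jacket"] else 0,
        }
        for r in rows
    ]

def mine(rows):
    # Knowledge discovery: fraction of observations with a jacket, per period
    totals = {}
    for r in rows:
        count, jackets = totals.get(r["period"], (0, 0))
        totals[r["period"]] = (count + 1, jackets + r["wore_jacket"])
    return {p: jackets / count for p, (count, jackets) in totals.items()}

def evaluate(rates, threshold=0.2):
    # Pattern evaluation: keep periods where a jacket is rarely worn
    # (the 0.2 threshold is an arbitrary choice for this sketch)
    return {p: rate for p, rate in rates.items() if rate <= threshold}

def present(patterns):
    # Knowledge presentation: turn patterns into readable sentences
    return [
        f"Jacket rarely needed in the {p} (worn {rate:.0%} of the time)"
        for p, rate in patterns.items()
    ]

rows = transform(select(integrate(clean(manual_log), weather)))
rates = mine(rows)
patterns = evaluate(rates)
report = present(patterns)
for line in report:
    print(line)
```

Each function maps to one bullet in the list, so the last few lines read as the pipeline itself: clean, integrate, select, transform, mine, evaluate, present.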
Your brain is a walking model: it runs through all of these steps and makes a decision almost instantly.
So this is not too difficult for a human!
But humans have a limitation: they can weigh only a few variables at a time. Computers, however, can make decisions on millions of variables at a time, and they can store and recall far more data than a person can hold in mind.
Machine learning, a set of techniques used in data mining, is how we teach machines to make decisions from data.
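As a rough illustration of what "teaching a machine" can mean, the sketch below learns a single temperature cut-off for the jacket decision from labelled examples, instead of having a person hard-code it. The example data, the function names `learn_threshold` and `needs_jacket`, and the choice of a one-variable threshold rule are all assumptions made for this sketch; practical machine learning uses many variables and more capable models.

```python
# Hypothetical labelled examples: (temperature in degrees C, jacket needed?)
examples = [(5, True), (9, True), (12, True), (18, False), (22, False), (25, False)]

def learn_threshold(examples):
    """Pick the temperature cut-off that misclassifies the fewest examples."""
    temps = sorted(t for t, _ in examples)
    # Candidate cut-offs: midpoints between neighbouring temperatures
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def errors(cut):
        # The rule predicts "jacket needed" when the temperature is below the cut
        return sum((t < cut) != needed for t, needed in examples)

    return min(candidates, key=errors)

threshold = learn_threshold(examples)

def needs_jacket(temp_c):
    # Decision made with the learned cut-off rather than a hand-written rule
    return temp_c < threshold

print(threshold, needs_jacket(10), needs_jacket(20))
```

The point of the sketch is that the cut-off comes out of the data: change the examples and the learned rule changes with them.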