Introduction to Data Mining
In this article, we are going to learn the basics of data mining. Humans have been mining the earth for centuries to obtain all sorts of valuable materials. Sometimes mining turns up things that nobody expected to find. For example, in 1898, during the excavation of a tomb in Saqqara, Egypt, a wooden artifact was found that closely resembled an airplane. It was dated to around 200 BC, about 2,200 years ago! But what possible information could we get from a large set of data? And if we do start mining it, is there any chance of getting unexpected results? Before answering that, let’s look at what exactly data mining is.
What is Data Mining?
- It is the extraction of vital information or knowledge from a large set of data.
- Think of data as a large stretch of rocky ground. We don’t know what is inside it, or whether anything useful lies beneath the rocks.
- In data mining, we look for hidden information without knowing in advance what type of information we will find or what we plan to use it for once we find it.
- Just as in traditional mining, data mining uses various techniques and tools, which vary according to the type of data being mined.
An example of data mining is shown below:
- A mobile network operator hires a data miner to dig into its call records. The data miner is not given any specific question to answer.
- The only target is quantitative: find at least 2 new patterns a month.
- As the data miner digs into the data, he finds a pattern: there are fewer international calls on Wednesdays than on other days.
- This information is shared with management, who plan to cut international call rates on Wednesdays and launch a campaign around it.
- International calls on Wednesdays surge, customers are happy with the lower prices, more customers sign up and the company makes more money. A win-win situation!
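The Wednesday pattern in this example is the kind of result a simple group-and-count over the call records can surface. The sketch below assumes a hypothetical list of call records, each a `(date, is_international)` pair; the records, the `WEEKDAYS` list and the function name are made up for illustration.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical call records: (date of call, was it international?).
calls = [
    (date(2024, 1, 1), True),   # Monday
    (date(2024, 1, 1), True),
    (date(2024, 1, 2), True),   # Tuesday
    (date(2024, 1, 2), True),
    (date(2024, 1, 3), True),   # Wednesday
    (date(2024, 1, 3), False),  # domestic call, not counted
    (date(2024, 1, 4), True),   # Thursday
    (date(2024, 1, 4), True),
    (date(2024, 1, 5), True),   # Friday
    (date(2024, 1, 5), True),
]

def international_calls_by_weekday(records):
    """Count international calls for each weekday name."""
    counts = Counter()
    for call_date, is_international in records:
        if is_international:
            counts[WEEKDAYS[call_date.weekday()]] += 1
    return counts

counts = international_calls_by_weekday(calls)
quietest = min(counts, key=counts.get)  # weekday with the fewest international calls
print(dict(counts))
print("Fewest international calls on:", quietest)
```

This compares raw totals, which is only fair when the data covers each weekday equally often; with real records you would more likely compare the average number of calls per day. Weekdays with no international calls at all do not appear in the counter.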
Keeping the above example in mind, let us now look at the various steps involved in data mining.
Steps involved in Data Mining
Below are the steps involved in data mining:
1 – Business Understanding
First, we understand every aspect of the business objectives and needs. The current situation is assessed by identifying the available resources, the assumptions and other important factors. Based on this, a data mining plan is drawn up to achieve both the business goals and the data mining goals.
2 – Data Understanding
First, data is collected from all of the available sources. Then we choose the data set from which the most useful information can be extracted.
3 – Data Preparation
Once the data set is identified, it is selected, cleaned, constructed and formatted into the desired form.
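As a rough illustration of cleaning and formatting, the sketch below takes hypothetical raw rows (strings with stray whitespace, inconsistent capitalisation, a duplicate and a missing value) and turns them into typed records. The field names and the choice to drop incomplete rows are assumptions made for the example.

```python
# Hypothetical raw rows as they might arrive from different sources.
raw_rows = [
    {"customer_id": " 101 ", "age": "34", "city": "pune"},
    {"customer_id": "101", "age": "34", "city": "Pune"},   # duplicate once cleaned
    {"customer_id": "102", "age": "", "city": "Goa"},      # missing age
    {"customer_id": "103", "age": "51", "city": " delhi"},
]

def prepare(rows):
    """Clean, deduplicate and format raw rows into typed records."""
    seen = set()
    prepared = []
    for row in rows:
        customer_id = row["customer_id"].strip()
        age_text = row["age"].strip()
        city = row["city"].strip().title()  # normalise spacing and capitalisation
        if not age_text:         # drop rows with a missing age
            continue
        if customer_id in seen:  # keep only the first record per customer
            continue
        seen.add(customer_id)
        prepared.append({"customer_id": int(customer_id), "age": int(age_text), "city": city})
    return prepared

clean = prepare(raw_rows)
print(clean)
```

Dropping incomplete rows keeps the sketch short; filling in missing values (imputation) is a common alternative when discarding rows would lose too much data.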
4 – Data Modelling
It is the process of building models from the prepared data according to the user’s requirements. One or more models can be created on the prepared data set, and the models then need to be assessed carefully, together with stakeholders, to make sure they meet the business goals.
5 – Evaluation
This is one of the most important steps in data mining. It involves reviewing every aspect of the process to check for possible faults or data leakage. New business requirements may also be raised as a result of the new patterns discovered.
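One common way to check a model for faults is to hold back part of the data, fit the model only on the rest, and then measure how well it does on the held-back rows, so nothing from the test rows can leak into the model. The sketch below uses a deliberately tiny "model" (a single cut-off on one number); the data, function names and 25% test split are all illustrative assumptions rather than a method prescribed here.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, rows):
    """Share of rows whose label matches the prediction `x >= threshold`."""
    return sum((x >= threshold) == (label == 1) for x, label in rows) / len(rows)

def fit_threshold(train):
    """Pick the cut-off that scores best on the training rows only."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))

# Hypothetical records: (hours online per week, upgraded to a premium plan?).
data = [(hours, 1 if hours >= 12 else 0) for hours in range(20)]
train, test = train_test_split(data)
threshold = fit_threshold(train)  # the test rows are never seen while fitting
print("threshold:", threshold)
print("training accuracy:", accuracy(threshold, train))
print("test accuracy:", accuracy(threshold, test))
```

If the test accuracy is much lower than the training accuracy, the model has likely fitted quirks of the training data; if the test rows had been used during fitting, that gap would be hidden.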
6 – Deployment
It simply means presenting the knowledge in a way that stakeholders can use whenever they need it. In our example above, the data miner found that there were fewer international calls on Wednesdays; this was presented to the stakeholders, who used it to their advantage and increased their profits.
Techniques used in Data Mining
The techniques used in data mining are as listed below:
- Cluster Analysis
Cluster analysis identifies groups of users in a database who share common features. These features could include age, geographic location, education level and so on.
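A widely used clustering algorithm is k-means, shown below as a small sketch rather than the only way to cluster. It assumes hypothetical users described by two features, age and years of education, and uses a simple deterministic starting point instead of the random initialisation most libraries use.

```python
import math

def k_means(points, k, max_iterations=100):
    """Group points into k clusters with Lloyd's k-means algorithm."""
    # Deterministic start: k points spread evenly through the input list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical users described by (age, years of education).
users = [(22, 12), (25, 13), (23, 12), (58, 16), (61, 18), (55, 17)]
centroids, clusters = k_means(users, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

On real data the features should usually be scaled to comparable ranges first; otherwise the feature with the largest numbers dominates the distance.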
- Anomaly Detection
Anomaly detection is used to determine when something is noticeably different from the regular pattern. It is used to eliminate database inconsistencies or anomalies at the source.
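One simple way to flag values that differ noticeably from the regular pattern is a z-score test: measure how many standard deviations each value lies from the mean. The sketch below applies it to hypothetical daily call counts; the data and the threshold of 2 standard deviations are assumptions for illustration.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:  # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical number of calls handled per day.
daily_calls = [102, 98, 105, 99, 101, 97, 103, 100, 480, 104]
print(find_anomalies(daily_calls))
```

A single extreme value also inflates the mean and standard deviation it is compared against, so on small or messy data set a more robust measure (such as the median absolute deviation) is often preferred.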
- Regression Analysis
This technique is used to make predictions based on relationships within the data set. For example, one can predict the stock rate of a particular product by analyzing its past rates and taking into account the different factors that determine the stock rate. Similarly, if we have the height and weight of different people, then given either the height or the weight we could estimate the other value.
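The height and weight example can be sketched with ordinary least squares, the most basic form of linear regression: fit a straight line through the data, then read predictions off the line. The heights and weights below are made-up numbers.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements of five people.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 56, 64, 72, 78]
slope, intercept = fit_line(heights_cm, weights_kg)
print(f"predicted weight at 175 cm: {slope * 175 + intercept:.1f} kg")
```

To go the other way (estimate height from weight) you would normally fit a separate line with the roles of the two lists swapped, since inverting this line does not give the same result unless the data lies exactly on a line.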
- Classification
This deals with items that have labels. In cluster analysis, the items had no labels, and data mining was used to group them into clusters; in classification, labelled examples already exist, and an algorithm uses them to classify new items. An example is an email spam filter. The filter is given both legitimate and spam messages (the training data). It learns the differences between them, which enables it to classify future emails correctly.
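A classic technique for spam filtering is naive Bayes, sketched below under simplifying assumptions: messages are split on whitespace, the only labels are `"spam"` and `"ham"` (legitimate mail), and the four training messages are invented. It is one possible classifier, not necessarily what any particular spam filter uses.

```python
import math
from collections import Counter

def train(messages):
    """Count word frequencies per label from (text, label) training pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training_data = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
]
counts, labels = train(training_data)
print(classify("free prize inside", counts, labels))
print(classify("team meeting tomorrow", counts, labels))
```

The add-one (Laplace) smoothing keeps a word that never appeared under a label, such as "inside" here, from driving that label's probability to zero.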
- Associative Learning
It is used to analyze which things tend to occur together, either in pairs or in larger groups. For example, people who buy lemons tend to buy oranges too, and people who buy bread tend to buy milk too. The purchases made by all customers are analyzed, and items that occur together are placed close to each other to increase sales: milk is placed close to bread, lemons alongside oranges and so on.
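The core of this analysis can be sketched by counting how often each pair of items appears in the same basket (support) and how often buying one item comes with buying the other (confidence). The baskets and the 40% minimum support below are hypothetical; this brute-force pair count is only practical for small data, and algorithms such as Apriori are designed to scale it up to larger item groups.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"lemons", "oranges"},
    {"lemons", "oranges", "bread"},
    {"bread", "milk", "butter"},
]

def frequent_pairs(baskets, min_support=0.4):
    """Return item pairs that appear in at least min_support of all baskets."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing antecedent that also contain consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

print(frequent_pairs(baskets))
print("bread -> milk:", confidence(baskets, "bread", "milk"))
print("lemons -> oranges:", confidence(baskets, "lemons", "oranges"))
```

Here bread and milk appear together in 3 of 5 baskets, and 3 of the 4 bread buyers also bought milk, which is the kind of evidence that would justify shelving them close together.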
Is Data Mining Ethical?
So, I am planning a weekend trip to Goa with a friend, and I search the internet for good places to visit there. The next time I go online, I see ads for various hotels in Goa.
On one hand, the internet has simplified my trip. After all, if I do decide to visit Goa, I will need somewhere to sleep, and an ad showing me a hotel is much more useful than an ad showing me random clothes to buy.
On the other hand, it can feel intrusive. Why would a data mining company I have never heard of know where I am going on vacation? I may not have told anyone about this trip, yet the internet suddenly knows I am going there. The truth is, the business model of the data mining company depends on this. It collects this data via cookies and scripts, then sells it to advertisers who, in turn, try to sell me something (in this case, a hotel room).
So data mining can be good or bad, depending on how we look at it. In the case above, we could always switch off cookies or browse in incognito mode. Whatever the case, one thing is for sure: data mining is here to stay.