Introduction to Data Mining
In this article, we introduce data mining. Humans have been mining the earth for centuries to obtain all sorts of valuable materials. Sometimes mining uncovers things that no one expected to find in the first place. For example, in 1898, during the excavation of a tomb to find mummies in Saqqara, Egypt, a wooden artifact was found that some say resembles an airplane. It has been dated to about 200 BC, roughly 2,200 years ago! But what information could we possibly get from a large set of data? And if we start mining it, is there any chance of getting unexpected results from the data set? Before answering that, let’s look at what exactly data mining is.
What is Data Mining?
- Data mining is the extraction of vital information or knowledge from a large set of data.
- Think of data as a large stretch of rocky ground. We don’t know what is inside it or whether anything useful lies beneath the rocks.
- In data mining, we look for hidden information, often without knowing in advance what type of information we will find or what we will use it for once we find it.
- Just as in traditional mining, data mining uses various techniques and tools, which vary according to the type of data being mined.
Example of Data Mining:
Consider the following example of data mining in practice:
- A mobile network operator consults a data miner to dig into its call records. The data miner is not told what to look for.
- Instead, only a quantitative target is set: find at least two new patterns a month.
- As the data miner digs into the data, they find a pattern: there are fewer international calls on Wednesdays than on other days.
- This information is shared with management, who decide to reduce international call rates on Wednesdays and launch a campaign.
- Call volumes surge, customers are happy with the lower prices, more customers sign up, and the company makes more money. A win-win situation!
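The pattern search in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the `call_records` data, the record layout, and the function name `international_calls_by_weekday` are all invented here and are not taken from any real operator's system.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical call records: (call date, is_international). Invented data.
call_records = [
    (date(2024, 1, 1), True),   # Monday
    (date(2024, 1, 1), True),
    (date(2024, 1, 2), True),   # Tuesday
    (date(2024, 1, 2), True),
    (date(2024, 1, 3), True),   # Wednesday
    (date(2024, 1, 3), False),  # a domestic call, ignored below
    (date(2024, 1, 4), True),   # Thursday
    (date(2024, 1, 4), True),
    (date(2024, 1, 5), True),   # Friday
    (date(2024, 1, 5), True),
    (date(2024, 1, 5), True),
]


def international_calls_by_weekday(records):
    """Count international calls for each weekday name."""
    counts = Counter()
    for call_date, is_international in records:
        if is_international:
            counts[WEEKDAYS[call_date.weekday()]] += 1
    return counts


counts = international_calls_by_weekday(call_records)
# The weekday with the fewest international calls is the candidate pattern.
quietest_day = min(counts, key=counts.get)
print(quietest_day, counts[quietest_day])
```

With real data, a pattern like this would need to hold over many weeks before management acted on it; a single week is used here only to keep the sketch short.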
Keeping the above example in mind, let us now look at the various steps involved in data mining.
Steps involved in Data Mining:
The steps are described below, in the order in which they are usually carried out:
The first step is to understand every aspect of the business objectives and needs. The current situation is assessed by identifying the resources, assumptions, and other important factors. A sound data mining plan is then established to achieve both the business goals and the data mining goals.
Next, the data is collected from all of the available sources. We then choose the data set from which the most beneficial information can be extracted.
Once the data set is identified, it is selected, cleaned, constructed, and formatted into the desired form.
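As a rough illustration of this preparation step, the sketch below selects, cleans, and formats a handful of records. The field names (`caller`, `minutes`, `country`) and the cleaning rules are assumptions made for this example; real preparation rules depend on the data set and the business goal.

```python
# Hypothetical raw records, invented for illustration.
raw_records = [
    {"caller": " alice ", "minutes": "12.5", "country": "IN"},
    {"caller": "bob", "minutes": "", "country": "US"},          # missing duration
    {"caller": "carol", "minutes": "7", "country": "in"},
    {"caller": " alice ", "minutes": "12.5", "country": "IN"},  # duplicate row
]


def prepare(records):
    """Drop incomplete and duplicate rows and normalize the field formats."""
    cleaned = []
    seen = set()
    for record in records:
        if not record["minutes"].strip():
            continue  # clean: skip rows with a missing duration
        row = (
            record["caller"].strip().title(),  # format: trim and capitalize names
            float(record["minutes"]),          # format: text to number
            record["country"].upper(),         # format: consistent country codes
        )
        if row in seen:
            continue  # clean: skip exact duplicates
        seen.add(row)
        cleaned.append({"caller": row[0], "minutes": row[1], "country": row[2]})
    return cleaned


prepared = prepare(raw_records)
for row in prepared:
    print(row)
```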
The data is then remodeled according to the requirements of the user. One or more models can be created on the prepared data set, and the models then need to be assessed carefully, with stakeholders involved, to make sure that they meet the business objectives.
Evaluation is one of the most important processes in data mining. It involves going through every aspect of the process to check for any possible fault or data leakage. New business requirements may also arise from the newly discovered patterns.
The last step is simply to present the knowledge in such a way that the stakeholders can use it when they want it. In our example above, it was found that there were fewer international calls on Wednesdays, so this information was presented to the stakeholders, who in turn used it to their advantage to increase their profits.
Techniques used in Data Mining:
The main techniques used in data mining are listed below:
- Cluster Analysis:
Cluster analysis makes it possible to identify groups of users according to common features in a database. These features could include age, geographic location, education level, and so on.
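To make the idea concrete, here is a minimal sketch that groups users by age using a basic one-dimensional k-means procedure. K-means is one common clustering method, chosen here for illustration; the article does not name a specific algorithm, and the ages and starting centroids are invented.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid (basic k-means in one dimension)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign each value to the closest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical user ages and starting guesses for two group centers.
ages = [18, 21, 22, 25, 58, 61, 64, 67]
clusters, centroids = k_means_1d(ages, centroids=[20.0, 30.0])
print(clusters)
print(centroids)
```

No labels are supplied: the two age groups emerge from the data alone, which is the defining trait of clustering.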
- Anomaly Detection:
Anomaly detection is used to determine when something is noticeably different from the regular pattern. It helps eliminate database inconsistencies or anomalies at the source.
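One simple way to flag values that differ noticeably from the regular pattern is a z-score test, sketched below. The threshold of 2 standard deviations and the `daily_calls` figures are assumptions for this example; other detection methods exist and may suit real data better.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily call counts; the last value is far from the rest.
daily_calls = [102, 98, 101, 99, 100, 103, 97, 250]
anomalies = find_anomalies(daily_calls)
print(anomalies)
```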
- Regression Analysis:
Regression analysis is used to make predictions based on relationships within the data set. For example, one can predict the future price of a particular stock by analyzing its past prices and taking into account the different factors that determine the price. Similarly, if we have data on the height and weight of different people, then given either the height or the weight we could estimate the other value.
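The height and weight case can be sketched with ordinary least-squares linear regression, one of the simplest regression methods. The sample measurements below are invented and deliberately lie on a straight line; real data would be noisier and the fit only approximate.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements, invented for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 74, 82]

slope, intercept = fit_line(heights_cm, weights_kg)
# Estimate the weight of a person who is 175 cm tall.
predicted = slope * 175 + intercept
print(round(predicted, 1))
```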
- Classification:
Classification deals with items that already have labels. In cluster analysis, the items had no labels, and data mining was used to group them into clusters. In classification, labeled information already exists, so new items can be classified using an algorithm. An example is an email spam filter. The filter is provided with both relevant and spam messages (the training data). It identifies the differences between the two, which enables it to classify future emails correctly.
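A spam filter of this kind can be sketched with a small naive Bayes classifier. Naive Bayes is one possible algorithm, picked here for illustration; the article does not say which one real filters use. The training messages and the labels `"spam"` and `"relevant"` are invented for the example.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label from a list of (text, label) training pairs."""
    word_counts = {"spam": Counter(), "relevant": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["relevant"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled training data.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting at noon tomorrow", "relevant"),
    ("lunch with the team tomorrow", "relevant"),
]

model = train(training_data)
print(classify("free prize money", *model))
print(classify("team meeting tomorrow", *model))
```

Unlike the clustering case, the labels come with the training data, and the algorithm only learns how to assign them to new messages.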
- Associative Learning:
Associative learning is used to analyze which things tend to occur together, either in pairs or in larger groups. For example, people who buy lemons tend to buy oranges too, people who buy bread tend to buy milk, and so on. The purchases made by all customers are analyzed, and items that occur together are placed close to each other to increase sales. So milk is placed close to bread, lemons are placed alongside oranges, and so on.
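The simplest form of this analysis is counting how often pairs of items appear in the same purchase, as sketched below. The `baskets` data and the 50% support threshold are assumptions for the example; full association rule mining also considers larger item groups and measures such as confidence.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support=0.5):
    """Return item pairs bought together in at least `min_support` of all baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {
        pair: count / len(baskets)
        for pair, count in pair_counts.items()
        if count / len(baskets) >= min_support
    }


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["lemons", "oranges"],
    ["lemons", "oranges", "milk"],
]

pairs = frequent_pairs(baskets)
for pair, support in pairs.items():
    print(pair, support)
```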
Is Data Mining ethical?
Suppose I am planning a weekend trip to Goa with a friend, and I search the internet for good places to visit there. The next time I go online, I find ads for various hotels in Goa.
On the one hand, this is helpful: the internet has simplified my trip. After all, if I do decide to visit Goa, I will need to sleep somewhere, and an ad showing me a hotel is much more useful than an ad showing me random clothes to buy.
On the other hand, it is unsettling. Why would a data mining company that I have never heard of know where I am going on vacation? I may not have told anyone about this trip, yet the internet suddenly knows I am going there. The truth is that the business model of the data mining company depends on this. It collects this data via cookies and scripts, then sells it to advertisers who, in turn, try to sell me something else (in this case, a hotel room).
So data mining could be good or bad depending on how we look at it. In the case above, we could always switch off cookies or go incognito. Whatever the case, one thing is for sure: data mining is here to stay.