EDUCBA

Introduction To Data Mining

By Priya Pedamkar

Introduction to Data Mining

Humans have been mining the earth for centuries to obtain all sorts of valuable materials. Sometimes mining turns up things that no one expected to find in the first place. For example, in 1898, during the excavation of a tomb in Saqqara, Egypt, a wooden artefact was found that some observers say resembles an aeroplane. It has been dated to about 200 BC, roughly 2200 years ago! But what useful information could we get from a large set of data? And if we start mining it, is there any chance of getting unexpected results from the data set? Before answering that, let’s look at what exactly Data Mining is.

What is Data Mining?

  • It is basically the extraction of vital information or knowledge from a large set of data.
  • Think of data as a large rocky surface. We don’t know what is inside it; we don’t know if something useful lies beneath the rocks.
  • In data mining, we look for hidden information, often without knowing in advance what type of information we will find or what we will use it for once we find it.
  • Just as in traditional mining, data mining has various techniques and tools, which vary according to the type of data being mined.

Example

An example of data mining is shown below:

  • A mobile network operator consults a data miner to dig into its call records. The data miner is not told which specific patterns to look for.
  • Instead, a quantitative target is set: find at least 2 new patterns in a month.
  • As the data miner starts digging into the data, he finds a pattern: there are fewer international calls on Wednesdays than on other days.
  • This information is shared with the management, who come up with a plan to reduce international call rates on Wednesdays and start a campaign.
  • Call volumes surge, customers are happy with the low call prices, more customers sign up, and the operator makes more money. A win-win situation!
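
The Wednesday pattern in this example could be found with a simple count per weekday. The sketch below is only an illustration: the call records in `CALLS` and both function names are invented for this purpose and do not come from any real operator.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical call records: (call date, is_international). Invented data.
CALLS = [
    (date(2020, 3, 2), True), (date(2020, 3, 2), True), (date(2020, 3, 2), False),
    (date(2020, 3, 3), True), (date(2020, 3, 3), True),
    (date(2020, 3, 4), True), (date(2020, 3, 4), False), (date(2020, 3, 4), False),
    (date(2020, 3, 5), True), (date(2020, 3, 5), True),
    (date(2020, 3, 6), True), (date(2020, 3, 6), True),
    (date(2020, 3, 7), True), (date(2020, 3, 7), True), (date(2020, 3, 7), True),
    (date(2020, 3, 8), True), (date(2020, 3, 8), True), (date(2020, 3, 8), True),
]


def international_calls_by_weekday(calls):
    """Count international calls for each day of the week."""
    counts = {name: 0 for name in WEEKDAYS}
    for call_date, is_international in calls:
        if is_international:
            # date.weekday() returns 0 for Monday through 6 for Sunday
            counts[WEEKDAYS[call_date.weekday()]] += 1
    return counts


def quietest_weekday(counts):
    """Return the weekday with the fewest international calls."""
    return min(counts, key=counts.get)


counts = international_calls_by_weekday(CALLS)
print(counts)
print("Quietest day:", quietest_weekday(counts))
```

On real call records, a difference between days would also need to be checked over many weeks before anyone acts on it.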

Keeping the above example in mind, let us now look into the various data mining steps.

Steps involved in Data Mining

Below are the steps involved in data mining:

1. Business Understanding

In this step, we try to understand every aspect of the business objectives and needs. The current situation is assessed by identifying the resources, assumptions and other important factors. Accordingly, a good data mining plan is established to achieve both the business goals and the data mining goals.

2. Data Understanding

Initially, data is collected from all of the available sources. Then we choose the data set from which the most beneficial information can be extracted.

3. Data Preparation

Once the data set is identified, it is selected, cleaned, constructed, and formatted in the desired form.
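
As a rough sketch of what cleaning and formatting can look like, the function below drops rows with missing or malformed fields and converts the rest to a consistent form. The field names `age` and `city` and the sample rows are assumptions made up for this illustration.

```python
def prepare(raw_rows):
    """Select, clean and format raw records (hypothetical fields: age, city)."""
    prepared = []
    for row in raw_rows:
        # Treat a missing key or a None value as an empty string
        age = (row.get("age") or "").strip()
        city = (row.get("city") or "").strip()
        # Cleaning: skip rows whose age is not a whole number or whose city is empty
        if not age.isdigit() or not city:
            continue
        # Formatting: store age as an integer and city in title case
        prepared.append({"age": int(age), "city": city.title()})
    return prepared


raw = [
    {"age": " 34 ", "city": "pune"},
    {"age": "", "city": "Goa"},
    {"age": "n/a", "city": "Delhi"},
    {"age": "27", "city": None},
]
print(prepare(raw))
```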

4. Data Modelling

In this step, the prepared data is modelled according to the user’s requirements. One or more models can be created on the prepared data set. Finally, the models need to be assessed carefully, involving stakeholders, to make sure that the created models meet the business objectives.

5. Evaluation

This is one of the most important steps in data mining. It involves going through every aspect of the process to check for any possible fault or data leakage. Also, new business requirements may be raised because of the new patterns discovered.

6. Deployment

Deployment means presenting the knowledge so that the stakeholders can use it when they want it. In our example above, it was found that there were fewer international calls on Wednesdays, so this information was presented to the stakeholders, who used it to their advantage and increased their profits.

Techniques used in Data Mining

The techniques used in data mining are listed below:

Cluster Analysis

Cluster Analysis makes it possible to identify groups of users according to common features in a database. These features could include age, geographic location, education level and so on.
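
One common way to form such groups is k-means clustering. The article does not name a specific algorithm, so the sketch below is just one possible choice. The user data (age, years of education) and the starting centroids are invented, and `math.dist` assumes Python 3.8 or later.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Group points around the nearest centroid, then move each centroid
    to the mean of its group, repeating a fixed number of times."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical users: (age, years of education)
USERS = [(22, 12), (25, 13), (24, 12), (58, 16), (61, 18), (60, 17)]

centroids, clusters = k_means(USERS, centroids=[(22, 12), (61, 18)])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

In practice, features measured on very different scales are usually rescaled first so that no single feature dominates the distance.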

Anomaly Detection

It is used to determine when something is noticeably different from the regular pattern. It can also be used to catch database inconsistencies or anomalies at the source so that they can be eliminated.
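
A very simple way to flag values that differ noticeably from the regular pattern is to measure how many standard deviations each value lies from the mean. This is only one of many possible rules. The daily call counts and the threshold of 2.0 below are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations away from the mean."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - centre) / spread > threshold]


# Hypothetical number of calls per day; one day is clearly unusual
DAILY_CALLS = [100, 102, 98, 101, 99, 250, 100]
print(find_anomalies(DAILY_CALLS))
```

Because a large outlier inflates both the mean and the standard deviation, more robust rules (for example, ones based on the median) are often preferred on real data.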

Regression Analysis

This technique is used to make predictions based on relationships within the data set. For example, one can predict a particular stock’s price by analyzing its past prices and taking into account the different factors that determine the price. Or, if we have data on the height and weight of different persons, then given either the height or the weight of a person, we could estimate the other value.
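
The height and weight case can be sketched with simple linear regression, which fits a straight line by least squares. The measurements below are invented and deliberately tidy, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: height in cm, weight in kg
HEIGHTS = [150, 160, 170, 180, 190]
WEIGHTS = [50, 58, 66, 74, 82]

slope, intercept = fit_line(HEIGHTS, WEIGHTS)
predicted_weight = slope * 175 + intercept
print(f"Estimated weight at 175 cm: {predicted_weight:.1f} kg")
```

Real measurements never fall exactly on a line, so the prediction is an estimate rather than an exact value.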

Classification

Classification deals with items that already have labels. Note that in cluster analysis the items did not have labels, and data mining had to form them into clusters and label them. In classification, by contrast, labelled information already exists, and an algorithm can learn from it how to classify new items. An example is an email spam filter. The spam filter is provided with both legitimate and spam messages (training data). The differences between the two are identified, enabling the filter to classify future emails correctly.
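
One classic way to build such a filter is a naive Bayes classifier over word counts. The article does not say which algorithm a spam filter uses, so this is an assumption, and the four training messages are invented. It is a minimal sketch, not a production filter.

```python
from collections import Counter
from math import log


def train(messages):
    """messages: list of (text, label) pairs.
    Returns per-label word counts and per-label message counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score
            score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting at noon tomorrow", "legitimate"),
    ("lunch with the team tomorrow", "legitimate"),
]

word_counts, label_counts = train(TRAINING)
print(classify("free prize money", word_counts, label_counts))
print(classify("team meeting tomorrow", word_counts, label_counts))
```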

Associative Learning

It is used to analyze which things tend to occur together, either in pairs or in larger groups. For example, people who buy lemons tend to buy oranges too, and people who buy bread tend to buy milk as well. So the purchases made by all the customers are analyzed, and the things that occur together are placed close together to increase sales. So milk is placed close to bread, lemons are placed alongside oranges and so on.
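
The idea can be sketched by counting how often pairs of items appear in the same basket and how often one item is bought given that another is. The baskets and both function names below are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair always appears in the same order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# Hypothetical purchases
BASKETS = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["lemons", "oranges"],
    ["lemons", "oranges", "bread", "milk"],
    ["milk", "eggs"],
]

print(pair_counts(BASKETS).most_common(3))
print(confidence(BASKETS, "lemons", "oranges"))
print(confidence(BASKETS, "milk", "bread"))
```

Note that confidence is directional: in this data every basket with bread also has milk, but not every basket with milk has bread.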

Is Data Mining Ethical?

Suppose I plan a weekend trip to Goa with a friend and search the internet for good places to visit in Goa. The next time I go online, I find ads for various hotels in Goa.

Good thing?

Yes, the internet has helped me simplify my trip. After all, if I decide to visit Goa, I would need to sleep somewhere and an ad showing me a hotel is much more useful than an ad showing me random clothes to buy.

Bad thing?

Yes! Why would a data mining company I have never heard of know where I am going on vacation? What if I haven’t told anyone about this trip, yet the internet suddenly knows I am going there? The truth is, the business model of the data mining company depends on this. They collect this data via cookies and scripts, then sell it to advertisers who, in turn, try to sell me something else (in this case, a hotel room).

So it could be good or bad depending on how we look at it. Also, in the above case, we could always switch off cookies or go incognito. Whatever the case, one thing is for sure: data mining is here to stay.

Recommended Articles

This is a guide to Introduction to Data Mining. Here we discuss steps and techniques in Data Mining along with a respective example. You may also look at the following articles to learn more –

  1. Data Mining Interview Questions
  2. Predictive Analytics vs Data Mining
  3. Introduction To Data Science
  4. What is Regression Analysis?

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
