EDUCBA

KNN Algorithm

By Priya Pedamkar

Introduction to KNN Algorithm

The K Nearest Neighbours algorithm, commonly known as KNN, is one of the most basic algorithms in machine learning. Understanding it is a very good place to start learning machine learning, because the logic behind it is incorporated in many other machine learning models. The K Nearest Neighbours algorithm comes under the classification part of supervised learning.

What is Supervised Learning?

A supervised learning algorithm relies on labelled input data to learn a function, and then uses that function to predict the label when unlabelled data is provided. Having seen what supervised learning is, let us look at classification: a classification algorithm gives a discrete value as its output, not a continuous value.

How does the KNN Algorithm Work?

K Nearest Neighbours is a basic algorithm that stores all the available data and predicts the class of unlabelled data based on a similarity measure. In geometry, when two parameters are plotted on a 2D Cartesian system, we measure similarity by calculating the distance between the points. The same applies here: the KNN algorithm works on the assumption that similar things exist in close proximity. Put simply, similar things stay close to each other.
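The idea described above (store the labelled data, measure distances, and let the closest points vote) can be sketched in a few lines. This is only a minimal illustration, not the scikit-learn implementation used later in the article; the function name `knn_predict` and the toy data are assumptions made up for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every stored training point
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated groups
X_toy = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]]
y_toy = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X_toy, y_toy, [2.0, 2.0], k=3))  # A
print(knn_predict(X_toy, y_toy, [6.0, 7.0], k=3))  # B
```

Note that there is no training step beyond keeping the data around: all the work happens when a prediction is requested.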

Example:

Suppose we have a data set that looks like this when plotted. To classify a data point, the K Nearest Neighbours algorithm first computes the distance between the points to see how similar they are.

[Figure: scatter plot of the example data set]

In Euclidean geometry, the distance between two points X and Y can be calculated by the following equation.

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Here X = (x_1, ..., x_n) and Y = (y_1, ..., y_n) are the two points, and x_i and y_i are their values on the i-th co-ordinate axis.

If K = 1, the case is simply assigned to the class of its nearest neighbour. K = 1 is the simplest choice; we can alter the value of K while training models in machine learning, and we discuss this further in the article.
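As a quick worked example of the Euclidean distance, the snippet below computes it by hand with NumPy and checks the result against Python's standard-library `math.dist`. The two points are arbitrary values chosen for illustration.

```python
import math

import numpy as np

# Two example points in 2D (values chosen for illustration)
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

# Square root of the sum of squared co-ordinate differences
d_manual = np.sqrt(np.sum((p - q) ** 2))
# The standard library offers the same computation
d_stdlib = math.dist(p.tolist(), q.tolist())

print(d_manual, d_stdlib)  # 5.0 5.0
```

The co-ordinate differences are 3 and 4, so the distance is the hypotenuse of a 3-4-5 triangle.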

Note that distance measures such as the one above are only valid for continuous variables. When the variables are categorical, as class labels are, we have to use the Hamming distance instead.

$$D_H(X, Y) = \sum_{i=1}^{n} D(x_i, y_i), \qquad D(x_i, y_i) = \begin{cases} 0 & \text{if } x_i = y_i \\ 1 & \text{if } x_i \neq y_i \end{cases}$$

It also raises the issue of standardising numerical values to the range 0 to 1 when there is a mixture of numerical and categorical values in the data set.

X            | Y              | Distance
With Cancer  | With Cancer    | X = Y → D = 0
With Cancer  | Without Cancer | X != Y → D = 1
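The matching rule (D = 0 when the two values are equal, D = 1 when they differ) extends naturally to records with several categorical features by summing over the features. A minimal sketch follows; the helper name `hamming_distance` and the multi-feature records are hypothetical and exist only for this illustration.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length records differ."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of features")
    return sum(1 for a, b in zip(x, y) if a != b)


# Single categorical feature
print(hamming_distance(["With Cancer"], ["With Cancer"]))     # 0
print(hamming_distance(["With Cancer"], ["Without Cancer"]))  # 1

# Hypothetical records with three categorical features each
r1 = ["male", "smoker", "With Cancer"]
r2 = ["female", "smoker", "Without Cancer"]
print(hamming_distance(r1, r2))  # 2
```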

This is how the algorithm works. Now let’s dive into how we choose the value of K in KNN.

Choosing K Value in KNN Algorithm

Before looking at the factors to consider when choosing the K value, we have to understand how the value of K influences the algorithm.

[Figure: classification boundaries for the same data set at increasing values of K]

These are plots of the same data set with varying K values: K is 1 for the plot in the top-left corner and highest for the plot in the bottom-right corner. If we examine them carefully, we can see that the boundary of the classification becomes smoother as the value of K increases; that is, the larger the value of K, the smoother the boundary. It follows that if K is set to 1 the model will tend to overfit the data, and if K is set to a very large number it will tend to underfit the data. To choose an optimal value of K, we check the validation error for multiple values of K and choose the one with the minimum error.
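One way to carry out this search is sketched below. It is an illustration under stated assumptions rather than the article's own procedure: it uses the Iris data set bundled with scikit-learn as a stand-in, tries odd K values from 1 to 21, and estimates the validation error with 5-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data set (an assumption; substitute your own X and y)
X, y = load_iris(return_X_y=True)

errors = {}
for k in range(1, 22, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds; validation error is 1 - accuracy
    scores = cross_val_score(model, X, y, cv=5)
    errors[k] = 1.0 - scores.mean()

# K with the smallest estimated validation error
best_k = min(errors, key=errors.get)
for k, err in errors.items():
    print(f"K={k:2d}  validation error={err:.3f}")
print("K with the lowest validation error:", best_k)
```

Cross-validation is used here instead of a single validation split because it gives a less noisy error estimate on small data sets; a single held-out split would follow the same pattern.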

Steps to Implement the KNN Algorithm in Python

So far we have seen the theoretical part of the K Nearest Neighbours algorithm. Now let us see it in practice by learning how to implement it in Python.

Implementation of KNN in Python

Step 1: Importing Libraries

First, we import the libraries that we need to run KNN.

Code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2: Importing Dataset

Next, we import the dataset.

Code:

file = "/path/to/the/dataset"
# Load the dataset into a pandas DataFrame
dataset = pd.read_csv(file)

Step 3: Split Dataset

The next step is to split our dataset into training and test sets.

Code:

from sklearn.model_selection import train_test_split
# X (features) and y (labels) are assumed to be defined already; see the note below
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

Note: The data set used for this demonstration has already been pre-processed, with the X (feature) and y (label) values defined. If that has not been done, it must be done first, because the classification model has to be given labelled data during training in order to calculate distances and assign classes.
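If X and y have not been defined yet, the pre-processing might look like the sketch below. The DataFrame contents and the column names `feature_1`, `feature_2` and `label` are made up for illustration; the real dataset will have its own columns. The `random_state` and `stratify` arguments are optional additions that make the split reproducible and keep the class proportions similar in both parts.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data standing in for the real dataset
dataset = pd.DataFrame({
    "feature_1": [1.0, 1.5, 2.0, 6.0, 6.5, 7.0, 1.2, 6.8, 2.2, 6.1],
    "feature_2": [1.0, 2.0, 1.5, 6.0, 7.0, 6.5, 1.8, 6.2, 1.1, 7.2],
    "label":     ["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"],
})

X = dataset.drop(columns=["label"])  # feature matrix: every column except the label
y = dataset["label"]                 # class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```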

Step 4: Training Model

In this step, we train the model.

Code:

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)

Note: Here we are using the KNeighborsClassifier class imported from the sklearn.neighbors module.

Step 5: Running Predictions

Now we run predictions on the test split.

Code:

y_pred = classifier.predict(X_test)

Step 6: Check Validation

The next step is to evaluate the algorithm and check the validation error, then run it again with different K values and keep the K value that gives the minimum validation error. This is how we can implement the K Nearest Neighbours classifier in practice. There are multiple ways to implement this algorithm and this is just one of them; the steps are described only briefly here, as the main aim of this article is to understand how the algorithm works.
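A minimal evaluation sketch is shown below. It repeats the earlier steps end to end so that it runs on its own, and it assumes the Iris data set from scikit-learn as a stand-in for the article's dataset; `accuracy_score` and `confusion_matrix` are one reasonable choice of metrics, not the only one.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data set (an assumption; substitute your own X and y)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predicted classes
matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print("Confusion matrix:")
print(matrix)
print(f"Accuracy: {accuracy:.3f}  Error: {1 - accuracy:.3f}")
```

To compare K values, the same fit, predict and score sequence can be wrapped in a loop over `n_neighbors`.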

Conclusion

As said earlier, the K Nearest Neighbours algorithm is one of the simplest and easiest algorithms used for classification. Because it does no work at training time beyond storing the data, it also falls under the “Lazy Learning Algorithm” category. The K value is usually chosen to be an odd number, which avoids tied votes when there are two classes, but that is not compulsory.

However, KNN has a few drawbacks as well. Some of them are:

  • It doesn’t work well with categorical data, because there is no natural distance between two categorical features; they have to be compared with a measure such as the Hamming distance.
  • It also doesn’t work well with high-dimensional data, as distances become more expensive to calculate and less informative as the number of dimensions grows.
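The high-dimensional drawback can be illustrated with a small experiment. The sketch below is only an illustration on uniformly random points, with the helper name `nearest_to_farthest_ratio` made up for this example: as the number of dimensions grows, the nearest and farthest points end up at nearly the same distance from a query, so "nearest" carries less information.

```python
import numpy as np

rng = np.random.default_rng(0)


def nearest_to_farthest_ratio(n_dims, n_points=500):
    """Ratio of the nearest to the farthest distance from one random query point."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()


# A ratio close to 1 means near and far neighbours are hard to tell apart
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(nearest_to_farthest_ratio(n_dims), 3))
```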

Many current use cases in machine learning are built around classification at a basic level, which is why KNN plays a major role in the machine learning world.
