EDUCBA


Nearest Neighbors Algorithm

By Priya Pedamkar


Introduction to Nearest Neighbors Algorithm

The K Nearest Neighbor (KNN) algorithm is primarily a classification algorithm in machine learning and belongs to the supervised learning category, although it can be used for regression problems as well. KNN algorithms have been used since 1970 in many applications, such as pattern recognition, data mining, statistical estimation, and intrusion detection. KNN is widely applicable in real-life scenarios because it is non-parametric, i.e., it makes no underlying assumptions about the distribution of the data.

Classification of Nearest Neighbors Algorithm

For a classification problem, KNN splits the data into training data and test samples. The distance between each test sample and every training point is computed, and the k training points with the smallest distances are the sample's nearest neighbors. KNN then predicts the sample's class by majority vote among these k neighbors.


Let us try to understand the idea behind KNN with the help of an example given below:

  • Consider three classes, ω1, ω2 and ω3, and an unknown data point xu whose class label we need to find.
  • We use the Euclidean distance and take k = 5 nearest neighbors.
  • In the accompanying figure, among the 5 closest neighbors, 4 belong to class ω1 and 1 belongs to class ω3, so xu is assigned to ω1 by majority vote.

The basic KNN algorithm stores all the examples in the training set, which creates high storage requirements and computational cost. However, the entire training set need not be stored, because the examples may contain highly redundant information. In most cases, nearly all of the information relevant for classification is located around the decision boundaries.
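One classical way to discard redundant examples is Hart's condensed nearest neighbor rule. The sketch below is an illustration of that specific technique, not necessarily the method the author has in mind. It keeps a small "store" of examples and adds a training example only when the current store misclassifies it with 1-NN, repeating until a full pass adds nothing. The function names `nearest_label` and `condense` are hypothetical, and training examples are assumed to be `(feature_tuple, label)` pairs.

```python
import math

def nearest_label(point, store):
    """Label of the stored example closest to `point` (1-NN, Euclidean)."""
    _, label = min(store, key=lambda item: math.dist(point, item[0]))
    return label

def condense(training_set):
    """Hart's condensed nearest neighbor rule (a sketch).

    Start from one example; add every example that the current store
    misclassifies, and repeat passes until nothing new is added.
    """
    store = [training_set[0]]
    changed = True
    while changed:
        changed = False
        for x, label in training_set:
            # Skip examples already stored so duplicate points cannot loop forever.
            if (x, label) not in store and nearest_label(x, store) != label:
                store.append((x, label))
                changed = True
    return store
```

For two well-separated clusters, only one example per cluster typically survives, because the interior points are classified correctly by the examples already kept.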

How to Implement the Nearest Neighbors Algorithm?

In K-NN, the whole data set is split into training data and test samples. For a classification problem, the k-nearest neighbors algorithm is implemented using the following steps.

  1. Pick a value for k, the number of nearest neighbors to consult.
  2. Calculate the distance from the unknown data point to every training example.
  3. Sort these distances and select the k training examples nearest to the unknown data point.
  4. Assign the unknown data point the class that occurs most often among these k neighbors (majority vote).
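The steps above can be sketched in Python as follows. This is a minimal, brute-force illustration assuming the Euclidean distance and training data stored as a list of `(feature_tuple, label)` pairs; the name `knn_classify` is hypothetical. The usage line rebuilds the three-class example with made-up coordinates in which four of the five nearest neighbors belong to ω1.

```python
import math
from collections import Counter

def knn_classify(query, training_set, k):
    """Classify `query` by majority vote among its k nearest training examples."""
    # Step 2: Euclidean distance from the query to every training example.
    distances = [(math.dist(query, x), label) for x, label in training_set]
    # Step 3: sort by distance and keep the k nearest examples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: four ω1 points and one ω3 point lie closest to xu = (0, 0).
train = [((1, 0), "ω1"), ((0, 1), "ω1"), ((-1, 0), "ω1"), ((0, -1), "ω1"),
         ((1, 1), "ω3"), ((5, 5), "ω2"), ((6, 5), "ω2"), ((4, -4), "ω3")]
print(knn_classify((0, 0), train, k=5))  # → ω1
```

Sorting all n distances costs O(n log n); a heap (`heapq.nsmallest`) would avoid the full sort, but the plain sort keeps the correspondence with the steps obvious. In two-class problems an odd k avoids tied votes.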

In KNN regression, the nearest neighbors are found in exactly the same way; the only difference is the prediction step. Instead of taking a majority vote, the algorithm predicts the average of the target values of the k nearest neighbors.
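A minimal sketch of KNN regression, assuming Euclidean distance and training data given as `(feature_tuple, numeric_target)` pairs; the name `knn_regress` is hypothetical. The neighbor search is identical to classification, and only the final line differs: a mean of targets replaces the vote.

```python
import math

def knn_regress(query, training_set, k):
    """Predict a numeric target as the mean target of the k nearest neighbors."""
    nearest = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    # Average the neighbors' target values (len() guards against k > dataset size).
    return sum(y for _, y in nearest) / len(nearest)

train = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 100.0)]
print(knn_regress((1,), train, k=3))  # → 2.0 (mean of 2.0, 1.0 and 3.0)
```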


How to Choose the K Value?

The value of k should be chosen to suit the problem:

  • A larger k makes the prediction less sensitive to noise, which often improves performance; if k is too large, however, the decision boundaries become overly smooth and local structure is lost.
  • A smaller k can capture fine structure in the feature space, if such structure exists.
  • If k is too small, the algorithm may overfit, i.e., it performs excellently on the training set but its performance degrades on unseen test data.
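A common, practical way to balance these effects is to estimate the error for several candidate values of k on held-out data and keep the best one. The sketch below uses leave-one-out cross-validation, which is one option among several (k-fold cross-validation is another). The helper names `knn_predict`, `loo_error` and `choose_k` are hypothetical, and labeled data is assumed to be a list of `(feature_tuple, label)` pairs.

```python
import math
from collections import Counter

def knn_predict(query, data, k):
    """Majority vote among the k nearest examples (Euclidean distance)."""
    nearest = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_error(data, k):
    """Leave-one-out error rate: classify each example using all the others."""
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(x, rest, k) != label:
            errors += 1
    return errors / len(data)

def choose_k(data, candidates):
    """Candidate k with the lowest leave-one-out error (smaller k wins ties)."""
    return min(candidates, key=lambda k: (loo_error(data, k), k))
```

On a toy data set with a single mislabeled point inside one cluster, k = 1 lets that point flip the predictions of its neighbors, while k = 3 outvotes it, so the leave-one-out error drops as k grows from 1 to 3.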

Distance Metrics

The distance can be calculated in the following ways:

a. Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

b. Manhattan distance:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

c. Weighted distance.
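The first two metrics translate directly into code. The sketch below assumes points are equal-length sequences of numbers; the function names are illustrative. Python's standard `math.dist` computes the same Euclidean distance.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean((0, 0), (3, 4)))  # → 5.0
print(manhattan((0, 0), (3, 4)))  # → 7
```

Because Manhattan distance does not square the differences, a single large coordinate difference influences it less than it influences the Euclidean distance.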

How to Choose the Weights?

For both classification and regression problems, the weighted distance method can be used. Each neighbor is assigned a weight that reflects its contribution, so that nearer neighbors receive larger weights and contribute more than they would under a plain average.

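Weights can be chosen as follows, where xo is the query point and xi is its i-th nearest neighbor:

  • wi = 1/k (uniform weights: every neighbor counts equally)
  • wi ∼ 1 − ||xi − xo|| (the weight decreases linearly with distance; meaningful when distances are scaled to [0, 1])
  • wi ∼ k − rank ||xi − xo|| (the weight depends on the neighbor's distance rank)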
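The three weighting schemes can be sketched as a weighted KNN regressor. Since "∼" means "proportional to", this sketch normalizes the raw weights so they sum to 1. Two conventions are assumptions: the nearest neighbor has rank 0 (so it gets raw weight k), and the linear scheme clips negative weights to 0 in case distances are not scaled to [0, 1]. The names `neighbor_weights` and `weighted_knn_regress` are hypothetical.

```python
import math

def neighbor_weights(distances, scheme):
    """Normalized weights for k neighbors whose distances are sorted ascending."""
    k = len(distances)
    if scheme == "uniform":      # w_i = 1/k
        raw = [1.0] * k
    elif scheme == "linear":     # w_i ∝ 1 - d_i (assumes d_i scaled to [0, 1])
        raw = [max(0.0, 1.0 - d) for d in distances]
    elif scheme == "rank":       # w_i ∝ k - rank_i, with rank 0 = nearest (assumed)
        raw = [float(k - r) for r in range(k)]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    if total == 0:               # every neighbor clipped to 0: fall back to uniform
        return [1.0 / k] * k
    return [w / total for w in raw]

def weighted_knn_regress(query, training_set, k, scheme="rank"):
    """Weighted average of the targets of the k nearest neighbors."""
    nearest = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    distances = [math.dist(query, x) for x, _ in nearest]
    weights = neighbor_weights(distances, scheme)
    return sum(w * y for w, (_, y) in zip(weights, nearest))

train = [((0,), 10.0), ((1,), 20.0), ((2,), 30.0)]
print(weighted_knn_regress((0,), train, k=3, scheme="uniform"))  # plain mean, about 20.0
print(weighted_knn_regress((0,), train, k=3, scheme="rank"))     # pulled toward 10.0
```

With rank weights 3:2:1 the prediction is 10·1/2 + 20·1/3 + 30·1/6 ≈ 16.67, closer to the nearest neighbor's target than the unweighted mean of 20.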

Improvement

1. The first technique improves prediction by weighting the attributes (features) used in the distance calculation: important attributes are given larger weights and less important attributes are given smaller weights.
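Attribute weighting can be sketched as a weighted Euclidean distance in which each squared coordinate difference is multiplied by that attribute's weight. How the weights are learned is outside the scope of this sketch; here they are simply given, and the name `weighted_euclidean` is hypothetical.

```python
import math

def weighted_euclidean(x, y, feature_weights):
    """Euclidean distance with each attribute's squared difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, x, y)))

query = (0, 0)
a, b = (0, 5), (2, 0)
# Equal weights: b is nearer (2.0 vs 5.0).
print(weighted_euclidean(query, a, (1, 1)), weighted_euclidean(query, b, (1, 1)))
# Down-weighting the second attribute makes a the nearer point (0.5 vs 2.0).
print(weighted_euclidean(query, a, (1, 0.01)), weighted_euclidean(query, b, (1, 0.01)))
```

A weight of 0 removes an attribute from the distance entirely, which is one way to stop irrelevant attributes from dominating it.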

2. The second technique speeds up the nearest neighbor search itself.

Example:


We are given a set of N points in D-dimensional space and an unlabeled example q, and we need to find the point that minimizes the distance to q. A brute-force KNN search, which compares q with every point, becomes impractical for large values of N and D.

There are two classical algorithms that speed up the nearest neighbor search.

1. Bucketing: In the bucketing algorithm, the space is divided into identical cells, and for each cell the data points inside it are stored in a list. The cells are examined in order of increasing distance from the point q, and for each cell the distance between q and each of its data points is computed. The search terminates when the distance from q to the next cell exceeds the distance to the closest point already visited.
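A sketch of bucketing for a single nearest neighbor, assuming a uniform grid with a hypothetical `cell_size` parameter. One simplification: instead of expanding outward from q's cell ring by ring, it sorts all non-empty cells by their minimum possible distance to q up front. The early-termination test matches the description above: once a cell's minimum distance exceeds the best distance found so far, no remaining cell can contain a closer point. All function names are illustrative.

```python
import math
from collections import defaultdict

def build_buckets(points, cell_size):
    """Map each grid cell (tuple of integer indices) to the list of points inside it."""
    buckets = defaultdict(list)
    for p in points:
        cell = tuple(math.floor(c / cell_size) for c in p)
        buckets[cell].append(p)
    return buckets

def cell_distance(q, cell, cell_size):
    """Smallest possible distance from q to any point inside the given cell."""
    total = 0.0
    for qc, idx in zip(q, cell):
        lo, hi = idx * cell_size, (idx + 1) * cell_size
        closest = min(max(qc, lo), hi)  # clamp q onto the cell along this axis
        total += (qc - closest) ** 2
    return math.sqrt(total)

def bucket_nearest(q, buckets, cell_size):
    """Visit cells in order of increasing distance from q; stop early when possible."""
    order = sorted(buckets, key=lambda cell: cell_distance(q, cell, cell_size))
    best_point, best_dist = None, math.inf
    for cell in order:
        if cell_distance(q, cell, cell_size) > best_dist:
            break  # every remaining cell is farther away than the best point found
        for p in buckets[cell]:
            d = math.dist(q, p)
            if d < best_dist:
                best_point, best_dist = p, d
    return best_point, best_dist
```

The cell size is a tuning choice: cells that are too large hold many points each, while cells that are too small leave most cells empty, and both cases erode the speed-up.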

2. k-d trees: A k-d tree is a generalization of a binary search tree to higher dimensions. Each internal node in a k-d tree is associated with a hyper-rectangle and a hyperplane orthogonal to one of the coordinate axes. The hyperplane splits the hyper-rectangle into two parts, which are associated with the child nodes. The partitioning continues until the number of data points in a hyper-rectangle falls below a given threshold n.
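A compact sketch of a k-d tree for single nearest-neighbor search. Two choices here are assumptions rather than part of the description above: the splitting axis cycles with depth, and each split is placed at the median coordinate. The hypothetical `leaf_size` parameter plays the role of the threshold n. Hyper-rectangles are not stored explicitly; the search only needs each node's splitting hyperplane to decide whether the far side could hold a closer point.

```python
import math

class KDNode:
    def __init__(self, points=None, axis=None, split=None, left=None, right=None):
        self.points = points  # list of points (leaves only)
        self.axis = axis      # coordinate axis orthogonal to the splitting hyperplane
        self.split = split    # position of the splitting hyperplane on that axis
        self.left = left      # points with coordinate <= split
        self.right = right    # points with coordinate >= split

def build_kdtree(points, leaf_size=4, depth=0):
    """Split at the median, cycling through axes, until a node holds few points."""
    if len(points) <= leaf_size:
        return KDNode(points=points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(axis=axis, split=pts[mid][axis],
                  left=build_kdtree(pts[:mid], leaf_size, depth + 1),
                  right=build_kdtree(pts[mid:], leaf_size, depth + 1))

def kd_nearest(node, q, best=(None, math.inf)):
    """Return (point, distance) of the nearest neighbor of q in this subtree."""
    if node.points is not None:
        for p in node.points:
            d = math.dist(q, p)
            if d < best[1]:
                best = (p, d)
        return best
    diff = q[node.axis] - node.split
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = kd_nearest(near, q, best)
    # Points on the far side are at least |diff| away; search there only if that could win.
    if abs(diff) < best[1]:
        best = kd_nearest(far, q, best)
    return best
```

When the hyperplane is farther from q than the best distance found so far, the whole far subtree is skipped, which is where the speed-up over brute force comes from in low dimensions. As D grows, this pruning succeeds less often.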

Advantages and Drawbacks of Nearest Neighbors Algorithm

Here are the advantages and drawbacks of the nearest neighbors algorithm:

Advantages

Let us see the advantages of this algorithm:

  • It is very simple to understand and implement.
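  • It is reasonably robust to noisy training data when k is large enough.
  • The decision boundaries can be of arbitrary shapes.
  • Only a few parameters need to be tuned, mainly k and the distance metric.
  • A K-NN classifier can be updated at very little cost, since new training examples are simply added to the stored set.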

Drawbacks

Let us see the drawbacks of this algorithm:

  • K-NN is computationally expensive.
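  • It is a lazy learner, i.e., it defers all computation to prediction time and uses the whole training set at runtime, so prediction is slow.
  • A brute-force search requires O(n) distance computations for each instance to be classified, where n is the number of training examples.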
  • Curse of dimensionality: distance can be dominated by irrelevant attributes.

Conclusion

With increasing k, the boundaries between the classes become smoother. The accuracy of the classifier also tends to increase as the number of data points in the training set grows.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
