Introduction to KNN Algorithm in R
The KNN (K-nearest neighbors) algorithm in R is a mechanism based on the concept of the nearest neighbor, where K is a constant that fixes how many neighbors are considered in a particular context. It uses input data points to predict outputs, focuses on feature similarity to classify data, handles realistic data without making any assumptions, and can be applied to problems of various natures, being especially effective for classification problems. R programming provides a robust mechanism for its implementation.
Example: Let’s suppose you want to classify a touch-screen phone versus a keypad phone. Various factors are involved in differentiating the two, but the decisive factor is the keypad. So, when we receive a new data point (i.e., a phone), we compare its features with those of the neighboring data points to classify it as a keypad phone or a touch phone.
Features of KNN Algorithm
Here we will study the features of the KNN Algorithm:
- The KNN algorithm uses input data points to predict output values.
- The algorithm can be applied to various sets of problems.
- It focuses on feature similarity to classify the data.
- The KNN algorithm handles realistic data and doesn’t make any assumptions about the data points.
- KNN memorizes the training data set rather than learning a model from it; for this reason it is called a lazy learner.
- It can solve both classification and regression problems.
Addressing Problems in KNN Algorithm in R
The KNN algorithm addresses the following types of problems:
1. Classification Problem
Classification problem values are discrete, just like whether you like to eat pizza with toppings or without; there is no middle ground between the two. The KNN algorithm helps in solving such problems.
2. Regression Problem
The regression problem comes into the picture when we have a dependent variable and one or more independent variables, and the value to predict is continuous (for example, predicting a person’s BMI index from height and weight). Typically, each row of the data set contains one observation, or data point, that serves as an example.
The KNN Algorithm in R
Let’s look at the steps the algorithm follows:
Step 1: Load the input data.
Step 2: Initialize K with the number of nearest neighbors.
Step 3: Calculate the distance between the new data point and each point in the training data.
Step 4: Add each distance, together with the index of its data point, to an ordered collection, sorted by distance in ascending order.
Step 5: Pick the first K entries from the sorted collection and look at their labels.
Step 6: For a regression problem, return the mean of the K labels.
Step 7: For a classification problem, return the mode of the K labels.
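These steps are what ready-made implementations carry out internally. As a minimal sketch, the `knn()` function from the `class` package (assuming it is installed) covers steps 2 through 7; here we try it on R’s built-in iris data set:

```r
# Minimal sketch using class::knn on the built-in iris data set.
# Assumes the "class" package is installed (install.packages("class")).
library(class)

# Step 1: load the input data and split it into train and test sets
data(iris)
set.seed(42)
idx <- sample(nrow(iris), 0.7 * nrow(iris))

# Step 2: initialize K with the number of nearest neighbors
k <- 5

# Steps 3-7: knn() computes the distances, keeps the K closest
# training points, and returns the mode of their class labels.
pred <- knn(train = iris[idx, 1:4],
            test  = iris[-idx, 1:4],
            cl    = iris$Species[idx],
            k     = k)
table(predicted = pred, actual = iris$Species[-idx])
```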
Points to Remember while Implementing the KNN Algorithm
- We should make sure the K value is greater than one; with K = 1, a single noisy neighbor can make the prediction inaccurate.
- A larger K value tends to make the prediction more stable, since it rests on a larger majority vote; too large a K, however, can blur the class boundaries.
- It is preferable to have K as an odd number; otherwise, the vote can end in a tie.
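As an illustrative sketch (again on the built-in iris data, with the `class` package), one can loop over a few odd values of K and compare the resulting accuracy before settling on one:

```r
library(class)

data(iris)
set.seed(1)
idx <- sample(nrow(iris), 100)

# Compare a few odd values of K on the held-out rows
for (k in c(1, 3, 5, 7, 9)) {
  pred <- knn(iris[idx, 1:4], iris[-idx, 1:4],
              cl = iris$Species[idx], k = k)
  cat("K =", k, "accuracy =",
      round(mean(pred == iris$Species[-idx]), 3), "\n")
}
```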
KNN Pseudocode
In the formula below, x denotes the new sample, xi denotes the i-th training data point (i = 1, 2, 3, …, n), and d(x, xi) is the Euclidean distance between them, computed for every i:

d(x, xi) = SQRT((x1 − xi1)^2 + (x2 − xi2)^2 + …)

The K training points with the smallest d(x, xi) are the nearest neighbors of x.
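This pseudocode translates almost line for line into R. The following from-scratch sketch uses `knn_predict`, a hypothetical helper name chosen for this example, not a library function:

```r
# From-scratch sketch of the pseudocode above; knn_predict is a
# hypothetical name for illustration only.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # d(x, xi): Euclidean distance from the new sample to every
  # training data point (i = 1, 2, ..., n)
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(new_x))
  d     <- sqrt(rowSums(diffs^2))

  # labels of the K training points with the smallest distance
  nearest <- train_y[order(d)[1:k]]

  if (is.numeric(nearest)) {
    mean(nearest)                      # regression: mean of the K labels
  } else {
    names(which.max(table(nearest)))   # classification: mode of the K labels
  }
}
```

Note that sorting all n distances on every query is what makes plain KNN slow on large data sets; production implementations typically use structures such as k-d trees instead.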
Use Cases
Following are some use cases of the KNN algorithm in R:
1. Comparing products and Helping in Shopping Recommendations
When we buy a laptop or a computer from an online e-commerce website, we also see shopping recommendations, such as buying anti-virus software or speakers. That is because previous customers who bought a laptop mostly bought anti-virus software or speakers along with it; machine learning drives such e-commerce recommendations.
2. Food Recommendations
Machine learning also helps with food recommendations based on previously ordered food, and also suggests restaurants accordingly.
Example of the KNN Algorithm
The following example walks through the KNN algorithm in three steps:
1. Importing Data
Let’s take a dummy data set for predicting a person’s T-shirt size from their height and weight.
| Height (cms) | Weight (kgs) | Size |
|--------------|--------------|------|
| 140          | 58           | S    |
| 140          | 59           | S    |
| 140          | 63           | S    |
| 150          | 59           | M    |
| 152          | 60           | M    |
| 153          | 60           | M    |
| 154          | 61           | M    |
| 155          | 64           | M    |
| 156          | 64           | M    |
| 157          | 61           | M    |
| 160          | 62           | L    |
| 161          | 65           | L    |
| 162          | 62           | L    |
| 163          | 63           | L    |
| 163          | 66           | L    |
| 165          | 63           | L    |
| 165          | 64           | L    |
| 165          | 68           | L    |
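To work with this table in R, it can be entered as a data frame (the name `tshirt` is our own choice for this example):

```r
# T-shirt training data from the table above
tshirt <- data.frame(
  height = c(140, 140, 140, 150, 152, 153, 154, 155, 156,
             157, 160, 161, 162, 163, 163, 165, 165, 165),
  weight = c(58, 59, 63, 59, 60, 60, 61, 64, 64,
             61, 62, 65, 62, 63, 66, 63, 64, 68),
  size   = c("S", "S", "S", "M", "M", "M", "M", "M", "M",
             "M", "L", "L", "L", "L", "L", "L", "L", "L")
)
```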
2. Finding the Similarities by Calculating Distance
Since the data is continuous, we can use either the Manhattan distance or the Euclidean distance. We calculate the distance between the new sample and every point in the training data set, then find the K closest points.
Example: Let’s say ‘Raj’ has a height of 165 cms and weighs 63 kgs. The Euclidean distance between Raj and the first observation in the table is SQRT((165 − 140)^2 + (63 − 58)^2) = SQRT(650) ≈ 25.5.
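Continuing with the `tshirt` data frame defined above, the same calculation for all 18 observations at once looks like this:

```r
# Raj's measurements
raj_height <- 165
raj_weight <- 63

# Euclidean distance from Raj to every row of the training data
d <- sqrt((tshirt$height - raj_height)^2 +
          (tshirt$weight - raj_weight)^2)
round(d[1], 2)  # distance to the first observation: ~25.5
```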
3. Finding K-nearest Neighbors
Let’s suppose K = 4. We sort the distances and look at the four customers closest to Raj. In this table, all four of them bought size L, so the best prediction is that size L suits Raj. (Had the neighbors disagreed, say three M and one L, the majority label, M, would have won.)
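Continuing the sketch, the K = 4 nearest neighbors and the mode of their sizes can be found like this:

```r
# Labels of the K = 4 customers closest to Raj
k <- 4
nearest <- tshirt$size[order(d)[1:k]]
table(nearest)                     # in this table, all four are "L"
names(which.max(table(nearest)))   # predicted size: "L"
```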
Difference Between KNN and K-means
Following are the differences:
- KNN is a supervised algorithm (it requires a dependent variable, i.e., labels), whereas K-means is an unsupervised algorithm (no dependent variable).
- K-means uses a clustering technique to split the data points into K clusters. KNN uses the K nearest neighbors of a new data point to classify it from their labels.
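A minimal sketch of the contrast on the iris data: `kmeans()` from base R ignores the Species labels entirely, while `knn()` from the `class` package cannot work without them:

```r
library(class)  # for knn()

data(iris)
set.seed(7)

# K-means (unsupervised): forms 3 clusters without using the labels
clusters <- kmeans(iris[, 1:4], centers = 3)$cluster

# KNN (supervised): needs the Species labels of the training rows
idx  <- sample(nrow(iris), 120)
pred <- knn(iris[idx, 1:4], iris[-idx, 1:4],
            cl = iris$Species[idx], k = 5)
```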
Advantages and Disadvantages of KNN
Following are the advantages:
- The KNN algorithm is versatile; it can be used for both classification and regression problems.
- No prior model needs to be built; KNN has no separate training phase.
- Simple and easy to implement.
Following are the disadvantages:
- The algorithm becomes slow as the number of samples and variables increases, since every prediction requires computing the distance to all the training points.
Recommended Articles
This is a guide to the KNN Algorithm in R. Here we discuss its features, an example, pseudocode, and the steps to be followed in the KNN algorithm. You can also go through our other related articles to learn more.