Introduction to KNN Algorithm in R
The KNN (K-nearest neighbors) algorithm in R is a mechanism based on the concept of the nearest neighbor, where K is a constant that fixes how many neighbors are considered in a particular context. It uses input data points to predict outputs, focuses on feature similarity to classify data, handles realistic data without making any assumptions, and can be applied to problems of various natures, being especially effective for classification problems. R programming provides a robust mechanism for its implementation.
Example: Let’s suppose you want to classify a touch-screen phone versus a keypad phone. Various factors are involved in differentiating the two, but the decisive factor is the keypad. So, when we receive a new data point (i.e., a phone), we compare its features with those of the neighboring data points to classify it as a keypad phone or a touch phone.
Features of KNN Algorithm
Here we will study the features of the KNN Algorithm:
- The KNN algorithm uses input data points to predict output values.
- The algorithm can be applied to various sets of problems.
- It focuses on feature similarity to classify the data.
- The KNN algorithm handles realistic data and doesn’t make any assumptions about the data points.
- KNN memorizes the training data set rather than learning a model from it; for this reason it is called a lazy learner.
- It can solve both classification and regression problems.
Addressing Problems in KNN Algorithm in R
The KNN algorithm addresses the following types of problems:
1. Classification Problem
Classification problem values are discrete, just like whether you like to eat pizza with toppings or without; there is no middle ground between the two. The KNN algorithm helps in solving such problems.
2. Regression Problem
The regression problem comes into the picture when we have a dependent variable and one or more independent variables, and the value to predict is continuous (for example, predicting a person’s BMI index from height and weight). Typically, each row of the data set contains one observation, or data point, that serves as an example.
The KNN Algorithm in R
Let’s look at the steps the algorithm follows:
Step 1: Load the input data.
Step 2: Initialize K with the number of nearest neighbors.
Step 3: Calculate the distance between the new data point and each point in the training data.
Step 4: Add each distance, together with the index of its data point, to an ordered collection, sorted by distance in ascending order.
Step 5: Pick the first K entries from the sorted collection and look at their labels.
Step 6: For a regression problem, return the mean of the K labels.
Step 7: For a classification problem, return the mode of the K labels.
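These steps are what ready-made implementations carry out internally. As a minimal sketch, the `knn()` function from the `class` package (assuming it is installed) covers steps 2 through 7; here we try it on R’s built-in iris data set:

```r
# Minimal sketch using class::knn on the built-in iris data set.
# Assumes the "class" package is installed (install.packages("class")).
library(class)

# Step 1: load the input data and split it into train and test sets
data(iris)
set.seed(42)
idx <- sample(nrow(iris), 0.7 * nrow(iris))

# Step 2: initialize K with the number of nearest neighbors
k <- 5

# Steps 3-7: knn() computes the distances, keeps the K closest
# training points, and returns the mode of their class labels.
pred <- knn(train = iris[idx, 1:4],
            test  = iris[-idx, 1:4],
            cl    = iris$Species[idx],
            k     = k)
table(predicted = pred, actual = iris$Species[-idx])
```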
Points to Remember while Implementing the KNN Algorithm
- We should make sure the K value is greater than one; with K = 1, a single noisy neighbor can make the prediction inaccurate.
- A larger K value tends to make the prediction more stable, since it rests on a larger majority vote; too large a K, however, can blur the class boundaries.
- It is preferable to have K as an odd number; otherwise, the vote can end in a tie.
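As an illustrative sketch (again on the built-in iris data, with the `class` package), one can loop over a few odd values of K and compare the resulting accuracy before settling on one:

```r
library(class)

data(iris)
set.seed(1)
idx <- sample(nrow(iris), 100)

# Compare a few odd values of K on the held-out rows
for (k in c(1, 3, 5, 7, 9)) {
  pred <- knn(iris[idx, 1:4], iris[-idx, 1:4],
              cl = iris$Species[idx], k = k)
  cat("K =", k, "accuracy =",
      round(mean(pred == iris$Species[-idx]), 3), "\n")
}
```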
KNN Pseudocode
In the formula below, x denotes the new sample, xi denotes the i-th training data point (i = 1, 2, 3, …, n), and d(x, xi) is the Euclidean distance between them, computed for every i:

d(x, xi) = SQRT((x1 − xi1)^2 + (x2 − xi2)^2 + …)

The K training points with the smallest d(x, xi) are the nearest neighbors of x.
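This pseudocode translates almost line for line into R. The following from-scratch sketch uses `knn_predict`, a hypothetical helper name chosen for this example, not a library function:

```r
# From-scratch sketch of the pseudocode above; knn_predict is a
# hypothetical name for illustration only.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # d(x, xi): Euclidean distance from the new sample to every
  # training data point (i = 1, 2, ..., n)
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(new_x))
  d     <- sqrt(rowSums(diffs^2))

  # labels of the K training points with the smallest distance
  nearest <- train_y[order(d)[1:k]]

  if (is.numeric(nearest)) {
    mean(nearest)                      # regression: mean of the K labels
  } else {
    names(which.max(table(nearest)))   # classification: mode of the K labels
  }
}
```

Note that sorting all n distances on every query is what makes plain KNN slow on large data sets; production implementations typically use structures such as k-d trees instead.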
Use Cases
Following are some use cases of the KNN algorithm in R:
1. Comparing products and Helping in Shopping Recommendations
When we buy a laptop or a computer from an online e-commerce website, we also see shopping recommendations, such as buying anti-virus software or speakers. That is because previous customers who bought a laptop mostly bought anti-virus software or speakers along with it; machine learning drives such e-commerce recommendations.
2. Food Recommendations
Machine learning also helps with food recommendations based on previously ordered food, and also suggests restaurants accordingly.
Example of the KNN Algorithm
The following example walks through the KNN algorithm in three steps:
1. Importing Data
Let’s take a dummy data set for predicting a person’s T-shirt size from their height and weight.
| Height (cms) | Weight (kgs) | Size |
|--------------|--------------|------|
| 140          | 58           | S    |
| 140          | 59           | S    |
| 140          | 63           | S    |
| 150          | 59           | M    |
| 152          | 60           | M    |
| 153          | 60           | M    |
| 154          | 61           | M    |
| 155          | 64           | M    |
| 156          | 64           | M    |
| 157          | 61           | M    |
| 160          | 62           | L    |
| 161          | 65           | L    |
| 162          | 62           | L    |
| 163          | 63           | L    |
| 163          | 66           | L    |
| 165          | 63           | L    |
| 165          | 64           | L    |
| 165          | 68           | L    |
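To work with this table in R, it can be entered as a data frame (the name `tshirt` is our own choice for this example):

```r
# T-shirt training data from the table above
tshirt <- data.frame(
  height = c(140, 140, 140, 150, 152, 153, 154, 155, 156,
             157, 160, 161, 162, 163, 163, 165, 165, 165),
  weight = c(58, 59, 63, 59, 60, 60, 61, 64, 64,
             61, 62, 65, 62, 63, 66, 63, 64, 68),
  size   = c("S", "S", "S", "M", "M", "M", "M", "M", "M",
             "M", "L", "L", "L", "L", "L", "L", "L", "L")
)
```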
2. Finding the Similarities by Calculating Distance
Since the data is continuous, we can use either the Manhattan distance or the Euclidean distance. We calculate the distance between the new sample and every point in the training data set, then find the K closest points.
Example: Let’s say ‘Raj’ has a height of 165 cms and weighs 63 kgs. The Euclidean distance between Raj and the first observation in the table is SQRT((165 − 140)^2 + (63 − 58)^2) = SQRT(650) ≈ 25.5.
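Continuing with the `tshirt` data frame defined above, the same calculation for all 18 observations at once looks like this:

```r
# Raj's measurements
raj_height <- 165
raj_weight <- 63

# Euclidean distance from Raj to every row of the training data
d <- sqrt((tshirt$height - raj_height)^2 +
          (tshirt$weight - raj_weight)^2)
round(d[1], 2)  # distance to the first observation: ~25.5
```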
3. Finding K-nearest Neighbors
Let’s suppose K = 4. We sort the distances and look at the four customers closest to Raj. In this table, all four of them bought size L, so the best prediction is that size L suits Raj. (Had the neighbors disagreed, say three M and one L, the majority label, M, would have won.)
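Continuing the sketch, the K = 4 nearest neighbors and the mode of their sizes can be found like this:

```r
# Labels of the K = 4 customers closest to Raj
k <- 4
nearest <- tshirt$size[order(d)[1:k]]
table(nearest)                     # in this table, all four are "L"
names(which.max(table(nearest)))   # predicted size: "L"
```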
Difference Between KNN and K-means
Following are the differences:
- KNN is a supervised algorithm (it requires a dependent variable, i.e., labels), whereas K-means is an unsupervised algorithm (no dependent variable).
- K-means uses a clustering technique to split the data points into K clusters. KNN uses the K nearest neighbors of a new data point to classify it from their labels.
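A minimal sketch of the contrast on the iris data: `kmeans()` from base R ignores the Species labels entirely, while `knn()` from the `class` package cannot work without them:

```r
library(class)  # for knn()

data(iris)
set.seed(7)

# K-means (unsupervised): forms 3 clusters without using the labels
clusters <- kmeans(iris[, 1:4], centers = 3)$cluster

# KNN (supervised): needs the Species labels of the training rows
idx  <- sample(nrow(iris), 120)
pred <- knn(iris[idx, 1:4], iris[-idx, 1:4],
            cl = iris$Species[idx], k = 5)
```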
Advantages and Disadvantages of KNN
Following are the advantages:
- The KNN algorithm is versatile; it can be used for both classification and regression problems.
- No prior model needs to be built; KNN has no separate training phase.
- Simple and easy to implement.
Following are the disadvantages:
- The algorithm becomes slow as the number of samples and variables increases, since every prediction requires computing the distance to all the training points.
Recommended Articles
This is a guide to the KNN Algorithm in R. Here we discuss its features, an example, pseudocode, and the steps to be followed in the KNN algorithm. You can also go through our other related articles to learn more.