Updated July 4, 2023
Introduction to Sorting in R
A mechanism provided by R programming through which elements of a vector can be arranged in a particular order, usually facilitated by but not just limited to the order() function that assists in sorting the elements either in ascending or descending order, as required, with the normal use of order() function sorting the result in ascending order(), otherwise sorting the result in descending order, is known as sorting in R.
Sorting Performed in R
There are multiple ways by which data can be sorted in R. It’s up to the data Analyst to consider the most suitable method based upon the structure of the data. This is because R language has multiple functions and ways to sort the data, such as sort (), order(), and dplyrI () package.
Things to keep in mind before sorting the data.
- Order in which data needs to be sorted ascending or descending.
- Multiple columns sorting criteria.
- Accounting missing and duplicate values during sorting. It’s up to the analyst to decide what must be done with the missing and duplicate values. Before removing or replacing null values, the overall impact on the data should be considered.
Sort() function in R
Sort function in R is used to sort a vector. By default, the value is organized in ascending order. Let’s take an example of the mark’s column of all the students in a classroom.
The syntax to sort the vector is
"sort (x, decreasing = FALSE)"
Here x refers to the vector and decreasing has to be replaced to TRUE when the sorting has to be done in descending order. The sort function is used in arranging numeric or character vector in the desired order. The major limitation of the sort function is that it cannot be used to sort a data frame. To overcome this limitation Order () function is used.
A basic sorting example using sort() function
set.seed(1)
x <- sample(1:100,10)
x
Output
[1] 68 39 1 34 87 43 14 82 59 51
sort(x)
Output
[1] 1 14 34 39 43 51 59 68 82 87Sorting data frames can be accomplished with the help of order () function. Variables can be easily sorted in either ascending or descending order however, the order function will sort the variable in ascending by default.
> df <- data.frame("Serial_number" = 1:5, "Age" = c(20, 21, 17, 18, 19), "Name" = c("Johnny","Dorian", "Linda", "Cathy", "Rick"))
>
> # Sort by age ascending order
> newdataAsc <- df[order(df$Age),]
> newdataAsc
# sorting is descending order
> newdataDsc> newdataDsc <- df[order(-df$Age),]
> newdataAsc
Please note negative sign is used in front of the Age column (-df$Age) in order to sort the Age in descending order. Alternately, the descending argument can be used in this position. The Order function is used to refer to the column index rather than the name of the column. For example in place of age the index reference of the data frame would be “1”. Keeping in mind index values begin a “0”.
In a few cases, we might need to sort the data with multiple criteria, this can be achieved in R with the help of using variable names or index numbers. In the below example I have used the mtcars dataset which is available in R studio.
df <- mtcars
> df
> # sort the dataframe by key disp
> df[with(df, order(mpg, disp)),]
> # Sort by column index values
> df[order( df[,1], df[,3] ),]
In R, an alternate way to sort the data is by using dplyr package. This package is very easy to use and reliable with accurate instructions available.
> install.packages("dplyr")
> library(dplyr)
> df <- mtcars
>
> # sort the dataframe by key disp
> arrange(mydata,disp)
Types of Sorting in R
R is equipped with multiple algorithms for performing sorting on the data. Below are the different types of sorting function. To illustrate the different types of sorting, a sample of 10 random numbers from an array is used.
1. Bubble Sort
In this algorithm, two values are compared side by side and elements swaps their position when the criteria are met. It can be either ascending or in descending order. In bubble sort pairs are formed for elements available in variable and elements are compared against each other, when one element is greater than another they swapped. The process is repeated until the last element.
> bubble_sort <- function (x,ascending = TRUE) {
+ n <- length(x)
+ if (ascending) {
+ for(i in 1:(n-1)){
+ for(j in 1:(n-i)) {
+ if(x[j+1] < x[j]) {
+ tmp <- x [j]
+ x[j] <- x[ j+ 1]
+ x[j+1] <- tmp
+ }
+ }
+ }
+ }
+ else {
+ for(i in 1:(n-1)){
+ for(j in 1:(n-i)) {
+ if(x[j+1] > x[j]) {
+ tmp <- x [j]
+ x[j] <- x[ j+ 1]
+ x[j+1] <- tmp
+ }
+ }
+ }
+ }
+ x
+ }
>
> x <-sample(1:100,10)
> example <- bubble_sort(x)
> example
Output
2. Insertion Sort
In insertion sort algorithm, sorted and unsorted elements are compared, and the unsorted element is placed in a suitable place after each iteration.
In this algorithm first element is assumed to be sorted and the second element is stored separately as a key element. The sorted element is then compared with the key. If the sorted element is greater than the key element, the places are swapped, and the key element is the first element.
> insertion_sort <- function(A){
+ for (j in 2:length(A)) {
+ key = A[j]
+ i = j - 1
+ while (i > 0 && A[i] > key) {
+ A[(i + 1)] = A[i]
+ i = i - 1
+ }
+ A[(i + 1)] = key
+ }
+ A
+ }
>
>
> # testing the insertion function
> x <-sample(1:100,10)
> example <- insertion_sort(x)
> example
Output
3. Selection Sort
The selection sorting function is a widely used sorting algorithm used in the R language. In this type of sorting the smallest element from the unsorted list is pushed to the start of the list. In the selection sorting algorithm, the smallest element from the array of the unsorted list is selected and placed at the start of the unsorted list at every iteration. For example, in a row of numbers arranged in a random sequence starting element or number is selected as a minimum. In the next step, the selected minimum number is compared with the next element or number. In case the compared element is smaller than our selected minimum, the second element becomes the minimum. This process is iterated until the last element.
> selection_sort <- function (x, ascending = TRUE) {
+ max <- length(x)
+ if (ascending) {
+ for (j in 1:(max-1)){
+ m <- x[j]
+ p <- j
+ for(k in (j+1):max) {
+ if(x[k] < m) {
+ m <- x[k]
+ p <- k
+ } ## end if
+ } ## end for k
+ x[p] <- x[j]
+ x[j] <- m
+ } ## end for j
+ } ## end ascending if
+ else {
+ for (j in 1:(max-1)){
+ m <- x[j]
+ p <- j
+ for(k in (j+1):max) {
+ if(x[k] > m) {
+ m <- x[k]
+ p <- k
+ } ## end if
+ } ## end for k
+ x[p] <- x[j]
+ x[j] <- m
+ } ## end for j
+ } ## end ascending else
+ x
+ }
>
>
> # testing the selectionsort function
> x <-sample(1:100,10)
>
> example <- selection_sort(x)
> example
Output
4. Quick Sort
Quicksort algorithm works in like divide and rule. The random element is selected as a pivot in an array and then all other elements except pivot are divided into two partitions. In the next step, all the elements which are less than and greater than the pivot are divided into two different partitions. Finally, the elements are sorted using recursion.
> # Quick sort algorithm:
> quickSort <- function(arr) {
+ # Pick a number at random.
+ mid <- sample(arr, 1)
+
+ # Place-holders for left and right values.
+ left <- c()
+ right <- c()
+
+ # Move all the smaller values to the left, bigger values to the right.
+ lapply(arr[arr != mid], function(d) {
+ if (d < mid) {
+ left <<- c(left, d)
+ }
+ else {
+ right <<- c(right, d)
+ }
+ })
+
+ if (length(left) > 1) {
+ left <- quickSort(left)
+ }
+
+ if (length(right) > 1) {
+ right <- quickSort(right)
+ }
+
+ # Finally, return the sorted values.
+ c(left, mid, right)
+ }
>
> x <-sample(1:100,10)
>
> RES <- quickSort(x)
> RES
Output
5. Merge Sort
Merge sort is very similar to quicksort however, here the array is divided into two equal halves. Merge sort algorithm has been divided into two parts a merge and a sort function. In merge sort, a list is broken down into multiple sub-lists till every sub-list consists of an individual element. Merging those sub-lists results is a sorted list.
> mmerge<-function(a,b) {
+ r<-numeric(length(a)+length(b))
+ ai<-1; bi<-1; j<-1;
+ for(j in 1:length(r)) {
+ if((ai<=length(a) && a[ai]<b[bi]) || bi>length(b)) {
+ r[j] <- a[ai]
+ ai <- ai+1
+ } else {
+ r[j] <- b[bi]
+ bi <- bi+1
+ }
+ }
+ r
+ }
> mmergesort<-function(A) {
+ if(length(A)>1) {
+ q <- ceiling(length(A)/2)
+ a <- mmergesort(A[1:q])
+ b <- mmergesort(A[(q+1):length(A)])
+ mmerge(a,b)
+ } else {
+ A
+ }
+ }
>
> x <-sample(1:100,10)
>
> RES <- mmergesort(x)
> RES
Output
6. HeapSort
Heap Sort technique is very similar to that of selection sort where the smallest element from an unsorted list is selected in each iteration and places at the start of the list. However, the heapsort technique uses tree concepts.
> heap.structure<-function(vect)
+ {
+ le=length(vect)
+ heap=vec
+ for (k in le:1)
+ {
+ heap=modify_heap(heap,k)
+ }
+ return(heap)
+ }
>
>
> modify_heap<-function(heap, rooti)
+ {
+ le=length(heap)
+ flag=0
+
+ while (rooti*2 <= le && flag==1)
+ {
+ left.i=rooti*2
+ right.i=rooti*2+2
+ flag=1
+ child=c(heap[left.i],heap[right.i])
+ child=child[!is.na(child)]
+ min.ind=which.min(child)
+ if (heap[rooti]>child[min.ind])
+ {
+ flag=1
+ heap.ind=c(left.i,right.i)[min.ind]
+
+ tmp1=heap[heap.ind]
+ heap[heap.ind]=heap[rooti]
+ heap[rooti]=tmp1
+
+ rooti=heap.ind
+ }
+ }
+ return(heap)
+ }
>
> heap_sort<-function(heap)
+ {
+ sorted.heap=NULL
+ le=length(heap)
+ while(le>0)
+ {
+ sorted.heap=c(sorted.heap,heap[1])
+ le=length(heap)
+ heap[1]=heap[le]
+ heap=heap[1:(le-1)]
+ heap=modify_heap(heap,rooti=1)
+ le=le-1
+ }
+ return(sorted.heap)
+ }
>
>
> x <- sample(1:100,10)
> heap=heap.building(x)
> heap_sort=heap_sort(heap)
> heap_sort
Output
Conclusion
In this article, we have seen different ways data can be sorted using R. We have seen how sort and order command is used to sort a data frame, further limitations of sort function over order function was shown in the article. A detailed explanation of different sorting algorithms such as bubble sort, selection sort and merge sort has been thoroughly discussed. Sorting being one of the most important steps of data analysis, have different functions for multiple needs. It’s entirely up to the data engineer to choose the most appropriate method for sorting based on the data available.
Recommended Articles
This has been a guide to Sorting in R. Here we discuss what is sorting in R, features, and types of sorting in R. You can also go through our other suggested articles to learn more –