Sorting Numbers in R

Basics

R has a sort function that takes a vector as an argument and returns a new vector containing the sorted elements in increasing order. An elementary mistake is to pass the numbers you want sorted as separate arguments:

sort(1, 7, 4)

This raises an error, because sort interprets the second argument, 7, as the decreasing flag, which must be TRUE or FALSE. The first argument to sort needs to be a vector holding all the numbers you want sorted. This code

sort(c(1, 7, 4))

returns

[1] 1 4 7
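
If you want to see the failure without halting a script, one option is to wrap the mistaken call in tryCatch. This is only a sketch; the names result and sorted are chosen here for illustration.

```r
# Attempt the mistaken call and capture the error instead of stopping.
# sort() treats 7 as the 'decreasing' argument, which must be TRUE or FALSE.
result <- tryCatch(
  sort(1, 7, 4),
  error = function(e) e
)

# TRUE if the call raised an error
inherits(result, "error")

# The correct call wraps the numbers in a vector with c()
sorted <- sort(c(1, 7, 4))
sorted
```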

If you want the output in decreasing order, pass decreasing=TRUE. Since decreasing is the second argument, the following are equivalent:

sort(c(1, 7, 4), TRUE)
sort(c(1, 7, 4), decreasing=TRUE)
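
A quick way to convince yourself the two calls agree is to compare their results with identical. The variable names below are illustrative only. The named form is usually easier to read, since a bare TRUE does not say what it controls.

```r
v <- c(1, 7, 4)

by_position <- sort(v, TRUE)               # decreasing matched by position
by_name     <- sort(v, decreasing = TRUE)  # decreasing matched by name

by_name                          # 7 4 1
identical(by_position, by_name)  # TRUE
```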

NA Values

It’s easy to fall into the trap of assuming all values in your R objects are numbers. R was designed for data analysis, so it was built with the assumption that some values might be missing. You must therefore consider the possibility that you’ll be sorting a vector with NA values. There is no “natural” or “obvious” place for a missing value in a sorted sequence, so you need to know what sort actually does with them. Run this code:

x <- c(1, NA, 7, NA, 4)
length(x)
y <- sort(x)
y
length(y)

Here’s the output:

> length(x)
[1] 5

> y
[1] 1 4 7

> length(y)
[1] 3

The first important thing to note is that missing values are treated as elements. You can tell that from the value returned by length(x), which counts the two NA values. That makes sense. How else would you represent a time series that had some missing observations?

When you print out y, you see that the NA values were dropped before sorting. You can confirm that they really were dropped, as opposed to not printed, by checking the length of y.
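
One way to make that check explicit is to compare the number of NA values in x with the number of elements that disappeared. This is a small sketch using is.na and anyNA; the names n_missing and n_dropped are made up for the example.

```r
x <- c(1, NA, 7, NA, 4)
y <- sort(x)

n_missing <- sum(is.na(x))          # number of NA values in x: 2
n_dropped <- length(x) - length(y)  # elements lost by sorting: 2

n_missing == n_dropped  # TRUE
anyNA(y)                # FALSE: no NA survived the sort
```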

That’s the default handling of NA values, but it’s not the only reasonable behavior. You can set the argument na.last=TRUE to keep the NA values and put them at the end of the sorted vector, or na.last=FALSE to put them at the beginning. The default value of na.last is, possibly ironically, NA, which is what tells sort to drop them.

sort(x, na.last=TRUE)
sort(x, na.last=FALSE)

It’s left as an exercise for the reader to confirm that the length of either of those vectors is 5.
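
For readers who want to check their answer, here is a short sketch. The names na_end and na_start are just labels for this example.

```r
x <- c(1, NA, 7, NA, 4)

na_end   <- sort(x, na.last = TRUE)   # 1 4 7 NA NA
na_start <- sort(x, na.last = FALSE)  # NA NA 1 4 7

length(na_end)    # 5
length(na_start)  # 5
```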

Sorting a Matrix

It may occasionally be useful to sort all values of a matrix. Maybe you have unemployment rates over time and across states. If you want to find the 25 highest and 25 lowest unemployment rates ever observed in any state, you would sort the whole matrix. The sort function treats the matrix as a plain vector of its elements and returns a vector, so the row and column structure is not kept. Beyond that, all the discussion above applies.
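
As a rough illustration, here is a tiny matrix standing in for the unemployment data. The numbers, the year labels, and the state labels A, B, and C are all invented for the example, and it takes the 2 lowest and 2 highest values rather than 25.

```r
# Hypothetical unemployment rates: rows are years, columns are states
rates <- matrix(
  c(4.1, 5.3, 3.8,
    6.2, 4.7, 5.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("2019", "2020"), c("A", "B", "C"))
)

sorted_rates <- sort(rates)  # plain vector; the matrix shape is gone
is.matrix(sorted_rates)      # FALSE

lowest  <- head(sorted_rates, 2)                    # 3.8 4.1
highest <- head(sort(rates, decreasing = TRUE), 2)  # 6.2 5.3
```

Note that the sorted vector no longer tells you which state or year each rate came from.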

Ordering

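You may sometimes want the indexes that would put a vector in sorted order, rather than the sorted values themselves. The order function returns those indexes, which you can then use to rearrange a second vector. Suppose you have these student names and test scores: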

students <- c("Eric", "Ginger", "Mindy", "Tom")
scores <- c(52, 88, 47, 29)

You can sort the names of the students according to score like this:

students[order(scores)]

or in decreasing order:

students[order(scores, decreasing=TRUE)]
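
To see what order itself returns, it can help to store the result and look at it before using it as a subscript. This sketch reuses the vectors above; idx is just an illustrative name.

```r
students <- c("Eric", "Ginger", "Mindy", "Tom")
scores <- c(52, 88, 47, 29)

idx <- order(scores)  # positions of the scores from lowest to highest
idx                   # 4 3 1 2

scores[idx]           # 29 47 52 88, the same as sort(scores)
students[idx]         # "Tom" "Mindy" "Eric" "Ginger"
```

The lowest score, 29, sits in position 4, so 4 comes first in idx.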

n Largest Values

If you want, say, the 10 largest values, sort in decreasing order and take the first 10 elements with a subscript:

y <- 1:25
sort(y, TRUE)[1:10]
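
One thing to watch for is a vector with fewer elements than you ask for. A subscript that runs past the end pads the result with NA, while head simply returns what is there. The sketch below shows both; the vector short and the other names are made up for the example.

```r
y <- 1:25
top10 <- head(sort(y, decreasing = TRUE), 10)  # 25 24 ... 16

short <- c(3, 1, 2)
padded  <- sort(short, decreasing = TRUE)[1:5]      # 3 2 1 NA NA
trimmed <- head(sort(short, decreasing = TRUE), 5)  # 3 2 1
```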