Sorting Numbers in R
Basics
R has a sort function that takes a vector as an argument and returns a new vector containing the sorted elements in increasing order. An elementary mistake is to try to sort a group of scalars:
sort(1, 7, 4)
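This raises an error, because sort treats the second value, 7, as the decreasing argument, which must be a single logical value. The first argument to sort needs to be a vector holding all the numbers you want sorted. This code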
sort(c(1, 7, 4))
returns
[1] 1 4 7
If you want the output in decreasing order, pass decreasing=TRUE. Since decreasing is the second argument, the following are equivalent:
sort(c(1, 7, 4), TRUE)
sort(c(1, 7, 4), decreasing=TRUE)
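As a quick check of the points above, the following sketch captures the error from the scalar call with tryCatch and confirms that the positional and named forms agree. The variable names result and v are chosen here for illustration only.

```r
# Calling sort() on separate scalars fails; capture the error instead of stopping.
result <- tryCatch(sort(1, 7, 4), error = function(e) "error")
result
# [1] "error"

v <- c(1, 7, 4)

# The positional and named forms of 'decreasing' give the same result.
identical(sort(v, TRUE), sort(v, decreasing = TRUE))
# [1] TRUE

sort(v, decreasing = TRUE)
# [1] 7 4 1
```

The named form is usually easier to read, since a bare TRUE gives no hint of what it controls.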
NA Values
It’s easy to fall into the trap of assuming all values in your R objects are numbers. R was built with the assumption that some values might be missing, so you must consider the possibility that you’ll be sorting a vector with NA values. There is no “natural” or “obvious” place for a missing value in a sorted vector, so the only option is to learn what sort actually does. Run this code:
x <- c(1, NA, 7, NA, 4)
length(x)
y <- sort(x)
y
length(y)
Here’s the output:
> length(x)
[1] 5
> y
[1] 1 4 7
> length(y)
[1] 3
The first important thing to note is that missing values are treated as elements. You can tell that from length(x), which is 5 even though only three of the values are numbers. That makes sense: how else would you represent a time series that had some missing values?
When you print out y, you see that the NA values were dropped before sorting. You can confirm that they really were dropped, as opposed to not printed, by checking the length of y.
That’s the default handling of NA values, but since that’s not the only reasonable behavior, you can set argument na.last=TRUE to put the NA values at the end of the sorted vector, or na.last=FALSE to put them at the beginning. The default value of na.last is, possibly ironically, NA, which means the NA values are removed.
sort(x, na.last=TRUE)
sort(x, na.last=FALSE)
It’s left as an exercise for the reader to confirm that the length of either of those vectors is 5.
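For reference, here is a short sketch of what the two calls return for the same x, plus one combination with decreasing=TRUE that the text above does not cover. The names at_end, at_start, and desc are introduced here for illustration.

```r
x <- c(1, NA, 7, NA, 4)

# NA values kept and placed after the sorted numbers.
at_end <- sort(x, na.last = TRUE)
at_end
# [1]  1  4  7 NA NA

# NA values kept and placed before the sorted numbers.
at_start <- sort(x, na.last = FALSE)
at_start
# [1] NA NA  1  4  7

# na.last controls where the NA values go regardless of sort direction.
desc <- sort(x, decreasing = TRUE, na.last = TRUE)
desc
# [1]  7  4  1 NA NA
```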
Sorting a Matrix
It may occasionally be useful to sort all values of a matrix. Maybe you have unemployment rates over time and across states. If you want to find the 25 highest and 25 lowest unemployment rates ever observed in any state, you’d sort the whole matrix. Since the sort function works on a vector, the matrix is converted to a vector first, and the result is a plain vector with no row or column structure. Once that happens, all the discussion above applies.
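A minimal sketch of the idea, using a small made-up matrix in place of real unemployment data. The name rates and its values are hypothetical, and head and tail with n = 2 stand in for the 25 lowest and 25 highest values.

```r
# Hypothetical rates: 3 rows (time periods) by 2 columns (states), made-up numbers.
rates <- matrix(c(5.1, 4.8, 6.3,
                  7.2, 3.9, 5.5), nrow = 3)

sorted <- sort(rates)

# The result is a plain vector; the matrix dimensions are gone.
is.matrix(sorted)
# [1] FALSE
sorted
# [1] 3.9 4.8 5.1 5.5 6.3 7.2

lowest  <- head(sorted, 2)   # the 2 smallest values
highest <- tail(sorted, 2)   # the 2 largest values
```

Because the dimensions are dropped, the sorted values no longer tell you which state or period they came from.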
Ordering
You may sometimes want the indexes that would put a vector in sorted order, rather than the sorted values themselves. The order function returns those indexes. Suppose you have these student names and test scores:
students <- c("Eric", "Ginger", "Mindy", "Tom")
scores <- c(52, 88, 47, 29)
You can sort the names of the students according to score like this:
students[order(scores)]
or in decreasing order:
students[order(scores, decreasing=TRUE)]
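To make the mechanics concrete, this sketch prints the index vector that order returns for the scores above and then uses it to subscript both vectors. The name idx is introduced here for illustration.

```r
students <- c("Eric", "Ginger", "Mindy", "Tom")
scores <- c(52, 88, 47, 29)

# order() returns positions: the lowest score is element 4, then 3, 1, 2.
idx <- order(scores)
idx
# [1] 4 3 1 2

# The same index vector reorders any vector of matching length.
students[idx]
# [1] "Tom"    "Mindy"  "Eric"   "Ginger"
scores[idx]
# [1] 29 47 52 88
```

Subscripting scores with its own ordering gives the same result as sort(scores), which is one way to see how the two functions relate.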
n Largest Values
If you want, say, the 10 largest values, all you have to do is apply a subscript:
y <- 1:25
sort(y, TRUE)[1:10]
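One caveat with the subscript approach: if the vector has fewer elements than you ask for, the result is padded with NA. A possible alternative, sketched below with a made-up vector named short, is head, which stops at the end of the vector.

```r
short <- c(3, 1, 2)

# Subscripting past the end pads the result with NA.
padded <- sort(short, decreasing = TRUE)[1:10]
padded
# [1]  3  2  1 NA NA NA NA NA NA NA

# head() returns at most 10 elements and never pads.
top <- head(sort(short, decreasing = TRUE), 10)
top
# [1] 3 2 1
```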