# [ACCEPTED]-Fastest way to find *the index* of the second (third...) highest/lowest value in vector or column-r

Score: 11

One possible route is to use the `index.return` argument 1 to `sort`. I'm not sure if this is fastest though.

``````set.seed(21)
x <- rnorm(10)
ind <- 2
sapply(sort(x, index.return=TRUE), `[`, length(x)-ind+1)
#        x       ix
# 1.746222 3.000000
``````
Score: 10

EDIT 2 :

As Joshua pointed out, none of the 8 given solutions actually performs correct 7 when you have a tie on the maxima, so :

``````X <- c(11:19,19)

n <- length(unique(X))
which(X == sort(unique(X),partial=n-1)[n-1])
``````

fastest 6 way of doing it correctly then. I deleted 5 the order way, as that one doesn't work 4 and is a lot slower, so not a good answer 3 according to OP.

To point to the issue we 2 ran into :

``````> X <- c(11:19,19)
> n <- length(X)
> which(X == sort(X,partial=n-1)[n-1])
  9 10 #which is the indices of the double maximum 19

> n <- length(unique(X))
> which(X == sort(unique(X),partial=n-1)[n-1])
 8 # which is the correct index of 18
``````

The timings of the valid solutions 1 :

``````> x <- runif(1000000)

> ind <- 2

> n <- length(unique(x))

> system.time(which(x == sort(unique(x),partial=n-ind+1)[n-ind+1]))
user  system elapsed
0.11    0.00    0.11

> system.time(sapply(sort(unique(x), index.return=TRUE), `[`, n-ind+1))
user  system elapsed
0.69    0.00    0.69
``````
Score: 8

library Rfast has implemented the nth element 16 function with return index option.

UPDATE (28/FEB/21) package 15 kit offers a faster implementation (topn) as 14 shown in the simulations below.

``````x <- runif(1e+6)

n <- 2

which_nth_highest_richie <- function(x, n)
{
for(i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
which(x == max(x))
}

which_nth_highest_joris <- function(x, n)
{
ux <- unique(x)
nux <- length(ux)
which(x == sort(ux, partial = nux - n + 1)[nux - n + 1])
}

microbenchmark::microbenchmark(
topn = kit::topn(x, n,decreasing = T)[n],
Rfast = Rfast::nth(x,n,descending = T,index.return = T),
order = order(x, decreasing = TRUE)[n],
richie = which_nth_highest_richie(x,n),
joris = which_nth_highest_joris(x,n))

Unit: milliseconds
expr    min     lq      mean     median       uq      max   neval
topn  3.741101 3.7917  4.517201  4.060752  5.108901 7.403901 100
Rfast  15.8121 16.7586  20.64204  17.73010  20.7083  47.6832  100
order 110.5416 113.4774 120.45807 116.84005 121.2291 164.5618 100
richie  22.7846 24.1552  39.35303  27.10075  42.0132  179.289  100
joris 131.7838 140.4611 158.20704 156.61610 165.1735 243.9258 100
``````

Topn is the 13 clear winner in finding the index of the 12 2nd biggest value in 1 million numbers.

Futher, simulations 11 where run to estimate running times of finding 10 the nth biggest number for varying n. Variable 9 x was repopulated for each n but it's size 8 was always 1 million numbers. As shown topn 7 is the best option for finding the nth biggest 6 element and it's index, given that n is 5 not too big. In the plot we can observe 4 that topn becomes slower than Rfast's nth 3 for bigger n. It is worthy to note that 2 topn has not been implemented for n > 1000 1 and will throw an error in such cases.

Score: 4

Method: Set all max values to `-Inf`, then find 8 the indices of the max. No sorting required.

``````X <- runif(1e7)
system.time(
{
X[X == max(X)] <- -Inf
which(X == max(X))
})
``````

Works 7 with ties and is very fast.

If you can guarantee 6 no ties, then an even faster version is

``````system.time(
{
X[which.max(X)] <- -Inf
which.max(X)
})
``````

EDIT: As 5 Joris mentioned, this method doesn't scale 4 that well for finding third, fourth, etc., highest 3 values.

``````which_nth_highest_richie <- function(x, n)
{
for(i in seq_len(n - 1L)) x[x == max(x)] <- -Inf
which(x == max(x))
}

which_nth_highest_joris <- function(x, n)
{
ux <- unique(x)
nux <- length(ux)
which(x == sort(ux, partial = nux - n + 1)[nux - n + 1])
}
``````

Using `x <- runif(1e7)` and `n = 2`, Richie wins

``````system.time(which_nth_highest_richie(x, 2))   #about half a second
``````

For `n = 100`, Joris 2 wins

``````system.time(which_nth_highest_richie(x, 100)) #about 20 seconds, ouch!
system.time(which_nth_highest_joris(x, 100))  #still about 2 seconds
``````

The balance point, where they take the 1 same length of time, is about `n = 10`.

Score: 3

No ties `which()` is probably your friend here. Combine 17 the output from the `sort()` solution with `which()` to find 16 the index that matches the output from the 15 `sort()` step.

``````> set.seed(1)
> x <- sample(1000, 250)
> sort(x,partial=n-1)[n-1]
 992
> which(x == sort(x,partial=n-1)[n-1])
 145
``````

Ties handling The solution above doesn't work properly 14 (and wasn't intended to) if there are ties 13 and the ties are the values that are the 12 ith largest or larger values. We need to 11 take the unique values of the vector before 10 sorting those values and then the above 9 solution works:

``````> set.seed(1)
> x <- sample(1000, 1000, replace = TRUE)
> length(unique(x))
 639
> n <- length(x)
> i <- which(x == sort(x,partial=n-1)[n-1])
> sum(x > x[i])
 0
> x.uni <- unique(x)
> n.uni <- length(x.uni)
> i <- which(x == sort(x.uni, partial = n.uni-1)[n.uni-1])
> sum(x > x[i])
 2
> tail(sort(x))
  994  996  997  997 1000 1000
``````

`order()` is also very useful here:

``````> head(ord <- order(x, decreasing = TRUE))
 220 145 209 202 211 163
``````

So 8 the solution here is `ord` for the index of the 7 2nd highest/largest element of `x`.

Some timings:

``````> set.seed(1)
> X <- sample(1e7, 1e7)
> system.time({n <- length(X); which(X == sort(X, partial = n-1)[n-1])})
user  system elapsed
0.319   0.058   0.378
> system.time({ord <- order(X, decreasing = TRUE); ord})
user  system elapsed
14.578   0.084  14.708
> system.time({order(X, decreasing = TRUE)})
user  system elapsed
14.647   0.084  14.779
``````

But 6 as the linked post was getting at and the 5 timings above show, `order()` is much slower, but 4 both provide the same results:

``````> all.equal(which(X == sort(X, partial = n-1)[n-1]),
+           order(X, decreasing = TRUE))
 TRUE
``````

And for the 3 ties-handling version:

``````foo <- function(x, i) {
X <- unique(x)
N <- length(X)
i <- i-1
which(x == sort(X, partial = N-i)[N-i])
}

> system.time(foo(X, 2))
user  system elapsed
1.249   0.176   1.454
``````

So the extra steps 2 slow this solution down a bit, but it is 1 still very competitive with `order()`.

Score: 1

Use maxN function given by Zach to find the next max value and 5 use which() with arr.ind = TRUE.

which(x 4 == maxN(x, 4), arr.ind = TRUE)

Using arr.ind 3 will return index position in any of the 2 above solutions as well and simplify the 1 code.

Score: 0

This is my solution for finding the index 3 of the top N highest values in a vector (not exactly what 2 the OP wanted, but this might help other 1 people)

``````index.top.N = function(xs, N=10){
if(length(xs) > 0) {
o = order(xs, na.last=FALSE)
o.length = length(o)
if (N > o.length) N = o.length
o[((o.length-N+1):o.length)]
}
else {
0
}
}
``````

More Related questions