You want to identify the nth largest or smallest item in a group using R. For example, to filter out the two rows in the table below:

Any time there is some by-group processing, I almost always stick with the dplyr library because of its so-called window operations. Below are a few techniques:
Let’s say our data frame is named stuff.
Solution 1: Simply get the min/max
library(dplyr)
group_by(stuff, type) %>%
  filter(weight == max(weight))
Result:
  type       name            weight
1 Fruits     Mangoes             19
2 Vegetables Brussel Sprouts     20
This gets right to the point. We set the data frame up for a grouped operation using group_by(). Then we filter the row(s) where weight is equal to the max weight. Because of the group_by(), max(weight) is evaluated separately within each type. To get the smallest item instead, swap max() for min().
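For anyone who wants to run this end to end, here is a minimal sketch. The stuff data frame is rebuilt from the rows shown in the results in this post, so its original row order is an assumption, and the names heaviest and lightest are only illustrative.

```r
library(dplyr)

# Sample data reconstructed from the results shown in this post
# (the original row order is an assumption).
stuff <- data.frame(
  type = c(rep("Fruits", 7), rep("Vegetables", 5)),
  name = c("Mangoes", "Bananas", "Watermelons", "Pineapples", "Apples",
           "Cantaloupes", "Oranges", "Brussel Sprouts", "Spinach",
           "Asparagus", "Mushrooms", "Cabbage"),
  weight = c(19, 18, 18, 10, 9, 5, 4, 20, 15, 11, 8, 4),
  stringsAsFactors = FALSE
)

# Heaviest item per type: max(weight) is computed within each group.
heaviest <- stuff %>%
  group_by(type) %>%
  filter(weight == max(weight)) %>%
  ungroup()

# Lightest item per type: same idea with min().
lightest <- stuff %>%
  group_by(type) %>%
  filter(weight == min(weight)) %>%
  ungroup()

print(heaviest)
print(lightest)
```

If several rows tie for the max (or min) within a group, this filter keeps all of them.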
Solution 2: More flexible if needed
Perhaps we don’t need the smallest or largest within a group, but the 3rd smallest or the top 5 within each group. In that case we can use this more flexible approach:
group_by(stuff, type) %>%
  mutate(rank = rank(desc(weight))) %>%
  # arrange() ignores grouping in current dplyr, so sort by type explicitly
  arrange(type, rank)
Result:
   type       name            weight rank
 1 Fruits     Mangoes             19  1.0
 2 Fruits     Bananas             18  2.5
 3 Fruits     Watermelons         18  2.5
 4 Fruits     Pineapples          10  4.0
 5 Fruits     Apples               9  5.0
 6 Fruits     Cantaloupes          5  6.0
 7 Fruits     Oranges              4  7.0
 8 Vegetables Brussel Sprouts     20  1.0
 9 Vegetables Spinach             15  2.0
10 Vegetables Asparagus           11  3.0
11 Vegetables Mushrooms            8  4.0
12 Vegetables Cabbage              4  5.0
Here we created a new column using the rank() function. Now we can filter what we’d like from here. E.g., filter(rank <= 3) will get you the top 3 within each group. Note the rank() function has a few arguments, like ties.method, to handle ties. By default, tied values share the average of their ranks, which is why Bananas and Watermelons both show 2.5.
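To make the tie handling concrete, here is a small sketch that compares the default rank() with dplyr's min_rank() and dense_rank() helpers, then pulls the 2nd heaviest and the top 3 per group. The stuff data frame is again rebuilt from the results above, and the column names rank_avg, rank_min, and rank_dense are made up for this illustration.

```r
library(dplyr)

# Sample data reconstructed from the results shown in this post.
stuff <- data.frame(
  type = c(rep("Fruits", 7), rep("Vegetables", 5)),
  name = c("Mangoes", "Bananas", "Watermelons", "Pineapples", "Apples",
           "Cantaloupes", "Oranges", "Brussel Sprouts", "Spinach",
           "Asparagus", "Mushrooms", "Cabbage"),
  weight = c(19, 18, 18, 10, 9, 5, 4, 20, 15, 11, 8, 4),
  stringsAsFactors = FALSE
)

ranked <- stuff %>%
  group_by(type) %>%
  mutate(
    rank_avg   = rank(desc(weight)),        # ties share the average rank (2.5, 2.5)
    rank_min   = min_rank(desc(weight)),    # ties share the lowest rank (2, 2), next is 4
    rank_dense = dense_rank(desc(weight))   # ties share a rank (2, 2), next is 3
  ) %>%
  ungroup()

# 2nd heaviest per type; tied rows are all returned.
second_heaviest <- filter(ranked, rank_min == 2)

# Top 3 per type using min_rank.
top3 <- filter(ranked, rank_min <= 3)

print(second_heaviest)
print(top3)
```

Which ranking to use depends on what "nth" should mean when there are ties: min_rank() leaves a gap after tied values, while dense_rank() does not, so the "3rd heaviest" fruit is empty under min_rank() but Pineapples under dense_rank().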