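You want to identify the nth largest or smallest item in a group using R. For example, given a table of fruits and vegetables with a type, a name, and a weight for each item, you might want to pull out just two rows: the heaviest item of each type.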
Any time there is some by-group processing, I almost always stick with the dplyr package because of its so-called window operations. Below are a few techniques:
Let’s say our data frame is named stuff.
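If you want to follow along, something like the following should give you a comparable data frame. This is only a sketch: the values are copied from the output printed under Solution 2, and the row order and the use of a plain data.frame are my assumptions.

```r
# Rebuild the example data from the rows printed in the Solution 2 output.
# Row order and the plain data.frame type are assumptions.
stuff <- data.frame(
  type = c(rep("Fruits", 7), rep("Vegetables", 5)),
  name = c("Mangoes", "Bananas", "Watermelons", "Pineapples", "Apples",
           "Cantaloupes", "Oranges", "Brussel Sprouts", "Spinach",
           "Asparagus", "Mushrooms", "Cabbage"),
  weight = c(19, 18, 18, 10, 9, 5, 4, 20, 15, 11, 8, 4),
  stringsAsFactors = FALSE
)

# Quick look at the structure: 12 rows, 3 columns.
str(stuff)
```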
Solution 1: Simply get the min/max
group_by(stuff, type) %>% filter(weight == max(weight))
        type            name weight
1     Fruits         Mangoes     19
2 Vegetables Brussel Sprouts     20
This gets right to the point. We set the data frame up for a grouped operation using group_by(), then keep the row(s) where weight equals the maximum weight. Because of the group_by(), max(weight) is evaluated within each type separately, so a tie for the maximum returns every tied row.
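The same pattern works for the smallest item by swapping max() for min(). Here is a minimal sketch, assuming the example data matches the rows shown in the Solution 2 output (the data frame is rebuilt here so the snippet runs on its own; the name lightest is just for illustration).

```r
library(dplyr)

# Example data, reconstructed from the printed output (an assumption).
stuff <- data.frame(
  type = rep(c("Fruits", "Vegetables"), times = c(7, 5)),
  name = c("Mangoes", "Bananas", "Watermelons", "Pineapples", "Apples",
           "Cantaloupes", "Oranges", "Brussel Sprouts", "Spinach",
           "Asparagus", "Mushrooms", "Cabbage"),
  weight = c(19, 18, 18, 10, 9, 5, 4, 20, 15, 11, 8, 4),
  stringsAsFactors = FALSE
)

# Lightest item within each type: min() is evaluated per group.
lightest <- stuff %>%
  group_by(type) %>%
  filter(weight == min(weight)) %>%
  ungroup()   # drop the grouping so later steps are not grouped by accident

print(lightest)
```

With this data the result should be Oranges for Fruits and Cabbage for Vegetables, both with a weight of 4.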
Solution 2: More flexible if needed
Perhaps we don’t need the smallest or largest within a group, but the 3rd smallest or the top 5 within each group. In that case we can use this more flexible approach:
group_by(stuff, type) %>% mutate(rank = rank(desc(weight))) %>% arrange(type, rank)
         type            name weight rank
1      Fruits         Mangoes     19  1.0
2      Fruits         Bananas     18  2.5
3      Fruits     Watermelons     18  2.5
4      Fruits      Pineapples     10  4.0
5      Fruits          Apples      9  5.0
6      Fruits     Cantaloupes      5  6.0
7      Fruits         Oranges      4  7.0
8  Vegetables Brussel Sprouts     20  1.0
9  Vegetables         Spinach     15  2.0
10 Vegetables       Asparagus     11  3.0
11 Vegetables       Mushrooms      8  4.0
12 Vegetables         Cabbage      4  5.0
Here we created a new column using the rank() function, and we can filter on it from here. For example, filter(rank <= 3) gets you the top 3 within each group. Note that rank() has a ties.method argument that controls how ties are handled (notice Bananas and Watermelons are tied, so by default each gets the average rank of 2.5).
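To make the effect of ties concrete, here is a rough sketch comparing three ways of ranking and then filtering. The column names rank_avg, rank_min and rank_first are made up for this illustration, and the data frame is again rebuilt from the printed output so the snippet stands alone.

```r
library(dplyr)

# Example data, reconstructed from the printed output (an assumption).
stuff <- data.frame(
  type = rep(c("Fruits", "Vegetables"), times = c(7, 5)),
  name = c("Mangoes", "Bananas", "Watermelons", "Pineapples", "Apples",
           "Cantaloupes", "Oranges", "Brussel Sprouts", "Spinach",
           "Asparagus", "Mushrooms", "Cabbage"),
  weight = c(19, 18, 18, 10, 9, 5, 4, 20, 15, 11, 8, 4),
  stringsAsFactors = FALSE
)

ranked <- stuff %>%
  group_by(type) %>%
  mutate(
    # Default ties.method = "average": tied items share the mean of their ranks.
    rank_avg   = rank(desc(weight)),
    # dplyr's min_rank(): tied items share the lowest of their ranks.
    rank_min   = min_rank(desc(weight)),
    # ties.method = "first": ties are broken by row order, so ranks are distinct.
    rank_first = rank(desc(weight), ties.method = "first")
  ) %>%
  ungroup()

# Top 3 per type using the default (average) ranks.
top3 <- ranked %>%
  filter(rank_avg <= 3) %>%
  arrange(type, rank_avg)

# Exactly one "3rd heaviest" row per type, relying on the distinct ranks.
third <- ranked %>%
  filter(rank_first == 3)

print(top3)
print(third)
```

One thing to watch: with the default ranks, filter(rank_avg == 3) returns no Fruits row at all, because the two tied items both sit at 2.5. If you need exactly one row for "the nth largest", a tie-breaking method such as "first" is the safer choice, at the cost of picking between tied rows by their order in the data.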