You want to calculate the percent of row as shown in the tables below, or as you would in a PivotTable:
Let’s say our data frame is named fruits, with the fruit name in the first column and one numeric column per week.
In R, any time you have to deal with row or column totals in some way, the apply() function is often the way to go. In this case we start by calculating the totals column:
total_col = apply(fruits[,-1], 1, sum)
Result
[1] 69 50 33 17 67
The apply function can run an operation (in this case sum) across all rows or all columns. Here we set the second argument to 1, which runs the operation across each row. As you can see, the result is a vector of five numbers, one for each row, and it matches the totals column in the table above.
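To make the role of the second argument concrete, here is a minimal sketch. The fruits values are an assumption: the counts are back-calculated from the totals and percentages shown in this article, and the names week_totals and total_col_alt are made up for illustration.

```r
# Assumed sample data: counts back-calculated from the totals and
# percentages shown in this article
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)

# Second argument 1: run sum across each row (one total per fruit)
total_col <- apply(fruits[, -1], 1, sum)

# Second argument 2: run sum down each column (one total per week)
week_totals <- apply(fruits[, -1], 2, sum)

# rowSums() is a base R shortcut for the row-wise case
total_col_alt <- rowSums(fruits[, -1])

print(total_col)
print(week_totals)
```

With these assumed counts, total_col holds the same five values as the result above (69, 50, 33, 17, 67).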
Two ways to calculate percent of row in R
It’s good to know the manual way so that you can grow to code your own automated way instead. The automated way is more scalable and less prone to mistakes.
Here is one way using the standard dollar sign referencing:
fruits$week1_pct = fruits$week1 / total_col
fruits$week2_pct = fruits$week2 / total_col
fruits$week3_pct = fruits$week3 / total_col
fruits$week4_pct = fruits$week4 / total_col
fruits$week5_pct = fruits$week5 / total_col
An alternative is to use a library like dplyr, but the benefit is fairly minimal and the code is equally verbose (though there might be speed gains with larger data sets).
The automated way with sapply or lapply
First we loop through each numeric column (all except the first), and in each iteration divide that column by the total_col vector:
pcts = lapply(fruits[,-1], function(x) { x / total_col })
The result is a list of five vectors, one for each column calculation:
$week1
[1] 0.27536232 0.10000000 0.03030303 0.05882353 0.20895522

$week2
[1] 0.2753623 0.0200000 0.4242424 0.3529412 0.1641791

$week3
[1] 0.1304348 0.2600000 0.1212121 0.1764706 0.2686567

$week4
[1] 0.2173913 0.2600000 0.2727273 0.2941176 0.2537313

$week5
[1] 0.1014493 0.3600000 0.1515152 0.1176471 0.1044776
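Only lapply() is shown above, so here is a rough sketch of how sapply() differs on the same call. The fruits counts are an assumption (back-calculated from the totals and percentages in this article), and pcts_list and pcts_mat are illustrative names.

```r
# Assumed sample data: counts back-calculated from this article's output
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)

# lapply() always returns a list: one vector per column
pcts_list <- lapply(fruits[, -1], function(x) x / total_col)

# sapply() tries to simplify the result; because every vector has the
# same length, it comes back as a numeric matrix (rows = fruits, columns = weeks)
pcts_mat <- sapply(fruits[, -1], function(x) x / total_col)

print(class(pcts_list))
print(dim(pcts_mat))
```

Either result can be passed to as.data.frame(), so the choice mostly comes down to whether a list or a matrix is more convenient for the next step.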
We could have also used a for() loop, but that is actually more work for you.
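For comparison, a for() loop version might look roughly like the sketch below. The fruits counts are an assumption (back-calculated from the totals and percentages in this article), and pcts_loop is an illustrative name.

```r
# Assumed sample data: counts back-calculated from this article's output
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)

# With a loop we have to create the container ourselves...
pcts_loop <- list()

# ...and fill it one column at a time, skipping the first (fruit) column
for (col in names(fruits)[-1]) {
  pcts_loop[[col]] <- fruits[[col]] / total_col
}

# Same content as the lapply() version
pcts_lapply <- lapply(fruits[, -1], function(x) x / total_col)
print(all.equal(pcts_loop, pcts_lapply))
```

The loop needs the extra bookkeeping of an empty list and explicit column names, which lapply() handles for us.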
Next we can convert that to a data frame (a data frame is actually like a list of column vectors, each with the same number of items), and append the fruit column:
pcts = as.data.frame(pcts)
pcts$fruit = fruits$fruit
We could have also used cbind() to combine fruits with the lapply() output, to have both sets of numbers in one combined table.
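A minimal sketch of the cbind() idea follows. The fruits counts are an assumption (back-calculated from the totals and percentages in this article), and the _pct suffix is borrowed from the manual example so that the combined table does not end up with duplicate column names.

```r
# Assumed sample data: counts back-calculated from this article's output
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)

# Percent-of-row columns as a data frame
pcts <- as.data.frame(lapply(fruits[, -1], function(x) x / total_col))

# Rename so the percent columns are distinguishable from the raw counts
names(pcts) <- paste0(names(pcts), "_pct")

# Raw counts and percentages side by side in one table
combined <- cbind(fruits, pcts)
print(names(combined))
```

This gives the same columns as the manual dollar-sign approach, without writing one line per week.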
Both methods above result in a table like this:
       fruit      week1     week2     week3     week4     week5
1     Apples 0.27536232 0.2753623 0.1304348 0.2173913 0.1014493
2    Bananas 0.10000000 0.0200000 0.2600000 0.2600000 0.3600000
3    Oranges 0.03030303 0.4242424 0.1212121 0.2727273 0.1515152
4    Mangoes 0.05882353 0.3529412 0.1764706 0.2941176 0.1176471
5 Pineapples 0.20895522 0.1641791 0.2686567 0.2537313 0.1044776
(Although technically, in the automated example the fruit column ends up at the far right if you follow the same method.)
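If you want the fruit column back at the far left, one option is to reorder the columns by name, as in this sketch. The fruits counts are an assumption (back-calculated from the totals and percentages in this article), and row_check is an illustrative name for a sanity check that each row of percentages adds up to 1.

```r
# Assumed sample data: counts back-calculated from this article's output
fruits <- data.frame(
  fruit = c("Apples", "Bananas", "Oranges", "Mangoes", "Pineapples"),
  week1 = c(19, 5, 1, 1, 14),
  week2 = c(19, 1, 14, 6, 11),
  week3 = c(9, 13, 4, 3, 18),
  week4 = c(15, 13, 9, 5, 17),
  week5 = c(7, 18, 5, 2, 7)
)
total_col <- apply(fruits[, -1], 1, sum)

# Automated method: fruit is appended last, so it lands at the far right
pcts <- as.data.frame(lapply(fruits[, -1], function(x) x / total_col))
pcts$fruit <- fruits$fruit

# Reorder: fruit first, then every other column in its existing order
pcts <- pcts[, c("fruit", setdiff(names(pcts), "fruit"))]

# Sanity check: the percentages in each row should add up to 1
row_check <- rowSums(pcts[, -1])
print(pcts)
print(row_check)
```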