How to Remove Dollar Sign in R (and other currency symbols)

Let’s say we imported a .csv or .xlsx file into R, and it looks like this:

print(df)
    person payment
1 Person 1  $56.11
2 Person 2  $20.42
3 Person 3 $104.20
4 Person 4 $201.21
5 Person 5   $5.06

Looks fine, until you try to do a calculation on the payment column:

sum(df$payment)
Error in sum(df$payment) : invalid 'type' (character) of argument

So we run str(df) to check the table structure. Lo and behold, that column is not numeric; it is character (chr):

'data.frame':	5 obs. of  2 variables:
 $ person : chr  "Person 1" "Person 2" "Person 3" "Person 4" ...
 $ payment: chr  "$56.11" "$20.42" "$104.20" "$201.21" ...
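
To follow along without the original file, a data frame like the one above can be built by hand. This is only a sketch: the values are copied from the printout, and the construction itself is an assumption, since the original data was imported from a file.

```r
# Hypothetical reconstruction of the example data shown above
df <- data.frame(
  person  = paste("Person", 1:5),
  payment = c("$56.11", "$20.42", "$104.20", "$201.21", "$5.06"),
  stringsAsFactors = FALSE  # keeps text as character on R versions before 4.0
)

# Confirm that payment is stored as text, not numbers
class(df$payment)
is.numeric(df$payment)
```

With the column stored as character, sum(df$payment) fails in the same way as shown earlier.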

How to remove the dollar signs from a column in R

One way to do it is with the gsub() function, in conjunction with as.numeric(). gsub() replaces every match of a pattern in a string with other text, and as.numeric() coerces a variable to numeric.

Let’s see it in action:

# replace $ with an empty string "" in the df$payment column, then coerce the result to numeric
df$payment_2 = as.numeric(gsub("\\$", "", df$payment))

# print the data frame just to eyeball it
df

# compute sum of payment_2 column
sum(df$payment_2)

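The gsub() function looks for any instance of "\\$" and replaces it with "". The backslashes are known as escape characters. They are needed because gsub() accepts a “regular expression” as the first parameter, and since $ is a reserved regular expression notation (it marks the end of a string), we must “escape” this reserved meaning and look for literal values of $ in the text strings. There are two backslashes because a backslash is also special inside an R string: "\\" in the code is a single backslash in the actual string, so the regular expression that gsub() receives is \$.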

    person payment payment_2
1 Person 1  $56.11     56.11
2 Person 2  $20.42     20.42
3 Person 3 $104.20    104.20
4 Person 4 $201.21    201.21
5 Person 5   $5.06      5.06

[1] 387
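
To see what the escaping is doing, here is a small sketch on a throwaway vector (x is a made-up example, not the df column). It compares the escaped pattern with the fixed = TRUE argument of gsub(), which treats the pattern as plain text, and with an unescaped $, which only matches the end of each string and so removes nothing.

```r
# The R string "\\$" holds two characters: a backslash and a dollar sign
pattern <- "\\$"
nchar(pattern)      # 2
cat(pattern, "\n")  # prints \$

# Hypothetical sample values
x <- c("$56.11", "$5.06")

# Escaped regular expression: removes the literal dollar sign
escaped <- gsub("\\$", "", x)

# fixed = TRUE turns off regular expressions, so no escaping is needed
literal <- gsub("$", "", x, fixed = TRUE)

# Unescaped $ as a regular expression means "end of string"
unescaped <- gsub("$", "", x)

identical(escaped, literal)  # both give "56.11" "5.06"
identical(unescaped, x)      # the dollar signs are still there
```

Either the escaped pattern or fixed = TRUE does the job for a single symbol; the regular expression form becomes more useful once several symbols need to go at once.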

By the same token, we can replace commas and other currency-related notations that are being read as part of the string. We can remove them individually, as we did above with the dollar sign, or we can specify any number of symbols to remove, all at once. For example, to remove both dollar sign and comma, we use the following notation:

# replace $ and comma with blank "" in the df$payment column
df$payment_2 = as.numeric(gsub("[\\$,]", "", df$payment))

Here we modified the gsub() function to include [\\$,]. We could add any number of other symbols within brackets that we wish to replace.
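
The example data has no commas in it, so here is a quick sketch on a made-up vector (amounts is hypothetical) that does. Inside square brackets the $ is already treated as a literal character, so this version leaves out the backslashes; the escaped form used above behaves the same way.

```r
# Hypothetical values with thousands separators
amounts <- c("$1,234.50", "$20.42", "$1,000,000")

# Remove every dollar sign and comma, then coerce to numeric
cleaned <- as.numeric(gsub("[$,]", "", amounts))
cleaned

# as.numeric() returns NA (with a warning) for anything it cannot parse,
# so checking for NA is a cheap way to spot symbols that were missed
anyNA(cleaned)
```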

Regular Expressions

Regular expressions are a massive topic unto themselves. Entire books are written about them. My favorite is Mastering Regular Expressions. And regexr.com is an excellent resource for learning and testing regular expressions on your text.

And don’t forget: we just used data frame columns for convenience here. A column is simply a vector, so generically speaking, this approach can be used for any kind of character vector.
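
Since the approach works on any vector, it can be wrapped in a small function and reused, including across several columns at once, which is a question that comes up in the comments. The sketch below is one base R way to do it; strip_currency, sales, and money_cols are hypothetical names made up for illustration.

```r
# Hypothetical helper: strip dollar signs and commas, then coerce to numeric
strip_currency <- function(x) {
  as.numeric(gsub("[$,]", "", x))
}

# Works on a plain character vector
strip_currency(c("$3.50", "$1,200.00"))

# Hypothetical data frame with two currency columns
sales <- data.frame(
  person  = c("Person 1", "Person 2"),
  payment = c("$56.11", "$1,020.42"),
  refund  = c("$5.00", "$0.75"),
  stringsAsFactors = FALSE
)

# Apply the helper to each selected column and write the results back
money_cols <- c("payment", "refund")
sales[money_cols] <- lapply(sales[money_cols], strip_currency)

str(sales)
```

Naming the columns explicitly keeps text columns such as person out of the conversion, where as.numeric() would otherwise turn them into NA.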

10 Comments
Smithg985
4 years ago

Hello there! This post could not be written any better! Reading this post reminds me of my good old room mate! He always kept talking about this. I will forward this page to him. Fairly certain he will have a good read. Thank you for sharing

bjs12
4 years ago

Those are back slashes not forward slashes, fyi.

CZLY
4 years ago

Yikes – could this be any more complicated??? How about showing this for the average person that doesn’t use excel very often

John
4 years ago
Reply to  CZLY

Hi. I don’t understand. There’s no reference / mention of Excel at all in this post. Just two functions: gsub() and as.numeric(). If this is going over your head, I suggest reading up on vectors. There are a few chapters dedicated to this in the book.

Mike
3 years ago

How would you apply this to multiple columns? Like a range of columns or multiple columns?

dsolito
2 years ago

as.tibble(map(df, str_replace, "\\$", ""))

Taye
1 year ago

Thanks for this, it just saved me a lot now. Worked perfectly

Shae Lynn Watt
9 months ago

This was a really helpful post for a newbie!!

Stupid question, but… why do we need TWO backslashes before the dollar sign? I get that $ is a “regular expression” and we have to tell R to read it literally… but why doesn’t a single escape send that message?

Javiera Ibacache
9 months ago

Thank you so much! very useful!