Skip to content

How to do MID, RIGHT an LEFT in R

MID()LEFT() and RIGHT() make it easy to extract parts of strings in Excel. Let’s see how we apply those in R.

Using substr()

Let’s start with a simple example in Excel:

Base R does not have exact equivalents to these functions. Instead, there is substr(), which generically extracts substrings from strings. Let’s replicate the above image in R:

a = "Chicago"
b = "New York City"
c = "Los Angeles"
 
# LEFT
substr(a, 1, 3)
 
# MID
substr(b, 5, 8)
 
# RIGHT
substr(c, 5, 11)
[1] "Chi"
[1] "York"
[1] "Angeles"

The substr() function is structured as substr(x, start, stop) where start and stop are the first and last characters to be extracted, respectively.

Creating custom functions

Chapter 15 of the book shows how to create custom functions. We would write custom functions if we plan to execute these commands regularly. Or if we just want the comfort of a more familiar function. Here’s how I would do it:

left = function(text, num_char) {
  substr(text, 1, num_char)
}
 
mid = function(text, start_num, num_char) {
  substr(text, start_num, start_num + num_char - 1)
}
 
right = function(text, num_char) {
  substr(text, nchar(text) - (num_char-1), nchar(text))
}
 
left(a, 3)
mid(b, 5, 4)
right(c, 7)
[1] "Chi"
[1] "York"
[1] "Angeles"

The structure of these three functions exactly mimics the Excel formulas. right(), for example, now just takes text and num_char as input, just like Excel’s RIGHT(). We did this by hacking the input to the substr() function. Taking the Los Angeles example, right() ends up evaluating to:

substr("Los Angeles", 11 - (7 - 1), 11) ==> substr("Los Angeles", 5, 11)

Which is exactly what we did in the first example.

More Advanced string extraction

The above examples are just scratching the surface of what you can do with string extraction in R. For example:

Separating first and last names

This is not bad in Excel, but what if someone puts in their middle name or initial? It would get more complicated if you want to write a single formula to handle all the rows.

Or how about this one:

Extracting the query string from a URL

To just get the query string (the part where q=…), the first three are not too hard in Excel. But then we get a curveball with the &another_parameter, which we don’t want to grab.

Regular expressions come into play, and we will cover this in a later post.

Comments

    1. Hi Nidha, for URL parsing I recommend the urltools library. Below are two functions demonstrating…

      url_parse(“https://www.amazon.com/s?k=soap&ref=nb_sb_noss”)

      This will return a data frame breaking out each component of the url, including the scheme (https), domain (www.amazon.com) and parameters (k=soap&ref=nb_sb_noss)

      You can access specific parameters with the param_get() function, like this:

      param_get(“https://www.amazon.com/s?k=soap&ref=nb_sb_noss”, “k”)

      This returns the value of parameter “k”, which is “soap”

      Hope this helps!

Leave a Reply

Your email address will not be published. Required fields are marked *