Believe it or not, Excel is still my go-to analysis tool a lot of times, because it’s great at what it does. I’m a shortcut fiend, so I can do things pretty quickly. So when do I opt for R? People have asked me this many times. Here is my unofficial checklist I loop through in my head to decide whether to Excel or not to Excel:
- Is the data not well structured or PivotTable-ready? Does it have a lot of stuff within cells that needs to get broken out?
If yes, then R, unless I can work my Excel magic to clean it up.
- Is this a quick and dirty one-time analysis? Including quick visuals.
If Yes, then Excel, as long as the data is not gigantic.
- Do I need anything beyond basic statistical analysis? Regression, clustering, text mining, time series analysis, etc
If Yes, then R. No contest.
- Do I have to crunch a few disparate datasets to do my work?
Depends on complexity. If data sets are small and a simple vlookup can handle it, then Excel. If more than three tables, most likely R. If more than 1-2 columns vlookup’ing from each table, also R.
- Something I will want to share in a web-based, interactive format that is nice to look at?
R with the Shiny framework
- Unique and beautiful visuals the world has rarely seen?
While I was learning R, I used a hybrid approach … doing the heavy-lifting data prep work in R, then using the write.csv() function to send my data frames back to Excel for visuals and basic analysis. Over time, I have learned to do more complete analysis in R, from beginning to end.
I hope this helps! What scenarios did I miss?
Nice post, John.
I use a lot of Excel when I have to present scenarios and change a “final” table in front of clients. Looking at the data is easier in Excel than in R, but of course I’m referring to a “final, condensed, aggregated” dataset (< ~30000 I guess?).
Knowing how to conduct a PCA analysis in R does not mean you understand the dataset and can play with human-generated scenarios on it in a spreadsheet.
For that sense, R is overrated. There's a reason the industry still relies heavily on spreadsheets. They are easy. They support the creation of scenarios and conversation about the data.
R programming will be gibbreish in front of a meeting room with the CEO. Unless you spent days/weeks working on a Shiny dashboard. And still you can be surprised with a simple question on "change a parameter x, please".
What you have said is true about scenario analysis, but then R is meant for statisticians and written by statisticians. Using it for a simple think like that is like using a thermonuclear bomb to kill a fly.
R shines truly when one actually have to use statistics to infer information, say creation of neural networks or forecasting or the likes.
Also when working in a file with more then 1000 line items when it is necessary to have a birds eye view then R works well for me (this is my personal thought and given that I am proficient in R this is biased).
Thanks for the input, guys. You’re both right. I will never argue against use of spreadsheets because I still use them on a daily basis, more than R, in fact.
Over time, each person figures out what works for them and when. To this day, I still pull our R as my secret weapon at work. The important thing is you diversify your skill set so you can build the confidence to think that nothing is impossible (at least as far as data work goes).
It’s apples and oranges. Spreadsheets are for displaying small data sets and doing simple calculations on them. That’s where Excel and the likes shine. For advanced statistical analysis on larger data sets you need R. There is a small intersect between the two (i.e. joining tables) but these have very different purposes. R is not suitable for “changing a “final” table in front of clients” because advanced statistical analysis needs more time and concentration than a meeting with clients could provide.
Thanks, your opinion just the same as me. I’m using Excel and R together to analyze data, Excel helps me in the data editor and R for analyzing. In fact, R and Excel are just the tools, the most important for data analysis is the mindset and suitable theory to tell the audience a story based on boring data, don’t you think?
I’m new to R, but one of the reasons I’m learning/using it is that I need to produce analyses that other people can understand. For instance I was recently faced with a spreadsheet analysis of a bunch of measured performance data for an integrated circuit. The spreadsheet was a work of art, but absolutely IMPOSSIBLE to understand (all sorts of lookups, tables made from other tables, many sheets, etc). In fact, once I painfully stepped through it I discovered an error that was so buried that no normal person would ever have seen it. The difference between R and Excel in this case is that even though R requires me to screw on my head differently, I can write comments! And I can use well named variables! And I can create something that somebody else can understand, critique, and modify even if they don’t completely understand the syntax. And even more importantly my data collection is not directed by the spreadsheet (one IC designer told me “don’t you dare change anything about the order or quantity of data taken in this test because it took me weeks to get that spreadsheet right and you’ll break it if you change the… Read more »
Excellent points! I’ve been in that situation enough times in my career — dealing with other people’s highly convoluted Excel spreadsheets that are really sensitive to the most trivial changes. The linearity of code ensures that you can read it from top to bottom to follow the logic.
I am a long-time (25+ years) Excel user who is currently learning R. I am a Data Analyst by trade, and I also hold a Master’s degree in IT so I have a good understanding of programming concepts. I am still learning the R basics, but so far have seen nothing that can’t be done far more quickly and easily in Excel. After researching the advantages/disadvantages of R vs. Excel, those who lean towards R always overlook a critical piece of the puzzle when listing the drawbacks of Excel: Power Query (known as “Get & Transform” in Excel 2016 and later). Power Query allows you to create repeatable scripts using a point-and-click process. Write once, run repeatedly. You can also segregate the source data from your reports, pivot tables, charts, etc., so there’s no risk of your end users mucking about in the original data. If you use Excel extensively but you aren’t familiar with Power Query, get yourself a good book on the topic or take an online course straight away. It will forever change how you use Excel and interact with data. You’ll thank me someday, trust me. Want even more capabilities?
I should add that today there is a much higher limit on the size of the dataset you can work with in Excel using Power Query and/or Power Pivot. I’ve worked with tens of millions of rows of data in Power Query without noticeable performance issues. I’ve also created “live” connections to databases containing billions of records (Hadoop, ERP systems, etc.)
Anyone who thinks they can’t work with large datasets in Excel is still stuck in the “old school” ways of thinking. Microsoft has made massive strides in this area in recent years, so if you’re not aware of these new Power tools you really should avail yourself of them to see how they can totally transform the way you work with data in Excel.