Hi Deirdre, I assume you’ve figured it out by now. If not, could you clarify a bit? Feel free to email me directly with the example … John at rforexcelusers. I’m more than happy to help you; probably just a matter of transposing a table and merging in some way.

By the way, if anyone searches this post out for info on cluster analysis per se, there are many simpler-to-use (and equally powerful) stand-alone programs that perform detailed cluster algorithms. “PAUP” is one of those; find it at: https://paup.phylosolutions.com/

I should add that today there is a much higher limit on the size of the dataset you can work with in Excel using Power Query and/or Power Pivot. I’ve worked with tens of millions of rows of data in Power Query without noticeable performance issues. I’ve also created “live” connections to databases containing billions of records (Hadoop, ERP systems, etc.).

Anyone who thinks they can’t work with large datasets in Excel is still stuck in the “old school” ways of thinking. Microsoft has made massive strides in this area in recent years, so if you’re not aware of these new Power tools you really should avail yourself of them to see how they can totally transform the way you work with data in Excel.

Excellent points! I’ve been in that situation enough times in my career — dealing with other people’s highly convoluted Excel spreadsheets that are really sensitive to the most trivial changes. The linearity of code ensures that you can read it from top to bottom to follow the logic.

It’s apples and oranges. Spreadsheets are for displaying small data sets and doing simple calculations on them. That’s where Excel and the likes shine. For advanced statistical analysis on larger data sets you need R. There is a small intersection between the two (i.e. joining tables), but they serve very different purposes. R is not suitable for “changing a ‘final’ table in front of clients,” because advanced statistical analysis needs more time and concentration than a meeting with clients could provide.

I’ve never done this because I don’t use R much as a reporting engine. But if I were to do this, I would just compute the totals row manually (the apply function is good for this, or a dplyr summary with no group_by()) … and then append that row to the data frame with rbind().
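A minimal sketch of that approach, using a hypothetical summary table (the data frame and column names here are purely illustrative):

```r
# Hypothetical report table (column names are illustrative)
sales <- data.frame(
  region  = c("East", "West", "North"),
  revenue = c(100, 250, 175),
  units   = c(10, 25, 18)
)

# Compute totals for the numeric columns with apply() (column-wise sums)
sums <- apply(sales[, c("revenue", "units")], 2, sum)

# Build a one-row totals data frame and append it with rbind()
totals <- data.frame(region = "Total",
                     revenue = sums["revenue"],
                     units   = sums["units"])
report <- rbind(sales, totals)

# With dplyr loaded, the equivalent would be an ungrouped summarise(),
# i.e. summarise() with no group_by() call.
```

For what it’s worth, at least one package does handle this for you: the janitor package has an adorn_totals() function that appends a totals row to a data frame.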

It’s really wonky, and perhaps there are libraries that would handle this for you. Considering the growth of Shiny, maybe R is being used more and more for reporting, so someone may have built a tool for this.
