A couple weeks ago at work I gave a presentation as part of one of our lunch-time brown bag meetings. Sometimes we have a researcher come from another institution to talk about their project, but sometimes we have an internal person give a mini-workshop or ask for feedback on a research issue. For this particular meeting, I and another researcher talked about general Excel tips and tricks that we had picked up over the years. We had both realized from talking to others that some people weren’t taking advantage of some of the many advantages of using a spreadsheet program; they were just using Excel like a place for their data instead of a place where they could organize, manage, and analyze their data. (No joke, I have actually seen people count numbers on their screen.) The basic idea was to share ways that Excel can help us do our work – sometimes just knowing that something is possible to do lets you know that you can search out ways to improve data organization/analysis.
So, here are some of the things that I shared in that meeting. Most of them are things that I figured out through either hard work, a friendly colleague, or a quick Google search. They mostly assume a more than basic understanding of Excel. There are a ton of Excel forums online so I would suggest if you’re ever doing something in Excel that seems like it could be done more easily, you should search and see if a solution comes up. You will usually be able to save lots of time and headaches.
Basic formatting of your spreadsheet(s)
So, first things first. You should organize your spreadsheet(s) so that it’s easy to do the analysis that you need to do. This means that your data should make sense. Typically you will want a given row and a given column to contain the same type of thing. In other words, don’t treat Excel like a data table and change the data table contents every few rows. You want to be able to copy formulas across rows and down columns and you can’t do that if they don’t match.
Typically, I put cases of data in rows and use the columns for variables. This is the best way to do things if you want to ever export to R or SPSS or Stata or something. The order that you put them doesn’t really matter, but if you’re going to use certain formulas it will. For instance, if you’re using
vlookup (more on that later) then you definitely want to put whatever your reference cells are going to be on the left side.
You also might want to create a reference column. Let’s say you have some student data that has multiple instances of data per student (e.g., at different time points in the school year). You might want to be able to refer to specific instances of a piece of student data and you’re going to want a single cell with that information instead of two (or more) columns of identifying meta-data. The
concatenate function is your friend if you want to easily combine two or more columns of anything.
Freezing panes is another helpful thing to do. This will help keep the important reference cells visible when you’re scrolling through the thousands of rows of data or hundreds of column variables. [Fun fact! Did you know that Excel has row and column limits? It used to be 65,536 rows and 256 columns in older versions of Excel but now it’s 1,048,576 rows and 16384 columns which is much more reasonable. I found this out the hard way one day when I realized all of my data didn’t fit.]
The other important thing to do is to try as much as possible to keep all relevant data in ONE spreadsheet in multiple tabs instead of multiple separate files. You can refer to data in other files, but if you ever move either of those files you are going to have one big mess on your hands and trust me, it’s not worth it. Just copy everything into your one file.
Formulas are your friends (you know, Excel friends, don’t you have those?)
I have some favorite formulas (the dictionary says I don’t need to use formulae so I won’t). I use them all the time and they make my job so much easier. The most basic ones are
vlookup, and a good
match combo. You can do SO MUCH with these formulas. A bunch of nested if statements, with appropriately placed other formulas, can really do wonders to your data analysis. You should never count anything by hand. This includes things that are text. Always search. Always check and make sure your formulas are working correctly, but you should basically write a formula to do as much as possible. [I am assuming that you have more than a few pieces of data.]
vlookup and it’s companion
hlookup are two amazing formulas that really can change everything if you know how to use them. The basic idea is that you tell it what to search for, the array to search in, and the number of the column (for
vlookup) to return a value. The key is that your reference information needs to be in the leftmost column of the selected array. Also, you should just always use “false” for the last option in the formula because I can’t think of a good reason why you would not want an exact match (just don’t forget to do this, because it’s not the default option for some reason).
A formula that I just started using is
sumproduct which is extremely helpful when you have multiple complex criteria to search on and works because of linear algebra so that’s really cool. For instance,
sumproduct can help you do a case-sensitive
=sumproduct(--exact(a1:a5,"c")). But more generally it is good for when you need something more complicated than the regular
countif can handle. This website has some good examples of these cases.
Other formula tips include things like using array formulas (use the three-key combo of shift-command-return to enter the formula). This is necessary for
sumif and other formulas that have to evaluate a bunch of cells simultaneously. Also, you should whenever possible not hard code things into your formulas. Use reference cells when you can because inevitably something will change and you really don’t want to have to change your formulas by hand later.
Working with text
There are a whole lot of formulas that can help you with searching, replacing, and counting text. Trust me, you can do way more than you think you can with text fields in Excel. To start, in Excel, an asterisk * is a wild card for any number of characters while a question mark ? is a wild card for just one character. Here are some (maybe random) examples of text-related formulas:
- when searching for text if you don’t want a with value error [NB: adding a “blank” with the “” doesn’t actually a create a real blank cell as far as Excel is concerned, so be careful.]
- to check to see if there is something in a cell that occurs after a particular cell (e.g., “failed”) (written for the cell to the right, but could be applied in any direction by any amount):
- to fill a series of letters for columns:
=IF(ROW(A1) > 26, CHAR(INT((ROW(A1) – 1)/ 26) + 64) & CHAR(MOD(ROW(A1) -1,26) + 65), CHAR(ROW(A1) + 64))
There are a bunch of other things, like using conditional formatting for data visualization (customize your rules to color cells the way you want and include the numbers or text in the cells so they “disappear”) and creating graph templates, but this seems like enough for today. Let me know if you have questions about what is above or if you have other problems with your data that I might possibly be able to fix!