Toward a more open science practice with R

Recently I did a webinar with my colleague Joshua Rosenberg, hosted by the Center for Open Science, on Analyzing Educational Data with Open Science Best Practices, R, and OSF. You can find a recording of the webinar here and our slides and an example R Notebook are in an OSF repository here. I thought I would do this blog post to summarize some of the main things I talked about there and highlight some of the more important aspects.

This webinar was ostensibly about open science for educational data. I think most of us want to engage in more open science practices (which could include open data, open materials, preregistration of studies, replication, posting preprints, and reporting null results) but don’t necessarily know where to begin or what tools to use. I think we tried to make the argument that workflows, procedures, practices, and behaviors that are good practice for you by yourself, future you, and your internal team can also be good for open science. And that using R and its many packages and tools is a good way of achieving those goals.

I’ve written many times before about how much I love using R and how I want others to incorporate it more into their practice. I’ve now collected the series of blog posts as well as other related blog posts (like this one!) and slides onto one page for easy access. You can go to cynthiadangelo.com/r/ to see all of the R related stuff that I have worked on linked in one place.

In general, I’ve been thinking a lot lately about my values and commitments as a researcher and how I approach my work in a very basic way. What is important to me, my collaborators, and my field? How could I be doing things differently or looking at my assumptions differently?

This thinking led me to this set of things to consider for a more open science approach:

  • There are a lot of technical tools and solutions to some of the open science problems. But there are also philosophical/ethical/moral issues to consider.
  • Humans are the participants who helped produce your data. All humans deserve respect, and so do their data.
  • There’s no easy answer for some of these situations you might face. That’s ok. Part of what open science asks is to consider your options and document your decision making.
  • Reflect early on in your process about what your goals are and how you want to achieve them. What are your values? How do these match up?

Some of the tips and guidelines that I talk about in the webinar:

1. Documentation is so important. It’s also really difficult. Making things clearer for you and your future self will also make them clearer for others who might eventually see your code.
2. A tidy data structure will make things easier for you and easier for others to understand. If you’re not already on board the tidyverse train, it’s never too late to start.
3. Make sure you have a data dictionary somewhere that explains all of your variables. This sounds obvious, but it doesn’t usually happen because in the moment you think you understand all of your variables. But future you will not remember all of those things. Write it down. Preferably in an R Notebook (more on that later).
4. Pick a consistent and clear file naming convention early on in your project (preferably before you begin data collection). Think about the date format you use and the unit of analysis you will care about later, and try to incorporate them directly into your filenames to help with filtering and analysis later on.
5. Of course I want you to visualize your data. Descriptive statistics can sometimes be misleading; visualization is an important step in your process, not just an end product.
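To make the file-naming tip concrete, here is a minimal R sketch. The filenames, sites, and units here are hypothetical (not from the webinar), and it assumes you have the tidyverse packages installed; the point is just that a consistent `<date>_<site>_<unit>` convention lets you recover metadata from the filename itself:

```r
library(tidyverse)

# Hypothetical filenames following a convention: <date>_<site>_<unit>.csv
files <- c("2018-03-01_siteA_student.csv",
           "2018-03-08_siteA_student.csv",
           "2018-03-08_siteB_teacher.csv")

# Because the convention is consistent, the metadata can be pulled
# straight out of the filename with one separate() call
file_info <- tibble(filename = files) %>%
  separate(filename, into = c("date", "site", "unit"),
           sep = "_", remove = FALSE) %>%
  mutate(unit = str_remove(unit, "\\.csv$"),
         date = as.Date(date))

# Filter to just the student-level files before reading them in
student_files <- file_info %>% filter(unit == "student")
```

Using an ISO date format (YYYY-MM-DD) also means plain alphabetical sorting puts the files in chronological order.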

The thing that ties all of this together is using an R Notebook within RStudio. R Notebooks make use of RMarkdown, a flavor of Markdown, my favorite way to write. It is a plain text file, so it’s easy to version control and easy to share, both things that are hugely important when thinking about open science. I really like R Notebooks because you can easily incorporate explanatory text alongside your code, and figures/graphs are persistent across the page, so you can scroll and easily refer back to something above or below where you are working. This, in my opinion, is a much better way to use R than the older way with scripts and the console.
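If you haven’t seen one before, here is a minimal sketch of what an R Notebook file looks like (the title and chunk contents are placeholders): a YAML header, Markdown text, and R code chunks whose output renders inline right below them.

````markdown
---
title: "Analysis notebook"
output: html_notebook
---

Explanatory text in Markdown goes here, right next to the code.

```{r setup}
library(tidyverse)
```

```{r example-plot}
# The figure renders inline below this chunk and stays on the page
ggplot(mtcars, aes(wt, mpg)) + geom_point()
```
````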

R Notebooks can produce an html file that you can send to your colleague or friend who doesn’t have R installed and they will be able to open it up in a browser and see all of your wonderful thoughts and figures. It’s really great. You can also execute code in Python or JavaScript or D3 (or a few other programming languages) in addition to R, so it’s very versatile. There are a lot of output formats as well, including pdf, Word, slide decks, dashboards, and books. And they are all customizable. Check out the RMarkdown website to see all of the options and more details on how they work. For me, they dramatically changed (in a good way) how I do my work.
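Rendering to a shareable file is one function call; as a sketch (the filename here is hypothetical):

```r
library(rmarkdown)

# Render the notebook to a standalone HTML file you can send to a
# colleague who doesn't have R installed
render("analysis.Rmd", output_format = "html_notebook")

# The same source file can target other formats
render("analysis.Rmd", output_format = "word_document")
render("analysis.Rmd", output_format = "pdf_document")
```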

Maybe a good question to leave you with is to try and answer “What is the best way for you to work toward open science?” It doesn’t have to be a big thing; it can be a bunch of small changes over time. This hopefully shouldn’t feel too overwhelming. There are lots of us here to help.

Effect size

There are a lot of different ways to think about comparing two things, or more appropriately perhaps, two sets of things. If they are things we can count, we can easily see which there are more of. If they are more like a score, we can easily see which set has a higher score. We can also fairly easily see what the distribution of the things in each set is, although comparing the distributions is a bit trickier.

Using some basic statistics measures, we can tell whether or not the two sets of things are different from each other using significance testing. This is typically done with a t-test or an analysis of variance (ANOVA) or a similar measure. These types of measures, based on the mean and variance of a set of data points, are simple and easy to calculate (especially with a basic stats program) and have therefore become commonplace in the research literature. But unfortunately, their simplicity ends up hiding a lot of information and potentially interesting nuance.
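As a sketch of that contrast, here is a small R example with simulated data (the numbers are illustrative, not real results): the t-test answers “is there a difference?”, while a standardized effect size like Cohen’s d speaks to how large that difference is.

```r
set.seed(42)  # simulated data, for illustration only

group_a <- rnorm(100, mean = 50, sd = 10)
group_b <- rnorm(100, mean = 53, sd = 10)

# Significance test: is the difference distinguishable from zero?
t.test(group_a, group_b)

# Cohen's d: how big is the difference, in pooled-standard-deviation units?
pooled_sd <- sqrt(((length(group_a) - 1) * var(group_a) +
                   (length(group_b) - 1) * var(group_b)) /
                  (length(group_a) + length(group_b) - 2))
cohens_d <- (mean(group_b) - mean(group_a)) / pooled_sd
cohens_d  # a standardized difference, comparable across studies and scales
```

Two studies can have identical p-values but very different effect sizes (or the reverse), which is part of the nuance that significance testing alone hides.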
Continue reading “Effect size”

Becoming a PI

There are three important letters that you add to your name when you finish your Ph.D. But there are two other letters that are also important to researchers as they begin their careers: P.I. The Principal Investigator is the person in charge of a research project, and becoming one signifies the next step in your career: a funding agency, in most cases relying on a panel of your peers, has selected your research proposal as worthy of a substantial amount of external support. It is basically a sign that other people (you know, people who aren’t trying to help you graduate) think that your work is important and interesting. It’s a really good thing, and the first time you become a P.I. is an important career milestone.
Continue reading “Becoming a PI”

NARST 2014 Presentations

So I have a pretty ambitious schedule at NARST this year. NARST is the annual science education research conference. (It used to stand for something, but doesn’t anymore.) This year, I am involved in not one or two, but six presentations at the conference. And yes, that is a lot. On two of them I am first author on the paper and am presenting at the conference, and on the other four I am one of the co-authors (which comes with a varying amount of responsibility depending on the paper). And these six papers fall into three very different areas, which makes the whole thing even more onerous.
* I am presenting a subset of the final results of the simulation meta-analysis that I’ve been working on for the last year and a half (a subset focused on science, obviously).
* I am presenting some findings relating to a large efficacy study of the PBIS curriculum (my part is focused on analysis of the weekly online implementation logs, but I’ve also been working on analysis of classroom video observations and teacher professional development). This work is largely concerned with teachers’ implementation of the new science Framework and NGSS-related ideas (mostly the integration of scientific practices with content). These papers are part of a related paper set (but there’s also another one that is in its own session).
* I am helping a colleague put together a presentation on analysis of afterschool science materials that we have been working on.
Continue reading “NARST 2014 Presentations”

NARST and AERA 2013

My “spring conference series” just ended. NARST (the conference formerly known as the National Association for Research in Science Teaching) was in early April (in Puerto Rico!) and AERA (American Educational Research Association) was about a week ago (here in San Francisco). Here are my notes and thoughts from the two conferences.

NARST

The big topic of the conference was, of course, the Next Generation Science Standards (NGSS) which officially were released at the tail end of the conference. Most people were referring to it during their presentations, even though we didn’t know exactly what it was yet. (Some people were conflating the new Framework with the new NGSS, but that’s a different story.)

There were a few presentations about one of the large studies that I am working on, an efficacy study of a middle school science curriculum. These presentations on some of our preliminary findings went well and I am really looking forward to next year’s conference when we will have even more results to report on and some awesome graphs to show.

Continue reading “NARST and AERA 2013”