Data Visualization

This is the fourth part in an ongoing series on how and why you should be using R, especially if you are a social science researcher or education researcher, like me. If you missed the earlier ones, you can check out part 1 (Intro to R), part 2 (R Basics), and part 3 (Data Cleaning and Manipulation). This post will go into some more specifics relating to data visualization.

There are many ways to visualize your data using R. By far the most popular (and, I think, the most robust and flexible) is the ggplot2 package. This post will talk a bit about why and how to visualize your data, and cover some tips and basics for using R’s ggplot2 package to help you achieve your visualization goals.

Why visualize?

There are lots of reasons why you might want to visualize your data (or rather, why you should visualize your data). It can be a useful tool at various stages of research, and depending on where you are in your analysis process, different aspects of visualization might be more or less important to focus on. The way I see it, there are three main purposes for data visualization: examining your data, showing your data/findings, and sharing your data/findings.

What question are you trying to answer with your data? How can a visualization help you answer that? Do you have a really complex data set that is too hard to easily capture with a few numbers? Are you interested in variation and distribution rather than just means and medians? Are you exploring different relationships between variables and want to see how they interact?


Data Cleaning and Manipulation/Organization

This is the third part in an ongoing series on how and why you should be using R. If you missed the earlier ones, you can check out part 1 (Intro to R) and part 2 (R Basics). This post will go into some more specifics relating to data cleaning, organization, and manipulation.

In my opinion, the dplyr package is a game changer for those trying to learn R. It is what moved me from just recommending that people use R to basically demanding that my friends and co-workers switch to R. I remember the day that I finally got around to learning how to use the package’s functionality and all of the ways in which it lets you easily and clearly manipulate your data frames. I just kind of stared at my computer screen and imagined how much better my data-life was going to be with these tools. I realized that the hours and hours I used to spend in Excel trying to massage my data into the right form were over. Also, I wouldn’t have to decipher weird base R code anymore when trying to create new variables or filter datasets. The dplyr package and its friends make your code/scripts much easier to read, which will help both you and future you in trying to decipher what is going on.


10,000 Tweets

I have been on Twitter for almost ten years. Twitter has changed a lot in that time and my enthusiasm for the platform has waned a bit over the years, but I still find it to be a compelling communication platform. Initially I used it to share the more mundane, personal parts of life and my stresses as I finished graduate school. Lately it’s become more professionally focused (most of the time) and more reflective of the many things that are happening in the world (but with important dog pictures also). I have met lots of people through Twitter, and have listened to and learned from thousands of people whom I would never have met in my day-to-day life. It has helped me gain a wider audience for my academic work and has allowed me to share pictures of my awesome dog with strangers and friends alike.

I just hit 10,000 tweets (if I did this correctly, the tweet linking to this post should be number 10,000), and I thought it would be a good opportunity to go back through my Twitter archive and get a sense of what all of those tweets were about and how I tweeted. (The analysis that follows actually covers only my first 9,945 tweets, because I had to request my archive a couple of weeks ago and then do the actual analysis.) This was also a fun R exercise for me.
