I recently came across a set of data cleaning tips in Excel from EvaluATE, which provides support for people looking to improve their evaluation practice.
As I looked through the tips, I realized that I could show how to do each of the five in R. Many people come to R from Excel, so having a set of Excel-to-R equivalents (also see this post on a similar topic) is helpful.
The tips are not intended to be comprehensive, but they do show some common things that people do when cleaning messy data. I did a live stream recently where I took each tip listed in the document and showed its R equivalent.
As I mention at the end of the video, while you can certainly do data cleaning in Excel, switching to R makes your work reproducible. Say you have some surveys that need cleaning today. You write your code and save it. Then, when you get 10 new surveys next week, you can simply rerun your code, saving yourself countless points and clicks in Excel.
You can watch the full video at the very bottom or go to each tip using the videos immediately below. I hope it’s helpful in giving an overview of data cleaning in R!
Tip #1: Identify all cells that contain a specific word or (short) phrase in a column with open-ended text
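For readers who prefer text to video, here is a minimal base R sketch of this tip. It is not necessarily the approach taken in the video, and the `responses` data frame, its column names, and the search word are all made up for illustration.

```r
# Hypothetical open-ended survey comments (illustrative data only).
responses <- data.frame(
  id = 1:4,
  comment = c("The workshop was great", "Too long",
              "Great speakers, poor venue", NA),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE where the pattern is found and FALSE otherwise,
# including for missing values. ignore.case = TRUE matches "great" and "Great".
responses$mentions_great <- grepl("great", responses$comment, ignore.case = TRUE)

# Keep only the rows whose comment contains the word.
great_rows <- responses[responses$mentions_great, ]
print(great_rows)
```

Because the flag is stored as its own column, you can count matches with `sum(responses$mentions_great)` as well as filter on it.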
Tip #2: Identify and remove duplicate data
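One way to do this in base R is with `duplicated()`, sketched below. The `surveys` data frame and its columns are hypothetical, and the video may use different functions.

```r
# Hypothetical survey data with one exact repeat (rows 2 and 3)
# and one respondent who appears twice with different scores (A01).
surveys <- data.frame(
  respondent = c("A01", "A02", "A02", "A03", "A01"),
  score = c(4, 5, 5, 3, 2),
  stringsAsFactors = FALSE
)

# duplicated() flags rows that exactly repeat an earlier row across all columns.
dup_flags <- duplicated(surveys)

# Identify: look at the repeated rows before dropping anything.
print(surveys[dup_flags, ])

# Remove: keep the first occurrence of each row.
deduped <- surveys[!dup_flags, ]

# Stricter variant: treat a repeated respondent ID as a duplicate
# even when the other columns differ.
deduped_by_id <- surveys[!duplicated(surveys$respondent), ]
```

Which variant is right depends on what counts as a duplicate in your data: an identical row, or a second submission from the same person.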
Tip #3: Identify the outliers within a data set
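There are many definitions of an outlier. The sketch below assumes the common 1.5 × IQR rule (the same one boxplots use) and made-up `hours` values; it is only one option and may differ from what the video shows.

```r
# Hypothetical numeric column, e.g. hours reported per respondent.
hours <- c(2, 3, 3, 4, 4, 5, 5, 6, 40)

# First and third quartiles, with names dropped for cleaner arithmetic.
q <- unname(quantile(hours, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Assumed rule: anything more than 1.5 * IQR beyond the quartiles is an outlier.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

is_outlier <- hours < lower | hours > upper
print(hours[is_outlier])
```

Flagging outliers in a logical vector, rather than deleting them right away, lets you inspect each one and decide whether it is a typo or a genuine extreme value.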
Tip #4: Separate data from a single column into two or more columns
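A minimal base R sketch of splitting one column into two, assuming every value contains exactly one space between the parts. The `people` data frame and its column names are hypothetical, and the video may take a different route.

```r
# Hypothetical data: full names stored in a single column.
people <- data.frame(
  full_name = c("Ada Lovelace", "Grace Hopper", "Alan Turing"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector of pieces per row.
parts <- strsplit(people$full_name, " ", fixed = TRUE)

# Pull the first and second piece of each row into their own columns.
people$first_name <- vapply(parts, function(p) p[1], character(1))
people$last_name  <- vapply(parts, function(p) p[2], character(1))

print(people)
```

Names with more than two parts would need extra handling, since this sketch keeps only the first two pieces.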
Tip #5: Categorize data in a column, such as class assignments or subject groups
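One simple way to categorize values in base R is a named lookup vector, sketched here. The course names, the subject groups, and the "Other" fallback are all assumptions for illustration, not taken from the original tips or the video.

```r
# Hypothetical class assignments.
classes <- data.frame(
  course = c("Algebra I", "Biology", "Geometry", "Chemistry", "Art"),
  stringsAsFactors = FALSE
)

# Named vector: names are the raw values, elements are the categories.
subject_lookup <- c(
  "Algebra I" = "Math",
  "Geometry"  = "Math",
  "Biology"   = "Science",
  "Chemistry" = "Science"
)

# Indexing by name maps each course to its group; unmatched courses become NA.
classes$subject <- unname(subject_lookup[classes$course])

# Give unmatched courses an explicit fallback category.
classes$subject[is.na(classes$subject)] <- "Other"

print(table(classes$subject))
```

Keeping the mapping in one place means that adding a new course next week is a one-line change before rerunning the script.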
Full Video
Except where noted, all content on this website is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.