Most of the evaluations I conduct include interview or focus group data. This data provides a sense of student experiences and outcomes as they progress through a program. After collecting this data, we transcribe, read, code, re-read, and recode to identify themes that capture the complex interactions among the participants, the program, and their environment. In our reporting, however, we are often restricted to describing themes and providing illustrative quotes to represent participant experiences. This is an important part of the report, but I have always felt that we could do more.

This led me to think of ways to quantify the transcribed interviews to obtain a broader impression of participant experiences and to compare across interviews. I also came across the idea of crowdsourcing, in which a large number of people are recruited to perform a small, well-defined task, often for payment. For example, a few years ago 30,000 people were asked to review satellite images to locate a crashed airplane. Crowdsourcing has been around for a long time (e.g., the Oxford English Dictionary was crowdsourced), but it has become considerably easier to access the “crowd.” Amazon’s Mechanical Turk (MTurk.com) gives researchers access to over 500,000 people around the world and lets you post specific tasks and have them completed within hours. For example, if you want to test the reliability of a survey or survey items, you can post it on MTurk and have 200 people take the survey (depending on the survey’s length, you can pay them $.50 to $1.00 per response).
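For readers who like to see the mechanics, here is a minimal sketch of posting a survey as an MTurk task (a “HIT”) using Amazon’s boto3 SDK. The survey URL, reward, and counts are illustrative placeholders, and this assumes you host the survey yourself (e.g., on Qualtrics or Google Forms); it is not the only way to post work on MTurk.

```python
# Minimal sketch of posting a survey as an MTurk HIT via boto3.
# The URL, reward, and assignment counts below are placeholders.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Use the sandbox endpoint while testing; switch to production when live.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion wraps a survey hosted elsewhere in an MTurk frame.
external_question = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-pilot-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Short research survey (10-15 minutes)",
    Description="Answer a brief survey about program experiences.",
    Keywords="survey, research, questionnaire",
    Reward="0.75",                       # USD per response
    MaxAssignments=200,                  # number of Workers you want
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=3 * 24 * 60 * 60,  # keep the HIT open for three days
    Question=external_question,
)
print("HIT ID:", hit["HIT"]["HITId"])
```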

The idea of crowdsourcing got me thinking about the kind of information we could get if we had 100, 200, or 300 people read through interview transcripts. For simplicity, I wanted MTurk participants (called Workers on MTurk) to read transcripts and rate, on a Likert scale, students’ experiences in specific programs, as well as select text that they deemed important and illustrative of those experiences. We conducted a series of studies using this procedure and found that the crowd’s average ratings of the students’ experiences were stable and consistent across five different samples. We also found that the text the crowd selected was the same across the five samples. This matters from a reporting standpoint: the selections help identify the most relevant quotes for the report, and the ratings provide a summary of student experiences that can be used to compare different interview transcripts.
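To give a sense of how such ratings can be summarized, here is a small sketch that averages Likert ratings per transcript within each Worker sample and then checks how closely the samples agree. The file name and column names (transcript_id, sample, rating) are hypothetical, and the correlation check is just one simple way to look at stability.

```python
# Sketch of summarizing crowd ratings; file and column names are hypothetical.
import pandas as pd

ratings = pd.read_csv("mturk_ratings.csv")

# Average Likert rating for each transcript within each Worker sample.
per_sample = (
    ratings.groupby(["transcript_id", "sample"])["rating"]
    .mean()
    .unstack("sample")
)
print(per_sample.round(2))

# Quick stability check: how strongly do the samples' transcript-level
# averages correlate with one another?
print(per_sample.corr().round(2))
```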

If you are interested in trying this approach, here are a few suggestions:

1) Make sure that you remove any identifying information about the program from the transcripts before posting them on MTurk, to protect privacy and comply with HSIRB requirements (a rough redaction sketch follows this list).

2) Pay Workers more for tasks that take more time. If a task takes 15 to 20 minutes, I would suggest a minimum payment of $.50 per response. If the task takes more than 20 minutes, I would suggest $.75 to $2.00, depending on how long it takes to complete.

3) Be specific about what you want the crowd to do. There should be no ambiguity about the task (this can be accomplished by pilot testing the instructions and tasks and asking the MTurk participants to provide feedback on their clarity).
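On the first suggestion, a very rough starting point for scrubbing transcripts is sketched below. It assumes you maintain your own list of names and program terms to redact (the terms and file names here are placeholders), and it is not a substitute for a careful manual read of the cleaned transcript.

```python
# Rough sketch of redacting identifying terms from a transcript before posting.
# The terms and file names are placeholders; always re-read the output by hand.
import re

IDENTIFYING_TERMS = ["Jane Doe", "Westside Community College", "Project STEM-UP"]

def redact(text: str, terms: list[str]) -> str:
    """Replace each identifying term with a neutral placeholder."""
    for term in terms:
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text

with open("transcript_raw.txt", encoding="utf-8") as f:
    raw = f.read()

with open("transcript_clean.txt", "w", encoding="utf-8") as f:
    f.write(redact(raw, IDENTIFYING_TERMS))
```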

I hope you found this useful. Please let me know how you have used crowdsourcing in your practice.

About the Authors

Tarek Azzam


Associate Professor, Claremont Graduate University

Tarek Azzam, Ph.D., is an Associate Professor at Claremont Graduate University. Dr. Azzam’s research focuses on developing new methods suited for real-world evaluations. These methods attempt to address some of the logistical, political, and technical challenges that evaluators commonly face in practice. His work aims to improve the rigor and credibility of evaluations and increase their potential impact on programs and policies. Dr. Azzam has also been involved in multiple projects, including the evaluation of student retention programs at the K-12 and university levels, Science, Technology, Engineering, and Math (STEM) education programs, and pregnancy prevention programs.

Creative Commons

Except where noted, all content on this website is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


EvaluATE is supported by the National Science Foundation under grant numbers 0802245, 1204683, 1600992, and 1841783. Any opinions, findings, and conclusions or recommendations expressed on this site are those of the authors and do not necessarily reflect the views of the National Science Foundation.