Most of the evaluations I conduct include interview or focus group data. These data give a sense of students’ experiences and outcomes as they progress through a program. After collecting the data, we transcribe, read, code, re-read, and recode to identify themes that capture the complex interactions among the participants, the program, and their environment. In our reporting, however, we are often limited to describing themes and providing illustrative quotes to represent participant experiences. That is an important part of the report, but I have always felt we could do more.
This led me to think about ways to quantify the transcribed interviews so we could get a broader impression of participant experiences and compare across interviews. I also came across the idea of crowdsourcing, which means paying a large number of people to each perform a very specific task. For example, a few years ago 30,000 people were asked to review satellite images to locate a crashed airplane. Crowdsourcing has been around for a long time (e.g., the Oxford English Dictionary was crowdsourced), but it has become considerably easier to access the “crowd.” Amazon’s Mechanical Turk (MTurk.com) gives researchers access to over 500,000 people around the world. It allows you to post specific tasks and have them completed within hours. For example, if you want to test the reliability of a survey or survey items, you can post it on MTurk and have 200 people take it (depending on the survey’s length, you can pay them $.50 to $1.00 each).
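If you prefer to script the posting step rather than click through the MTurk web interface, Amazon also exposes a Requester API (available in Python through the boto3 library). The sketch below is only an illustration of what posting a rating task might look like, not a description of our setup: the question file, reward amount, and assignment counts are placeholders I made up, and it points at MTurk’s sandbox so nothing is charged while you test.

```python
import boto3

# Sandbox endpoint: lets you post test HITs without paying Workers.
# Remove endpoint_url (or switch to the production endpoint) when you go live.
MTURK_SANDBOX = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"

mturk = boto3.client("mturk", region_name="us-east-1", endpoint_url=MTURK_SANDBOX)

# A pre-built HTMLQuestion form (hypothetical file) that shows the
# de-identified transcript, the Likert-scale rating items, and a box
# where the Worker pastes the quote they found most illustrative.
with open("rating_task.xml") as f:
    question_xml = f.read()

hit = mturk.create_hit(
    Title="Read a short interview transcript and rate the student's experience",
    Description="Read a 2-3 page transcript, answer a few rating questions, "
                "and copy the passage you found most important.",
    Keywords="reading, rating, transcript, survey",
    Reward="0.75",                        # USD per assignment (placeholder amount)
    MaxAssignments=200,                   # how many Workers you want
    LifetimeInSeconds=7 * 24 * 3600,      # how long the HIT stays visible
    AssignmentDurationInSeconds=30 * 60,  # time each Worker has to finish
    Question=question_xml,
)

print("HIT posted:", hit["HIT"]["HITId"])
```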
The idea of crowdsourcing got me thinking about the kind of information we could get if we had 100, 200, or 300 people read through interview transcripts. For simplicity, I wanted MTurk participants (called Workers on MTurk) to read transcripts and rate (using a Likert scale) students’ experiences in specific programs, as well as select text that they deemed important and illustrative of those experiences. We conducted a series of studies using this procedure and found that the crowd’s average ratings of the students’ experiences were stable and consistent across five different samples of Workers. We also found that the text the crowd selected was the same across those five samples. This matters from a reporting standpoint: it helped us identify the most relevant quotes for the reports, and the ratings provided a summary of the student experiences that could be used to compare different interview transcripts.
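To give a concrete sense of what checking for stability can look like once the assignments come back, here is a minimal sketch of one way to summarize the crowd’s ratings. The file name and column names are hypothetical; the idea is simply to compare each transcript’s average rating across independent samples of Workers.

```python
import pandas as pd

# Hypothetical export of Worker responses: one row per rating, with the
# transcript that was read, the Worker sample (batch), and the 1-5 Likert
# rating of the student's experience.
ratings = pd.read_csv("worker_ratings.csv")  # columns: transcript, sample, rating

# Mean rating per transcript within each sample.
means = ratings.groupby(["transcript", "sample"])["rating"].mean().unstack("sample")
print(means.round(2))

# How stable are the crowd's averages? If the per-transcript means from
# different samples track each other closely (high pairwise correlations,
# small spread), the crowd's summary judgment is consistent.
print(means.corr().round(2))       # sample-by-sample correlation matrix
print(means.std(axis=1).round(2))  # spread of each transcript's means across samples
```

If the per-transcript means line up closely from sample to sample, you can be more comfortable treating the crowd’s average as a summary score for that transcript.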
If you are interested in trying this approach out, here are a few suggestions:
1) Make sure that you remove any identifying information about the program from the transcripts before posting them on MTurk (to protect privacy and comply with HSIRB requirements); see the redaction sketch after this list.
2) Pay Workers more for tasks that take more time. If a task takes 15 to 20 minutes, I would suggest a minimum payment of $.50 per response. If the task takes more than 20 minutes, I would suggest paying $.75 to $2.00, depending on how long it takes to complete.
3) Be specific about what you want the crowd to do. There should be no ambiguity about the task (you can accomplish this by pilot testing the instructions and tasks and asking Workers to give you feedback on how clear the instructions are).
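On the first suggestion, here is a minimal sketch of how you might scrub known identifiers from a transcript before posting it. The program name, people, and file names below are made up for illustration, and automated scrubbing should always be followed by a manual read-through before anything goes to MTurk.

```python
import re

# Hypothetical list of identifiers to scrub: the program name, site names,
# and any staff or student names mentioned in the interviews.
IDENTIFIERS = ["Bronson Scholars Program", "Dr. Alvarez", "Westside Campus"]

def redact(text: str, identifiers=IDENTIFIERS, placeholder="[REDACTED]") -> str:
    """Replace each known identifier with a neutral placeholder (case-insensitive)."""
    for name in identifiers:
        text = re.sub(re.escape(name), placeholder, text, flags=re.IGNORECASE)
    return text

with open("transcript_03.txt") as f:
    clean = redact(f.read())

with open("transcript_03_deidentified.txt", "w") as f:
    f.write(clean)
```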
I hope you found this useful, and please let me know how you have used crowdsourcing in your practice.