Thinking about Transcripts

Since I am working on transcripts and transcription, I decided to re-read some research articles and chapters about this process. First, I read Chapter 4 in Ellingson and Sotirin (2020), on Engaging Transcripts. Second, I looked for articles referenced in that chapter (see the reference list below) that might illuminate or clarify my thinking about transcripts and transcription processes.

My first decision was how to transcribe the digital audio files. Rather than spending an inordinate number of hours doing the transcription myself, or paying another trusted person to do this work (which appealed to me less), I decided early in the design of the research to use Otter.AI. I tested the application, discovered the limitations of the free version, and decided to pay the annual fee for the service for the duration of my dissertation work. Each audio file, once received from the Zoom recording, was uploaded into Otter.AI, and the transcription was usually available within a few hours. Once notified, I downloaded the word-processed version of the transcript. As I reviewed each transcript while listening to the video recording of the interview, I noticed that the Otter.AI software was particularly effective at removing extraneous vocalizations such as um, ah, eh, and other nonsensical utterances. As Ellingson and Sotirin (2020) note, people do not talk in punctuated phrases, yet the software was also adept at rendering the nuanced pauses, breaks, and full stops in the transcribed document. While I understand that “accuracy is a naive standard of quality in transcription,” I am cognizant that, even as I correct and revise each text document, the transcribed versions of the video and audio records of the interviews are “better understood as interpretive and translational” (Ellingson & Sotirin, 2020, p. 58).
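As an aside, the kind of filler-word cleanup described above can be illustrated with a small script. This is a minimal sketch only, assuming a plain-text export of a transcript; the filler list, file handling, and function name are my own illustrative choices and not how Otter.AI actually works.

```python
import re

# Illustrative only (not Otter.AI's method): strip common filler
# vocalizations from a plain-text transcript export.
FILLERS = r"\b(um+|uh+|ah+|eh+|er+|hmm+)\b[,.]?\s*"

def remove_fillers(text: str) -> str:
    """Remove filler words and collapse the whitespace they leave behind."""
    cleaned = re.sub(FILLERS, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

if __name__ == "__main__":
    sample = "Um, I think, uh, the idea was, hmm, still forming."
    print(remove_fillers(sample))  # -> "I think, the idea was, still forming."
```

In practice I relied on the software’s own output and my manual review rather than any scripted cleanup; the sketch simply makes visible what is being removed.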

My next task, prior to sharing the audio file and transcribed text document back with each participant, was to re-listen and revise the transcript to clean up transcription errors. I focused on the accuracy of the words captured on the static page, as well as on the deliberate, accidental, and unavoidable alterations (Ellingson & Sotirin, 2020) I was making to each document. Correcting punctuation means attending to pauses and breaks while also allowing run-on thoughts to evolve without adding punctuation. In this way, the transcript better represents the ideation and thought-streams rather than becoming something they were not: fully formed ideas instead of wonderings or ponderings by the participant, or prompting by me, the researcher, evoking a response. When cleaning up the transcripts, I am mindful that much is lost in the translation from video to text. Ellingson and Sotirin call my attention to the “verbal disfluencies” evident in the video but not captured in the text, and to the “loss of vital clues such as hesitations, pauses, or awkward struggles to rephrase a passage” (p. 56). I recall one interviewee’s self-deprecating laughter over an idea that came to mind in the midst of our conversation, which I later encoded into the transcript so it would not be glossed over as insignificant; it was a particularly telling moment in the interview.

Ellingson and Sotirin (2020) caution that transcriptions are fundamentally different data objects than the video and audio recordings from which they are derived. Transcripts can take on an “air of truthfulness” (p. 56) that suggests these documents are definitive and authoritative versions of the events, when in fact a process of translation has occurred. Treating the transcribed text as the dominant and primary version can leave the video recording of the event viewed as secondary data with less merit. By engaging with the data as Ellingson and Sotirin (2020) suggest, through mapping and translating, I approach the transcription process playfully and in creative ways.

One of the creative measures I used was word cloud software (WordArt) to render each transcript into an interactive image, which I captured in two ways: a flat PNG image file and a 60-second screencast video of the interactive image. Both are retained with the transcript, and both are shared back with the participant along with the transcribed text document and the audio file of the recorded interview. In this way, I enliven the transcription into a playful object for reflection. I have collated these into a Flickr collection as well as a single document, which lets me see at a glance the similarities and differences among the image renderings. [Use this link to see the word cloud collection on Flickr. Use this link to see all the word cloud images in one PDF file.]
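WordArt itself is a browser-based tool, so I did not script this step; still, as a rough offline analogue, a flat PNG rendering like the one described above could be generated with the open-source wordcloud Python package. The file names and settings below are illustrative assumptions, not my actual workflow.

```python
from pathlib import Path

from wordcloud import WordCloud, STOPWORDS

# Illustrative analogue to the WordArt step: render a transcript into a
# flat PNG word cloud. File names here are placeholders.
transcript_text = Path("interview_01_transcript.txt").read_text(encoding="utf-8")

cloud = WordCloud(
    width=1600,
    height=900,
    background_color="white",
    stopwords=STOPWORDS,   # drop common English function words
    collocations=False,    # count single words rather than two-word pairs
).generate(transcript_text)

cloud.to_file("interview_01_wordcloud.png")
```

A sketch like this produces only the static image; the interactive, explorable rendering and the screencast remain products of the WordArt interface itself.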

At this point, as I nestle down into the coding process and begin the analysis of the data, I relish the opportunity to return to the interview video files and do as Ellingson and Sotirin suggest: linger in this liminal space to “continually intra-act within an assemblage of discourse, material objects, methodologies, and other data” (p. 60). If my estimates are correct, this process will take up to 120 hours spread over as many as six weeks, roughly 20 hours per week. I make efforts to trouble the flatness while I embark on a cycle of revisiting, repeating, revising, refining, and reflecting (p. 67). Seeking a deeper understanding of the data, I will re-listen to each interview and capture an “embodied transcript,” one that is rich in detail, attentive to rhythm and cadence, pacing, pitch, intonation, emphasis on words or phrases, and the non-verbal body language evident in the video recording. These observations will be added to the NVivo transcripts as memos, which can then be further analyzed for meaning.

Throughout this process, I will pay attention to and trace my moments of wondering, my playful engagements with the data, the places where I pause to reflect, and the points where creative possibilities emerge. I end this reflection with words from Cannon (2018):

“I wonder. I freeze, incompetent. I resort to poetry. There, it is always a failure and always a truth. Multiplicity lives there, of interpretations, of meanings. In the multiplicity, I am moving, yet still, stammering to say what is true, and knowing that is an impossible task.”

Cannon, 2018, p. 547

References

Cannon, S. O. (2018). Teasing transcription: Iterations in the liminal space between voice and text. Qualitative Inquiry, 24(8), 571–582. https://doi.org/10.1177/1077800417742412

Ellingson, L., & Sotirin, P. (2020). Making data in qualitative research: Engagements, ethics, and entanglements (1st ed.). Routledge.

Lupton, D. (2016). Digital companion species and eating data: Implications for theorising digital data–human assemblages. Big Data & Society, 3(1), 2053951715619947. https://doi.org/10.1177/2053951715619947

Lupton, D. (2018). How do data come to matter? Living and becoming with personal data. Big Data & Society, 5(2), 2053951718786314. https://doi.org/10.1177/2053951718786314

MacLure, M. (2013). The wonder of data. Cultural Studies ↔ Critical Methodologies, 13(4), 228–232. https://doi.org/10.1177/1532708613487863

Weller, S. C., Vickers, B., Bernard, H. R., Blackburn, A. M., Borgatti, S., Gravlee, C. C., & Johnson, J. C. (2018). Open-ended interview questions and saturation. PLOS ONE, 13(6), e0198606. https://doi.org/10.1371/journal.pone.0198606