Every so often, the Oral History Discussion List [H-ORALHIST@H-NET.MSU.EDU] posts a hopeful query from a newbie, asking for advice on which voice recognition software to use to transcribe an interview. The last one prompted me to respond, as follows.

NO computer will register the nuanced meaning of voice tone, rhythms, breathing, expression and non-verbal utterances such as sobs or catches in the throat. The meaning of oral history is about much more than words. We’ve had this audio vs print transcript debate for decades now. I forget who said this, but it expresses the point well: ‘a transcript is a map, but audio is a landscape’. Using a computer to generate a transcript would be like using a mud map: it gives only a very limited sense of the landscape.

I favour the timed summary-audio retrieval system used by the National Library of Australia. This allows you to browse an interview or collection digitally by keywords, then click to bring up the associated audio segment. (A print transcript may or may not be available as an ancillary text.) By LISTENING to the audio, you get a rounded sense of the speaker as well as what is said and how it is said. The keywords are manually entered by the interviewer, accompanied by a brief summary.
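For the technically minded, the idea behind a timed summary can be sketched in a few lines of Python. This is only an illustration of the principle, not the National Library's actual software: the names `Segment`, `find_segments` and `timestamp` are my own inventions, and the sample interview data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One entry in a timed summary (hypothetical structure)."""
    start_seconds: int                 # where this segment begins in the audio
    summary: str                       # brief summary written by the interviewer
    keywords: set[str] = field(default_factory=set)  # manually entered keywords


def find_segments(segments, keyword):
    """Return the segments tagged with the keyword, ignoring case."""
    wanted = keyword.strip().lower()
    return [s for s in segments if wanted in {k.lower() for k in s.keywords}]


def timestamp(seconds):
    """Format a number of seconds as HH:MM:SS, the point to cue the audio."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up sample data, for illustration only.
interview = [
    Segment(0, "Childhood and schooling", {"family", "school"}),
    Segment(754, "First job in the laboratory", {"work", "science"}),
    Segment(1980, "Reflections on a research career", {"science", "colleagues"}),
]

for segment in find_segments(interview, "science"):
    print(timestamp(segment.start_seconds), segment.summary)
```

The point of the design is that the keyword search only finds the place in the recording. The listener still has to play the audio from that timestamp to hear what was said and how.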

See here as an example: interview with Howard Florey, the Australian scientist who helped develop penicillin into a usable medicine.


This timed summary/audio retrieval methodology has also been built into a current major oral history project, Australian Generations (in which I play a small part as a field interviewer), led by Professor Alistair Thomson of Monash University in Melbourne. The project's website summarises the theoretical debates about the use of audio vs. transcript. See http://arts.monash.edu.au/australian-generations/project/significance/index.php

New practitioners, brace yourselves: (1) oral history is all about LISTENING. Once, in the interview itself; and afterwards, again and again, as you seek to interpret and understand what was said. (2) There is no way of listening to someone EXCEPT IN REAL TIME. That’s partly why oral history is so revelatory – because we do someone the courtesy and privilege of listening to them wholeheartedly for an extended period, which creates an intimate and unique space between two people, and allows for openness, reflection, disclosure and discussion. To cut short the process in the second, post-interview, phase by bringing in a mechanical interpretation via voice recognition software is to traduce the human exchange at the heart of oral history.

As to what computers CAN do, check out this story:

Can an Algorithm Write a Better News Story Than a Human Reporter?