It is so easy to ask AI to do everything for you at the moment, from writing articles (this is definitely not AI – my very sore left wrist can attest to this!) through to producing summaries of meetings, providing information and creating new documents and emails. All good and really helpful, and I am sure most organisations and individuals are now using AI on a very regular basis to improve their productivity and work levels. However, what about transcription?
We are finding time and again that when business clients send over AI-produced transcriptions (academic clients rarely use AI due to the lack of data security and institution prohibitions), the transcription doesn’t just contain errors, which are understandable; it also contains hallucinations and thoughts from the imagination of the AI model being used.
AI Hallucinations
A study completed some time ago looked at Whisper, the audio-to-text model from OpenAI (the company behind ChatGPT), and showed its propensity to make up text when transcribing audio. The invented content went beyond simple errors and included quite disturbing content bearing no similarity to the actual recording. Full details of this study can be found here in our earlier article:
It is almost as if the AI model is programmed to please human users, and it assumes that human users want it to help by interpreting the audio and then creating something more interesting than the content actually recorded. Fine if you are writing a story or a screenplay, I guess, but definitely not for accurate transcription of an audio recording!
So at present there is very much a need for human transcription. Although AI transcription has its place for dictation and creating rough transcripts for 1 or 2 speakers, it really struggles with multiple speakers, accents, local language use, hard-to-hear voices, background noise, formatting and remembering not to make up sections of text. In fact it struggles to cope with over 95% of our work, which tends to involve one or more of the above. Also, humans obviously don’t have to remember not to make up sections of text – although we want to please our clients, we don’t tend to hallucinate in the same way AI does!
We very much doubt that AI in its current form is going to be able to compete with humans for academic, research and sensitive transcription of audio and video recordings. Whether or not anyone can train an AI model to accurately transcribe accents, hard-to-hear recordings and multiple speakers remains to be seen.






