Health Care POV
The XY Files in an MT World

Speech Recognition That Isn't?

Published July 23, 2009 1:39 PM by Jay Vance
There's a story out of Europe today (links here and here) about the speech-to-text service Spinvox. The BBC is alleging that, rather than using advanced speech recognition to transcribe voice messages into text, as the service claims, the vast majority of the transcription is actually being done by transcriptionists in South Africa, the Philippines and elsewhere. 

The Spinvox website makes this statement (emphasis mine):

How does it do it? It captures spoken words and feeds them into a Voice Message Conversion System, known as ‘D2’ (the Brain), and spits them out as text content.

So D2's pretty smart. It's bound to be, as D2's a combination of artificial intelligence, voice recognition and natural linguistics. But it also knows what it doesn't know and is able to call on human experts for assistance. It learns all the time about how we speak, and what we say, from the mundane to the ridiculous and so is able to convert what you mean to say.

"Able to convert what you mean to say." Isn't that the holy grail of speech recognition? Wouldn't it be something if technology could really do that?

Only problem is, it can't. While Spinvox won't divulge exactly what percentage of its transcriptions are done by humans rather than computers, the BBC is reporting that its sources say transcriptions at one call center in Egypt were done "100% by people." According to Kareem Lucilius, who says he worked at the call center for six months, "We heard the message from the very beginning to the very end. Love messages, secret messages, messages with sexual content, even people threatening to kill each other." Another source within the company has told the BBC that the vast majority of messages are converted to text by humans rather than by speech recognition technology. 

The Spinvox story is certainly interesting in its own right, but what I thought was particularly remarkable about this story was a quote (second link above) from a solutions architect at Nuance, obviously a competitor of Spinvox. John West is quoted as saying, "In Nuance's view, this task [transcribing phone dictations] will never be able to be totally automated in the near future. You cannot control the person leaving the voicemail, or the environmental factors.  Spinvox is offering something that is impossible to deliver now."

Oh?

So let me get this straight. Someone who works for Nuance, home of Dictaphone, eScription, and Dragon NaturallySpeaking, is admitting that not being able to control either the person dictating or the environment in which they dictate means that transcription via speech recognition technology will "never be able to be totally automated in the near future." In other words, as long as SRT salespeople keep telling doctors they can continue to dictate "just like they always have" in busy corridors, noisy offices, or cars with the windows open, MT editors can rest easy knowing there will be a need for our services for the foreseeable future.

Good to know.


2 comments

From telecoms.com today comes news of a statement from SpinVox regarding the ongoing brouhaha over questions

July 27, 2009 11:11 PM

As I always say, "Unless artificial intelligence equal to or far better than the human brain is invented one day, complete automation in speech recognition is impossible."

Raj July 23, 2009 10:13 PM


About this Blog


    Jay Vance, CMT
    Occupation: Medical Transcription Industry Consultant
    Setting: Yuma, AZ
