Speech Recognition That Isn't?
There's a story out of Europe today (links here and here) about the speech-to-text service
Spinvox. The BBC is alleging that, rather than using advanced speech
recognition to transcribe voice messages into text, as the service claims, the
vast majority of the transcription is actually being done by transcriptionists
in South Africa, the Philippines and
elsewhere.
The Spinvox website makes this statement (emphasis mine):
How does it do it? It
captures spoken words and feeds them into a Voice Message Conversion System,
known as 'D2' (the Brain), and spits them out as text content.
So D2's pretty smart.
It's bound to be, as D2's a combination of artificial intelligence, voice
recognition and natural linguistics. But it also knows what it doesn't know and
is able to call on human experts for assistance. It learns all the time about
how we speak, and what we say, from the mundane to the ridiculous and so is able to convert what you mean to
say.
"Able to convert what you mean to say." Isn't that the holy grail of speech
recognition? Wouldn't it be something if
technology could really do that?
Only problem is, it can't. While Spinvox won't divulge exactly what percentage of its
transcriptions are done by humans rather than computers, the BBC is reporting
that its sources say transcriptions at one call center in Egypt were done
"100% by people." According to
Kareem Lucilius, who says he worked at the call center for six months, "We
heard the message from the very beginning to the very end. Love messages,
secret messages, messages with sexual content, even people threatening to kill
each other." Another source within
the company has told the BBC that the vast majority of messages are converted to
text by humans rather than by speech recognition technology.
The Spinvox story is certainly interesting in its own right,
but what I found particularly remarkable was a quote
(second link above) from a solutions architect at Nuance, obviously a
competitor of Spinvox. John West is
quoted as saying, "In Nuance's view,
this task [transcribing phone dictations] will never be able to be totally automated in the near future. You cannot control the person leaving the
voicemail, or the environmental factors.
Spinvox is offering something that is impossible to deliver now."
Oh?
So let me get this straight. Someone who works for Nuance, home of Dictaphone, eScription, and Dragon
NaturallySpeaking, is admitting that not being able to control either the
person dictating or the environment in which they dictate means that
transcription via speech recognition technology (SRT) will "never be able to be
totally automated in the near future." In other words, as long as doctors are being told by SRT salespeople
that they can continue to dictate "just like they always have" in
busy corridors, noisy offices or in cars with the windows open, MT editors can
rest easy that there will always be a need for our services in the foreseeable
future.
Good to know.