The XY Files in an MT World

CAPTCHA: More Than Just A Spam-Fighter

Published December 15, 2008 11:33 AM by Jay Vance
Here's one more item to file under "Interesting Facts About The Internet I Never Knew." At one time or another we've all had to fill out one of those "CAPTCHA" thingies where you view distorted text (or listen to distorted audio) and type the words or letters in a little box before you can register on a website. But did you know you were also transcribing at the same time? I sure didn't, until I stumbled across some information from the reCAPTCHA website. It seems that the folks who created CAPTCHAs have now harnessed the power of that little text box to help digitize books and transcribe old radio shows.

Books from the Internet Archive and old editions of the New York Times are being digitized by optical character recognition (OCR) scanning, a technology which is good but not perfect. Most OCR programs flag a word when it may not have been read correctly, so the reCAPTCHA system takes those flagged words and sends them out to the Web in the form of CAPTCHAs for humans to decipher.

Of course, the obvious question is this: if a computer can't read a word, how does the system know whether the human who completed the CAPTCHA interpreted it correctly? Here's how: Each new word which can't be read correctly by OCR is put into a CAPTCHA along with another word for which the answer is already known. The CAPTCHA user is then asked to type both words. If the user correctly types the word for which the answer is known, the system assumes the answer for the new one is probably correct as well. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
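For the curious, the two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the function names, the three-vote minimum and the 75% agreement threshold are all assumptions made up for the example.

```python
from collections import Counter


def normalize(text):
    """Ignore case and stray spaces when comparing typed words."""
    return text.strip().lower()


def record_answer(votes, control_answer, typed_control, typed_unknown):
    """Count a guess for the unknown word only if the known word was typed correctly."""
    if normalize(typed_control) != normalize(control_answer):
        return False  # failed the known word, so the other guess is not trusted
    votes[normalize(typed_unknown)] += 1
    return True


def accepted_reading(votes, min_votes=3, min_share=0.75):
    """Return the agreed reading once enough people agree, else None.

    min_votes and min_share are hypothetical thresholds chosen for this sketch.
    """
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


# "morning" is the word the system already knows; the second word is the one OCR missed.
votes = Counter()
record_answer(votes, "morning", "morning", "upon")
record_answer(votes, "morning", "Morning ", "upon")
record_answer(votes, "morning", "mourning", "apon")  # known word wrong, guess ignored
record_answer(votes, "morning", "morning", "upon")
print(accepted_reading(votes))
```

The point of the design is that nobody's answer for the unreadable word is trusted on its own: it counts only when the same person got the known word right, and it is accepted only after several people agree.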

Is that brilliant, or what?

But in addition to using CAPTCHAs to identify scanned text, the organization has gone one step further and is using the technology to transcribe old radio programs. Instead of using spoken digits or letters, the new audio CAPTCHA presents entire spoken sentences or phrases from old-time radio shows that speech recognition software could not decipher correctly. The resulting "transcription" from humans solving these CAPTCHAs is, in fact, used to transcribe the radio programs. Much as visual CAPTCHAs have helped digitize billions of printed words so far, the audio version will help transcribe large amounts of historical audio content.

If you'd like to see the new audio CAPTCHA in action, use this link and click on the audio button.

(In case you're wondering how the audio CAPTCHA verifies the accuracy of transcribed words, the company says, "The verification algorithm uses a phoneme-based encoding and allows a small number of mistakes." Since I'm not sure what that means, I'll have to take their word for it!)
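One plausible reading of that quote is that the typed phrase and the reference phrase are each boiled down to how they sound, and the two are accepted as a match if they differ in only a few places. The Python below is a rough sketch of that reading and nothing more: the real encoding and tolerance are not described in the source, so `sound_key` is a deliberately crude, made-up stand-in for a phoneme encoding, and `max_mistakes=2` is an arbitrary assumption.

```python
import re


def sound_key(phrase):
    """Crude stand-in for a phoneme encoding (an assumption, not the real one)."""
    text = re.sub(r"[^a-z ]", "", phrase.lower())
    # Fold a few sound-alike spellings together.
    for spelling, sound in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")):
        text = text.replace(spelling, sound)
    text = re.sub(r"(.)\1+", r"\1", text)  # collapse doubled letters
    return text.replace(" ", "")


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions needed."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def close_enough(typed, reference, max_mistakes=2):
    """Accept the typed phrase if its sound key is within a few edits of the reference."""
    return edit_distance(sound_key(typed), sound_key(reference)) <= max_mistakes


print(close_enough("the fone rang twice", "The phone rang twice"))
print(close_enough("good night", "The phone rang twice"))
```

Comparing by sound rather than by spelling means a listener who types "fone" for "phone" is not penalized, while a phrase that sounds nothing like the reference is still rejected.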

What I like about this concept is that it's not just another cool Internet technology (not that there's anything wrong with that!), it's also an example of finding ways to accomplish more than one task in a single operation. Speaking as a human handicapped with a Y chromosome, I'm quite envious of anything that can multitask!

I suppose someone out there may see this as yet another example of the conspiracy to get rid of medical transcriptionists (heard about this one?), but in this case I'd say the turnaround time might be a tad long, so I don't think we have to start worrying just yet.


About this Blog


    Jay Vance, CMT
    Occupation: Medical Transcription Industry Consultant
    Setting: Yuma, AZ
  • About Blog and Author
