"I'm sorry Dave, I'm afraid I can't do that."

Talk to your computer and it might just understand you.

Talking to your computer has been a staple of science fiction since at least the 1960s, but it looks as if it's finally coming within reach.

This week saw the release of the first speech recognition software capable of handling continuous speech without the user having to train it in advance: Nuance's Dragon NaturallySpeaking (DNS) version 9.

For anyone else who, like me, tried IBM ViaVoice or Dragon Dictate a few years ago, found it awkward to get the system used to your voice, and even more awkward to speak in a staccato word-by-word fashion, this is a huge leap forward.

It means that anyone can simply start talking to any speech-equipped PC and expect it to recognise what they're saying. That's continuous speech, not one. Word. At. A. Time. It could be text to go into a document or email message, or commands for Windows or a Windows app, to open a file or close a pane, say.

It's still not perfect - it can't handle umms and errs very well, nor can it spot pauses and automatically insert punctuation, so you need to switch your speech pattern into what we might call "dictation mode."

It still needs a good microphone or headset too, and while it now supports Bluetooth wireless headsets, developer Nuance has so far only found two models good enough to certify.

However, for managers and others who are already used to dictating, this could at last relegate the keyboard to second-class status.

Converging advances
So what has changed in the world of speech recognition to make all this possible? It's a convergence of a set of technological advances, says Nuance marketing manager Steven Steenhaut - from DSPs for better microphones and noise cancellation, through cheaper and more plentiful memory, to CPU chips powerful enough to handle the immense processing load, even on a PDA.

And of course, there are ever-better software algorithms and language models: software can now learn on the fly, for example, without the need for active training.

"The core concepts haven't changed since the technology's inception - it has to rely on those," Steenhaut says. "The advances are a combination of hardware and software, for example the hardware vendors are very focused on increasing their noise cancellation capabilities, and we have created noise cancelling algorithms too."

Other advances include software to recognise bigrams and trigrams - pairs and triplets of words that commonly occur together, and can therefore be used to improve the recognition process.

"It uses a statistical model on top of the acoustic model. It looks at how you create your vocabulary and adapts the statistical model to you," Steenhaut notes.

The same speech recognition technology is also being applied in call centres, where it allows simple enquiries to be handled by computer, in cars for access to information, and on mobile phones. The limited processing power and memory of the latter mean you need to speak slowly, though, and the vocabulary is more limited, but for text messaging it can be OK.

Medical dictation
Nuance now owns Dictaphone as well, and sees a huge opportunity in layering specialist vocabularies - for lawyers, surgeons or radiographers, say - on top of its speech recognition technology. It says $15 billion a year is spent world-wide on manual transcription within healthcare, so anything it can get of that is worthwhile.

Of course, there are still things it can't do - and one of them is transcribe meetings where more than one voice is present. There is technology to recognise voice-prints, and it is already being used in security applications, but Steenhaut says it's not yet ready for a broader market.

In the meantime, he points to an Australian company which has addressed the problem of recording meetings by developing a system with Dragon software and multiple microphones.

"I think acceptance will broaden significantly one people realise there's no need now to sit down and read scripts," he says. "It makes it accessible to a much wider audience.

"At the moment the focus is to improve productivity for people who create documents. The average speed for dictation is 160 words per minute, versus 50 for a typist."

"It has improved so much through the versions, you can be much more relaxed now," he adds. "Conversational speech is still a way off, but that's definitely the way the technology is evolving."

