"I'm sorry Dave, I'm afraid I can't do that."

Talk to your computer and it might just understand you.

Talking to your computer has been a staple of science fiction since at least the 1960s, but it looks as if it's finally coming within reach.

This week saw the release of the first speech recognition software capable of handling continuous speech without the user having to train it in advance: Nuance's Dragon NaturallySpeaking (DNS) version 9.

For anyone else who, like me, tried IBM ViaVoice or Dragon Dictate a few years ago, found it awkward to get the system used to your voice, and even more awkward to speak in a staccato word-by-word fashion, this is a huge leap forward.

It means that anyone can simply start talking to any speech-equipped PC and expect it to recognise what they're saying. That's continuous speech, not one. Word. At. A. Time. It could be text to go into a document or email message, or commands for Windows or a Windows app, to open a file or close a pane, say.

It's still not perfect - it can't handle umms and errs very well, nor can it spot pauses and automatically insert punctuation, so you need to switch your speech pattern into what we might call "dictation mode."

It still needs a good microphone or headset too, and while it now supports Bluetooth wireless headsets, developer Nuance has so far only found two models good enough to certify.

However, for managers and others who are already used to dictating, this could at last relegate the keyboard to second-class status.

Converging advances
So what has changed in the world of speech recognition to make all this possible? It's a convergence of technological advances, says Nuance marketing manager Steven Steenhaut - from digital signal processors (DSPs) for better microphones and noise cancellation, through cheaper and more plentiful memory, to CPU chips powerful enough to handle the immense processing load, even on a PDA.

And of course, the software algorithms and language models keep getting better: for example, the software can now learn on the fly, without the need for active training.

"The core concepts haven't changed since the technology's inception - it has to rely on those," Steenhaut says. "The advances are a combination of hardware and software, for example the hardware vendors are very focused on increasing their noise cancellation capabilities, and we have created noise cancelling algorithms too."

Other advances include software to recognise bigrams and trigrams - pairs and triplets of words that commonly occur together, and can therefore be used to improve the recognition process.

"It uses a statistical model on top of the acoustic model. It looks at how you create your vocabulary and adapts the statistical model to you," Steenhaut notes.

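To give a flavour of how word pairs can help, here is a minimal sketch of a bigram model choosing between two candidate transcriptions that sound alike. It is an illustration only, not Nuance's method: the toy corpus, the candidate sentences, the add-one smoothing and the function names (update_counts, log_score, pick_best) are all assumptions made for this example.

```python
import math
from collections import Counter

START = "<s>"  # marker for the start of a sentence


def update_counts(unigrams, bigrams, sentences):
    """Add word and word-pair counts from plain-text sentences."""
    for sentence in sentences:
        tokens = [START] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))


def log_score(candidate, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a candidate transcription."""
    vocab_size = max(len(unigrams), 1)
    tokens = [START] + candidate.lower().split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate the bigram model finds most likely."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))


# Toy corpus standing in for general language data (made up for illustration).
unigrams, bigrams = Counter(), Counter()
update_counts(unigrams, bigrams, [
    "there is a meeting today",
    "there is a problem",
    "their office is closed",
])

# Two hypothetical acoustic-model outputs that sound identical.
candidates = ["their is a meeting", "there is a meeting"]
best = pick_best(candidates, unigrams, bigrams)
print(best)  # prints "there is a meeting"

# Adapting to one user: fold their own text into the same counts.
before = log_score("their office is closed", unigrams, bigrams)
update_counts(unigrams, bigrams, ["their office is closed"] * 3)
after = log_score("their office is closed", unigrams, bigrams)
print(after > before)  # prints True
```

The last few lines hint at the adaptation Steenhaut describes: feeding a user's own phrases back into the counts makes those phrases score higher next time. A real recogniser would combine such a score with the acoustic model's own confidence rather than rely on word statistics alone.
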
The same speech recognition technology is also being applied in call centres, where it lets simple enquiries be handled by computer, in cars for access to information, and on mobile phones. The limited processing power and memory of the latter mean you need to speak slowly, though, and the vocabulary is more limited, but for text messaging it can be OK.

Medical dictation
Nuance now owns Dictaphone as well, and sees a huge opportunity in layering specialist vocabularies - for lawyers, surgeons or radiographers, say - on top of its speech recognition technology. It says $15 billion a year is spent world-wide on manual transcription within healthcare, so anything it can get of that is worthwhile.

Of course, there are still things it can't do - and one of them is transcribing meetings where more than one voice is present. There is technology to recognise voice-prints, and it is already being used in security applications, but Steenhaut says it's not yet ready for a broader market.

In the meantime, he points to an Australian company which has addressed the problem of recording meetings by developing a system with Dragon software and multiple microphones.

"I think acceptance will broaden significantly once people realise there's no need now to sit down and read scripts," he says. "It makes it accessible to a much wider audience.

"At the moment the focus is to improve productivity for people who create documents. The average speed for dictation is 160 words per minute, versus 50 for a typist."

"It has improved so much through the versions, you can be much more relaxed now," he adds. "Conversational speech is still a way off, but that's definitely the way the technology is evolving."


Techworld UK - Technology - Business
