Is it time to reappraise speech recognition systems?

One April, 11 and a half years ago, Hollywood actor Richard Dreyfuss presented a new type of software that was going to 'revolutionise business'. He had been paid to host the launch of Dragon's NaturallySpeaking application, which could faultlessly translate spoken words into text. If this worked, we could chuck away our keyboards. Productivity would multiply. Dragon would become the new Microsoft and a new era of IT would dawn. And work it did too -- in the demonstration. But not everything about the event was quite so well stage-managed. New York was suffering its worst ever blizzard and few made it through the snow. One year later, founders Janet and Jim Baker hadn't found the mass market they may have anticipated. That year, a Belgian firm called Lernout & Hauspie introduced Voice-Express, another desktop speech software product that could potentially free us all from the tyranny of crouching over a keyboard, ruining our posture and giving ourselves RSI. In a demo, it even outperformed the world's fastest typist.

So why aren't we using this software on every computer in the land? Why aren't we talking to computers, telling them what we want to do? How come Windows and Mac OS remained the user interfaces of choice, when voice commands would be so much more efficient and user friendly? Especially as speech dictation has become part of so many phone calls to buy tickets, report meter readings and query bills?

Clearly, the cynic might argue, the software was less effective outside a demo environment. As soon as there is no public relations executive standing over you, fussing that "it won't work if you do it like that", then the technology doesn't function and we give up.

And yet it's still with us. The inventors have persisted through all the trials and tribulations. How often have we seen technology deliver the promised benefits, after a decade of trying?

Videoconferencing took nearly 50 years to deliver on its initial promise. Windows took three versions to make a big impact. After only 10 years, voice recognition software might actually be worth installing for some users, especially if they have a disability, if they share computers or regularly insert slabs of ready-made text.

To find out where speech software is today, CIO magazine talked to the man behind Dragon at the launch of version 10 of NaturallySpeaking.

Well documented

Steve Chambers has been president of mobile services at Nuance (owner of Dragon as well as a number of other speech recognition companies it has acquired) for eight years, after a decade at Xerox. His career reflects the changing emphasis in managing information. Xerox was the place to be for document management. Now speech is the key to managing intelligent systems. There have been a lot of false dawns. So has voice recognition really come of age and is it ready for CIOs to consider for broad deployment? Or are optimists still kissing frogs?

"I would argue that a CIO has some pretty sound reasons for deploying this software, even if it's on a limited basis," says Chambers. "Although actually, that's one of the best ways you can build mass appeal for a new application among your users. Give it to the privileged few, and pretty soon everyone will be knocking on your door demanding it," he says.

The problem that speech recognition suffered from when it first appeared was over-selling its capabilities. It was never going to be a replacement for typing, just an alternative for certain types of users and certain types of documents or usage scenarios. It would have been a mistake to give people the impression that they could dispense with their keyboards completely and look forward to talking to their computers. And it was also premature to make too many bold claims for the robustness of the technology. It wasn't ready back then, Chambers admits, and whether it is ready now is open to debate.

"It all depends on the tasks you're undertaking," says Chambers. "Clearly there are some processes where you are going to write several drafts before you are happy, sometimes of the first paragraph. That's still a lot easier to do with a keyboard than with vocal commands."

One of the most obvious drawbacks with NaturallySpeaking is that you have to speak slowly and clearly, and you have to know exactly what you are going to say before you start dictating. The best way to work this out, of course, is to write yourself a script. And how would you do that? Using a word processor and a keyboard.

However, there are still niches where speech recognition has an obvious effect on productivity. In jobs where no creative thought is needed, for example, such as gathering information in the field. Insurance investigators reporting their findings can have their observations quickly transcribed, or legal professionals can dictate notes. Doctors who have dictated their observations about patients into a voice recorder can now have them transcribed by playing back the recording to NaturallySpeaking.

In the US, Nuance's software is used by around 400,000 physicians in 4000 healthcare institutions and is used to dictate over 2000 lines of medical notes every day.

Maybe it's because the accent is more uniform in America. Not so, says Chambers. NaturallySpeaking recognises eight different regional accents in the US, and you can configure the software to match your own. A New Yorker, for example, might want to choose the setup that recognises words pronounced in the accent of the north-east coast.

Meeting needs

By contrast, NaturallySpeaking recognises only one British accent. Still, the standard edition is pretty cheap at around £80 (US$118), so in certain circumstances it would pay for itself. If you are in a company which has a meeting culture that keeps you away from productive work, voice recognition software can be a boon.

Nobody ever writes up their meeting notes meticulously, and most of us leave them lying round for at least a week, after which they usually become a meaningless scrawl. Dictating your notes after leaving a meeting could be a massive time-saver. It's also a good way of recapping the important parts of the meeting which are usually pushed to the back of your mind by the growing sense of outrage at how long the meeting is over-running.

"This is a great solution to meeting culture," says Chambers, before quipping that "the other solution would be to modify the time we waste at meetings".

Another modern phenomenon that this technology can be applied to is health and safety in the workplace. "This is great for users who have been incapacitated by repetitive strain injury," says Chambers. "This is another tool in the CIO's armoury as the workforce becomes potentially more hostile and litigious."

One European newspaper, Noordhollands Dagblad, adopted the software to let a long-standing employee who was suffering from carpal tunnel syndrome continue working. "I'm a newspaper editor, there's nothing else I could be employed to do," says Bart Vuijk. "So this software saved my career. It took a bit of adjustment, but I've been using it since it was on version 7."

Vuijk reports that having rested his hands, his carpal tunnel syndrome inflammation has died down. His diction has improved too.

Others see speech software helping users with different problems.

"Local authorities like Newham have long used dictation software successfully for people with dyslexia, but it's not much used for other disabilities at present," says Richard Steel, CIO of the London Borough of Newham and president of Socitm, a body for public-sector IT chiefs.

"An effective natural language interface for general purpose use has seemed to be forever on the horizon," he says. "But if there are finally cost-effective products available, I think we should look again."

In some fields, accuracy is as important as speed, as a spelling error becomes not just an inconvenience, but a costly mistake. Dragon NaturallySpeaking transformed working practices and raised productivity at London-based patent attorney firm Beresford & Co.

As anyone who has ever filled in a patent form will know, there is plenty of soul-destroying data inputting to do. Patent work is a long slog and requires painstaking attention to detail to define a Patent Specification, which can be over 200 pages long. If every word is not correct there can be serious ramifications. In the past, the company used traditional tape-based dictation, which often meant a delay while secretarial staff transcribed the tapes.

In theory, the software is capable of transcribing 160 words per minute. If true, this would automate many documentation processes. It would speed them up, too, since few humans can type more than 120 words per minute and the average typing speed is an unimpressive 35 words a minute.

Does it work though? Keith Beresford, senior partner at Beresford & Co, seems to think it does. "I experience 100 per cent accuracy at speeds of up to 150 words per minute when consciously dictating in a measured tone. We are producing more documents for the time we spend on administration with fewer corrections and less time editing overall," he says.

Broadly speaking

Based on this success, Beresford took the decision to deploy Dragon NaturallySpeaking Version 8 across the enterprise, equipping 20 fee earners and one of his administrative staff with the software. Currently, the firm uses the technology to deliver a faster turnaround of legal documentation and email correspondence. The training and setup of the software was completed by Sonant Software, a UK-based integration partner.

The use of speech recognition has also delivered many human benefits to the workplace. Staff are no longer "chained to the keyboard", which cuts the risk of RSI, says Beresford. "Using Dragon is a far more enjoyable way to work; especially when you get to the end of the day and the in-tray is empty."

That's all very well, and speech clearly has a role for many organisations, but most users are still going to find it a challenge, advises one analyst.

"Speech recognition software is now very good -- but many users do find it difficult to use," says Clive Longbottom, service director of business process analysis at Quocirca.

"The problem is that most people write in a different manner to how they speak -- and a document written from the spoken word can come across as very long-winded and wordy.

"However, for those used to dictation -- and they're a dying breed -- or those with physical issues that stop them from typing, speech recognition is more than viable these days."