Giving a voice to speech recognition

11 October 2007 in News

Speech-controlled software is opening new opportunities for businesses

After much initial promise, speech recognition technology has battled with accuracy problems. But the latest generation of tools is increasingly capable of handling both regional accents and everyday speech patterns.

And analyst Gartner predicts the global market will be worth $191m (£93m) by 2010.

Currently, the US accounts for 70 per cent of speech recognition spending. But that proportion is dropping as Europe starts to follow its lead.

The technology has many potential uses. In the healthcare sector, it offers a cost-effective way to manage the burden of medical reporting – see the Hammersmith Hospitals case study below.

It can also be used to handle incoming customer queries – see the Vodafone case study below.

CASE STUDY: Hammersmith Hospitals

The Hammersmith Hospitals NHS Trust has cut the time it takes to process x-ray reports by a third, thanks to speech recognition software adopted as part of wider IT upgrades in the trust’s radiology departments.

Feedback from the trust’s Charing Cross Hospital shows that 84 per cent of x-ray reports are now filed within the target time of 24 hours. And report completion time has dropped from more than six hours to less than two.

The system has significantly improved efficiency, said Hammersmith director of imaging Professor Philip Gishen.

“We used to see how many reports we could get done in 24 hours,” said Gishen.

“Now we are able to measure in minutes how long it takes to finish a report from the moment an x-ray first hits our digital imaging system.”

Before the upgrades took place, the trust’s four London hospitals relied on a combination of analogue images and human transcription for handling x-ray reports. Doctors worked from a hard copy of each scan, giving dictations to a stenographer, who would then write up the official paperwork.

Images are now uploaded to the trust’s digital picture archiving and communication system (PACS). After selecting the relevant records for a specific patient, medical staff can dictate their findings directly using either a headset or a handheld dictaphone.

Unlike human transcribers, the dictation software can capture text as fast as people naturally speak. Reporting doctors can also edit and format their text using short voice commands.

CASE STUDY: Vodafone

Vodafone’s use of speech recognition technology is focused on providing a more human experience for customers using its automated call centre services.

The company receives about six million customer calls every week, which are handled either by human agents or by the company’s interactive voice response (IVR) network.

By giving its automated systems a new “persona”, built on upgraded voice recognition software, the company hopes to encourage more customers to use self-service.

Presenting a more human interface was a vital step to overcoming negative perceptions, said Vodafone head of self service Mel Rowland.

“The thing people hate most about using an automated service is that it feels as if they are talking to a robot,” said Rowland.

“Our customers pay their bill 12 times a year, so we want them to use the service 12 times a year. It has to be a good experience, or they simply will not use it,” she said.

The virtual agent, known as Vicky, is designed to be more realistic. It uses and understands natural speech patterns, which cuts down on stilted, one-way conversations.

Since it was introduced in April, the system has acted as a guide through Vodafone’s customer service options.

And by the start of 2008, Vodafone customers will also be able to register new pre-pay phones, or pay their monthly bills using Vicky’s automated services.

Speech recognition is a useful tool, but Vodafone says it is not a way to avoid talking directly to customers. The firm will retain its human operators, and callers can still choose to speak directly to a person. A trigger system will also transfer callers to an agent automatically after three errors.

“There are 56 regional dialects in the UK, so there will be voices that the system will struggle to understand,” said Rowland.

“If that happens, any information that is already completed is passed on to the human agent, so the caller will not have to start the whole process over again.”

The Vodafone and Hammersmith Hospitals speech recognition projects both use systems provided by Nuance Communications.