Is Voice Recognition Right for Your Practice?

October 28, 2010


Guest author Rolfe Reitz, MD, offers tips for choosing and installing voice recognition software.

By Rolfe Reitz, MD

Interest in the use of voice recognition software in the day-to-day practice of neurologists across the country is increasing. Users of this technology recognize many benefits including reduction in overhead costs, elimination of legibility issues, increased volume of documentation, and the ability to maintain free text when used in conjunction with an electronic health record.

Many physicians who are considering the implementation of voice recognition are plagued by uncertainties including the cost, training, and hardware requirements.

Dragon Naturally Speaking is leading vendor

While multiple software options have historically existed in the field of voice recognition, the Dragon Naturally Speaking products have emerged in recent years as the clear leader in the field. The Dragon products are now available for Windows-based platforms as well as for Mac, with recent adoption of the Dragon engine in the MacSpeech program.

Three levels of software sophistication are generally available for Dragon Naturally Speaking: Standard, Preferred, and Medical. Each increase in level is associated with an increase in the capabilities, complexity, and price of the package.

The Standard version of Dragon Naturally Speaking, while relatively cheap, is too simplistic for reasonable use in most medical practices. The Preferred version may be used by those with advanced technology skills. New users will usually find the best success with the Medical version, which also has a medical dictionary that can be tailored on a limited basis to the specialty of neurology. The specifications of each version are available on the product website. You should thoroughly review the various features of each package prior to reaching a purchase decision.

Observe a current user before purchasing system

Prior to transitioning to voice recognition in your practice, you should spend some time simply observing a colleague using the program. Most long-term users of the program are enthusiastic about this technology and are pleased to have a friend looking over their shoulder while that friend considers whether the program fits his or her needs, temperament, and technical ability.

With many early users uncertain about their commitment to spending significant dollars on a new dictation system, some will experiment with the Standard version, which can be purchased for about $50.

An even cheaper alternative is to find a friend who is willing to allow you to sample his or her system, although this should be done by entering you as a new user. Each user on a given system develops a unique voice recognition pattern that is stored in a special file which should not be altered by another voice. Establishment of a new user allows creation of a new recognition file, which can subsequently be deleted when the trial is completed. For potential users who have an accent or a high-pitched voice, this experimental use provides the opportunity to explore the system on a short-term basis prior to committing significant financial resources to a system that might not meet their needs.

Purchase a computer with maximum processor power

Hardware issues are of paramount importance in voice recognition technology. While most modern computers, including laptops, will have sufficient power to run voice recognition, new users will often purchase new hardware with the initial installation.

When selecting a computer, you should purchase the maximum power you can afford. Voice recognition programs utilize processor and memory resources to a greater extent than almost any other nongaming program. With the latest operating systems, such as Windows 7, the power requirements are even greater. With the advancement of 64-bit technology, it is reasonable to consider processor power approximating 3 GHz and RAM of at least 4 GB. Many late model laptop computers will now accommodate RAM of 8 GB. In this unique circumstance, overkill is a good thing.

Purchase a quality microphone

Equally important to the computer is the microphone type. A common source of failure is the decision to purchase a cheap microphone from a discount store. The retail versions of Dragon typically include a microphone in the package, although experienced users usually feel that better accuracy can be obtained by upgrading from the stock microphone.

The microphone should be a unidirectional, noise-cancelling type, with most experienced users preferring the USB interface. Plugged microphones are acceptable if the sound card quality of the computer is superb. Laptop computers have historically had extreme limitations on the quality of the sound cards that are available, and it is therefore desirable to use a USB microphone in that setting.

Excellent quality microphones usually cost well over $100 but are worthwhile investments. These usually last many years and can dramatically improve accuracy. Some users prefer a handheld microphone with built-in mouse capabilities, while others prefer a headset. This is a personal preference that may change over time. Upscale microphone vendors are typically happy to talk with customers, and most are well versed in the excellent options specific for voice recognition that their company can offer.

Balance cost against transcription service

Initial costs for hardware and software are variable, depending on the version of software purchased and the budget available for hardware. Some experienced users can get a system for about $2,000, but these costs can easily exceed $5,000 for advanced hardware, extra RAM, and the medical version of Dragon Naturally Speaking, which can cost between $1,700 and $2,000. You must balance this initial cost against the yearly cost of transcription in the practice.
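The balance described above can be roughed out with simple arithmetic. The sketch below is only an illustration: the system costs of $2,000 and $5,000 come from the figures above, while the yearly transcription cost is a hypothetical placeholder that each practice should replace with its own number. It also ignores the extra physician time that voice recognition requires, so it gives an optimistic estimate.

```python
def breakeven_months(initial_cost, yearly_transcription_cost):
    """Months until transcription spending would equal the one-time system cost."""
    if yearly_transcription_cost <= 0:
        raise ValueError("yearly transcription cost must be positive")
    return 12 * initial_cost / yearly_transcription_cost


# Hypothetical figure for illustration only; substitute your practice's actual cost.
HYPOTHETICAL_YEARLY_TRANSCRIPTION = 12_000

# The two system costs are the low and high estimates quoted in the article.
for system_cost in (2_000, 5_000):
    months = breakeven_months(system_cost, HYPOTHETICAL_YEARLY_TRANSCRIPTION)
    print(f"${system_cost:,} system: break-even after about {months:.1f} months")
```

A practice that spends less on transcription will see a proportionally longer break-even period.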

Install software to your hard drive

Installation is a fairly straightforward process. A few simple choices are required as the software is loaded to your hard drive, but these are mostly intuitive. New users will usually accept the default choices, although advanced users may often modify the location in which the voice recognition files are stored. These unique files store the important characteristics of your voice, the speed at which you dictate, and many issues related to your vocabulary. These can be stored in a folder created on your desktop, for example, and should be backed up and stored in a safe location on a frequent basis. This gives you additional protection in the event of a hard drive crash. Loss of these important files will require complete retraining when a new system is installed. Preservation of these files enables importation into a new system.
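For readers comfortable with a little scripting, the backup step can be automated. The following is a minimal sketch, not a feature of the Dragon software: it simply copies a folder into a new, date-stamped folder. The function name and both paths in the usage comment are hypothetical; point them at wherever you chose to store your voice files and at your own backup drive.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_voice_profile(profile_dir, backup_root):
    """Copy the profile folder into a new timestamped folder under backup_root."""
    profile_dir = Path(profile_dir)
    if not profile_dir.is_dir():
        raise FileNotFoundError(f"profile folder not found: {profile_dir}")
    # A timestamp in the folder name keeps earlier backups from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root) / f"{profile_dir.name}-{stamp}"
    shutil.copytree(profile_dir, destination)
    return destination


# Example call with hypothetical paths (adjust to your own system):
# backup_voice_profile(r"C:\Users\me\Desktop\VoiceFiles", r"E:\Backups")
```

Keeping the backup on a separate drive is what provides the protection against a hard drive crash; a copy on the same disk would be lost along with the original.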

Read the tutorial

After the software is installed, you will be directed to the tutorial. It is of critical importance to work through this tutorial with diligence. The tutorial in the medical version is quite extensive. It is recommended that you work through this tutorial until the commands come naturally. You can repeat the tutorial as many times as necessary.

Import your past dictations into system

Users who have stored their past dictations on electronic media such as a hard drive, CD-ROM, or USB drive have additional options. The Dragon software package has the capacity to import past data into the current system. This is easily accomplished through the task bar, with the installer simply identifying the source of the data. The package makes an assessment of the vocabulary and will identify new words that are not currently in the dictionary. The user is then given the opportunity to train the system on the new words, and the option is also given to eliminate words that are not desired for the vocabulary (such as misspelled words from the old media). The user is then given the option to incorporate the sentence structure into the speech patterns. Adoption of these features dramatically improves the accuracy of the system.

Dictate as if you are reading the news

Most users will need to modify their dictation style. It is important to remember that this software program works based on statistical probability – in effect, it makes a statistical calculation of the likelihood of your next spoken words/phrases based on the words/phrases that you have used in the past. The microphone is not a human ear, and the computer is not a human brain capable of receptive speech.

A good rule to remember is to dictate as though you are reading the evening news to the public on television. This usually requires a slightly slower pace of speech, with each word spoken distinctly. Conversely, the system works poorly when users speak too slowly. Use of phrases helps the system to make predictions about your speech based on statistics. The accuracy of the system improves over time as the system acquires ongoing data regarding your unique voice characteristics and vocabulary. It is important to use the system to verbally correct mistakes that are made, as simply correcting the mistakes with the keyboard does not provide the system with the statistical means to improve accuracy.
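The idea of predicting the next word from past usage can be illustrated with a toy example. The sketch below is a deliberately simplified word-pair counter, offered only to show the principle; it is not a description of how the Dragon engine works internally, and the sample sentences are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(past_dictations):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for text in past_dictations:
        words = text.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the word that most often followed `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented sample dictations, standing in for a user's past notes.
past = [
    "the patient reports numbness in the left hand",
    "the patient reports weakness in the left leg",
    "the patient denies headache",
]
model = train_bigrams(past)
print(most_likely_next(model, "patient"))  # prints "reports"
```

Even in this toy version, a word dictated inside a familiar phrase is easier to predict than an isolated word, which is one way to see why speaking in phrases helps.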

Brush up on your computer skills

A common myth regarding voice recognition is that the system is a replacement for basic computer skills. If you have limited keyboarding and word processing skills, you will likely experience great frustration as you attempt to incorporate this system into your practice. As healthcare professionals, our continuing education should include advancement of our technology skills, regardless of the level at which we currently function. Basic level courses are usually available at local vocational technical schools/adult educational classes, and are both entertaining and practical.

Don't expect a shorter work day

Use of voice recognition technology does not save time in your practice. A recent upgrade by this author required a time investment of about 14 hours, allowing for tutorial updating, transfer of templates, and importation of four years of past dictations, which resulted in the need to train about 2,600 new words that were not in the dictionary. This time investment resulted in dramatically improved accuracy. Early adoption of the program will require allowance of additional time for correcting mistakes of the system, learning of templates, and placement of the documentation into your unique system of record keeping.

Frustration in the first week is usually quite high, especially for the new user or for those with limited technology skills. This improves significantly over a period of several weeks. As accuracy improves and the system becomes more comfortable for the user, the frustration level naturally decreases.

Long-term users will quickly learn to manipulate their dictionary, taking a few seconds to remove words that are never used but tend to confuse the system, and this process significantly improves long-term accuracy. Veteran users will usually find that new dictations take a few extra minutes when compared with human transcription, with return notes taking slightly less time for many users.

Most long-term users of the Dragon system report that the slight increase in time required for this system is adequately offset by the long-term savings in transcription costs. Additional advantages include the rapid turnaround time for placement of documentation into the record, elimination of legibility problems, and the ability to markedly increase the volume of information recorded in the medical record.

Voice recognition technology also provides neurologists an additional option in dealing with increasing office overhead and declining levels of reimbursement.

Author Disclosure:

Rolfe Reitz, MD, has nothing to disclose.