Tuesday, June 14, 2011

Recording on the iPhone for Dragon NaturallySpeaking Transcription

Background

Dragon NaturallySpeaking (DNS) users have been purchasing digital voice recorders (DVRs) to take advantage of DNS's transcription feature. But the iPhone and iPod Touch (hereafter I'll use "iPhone" to refer to both devices) can also record dictation for DNS to transcribe into text. Using a single device for multiple purposes has many advantages: it is more cost-effective, and it saves the time and effort of learning to operate and maintain another device. But the quality of the recording is critical to transcription accuracy; Nuance even tests and publishes quality ratings for a variety of recording devices. So how do iPhone recordings compare to those from a typical, well-rated DVR?
I decided to try to answer this question as part of a study I'm conducting on voice note-taking for the University of Calgary's Disability Resource Centre. Voice note-taking allows students who cannot hand-write or type lecture notes to make notes by voice. They speak into a special hand-held microphone, called the Sylencer, which muffles their speech and acts as a portable sound booth to eliminate background noise. The Sylencer is connected to either a DVR or a laptop. The DVR option creates a recording that the student can later have DNS transcribe into text. With the laptop, notes are dictated into DNS in real-time.
For the voice note-taking study, I selected three DVRs that are rated for high transcription accuracy with DNS and offer a variety of features and form factors with low to mid-range affordability. The Sony ICD-SX68 DVR was available to me at the time I did this analysis, so I used it to compare to the iPhone.
Because most iPhone users won't be using a Sylencer, I also tested the iPhone with a conventional headset, the Andrea NC-185VM. It is rated 6 out of 6 on the DNS Hardware Compatibility List.

Options For Recording on the iPhone

I used two recording methods on the iPhone. The first is the "Voice Memo" app, which the iPhone includes in its Utility folder. The major disadvantage of this app is that DNS does not accept its recording file type (M4A), so the file needs to be converted into a compatible format (e.g. MP3). This is easy to do in iTunes once you've customized the import settings. Also, the recording is made with a 64 kbps bit rate and 44.1 kHz sample rate. You can downgrade these settings when you customize the import settings in iTunes (e.g. a 16 kbps bit rate and 11.025 kHz sample rate results in an MP3 file one-quarter the size of the original), but I've found that transcription accuracy really suffers.
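The one-quarter figure follows directly from the bit rates. Here is a minimal sketch of the arithmetic, assuming a constant bit rate and ignoring file headers and tags; the function name and the 10-minute recording length are illustrative only. Note that for a fixed bit rate, the sample rate does not change the file size.

```python
def compressed_size_bytes(bit_rate_kbps: float, seconds: float) -> float:
    """Approximate size of a constant-bit-rate audio file (headers and tags ignored)."""
    return bit_rate_kbps * 1000 * seconds / 8  # bits per second -> bytes

ten_minutes = 10 * 60  # hypothetical recording length, in seconds

original = compressed_size_bytes(64, ten_minutes)    # Voice Memo equivalent setting
downgraded = compressed_size_bytes(16, ten_minutes)  # downgraded import setting
size_ratio = downgraded / original

print(f"64 kbps: {original / 1e6:.1f} MB")
print(f"16 kbps: {downgraded / 1e6:.1f} MB")
print(f"ratio:   {size_ratio}")
```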
In its evaluation of the iPhone as a recorder, Nuance recommends using the Andrea Pure Audio Live Recorder app. It is available at the iTunes Store for $2.99. The major advantage of this app is that it allows you to record in the WAV format, which DNS accepts for transcription. Other advantages include background noise cancellation, voice activated recording (VAR), 4 audio quality levels, and the ability to name a recording and assign it to a category. However, Nuance recommends turning off the background noise cancellation feature, and Andrea recommends disabling both the noise cancellation and VAR features. Nuance recommends the "high" quality setting, which produces files with a 22 kHz sample rate. The "CD" quality setting is the highest and uses the same sample rate as the Voice Memo app: 44.1 kHz.
When I tested the Andrea headset with the iPhone, I used the Voice Memo app because by that point I had already analyzed the results from using the Sylencer with both apps and found that, overall, the Voice Memo app performed better than the Pure Audio app.

iPhone Microphone Adapter

To use the Sylencer microphone with the iPhone, I had to purchase a special adapter that connects a standard microphone plug to the iPhone's 4-conductor TRRS jack. KVconnection sells a Y-shaped iPhone 1/8 inch electret condenser microphone and headphone adapter (3.5mm 4 conductor TRRS Male to 3.5mm TRS Jacks) that combines separate microphone and headphone (TRS) jacks into a single iPhone (TRRS) plug.
The headphone plug is important because when the adapter is plugged into the iPhone, sound doesn't emit from the external speaker. The headphone plug will accept headphones, earbuds or desktop speakers.
Even more important is the fact that this adapter permits power from the iPhone to travel to the Sylencer which is necessary to power a "SmartMic" plug, found on some Sylencer models. The SmartMic plug requires power because the SmartMic enables the Sylencer to be tunable i.e. its sensitivity (loudness, softness) can be adjusted.
I also used the iPhone microphone adapter with the Andrea headset.

Analysis Procedure

On each platform, I made 3 recordings from the "Success is a Journey" DNS training script. Each recording was from a different section but contained the same number of words (881). I recorded on three device/app combinations: the Sony DVR, the iPhone with the Voice Memo app, and the iPhone with the Pure Audio app. I used the Sylencer microphone on all three, but used the Andrea headset only on the iPhone with the Voice Memo app, giving four test platforms. The transcripts were produced on the same computer, using a separate DNS user profile (UP) that I had trained for each of the four test platforms. The UPs were either new or very recent, so they had not benefited from additional training through the transcript correction process.
I used the default or recommended audio settings for each device/app:
  • Sony DVR: recordings are made in the proprietary MSV format with highest quality (STHQ) and then converted in the Digital Voice Editor (DVE) application to the WAV format with 16 bit 11.025 kHz mono
  • iPhone with Voice Memo app: recordings are made in the M4A format with the only available quality setting and then converted in iTunes to the MP3 format with the equivalent audio settings of 64kbps 44.1kHz mono
  • iPhone with Andrea Pure Audio Live Recorder app: recordings are made in the WAV format with high quality (22kHz sample rate); they can be downloaded to your computer via Wi-Fi (browse to a particular web page) or iTunes via File Sharing in the Apps section.
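To see how these settings translate into data rates, here is a rough sketch that computes the rate of uncompressed PCM audio (the WAV files) and compares it with the 64 kbps MP3. It assumes that "22kHz" means 22.05 kHz and that the WAV files contain plain 16-bit PCM; the function and variable names are illustrative.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Data rate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

sony_wav_kbps = pcm_kbps(11_025, 16, 1)        # Sony recording converted to WAV
pure_audio_wav_kbps = pcm_kbps(22_050, 16, 1)  # assumption: "22kHz" is 22.05 kHz
voice_memo_mp3_kbps = 64.0                     # MP3 bit rate chosen in iTunes

for name, kbps in [
    ("Sony WAV", sony_wav_kbps),
    ("Pure Audio WAV", pure_audio_wav_kbps),
    ("Voice Memo MP3", voice_memo_mp3_kbps),
]:
    ratio = kbps / voice_memo_mp3_kbps
    print(f"{name}: {kbps:.1f} kbps ({ratio:.1f}x the MP3 rate)")
```

For recordings of equal length, file size scales with these rates, so the WAV files should be noticeably larger than the MP3 files.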
To assess transcription accuracy, I used the Compare feature in Microsoft Word. I compared the transcript produced from a recording ("original document") to the correct version of the text ("revised document"). Because I was only interested in content differences, I selected only two comparison settings: Insertions and deletions, and Moves (infrequent but a good indication of a difference).
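For readers without Word, a similar count can be approximated with a word-level diff. The sketch below uses Python's standard difflib module; it is not Word's Compare algorithm, it does not detect moves, and the function name and sample sentences are made up for illustration. Each changed run of words counts once, however many words it contains.

```python
from difflib import SequenceMatcher

def count_changes(transcript: str, reference: str) -> dict:
    """Count changed runs between a transcript and the correct text, word by word."""
    a = transcript.split()
    b = reference.split()
    counts = {"insertions": 0, "deletions": 0}
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        # A "replace" is one deletion (wrong text) plus one insertion (correct text).
        if tag in ("delete", "replace"):
            counts["deletions"] += 1
        if tag in ("insert", "replace"):
            counts["insertions"] += 1
    return counts

# Hypothetical example: one misrecognized word and one omitted word.
transcript = "the quick brown fox jumps over lazy dog"
reference = "the quick brown fox jumped over the lazy dog"
result = count_changes(transcript, reference)
print(result)
```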

Analysis Results

Below is a table showing the Word 2010 Compare results for the 3 recordings made for the 4 platforms tested. Usually an error will involve a deletion of the incorrect text and insertion of the correct text. But there tend to be more insertions than deletions because sometimes DNS omits words; you'll notice that accuracy may decrease as word count decreases.

| "Success is a Journey" pages (881 words) | Mic | Device | File Type | File Size (MB) | Rec. Time (min:sec) | # Words | Ins. / Del. / Moves | Total |
|---|---|---|---|---|---|---|---|---|
| 6 - 7 | Andrea Headset | iPod Touch, iOS 4, Voice Memo app | MP3 44kHz 64kbps mono | 6.01 | 13:08 | 882 | 27 / 22 / 0 | 49 |
| 6 - 7 | Sylencer | iPod Touch, iOS 4, Voice Memo app | MP3 44kHz 64kbps mono | 5.69 | 12:25 | 890 | 34 / 33 / 0 | 67 |
| 6 - 7 | Sylencer | Sony ICD-SX68 | WAV 11kHz 16 bit mono | 16.4 | 13:03 | 887 | 43 / 36 / 0 | 79 |
| 6 - 7 | Sylencer | iPod Touch, iOS 4, Pure Audio app | WAV 22kHz 16 bit mono | 33.8 | 13:23 | 894 | 45 / 43 / 2 | 90 |
| 8 - 9 | Andrea Headset | iPod Touch, iOS 4, Voice Memo app | MP3 44kHz 64kbps mono | 5.01 | 10:57 | 883 | 29 / 23 / 0 | 52 |
| 8 - 9 | Sylencer | iPod Touch, iOS 4, Pure Audio app | WAV 22kHz 16 bit mono | 29.3 | 11:36 | 877 | 41 / 40 / 0 | 81 |
| 8 - 9 | Sylencer | iPod Touch, iOS 4, Voice Memo app | MP3 44kHz 64kbps mono | 4.66 | 10:10 | 874 | 47 / 40 / 0 | 87 |
| 8 - 9 | Sylencer | Sony ICD-SX68 | WAV 11kHz 16 bit mono | 13.9 | 10:47 | 861 | 56 / 44 / 0 | 100 |
| 10 - 11 | Andrea Headset | iPod Touch, iOS 4, Voice Memo app | MP3 44kHz 64kbps mono | 4.73 | 10:20 | 870 | 46 / 41 / 0 | 87 |
| 10 - 11 | Sylencer | iPod Touch, iOS 4, Voice Memo app | MP3 44kHz 64kbps mono | 4.54 | 9:55 | 872 | 51 / 44 / 0 | 95 |
| 10 - 11 | Sylencer | Sony ICD-SX68 | WAV 11kHz 16 bit mono | 13 | 10:18 | 877 | 54 / 51 / 0 | 105 |
| 10 - 11 | Sylencer | iPod Touch, iOS 4, Pure Audio app | WAV 22kHz 16 bit mono | 27.3 | 10:49 | 881 | 56 / 55 / 0 | 111 |

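To compare platforms across all three recordings, the Total column can be summed per platform. The sketch below only re-adds the figures from the table above; the dictionary layout and names are illustrative. A lower sum means fewer reported differences.

```python
# Totals from the table, in page order (6-7, 8-9, 10-11) for each mic/device pair.
totals = {
    ("Andrea Headset", "Voice Memo app"): [49, 52, 87],
    ("Sylencer", "Voice Memo app"): [67, 87, 95],
    ("Sylencer", "Sony ICD-SX68"): [79, 100, 105],
    ("Sylencer", "Pure Audio app"): [90, 81, 111],
}

sums = {platform: sum(values) for platform, values in totals.items()}
ranking = sorted(sums, key=sums.get)  # fewest reported differences first

for mic, device in ranking:
    print(f"{mic} + {device}: {sums[(mic, device)]}")
```

These sums inherit the limits of the Compare counts themselves, so they are better suited to ranking the platforms than to measuring how far apart they are.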
Word 2010 Compare Feature Issues
For a number of reasons, the data reported by the Word 2010 Compare feature should only be used as a starting point for further analysis of the differences between the transcript and the original text. First, the number of insertions, deletions and moves reported may involve more than one word. For example, "can't trade him and made" is marked as a single deletion that is replaced by "concentrate on their main", a single insertion. So it is important to visually scan the compared document which uses colour highlighting and strikethrough font to show the differences.
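The same effect can be reproduced with a word-level diff. The sketch below uses Python's difflib as a stand-in for Word's Compare (an assumption, since the two algorithms differ); the words surrounding the misrecognized phrase are hypothetical. It shows one changed run hiding several wrong words.

```python
from difflib import SequenceMatcher

# The surrounding words ("so they", "tasks") are hypothetical context.
transcript = "so they can't trade him and made tasks".split()
reference = "so they concentrate on their main tasks".split()

matcher = SequenceMatcher(None, transcript, reference, autojunk=False)
changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]

change_runs = len(changed)
words_deleted = sum(i2 - i1 for _tag, i1, i2, _j1, _j2 in changed)
words_inserted = sum(j2 - j1 for _tag, _i1, _i2, j1, j2 in changed)

print(f"changed runs:   {change_runs}")
print(f"words deleted:  {words_deleted}")
print(f"words inserted: {words_inserted}")
```

Counting words inside each changed run, rather than the runs themselves, gives a figure closer to the number of misrecognized words.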
Second, if there are major differences, large sections of text may be deleted and replaced as a unit, which reduces the insertion and deletion count and makes the transcription accuracy look more favourable than it is. A visual scan will show many colour changes, and a closer analysis can reveal that the large chunks of inserted and deleted text have words in common. I didn't encounter this situation with any of these transcriptions, but I have seen this behaviour: a compare reports the fewest insertions and deletions, yet most of the text is highlighted as different, and a closer look shows that some deleted chunks do contain correctly transcribed content.
So, this comparison requires looking at the statistics as well as a close visual scan of the differences. And, at best, one can only use this tool to decide which platform is better or worse than another (versus quantifying by how much). Based upon my visual scan of the differences, the Compare results do accurately reflect which combinations of mic, device and app perform relatively better:
  • The Andrea headset gives better recognition accuracy than the Sylencer microphone.
  • Overall, the iPhone Voice Memo app is likely better than the Pure Audio app.
  • Overall, the iPhone offers better recognition accuracy than the Sony DVR.
  • iPhone Voice Memo file sizes are about 3 times smaller than Sony WAV files and 5-6 times smaller than Pure Audio WAV files.
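The file size ratios can be checked against the Sylencer rows of the results table. This is a rough sketch; the recordings differ slightly in length from platform to platform, so the ratios are approximate. The variable names are illustrative.

```python
# File sizes (MB) for the Sylencer recordings, in page order (6-7, 8-9, 10-11).
voice_memo_mp3 = [5.69, 4.66, 4.54]
sony_wav = [16.4, 13.9, 13.0]
pure_audio_wav = [33.8, 29.3, 27.3]

sony_ratios = [round(w / m, 1) for w, m in zip(sony_wav, voice_memo_mp3)]
pure_audio_ratios = [round(w / m, 1) for w, m in zip(pure_audio_wav, voice_memo_mp3)]

print("Sony WAV vs Voice Memo MP3:      ", sony_ratios)
print("Pure Audio WAV vs Voice Memo MP3:", pure_audio_ratios)
```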
Sylencer Microphone Issues
You might be surprised at the large number of errors when using the Sylencer, even in the Sony DVR recordings. This microphone has its pros and cons. It is a very sensitive microphone that by design also eliminates background noise, which should give it a high signal-to-noise ratio. But in order to muffle one's voice, the user has to speak quietly, in an undertone, which means speaking more from the throat. Also, it is important to create a good seal with the mask around the face, but this restricts the movement of the facial muscles. These two factors mean that the user must spend time mastering the art of dictating with the Sylencer to achieve good recognition accuracy with DNS.
DNS Transcription Issues
I've found that transcribing the same recording multiple times produces slightly different results even though no corrections were made to improve accuracy.
A Note On File Formats
I used the recommended (Sony DVR, Pure Audio) or default (Voice Memo) file format for the 3 device/app combinations tested. I also converted one of the Sony DVR recordings to 3 other formats offered by the DVE application and received similar results from the Compare analysis (WAV 44kHz stereo, MP3 44 kHz 160kbps stereo, MP3 44 kHz 128kbps stereo). Because the default sample rate is 44kHz for a Voice Memo recording, I also made a Pure Audio recording at 44kHz; its recognition accuracy was similar to the 22kHz recording.
Finally, note that MP3 is a lossy compressed file format, so minimize the number of times a recording is converted; each conversion reduces the quality of the audio.

Recommendation

While the iPhone Voice Memo app does not offer the features you can find on a DVR or in the Pure Audio app, it performed best in this analysis compared to the other methods tested in terms of recognition accuracy for DNS transcription. Using the iPhone for recording dictation is also cost-effective and convenient because you don't have to purchase, learn and maintain a separate device. If you understand iTunes well enough to customize the import settings and will mostly import recordings for transcription, then using the Voice Memo app is your best option. It's easy to create an MP3 version of the recording in iTunes, the file sizes are smaller and they seem to produce better accuracy than the Pure Audio recordings.
However, the bottom line is that there are several factors that affect DNS transcription recognition accuracy: recording device, recording settings, microphone, file format, DNS version, computer specifications, and DNS user profile training. Therefore, it's important for you to perform your own testing, and if necessary, adjust these factors until you achieve acceptable recognition accuracy.

Resources

PureAudio Live Recorder User Guide
Using PureAudio Live Recorder with Dragon NaturallySpeaking
Wikipedia article on TRS Connector
