Thursday, December 8, 2011

Preliminary Findings From a Study to Evaluate Voice Note-taking: Using Dragon NaturallySpeaking in the Classroom


This blog post presents preliminary findings from a study that I am conducting into voice note-taking for the University of Calgary's Disability Resource Centre. It states the purpose of the study, describes the technologies being evaluated, and explains the methodology being followed including the evaluation criteria.
The study began in November 2009 but is still on-going. Thus, I will give an update on its current status and the preliminary findings in the form of criteria for success with this note-taking approach.


The purpose of the study is to evaluate technologies that enable voice recognition software to be used in the classroom for note-taking. The need arises for students whose learning disability or motor impairment may prevent them from using traditional note-taking modalities, e.g. handwriting or typing on a laptop. The study also aims to identify the criteria necessary for success in the use of voice note-taking.


The three main technologies being evaluated are:
  1. Dragon NaturallySpeaking (DNS) voice recognition software: Converts voice into text
  2. Sylencer handheld microphone: Muffles speech, minimizes background noise
  3. Digital voice recorder (DVR) for later transcription with DNS.
DNS, when used in its conventional manner, is not suitable for taking notes in a classroom because:
  • it is speaker-dependent i.e. trained to recognize a single voice
  • a particular style of dictation must be followed
  • background noise must not be present
  • dictating while the lecturer or other students are speaking is disruptive.
The Sylencer, also known as a Stenomask, is not a new technology. It is widely used in court reporting for "voice writing." It was invented by Horace Webb as a method for bypassing shorthand to record court proceedings; his son owns Talk Technologies, the developer of the Sylencer.
The approach that Nuance, the company behind DNS, recommends for students who want to use the software in the classroom is to record the lecture and use “echo-dictating” (repeating the recording aloud in one's own voice) to convert it into text after the fact. But this is time-consuming: even the most proficient echo-dictator will need to listen to the entire lecture at least once to capture it all.

Technology Combinations

Students are offered two approaches for note-taking with these technologies:
  1. Dictating with DNS onto a laptop using the Sylencer
  2. Dictating with DNS onto a DVR using the Sylencer and later transcribing the dictation with DNS on the laptop.
Each of these approaches offers advantages over the other. The DVR is easier to carry and presents fewer distractions while taking notes: with the laptop, one can view the output from DNS, and misrecognitions may be distracting, especially if there isn’t time to correct them. But voice note-taking directly on the laptop saves time: the student does not have to perform a transcription after class or manage another device (the DVR). Also, the student may have time to edit or format notes during the lecture.
Note: Students who are unable to hold the Sylencer are offered a third option: using a DVR to record the lecture and then echo-dictating it afterwards to convert it into text on their computer.


The methodology includes the following phases:
  1. Initial Technology Evaluation: I tested the three technology combinations during two university lectures.
    • I was particularly interested in whether the Sylencer muffled my speech sufficiently. Most students seated near me reported that they did not hear me speaking. Some may have noticed my speech but could not understand it and were not distracted by it.
    • I learned that it is very important to be proficient in using DNS and the Sylencer, in order to keep up with the lecturer and maximize DNS’s recognition accuracy.
  2. Recruitment Session: When a student expresses interest in participating, we meet for a two-hour session during which I:
    • demonstrate the technologies
    • explain the study purpose, procedure and participant responsibilities
    • if the student is interested in participating, complete a profile/pre-screening questionnaire to gather background information, assess their suitability as an evaluator, and assign them one or more technology options to evaluate.
  3. One-on-one Training: teach the evaluator any technologies that are new to them
  4. Practice: administer a practice session to prepare the evaluator for the classroom
  5. Evaluation: evaluator takes voice notes in at least two lectures depending on the number of technology options they are evaluating and the time they require to provide an objective assessment; I attend the first session to observe technology use and assist with setup
  6. Notes Analysis: count the number of corrections made to the transcription
  7. Post-evaluation Interview: gather evaluator’s feedback about technology use and voice note-taking process; rate evaluation criteria.
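
The Notes Analysis phase can be illustrated with a small sketch. It is not the procedure used in the study: it simply assumes that both the raw DNS transcription and the student's corrected notes are available as plain text, and it counts the word-level differences between them. The function name count_corrections and the way changed spans are tallied are my own assumptions for illustration.

```python
from difflib import SequenceMatcher


def count_corrections(raw_text: str, corrected_text: str) -> dict:
    """Count word-level differences between a raw transcription and its corrected version.

    Hypothetical helper: the study does not specify how corrections are counted.
    """
    raw_words = raw_text.lower().split()
    corrected_words = corrected_text.lower().split()

    # Compare the two texts word by word rather than character by character.
    matcher = SequenceMatcher(a=raw_words, b=corrected_words, autojunk=False)

    counts = {"replace": 0, "delete": 0, "insert": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Assumption: a changed span counts as many corrections as its longer side has words.
        counts[tag] += max(i2 - i1, j2 - j1)

    total = sum(counts.values())
    # Corrections per word of the final notes; 0.0 if the corrected notes are empty.
    rate = total / len(corrected_words) if corrected_words else 0.0
    return {**counts, "total": total, "rate": rate}


if __name__ == "__main__":
    raw = "the student dictated notes in glass"
    corrected = "the student dictated notes in class"
    print(count_corrections(raw, corrected))
```

A count like this only approximates the editing effort, since a single misrecognition can span several words, but it gives a consistent number to compare across lectures and technology options.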

Evaluation Criteria

The voice note-taking technologies are being evaluated for:
  • Training time
  • Ability to acquire the technology
  • Ease of use of the technology, especially in the classroom
  • Total time to produce text version of notes
  • Quality of notes
  • Effectiveness of entire note-taking process in achieving learning and retention of lecture material
  • Ability to use the technology independently i.e. motor skills
  • Ability to take notes independently i.e. dictation proficiency
  • Acceptance of technology by lecturer and other students

Current Status

Since the inception of the study in November 2009, eleven students have attended a recruitment session. Of these, five students agreed to participate and attended one or more technology training sessions for DNS, the Sylencer and/or the DVR. Two students took the Sylencer into the classroom: one recorded voice notes on a DVR while the other dictated onto both their laptop and a DVR. The latter student is continuing to evaluate voice note-taking. The reason for the drop in numbers between these three stages will be explained in the final section, Discussion.

Preliminary Findings

The preliminary findings can be grouped into six categories: personal characteristics, course selection, technology selection, training in technology use, training DNS, and DNS vocabulary customization. These findings are presented as criteria for success in the use of voice note-taking in the classroom.
Personal Characteristics
Personal characteristics cover a wide variety of factors that can determine whether the student is even a suitable candidate for voice note-taking, and can influence their success with it as well.
  • Comfortable using Sylencer in public (highlights disability)
  • Hold Sylencer
  • Operate the laptop or DVR
  • Suitable voice
  • Willing to learn the new technologies
  • Dictate well i.e. enunciate carefully, formulate phrase before speaking it
  • Adjust speaking style for Sylencer: Dictating using the Sylencer requires training and practice to master, for two reasons:
    1. Its microphone is very sensitive so one must speak at a lower volume, in essentially an undertone as if one were speaking in a room where a small child was sleeping. A benefit of this quiet speech is that it makes the speaker less audible (and thus less disruptive) to those nearby.
    2. Another design feature of the Sylencer that aids in muffling speech and minimizing background noise is the seal that its rubber mask makes when held to the face. But the mask also limits the movement of the facial muscles, making articulate speech more challenging and thus possibly compromising speech recognition as well.
  • Adjust note-taking style for DNS i.e. dictate in grammatically correct phrases versus the abbreviated note-taking style that suits the physical task of writing at a quick pace
  • Multi-task i.e. listening and dictating compete for attention (versus listening and writing)
  • Manage switch between voice note-taking and class interaction e.g. removing Sylencer to answer a question, engage in laughter, etc.
Select Appropriate Course
Unfortunately, voice note-taking isn't suitable for all types of courses. DNS performs better in courses with the following characteristics:
  • A general vocabulary, which gives the best initial DNS "out-of-the-box" results
  • Infrequent need to record special symbols, charts/graphs or diagrams that are best sketched by hand
  • A subject or language for which a special version of DNS is available e.g. medical, legal, other languages (French, Spanish)
Select Appropriate Technologies
Voice recognition software is probably the most demanding application most users will ever run on their computer. This is why it is important to heed the system requirements. Nuance also rates devices used in conjunction with DNS such as DVRs and microphones. You'll achieve the best results if you observe the following:
  • Powerful computer: fast processor, lots of memory (try to exceed the recommended requirements for DNS)
  • DNS: latest version (recognition accuracy, performance and features continue to improve)
  • DVR: highly rated for transcription accuracy, easy to manipulate (see Recommended Digital Voice Recorders for Dragon NaturallySpeaking Transcription blog post)
  • Microphone: Sylencer has sensitive microphone, eliminates background noise, USB adapter for laptop (converts analog audio into digital audio, bypassing sound card which may not be the best quality)
Train To Become Proficient In Technology Use
The five evaluators who underwent one-on-one training presented with a range of training needs. Some were previous DNS users but had neither received formal training nor engaged in much self-study through reading the DNS Help Topics, End-User Workbook, User Guide or online videos. Only one evaluator had used a DVR. All evaluators were new to the Sylencer.
One-on-one training was supplemented with step-by-step instruction guides for the different aspects and technologies used in the voice note-taking process. The evaluators were also made aware of the extensive online resources that Nuance makes available for DNS users. Finally, the evaluators were encouraged to practice and meet with me to work out any issues they were having.
Train Dragon NaturallySpeaking
As stated earlier, DNS is speaker-dependent. While it recognizes speech quite well right out-of-the-box, its recognition accuracy improves after the new user trains it during the user enrollment process, and on an on-going basis. Training DNS has two effects. First, it refines the acoustic model, which represents how the user's voice sounds and how they pronounce words. Second, it modifies the language model, which represents word usage. The initial training, where the user reads from a preset script, adjusts the acoustic model. But DNS learns continuously after that, especially if the user takes the time to properly correct misrecognitions and customize the vocabulary.
Customize DNS Vocabulary
Post-secondary students may use more unknown or uncommon words, or use words in rarer senses, leading to increased misrecognitions. Therefore, to improve recognition accuracy, it is important for the student to:
  • Prior to lecture
    • Import unknown words into the DNS dictionary
    • Analyze lecture notes to adjust word usage frequency and context of use
  • During dictation or after transcription
    • Correct misrecognitions
Students who are taking courses in a variety of subjects will need to perform vocabulary customization for each specialized vocabulary they are dealing with, which may be a time-consuming process.
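
To give a feel for the pre-lecture step, here is a minimal sketch that scans lecture notes for words that might need to be added to the vocabulary. It is only an illustration and does not reproduce how DNS analyzes documents: the known_words set stands in for whatever vocabulary the student already expects DNS to recognize, and the function name find_candidate_words is hypothetical. The resulting list would still need to be reviewed by the student and added through DNS's own vocabulary tools.

```python
import re
from collections import Counter


def find_candidate_words(lecture_text, known_words, min_count=1):
    """Return (word, count) pairs for words in the notes that are not in known_words.

    Hypothetical helper: known_words is an assumed stand-in for the existing vocabulary.
    """
    # Extract alphabetic words, allowing internal apostrophes and hyphens.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", lecture_text)
    counts = Counter(word.lower() for word in words)
    known = {word.lower() for word in known_words}

    candidates = [
        (word, count)
        for word, count in counts.items()
        if word not in known and count >= min_count
    ]
    # Most frequent first, then alphabetical, so the likeliest problem words lead the list.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates


if __name__ == "__main__":
    notes = "Mitochondria produce ATP. The mitochondria are organelles."
    everyday_words = {"produce", "the", "are"}
    for word, count in find_candidate_words(notes, everyday_words):
        print(word, count)
```

Sorting by frequency reflects the point made above about word usage: a term that appears often in the lecture notes is likely to be dictated often, so it is worth checking first.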


Discussion

As stated earlier, many students did not continue past the recruitment or the training phases into classroom evaluation. The two evaluators who did reach the classroom are either not using voice note-taking at all or not using it on a regular basis.
There are a variety of reasons for this outcome, all relating to the preliminary findings and criteria for success. For example, to get out of the starting gate the student needs to feel comfortable using Sylencer in public. They need to be able to hold the Sylencer and operate a laptop or DVR. They also require access to the appropriate technology e.g. DNS and a computer that meets DNS's requirements.
The step from training to evaluation has not yet been made by all evaluators due to other factors. They may require more training or may not be taking an appropriate course. They may have an unsuitable voice (though DNS has evolved to accommodate a wide range of voice qualities).
The two evaluators who actually used voice note-taking in the classroom experienced poor recognition accuracy with DNS. Again, a variety of factors are likely at play. One evaluator needs to practice speaking in phrases, so perhaps more experience in using DNS in a conventional manner would be advisable. Also, custom vocabulary needs to be imported prior to the lecture. Finally, the evaluators likely require more time and practice to adjust their speaking style with the Sylencer.
In conclusion, the preliminary findings of this study into voice note-taking highlight the importance of taking a holistic approach to assessing the wide variety of factors that can influence the successful adoption of an assistive technology. This approach needs to consider the characteristics of the user, the particular task(s) they need to perform, the setting(s) they will be in and the features of the technologies they will be using. Once these factors align, hopefully with the help of the criteria for success this study has identified, students who cannot use typical note-taking modalities may find voice note-taking helps them to achieve greater independence in their educational pursuits.
