
Audio and visual cues in a two-talker divided attention speech-monitoring task.

Publication: Human Factors
Publication Date: 22-SEP-05

Article Excerpt
INTRODUCTION

In recent years, improvements in data transmission technology have dramatically reduced the cost of telecommunications bandwidth. To this point, however, relatively little effort has been made to exploit this low-cost bandwidth in improved speech communications systems. In part, this apparent oversight reflects the fact that standard telephone-grade audio speech (with a bandwidth of roughly 3500 Hz) already produces near 100% intelligibility for typical telephone conversations involving a single talker in a quiet listening environment. There is, however, ample opportunity for higher-bandwidth speech communication systems to improve performance in complex listening tasks that involve more than one simultaneous talker. High-bandwidth multichannel speech communication systems could have a wide variety of applications, ranging from simple three-way conference calling to sophisticated command and control tasks that require listeners to monitor and respond to time-critical information that could be present in any one of a number of simultaneously presented competing speech messages.

A question of practical interest, therefore, is how additional bandwidth could best be allocated to improve the effectiveness of multichannel speech communications systems. The most obvious approaches to this problem involve the restoration of the audio and visual cues that listeners rely on to segregate speech signals in real-world multitalker listening environments, such as crowded restaurants and cocktail parties. For example, listeners in the real world rely on interaural differences between the audio signals reaching their left and right ears to help them segregate the voices of spatially separated talkers (see Bronkhorst, 2000, for a recent review of this phenomenon). When these binaural cues are restored to a speech communication signal by adding a second independent audio channel to the system, multitalker listening performance improves dramatically (Abouchacra, Tran, Besing, & Koehnke, 1997; Crispien & Ehrenberg, 1995; Ericson & McKinley, 1997; Nelson, Bolia, Ericson, & McKinley, 1999).

Additional bandwidth could also be used to restore the visual speech cues that are normally available in face-to-face conversations. These cues make it possible to extract some information from visual-only speech stimuli (a process commonly known as speechreading; Summerfield, 1987), and they contribute substantially to audiovisual (AV) speech perception when the audio signal is distorted by the presence of noise (Sumby & Pollack, 1954) or interfering speech (Rudmann, McCarley, & Kramer, 2003).

From earlier experiments, it is clear that multitalker listening performance can be improved both by the addition of binaural spatial audio cues and by the addition of visual speech information. However, relatively little is known about how audio and visual information might interact in high-bandwidth multichannel AV speech displays. Important research issues related to this topic include the following:

Divided attention versus selective attention in AV speech perception. An essential underlying assumption in the design of a multitalker speech display is that neither the system nor the operator will have reliable a priori knowledge about which talker will provide the most important information at any given time. (Otherwise, either the system or the operator would simply turn off the uninformative talkers.) Consequently, it is important to know how well listeners are able to divide their attention across the different talkers in an AV speech stimulus in order to extract important information that might originate from any one of the competing speech signals. However, virtually all experiments that have examined AV speech perception with more than one simultaneous audio speech signal (Driver, 1996; Driver & Spence, 1994; Reisberg, 1978; Rudmann et al., 2003; Spence, Ranson, & Driver, 2000) have examined performance in a selective attention paradigm in which the participants were provided with a priori information about which talker to attend to and which talker to ignore prior to the presentation of each stimulus. This makes it difficult to determine how visual cues might influence performance in situations in which listeners must rely on the content of the competing speech messages to determine where the most important information resides.

AV speech perception with multiple visible talkers. Although a number of studies have examined AV speech perception with multiple simultaneous talkers, most have been limited to cases in which only a single talker was visible at any given time (Driver, 1996; Driver & Spence, 1994; Reisberg, 1978; Rudmann et al., 2003; Spence et al., 2000). Thus it is not clear how well listeners might be able to divide their visual attention across two visible faces in a multitalker AV speech stimulus.

Semantic AV incongruencies. When visual speech stimuli are presented in conjunction with mismatched audio speech stimuli, cross-modal interactions can substantially distort the overall perception of the multimodal stimulus. One classic example of this is the McGurk effect, which causes listeners who see one word spoken and hear another word spoken to report the perception of a third word that was not presented in the stimulus (e.g., they report hearing an "ada" sound when they see a talker saying "aga" and hear a talker saying "aba"; McGurk & MacDonald, 1976). Although it is unlikely that an AV speech display would intentionally present...
