Publication: IBM Systems Journal. Publication Date: 01-SEP-05. Authors: Turunen, Markku; Hakulinen, Jaakko; Raiha, Kari-Jouko; Salonen, Esa-Pekka; Kainulainen, Anssi; Prusi, Perttu
INTRODUCTION
Accessibility is often mentioned as one of the major motivations for the development of speech applications. For example, there has been much work on speech-based and auditory interfaces that allow visually impaired users to access existing graphical interfaces. (1) More generally, multiple modalities have been used to make human-computer interaction accessible for people with disabilities. (2)
In the ideal case, applications should take the needs of different users and usage conditions into account in the first place: interaction should be adapted to each usage situation. The goal of this approach is universal access to services. Current application development architectures tend to lack the flexibility necessary to adapt to a variety of users and usage conditions. In this paper, we present a system architecture capable of supporting the development of accessible interactive systems.
SPEECH SYSTEM ARCHITECTURES
A software architecture defines applications in terms of components and interactions among them. (3) A framework suitable for practical speech applications must provide components for a variety of application requirements, including dialog management, speech recognition, and natural language processing. High-level components (modules) usually contain multiple subcomponents having their own internal organization and relations. The challenge lies in finding ways for all of these components to be organized and selecting the functionality that the underlying system architecture should offer. This includes the development of principles for interaction and management of information flow among the system components.
When we consider speech architectures from the human-computer interaction point of view, an interesting issue is how the system can be made to support more intuitive and natural speech-based interaction to allow universal access to services. The needs of different user groups vary considerably. Natural interaction requires flexible interaction models supported by the system architecture. (4)
In most speech systems, components are structured in a pipeline fashion; that is, they are executed in a sequential order, although this kind of "pipes-and-filters" model is considered suboptimal for interactive systems. (3) In order to facilitate the development of advanced speech applications, we need more advanced techniques, models, methodology, and tools. In particular, we need system frameworks that support the requirements mentioned because, as with all technical variation, the key issue is the integration of components into a working system. (5)
Earlier work in advanced speech system architectures includes client-server approaches and systems based on agent architectures. Probably the best-known speech-specific client-server architecture is Galaxy-II. (6) The Open Agent Architecture (7) is a general agent architecture that has been used in the construction of many speech applications. These architectures offer the necessary infrastructure components for applications, but they do not support human-computer interaction tasks or adaptation in any particular manner.
A great deal of work has been done in the field of dialog management. Three particularly interesting recent examples are the agenda-based dialog management architecture (8) and its RavenClaw extension, (9) Queen's Communicator, (10) and SesaME. (11) The purpose of these approaches is not to provide a complete speech architecture but rather a model for dialog management.
The Jaspis architecture addresses many of the same features as the dialog management architectures mentioned but aims for general applicability in a variety of task settings, one of which is dialog management. It introduces a new paradigm for interactive systems that focuses on speech-based applications. In our previous papers (12) we have presented technical and functional aspects of the architecture. In this paper, the architectural principles--and in particular their novel support for adaptive interaction--are described in the context of human-computer interaction. We demonstrate how it is possible to construct highly adaptive systems suitable for different user groups and to support accessibility by using the Jaspis architecture and its principles.
The remainder of the paper is organized as follows. In the next section we introduce architectural foundations for adaptive human-computer interaction. The Jaspis architecture and its novel interaction paradigm based on agents, managers, and evaluators are introduced with examples. After the architecture presentation, we introduce several Jaspis-based applications for various domains. Examples of multilingual e-mail systems, timetable services, and pervasive computing applications are given. Special focus is given to interaction-level issues, such as error management, help, and guidance. We then report experiences and results from user-centered design, "Wizard of Oz" studies, and evaluations of Jaspis applications. Accessibility issues, such as those raised in design sessions with users who are visually impaired, are discussed. The paper closes with conclusions and discussion.
JASPIS ARCHITECTURE
Jaspis is a general speech-application architecture designed for the challenges of advanced speech applications, especially adaptive and multilingual speech applications. It provides support for human-computer interaction tasks, such as error handling, "Wizard of Oz" studies (i.e., those in which some parts of the system are simulated with a human operator), and corpora collection. While Jaspis is a general conceptual architecture, it is also a concrete framework which provides components for application development. In this section, we present the principles of the Jaspis architecture, focusing on human-computer interaction tasks, in particular on dialog management, output generation, and input management tasks. In addition, application development aspects are briefly discussed.
Architecture requirements
In order to support more flexible interaction, we have identified requirements for speech application architectures. First, speech applications need adaptive interaction methods in all system modules. For example, outputs and inputs should be tailored to the language of the users, and dialog management should adapt to the situation at hand. Second, systems should be modular because modular components support reusability, are easy to maintain and extend, can be distributed efficiently, and make adaptivity easier to achieve. The other requirements concern collaborative and iterative application design and development, the need for an extensible and practical infrastructure for application development, and support for standards. These principles are motivated by the technology, human-computer interaction, and application development viewpoints, all of which should be taken into account. For a more comprehensive description of the Jaspis architecture requirements see Reference 13.
Architecture overview
In order to support the architectural requirements mentioned, the Jaspis architecture uses a modular and distributed system structure, an adaptive interaction coordination model and a shared system context. These form the basic infrastructure on which other features and components of the system are based.
Figure 1 presents a typical Jaspis-based system setup. The top-level structure of the system is based on managers, which are connected to the central manager with a star topology. Communication between components is organized according to the client-server paradigm. Local subsystems are located inside the system modules. The interaction coordination model of the Jaspis architecture is based on the agents-managers-evaluators paradigm. Agents are interaction components which implement different interaction techniques, such as speech output presentations and dialog decisions. Evaluators are used to evaluate different aspects of the agents, in order to determine how suitable the agents are for different tasks. Managers are used to coordinate these components. All information in Jaspis-based systems is stored in the shared information storage. All components of the system may access the content of the information storage by using the information manager. These are the key features enabling architecture-level adaptation.
[FIGURE 1 OMITTED]
Shared information management
Information management is a crucial element of adaptive, modular, and distributed applications. The repository approach (i.e., using "blackboards" and databases) provides several advantages for adaptive and distributed applications. The term "blackboards," in the context of speech applications, refers to shared information resources, or specifically, a shared knowledge base. Most importantly, the repository approach allows the use of shared information by all system components. The main drawback of this approach is the lack of control. (7) In Jaspis, the coordination and control are performed by a separate component (the interaction manager) to achieve architecture-level coordination.
The Jaspis information management architecture consists of four layers. In this way, the actual storage (the information storage), the application interface (the information manager), and the communication interfaces (the information access protocol and communication protocols) are separated to maximize flexibility. In this section we focus on the information storage and application layers, omitting the communication layers.
The information storage holds all the shared system data, that is, the shared system context. The Jaspis architecture assumes that individual components do not store any high-level information inside them, but instead use the information storage for that purpose. This makes the interaction components stateless, and the system is able to adapt to each interaction by choosing proper components for each situation. To make this possible, the system assumes that every component updates its knowledge from the information storage when activated and writes modified information back to the information storage when deactivated. In the ideal case, the shared data should be represented at a conceptual level such that it can be used by other components as well. This is one of the main features used to adapt the systems for different users.
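The activation and deactivation convention described above can be illustrated with a minimal Python sketch. It is not taken from the Jaspis framework: the class names `InformationStorage` and `GreetingAgent`, the key names, and the dictionary-backed storage are all assumptions made for illustration only.

```python
from typing import Any, Dict


class InformationStorage:
    """Hypothetical in-memory stand-in for the shared information storage."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def read(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def write(self, key: str, value: Any) -> None:
        self._data[key] = value


class GreetingAgent:
    """Hypothetical stateless component: it keeps no state between activations."""

    def run(self, storage: InformationStorage) -> str:
        # Activation: refresh knowledge from the shared system context.
        language = storage.read("user/language", "en")
        turns = storage.read("dialog/turn_count", 0)

        greeting = {"en": "Welcome", "fi": "Tervetuloa"}.get(language, "Welcome")

        # Deactivation: write modified information back to the shared context.
        storage.write("dialog/turn_count", turns + 1)
        storage.write("dialog/last_output", greeting)
        return greeting


storage = InformationStorage()
storage.write("user/language", "fi")
first = GreetingAgent().run(storage)
second = GreetingAgent().run(storage)  # a fresh instance sees the same context
```

Because neither agent instance holds state of its own, the second instance picks up the turn count left by the first one, which is what allows a different component to be chosen for each interaction.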
The reference implementation of the information storage uses XML (Extensible Markup Language) for its internal information representation. The content or the structure of information inside the information storage is not defined by the system architecture because this is specific to the particular application and domain. The definition of the shared knowledge is an important phase of the application development process.
The programming interface for the information storage is straightforward: it takes XML requests and produces XML results. The information storage offers only the minimal set of operations needed to manipulate its contents. In addition to the shared information storage, direct information exchange between certain I/O components is supported for efficiency reasons. Most notably, the raw audio streams should be passed between components in a cost-efficient way to minimize system overhead and delays.
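As a rough illustration of an interface that takes XML requests and produces XML results, the following Python sketch uses the standard `xml.etree.ElementTree` module. The request vocabulary (`op`, `name`, the `set` and `get` operations) and the `status` values are invented for this example; the excerpt does not specify the actual Jaspis request format.

```python
import xml.etree.ElementTree as ET


class XmlInformationStorage:
    """Hypothetical storage that accepts XML requests and returns XML results."""

    def __init__(self) -> None:
        # The shared context is itself kept as an XML tree.
        self._root = ET.Element("context")

    def handle(self, request_xml: str) -> str:
        request = ET.fromstring(request_xml)
        op = request.get("op")      # assumed attribute: operation name
        name = request.get("name")  # assumed attribute: element to access
        result = ET.Element("result")

        if op == "set" and name:
            node = self._root.find(name)
            if node is None:
                node = ET.SubElement(self._root, name)
            node.text = request.text or ""
            result.set("status", "ok")
        elif op == "get" and name:
            node = self._root.find(name)
            if node is None:
                result.set("status", "not-found")
            else:
                result.set("status", "ok")
                result.text = node.text
        else:
            result.set("status", "error")

        return ET.tostring(result, encoding="unicode")


xml_storage = XmlInformationStorage()
xml_storage.handle('<request op="set" name="language">fi</request>')
reply = ET.fromstring(xml_storage.handle('<request op="get" name="language"/>'))
missing = ET.fromstring(xml_storage.handle('<request op="get" name="topic"/>'))
```

Keeping the operation set this small mirrors the minimal-interface idea in the text; richer, application-specific access would sit in a layer above the storage.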
The information manager provides an application interface to the information storage. It uses the information access protocol to access the information storage and provides a programming interface for other components to access the shared system context. For example, system inputs and outputs may be modified by their own set of methods. Application developers may write new, application-specific methods when needed.
Flexible interaction management
The interaction management model of Jaspis is focused on the key design principles of the architecture: adaptivity and modularity. Interaction management in this context means both the overall coordination of system components and the coordination of those components that implement interaction techniques to be used in human-computer interaction tasks. In practice, this means input, output, and dialog management components in their various forms.
Interaction techniques are implemented by agents, which are software components specialized for certain tasks. Evaluators are used to make selections among different agents. Managers are used to coordinate agents and evaluators. Components specialized for related tasks are organized into modules. An overview of the interaction management model is presented in Figure 2.
[FIGURE 2 OMITTED]
As illustrated in the figure, each system module contains one local manager and several agents and evaluators. It is up to the local managers to decide which agents are used in different situations. Rather than centralizing this decision in the managers themselves, however, evaluators are used to assess the agents and their suitability for different tasks. Thus, there is no central component which makes these decisions. This makes it possible to construct highly adaptive and modular systems because all functionality is divided into specialized components that have no predefined execution order or relations among them. The principles governing how managers, agents, and evaluators are used are presented next in more detail.
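The division of labor among managers, agents, and evaluators can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Jaspis implementation: the names `Agent`, `LocalManager`, `language_evaluator`, and `situation_evaluator` are hypothetical, and combining evaluator scores by multiplication is an assumption, since the excerpt does not say how scores are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, str]


@dataclass
class Agent:
    """Hypothetical agent specialized for one language and one situation."""
    name: str
    language: str
    handles: str

    def act(self, context: Context) -> str:
        return f"{self.name} handles '{context['situation']}'"


# An evaluator rates how suitable an agent is in the current context.
Evaluator = Callable[[Agent, Context], float]


def language_evaluator(agent: Agent, context: Context) -> float:
    return 1.0 if agent.language == context["language"] else 0.0


def situation_evaluator(agent: Agent, context: Context) -> float:
    return 1.0 if agent.handles == context["situation"] else 0.2


class LocalManager:
    """Coordinates agents and evaluators; holds no selection logic of its own."""

    def __init__(self, agents: List[Agent], evaluators: List[Evaluator]) -> None:
        self.agents = agents
        self.evaluators = evaluators

    def score(self, agent: Agent, context: Context) -> float:
        total = 1.0
        for evaluate in self.evaluators:
            total *= evaluate(agent, context)  # assumed combination rule
        return total

    def select(self, context: Context) -> Agent:
        return max(self.agents, key=lambda agent: self.score(agent, context))


manager = LocalManager(
    agents=[
        Agent("HelpEn", "en", "help"),
        Agent("HelpFi", "fi", "help"),
        Agent("ErrorFi", "fi", "error"),
    ],
    evaluators=[language_evaluator, situation_evaluator],
)
chosen = manager.select({"language": "fi", "situation": "error"})
```

Adding a new agent or evaluator changes the outcome of the selection without any change to the manager, which is the kind of modularity the paragraph above describes.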
Managers: coordination
The interaction manager is a central component in Jaspis-based systems. It manages other components and is responsible for the overall coordination of the interaction. The interaction manager is similar to some central components...
NOTE: All illustrations and photos have been removed from this article.
