In recent years, many research activities have focused on design that aims to produce universally accessible systems, taking into account the special needs of various user groups. These special needs are associated with many user factors, such as impairments of speech, hearing, or vision, cognitive limitations, and aging, as well as with various environmental factors. Fields that address this problem, such as Usability, Universal Accessibility, Universal Design, or Inclusive Design, have been developed as relatively independent domains, but they share many aspects with other human-computer interaction (HCI) disciplines. However, researchers and practitioners are often not aware of the interconnections between the concepts of universal accessibility and “ordinary” HCI. In view of this situation, in this article we show that there is a fundamental connection between multimodal interface design and universal accessibility, and that awareness of these links can help both disciplines. Researchers from these areas may use different terminology, but the concepts they use often have essentially the same meaning. We propose a unified conceptual framework where these areas can be joined.
Accessibility and Multimodal Interaction
Universal accessibility and related approaches such as “Inclusive Design” or “Design for All” aim to produce systems that can be used by everyone, regardless of their physical or cognitive skills. As this design philosophy tends to enhance the usability of the product, it can also be extremely valuable for non-disabled users trying to use the system under suboptimal conditions. The activities resulting from the growing interest in accessibility and universal usability have produced a collection of resources that developers can use in their work. For example, many guidelines about accessibility, especially for Web design, are already available. In addition, conferences such as the ACM Conference on Universal Usability (CUU) and the ACM Conference on Assistive Technologies (ASSETS), as well as journals such as the International Journal on Universal Access in the Information Society, offer good sources of practical and theoretical work in this area. Developers can also use various practical solutions and tools, such as Web site compliance checkers, semiautomatic Web site repair tools, or Web adaptation facilities that transform existing Web content “on the fly.” There are also activities in developing tools that use guidelines to automatically verify Web accessibility.
Multimodal interaction is a characteristic of everyday human discourse, in which we speak, shift eye gaze, gesture, and move in an effective flow of communication. Enriching HCI with these elements of natural human behavior is the primary task of multimodal user interfaces. Many studies have explored multimodal interaction from different viewpoints. Sharon Oviatt gave a practical definition of multimodal systems, saying they combine natural human input modalities—such as speech, pen, touch, hand gestures, eye gaze, and head and body movements—in a coordinated manner with multimedia system output. Matthew Turk and George Robertson further refined the difference between multimedia and multimodal systems, saying multimedia research focuses on the media, while multimodal research focuses on the human perceptual channels. Multimodal interfaces can improve accessibility for diverse users and usage contexts, and can advance performance stability, robustness, expressive power, and efficiency of communication.
While multimodal interaction research focuses on adding more natural human communication channels to HCI, accessibility research looks for substitute ways of communicating when some of these channels, due to various restrictions, have limited bandwidth. The difference between the two areas lies in the focus of their research. Many concepts from both areas can therefore be generalized to obtain a unified, more abstract view of them, so that existing solutions from one domain can be applied in the other.
The Unified Framework
Treating user interfaces as multimodal systems can clearly help design for universal accessibility, as multimodal interfaces describe HCI in terms of modalities, that is, in terms of communication channels established between the computer and the user. Environmental constraints or limited user abilities can then be viewed as a break in, or a decrease of, the throughput of these channels (see Figure 1).
If we describe user interfaces as a set of communication channels, and connect these descriptions with user, environment, and device profiles that describe limitations in the usage of these channels, we can easily see whether the multimodal interface will be appropriate for the user in a specific situation. However, to create a unified view of multimodal system design and accessibility, we need a semantic framework where we can explicitly and formally establish relations among concepts from both domains. Therefore, our first step is to formally define a unified modeling framework that describes multimodal HCI and various user and environment characteristics in the same terms. The proposed framework does not define any specific interaction modality—such as speech, gesture, or graphics—or any specific constraint—such as low vision, immobility, or various environmental conditions—but defines a generic unified approach for describing such concepts. The framework, therefore, focuses on the notions of an abstract modality and an abstract constraint, defining their common characteristics regardless of their specific manifestations.
The Model of Multimodal HCI and Constraints. Our approach is based on the idea that user interfaces can be viewed as one-shot, higher-order messages sent from designers to users. While designing a user interface, the designer defines an interactive language that determines which effects and levels will be included in the interaction. We therefore model user interfaces with the modalities they use, where we define a modality as a form of interaction designed to engage a number of human capabilities, that is, to produce effects on users or to process effects produced by the user (see Figure 2). In our model, modalities can be simple or complex: a complex modality integrates other modalities for simultaneous use, for example, to provide modality fusion or fission mechanisms, while a simple modality represents a primitive form of interaction. Here, we do not focus on a detailed description of multimodal integration, but on the high-level effects a modality system or some of its parts use. We define input and output types of a simple modality, using the computer as a reference point. An input modality requires devices that transfer human output into a form suitable for computer processing. We classify input modalities into event-based and streaming-based classes. Event-based input modalities produce discrete events in reaction to user actions, such as user input via a keyboard or mouse. Streaming-based modalities sample input signals with some resolution and frequency, producing a time-stamped array of sampled values. We introduce a special class of streaming modality, the recognition-based modality, which processes streaming data, searching for patterns. An output modality presents data to the user, and this presentation can be static or dynamic. A more elaborate description of this model can be found in our earlier work.
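The modality taxonomy described above can be sketched in code. The following is a minimal illustration of our own, not part of the framework itself: class names, attributes, and effect labels are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Modality:
    """A form of interaction, described by the effects it engages."""
    name: str
    effects: List[str] = field(default_factory=list)

@dataclass
class ComplexModality(Modality):
    """Integrates other modalities for simultaneous use (fusion or fission)."""
    parts: List[Modality] = field(default_factory=list)

    def all_effects(self) -> List[str]:
        collected = list(self.effects)
        for part in self.parts:
            collected.extend(part.effects)
        return collected

@dataclass
class EventBasedInput(Modality):
    """Produces discrete events in reaction to user actions (keyboard, mouse)."""

@dataclass
class StreamingInput(Modality):
    """Samples an input signal, producing time-stamped values."""
    sample_rate_hz: float = 0.0

@dataclass
class RecognitionInput(StreamingInput):
    """Streaming modality that searches the sampled data for patterns."""
    pattern: str = ""

# Illustrative instances (effect names are hypothetical):
pointing = EventBasedInput("mouse", effects=["motor.hand"])
speech = RecognitionInput("speech-recognition",
                          effects=["linguistic.speaking"],
                          sample_rate_hz=16000.0, pattern="word")
combined = ComplexModality("point-and-speak", parts=[pointing, speech])
```

A complex modality such as `combined` exposes the union of the effects its parts use, which is the high-level view the model focuses on.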
While we describe HCI in terms of modalities, we describe various accessibility issues in terms of interaction constraints (see Figure 3). Interaction constraints can be viewed as filters on the usage of some effects. Constraints are organized as basic and complex. We identify two types of basic constraints: user constraints and external constraints. User constraints are classified into user features, states, and preferences. User features describe the long-term ability of a user to exploit some of the effects, and this description can include user disabilities, such as low vision or immobility. A user state constraint, further classified into emotional and cognitive contexts, describes a user’s temporary ability to use some effects. User preferences describe how eager the user is to make use of some effects—a user’s subjective mark of the effects they prefer or dislike.
External constraints are categorized as device constraints, environment constraints, and social context. Device constraints describe restrictions on the usage of effects that are a consequence of device characteristics. For example, a mouse is limited to capturing movement in two-dimensional space with some resolution, while output devices, such as screens on PDAs and other mobile devices, have limited resolution and a limited number of colors. Environmental constraints describe how the interaction environment influences the effects. For example, when driving a car, in most situations, users are not able to watch the screen and, therefore, this situation greatly reduces the usage of visual effects. In addition, various other environmental factors, such as visual conditions or noise, greatly affect the usage of other effects. Social context describes the social situation in which the interaction occurs. The proposed model allows the flexible definition of various simple and complex constraints of different types. The resulting constraint in a particular situation will be a combination of the user’s state, abilities, and preferences, as well as various external factors relevant to that situation.
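One way to combine basic constraints into a resulting constraint for a situation can be sketched as follows. This is a hedged illustration under our own assumptions: each constraint maps effect names to an availability factor between 0 (effect unusable) and 1 (fully available), and factors multiply; the framework itself does not prescribe this representation.

```python
def combine(*constraints):
    """Combine constraints by multiplying the availability factors of each effect."""
    result = {}
    for constraint in constraints:
        for effect, factor in constraint.items():
            result[effect] = result.get(effect, 1.0) * factor
    return result

# User constraint: a long-term user feature (e.g., low vision).
low_vision = {"sensory.visual": 0.3, "perceptual.visual.shape": 0.3}
# External constraint: environment (e.g., glare on the screen).
glare = {"sensory.visual": 0.5}
# External constraint: device (e.g., a monochrome display).
mono_display = {"perceptual.visual.color": 0.0}

situation = combine(low_vision, glare, mono_display)
# situation["sensory.visual"] == 0.15: the visual channel is heavily reduced
```

The multiplicative combination is just one plausible choice; a minimum or a rule-based combination would fit the model equally well.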
Common Ground: The Effects. The entities that connect modalities and constraints in our model are effects. We have classified the effects used by modalities and affected by constraints into five main categories: sensory, perceptual, motor, linguistic, and cognitive effects.
These categories are based on various sources, such as the World Health Organization International Classification of Functioning, Disability and Health (ICF). Our model also allows developers to use other categories. In our model, these concepts are subclasses of the Effect class presented in Figures 2 and 3. Sensory effects describe the processing of stimuli performed by the human sensory apparatus. Perceptual effects are more complex effects that the human perceptual system obtains by analyzing data received from the senses, such as shape recognition, grouping, highlighting, or 3D cues. Motor effects describe human mechanical action, such as hand movement or pressure. Linguistic effects are associated with human speech, listening, reading, and writing. Cognitive effects take place at a higher level of human information processing, such as memory processes or attention.
Effects are often interconnected. For example, all perceptual effects are a consequence of sensory effects. These relations among effects are important because they let designers see which side effects their decision to use a particular effect would cause.
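These relations can be exploited mechanically: if an effect is eliminated, every effect that depends on it is eliminated too. The sketch below is our own illustration of such propagation; the dependency graph and effect names are hypothetical.

```python
# Hypothetical dependency graph: each effect lists the effects it depends on.
DEPENDS_ON = {
    "perceptual.auditory.pitch": ["sensory.auditory"],
    "perceptual.auditory.localization": ["sensory.auditory"],
    "perceptual.visual.grouping": ["sensory.visual"],
    "linguistic.listening": ["perceptual.auditory.pitch"],
}

def unavailable_effects(blocked):
    """Close a set of blocked effects under the dependency relation."""
    blocked = set(blocked)
    changed = True
    while changed:
        changed = False
        for effect, deps in DEPENDS_ON.items():
            if effect not in blocked and any(d in blocked for d in deps):
                blocked.add(effect)
                changed = True
    return blocked

# A user who cannot process sound loses all auditory perceptual effects,
# and, transitively, listening-based linguistic effects as well.
lost = unavailable_effects({"sensory.auditory"})
```

This is how a tool can conclude that blocking a single sensory effect rules out an entire family of higher-level effects, as the text describes for sound.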
Using the Framework
Our unified framework can be used to describe various interaction modalities and interaction constraints. By combining these descriptions, and by using effects as a common ground, it is possible to see whether the designed interface will be appropriate for a particular situation, and it can enable adaptation of user interfaces according to user profiles and situational parameters.
Describing User Interfaces, Users, and Environment. We model user interfaces with their modalities, where we describe each modality with the effects it requires in order to be operative. For example, Table 1 shows effects produced by some common modalities, such as simple text presentation, aimed hand movement, visual menu interaction, and speech-based user interfaces, where we also illustrate different effect levels.
A specific user interface can then be described using these high-level descriptions of modalities, while detailed descriptions of the effects used can be obtained automatically through mappings, such as those shown in Table 1. It is also possible to have several alternative mappings among modalities and effects according to different theories. For example, simple textual presentation in Table 1 is described according to Gestalt psychology, but it is also possible to provide a description of these modalities according to other theories.
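The expansion from a high-level interface description to its effects can be sketched as a simple lookup. The mapping below is illustrative only; the concrete effect names are our assumptions, not entries copied from Table 1.

```python
# Hypothetical modality-to-effect mapping of the kind Table 1 describes.
MODALITY_EFFECTS = {
    "text-presentation": ["sensory.visual", "perceptual.visual.grouping",
                          "linguistic.reading"],
    "aimed-hand-movement": ["motor.hand", "perceptual.visual.location"],
    "speech-ui": ["sensory.auditory", "linguistic.speaking",
                  "linguistic.listening"],
}

def effects_of(interface_modalities, mapping=MODALITY_EFFECTS):
    """Expand a high-level interface description into the set of effects it uses."""
    effects = set()
    for modality in interface_modalities:
        effects.update(mapping.get(modality, []))
    return effects

# A hands-and-voice interface described only by its modalities:
car_ui = effects_of(["speech-ui", "aimed-hand-movement"])
```

Alternative mappings (for example, one based on a different perceptual theory) can be supplied through the `mapping` parameter without changing the interface description itself.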
Accessibility issues, such as user abilities and environmental conditions, are described using interaction constraints. User abilities can be described in several ways. For example, one approach is to create an individual profile of each user, associating all the effects with values describing the user’s capability to exploit them. For simplicity, the profile could include only the effects that differ from those of a typical user. Alternatively, it is possible to define a repository of user ability categories, where each category is described with the set of effects that it reduces. These categories can describe some disability, or other factors, such as the average abilities of different age groups. For example, Table 2 shows effects that are reduced by some disabilities. In modeling and analyzing user interfaces, the relations among effects play a very important role. For example, if we describe that the user is not capable of processing sound, it means that not only the sensory effects but also all the auditory perceptual effects will be inappropriate for that user.
In a similar way we can describe constraints introduced by environmental conditions. For example, driving a car is a complex constraint that integrates different user and environmental parameters. Table 3 shows a simplified description of this constraint that depends on traffic circumstances, weather conditions, noise level, lighting, the user’s current state, and the number of people in the car. Constraints can also be interconnected; for example, visual and weather conditions can affect the user’s current state, while the number of people in the car can influence the noise level. This description can be useful in determining which modalities to use in particular situations. When the car is stopped, it is possible to use the user’s central-field vision, as well as other effects, to a greater degree. On the other hand, traffic congestion further limits possible usage of these effects, allowing their use to a lesser degree.
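Given such descriptions, the suitability check mentioned earlier reduces to comparing the effects an interface requires against their availability in the situation. The sketch below is a minimal illustration under our own assumptions; the threshold, availability values, and effect names are hypothetical.

```python
def problematic_effects(required_effects, availability, threshold=0.5):
    """Return the required effects whose availability falls below the threshold.

    Effects absent from the availability profile are assumed fully
    available (1.0)."""
    return {effect for effect in required_effects
            if availability.get(effect, 1.0) < threshold}

# A visual menu used while driving in heavy traffic: central-field vision
# is largely unavailable, so visually demanding effects are flagged.
driving = {"sensory.visual.central": 0.1, "motor.hand": 0.6}
visual_menu = {"sensory.visual.central", "perceptual.visual.grouping",
               "motor.hand"}
problems = problematic_effects(visual_menu, driving)
```

An empty result would suggest the interface is appropriate for the situation; a non-empty one points at the modalities a designer should replace or supplement.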
Analysis and Transformations. The descriptions of multimodal interfaces and interaction constraints presented here can be used for various purposes. In a simpler form, they can serve as metadata about some user interface, or as a part of a user profile. However, with formal descriptions of user interfaces and constraints, it is possible to develop tools that analyze and transform the content in order to see if it is suitable for particular situations or users.
We are developing a set of design-support and educational tools based on the proposed framework. These tools take as input the description of a user interface, expressed in terms of modalities, and then evaluate it, for example, giving the list of effects it uses, a list of potential problems in some environments, and a list of user groups that could have difficulty using the interface. To increase awareness of the importance of accessibility aspects, these reports also contain data about the percentage of people who have some interaction limitation (for example, approximately 8% of men and 0.4% of women have some form of color blindness).
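The core of such a report can be sketched as a check of the interface's effects against a repository of user ability categories, each described by the effects it reduces. All data and names below are illustrative assumptions, not the tools' actual repository.

```python
# Hypothetical repository of ability categories and the effects they reduce.
DISABILITY_REDUCED_EFFECTS = {
    "color blindness": {"perceptual.visual.color"},
    "low vision": {"sensory.visual", "perceptual.visual.color"},
    "hearing impairment": {"sensory.auditory"},
}

def affected_groups(interface_effects):
    """List user groups for which the interface relies on reduced effects."""
    return sorted(group
                  for group, reduced in DISABILITY_REDUCED_EFFECTS.items()
                  if reduced & interface_effects)

# An interface that depends on vision and on color coding:
report = affected_groups({"sensory.visual", "perceptual.visual.color"})
```

A real report would additionally attach prevalence data (such as the color-blindness figures cited above) to each flagged group.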
Various other applications, such as dynamic adaptation and content repurposing, are also possible. By connecting descriptions of user interfaces, user profiles, and other constraints, we can analyze and transform content in various ways. The proposed framework can, therefore, be a good basis for adaptation and content repurposing that address the problem of developing content for various users and devices. A more detailed description of our previous work in this area can be found in our earlier publications.
Our proposed approach can provide developers and researchers with several advantages. From the developer’s point of view, one advantage is that it is possible to design more flexible and more reusable solutions, aimed at a broader set of situations. Most previous work on designing solutions for people with disabilities concentrated on a specific set of disabilities, or on specific situations. Considering the great diversity of disabilities and situations, it is clear that the development and maintenance of such systems is rather complex. With our approach, developers can concentrate on more generic effects, providing solutions for different levels of availability of specific effects. In this way it is possible to create adaptable solutions that adjust to user features, states, preferences, and environmental characteristics.
Another important advantage is that our framework enables different situations to be treated in the same way. As user features and preferences are described in the same way as environmental characteristics, a solution intended for a user with some disability can be reused for a non-disabled user in a situation that limits interaction in the same way the disability does. In addition to providing more universal solutions, this could also resolve some associated problems, because design is no longer concerned with disabilities (recognizing that the term “disability” often creates negative reactions), but with various effects and their constraints.
Figure 1. Modalities, constraints, and effects. Computers and humans establish various communication channels over which they exchange messages with associated effects. Modalities process or produce these effects, while various interaction constraints reduce or completely eliminate some of these effects.