Abstract
Traditional symbol-based AAC devices impose meta-linguistic and memory demands on individuals with complex communication needs and hinder conversation partners from stimulating symbolic language in meaningful moments. This work presents a prototype application that generates situation-specific communication boards formed by a combination of descriptive, narrative, and semantically related words and phrases inferred automatically from photographs. Through semi-structured interviews with AAC professionals, we investigate how this prototype was used to support communication and language learning in naturalistic school and therapy settings. We find that the immediacy of vocabulary reduces conversation partners’ workload, opens up opportunities for AAC stimulation, and facilitates symbolic understanding and sentence construction. We contribute a nuanced understanding of how vocabularies generated automatically from photographs can support individuals with complex communication needs in using and learning symbolic AAC, offering insights into the design of automatic vocabulary generation methods and interfaces to better support various scenarios of use and goals.
Introduction
Symbol-based Augmentative and Alternative Communication (AAC) leverages the relative strengths in visual processing of individuals with complex communication needs, such as children with autism spectrum disorder. As with other forms of language acquisition, learning symbolic AAC demands a linguistically rich environment, with frequent opportunities for receiving and producing language through the symbolic modality. Conversation partners play a crucial role in this process: they need to ensure that the AAC tool is programmed with relevant symbols and then model language use with the tool as conversation opportunities naturally arise.9
However, the design of traditional symbol-based AAC devices hinders such frequent exposure to relevant symbolic communication. These tools display symbols out of context, arranged in grid-based displays organized following linguistic categories (for example, nouns and verbs) or hierarchical ones (that is, superordinate–subordinate, like “food” and “dessert”), imposing significant meta-linguistic and memory demands.15,19 To facilitate the availability of relevant vocabulary and reduce navigational demands, conversation partners can create topic-specific communication boards by selecting words related to a topic they deem useful and grouping them on a single page. Nonetheless, this strategy does not scale to unexpected situations and imposes a heavy workload on conversation partners, who must anticipate learners’ vocabulary needs and dedicate time to program that vocabulary into the devices. Consequently, vocabulary availability tends to be restricted to a small set of topics or a series of frequent words that can be used across most contexts (for example, want, go). Conversation partners are not able to capitalize on naturally occurring opportunities for language learning, further hindering symbolic communication learning and learners’ independent use of AAC.
In this work, we present Click AAC, the first AAC tool that generates situation-specific communication boards formed by words and phrases inferred automatically from photographs. Through our analysis of semi-structured interviews with AAC professionals, we investigate how these professionals and their clients with complex communication needs used Click AAC during their routine therapy and school activities.
Interactive App Design
The design of Click AAC is rooted in evidence-based recommendations from HCI and AAC literature, including the design of well-established AAC tools. We detail the design rationale and important facets of Click AAC’s vocabulary generation and user interface below.
Vocabulary Generation. Click AAC employs a combination of three generation methods (descriptive, related, and narrative) that provide vocabulary spanning the main parts of speech for symbolic AAC (that is, pronouns, nouns, verbs, and adjectives).
The first step for all methods consists of creating a set of candidate description tags and a human-like description sentence (that is, caption) for the input photograph using the computer vision technique from Fang et al.8,a This initial vocabulary is then used with distinct goals in each method:
Descriptive: Simple description of the scene. It includes lemmas of all description tags, as well as the description phrase.
Related (Expanded): Words semantically related to the elements in the scene. It includes lemmas of all description tags plus lemmas of the three words most strongly connected in SWOW—a model of the human mental lexicon constructed from word-association experiment data—for each description word.b
Narrative: Words and phrases used for creating narratives about the scene photographed, obtained through the technique proposed by de Vargas and Moffatt.7 This technique selects vocabulary associated with similar photographs (that is, photographs having semantically similar captions) from the visual storytelling dataset VIST,11 which contains 16,168 stories about 65,394 photos created by 1,907 Mechanical Turk workers.
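To make the pipeline concrete, the following Python sketch outlines how the three methods could fit together. It is a minimal illustration, not Click AAC’s actual implementation: the captioning, SWOW lookup, and VIST retrieval back ends are reduced to hand-written placeholder functions, and all names are invented for this example.

```python
from typing import Dict, List, Tuple

# --- Placeholder back ends. The real system would call the captioning model, ---
# --- query the SWOW association data, and search a VIST retrieval index.     ---

def caption_and_tags(photo_path: str) -> Tuple[str, List[str]]:
    """Return a human-like caption and candidate description tags for a photo."""
    return "a boy eating a quesadilla at a table", ["boy", "eating", "quesadilla", "table"]

def swow_neighbors(word: str, k: int = 3) -> List[str]:
    """Return the k words most strongly associated with `word` in SWOW."""
    toy = {"quesadilla": ["cheese", "tortilla", "Mexico"]}
    return toy.get(word, [])[:k]

def vist_phrases(caption: str) -> List[str]:
    """Return narrative phrases linked to VIST photos with similar captions."""
    return ["we had lunch together", "it was delicious"]

def lemma(word: str) -> str:
    """Stand-in lemmatizer (the app would use a real NLP lemmatizer)."""
    return word[:-3] if word.endswith("ing") else word

def generate_vocabulary(photo_path: str) -> Dict[str, Dict[str, List[str]]]:
    caption, tags = caption_and_tags(photo_path)      # shared first step
    tag_lemmas = [lemma(t) for t in tags]

    # Descriptive: lemmas of all tags plus the caption itself.
    descriptive = {"words": tag_lemmas, "phrases": [caption]}

    # Related (expanded): tag lemmas plus the three strongest associates per tag.
    related_words = list(tag_lemmas)
    for t in tags:
        related_words += [lemma(w) for w in swow_neighbors(t, k=3)]

    # Narrative: phrases from stories about similar photographs.
    narrative = {"phrases": vist_phrases(caption)}

    return {"descriptive": descriptive,
            "related": {"words": related_words},
            "narrative": narrative}

print(generate_vocabulary("photo.jpg"))
```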
By default, the final set of vocabulary presented to the user is the combination of all methods, categorized by part of speech. Finally, symbols representing the vocabulary are retrieved from ARASAAC, a repository containing more than 11,000 AAC symbols. If the language set in the application is not English, generated words and phrases are translated into the target language through the Google Translate API.
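The assembly step could look like the sketch below, which reuses generate_vocabulary from the previous example. The part-of-speech lookup is a toy table standing in for a real tagger, and the ARASAAC endpoint, response fields, and image URL pattern are written from memory of its public API, so they are assumptions to verify against the current documentation; translation for non-English locales is omitted here.

```python
from typing import Dict, List, Optional

import requests  # used only for the (assumed) ARASAAC lookup below

# Toy part-of-speech table; the real app would run a POS tagger instead.
POS = {"boy": "noun", "table": "noun", "quesadilla": "noun",
       "eat": "verb", "cheese": "noun", "tortilla": "noun"}

def categorize(words: List[str]) -> Dict[str, List[str]]:
    """Group generated words by part of speech for the colored grid layout."""
    groups: Dict[str, List[str]] = {}
    for w in words:
        groups.setdefault(POS.get(w, "other"), []).append(w)
    return groups

def arasaac_symbol_url(word: str, lang: str = "en") -> Optional[str]:
    """Look up a pictogram for `word` on ARASAAC.

    NOTE: the endpoint and response fields below are assumptions about
    ARASAAC's public API and should be checked against its documentation.
    """
    r = requests.get(
        f"https://api.arasaac.org/api/pictograms/{lang}/search/{word}", timeout=10)
    if r.ok and r.json():
        pid = r.json()[0]["_id"]
        return f"https://static.arasaac.org/pictograms/{pid}/{pid}_300.png"
    return None

# Combine the words from all methods (generate_vocabulary is defined in the
# previous sketch), then categorize them for display.
vocab = generate_vocabulary("photo.jpg")
all_words = sorted(set(vocab["descriptive"]["words"] + vocab["related"]["words"]))
print(categorize(all_words))
# symbol = arasaac_symbol_url("quesadilla")   # network call; uncomment to try
```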
Interface. Our mobile application is composed of three main screens designed to provide direct access to its main features, as shown in Figure 1: (1) a home screen from which the user can import existing photos from the device’s gallery, take a new photo, or view their album, (2) an album screen from which the user can navigate through all their previously imported photos and open associated communication boards, and (3) the vocabulary page screen that presents the vocabulary generated for an individual photo.
Click AAC’s vocabulary page borrows the overall layout concept and key features from Visual Scene Displays (VSDs), a state-of-the-art AAC support for early symbolic communicators and individuals with cognitive and linguistic limitations2,4,5,16,17: Vocabulary is organized in boards around a center topic represented by the main photograph (for example, “eating quesadillas”), rather than in hierarchical categories representing abstract concepts (for example, “actions” or “foods”). To allow a larger number of symbols to be displayed without navigation to other pages, and to facilitate the transition between Click AAC and other popular, grid-based tools, generated words are displayed in a grid layout with symbols grouped and colored according to their part of speech. Each generated sentence is displayed as a single button containing the symbols of its content words. Users can navigate to other vocabulary pages by selecting thumbnails of the signature photos, available via the navigation bar on the left of the communication board currently open, a strategy demonstrated to be beneficial by clinical researchers.2,17,22 Selected words are displayed in a message bar at the top of the screen, allowing users to compose sentences combining individual symbols, as in typical AAC devices. Users can trigger synthesized audio output by tapping on the vocabulary buttons or on the message bar.
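The page structure just described can be summarized in a small data model. The sketch below is purely illustrative; the class and field names are invented here rather than taken from Click AAC’s code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WordButton:
    word: str
    pos: str                      # determines the group and color in the grid
    symbol_url: str               # pictogram retrieved for the word

@dataclass
class SentenceButton:
    text: str                     # e.g., the generated caption
    symbol_urls: List[str]        # one symbol per content word

@dataclass
class VocabularyPage:
    photo_path: str               # the center/topic photograph
    words: List[WordButton]       # grid of individual symbols, grouped by POS
    sentences: List[SentenceButton]
    message_bar: List[WordButton] = field(default_factory=list)  # symbols selected so far

    def select(self, button: WordButton) -> None:
        """Tapping a word appends it to the message bar (audio output omitted here)."""
        self.message_bar.append(button)
```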
To support users’ agency during communication,21 the editing mode displays a new menu bar next to the photo with options for reordering and removing words and phrases, adding new words, and editing the symbols associated with the words.
Methods
We conducted a user study involving AAC professionals and their clients with complex communication needs, who used Click AAC in their routine therapy sessions or school activities. Through questionnaires and semi-structured one-on-one online interviews with AAC professionals, we investigated an overarching question:
How can situation-specific vocabularies automatically generated from photographs support communication and language learning for individuals with complex communication needs?
We explore this research question in terms of professionals’ reflections on their experiences with our prototype, as well as broader factors and concepts envisioned through their experiences. This approach allowed us to understand the broad application of automatic generation of vocabularies from photographs, without limiting use scenarios or introducing artificial ones. As a long-established practice, AAC interventions must consider not just the needs of individuals who require AAC, but also those of their conversation partners.3,14,16 These professionals regularly try novel AAC technologies and combine multiple tools to accommodate emerging needs that depend on the situation and client profile, in addition to practicing symbolic communication with clients and instructing family members on how to support AAC at home. Therefore, their expertise can provide a unique, higher-level perspective beyond that of individual users.
In this exploratory study, our goal was not to evaluate Click AAC specifically, but rather to understand its use and expand the design space regarding automatic “just-in-time” (JIT) vocabulary from photographs in AAC. Engaging directly with users and observing them using the application on defined tasks might bring valuable insights for designing an application, but would offer little support for such exploration. Nonetheless, during all interviews, care was taken to ensure that the participants not only shared their perspectives but also relayed the experiences of their clients. The virtual format of the interviews also enabled us to reach a broad set of use cases across learning, therapy, and cultural contexts, without geographic constraints.
Participants. We made Click AAC publicly available through mainstream app store platforms and recruited AAC professionals through a message displayed on its initial screen. This message prompted speech-language pathologists (SLPs), who were trying or expecting to try Click AAC with one or more individuals with complex communication needs, as well as AAC consultants or evaluators, who assessed the app independently based on their professional expertise, to enter their contact information if they were interested in participating in the study. Fourteen SLPs used Click AAC with their clients in private therapy sessions or with their students in special education for at least four weeks, involving approximately 67 AAC users. We refer to these clients and students as AAC learners throughout the paper. An additional six consultants/evaluators tested the app by themselves.
Procedure. Since this study aimed to understand the use in naturalistic settings, participants were not instructed on how or where to use the app, but rather asked to use or continue to use it in their routine practices in the ways they judged to be most appropriate. Accordingly, professionals used their own expertise and judgment in selecting which clients to try the application with.
For participants who tested with AAC learners, frequency of use ranged from a few sessions spread over four weeks to continuous use during approximately two months. The time spent using the app within their routines also varied greatly because most usage occurred as needs and opportunities arose, rather than during time slots dedicated to testing the app. Consultants/evaluators who tested the app by themselves used it less extensively, given that their evaluation mostly consisted of uploading several photographs and inspecting the generated vocabulary without engaging in specific activities with AAC learners.
We interviewed the professionals through online video meetings once they deemed their evaluation complete and were ready to provide feedback. Each interview took approximately 20–50 minutes. The semi-structured interview was guided by eight questions, covering scenarios of use, profiles of users, comparison against current AAC tools, adequacy of the tool in professionals’ and learners’ routines, and strengths and weaknesses of the current prototype.
Findings
Our thematic analysis revealed three main themes that together answer our overarching research question. The first theme describes the situations and ways in which Click AAC was incorporated into school and therapy activities. It details the kinds of support provided for different learner profiles, in addition to presenting envisioned use cases. The second theme interprets how people benefited from the immediacy of vocabulary provided by Click AAC during those activities. The third theme explores the dynamics between AI and users, weighing the benefits and issues introduced by automation and revealing the importance of keeping humans in the loop.
Click AAC offers a flexible, complementary AAC tool for a wide range of user profiles.
A wide range of learner profiles can benefit. Overall, professionals felt that a wide range of learner profiles may benefit from technology similar to Click AAC. Selected learners were, in the majority, emerging or context-dependent communicators who were non- or minimally verbal children with diverse developmental disabilities (for example, ASD, cerebral palsy). Professionals also used Click AAC with a smaller number of children and young adults with functional communication and some literacy skills. Professionals described how a wide range of user profiles could benefit from automatic JIT vocabulary from photographs depending on how the professional incorporates it into their practice.
Professionals also envisioned the use of such technology for other populations such as adults with aphasia and dementia. P2, for example, described how people with Broca’s aphasia, who have “telegraphic speech” and often “want to be talking about their favorite things”, such as a visit to the “museum or zoo over the weekend”, could “take pictures of” such an event and “bring it to like, a family party and talk about it”. P15, who has experience with people with aphasia, also envisioned such use cases, but warned that the user would need to have “pretty good reading comprehension” to discern whether the vocabulary generated was adequate or not. Otherwise, the app would get the user “into trouble because [users] would be selecting messages that were maybe not appropriate or relevant to the photo and not realize it.”
A complementary tool to talk about past and present contexts, “giving them a voice” and “facilitating language development”. Professionals viewed Click AAC as a complementary tool to facilitate language development and enable communication about specific topics. They reported a variety of main goals for using the app, including to “expand expressive language” (P2), “augment and facilitate speech language development” (P4), and reduce prompt dependency (P8), “build[ing] up towards using it as an alternative form of communication” (P4).
They did not see the tool as a substitute for existing robust systems because of the uncertainty of which vocabulary will be available and because it does not give access to all language concepts the user may need at all times—limitations inherent to the concepts of automatic JIT vocabulary and topic-specific communication boards, respectively. In addition to those limitations, some aspects of our specific design further hindered the adoption of the tool as a substitute AAC device, such as the lack of a fixed core vocabulary set across all pages and a variable arrangement of symbols that does not promote motor planning.
On the other hand, the ability to easily access language in situations where they “want[ed] to talk about something that’s unique or [to] tell a story” (P12) supported learners in different ways toward the intervention goals. While learners used the app independently in some instances, on most occasions professionals operated the app in conjunction with learners due to the learners’ cognitive and motor difficulties, in line with their regular practice involving other AAC tools. Professionals used the generated vocabulary to work on different activities that encouraged symbolic AAC, such as performing aided language stimulation (that is, modeling language) with emerging communicators while describing, asking questions, or making comments about the scene photographed. With context-dependent communicators, professionals extended the activity by instructing learners to construct their own sentences.
Immediacy of vocabulary facilitates communication and language learning “on the spot” with reduced workload. Our findings in this theme reveal how learners and professionals benefited, and may benefit, from the immediate availability of situation-specific vocabulary from photographs across the different activities and intervention goals described in the first theme.
Reduced workload opening up opportunities for AAC stimulation. Professionals stressed the importance of selecting and programming appropriate fringe vocabulary to support learners in the various situations encountered in their routines. They further described how this is typically an arduous task, but that vocabulary generated automatically from photographs can alleviate it. Not surprisingly, conversation partners were not only overloaded by the need to select and program vocabulary on current tools, but also unable to plan and perform these tasks for all situations encountered by learners. The instant availability of relevant vocabulary allowed participants to increase the frequency of moments in which they could model language or engage learners with AAC in general, which is fundamental for successful AAC interventions.
For example, P7 pointed out the challenges of helping multiple students with different tasks concurrently in teaching routines. She then commented on how she was currently able to provide only core vocabulary throughout the school day, and how automatic JIT vocabulary from photographs encouraged communication on the spot by providing easy access to relevant fringe vocabulary:
P7: You always want a child to have the ability to communicate, but the time for teachers to do that is very limited. …I just I can’t keep up with fringe [vocabulary]. With something like [Click AAC], a teacher could take a picture and could encourage that communication and they could do it quickly and they could do it easily …So this is brilliant.
Participants also postulated that automatic JIT vocabulary from photographs might particularly benefit families, who are less experienced with AAC technologies and vocabulary selection than professionals, and therefore face challenges in creating adequate on-topic communication boards.
P1: [It] takes me probably like a few minutes to be able to create a new page in someone’s communication application, but that’s because I do that …five days a week [for] seven years …but imagine a family who either is not tech savvy [or] just trying to keep up with their child who has special needs. It takes them like hours …they just don’t have that time. …the convenience and quickness of being able to program information into it is the most impressive thing that I’ve seen.
Facilitated symbolic understanding and sentence construction. Professionals discussed how Click AAC benefited learners and themselves during aided language stimulation and sentence construction activities. Having “something visual [to] anchor some concepts” was deemed particularly important for modeling language for emerging communicators because a picture taken was “live” and “connecting [the symbols with] something very physical” (P13). Professionals explained how the immediate creation of symbols from real-world concepts supports teaching learners “how to use symbolic communication to communicate something more specific” (P11). Because learners could see and use the symbols at the same time as they engaged with the associated object or concept, it was easier to “understand that a referential symbol replaced the presence of that object” when composing messages on the AAC tool, “such as words [do] in oral language” (P20).
Participants noticed that having a concise set of vocabulary displayed next to a photograph setting the communication context in “real-time” supported learners in engaging in the formulation of spontaneous sentences. P13 commented that this strategy gives “situational context cues”, “making it easier to compose sentences and to make it more of a conversation” with minimum navigation. As P14 mentioned, “sometimes the navigation between boards, as you’re learning to build sentences, it’s like a lot together”, and P8 described how the layout particularly supported users with limited attention span.
Support for communicating personal interests. Professionals noted that instant situation-related vocabulary from photographs enabled children who relied on nonverbal communication and had major difficulties navigating traditional AAC systems to initiate communication about personally relevant topics, allowing professionals to “expand and build on whatever modality [learners were] using” (P2). P6 explained how nonverbal learners often want to initiate communication about topics interesting to them but are hindered by the lack of easy access to relevant vocabulary, and compared how automatic JIT vocabulary from photographs provided better support relative to existing tools:
P6: My main purpose for it would be that “on the go,” when I have a student that needs to talk about something that is just too frustrating to find the words for on their device. …right now it’s snowing, we could take a picture of the snow and the playground, and then the language that comes up about that is concise and related. And, the Proloquo, the Snap Scene …we have to dig and dig and dig and find “go”, back to the page that has “playground”, and go forward to the page that has “weather”. And I go back to the page that has “clothes” …
Potential impact on motivation and confidence. Although our user study did not focus on measuring language outcomes, our findings provided preliminary evidence that such technology may improve motivation and confidence for some learners, particularly those who had been least successful with current tools. Some professionals commented that learners were receptive to the technology and motivated to use it. P13 commented that her learner, who was working on literacy skills and had no interest in engaging in language learning activities with other tools, became motivated by trying Click AAC:
P13: [For] example, [a boy] had no interest… he [tried Click AAC] and it was very, very emotional ‘cause …Once we started it till now, he’s like, I can’t read, I don’t know, I’ll never know …but when I started talking to him about the app …[he] started saying, yeah, I’m going to learn to read and learn to write. It …got him …motivated to even try, which is very new for him.
Biases introduced did not compromise support but highlighted the importance of AI-human cooperation. Automation of vocabulary selection proved helpful and led to positive outcomes, but participants’ experiences highlighted the importance of keeping humans in the loop and revealed new aspects and challenges intrinsic to human-AI cooperation for AAC.
This theme first demonstrates how participants’ perceptions of vocabulary quality were related to the type of photographs they used as input and the context of use. Then, it shows common biases and errors caused by the algorithms powering Click AAC and reveals how participants cooperated with the AI not only to overcome those issues, but also to achieve improved support that would not have been possible through the AI or themselves alone.
Quality of vocabulary was directly related to the photograph’s content, failing for some relevant situations. Our analysis revealed common patterns in the quality of vocabulary generated across different input photographs, signaling the system’s high dependency on the input photograph’s content.
Overall, professionals judged the individual words generated to be mostly relevant, requiring only a few modifications when the scene photographed was correctly identified. Participants positively noted that words were “not limited,” “not too predictable,” and included not only the names of objects depicted in the photo but also a broader set of words related to the scene, “expand[ing] language.”
In general, participants reported that Click AAC was able to correctly identify the scene in the majority of photographs (“most of the time it picks up what you’re doing” (P7)). However, although identification succeeded in some instances involving “cluttered pictures” or very specific elements and details, such as specific gardening tools (for example, pruners and loppers (P7)), TV characters (White, Rue, and Bea from Golden Girls (P5)), facial expressions (“straight face” (P10)), and age-related attributes (“historic” (P7)), Click AAC often misinterpreted photographs relevant to learners’ common activities.
Participants who encountered the most difficulties cited input photographs containing “two-dimensional” “images that are not real”, such as “cartoon characters”, “specific toys”, a “door knob”, a “smiley” face (to talk about emotions), “super heroes”, an “ax throwing place”, “play-doh”, “bubbles”, the “holiday tree Tu BiShvat”, “body parts for Mr. Potato head”, and “Peppa Pig”.
Besides the lack of specificity in the identification of “uncommon” scenes, participants noticed some items being consistently identified as similar, but totally unrelated, objects. For example, P10 discussed how “random cylinder objects” were being recognized as soda containers, and P7 experienced “laundry soaps …and some softeners” being identified as food. Some participants also acknowledged how tricky it is to correctly identify some photos, given potential similarities. For example, P15 described an instance where Click AAC identified a goat as a dog, expressing: “but to be fair, he does kind of look like a dog in this picture”.
Errors and biases introduced did not compromise support, and the effort correcting and complementing the AI “was worth it”. Despite the aforementioned errors and biases introduced by the AI, participants noted that the automation still facilitated performing language stimulation with learners during meaningful activities. In addition, the great majority of participants found that the effort of filtering, complementing, and correcting the AI was worthwhile compared with the amount of work needed to program current tools, as noted by P6: “It takes less time to create a few boxes than to recreate a complete page”.
Participants found it easy to edit and add individual words once they had learned how to perform those actions, either through the app’s embedded tutorial or by asking the researchers for instructions: “It was easy for me to move it around, take off what I didn’t want and add what I did want” (P3).
In instances where they were mostly working on aided language stimulation, professionals simply ignored irrelevant symbols and focused on relevant ones to maximize the immediacy of the symbolic representation, as P9 discussed: “I don’t delete [anything]. I can …go through and determine which ones I like best”. In most situations, though, participants edited the vocabulary prior to engaging with learners, as preparation for a specific activity, or in conjunction with the learner while the communication was taking place.
Cooperation led to extended support. In most cases, once Click AAC displayed a new vocabulary page, participants checked whether the overall scene identification was correct, and scanned (with learners in some instances) through the items to remove undesired items and/or add missing words. Professionals reported that during this scanning process, the initial set of words generated by the AI often “served the role of a prime,” stimulating them to think of new relevant words that they would not have thought of if they were selecting the vocabulary by themselves, as P15 discussed:
P15: I might see something that was generated by the app that makes me think: “Oh, that’s a good idea.” …this would also be appropriate and I might not have thought of that before. …when it comes to …vocabulary development, it’s kind of the difference between a blank slate, where you’re thinking, okay where do I start? What do I? How do I come up with something that’s relevant that …and having the app generate some stuff for you, based on a relevant picture, and then that triggers more ideas. So then you might think of other things that you would try programming to see if that would work for the client.
P7 also illustrated that the mutual collaboration between users and AI led to novel levels of support. She discussed how she adapted her communication to incorporate words offered by Click AAC and expanded the interactions with the learner:
P7: [After uploading a photo of a dog,] if a child is not really scanning, but they touch “mammal”, I can go ahead and talk about that and I can say: yeah, she’s a mammal, let’s think of some other mammals. Let’s see …animals that have “fur” (points to the vocab button). …You can really expand just with a handful of vocabulary like that, that you would go. …Why would I want the word “mammal” on a fringe board? …that’s exactly why! So, you can go ahead and expand on language so that no matter what they touch, I can go further with them …
Discussion
We now discuss how the observed and envisioned benefits of such technology relate to the conceptual underpinnings of JIT support introduced by Schlosser et al.,19 before moving to the implications for the design of such tools for the variety of contexts of use identified in our analysis.
Conceptual underpinnings for the benefits from immediacy of vocabulary. The benefits of being able to immediately generate vocabulary as needs arise, as revealed in the second theme, included reduced workload leading to increased opportunities for AAC stimulation, facilitated symbolic understanding and sentence construction, support for communicating personal interests, and potential impact on motivation and confidence. These benefits are all tightly related to the conceptual foundations of JIT support: working memory demands, situated cognition, and teachable moments.19
When communicating with the aid of a traditional dynamic grid display, learners must keep the desired concept in mind while simultaneously remembering the page where that symbol is located, how to navigate to that page, and the location of the desired symbol on the target page, while avoiding distractions that may arise during this process. When forming sentences, users must go through this process several times.20 With the combination of automated generation of vocabulary from photographs and a VSD-like interface, users do not need to hold in memory the symbols previously selected, nor remember how to navigate to a desired symbol while constructing sentences, reducing memory demands. Our participants emphasized how this was particularly useful for constructing sentences to model language because learners can focus on the language concepts rather than being burdened with the navigation task.
Our approach enabled users to have symbols representing the real world concepts they were engaging with readily available, which can not only alleviate working memory demands but also facilitate situated cognition. Cognition and learning are inherently dependent on the social and cultural contexts in which they occur, and this is no different for language learning and comprehension.6 Associating language elements with perceived referents while a situation takes place is crucial for learners to comprehend and use language. The immediacy of symbolic representation helps to clarify the relation between objects, symbols, events, and agents participating in that situation.1 By providing related vocabulary instantly without requiring users to anticipate the situation, our approach can increase the frequency of moments for which the learning of symbolic representation through aided language stimulation is possible.
This relates to the third conceptual underpinning of JIT support, teachable moments. According to the education literature,12 teachable moments are those opportunities that emerge when students are excited, engaged, and primed to learn. Adults must provide activities to children according to their level of development, allowing them to “learn what they want and when they are ready to learn”.12 The provision of automatic JIT vocabulary can support conversation partners in capitalizing on those teachable moments, by allowing them to quickly adapt the offered support to emerging and unforeseen situations and to engage in topics of interest to the learner, which can activate background knowledge about those contexts and consequently promote comprehension.10 Our findings indeed demonstrated how the automatic generation of vocabulary in Click AAC enabled or facilitated communication in those teachable moments, even when the generated vocabulary had missing or irrelevant words. Participants explained how the app could provide relevant vocabulary during unplanned, very specific activities (for example, horticulture), or when finding the words on the main device was too frustrating for the learner (for example, a visit to the dentist).
Bringing these three concepts together, we can see that automatic JIT vocabulary from photographs not only reduced the workload of AAC professionals, but also enabled them to take advantage of teachable moments that arose during school or therapy activities, facilitating the use of situated cognition in stimulating symbolic AAC.
Designing for specific use cases. Our findings from the first theme provide insights into the scenarios in which automatic JIT vocabulary from photographs can provide support, as well as how people used the support offered across these situations. This enables future research to narrow down the design of tools such as Click AAC. Since our study was exploratory in nature, we designed Click AAC as a generic tool aimed at supporting a wide set of contexts. Future research can now explore how to leverage the capabilities of automatic JIT vocabulary from photographs to facilitate the specific activities identified, including language modeling, sentence construction, language expansion, and past event recount.
For example, researchers can explore different interfaces for facilitating single-word modeling for emergent communicators, such as providing only the symbol of the main object identified in the scene, maximized in the display. Future work can also look into the design of tools that facilitate the practice of sentence construction using language concepts extracted from photographs. This may include exercises for filling gaps in sentences related to the identified scene, in which sentences and available options are generated automatically. For example, taking a photograph of a boy playing soccer as input, the application could automatically generate the sentence “the boy is playing”, and ask the user to complete it from an option list including baseball, tennis, and soccer.
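A minimal version of such an exercise generator is sketched below, assuming the caption and distractor words are already available from the generation pipeline; the function name and interface are illustrative only.

```python
import random
from typing import List, Tuple

def cloze_from_caption(caption: str, target: str, distractors: List[str],
                       n_options: int = 3) -> Tuple[str, List[str]]:
    """Turn a generated caption into a fill-the-gap exercise.

    `target` is the word to hide (e.g., 'soccer'); `distractors` could come
    from the related-words method (e.g., other sports associated with 'playing').
    """
    prompt = caption.replace(target, "____", 1)
    options = random.sample(distractors, k=min(n_options - 1, len(distractors)))
    options.append(target)
    random.shuffle(options)
    return prompt, options

prompt, options = cloze_from_caption(
    "the boy is playing soccer", target="soccer",
    distractors=["baseball", "tennis", "hockey"])
print(prompt, options)   # e.g., "the boy is playing ____" ['tennis', 'soccer', 'baseball']
```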
To support language expansion, future directions include probing new interactive interfaces and organization strategies that allow easy exploration of semantically related words. For example, words semantically related to the concepts appearing in the photographs, generated by the related (expanded) method, could be displayed at a secondary level that appears only when the user selects the main concepts in the photograph. Finally, we propose studying how to generate more meaningful sentences to retell a past event, in addition to facilitating the presentation and editing of such phrases for maximum personalization. Exploring how to combine multiple photos of the same event to provide support is another possibility, given that people often capture different moments and angles of personally relevant events.
Another avenue for future research is to study how to create a robust AAC system that integrates automatic vocabulary from photographs. Our findings pointed to some design opportunities, such as the use of a customizable core vocabulary board across all pages, consistent spatial arrangement of items to support motor planning, access to a keyboard, and the ability to apply morphological inflections (for example, plural and past tense).
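As one example of these opportunities, morphological inflection could be prototyped with simple rules plus a table of irregular forms, as in the sketch below; a production system would more likely rely on a dedicated morphology library, and the rules here are deliberately minimal.

```python
# Minimal rule-based inflector for plural and past tense, for illustration only.
IRREGULAR_PAST = {"go": "went", "eat": "ate", "see": "saw", "take": "took"}
IRREGULAR_PLURAL = {"child": "children", "foot": "feet", "mouse": "mice"}

def past_tense(verb: str) -> str:
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"

def plural(noun: str) -> str:
    if noun in IRREGULAR_PLURAL:
        return IRREGULAR_PLURAL[noun]
    if noun.endswith(("s", "sh", "ch", "x")):
        return noun + "es"
    if noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

print(past_tense("play"), plural("quesadilla"))   # played quesadillas
```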
Improving quality of vocabulary generated. Our study did not focus on evaluating the quality of the generated vocabulary through controlled experiments. Nonetheless, our findings provide insights into some common, general patterns in the quality of vocabulary generated in relation to photograph content, in addition to the use cases for such technology, informing i) the future selection of machine learning models and training datasets for improved scene recognition, ii) context-related vocabulary generation methods, and iii) the selection of adequate datasets for evaluating generation methods during early stages of system design.
Future research can integrate existing techniques for identifying cartoon characters18,24 and for person re-identification,23 for example, and study whether these models are able to meet the needs of AAC professionals and learners during their routine activities. Another thread of research can study forms of cooperation between AAC users, professionals, and AI to achieve enhanced support. This includes, for example, new techniques that incorporate users’ corrections to the generated image descriptions and vocabulary sets for retraining or reinforcing the image identification model and/or vocabulary generation method over time, aiming to improve their overall accuracy and precision.
The findings that emerged in the third theme also show the need for novel methods that expand the image description into a set of contextually related terms following the user’s own style. The narrative method used by Click AAC relied on corpora produced by adults in the U.S. This was insufficient, leading to mismatches between users’ language styles and the support offered. Future research should investigate generation methods for AAC that accommodate regional styles and, more importantly, that provide children and teenagers with language that sounds like their peers’. One possible avenue is to reproduce the user’s language style by applying the lexicon terms manually associated with a certain photograph to new photographs containing similar elements (as judged by the AI) during the generation process. Another strategy could be to reinforce the generation method with vocabulary selected during communication.
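One way to prototype this personalization idea is sketched below: vocabulary that a user manually attached to earlier photographs is suggested again when a new photograph’s caption is sufficiently similar. The similarity measure here is plain token overlap, chosen only to keep the example self-contained; a deployed system would more plausibly use caption embeddings or visual similarity, and all names are invented for illustration.

```python
from typing import Dict, List, Set

# Photographs for which a user has manually curated vocabulary, keyed by caption.
# In a real system these would be stored per learner and updated over time.
user_lexicon: Dict[str, List[str]] = {
    "a boy playing soccer in the park": ["goal", "kick", "my team", "awesome"],
}

def _tokens(caption: str) -> Set[str]:
    return set(caption.lower().split())

def personalized_words(new_caption: str, threshold: float = 0.3) -> List[str]:
    """Reuse the user's own words when a new photo looks like an earlier one.

    Similarity is Jaccard overlap between caption tokens, purely for illustration.
    """
    suggestions: List[str] = []
    new_toks = _tokens(new_caption)
    for caption, words in user_lexicon.items():
        toks = _tokens(caption)
        jaccard = len(new_toks & toks) / len(new_toks | toks)
        if jaccard >= threshold:
            suggestions += words
    return suggestions

print(personalized_words("a boy playing soccer at school"))
```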
The necessity of running performance evaluations of AAC systems on datasets has been discussed in the field.7,13 The main reasons are obtaining statistically significant quantitative findings that can inform the fine-tuning of internal components for optimizing the system, and anticipating flaws before testing the system with end users. In the initial evaluation of the storytelling generation method by de Vargas and Moffatt,7 the authors found that the method was robust to variations in the input photograph. However, our findings revealed that the technique for identifying the scene failed for several AAC use cases, leading to unrelated vocabulary and a lack of support. Our findings on what kinds of photographs professionals and learners want to use the technology with inform the construction of new datasets for this first stage of system evaluation that better represent AAC use. A possible next step would be to extend the VIST dataset with photos and vocabulary for cartoon characters, popular people, school objects, and toys.
Conclusion
The immense potential of the “iPad and mobile technology revolution” for benefiting AAC users has been discussed for more than a decade, but current symbol-based tools still have not realized the advantages brought by recent advancements in artificial intelligence and context-aware computing. In this work, we integrated computer vision and machine learning techniques proposed by de Vargas and Moffatt7 to create Click AAC—a mobile application that generates situation-specific communication boards automatically from photographs. We conducted a user study with AAC professionals and their clients with complex communication needs who used the application in their routine practices during therapy sessions or school activities. We contribute a nuanced understanding of how situation-specific vocabularies automatically generated from photographs can support communication and language learning for individuals with complex communication needs, offering new insights into the design of automatic vocabulary generation methods and interactive interfaces to provide adequate support across naturalistic scenarios of use and goals.