Computing Applications Special Section: Supporting Exploratory Search

Supporting Insight-Based Information Exploration in Intelligence Analysis

Capturing the exploratory search process can help represent analytical insight.

By John Gersh, Bessie Lewis, Jaime Montemayor, Christine Piatko, and Russell Turner

Posted Apr 1 2006

We are interested in the role of exploratory search in the intelligence analysis process, especially its role in sensemaking: how can exploring a set of information help an analyst to synthesize, understand, and present a coherent explanation of what it tells us about the world?

The process of exploratory search can help an analyst develop a story implied by relationships among discovered information items. One of the key challenges in supporting this process is the representation, depiction, and recording of insights—the basic elements of analysis. We have developed a simple concept for such representations, called “rich information collections,” which contain not just the analyst’s search results but also an executable collection specification by which similar information can be found, together with the analyst’s rationale for collecting them. In this article, we describe the nature of rich information collections, illustrate how they can be used in the process of exploring information represented by complex graphs such as social networks, and present a general concept for using such collections in analysis.

A primary product of intelligence analysis is insight; insight is about something, and it is based on something (“I think that these three people might be planning an attack because this data shows they received explosives from a known terrorist.”). Descriptions and models of the analysis process often include chains of data, evidence, hypotheses, and other constructs, in which a collection of lower-level information supports a statement (hypothetical or real) at some higher level of organization or abstraction [1]. Furthermore, the analysis process is inherently iterative, involving alternating narrowing and broadening of focus in a search for information [9]. In this context, exploratory search can be characterized by successive queries interspersed with stages of sensemaking [10]. At each stage, the analyst’s evolving insight determines the nature and utility of additional information and the explanatory role of information discovered.

In typical cases, an analyst pulls together a collection of interesting nuggets of information from reports, databases, open sources, colleagues, and so forth. The resulting insight may remain in the analyst’s head, it may be recorded in text snippets on paper or in computer files, or it may be organized into a formal report. Insights are tested against new information over time and are modified or discarded. Current insights may be shared with colleagues and insights from the past may be resurrected when similar situations arise. The effectiveness of such sharing and recall, though, may be limited by the ad hoc nature of the records and their separation from the information that generated them. Marchionini (in this section) characterizes search activities as supporting lookup, learning, or investigating. Exploratory search for developing coherent insight about what is happening in the world involves frequent transitions among all three activities; tools for supporting sensemaking should provide a framework for maintaining context across those transitions. An explicit representation within an information system of the insight and its supporting data can help to provide that context.

Testing and modification of ideas can also benefit from such a representation. Requiring the analyst to record insights in a special software application, though, could interrupt the thought process and add to workload. What if the information search and exploration process itself could be used to record the development of the insight?

A Scenario

Consider the following scenario: Mary, an intelligence analyst, is interested in the activities of a suspected terrorist cell. She has populated a local workspace with information relevant to this topic from available data sources. Today she is notified about a meeting thought to involve terrorism action planning. Mary views a social-network graph in this workspace; the graph depicts entities of interest and relationships among them. She follows attended relationship links from the meeting to person entities. From prior research, she suspects these attendees are planners and facilitators and unlikely to directly engage in acts of terrorism.

By continuing to explore the data in this graph, she discovers prior events associated with meeting attendees and people known to have participated in these events. Her resulting insight is that these newly discovered people form a group that might carry out a plan made at the meeting. She creates in her workspace an information collection that includes these people together with a specification of the path of entity and relationship types she followed to find them while exploring the social-network graph (“person that participated-in an event that is associated-with a person who attended this-meeting”). Mary annotates the collection with a description of her insight about potential activity. A colleague, Tom, also alerted to the meeting, asks Mary if she thinks cell members are involved. She sends him her information collection and he adds it to his own workspace. When Tom looks at it, he is presented with the collection depicted as Mary saw it, annotated with her conclusion. The next day Tom requests an update of the information in the collection; its specification of the social network path is re-executed to see if any new entities in the graph should be added to the collection or previous members removed. Note that the key elements in this scenario include: the collection of information put together by the analyst during exploratory search, an executable specification for finding similar information, and the annotation describing her insight. Our work has led us to consider these elements as the components of a richly described information collection that can be used to represent what an analyst’s insight is about, what it is based upon, and what it is. Hence the term “rich information collection.”
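The path specification in Mary's scenario can be read as a chain of typed steps walked over a labeled graph. The following is a minimal sketch of that idea, not the prototype's implementation: the edge triples, the `follow_path` helper, and all entity names are hypothetical, and the path is read backward from the meeting.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples standing in for the social-network graph.
EDGES = [
    ("alice", "attended", "meeting-1"),
    ("bob", "attended", "meeting-1"),
    ("event-a", "associated-with", "alice"),
    ("event-b", "associated-with", "bob"),
    ("carol", "participated-in", "event-a"),
    ("dave", "participated-in", "event-b"),
    ("erin", "participated-in", "event-c"),  # unrelated to the meeting
]

def build_index(edges):
    """Index edges so each relation can be followed in either direction."""
    index = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation, "out")].add(target)
        index[(target, relation, "in")].add(source)
    return index

def follow_path(edges, start, path):
    """Return the entities reached from `start` by following each (relation, direction) step."""
    index = build_index(edges)
    frontier = {start}
    for relation, direction in path:
        frontier = {n for node in frontier
                    for n in index.get((node, relation, direction), ())}
    return frontier

# "person that participated-in an event that is associated-with
#  a person who attended this-meeting", starting at the meeting.
PATH = [("attended", "in"), ("associated-with", "in"), ("participated-in", "in")]

members = follow_path(EDGES, "meeting-1", PATH)
print(sorted(members))  # ['carol', 'dave']
```

In this sketch, Tom's update request corresponds to calling `follow_path` again with the same `PATH` once new edges have arrived and comparing the result with the stored members.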

Semantic Neighborhoods: An Example of Rich Information Collections

The concept of a rich information collection has evolved from our work on user interactions with visual representations of complex conceptual graphs (for example, social or contact-chaining networks). In particular, we developed “semantic navigation” to support user-guided exploration of such graphs [8]. Semantic navigation is a constrained approach for exploring a graph that results in a collection of entities and relationships meaningfully related to one another that form what we call a “semantic neighborhood” (as opposed to topological “nearest-neighbors”). Our hypothesis is that meaningful collections of entities are likely to be distributed throughout the graph, where “meaningful” is relative to a particular analyst’s exploration of the information represented. (In the scenario, Mary discovers a particular semantic neighborhood.) This hypothesis has been supported by discussions with and demonstrations to former intelligence analysts.

A semantic neighborhood contains entities whose relationship is explained by an analyst’s insight. Figure 1 shows two semantic neighborhoods in a social-network graph as depicted in our prototype semantic navigation system. (Information in the graph was extracted manually from news reports.) One neighborhood relates to possible implementers of an attack that may have been planned at a meeting, as described in the scenario here. The other neighborhood relates to known members of a terrorist group. Figure 2a shows the control panel for specifying the chain of entities and relationships that defines one of these neighborhoods. Figure 2b shows the control panel for selecting neighborhoods to display, selecting attributes and modes of display, and entering information describing a neighborhood. Visual attributes include color, transparency, icon, and size of nodes and links. Controls include neighborhood selection, background/foreground contrast, and selection of representation mode (an exploration discovery summary or a full elaboration). Displaying multiple neighborhoods automatically highlights common members. For example, the two enlarged icons in Figure 1 represent people that belong to both of the depicted neighborhoods.
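Highlighting entities that belong to more than one displayed neighborhood amounts to counting memberships across sets. A small sketch, with hypothetical neighborhood names and members:

```python
from collections import Counter

def shared_members(neighborhoods):
    """Return entities that appear in more than one of the given neighborhoods."""
    counts = Counter(m for members in neighborhoods.values() for m in set(members))
    return {m for m, n in counts.items() if n > 1}

# Hypothetical neighborhoods keyed by a descriptive name.
neighborhoods = {
    "possible implementers": {"carol", "dave", "frank"},
    "known group members": {"dave", "frank", "grace"},
}

highlighted = shared_members(neighborhoods)
print(sorted(highlighted))  # ['dave', 'frank']
```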

The semantic neighborhood is one example of a rich information collection. Its components are a set of information items, a collecting specification, and a descriptive annotation, as represented by Figures 1, 2a, and 2b, respectively. The information item set is made up of individual items appropriate to an analyst’s domain of investigation. It might contain images, reports about people, organizations, and financial institutions for an analyst following a money trail, or even individual hypotheses in an analysis of competing hypotheses [7]. In fact, the products of one level of analysis might be information set members for another level. The collecting specification describes how the information items were gathered, in a form that can be re-executed to find other such items. It might be a simple database query, a chain of relationship types in a social-network graph, or a place and time for surveillance. The annotation is a statement of the analyst’s particular insight. In Figure 2b, the annotation is a textual description; it could also be, for example, a markup of a diagram or picture, or a Web link. The rich information collection thus elaborates upon the raw information it contains with a method for obtaining similar information and an explanatory annotation; these provide context for understanding and use in another situation.
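The three components can be pictured as a small record type. This is an illustrative sketch only: the class name, field names, and the `refresh` method are assumptions made for the example, and the collecting specification is modeled here as a plain callable standing in for a simple database query.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class RichInformationCollection:
    items: Set[str]                        # the information item set
    specification: Callable[[], Set[str]]  # executable collecting specification
    annotation: str                        # the analyst's stated insight

    def refresh(self):
        """Re-execute the specification, update items, and return (added, removed)."""
        current = set(self.specification())
        added, removed = current - self.items, self.items - current
        self.items = current
        return added, removed

# Hypothetical records for an analyst following a money trail.
records = [
    {"id": "r1", "topic": "money-transfer"},
    {"id": "r2", "topic": "travel"},
]

def transfers():
    """A simple query: every record about a money transfer."""
    return {r["id"] for r in records if r["topic"] == "money-transfer"}

collection = RichInformationCollection(
    items=transfers(),
    specification=transfers,
    annotation="These transfers may fund the same group.",
)

records.append({"id": "r3", "topic": "money-transfer"})  # new data arrives
added, removed = collection.refresh()
print(sorted(added), sorted(removed))  # ['r3'] []
```

Keeping the specification alongside the items is what lets a colleague who receives the collection re-run it against their own data.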

An important characteristic of rich information collections is that they do not constrain the analyst to any particular method for building or modifying sets of information items. We are extending our prototype system to include interaction capabilities that support collection building in ways other than semantic navigation. These include keyword searches, dynamic queries [11], manual inclusion or removal of items, and subgraph templates for pattern matching. In general, users will broaden or narrow the search as they construct and revise the query pattern, by selecting individual collection members, or by limiting the timeframe of event nodes using dynamic range slider controls. Our research is at an early stage; in addition to including a wider repertoire of query mechanisms, we are investigating ways for visually organizing sets of rich information collections and for explicitly supporting the changes in insight that occur during the analysis process.
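Limiting the timeframe of event nodes is, in effect, a range filter that a slider control could re-apply on every adjustment. A minimal sketch, with hypothetical event records and a hypothetical `within_timeframe` helper:

```python
from datetime import date

# Hypothetical event nodes, each carrying a date attribute.
EVENTS = [
    {"name": "event-a", "when": date(2004, 3, 11)},
    {"name": "event-b", "when": date(2005, 7, 7)},
    {"name": "event-c", "when": date(2005, 11, 9)},
]

def within_timeframe(events, start, end):
    """Keep events dated inside the inclusive range [start, end]."""
    return [e for e in events if start <= e["when"] <= end]

# Narrow the range to 2005, then broaden it to cover 2004 as well.
narrow = within_timeframe(EVENTS, date(2005, 1, 1), date(2005, 12, 31))
broad = within_timeframe(EVENTS, date(2004, 1, 1), date(2005, 12, 31))
print([e["name"] for e in narrow])  # ['event-b', 'event-c']
print([e["name"] for e in broad])   # ['event-a', 'event-b', 'event-c']
```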

Making Sense with Rich Information Collections

The concept of a rich information collection (see Figure 3) can be used to support more general sensemaking activities. (See [3, 12] for discussions of sensemaking.) Bodnar [1, 2], for example, describes several stages in analysis, each stage involving information at increasing levels of abstraction: Data Source, Shoebox (the loose collection of “what I’m working on”), Evidence, Schema, and Theory. A similar description appears in [10]. Key points in both are that there is a sequence of stages that can be traversed either bottom-up (data-driven) or top-down (hypothesis-driven) in which the products of one stage are the input to the next; and that the process is iterative—one can actually go from almost any stage to any other.

A rich information collection can be used to maintain context while traversing these stages. A collection at one level may represent an entity at the next. For example, Mary’s analysis scenario describes the construction of a collection that, in fact, represents a potential terrorist action. The collection itself can form an information element in the following step. The existence of the cell, for example, could be an item of evidence supporting a schema describing potential activity. The rich information collection put together by the analyst represents the analyst’s insight that this item of evidence is valid. An individual analyst could keep a set of rich information collections in a local workspace, updating them, relating them, and modifying them as the process of analysis continues. A rich information collection may also represent a hypothesis, especially if it involves hypothetical entities or relationships. A user-created information element (“I think these two people are linked.”) can be marked as hypothetical. Executing a rich information collection’s collecting specification could serve as a search for the existence of the hypothetical entity. Diagnostic evidence between competing hypotheses could also be represented in this way.
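The idea that a collection at one level becomes an information element at the next, and that user-created elements can be marked as hypothetical, can be sketched with a nested structure. The `Item` and `Collection` classes and their fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Item:
    label: str
    hypothetical: bool = False  # True for analyst-asserted, unconfirmed elements

@dataclass
class Collection:
    annotation: str
    members: List[Union[Item, "Collection"]] = field(default_factory=list)

    def leaves(self):
        """Yield every underlying Item, descending through nested collections."""
        for member in self.members:
            if isinstance(member, Collection):
                yield from member.leaves()
            else:
                yield member

    def rests_on_hypothesis(self):
        """Report whether any supporting item is marked hypothetical."""
        return any(item.hypothetical for item in self.leaves())

# Evidence level: the group Mary identified, including one hypothetical link.
cell = Collection(
    "Possible implementers of the plan",
    [Item("carol"), Item("dave"), Item("carol linked-to dave", hypothetical=True)],
)

# Schema level: the earlier collection is itself one member.
schema = Collection("Potential activity following the meeting",
                    [cell, Item("meeting-1 report")])

print([item.label for item in schema.leaves()])
print(schema.rests_on_hypothesis())  # True
```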

The presence of the individual information items and the collecting specification in the rich information collection means it is “live” for users with access to the information items. Changes in the collection, due to the addition, modification, or deletion of items, can be immediately linked to representations of the collection itself. This can enable an active chain in which the representation of analytic assessment in a user interface highlights in response to changes in a data item relating to a piece of evidence on which the assessment is based.
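One way to picture such an active chain is a collection that notifies registered listeners whenever its membership changes, so that a dependent assessment can be flagged. This is a sketch under assumed names (`LiveCollection`, `subscribe`), not a description of the prototype.

```python
class LiveCollection:
    """A collection that tells registered listeners about membership changes."""

    def __init__(self, items=()):
        self._items = set(items)
        self._listeners = []

    def subscribe(self, listener):
        """Register a callable taking (change, item)."""
        self._listeners.append(listener)

    def _notify(self, change, item):
        for listener in self._listeners:
            listener(change, item)

    def add(self, item):
        if item not in self._items:
            self._items.add(item)
            self._notify("added", item)

    def remove(self, item):
        if item in self._items:
            self._items.remove(item)
            self._notify("removed", item)

# A stand-in for the user interface: record which assessments need attention.
highlights = []
evidence = LiveCollection({"carol", "dave"})
evidence.subscribe(
    lambda change, item: highlights.append(f"assessment needs review: {item} {change}")
)

evidence.add("frank")
evidence.add("frank")     # already present, so no notification
evidence.remove("carol")
print(highlights)
# ['assessment needs review: frank added', 'assessment needs review: carol removed']
```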

Other concepts of dynamic information sets that include exploratory queries or query sequences are e-sets in the NaviQue information gathering and organizing environment [5] and dynamic aggregates in the Visage visual query environment [4]. E-sets are sets of items (data or executable queries, for example) together with metadata and a user interface widget for viewing the sets. Dynamic aggregates are information sets whose contents are specified in part by an associated query sequence. Our approach to rich information collections expands on the basic concept by providing a set of operations for graph navigation and a generalizable annotation mechanism. The approach also provides an information architecture based on the central role that rich information collections play in the sensemaking process.

Software Architecture for Rich Information Collections

We have developed a software architecture for building interactive visual applications that create and use rich information collections. The architecture defines abstract interfaces to both a data model and a user interface framework, allowing data model and user interface implementations to vary. The abstract data model represents a network consisting of nodes that represent entities such as persons, places, or organizations, and connecting edges that represent relationships between entities. The nodes can be arbitrarily connected via edges to form a directed graph data structure. Information in the form of text, numerical values, or other data types can be associated with individual entities and relationships by attaching arbitrary attributes to the nodes and edges in the graph. To support collections and semantic navigation, the data model defines sets of nodes and edges.

Sets can also have arbitrary attributes, providing for the association of query sequences and annotations with collections. Sets may contain other sets, providing a mechanism for supporting relationships among sensemaking stages. The software architecture contains a plug-in mechanism for assembling components of visualization functionality into a user-interface framework. These visualization components communicate with each other via the semantic network data model using an event passing mechanism that follows a model-view-controller design pattern. We are using this framework to develop additional user interface mechanisms for interacting with information elements and rich information collections to support analysts’ exploratory search and sensemaking activities.
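The data model described here (attributed nodes and edges, attributed sets that may contain other sets, and views that learn of changes through events on the shared model) might be sketched as follows. All class, method, and event names are hypothetical, chosen only to make the structure concrete.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Node:
    ident: str
    attributes: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Edge:
    source: str    # directed: source -> target
    target: str
    relation: str
    attributes: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ElementSet:
    name: str
    members: List[Any] = field(default_factory=list)  # nodes, edges, or other sets
    attributes: Dict[str, Any] = field(default_factory=dict)  # e.g. query, annotation

class GraphModel:
    """Shared model; views register callbacks and are told about each change."""

    def __init__(self):
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Edge] = []
        self.sets: Dict[str, ElementSet] = {}
        self._views: List[Callable[[str, Any], None]] = []

    def register_view(self, callback):
        self._views.append(callback)

    def _emit(self, event, payload):
        for view in self._views:
            view(event, payload)

    def add_node(self, node):
        self.nodes[node.ident] = node
        self._emit("node-added", node)

    def add_edge(self, edge):
        self.edges.append(edge)
        self._emit("edge-added", edge)

    def add_set(self, element_set):
        self.sets[element_set.name] = element_set
        self._emit("set-added", element_set)

model = GraphModel()
log = []
model.register_view(lambda event, payload: log.append(event))  # a stand-in view

model.add_node(Node("carol", {"type": "person"}))
model.add_node(Node("event-a", {"type": "event"}))
model.add_edge(Edge("carol", "event-a", "participated-in"))
model.add_set(ElementSet(
    "possible implementers",
    [model.nodes["carol"]],
    {"annotation": "May carry out the plan", "query": ["attended", "associated-with"]},
))
print(log)  # ['node-added', 'node-added', 'edge-added', 'set-added']
```

Because views only talk to the model, a new visualization component can be plugged in by registering another callback, without the existing views knowing about it.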

Conclusion

To paraphrase Hamming [6], the purpose of exploratory search is insight, not data. In intelligence analysis, as in other domains, that insight comes from the process of exploration, not just from its end result. We are interested in capturing and visually representing analysts’ iterative query processes and insights to help them collect and compare information more effectively, as well as record and share the products of their analytic insights.

Our early work has focused on insight-based information exploration of social networks. We are continuing to develop and evaluate further exploratory search interaction and visualization mechanisms to create, compare, and share rich information collections. We believe that such mechanisms will help analysts’ exploratory search activities to be more effective for sensemaking and for sharing their insights with others.

Figures

Figure 1. Two semantic neighborhoods relating data that is not necessarily directly connected. The enlarged human icons indicate that two entities belong to multiple neighborhoods.

Figure 2a. Semantic navigation path control showing the levels of progression of an analyst’s exploratory activities in a social network graph.

Figure 2b. Semantic neighborhood description and visualization control.

Figure 3. A generalized picture of a rich information collection. The data can be collected using an arbitrary number of constraints through different methods. The data, how it was collected, and a descriptive annotation combine to record the analyst’s insight.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI: 10.1145/1121949.1121984

April 2006 Issue, Vol. 49 No. 4, Pages 63-68. Published: April 1, 2006.
