Research highlighted here is focused on human-centered design and implementation of component-based tools that help agency analysts and others to identify errors, anomalies, clusters, and possible multivariate relationships in geospatially referenced data. One specific focus in our digital government research has been to develop and assess components that support highly interactive visual data analysis. The accompanying figure presents a prototypical application of the tools being developed, uncovering a trend toward increasing lung cancer mortality rates for white females in a few regions of the U.S. (a trend counter to decreases for the U.S. as a whole).
A table browser component provides access to multivariate data for health service areas (HSAs), 806 data aggregation units covering the U.S., each comprising one or more counties. The table browser adapts an application initially built as part of a separate digital government project (Citizen Access to Government Statistical Data). This component is dynamically linked to both a map and an interactive parallel coordinate plot (PCP). The PCP depicts multivariate data visually for all HSAs, while the map depicts one variable at a time spatially. Each axis of the PCP represents a data variable, and the axes can be sorted manually or computationally. The first five axes shown depict age-adjusted lung cancer mortality for white females (LUNWF) for each of five three-year averages. The sixth visible axis depicts per-capita income for 1993 (PCI93).
Line segments connect the values on each variable (axis) for each HSA, creating a multivariate signature for each place. With 806 signatures displayed at once, the pattern can be difficult to interpret. Focusing (that is, narrowing the view to subsets of the data range) has been used to highlight HSAs having data values in the top 7% (purple) and bottom 7% (green) for the selected PCP axes (white female lung cancer mortality rates for 1991-93). The preponderance of crossed line segments between the LUNWF92 and PCI93 axes indicates an inverse relationship (regions with a low per-capita income tend to have high lung cancer mortality rates). The dynamically linked map highlights the location of the extreme HSAs (using the same colors as in the PCP). Also highlighted on both the PCP and the map are one prototypical high-rate HSA that is picked (selected non-transiently and shown in light blue) and one low-rate HSA that is indicated (selected transiently and shown in green).
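The two ideas in this paragraph, focusing on the extremes of one axis and reading crossed segments as a sign of an inverse relationship, can be illustrated with a minimal sketch. It is not the GeoVISTA Studio implementation: the function names, the dictionary layout, and the five sample values are hypothetical, and it assumes both axes are drawn with the same orientation, so that two segments cross exactly when the order of two HSAs reverses from one axis to the next.

```python
from itertools import combinations


def focus_extremes(values, fraction=0.07):
    """Return (low_ids, high_ids): the ids whose value lies in the
    bottom and top `fraction` of the axis, ranked by value.
    Assumes `fraction` is small enough that the two sets do not overlap."""
    ranked = sorted(values, key=values.get)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k]), set(ranked[-k:])


def crossing_fraction(left, right):
    """Fraction of segment pairs that cross between two adjacent axes.
    A pair crosses when its order on the left axis is reversed on the
    right axis; pairs tied on either axis are skipped."""
    crossed = compared = 0
    for a, b in combinations(left, 2):
        d = (left[a] - left[b]) * (right[a] - right[b])
        if d == 0:
            continue
        compared += 1
        if d < 0:
            crossed += 1
    return crossed / compared if compared else 0.0


# Hypothetical values for five made-up areas (not data from the article).
mortality = {"A": 52.0, "B": 31.0, "C": 44.0, "D": 25.0, "E": 38.0}
income = {"A": 14.0, "B": 22.0, "C": 17.0, "D": 26.0, "E": 19.0}

low, high = focus_extremes(mortality, fraction=0.2)   # one area at each extreme
share_crossed = crossing_fraction(mortality, income)  # 1.0: every pair crosses
```

A crossing fraction near 1 corresponds to the dense pattern of crossed segments described above, a value near 0 to mostly parallel segments (a direct relationship), and a value near 0.5 to no clear monotonic relationship.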
The visual analysis presents a clear picture. While lung cancer mortality for white women began decreasing nationally after a rise in the 1980s, mortality rates for the highest mortality areas (mostly low-income areas) continued to rise. This diverging trend corresponds to related evidence that both major cancers and cancers overall are exhibiting increasing disparity for places of low versus high socioeconomic status [5].
The example here includes only a small sample of the tools that can be integrated into data analysis applications using GeoVISTA Studio (another example, developed specifically for our digital government work, is a manipulable matrix for exploring bivariate relationships; see [2]). GeoVISTA Studio is a cross-platform, Web-deployable, Java-based development environment that facilitates integration of data visualization and analysis components to produce stand-alone applications and Web applets. Studio enables data analysts who are not software developers to construct analysis applications from independently developed components (as long as the components meet the JavaBeans API standards). Details about Studio are reported in [3, 4].
One objective in our work is to develop effective data analysis components, integrate them into applications, and assess usefulness and usability of those applications. Beyond this, a specific focus has been to enable comprehensive and flexible coordination among software components. There is considerable evidence from usability assessments of coordinated views for information visualization and query that coordinated multiview environments are effective tools for access and analysis of complex information [1]. Coordination among our exploratory spatial data analysis beans is achieved through a separate coordination bean that supports several independent, simultaneous, dynamic connections among coordinator-aware components. These dynamic connections extend traditional concepts of linked brushing. The current implementation supports coordination for three categories of selection and two of visual appearance. For selection, events shared among components (all illustrated in the figure) are picking (direct selection of objects by pointing or bounding), indication (transient picking, as in a mouse-over), and focusing (indirectly manipulating the data range displayed). For visual appearance, shared events include data-to-display mapping (for example, shared colors to depict data categories on the map and parallel coordinate plot) and setting the display context (for example, shared background color, font for text labels, and so on).
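The coordination pattern described above can be sketched as a small publish/subscribe hub. This is an illustrative sketch only, written in Python rather than as JavaBeans, and it does not reproduce the actual coordination bean: the class names, method names, event-kind strings, and HSA identifiers are all hypothetical. It assumes each component registers separately for each kind of event, which is what allows several independent connections to be active at once.

```python
from collections import defaultdict

# Hypothetical names for the three selection and two appearance event kinds.
EVENT_KINDS = ("pick", "indicate", "focus", "color_mapping", "display_context")


class Coordinator:
    """Relays events of a given kind to every component connected for that kind."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def connect(self, kind, component):
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self._listeners[kind].append(component)

    def broadcast(self, kind, source, payload):
        # Forward to every connected component except the one that raised it.
        for component in self._listeners[kind]:
            if component is not source:
                component.receive(kind, payload)


class View:
    """Stand-in for a coordinator-aware component (table, map, or PCP)."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        self.state = {}

    def receive(self, kind, payload):
        self.state[kind] = payload

    def user_action(self, kind, payload):
        # Update the local display state, then share the event with other views.
        self.state[kind] = payload
        self.coordinator.broadcast(kind, self, payload)


hub = Coordinator()
table, map_view, pcp = (View(n, hub) for n in ("table", "map", "pcp"))
for view in (table, map_view, pcp):
    hub.connect("pick", view)       # picking is shared by all three views
for view in (map_view, pcp):
    hub.connect("indicate", view)   # indication is shared by map and PCP only

pcp.user_action("pick", {"HSA-1"})         # a pick in the PCP reaches table and map
map_view.user_action("indicate", "HSA-2")  # a mouse-over on the map reaches the PCP
```

Keeping a separate listener list per event kind is one simple way to let picking, indication, and focusing be linked among different subsets of components, rather than tying every view to every other view as in conventional linked brushing.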
Working closely with agency partners, we are applying formal usability methods to the continued development and refinement of data exploration components and their coordination. A formal study of the impact of these tools on strategies for data analysis is planned for the coming year.
Figures
Figure. A multicomponent exploratory spatial data analysis application constructed with GeoVISTA Studio. In the application, a table (lower left) and a map (upper left) are dynamically linked to an interactive parallel coordinate plot (lower right). The latter depicts multivariate health service area (HSA) data; for each HSA one set of linked line segments depicts a trace (a signature) through multivariate space.