Supporting Visual Analysis of Federal Geospatial Statistics

Federal government agencies generate, summarize, and disseminate a large and growing volume of statistical data that can be linked through common geospatial referencing. The potential of this data is often unrealized because agencies and their data users lack usable and useful data analysis tools that support multivariate geospatial data exploration and processing.

The research highlighted here focuses on the human-centered design and implementation of component-based tools that help agency analysts and others identify errors, anomalies, clusters, and possible multivariate relationships in geospatially referenced data. One specific focus of our digital government research has been to develop and assess components that support highly interactive visual data analysis. The accompanying figure presents a prototypical application of the tools being developed; it uncovers a trend toward increasing lung cancer mortality rates for white females in a few regions of the U.S. (a trend counter to decreases for the U.S. as a whole).

A table browser component provides access to multivariate data for health service areas (HSAs): 806 data aggregation units covering the U.S., each comprising one or more counties. The table browser adapts an application initially built as part of a separate digital government project (Citizen Access to Government Statistical Data). This component is dynamically linked to both a map and an interactive parallel coordinate plot (PCP). The PCP depicts multivariate data visually for all HSAs, while the map depicts one variable at a time spatially. Each axis of the PCP represents a data variable, and the axes can be sorted manually or computationally. The first five axes shown depict age-adjusted lung cancer mortality for white females (LUNWF) for each of five three-year averages. The sixth visible axis depicts per-capita income for 1993 (PCI93).
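To make the axis-per-variable idea concrete, here is a minimal sketch of how records might be mapped to positions on PCP axes. It assumes min-max scaling of each variable to [0, 1], which is a common choice but not one the article specifies; the function name `axis_positions`, the unit IDs, and the data values are all hypothetical.

```python
from typing import Dict, List


def axis_positions(
    records: Dict[str, Dict[str, float]], axes: List[str]
) -> Dict[str, List[float]]:
    """Return each unit's position on every axis, in the given axis order.

    Each variable is scaled independently to [0, 1] (min-max scaling,
    an assumption of this sketch). A constant variable maps to 0.5.
    """
    lo = {a: min(rec[a] for rec in records.values()) for a in axes}
    hi = {a: max(rec[a] for rec in records.values()) for a in axes}
    positions: Dict[str, List[float]] = {}
    for unit, rec in records.items():
        positions[unit] = [
            0.5 if hi[a] == lo[a] else (rec[a] - lo[a]) / (hi[a] - lo[a])
            for a in axes
        ]
    return positions


# Hypothetical values for three units on two of the variables named in the text.
records = {
    "HSA-1": {"LUNWF92": 30.0, "PCI93": 24.0},
    "HSA-2": {"LUNWF92": 40.0, "PCI93": 18.0},
    "HSA-3": {"LUNWF92": 50.0, "PCI93": 12.0},
}
positions = axis_positions(records, ["LUNWF92", "PCI93"])
```

Reordering the axes, whether by hand or by some computed criterion, then amounts to passing a different `axes` list; the scaled values themselves do not change.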

Line segments connect the values on each variable (axis) for each HSA, creating a multivariate signature for each place. With 806 signatures displayed at once, the pattern can be difficult to interpret. Focusing (that is, narrowing the view to subsets of the data range) has been used to highlight HSAs having data values in the top 7% (purple) and bottom 7% (green) for the selected PCP axis (white female lung cancer mortality rates for 1991–93). The preponderance of crossed line segments between the LUNWF92 and PCI93 axes indicates an inverse relationship (regions with a low per-capita income have high lung cancer mortality rates). The dynamically linked map highlights the location of the extreme HSAs (using the same colors as in the PCP). Also highlighted on both the PCP and the map are one prototypical high-rate HSA that is picked (selected non-transiently and shown in light blue) and one low-rate HSA that is indicated (selected transiently and shown in green).
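The two ideas in this paragraph, focusing on the extremes of one axis and reading crossed segments as a sign of an inverse relationship, can be sketched in a few lines. This is an illustration only, not the GeoVISTA Studio implementation: the function names and sample values are hypothetical, and the crossing count assumes both axes are drawn with low values at the same end, so that two segments cross exactly when two units are ranked in opposite order on the two variables.

```python
from typing import Dict, List, Tuple


def focus_extremes(
    values: Dict[str, float], fraction: float = 0.07
) -> Tuple[List[str], List[str]]:
    """Return (top, bottom) unit IDs holding the highest and lowest
    `fraction` of values on one axis (at least one unit each)."""
    ranked = sorted(values, key=values.get)  # ascending by value
    k = max(1, round(len(ranked) * fraction))
    return ranked[-k:], ranked[:k]


def crossing_fraction(left: Dict[str, float], right: Dict[str, float]) -> float:
    """Fraction of unit pairs whose segments cross between two adjacent axes.

    A pair crosses when its order on the left axis is the reverse of its
    order on the right axis. Pairs tied on either axis are ignored.
    """
    ids = [u for u in left if u in right]
    crossed = total = 0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            d_left = left[ids[i]] - left[ids[j]]
            d_right = right[ids[i]] - right[ids[j]]
            if d_left == 0 or d_right == 0:
                continue
            total += 1
            if d_left * d_right < 0:
                crossed += 1
    return crossed / total if total else 0.0


# Hypothetical mortality rates and per-capita incomes for five units.
mortality = {"A": 52.0, "B": 31.5, "C": 44.2, "D": 28.9, "E": 39.0}
income = {"A": 14.1, "B": 22.3, "C": 16.8, "D": 25.0, "E": 19.5}

top, bottom = focus_extremes(mortality, fraction=0.2)
share_crossed = crossing_fraction(mortality, income)
```

A crossing fraction near 1 corresponds to the dense tangle of crossed segments described above, while a value near 0 corresponds to mostly parallel segments and a direct relationship. With 806 units the pairwise loop is still cheap (about 324,000 pairs).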

The visual analysis presents a clear picture. While lung cancer mortality for white women began decreasing nationally after a rise in the 1980s, mortality rates for the highest mortality areas (mostly low-income areas) continued to rise. This diverging trend corresponds to related evidence that both major cancers and cancers overall are exhibiting increasing disparity for places of low versus high socioeconomic status [5].

The example here includes only a small sample of the tools that can be integrated into data analysis applications using GeoVISTA Studio. Another example, developed specifically for our digital government work, is a manipulable matrix for exploring bivariate relationships [2]. GeoVISTA Studio is a cross-platform, Web-deployable, Java-based development environment that facilitates integration of data visualization and analysis components to produce stand-alone applications and Web applets.¹ Studio enables data analysts who are not software developers to construct analysis applications from independently developed components, as long as the components meet the JavaBeans API standards. Details about Studio are reported in [3, 4].

One objective in our work is to develop effective data analysis components, integrate them into applications, and assess usefulness and usability of those applications. Beyond this, a specific focus has been to enable comprehensive and flexible coordination among software components. There is considerable evidence from usability assessments of coordinated views for information visualization and query that coordinated multiview environments are effective tools for access and analysis of complex information [1]. Coordination among our exploratory spatial data analysis beans is achieved through a separate coordination bean that supports several independent, simultaneous, dynamic connections among coordinator-aware components. These dynamic connections extend traditional concepts of linked brushing. The current implementation supports coordination for three categories of selection and two of visual appearance. For selection, events shared among components (all illustrated in the figure) are picking (direct selection of objects by pointing or bounding), indication (transient picking, as in a mouse-over), and focusing (indirectly manipulating the data range displayed). For visual appearance, shared events include data-to-display mapping (for example, shared colors to depict data categories on the map and parallel coordinate plot) and setting the display context (for example, shared background color, font for text labels, and so on).
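The coordination pattern described above can be sketched as a small publish/subscribe hub. The actual system is built from JavaBeans inside GeoVISTA Studio; the Python below is only an illustration of the idea, and every class, method, and identifier in it (`Coordinator`, `View`, `user_action`, `"HSA-101"`) is hypothetical. It covers the three selection categories only.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# The three selection categories named in the text.
EVENT_KINDS = ("pick", "indicate", "focus")

Listener = Callable[[str, object], None]


class Coordinator:
    """Relays selection events among components that registered for them."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Tuple[str, Listener]]] = defaultdict(list)

    def connect(self, kind: str, component: str, listener: Listener) -> None:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self._listeners[kind].append((component, listener))

    def broadcast(self, kind: str, source: str, payload: object) -> None:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        for component, listener in self._listeners[kind]:
            if component != source:  # do not echo back to the sender
                listener(kind, payload)


class View:
    """A coordinator-aware component that records the latest shared state."""

    def __init__(self, name: str, hub: Coordinator, kinds: List[str]) -> None:
        self.name = name
        self.hub = hub
        self.state: Dict[str, object] = {}
        for kind in kinds:
            hub.connect(kind, name, self._on_event)

    def _on_event(self, kind: str, payload: object) -> None:
        self.state[kind] = payload  # a real view would also redraw here

    def user_action(self, kind: str, payload: object) -> None:
        self.state[kind] = payload
        self.hub.broadcast(kind, self.name, payload)


hub = Coordinator()
map_view = View("map", hub, ["pick", "indicate", "focus"])
pcp_view = View("pcp", hub, ["pick", "indicate", "focus"])
table_view = View("table", hub, ["pick"])  # the table follows picking only

pcp_view.user_action("focus", ("LUNWF92", 0.93, 1.0))  # keep the top 7% of one axis
map_view.user_action("pick", frozenset({"HSA-101"}))   # hypothetical unit ID
```

Because each component registers per event kind, connections are independent of one another: here the table follows picking but ignores focusing, while the map and PCP share all three. Appearance events such as a shared color mapping could be added as further kinds in the same way.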

Working closely with agency partners, we are applying formal usability methods to the continued development and refinement of data exploration components and their coordination. A formal study of the impact of these tools on strategies for data analysis is planned for the coming year.

Figures

Figure. A multicomponent exploratory spatial data analysis application constructed with GeoVISTA Studio. In the application, a table (lower left) and a map (upper left) are dynamically linked to an interactive parallel coordinate plot (lower right). The latter depicts multivariate health service area (HSA) data; for each HSA one set of linked line segments depicts a trace (a signature) through multivariate space.

References

    1. Chimera, R., and Shneiderman, B. An exploratory evaluation of three interfaces for browsing large hierarchical tables of contents. ACM Trans. Info. Systems 12, 4 (1994), 383–406.

    2. Dai, X., and Hardisty, F. Conditioned and manipulable matrix for visual exploration. In Proceedings of the National Conference for Digital Government Research (Los Angeles, CA, May 20–22, 2002), 489–492.

    3. Gahegan, M., Takatsuka, M., Wheeler, M., and Hardisty, F. Introducing GeoVISTA Studio: An integrated suite of visualization and computational methods for exploration and knowledge construction in geography. Computers, Environment and Urban Systems 26, 4 (2001), 267–292.

    4. MacEachren, A.M., Hardisty, F., Gahegan, M., Wheeler, M., Dai, X., Guo, D., and Takatsuka, M. Supporting visual integration and analysis of geospatially-referenced statistics through Web-deployable, cross-platform tools. In Proceedings of the National Conference for Digital Government Research (Los Angeles, CA, May 21–23, 2001), 17–24.

    5. Singh, G.K., Miller, B.A., and Hankey, B.F. Area socioeconomic status and changing patterns in U.S. cancer mortality, 1950–1998: Part II—Lung and colorectal cancers. J. of the National Cancer Institute.

    1 Mark Gahegan directs the GeoVISTA Studio project, with Masa Takatsuka as the primary software architect. For more details and to download the software, see: www.geovista.psu.edu.

    This research is part of a larger project (Collaborative Research: Quality Graphics for Federal Statistical Summaries) directed by Dan Carr at George Mason University. Alan MacEachren is PI for the Penn State University component and David Scott is PI for the Rice University component. The project involves collaboration with eight partner agencies (the National Cancer Institute, Census Bureau Population Division, Bureau of Labor Statistics, National Center for Health Statistics, Energy Information Administration, National Agricultural Statistics Service, Environmental Protection Agency, and Bureau of Transportation Statistics). The specific research presented here was carried out in the GeoVISTA Center at Penn State and supported in part by the U.S. National Science Foundation, grant #EIA-9983451. Two additional NSF-funded Digital Government projects contributed toward the software components presented here: #EIA-9983445 and #EIA-9876640.
