Data visualization is the primary means by which data analysts explore patterns, trends, and insights in their data. Unfortunately, existing visual analytics tools offer limited expressiveness and scalability when it comes to searching for visualizations over large datasets, making visual data exploration labor-intensive and time-consuming. In this work, we introduce the problem of visualization search and highlight its two underlying challenges: search enumeration and visualization matching. To address them, we first present our work on Zenvisage, which helps enumerate large collections of visualizations and supports simple visualization matching through an interactive interface and an expressive visualization query language. For more fine-grained and flexible visualization matching, including search for underspecified and approximate patterns, we extend Zenvisage to develop ShapeSearch. ShapeSearch supports a novel shape querying algebra that helps express a large class of pattern queries that are hard to specify with existing systems. ShapeSearch exposes multiple specification mechanisms (sketches, natural language, and visual regular expressions) that help users easily issue shape queries, and it applies query-aware and perceptually-aware optimizations to execute them within interactive response times. To conclude, we discuss a number of open research problems to further improve the usability and performance of both Zenvisage and ShapeSearch.
Data visualization is the primary means by which data analysts, many of whom have limited programming skills, explore their data. While the usability and visual encoding capabilities of data visualization tools such as Tableau and Excel have evolved considerably over the years, these tools remain severely limited when it comes to searching for patterns, trends, and insights in large and complex datasets. The state of the art for data analysts, especially non-programmers, is to load their data into a visualization tool and repeatedly generate visualizations until the desired patterns or insights are identified. Unfortunately, this repeated manual examination becomes painful, tedious, and time-consuming as the size and complexity of datasets increase. Even on moderately sized datasets, a data analyst may need to examine as many as tens of thousands of visualizations to test a single hypothesis, which is a severe impediment to data exploration. We characterize this problem of visualization search using examples from genomics data analysis.
Motivating example. Genomics researchers often study how genes affect clinical trial outcomes, or how specific medications affect the behavior of genes. As an example, given a dataset consisting of clinical trial outcomes (positive vs. negative), researchers often want to find genes that can visually explain the differences in these outcomes. To do so, current tools require researchers to manually generate tens of thousands of scatter plots (with the x- and y-axes each referring to a gene, and each outcome depicted as a point in the scatter plot) to determine whether the outcomes can be clearly distinguished in the scatter plot.
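To make the cost of this manual process concrete, the following minimal sketch enumerates every gene pair and scores how well the two outcome classes separate in the corresponding scatter plot. It is an illustration only and not the method used by Zenvisage or ShapeSearch: the function names, the toy data, and the nearest-centroid separability score are all assumptions introduced here.

```python
from itertools import combinations
from math import comb, dist


def centroid(points):
    """Mean (x, y) of a non-empty list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def separability(points, labels):
    """Hypothetical score: fraction of points that lie closer to the
    centroid of their own outcome class than to the other centroid."""
    pos = [p for p, lab in zip(points, labels) if lab]
    neg = [p for p, lab in zip(points, labels) if not lab]
    c_pos, c_neg = centroid(pos), centroid(neg)
    correct = sum(
        (dist(p, c_pos) < dist(p, c_neg)) == bool(lab)
        for p, lab in zip(points, labels)
    )
    return correct / len(points)


def rank_gene_pairs(expression, outcomes):
    """Score one scatter plot per gene pair (x = first gene, y = second gene).

    expression: dict mapping gene name -> list of values, one per patient
    outcomes:   list of booleans (True = positive outcome), one per patient
    """
    scored = []
    for gx, gy in combinations(sorted(expression), 2):
        points = list(zip(expression[gx], expression[gy]))
        scored.append((separability(points, outcomes), gx, gy))
    return sorted(scored, reverse=True)


# Toy data (made up for illustration): geneA separates the outcomes,
# geneB and geneC do not.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.2, 4.9],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
    "geneC": [3.0, 0.5, 2.0, 0.6, 2.9, 1.9],
}
outcomes = [True, True, True, False, False, False]

ranked = rank_gene_pairs(expression, outcomes)
for score, gx, gy in ranked:
    print(f"{gx} vs {gy}: separability {score:.2f}")

# The number of candidate scatter plots grows quadratically with the gene count.
print(comb(200, 2), "pairs for 200 genes")
```

With only 200 genes there are already 19,900 candidate scatter plots, which is the scale at which inspecting each one by hand becomes impractical.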