Communications of the ACM

Spherical Region Queries on Multicore Architectures

In this short paper, we report the performance of multiple thread-parallel algorithms for spherical region queries on multicore architectures, motivated by a challenging data analytics application in materials science. The performance of two tree-based algorithms and a naive algorithm is compared to identify the length scales at which each approach performs optimally. The optimal algorithm is then used to scale the driving materials science application, which is shown to deliver over 17X speedup using 32 OpenMP threads on data sets containing many millions of atoms.
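
As a rough illustration of the two families of approaches compared above, the following single-threaded sketch answers a spherical region query (all points within a radius of a center) first by testing every point and then with a simple k-d tree. The function names, the tree layout, and the sample data are assumptions made for this example; the abstract does not say which tree structures the paper uses, and the paper's OpenMP parallelism is not reproduced here.

```python
import random


def naive_query(points, center, radius):
    """Return indices of all points within `radius` of `center` by testing every point."""
    r2 = radius * radius
    return [i for i, p in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2]


def build_kdtree(indices, points, depth=0):
    """Build a k-d tree over `indices`; each node is (index, axis, left, right)."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(indices[:mid], points, depth + 1),
            build_kdtree(indices[mid + 1:], points, depth + 1))


def kdtree_query(node, points, center, radius, out):
    """Append to `out` the indices of points inside the sphere, pruning subtrees."""
    if node is None:
        return
    idx, axis, left, right = node
    p = points[idx]
    if sum((a - b) ** 2 for a, b in zip(p, center)) <= radius * radius:
        out.append(idx)
    diff = center[axis] - p[axis]
    # The sphere spans [center - radius, center + radius] along this axis.
    if diff - radius <= 0:   # sphere reaches the side at or below the split
        kdtree_query(left, points, center, radius, out)
    if diff + radius >= 0:   # sphere reaches the side at or above the split
        kdtree_query(right, points, center, radius, out)


# Hypothetical sample data: 1000 random "atoms" in the unit cube.
random.seed(42)
atoms = [(random.random(), random.random(), random.random()) for _ in range(1000)]
tree = build_kdtree(list(range(len(atoms))), atoms)

hits = []
kdtree_query(tree, atoms, (0.5, 0.5, 0.5), 0.1, hits)
print(sorted(hits) == naive_query(atoms, (0.5, 0.5, 0.5), 0.1))
```

Both functions return the same set of points; they differ only in how many points they have to inspect, which is presumably why the best choice depends on the query radius relative to the data set.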

Collecting Observational Data about Online Video Use in the Home Using Open-Source Broadcasting Software

Capturing contextual data about online media consumption in the home can be difficult, often requiring site visits and hardware installation in the field. In this paper, we present an exploratory study in which we use free, open-source broadcasting software and participants' existing computer hardware to capture remote, contextual video data inside the home. This method allows participants to simultaneously capture live recordings across multiple computer screens, as well as themselves and their home viewing environment, while watching long-form online video. We discuss the affordances and challenges of this method for researchers seeking to capture contextual data remotely.

News From the Background to the Foreground: How People Use Technology To Manage Media Transitions: A Study of Technology-mediated News Behaviors in a Hyper-connected World

People are the designers and curators of their own news and information ecosystems, due to the disruption of the news industry and developments in media technology. To understand how people use technology to manage their news consumption, we conducted a two-week diary study with 14 participants, focusing on how people transition between news content and behaviors via different media, sources, platforms and devices. We used an inductive, qualitative analysis of the diary study data to analyze the news behaviors and their underlying motivations and found that people frequently shift their focus between ambient background news streams and active foreground news behaviors. Although people often passively consume news content as a background activity, they also actively manage background news habits to increase the chances of relevant foreground experiences. People manage news consumption by developing routines that are often supported by technology use and social interactions. We encourage product designers to treat backgrounding as an essential part of news consumption behavior and suggest new design directions that employ ubiquitous computing technologies (such as context sensing and routine modeling) to more effectively attend to background-to-foreground behaviors and transitions.

Hierarchical IP flow clustering

The analysis of flow traces can help in understanding a network's usage patterns. We present a hierarchical clustering algorithm for network flow data that can summarize terabytes of IP traffic into a parsimonious tree model. The method automatically finds an appropriate scale of aggregation so that each cluster represents a local maximum of the traffic density from a block of source addresses to a block of destination addresses. We apply this clustering method to NetFlow data from an enterprise network, find the largest traffic clusters, and analyze their stationarity across time. The existence of heavy-volume clusters that persist over long time scales can help network operators perform usage-based accounting, capacity provisioning and traffic engineering. Also, changes in the layout of hierarchical clusters can facilitate the detection of anomalies and significant changes in the network workload.
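
To make the idea of a traffic cluster "from a block of source addresses to a block of destination addresses" concrete, the sketch below sums flow volumes per pair of address blocks at a fixed prefix length. This is only the aggregation step at one hand-picked scale; the paper's method chooses the scale automatically, and that selection is not shown. The `aggregate` function and the sample flows are hypothetical.

```python
import ipaddress
from collections import Counter


def aggregate(flows, src_len, dst_len):
    """Sum bytes per (source block, destination block) at the given prefix lengths.

    `flows` is an iterable of (src_ip, dst_ip, byte_count) tuples (an assumed format).
    """
    totals = Counter()
    for src, dst, nbytes in flows:
        # strict=False masks the host bits, mapping an address to its enclosing block.
        src_block = ipaddress.ip_network(f"{src}/{src_len}", strict=False)
        dst_block = ipaddress.ip_network(f"{dst}/{dst_len}", strict=False)
        totals[(str(src_block), str(dst_block))] += nbytes
    return totals


# Hypothetical flow records, not taken from the paper.
flows = [
    ("10.0.1.5", "192.168.0.9", 500),
    ("10.0.1.77", "192.168.0.20", 300),
    ("10.0.2.8", "192.168.0.9", 100),
    ("172.16.4.1", "8.8.8.8", 50),
]

fine = aggregate(flows, 24, 24)    # one level of the hierarchy
coarse = aggregate(flows, 16, 16)  # a coarser level that merges the 10.0.x.x sources
for blocks, nbytes in coarse.most_common():
    print(blocks, nbytes)
```

Running the aggregation at several prefix lengths yields the nested levels of a hierarchy; a clustering method such as the one described above would then decide which level best represents each region of the traffic.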

Reflections on ESM in the Wild: the Case of a Mobile Head-Gesture Game

Conducting HCI studies in the wild brings new challenges, such as the collection of qualitative data. In particular, one of the biggest concerns about qualitative data is the external validity of the results obtained. We conducted an ESM-based experiment to study the context of use of a mobile head-gesture game in natural settings. The low response rate we obtained led us to reflect on the external validity of our results. To explain this behaviour, we also analysed the data collected from a technology usage study of the same head-gesture game published in a public app store. The strong correlation found between the data obtained from these two studies encourages us to believe that the low response rate is a consequence of the nature of our particular research question rather than a threat to the external validity of our results.

The glorious promise of the post-truth world

"Post-truth," an adjective designated the 2016 Word of the Year by Oxford Dictionaries, and the related term "truthiness" have received much public attention recently, and have inspired heated discussions of "fake news" and "alternative facts."

In this article (spoof/parody/satire/dystopia/…, depending on how you read it), the author argues that the essential role of truthiness in human life is underestimated, and that it "is safer to embrace the inevitable and march into the brave new post-truth world."

BBoxDB - A Scalable Data Store for Multi-Dimensional Big Data

BBoxDB is a distributed and highly available key-bounding-box-value store which enhances the classical key-value data model with an axis-parallel bounding box. The bounding box describes the location of the values in an n-dimensional space, and enables BBoxDB to efficiently distribute multi-dimensional data across a cluster of nodes. Well-known geometric algorithms (such as the K-D Tree) are used to create distribution regions (multi-dimensional shards). Distribution regions are created dynamically, based on the stored data. BBoxDB stores data of multiple tables co-partitioned, which enables efficient distributed spatial joins. Spatial joins on co-partitioned tables can be executed without data shuffling between nodes. A two-level index structure is employed to retrieve stored data quickly. We demonstrate the interaction with the system, the dynamic creation of distribution regions and the data redistribution feature of BBoxDB.
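
The following minimal sketch illustrates the two ideas in this abstract: a value located by an axis-parallel bounding box, and a K-D-Tree-style split of a region into two distribution regions. It is not BBoxDB's API or implementation. The `BBox` class, the choice to cut at the median of box centers, and the rule that a box overlapping both regions is assigned to both are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-parallel bounding box given by its lower and upper corner tuples."""
    low: tuple
    high: tuple

    def intersects(self, other):
        # Two boxes overlap if their intervals overlap in every dimension.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi
                   in zip(self.low, self.high, other.low, other.high))


def split_region(region, boxes, axis):
    """Split `region` in two along `axis` at the median center of the stored boxes."""
    centers = sorted((b.low[axis] + b.high[axis]) / 2 for b in boxes)
    cut = centers[len(centers) // 2]
    left = BBox(region.low, region.high[:axis] + (cut,) + region.high[axis + 1:])
    right = BBox(region.low[:axis] + (cut,) + region.low[axis + 1:], region.high)
    return left, right


def regions_for(regions, box):
    """Return the indices of all regions that a box overlaps."""
    return [i for i, r in enumerate(regions) if r.intersects(box)]


# Hypothetical 2-D data: three stored bounding boxes in a 10 x 10 space.
space = BBox((0, 0), (10, 10))
stored = [BBox((1, 1), (2, 2)), BBox((4, 4), (8, 5)), BBox((8, 1), (9, 3))]
regions = split_region(space, stored, axis=0)
for box in stored:
    print(box, "->", regions_for(regions, box))
```

Because the cut position is derived from the stored data, regions adapt to where the data actually lies, which matches the abstract's statement that distribution regions are created dynamically. If two tables are partitioned with the same regions, matching boxes end up in the same region, which is the property that lets a spatial join run without moving data between nodes.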

Exploring State-of-the-Art Nearest Neighbor (NN) Search Techniques

Finding nearest neighbors (NN) is a fundamental operation in many diverse domains such as databases, machine learning, data mining, information retrieval, and multimedia retrieval. Due to the data deluge and the use of nearest neighbor queries in many applications where fast performance is necessary, efficient index structures are required to speed up finding nearest neighbors. Different application domains have different data characteristics and, therefore, require different types of indexing techniques. While the internal indexing and searching mechanism is generally hidden from the top-level application, it is beneficial for a data scientist to understand these fundamental operations and choose a correct indexing technique to improve the performance of the overall end-to-end workflow. Choosing the correct searching mechanism to solve a nearest neighbor query can be a daunting task, however. A wrong choice can potentially lead to low accuracy, slower execution time, or in the worst case, both. The objective of this tutorial is to present the audience with the knowledge to choose the correct index structure for specific applications. We present the state-of-the-art Nearest Neighbor (NN) indexing techniques for different data characteristics. We also present the effect, in terms of time and accuracy, of choosing the wrong index structure for different application needs. We conclude the tutorial with a discussion on the future challenges in the Nearest Neighbor search domain.
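
One common way to quantify the accuracy side of that trade-off is to compare the result of an index against an exact brute-force search. The sketch below is an assumed baseline, not material from the tutorial: `exact_knn` scans every point, and `recall_at_k` reports the fraction of true neighbors that a candidate answer recovered. The "approximate" result shown is hand-written for illustration rather than produced by a real index.

```python
import heapq
import math


def exact_knn(points, query, k):
    """Return the indices of the k points closest to `query` by exhaustive scan."""
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(points[i], query))


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true nearest neighbors that also appear in the approximate result."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


# Hypothetical data and a hand-written "approximate" answer.
points = [(0, 0), (1, 0), (2, 0), (5, 5), (9, 9)]
truth = exact_knn(points, (0.1, 0), k=3)
approx = [0, 2, 4]  # pretend an index returned these ids
print(truth, recall_at_k(truth, approx))
```

The exhaustive scan costs time linear in the number of points per query, which is what index structures aim to avoid; recall against this baseline shows how much accuracy a faster index gives up.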

Design of Fast and Scalable Clustering Algorithm on Spark

Clustering is a popular unsupervised data mining technique. It has been applied in various data mining and big data applications. Efficient clustering algorithms and implementation techniques are key to coping with the scalability and performance requirements of big data analysis. This paper introduces the design and implementation of a density-based clustering algorithm that can deal with big data efficiently and effectively. We present a parallel Shared Nearest Neighbor (SNN) clustering algorithm that uses the k-dimensional tree (k-d tree) to reduce search time and improve efficiency. The proposed algorithm is implemented in a distributed environment using the Spark framework. The effectiveness of the proposed algorithm has been evaluated through a case study involving four data sets: Bristol Crime Stats, 911 call, Complex9, and TLC Trip.
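
The core quantity in Shared Nearest Neighbor clustering can be sketched in a few lines: two points are considered similar when their k-nearest-neighbor lists overlap. The sketch below assumes the simplest definition (the number of shared neighbors) and finds neighbors by brute force on a single machine; the paper replaces that search with a k-d tree and runs on Spark, neither of which is shown. All names and the sample points are hypothetical.

```python
import math


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of nearest neighbors that points i and j have in common."""
    return len(neighbors[i] & neighbors[j])


# Hypothetical data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
neighbors = knn_lists(points, k=3)

print(snn_similarity(neighbors, 0, 1))  # two points in the same group
print(snn_similarity(neighbors, 0, 4))  # two points in different groups
```

Points in the same dense group share neighbors and points in different groups do not, so linking pairs whose similarity exceeds a threshold recovers the groups. The neighbor search dominates the cost, which is why speeding it up with a k-d tree matters for large data sets.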