Sign In

Communications of the ACM

121 - 130 of 961 for bentley

Distributed Spatial and Spatio-Temporal Join on Apache Spark

Effective processing of extremely large volumes of spatial data has led to many organizations employing distributed processing frameworks. Apache Spark is one such open source framework that is enjoying widespread adoption. Within this data space, it is important to note that most of the observational data (i.e., data collected by sensors, either moving or stationary) has a temporal component or timestamp. To perform advanced analytics and gain insights, the temporal component becomes equally important as the spatial and attribute components. In this article, we detail several variants of a spatial join operation that addresses both spatial, temporal, and attribute-based joins. Our spatial join technique differs from other approaches in that it combines spatial, temporal, and attribute predicates in the join operator. In addition, our spatio-temporal join algorithm and implementation differs from others in that it runs in commercial off-the-shelf (COTS) application. The users of this functionality are assumed to be GIS analysts with little if any knowledge of the implementation details of spatio-temporal joins or distributed processing. They are comfortable using simple tools that do not provide the ability to tweak the configuration of the algorithm or processing environment. The spatio-temporal join algorithm behind the tool must always succeed, regardless of input data parameters (e.g., it can be highly irregularly distributed, contain large numbers of coincident points, it can be extremely large, etc.). These factors combine to place additional requirements on the algorithm that are uncommonly found in the traditional research environment. Our spatio-temporal join algorithm was shipped as part of the GeoAnalytics Server [12], part of the ArcGIS Enterprise platform from version 10.5 onward.

3D Printers as Sociable Technologies: Taking Appropriation Infrastructures to the Internet of Things

3D printers have become continuously more present and are a perspicuous example of how technologies are becoming more complex and ubiquitous. To some extent, the emerging technological infrastructures around them exemplify ways how digitalization will change production machines and lines, in general, in the Internet of Things (IoT). From an End-User Development perspective, the main question is how users can be supported in managing those complex digital production lines. To reach a better understanding, we carefully analyzed 3D printers as an example of highly digitalized production machines with regard to the creative activities of their users that help them to make these machines work for their practices. In our study of appropriation processes, we are concerned with situational and social aspects of the configuration and practice challenges associated with making digitalization work and how IoT technologies can support these collaborative appropriation activities of end users by making these machines more “sociable.” We therefore conceptualize the idea of “Sociable Technologies” and implement a prototype that provides hardware-integrated affordances for communicating and documenting practices of usage. Based on the findings of our evaluation, we derive lessons learnt when aiming at making complex technologies more usable.

Designing with Gaze: Tama -- a Gaze Activated Smart-Speaker

Recent developments in gaze tracking present new opportunities for social computing. This paper presents a study of Tama, a gaze actuated smart speaker. Tama was designed taking advantage of research on gaze in conversation. Rather than being activated with a wake word (such as "Ok Google") Tama detects the gaze of a user, moving an articulated 'head' to achieve mutual gaze. We tested Tama's use in a multi-party conversation task, with users successfully activating and receiving a response to over 371 queries (over 10 trials). When Tama worked well, there was no significant difference in length of interaction. However, interactions with Tama had a higher rate of repeated queries, causing longer interactions overall. Video analysis lets us explain the problems users had interacting with gaze. In the discussion, we describe implications for designing new gaze systems, using gaze both as input and output. We also discuss how the relationship to anthropomorphic design and taking advantage of learned skills of interaction. Finally, two paths for future work are proposed, one in the field of speech agents, and the second in using human gaze as an interaction modality more widely.

Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes

Using indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a major source of storage overhead of in-memory databases. To reduce this overhead, we propose using a two-stage index: The first stage ingests all incoming entries and is kept small for fast read and write operations. The index periodically migrates entries from the first stage to the second, which uses a more compact, read-optimized data structure. Our first contribution is hybrid index, a dual-stage index architecture that achieves both space efficiency and high performance. Our second contribution is Dual-Stage Transformation (DST), a set of guidelines for converting any order-preserving index structure into a hybrid index. Our third contribution is applying DST to four popular order-preserving index structures and evaluating them in both standalone microbenchmarks and a full in-memory DBMS using several transaction processing workloads. Our results show that hybrid indexes provide comparable throughput to the original ones while reducing the memory overhead by up to 70%.

PreDict: Predictive Dictionary Maintenance for Message Compression in Publish/Subscribe

Data usage is a significant concern, particularly in smartphone applications, M2M communications and for Internet of Things (IoT) applications. Messages in these domains are often exchanged with a backend infrastructure using publish/subscribe (pub/sub). Shared dictionary compression has been shown to reduce data usage in pub/sub networks beyond that obtained using well-known techniques, such as DEFLATE, gzip and delta encoding, but such compression requires manual configuration, which increases the operational complexity.

To address this challenge, we design a new dictionary maintenance algorithm called PreDict that adjusts its operation over time by adapting its parameters to the message stream and that amortizes the resulting compression-induced bandwidth overhead by enabling high compression ratios.

PreDict observes the message stream, takes the costs specific to pub/sub into account and uses machine learning and parameter fitting to adapt the parameters of dictionary compression to match the characteristics of the streaming messages continuously over time. The primary goal is to reduce the overall bandwidth of data dissemination without any manual parameterization.

PreDict reduces the overall bandwidth by 72.6% on average. Furthermore, the technique reduces the computational overhead by ≈ 2x for publishers and by ≈ 1.4x for subscribers compared to the state of the art using manually selected parameters. In challenging configurations that have many more publishers (10k) than subscribers (1), the overall bandwidth reductions are more than 2x higher than that obtained by the state of the art.

Theoretically-Efficient and Practical Parallel DBSCAN

The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(nłog n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with two-way hyper-threading show that our implementations outperform existing parallel implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.

Progressive disclosure: empirically motivated approaches to designing effective transparency

As we increasingly delegate important decisions to intelligent systems, it is essential that users understand how algorithmic decisions are made. Prior work has often taken a technocentric approach to transparency. In contrast, we explore empirical user-centric methods to better understand user reactions to transparent systems. We assess user reactions to transparency in two studies. In Study 1, users anticipated that a more transparent system would perform better, but retracted this evaluation after experience with the system. Qualitative data suggest this arose because transparency is distracting and undermines simple heuristics users form about system operation. Study 2 explored these effects in depth, suggesting that users may benefit from initially simplified feedback that hides potential system errors and assists users in building working heuristics about system operation. We use these findings to motivate new progressive disclosure principles for transparency in intelligent systems.

Improving intrusion detectors by crook-sourcing

Conventional cyber defenses typically respond to detected attacks by rejecting them as quickly and decisively as possible; but aborted attacks are missed learning opportunities for intrusion detection. A method of reimagining cyber attacks as free sources of live training data for machine learning-based intrusion detection systems (IDSes) is proposed and evaluated. Rather than aborting attacks against legitimate services, adversarial interactions are selectively prolonged to maximize the defender's harvest of useful threat intelligence. Enhancing web services with deceptive attack-responses in this way is shown to be a powerful and practical strategy for improved detection, addressing several perennial challenges for machine learning-based IDS in the literature, including scarcity of training data, the high labeling burden for (semi-)supervised learning, encryption opacity, and concept differences between honeypot attacks and those against genuine services. By reconceptualizing software security patches as feature extraction engines, the approach conscripts attackers as free penetration testers, and coordinates multiple levels of the software stack to achieve fast, automatic, and accurate labeling of live web data streams.

Prototype implementations are showcased for two feature set models to extract security-relevant network- and system-level features from servers hosting enterprise-grade web applications. The evaluation demonstrates that the extracted data can be fed back into a network-level IDS for exceptionally accurate, yet lightweight attack detection.

Explaining Viewers' Emotional, Instrumental, and Financial Support Provision for Live Streamers

On live streams, viewers can support streamers through various methods ranging from well-wishing text messages to money. In this study (N=230) we surveyed viewers who had given money to a streamer. We identified six motivations for why they gave money to their favorite live streamer. We then examined how factors related to viewer, streamer, and viewer-streamer interaction were associated with three forms of social support provision: emotional, instrumental, and financial support. Our main findings are: parasocial relationship was consistently correlated with all three types of social support, while social presence was only related with instrumental and financial support; interpersonal attractiveness was associated with emotional and instrumental support and lonely people were more likely to give instrumental support. Our focus on various types of social support in a live streaming masspersonal platform adds a more detailed understanding to the existing literature of mediated social support. Furthermore, it suggests potential directions for designing more supportive and interactive live streaming platforms.

The case for voter-centered audits of search engines during political elections

Search engines, by ranking a few links ahead of million others based on opaque rules, open themselves up to criticism of bias. Previous research has focused on measuring political bias of search engine algorithms to detect possible search engine manipulation effects on voters or unbalanced ideological representation in search results. Insofar that these concerns are related to the principle of fairness, this notion of fairness can be seen as explicitly oriented toward election candidates or political processes and only implicitly oriented toward the public at large. Thus, we ask the following research question: how should an auditing framework that is explicitly centered on the principle of ensuring and maximizing fairness for the public (i.e., voters) operate? To answer this question, we qualitatively explore four datasets about elections and politics in the United States: 1) a survey of eligible U.S. voters about their information needs ahead of the 2018 U.S. elections, 2) a dataset of biased political phrases used in a large-scale Google audit ahead of the 2018 U.S. election, 3) Google's "related searches" phrases for two groups of political candidates in the 2018 U.S. election (one group is composed entirely of women), and 4) autocomplete suggestions and result pages for a set of searches on the day of a statewide election in the U.S. state of Virginia in 2019. We find that voters have much broader information needs than the search engine audit literature has accounted for in the past, and that relying on political science theories of voter modeling provides a good starting point for informing the design of voter-centered audits.