
Communications of the ACM


From "nobody cares" to "way to go!": A Design Framework for Social Sharing in Personal Informatics

Many research applications and popular commercial applications include features for sharing personally collected data with others in social awareness streams. Prior work has identified several barriers to use as well as discrepancies between designer goals and how these features are used in practice. We develop a framework for designing and evaluating these features based on an extensive review of prior literature. We demonstrate the value of this framework by analyzing physical activity sharing on Twitter, coding 4,771 tweets and their responses and gathering 444 reactions from 97 potential tweet recipients, learning that specific user-generated content leads to more responses and is better received by the post audience. We conclude by extending our findings to other sharing problems and discussing the value of our design framework.

Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes

Using indexes for query execution is crucial for achieving high performance in modern on-line transaction processing databases. For a main-memory database, however, these indexes consume a large fraction of the total memory available and are thus a major source of storage overhead. To reduce this overhead, we propose using a two-stage index: The first stage ingests all incoming entries and is kept small for fast read and write operations. The index periodically migrates entries from the first stage to the second, which uses a more compact, read-optimized data structure. Our first contribution is the hybrid index, a dual-stage index architecture that achieves both space efficiency and high performance. Our second contribution is Dual-Stage Transformation (DST), a set of guidelines for converting any order-preserving index structure into a hybrid index. Our third contribution is applying DST to four popular order-preserving index structures and evaluating them in both standalone microbenchmarks and a full in-memory DBMS using several transaction processing workloads. Our results show that hybrid indexes provide throughput comparable to the original structures while reducing memory overhead by up to 70%.
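As a rough illustration of the dual-stage idea, here is a minimal Python sketch (class name, threshold, and structures are illustrative, not the paper's actual designs) in which a small write-optimized dictionary ingests entries and periodically migrates them into a compact, sorted, read-optimized array:

```python
import bisect

class DualStageIndex:
    """Minimal sketch of a dual-stage (hybrid) index.

    Stage 1 is a small write-optimized dict for fast ingest. Once it
    reaches `migrate_threshold` entries, they are merged into stage 2,
    a compact sorted array searched with binary search.
    """

    def __init__(self, migrate_threshold=4):
        self.stage1 = {}                 # fast ingest
        self.keys2, self.vals2 = [], []  # compact, read-optimized sorted arrays
        self.migrate_threshold = migrate_threshold

    def put(self, key, value):
        self.stage1[key] = value
        if len(self.stage1) >= self.migrate_threshold:
            self._migrate()

    def _migrate(self):
        # Merge stage-1 entries into the sorted second stage.
        merged = dict(zip(self.keys2, self.vals2))
        merged.update(self.stage1)
        self.keys2 = sorted(merged)
        self.vals2 = [merged[k] for k in self.keys2]
        self.stage1.clear()

    def get(self, key):
        if key in self.stage1:           # check the write stage first
            return self.stage1[key]
        i = bisect.bisect_left(self.keys2, key)
        if i < len(self.keys2) and self.keys2[i] == key:
            return self.vals2[i]
        return None
```

Lookups consult the small write stage first and fall back to binary search over the sorted arrays, which store no per-entry pointers; this is where the space savings of a read-optimized second stage come from.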

PreDict: Predictive Dictionary Maintenance for Message Compression in Publish/Subscribe

Data usage is a significant concern, particularly in smartphone applications, M2M communications and for Internet of Things (IoT) applications. Messages in these domains are often exchanged with a backend infrastructure using publish/subscribe (pub/sub). Shared dictionary compression has been shown to reduce data usage in pub/sub networks beyond that obtained using well-known techniques, such as DEFLATE, gzip and delta encoding, but such compression requires manual configuration, which increases the operational complexity.

To address this challenge, we design a new dictionary maintenance algorithm called PreDict that adapts its parameters to the message stream over time and amortizes the resulting compression-induced bandwidth overhead by enabling high compression ratios.

PreDict observes the message stream, takes the costs specific to pub/sub into account and uses machine learning and parameter fitting to adapt the parameters of dictionary compression to match the characteristics of the streaming messages continuously over time. The primary goal is to reduce the overall bandwidth of data dissemination without any manual parameterization.

PreDict reduces the overall bandwidth by 72.6% on average. Furthermore, the technique reduces the computational overhead by ≈ 2x for publishers and by ≈ 1.4x for subscribers compared to the state of the art using manually selected parameters. In challenging configurations that have many more publishers (10k) than subscribers (1), the overall bandwidth reductions are more than 2x higher than that obtained by the state of the art.
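The underlying idea of shared dictionary compression, priming the compressor on both sides with a dictionary of substrings common to the message stream, can be sketched with zlib's preset-dictionary support (an illustration only; PreDict's adaptive dictionary maintenance and parameter fitting are not shown, and the sample dictionary and message are invented):

```python
import zlib

def compress_with_dict(message: bytes, shared_dict: bytes) -> bytes:
    # Compressor primed with a dictionary shared by publisher and subscriber.
    c = zlib.compressobj(zdict=shared_dict)
    return c.compress(message) + c.flush()

def decompress_with_dict(payload: bytes, shared_dict: bytes) -> bytes:
    # The subscriber must hold the identical dictionary to decode.
    d = zlib.decompressobj(zdict=shared_dict)
    return d.decompress(payload) + d.flush()

# Hypothetical dictionary built from substrings common to the stream.
shared = b'{"sensor_id": , "temperature": , "unit": "C"}'
msg = b'{"sensor_id": 17, "temperature": 21.5, "unit": "C"}'

with_dict = compress_with_dict(msg, shared)
without_dict = zlib.compress(msg)
# For short messages, the dictionary-primed stream is typically smaller,
# since most of the message is encoded as back-references into the dictionary.
```

The operational burden PreDict removes is deciding what goes into `shared` and when to rebuild it as the message stream drifts.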

Theoretically-Efficient and Practical Parallel DBSCAN

The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with two-way hyper-threading show that our implementations outperform existing parallel implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.
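For readers unfamiliar with the baseline, a minimal sequential DBSCAN sketch looks like the following. It uses a brute-force neighborhood query, so it takes quadratic work; it illustrates only the clustering logic, not the work-efficient parallel algorithms of the paper:

```python
from collections import deque

def dbscan(points, eps, min_pts):
    """Naive sequential DBSCAN over 2-D points.

    Returns one cluster label per point; -1 marks noise.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Brute-force range query: all points within eps of point i.
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        labels[i] = cluster              # i is a core point: start a cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster      # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # j is also core: expand the cluster
                queue.extend(j_nbrs)
        cluster += 1
    return labels
```

Replacing the brute-force `neighbors` query with a spatial index is what brings the sequential work bounds down; the paper's contribution is matching those bounds while keeping the depth polylogarithmic.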

Progressive disclosure: empirically motivated approaches to designing effective transparency

As we increasingly delegate important decisions to intelligent systems, it is essential that users understand how algorithmic decisions are made. Prior work has often taken a technocentric approach to transparency. In contrast, we explore empirical user-centric methods to better understand user reactions to transparent systems. We assess user reactions to transparency in two studies. In Study 1, users anticipated that a more transparent system would perform better, but retracted this evaluation after experience with the system. Qualitative data suggest this arose because transparency is distracting and undermines simple heuristics users form about system operation. Study 2 explored these effects in depth, suggesting that users may benefit from initially simplified feedback that hides potential system errors and assists users in building working heuristics about system operation. We use these findings to motivate new progressive disclosure principles for transparency in intelligent systems.

Improving intrusion detectors by crook-sourcing

Conventional cyber defenses typically respond to detected attacks by rejecting them as quickly and decisively as possible; but aborted attacks are missed learning opportunities for intrusion detection. A method of reimagining cyber attacks as free sources of live training data for machine learning-based intrusion detection systems (IDSes) is proposed and evaluated. Rather than aborting attacks against legitimate services, adversarial interactions are selectively prolonged to maximize the defender's harvest of useful threat intelligence. Enhancing web services with deceptive attack-responses in this way is shown to be a powerful and practical strategy for improved detection, addressing several perennial challenges for machine learning-based IDS in the literature, including scarcity of training data, the high labeling burden for (semi-)supervised learning, encryption opacity, and concept differences between honeypot attacks and those against genuine services. By reconceptualizing software security patches as feature extraction engines, the approach conscripts attackers as free penetration testers, and coordinates multiple levels of the software stack to achieve fast, automatic, and accurate labeling of live web data streams.

Prototype implementations are showcased for two feature set models to extract security-relevant network- and system-level features from servers hosting enterprise-grade web applications. The evaluation demonstrates that the extracted data can be fed back into a network-level IDS for exceptionally accurate, yet lightweight attack detection.

Explaining Viewers' Emotional, Instrumental, and Financial Support Provision for Live Streamers

On live streams, viewers can support streamers through various methods ranging from well-wishing text messages to money. In this study (N=230) we surveyed viewers who had given money to a streamer. We identified six motivations for why they gave money to their favorite live streamer. We then examined how factors related to viewer, streamer, and viewer-streamer interaction were associated with three forms of social support provision: emotional, instrumental, and financial support. Our main findings are: parasocial relationship was consistently correlated with all three types of social support, while social presence was only related to instrumental and financial support; interpersonal attractiveness was associated with emotional and instrumental support, and lonely people were more likely to give instrumental support. Our focus on various types of social support in a live streaming masspersonal platform adds a more detailed understanding to the existing literature of mediated social support. Furthermore, it suggests potential directions for designing more supportive and interactive live streaming platforms.

The case for voter-centered audits of search engines during political elections

Search engines, by ranking a few links ahead of millions of others based on opaque rules, open themselves up to criticism of bias. Previous research has focused on measuring political bias of search engine algorithms to detect possible search engine manipulation effects on voters or unbalanced ideological representation in search results. Insofar as these concerns are related to the principle of fairness, this notion of fairness can be seen as explicitly oriented toward election candidates or political processes and only implicitly oriented toward the public at large. Thus, we ask the following research question: how should an auditing framework that is explicitly centered on the principle of ensuring and maximizing fairness for the public (i.e., voters) operate? To answer this question, we qualitatively explore four datasets about elections and politics in the United States: 1) a survey of eligible U.S. voters about their information needs ahead of the 2018 U.S. elections, 2) a dataset of biased political phrases used in a large-scale Google audit ahead of the 2018 U.S. election, 3) Google's "related searches" phrases for two groups of political candidates in the 2018 U.S. election (one group is composed entirely of women), and 4) autocomplete suggestions and result pages for a set of searches on the day of a statewide election in the U.S. state of Virginia in 2019. We find that voters have much broader information needs than the search engine audit literature has accounted for in the past, and that relying on political science theories of voter modeling provides a good starting point for informing the design of voter-centered audits.

Oct-TINs: A Data Structure of Triangular Irregular Networks for Terrain Map Visualization

This paper presents a method for generating TINs from regular rectangular dissections for effective and efficient visualization of DEM (digital elevation model) data. The method consists of two steps. The first step applies a known octgrid-based resolution reduction method to regular rectangular dissections. The second step applies a known triangulation method to the resolution-reduced rectangular dissections. TINs derived from octgrids are called oct-TINs. The effectiveness and efficiency of oct-TINs generated by this method are demonstrated through examples.
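The second step, triangulating a rectangular dissection into a TIN, can be sketched as below. This is the simplest uniform split of each grid cell into two triangles, an assumption for illustration; it is not the paper's octgrid-driven reduction or its specific triangulation:

```python
def grid_to_tin(heights):
    """Triangulate a regular height grid into a TIN-style mesh.

    Each rectangular cell is split into two triangles. Returns
    (vertices, triangles): vertices as (x, y, z) tuples, triangles
    as index triples into the vertex list.
    """
    rows, cols = len(heights), len(heights[0])
    vertices = [(x, y, heights[y][x])
                for y in range(rows) for x in range(cols)]
    triangles = []
    for y in range(rows - 1):
        for x in range(cols - 1):
            v00 = y * cols + x       # cell corners in row-major order
            v10 = v00 + 1
            v01 = v00 + cols
            v11 = v01 + 1
            triangles.append((v00, v10, v11))  # lower triangle of the cell
            triangles.append((v00, v11, v01))  # upper triangle of the cell
    return vertices, triangles
```

An octgrid-style resolution reduction would coarsen flat regions of the grid before this step, so the resulting TIN spends fewer triangles where the terrain carries less detail.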