Communications of the ACM

Range Thresholding on Streams

This paper studies a type of continuous query called range thresholding on streams (RTS). Imagine the stream as an unbounded sequence of elements, each of which is a real value. A query registers an interval and must be notified as soon as a certain number of incoming elements fall into the interval. The system needs to support multiple queries simultaneously and aims to minimize space consumption and computation time. Currently, all solutions to this problem entail quadratic time O(nm) to process n stream elements and m queries, which severely limits their applicability to a small number of queries. We propose the first algorithm that breaks the quadratic barrier, reducing the computation cost dramatically to O(n + m), up to a polylogarithmic factor. The algorithm is general enough to guarantee the same bound on weighted versions of the queries, even in d-dimensional space for any constant d. Its vast advantage over previous methods in practical environments has been confirmed through extensive experimentation.
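To make the problem statement concrete, the following is a minimal sketch of the naive O(nm) baseline that the abstract describes as the existing approach, not the paper's algorithm. The names `Query` and `process_stream`, the closed-interval test, and the decision to notify each query only once are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    lo: float          # left end of the registered interval (assumed closed)
    hi: float          # right end of the registered interval (assumed closed)
    threshold: int     # number of hits after which the query must be notified
    count: int = 0     # hits seen so far
    done: bool = False # True once the notification has been issued


def process_stream(stream, queries):
    """Naive baseline: test every element against every live query.

    Returns a list of (query_index, element_position) pairs in the order
    in which the queries reach their thresholds. Cost is O(n * m).
    """
    notifications = []
    for pos, value in enumerate(stream):
        for qi, q in enumerate(queries):
            if q.done:
                continue
            if q.lo <= value <= q.hi:
                q.count += 1
                if q.count >= q.threshold:
                    q.done = True
                    notifications.append((qi, pos))
    return notifications


if __name__ == "__main__":
    queries = [Query(1.0, 2.0, 2), Query(0.0, 1.0, 2)]
    print(process_stream([0.5, 1.5, 2.5, 1.2, 0.7], queries))
    # prints [(0, 3), (1, 4)]
```

The inner loop over all m queries for each of the n elements is exactly the quadratic cost that the abstract says the proposed algorithm avoids.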

Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors

Rapid advances in GPU hardware and multiple areas of Deep Learning open up a new opportunity for billion-scale information retrieval with exhaustive search. Building on top of the powerful concept of semantic learning, this paper proposes a Recurrent Binary Embedding (RBE) model that learns compact representations for real-time retrieval. The model has the unique ability to refine a base binary vector by progressively adding binary residual vectors to meet the desired accuracy. The refined vector enables efficient implementation of exhaustive similarity computation with bit-wise operations, followed by a near-lossless k-NN selection algorithm, also proposed in this paper. The proposed algorithms are integrated into an end-to-end multi-GPU system that retrieves thousands of top items from over a billion candidates in real-time. The RBE model and the retrieval system were evaluated with data from a major paid search engine. When measured against the state-of-the-art model for binary representation and the full precision model for semantic embedding, RBE significantly outperformed the former, and filled in over 80% of the AUC gap in-between. Experiments comparing with our production retrieval system also demonstrated superior performance. While the primary focus of this paper is to build RBE based on a particular class of semantic models, generalizing to other types is straightforward, as exemplified by two different models at the end of the paper.
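As a rough illustration of similarity computation with bit-wise operations, the sketch below computes the dot product of two vectors with entries in {+1, -1} using XOR and a population count. It is not the RBE model or the paper's retrieval system; the helper names `pack` and `binary_dot` and the bit layout are assumptions. It relies only on the identity dot(a, b) = d - 2 * (number of differing positions) for d-dimensional sign vectors.

```python
def pack(signs):
    """Pack a sequence of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, dim):
    """Dot product of two packed +1/-1 vectors of dimension `dim`.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is dim - 2 * popcount(a XOR b).
    """
    differing = bin(a_bits ^ b_bits).count("1")
    return dim - 2 * differing


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack(a), pack(b), len(a)))  # prints 0
```

On a GPU, the same XOR-and-popcount pattern maps to hardware instructions over machine words, which is what makes exhaustive comparison against many candidates plausible.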

How Do We Create a Fantabulous Password?

Although pronounceability can improve password memorability, most existing password generation approaches have not properly integrated the pronounceability of passwords in their designs. In this work, we demonstrate several shortfalls of current pronounceable password generation approaches and then propose ProSemPass, a new method of generating passwords that are pronounceable and semantically meaningful. In our approach, users supply initial input words, and our system improves the pronounceability and meaning of the user-provided words by automatically creating a portmanteau. To measure the strength of our approach, we use attacker models in which attackers have complete knowledge of our password generation algorithms. We measure strength in guess numbers and compare those with other existing password generation approaches. In a large-scale IRB-approved user study with 1,563 Amazon MTurkers over 9 different conditions, our approach achieves 30% higher recall than current pronounceable password approaches and is stronger than the offline guessing attack limit.

Multi-Domain Level Generation and Blending with Sketches via Example-Driven BSP and Variational Autoencoders

Procedural content generation via machine learning (PCGML) has demonstrated its usefulness as a content and game creation approach, and has been shown to be able to support human creativity. An important facet of creativity is combinational creativity or the recombination, adaptation, and reuse of ideas and concepts between and across domains. In this paper, we present a PCGML approach for level generation that is able to recombine, adapt, and reuse structural patterns from several domains to approximate unseen domains. We extend prior work involving example-driven Binary Space Partitioning for recombining and reusing patterns in multiple domains, and incorporate Variational Autoencoders (VAEs) for generating unseen structures. We evaluate our approach by blending across 7 domains and subsets of those domains. We show that our approach is able to blend domains together while retaining structural components. Additionally, by using different groups of training domains our approach is able to generate both 1) levels that reproduce and capture features of a target domain, and 2) levels that have vastly different properties from the input domain.

InfiniTouch: Finger-Aware Interaction on Fully Touch Sensitive Smartphones

Smartphones are the most successful mobile devices and offer intuitive interaction through touchscreens. Current devices treat all fingers equally and only sense touch contacts on the front of the device. In this paper, we present InfiniTouch, the first system that enables touch input on the whole device surface and identifies the fingers touching the device without external sensors while keeping the form factor of a standard smartphone. We first developed a prototype with capacitive sensors on the front, the back, and three sides. We then conducted a study to train a convolutional neural network that identifies fingers with an accuracy of 95.78% while estimating their position with a mean absolute error of 0.74 cm. We demonstrate the usefulness of multiple use cases made possible with InfiniTouch, including finger-aware gestures and finger flexion state as an action modifier.

Leveraging Context-Free Grammar for Efficient Inverted Index Compression

Large-scale search engines need to answer thousands of queries per second over billions of documents, which is typically done by querying a large inverted index. Many highly optimized integer encoding techniques are applied to compress the inverted index and reduce the query processing time. In this paper, we propose a new grammar-based inverted index compression scheme, which can improve the performance of both index compression and query processing.

Our approach identifies patterns (common subsequences of docIDs) among different posting lists and generates a context-free grammar to succinctly represent the inverted index. To further optimize the compression performance, we carefully redesign the index structure. Experiments show a reduction of up to 8.8% in space usage, while decompression is up to 14% faster.

We also design an efficient list intersection algorithm which utilizes the proposed grammar-based inverted index. We show that our scheme can be combined with common docID reassignment methods and encoding techniques, and yields about 14% to 27% higher throughput for AND queries by utilizing multiple threads.
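The idea of representing docID patterns shared across posting lists with grammar rules can be sketched with a toy example. The code below uses a simplified Re-Pair-style pair replacement, swapped in for illustration; it is not the scheme proposed in the paper. The function names, the `("N", k)` nonterminal encoding, and the `min_count` parameter are all hypothetical, and the sketch assumes each posting list is strictly increasing.

```python
from collections import Counter


def replace_pair(seq, pair, symbol):
    """Replace every adjacent occurrence of `pair` in `seq` with `symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def compress(posting_lists, min_count=2):
    """Repeatedly turn the most frequent adjacent pair into a grammar rule."""
    lists = [list(pl) for pl in posting_lists]
    rules = {}
    while True:
        pairs = Counter()
        for pl in lists:
            pairs.update(zip(pl, pl[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        symbol = ("N", len(rules))  # hypothetical nonterminal name
        rules[symbol] = pair
        lists = [replace_pair(pl, pair, symbol) for pl in lists]
    return lists, rules


def expand(symbol, rules):
    """Expand a terminal docID or a nonterminal back into a list of docIDs."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return [symbol]


def decompress(lists, rules):
    return [[d for s in pl for d in expand(s, rules)] for pl in lists]
```

For the posting lists `[1, 2, 3, 7]`, `[1, 2, 3, 9]`, and `[2, 3, 5]`, this toy version creates one rule for the pair (2, 3) and a second rule that builds on it, and `decompress` recovers the original lists. A real index would combine such rules with integer encoding of the remaining symbols, which this sketch omits.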

Investigating Expectations for Voice-based and Conversational Argument Search on the Web

Millions of arguments are shared on the web. Future information systems will be able to exploit this valuable knowledge source and to retrieve arguments relevant and convincing to our specific need, all with an interface as intuitive as asking your friend "Why ...". Although recent advancements in argument mining, conversational search, and voice recognition have put such systems within reach, many questions remain open, especially on the interface side. In this regard, the paper at hand presents the first study of argument search behavior. We conduct an online survey and a focused user study, putting emphasis on what people expect argument search to be like rather than on what current first-generation systems provide. Our participants expected to use voice-based argument search mostly at home, but also together with others. Moreover, they expect such search systems to provide rich information on retrieved arguments, such as the source, supporting evidence, and background knowledge on entities or events mentioned. In observed interactions with a simulated system, we found that the participants adapted their search behavior to different types of tasks, and that up-front categorization of the retrieved arguments is perceived as helpful if it is short. Our findings are directly applicable to the design of argument search systems, not only voice-based ones.

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

Locality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search framework (LCCS-LSH) with a theoretical guarantee. We introduce a novel concept of LCCS and a new data structure named Circular Shift Array (CSA) for k-LCCS search. The insight behind the LCCS search framework is that, with high probability, close data objects have a longer LCCS than far-apart ones. LCCS-LSH is LSH-family-independent, and it supports c-ANNS with different kinds of distance metrics. We also introduce a multi-probe version of LCCS-LSH and conduct extensive experiments over five real-life datasets. The experimental results demonstrate that LCCS-LSH outperforms state-of-the-art LSH schemes.
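The abstract does not define LCCS, so the sketch below rests on an assumed reading: for two equal-length sequences of hash values, the LCCS length is the longest run of consecutive positions at which the two sequences agree, with positions wrapping around from the end to the start. The function name `lccs_length` is hypothetical, and this brute-force scan is not the Circular Shift Array data structure from the paper.

```python
def lccs_length(a, b):
    """Longest circular run of positions where a[i] == b[i].

    Assumed definition: both sequences have the same length m and are
    treated as circular, so a run may wrap from position m - 1 to 0.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    m = len(a)
    match = [x == y for x, y in zip(a, b)]
    if all(match):
        return m  # every position agrees (also covers m == 0)
    best = run = 0
    # Scanning the positions twice lets a run wrap around the end once.
    for i in range(2 * m):
        if match[i % m]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


if __name__ == "__main__":
    print(lccs_length([1, 2, 3, 4, 5], [1, 2, 9, 4, 5]))  # prints 4
```

In the example, the sequences differ only at index 2, so the matching run covers indices 3, 4, 0, and 1. Under this reading, two objects whose hash values agree on many consecutive positions would score a long LCCS, which matches the intuition stated in the abstract.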

Designing for Advanced Personalization in Personal Task Management

Many applications provide personalization mechanisms through which users can make changes to adapt a system to better fit their needs or preferences. However, advanced personalization, such as extending system functionality, is often only available to programmers. Building on ideas from end-user programming and personalization literature, we developed an adaptable task management tool that allows advanced personalization using a self-disclosing mechanism and a guided scripting mechanism, ScriPer. We present our design process, its outcome, and the results of a user study (n=24). Participants, even those with no to some background in programming, were able to use ScriPer to perform advanced personalization (in 142 of 144 trials). We also found error patterns differed across programming expertise.

Behind the Voices: The Practice and Challenges of Esports Casters

Casters commentate on a live, streamed video game for a large online audience. Drawing from 20 semi-structured interviews with amateur casters of either Dota 2 or Rocket League and over 20 hours of participant observations, we describe the distinctive practices of two types of casters, play-by-play and color commentary. Play-by-play casters are adept at improvising a rich narrative of hype on top of live games, whereas color commentators methodically prepare to fill in the gaps of live play with informative analysis. Casters often start out alone, relying upon reflective practice to hone their craft. By examining the challenges faced by amateur casters, we identified three design opportunities for game designers to support casters and would-be casters as first-class users. Such designs would address the challenges amateur casters face in three areas: social support for casting, camerawork, and data availability.