Communications of the ACM

121 - 130 of 2,354 for bentley

Close pair queries in moving object databases

Databases of moving objects are important for air traffic control, ground traffic, and battlefield configurations. We introduce the (historical and spatial) range close-pair query for moving objects as an important problem for such databases. The purpose of a range close-pair query for moving objects is to find pairs of objects that were closer than ε during time interval I and within spatial range R, where ε, I, and R are user-specified parameters. This paper solves the range close-pair query using two components: a retrieval component and a close-pair identification component. The retrieval component breaks long trajectories into trajectory segments, which are produced in increasing time order without the need for sorting. It takes advantage of a new index mechanism, the Multiple TSB-tree. The segments are then pipelined to the close-pair identification component, which introduces a novel spatial sweep that sweeps by time and one spatial dimension simultaneously. Extensive experimental results demonstrate the advantages of the new approach for finding close pairs.
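To make the query semantics concrete, here is a minimal sketch, assuming each trajectory segment is linear motion between two timestamped positions and that the retrieval step has already restricted segments to the spatial range R. It clips segments to the interval I, sweeps along the x-axis only to prune pairs that are more than ε apart, and confirms survivors with an exact closest-approach computation. This is a simplified stand-in, not the paper's combined time-and-space sweep or the Multiple TSB-tree; the names `Segment`, `clip`, `min_distance`, and `close_pairs` are hypothetical.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]

@dataclass(frozen=True)
class Segment:
    """Object `oid` moves linearly from p0 at time t0 to p1 at time t1."""
    oid: str
    t0: float
    t1: float
    p0: Point
    p1: Point

    def position(self, t: float) -> Point:
        if self.t1 == self.t0:
            return self.p0
        a = (t - self.t0) / (self.t1 - self.t0)
        return (self.p0[0] + a * (self.p1[0] - self.p0[0]),
                self.p0[1] + a * (self.p1[1] - self.p0[1]))

def clip(seg, ts, te):
    """Restrict a segment to the query interval [ts, te]; None if they do not overlap."""
    lo, hi = max(seg.t0, ts), min(seg.t1, te)
    if lo > hi:
        return None
    return Segment(seg.oid, lo, hi, seg.position(lo), seg.position(hi))

def min_distance(a, b, lo, hi):
    """Closest approach of two linearly moving points during [lo, hi]."""
    # Both positions are linear in t, so their difference is d0 + s*v for s in [0, 1].
    pa, pb, qa, qb = a.position(lo), b.position(lo), a.position(hi), b.position(hi)
    d0 = (pa[0] - pb[0], pa[1] - pb[1])
    v = (qa[0] - qb[0] - d0[0], qa[1] - qb[1] - d0[1])
    vv = v[0] ** 2 + v[1] ** 2
    s = 0.0 if vv == 0 else max(0.0, min(1.0, -(d0[0] * v[0] + d0[1] * v[1]) / vv))
    return math.hypot(d0[0] + s * v[0], d0[1] + s * v[1])

def close_pairs(segments, eps, ts, te):
    """Object pairs closer than eps at some instant of [ts, te] (x-sweep + exact check)."""
    clipped = [c for s in segments if (c := clip(s, ts, te)) is not None]
    # Linear motion: a segment's x-extent is spanned by its two endpoints.
    items = sorted(((min(c.p0[0], c.p1[0]), max(c.p0[0], c.p1[0]), c) for c in clipped),
                   key=lambda it: it[0])
    active, result = [], set()
    for xlo, xhi, seg in items:
        # Evict segments whose x-extent ends more than eps before this one starts.
        active = [it for it in active if it[1] + eps >= xlo]
        for _, _, other in active:
            if other.oid == seg.oid:
                continue
            lo, hi = max(seg.t0, other.t0), min(seg.t1, other.t1)
            if lo <= hi and min_distance(seg, other, lo, hi) < eps:
                result.add(tuple(sorted((seg.oid, other.oid))))
        active.append((xlo, xhi, seg))
    return result
```

Because segments arrive sorted by their left x-coordinate, any active segment whose right edge plus ε lies left of the current one can never again be within ε, so it is safe to evict.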

A space-optimal data-stream algorithm for coresets in the plane

Given a point set P ⊆ ℝ², a subset Q ⊆ P is an ε-kernel of P if, for every slab W containing Q, the (1+ε)-expansion of W also contains P. We present a data-stream algorithm for maintaining an ε-kernel of a stream of points in ℝ² that uses O(1/√ε) space and takes O(log(1/ε)) amortized time to process each point. This is the first space-optimal data-stream algorithm for this problem.
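The definition can be exercised with a much simpler streaming baseline: remember the most extreme point seen in each of k fixed directions, so space is O(k). This is not the paper's algorithm, and it spends O(k) time per point rather than O(log(1/ε)) amortized. As an assumption of this sketch, k = ⌈π/√ε⌉, which is enough for points spread around a circle (as in the test); arbitrary point sets would need an extra affine normalization step that is omitted here. The class `DirectionalExtremes` and the checker `covers` are hypothetical names.

```python
import math

class DirectionalExtremes:
    """Streaming baseline: for each of k fixed directions, keep the most extreme point."""

    def __init__(self, eps):
        # k = ceil(pi / sqrt(eps)) directions, so O(1/sqrt(eps)) stored points.
        self.k = math.ceil(math.pi / math.sqrt(eps))
        self.dirs = [(math.cos(2 * math.pi * i / self.k), math.sin(2 * math.pi * i / self.k))
                     for i in range(self.k)]
        self.best = [None] * self.k  # per direction: (projection, point)

    def insert(self, p):
        # O(k) work per point; the paper's algorithm is faster per update.
        for i, (ux, uy) in enumerate(self.dirs):
            score = p[0] * ux + p[1] * uy
            if self.best[i] is None or score > self.best[i][0]:
                self.best[i] = (score, p)

    def kernel(self):
        return {b[1] for b in self.best if b is not None}

def covers(P, Q, u, eps):
    """Does the (1+eps)-expansion of the thinnest slab normal to u containing Q contain P?"""
    qp = [x * u[0] + y * u[1] for x, y in Q]
    pp = [x * u[0] + y * u[1] for x, y in P]
    lo, hi = min(qp), max(qp)
    pad = eps * (hi - lo) / 2  # expand the slab about its center by a factor 1 + eps
    return lo - pad <= min(pp) and max(pp) <= hi + pad
```

Checking only the thinnest slab in each direction suffices, because any wider slab containing Q has an expansion that contains the thinnest one's.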

Robust and efficient polygon overlay on parallel stream processors

The plane sweep algorithm, although widely used in computational geometry, does not parallelize efficiently, which prevents it from benefiting from recent trends toward multi-core CPUs and general-purpose GPUs. Instead of the plane sweep, some researchers have proposed the uniform grid as a foundation for parallel algorithms in computational geometry, but long-standing robustness and performance issues have deterred its wider adoption, at least for overlay analysis. To remedy this, we have developed previously missing methods to perform snap rounding and to efficiently compute the winding number of overlay faces on the uniform grid, and we have implemented them as part of an extensible geometry engine that performs polygon overlay with OpenMP on CPUs and CUDA on GPUs. The overall algorithm works on any polygon configuration: degenerate, overlapping, self-overlapping, disjoint, or with holes. On typical data, it has time and space complexity O(N + K), where N is the number of edges and K the number of intersections. Its single-threaded performance rivals the plane sweep, and it achieves a parallel efficiency of 0.9 on our quad-core CPU, with an additional speedup of over 4 on our GPU, a result that should extrapolate to distributed computing and other geometric operations.
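The winding number mentioned above is what lets overlay handle self-overlapping input: a point is inside a polygon under the nonzero rule when its winding number is not 0. The sketch below computes it for a single query point with a standard crossing-based test; it is not the article's grid-based procedure, and the helper `overlay_label` is a hypothetical illustration of how two polygons' winding numbers classify a point into overlay regions.

```python
def winding_number(pt, polygon):
    """Winding number of a closed polygon (list of (x, y) vertices) around pt."""
    x, y = pt
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # > 0 when pt lies to the left of the directed edge (x0, y0) -> (x1, y1)
        side = (x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)
        if y0 <= y < y1 and side > 0:      # upward crossing with pt on the left
            wn += 1
        elif y1 <= y < y0 and side < 0:    # downward crossing with pt on the right
            wn -= 1
    return wn

def overlay_label(pt, a, b):
    """Classify pt against polygons a and b using the nonzero winding rule."""
    in_a, in_b = winding_number(pt, a) != 0, winding_number(pt, b) != 0
    return {(True, True): "A and B", (True, False): "A only",
            (False, True): "B only", (False, False): "outside"}[(in_a, in_b)]
```

A counterclockwise square yields 1 at an interior point, the same square listed clockwise yields -1, and a square traversed twice yields 2, which is the self-overlapping case.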

Personal knowledge questions for fallback authentication: security questions in the era of Facebook

Security questions (or challenge questions) are commonly used to authenticate users who have lost their passwords. We examined the password retrieval mechanisms for a number of personal banking websites, and found that many of them rely in part on security questions with serious usability and security weaknesses. We discuss patterns in the security questions we observed. We argue that today's personal security questions owe their strength to the hardness of an information-retrieval problem. However, as personal information becomes ubiquitously available online, the hardness of this problem, and the security provided by such questions, will likely diminish over time. We supplement our survey of bank security questions with a small user study that provides some context for how such questions are used in practice.

BSkyTree: scalable skyline computation using a balanced pivot selection

Skyline queries have gained a lot of attention for multi-criteria analysis of large-scale datasets. While existing skyline algorithms have focused mostly on exploiting data dominance to achieve efficiency, we propose that data incomparability should be treated as another key factor in optimizing skyline computation. Specifically, to optimize both factors, we first identify common modules shared by existing non-index skyline algorithms and then analyze them to develop a cost model that guides a balanced pivot point selection. Based on the cost model, we finally implement our balanced pivot selection in two algorithms, BSkyTree-S and BSkyTree-P, which treat both dominance and incomparability as key factors. Our experimental results demonstrate that the proposed algorithms outperform state-of-the-art skyline algorithms by up to two orders of magnitude.
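The role of a pivot in exploiting both dominance and incomparability can be sketched as follows. Points are partitioned by comparing each coordinate with a pivot skyline point; a point in one region can only be dominated by a point from a region whose "no better than the pivot" coordinates form a proper subset of its own, so incomparable regions are never compared. This is a simplified recursive sketch assuming smaller values are better; the pivot rule (minimum coordinate sum, which always yields a skyline point) is our assumption and not BSkyTree's cost-model-guided balanced selection.

```python
def dominates(a, b):
    """a dominates b (smaller is better): no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Skyline via recursive pivot partitioning (illustrative, not BSkyTree itself)."""
    if len(points) <= 1:
        return list(points)
    # Assumed pivot rule: the minimum-sum point cannot be dominated, so it is on the skyline.
    k = min(range(len(points)), key=lambda i: sum(points[i]))
    pivot = points[k]
    regions = {}
    for i, p in enumerate(points):
        if i == k or dominates(pivot, p):  # dominance pruning
            continue
        # Bit j is True where p is no better than the pivot in dimension j.
        mask = tuple(x >= c for x, c in zip(p, pivot))
        regions.setdefault(mask, []).append(p)
    local = {m: skyline(ps) for m, ps in regions.items()}
    result = [pivot]
    for m, sky in local.items():
        # Only regions whose True bits are a proper subset of m can dominate points in m;
        # all other regions are incomparable with m and are skipped.
        rivals = [q for m2, s2 in local.items()
                  if m2 != m and all(b2 <= b for b2, b in zip(m2, m)) for q in s2]
        result.extend(p for p in sky if not any(dominates(q, p) for q in rivals))
    return result
```

For example, with points (1, 5), (2, 2), (5, 1), (3, 3), (4, 4), the pivot (2, 2) prunes (3, 3) and (4, 4) directly, and (1, 5) and (5, 1) land in incomparable regions, so they are never compared with each other.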

An experimental investigation of set intersection algorithms for text searching

The intersection of large ordered sets is a common problem in evaluating Boolean queries to a search engine. In this article, we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison with the algorithms from the previous studies of Demaine, López-Ortiz, and Munro [ALENEX 2001] and of Baeza-Yates and Salinger [SPIRE 2005]; in addition, we implement and test the intersection algorithm of Barbay and Kenyon [SODA 2002] and its randomized variant [SAGA 2003]. We consider the random data set from Baeza-Yates and Salinger, the Google queries used by Demaine et al., a corpus provided by Google, and a larger corpus from the TREC Terabyte 2006 efficiency query stream, along with its own query log. We measure performance both in terms of the number of comparisons and searches performed and in terms of CPU time on two different architectures. Our results confirm or improve the results from both previous studies in their respective contexts (comparison model on real data, and CPU measures on random data) and extend them to new contexts. In particular, we show that value-based search algorithms perform well on posting lists in terms of the number of comparisons performed.
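Searching one sorted array for elements of another is the core step being compared. A common building block in adaptive intersection algorithms is galloping (doubling) search: probe positions 1, 2, 4, ... ahead of the current position, then binary-search the bracketed range. The sketch below is a generic two-list version using Python's `bisect`, assuming sorted inputs; it is not one of the article's proposed algorithms, and `gallop` and `intersect` are hypothetical names.

```python
from bisect import bisect_left

def gallop(arr, target, lo):
    """Smallest index i >= lo with arr[i] >= target (assumes arr[:lo] < target)."""
    step, hi = 1, lo
    # Double the probe distance until we overshoot target or run off the end.
    while hi < len(arr) and arr[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # The answer now lies in [lo, min(hi, len(arr))]; finish with binary search.
    return bisect_left(arr, target, lo, min(hi, len(arr)))

def intersect(a, b):
    """Intersection of two sorted lists, galloping through the longer one."""
    if len(a) > len(b):
        a, b = b, a
    out, j = [], 0
    for x in a:
        j = gallop(b, x, j)
        if j == len(b):
            break
        if b[j] == x:
            out.append(x)
            j += 1
    return out
```

Galloping pays off when the lists differ greatly in length or when matches cluster, since each search costs roughly the logarithm of the distance skipped rather than of the whole list.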

Adding an interactive display to a public basketball hoop can motivate players and foster community

Interactive displays that aim to engage people through play have been successfully deployed in urban environments. However, there has been little work bringing interactive displays into existing public game spaces such as outdoor basketball courts. To explore this, we designed an interactive display for a public half-court basketball hoop. We studied the impact of 3 different display modes over a 10-week period through interviews with players, spectators, and passers-by. Our findings suggest 3 dimensions for the design space of such interactive displays: balancing noticeability across different user groups, supporting different play actions, and supporting connections between user groups. We also present 6 design tactics along these dimensions to help designers create engaging interactive displays for public game spaces.

Social audio features for advanced music retrieval interfaces

The size of personal music collections has steadily increased over the past years. As a result, traditional metadata-based lists for browsing these collections have reached their limits. Interfaces based on music similarity offer an alternative and are therefore gaining increasing attention. Music similarity is typically derived either from audio features (objective approach) or from user-driven information sources, such as collaborative filtering or social tags (subjective approach). Studies show that the latter techniques outperform audio-based approaches in describing perceived music similarity. However, subjective approaches typically define only pairwise relations, as opposed to the global notion of similarity given by audio-feature spaces. Many of the proposed interfaces for similarity-based music access inherently depend on this global notion and are thus not applicable to user-driven music similarity measures. The first contribution of this paper is a high-dimensional music space based on user-driven similarity measures. It combines the advantage of audio-feature spaces (a global view) with the advantage of subjective sources, which better reflect the users' perception. The proposed space represents similarity compactly and is therefore well suited for offline use, such as in mobile applications. To demonstrate its practical applicability, the second contribution is a comprehensive mobile music player that incorporates several smart interfaces for accessing the user's music collection. Based on this application, we finally present a large-scale user study that underlines the benefits of the introduced interfaces and shows their high user acceptance.

Finding Probabilistic k-Skyline Sets on Uncertain Data

The skyline is the set of points that are not dominated by any other point. For uncertain objects, the probabilistic skyline, which computes the objects with a high probability of being in the skyline, has been studied. While useful for selecting individual objects, it is not sufficient for scenarios where we wish to compute a subset of skyline objects, i.e., a skyline set. In this paper, we generalize the notion of the probabilistic skyline to probabilistic k-skyline sets (Pk-SkylineSets), which compute k-object sets with a high probability of being the skyline set. We present an efficient algorithm for computing probabilistic k-skyline sets. It uses two heuristic pruning strategies and a novel data structure based on the classic layered range tree to compute the skyline set probability for each instance set with a worst-case time bound. The experimental results on the real NBA dataset and on synthetic datasets show that Pk-SkylineSets are interesting and useful, and that our algorithms are efficient and scalable.
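As background for what Pk-SkylineSets generalize, the object-level probabilistic skyline can be computed with the possible-worlds formula commonly used in prior work: each uncertain object takes exactly one of its discrete instances, objects are independent, and an instance u survives unless some other object's instance dominates it, so P(U) = Σ_u p(u) · Π_{V≠U} (1 − Σ_{v ≺ u} p(v)). The sketch below assumes smaller values are better and that each object's instance probabilities sum to 1; it does not compute the paper's set-level probability, and `skyline_probability` is a hypothetical name.

```python
def dominates(a, b):
    """a dominates b (smaller is better): no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(objects, target):
    """P(object `target` is in the skyline) under possible-worlds semantics.

    objects maps a name to a list of (point, probability) instances whose
    probabilities sum to 1; objects are assumed to be mutually independent.
    """
    total = 0.0
    for u, pu in objects[target]:
        survive = 1.0
        for name, instances in objects.items():
            if name != target:
                # Probability that this object's instance dominates u.
                survive *= 1.0 - sum(pv for v, pv in instances if dominates(v, u))
        total += pu * survive
    return total
```

With A = {(1, 1): 0.5, (3, 3): 0.5} and B = {(2, 2): 1.0}, A is on the skyline only when it takes (1, 1), and B only when A takes (3, 3), so each has skyline probability 0.5.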

Sampling Big Trajectory Data

The increasing prevalence of sensors and mobile devices has led to an explosive increase in the scale of spatio-temporal data in the form of trajectories. A trajectory aggregate query, a fundamental operation for analyzing trajectory data, retrieves statistics of the trajectories passing through a user-specified spatio-temporal region. A large-scale spatio-temporal database with big, disk-resident data takes a very long time to produce exact answers to such queries. Hence, approximate query processing with a guaranteed error bound is a promising solution in the many scenarios with stringent response-time requirements. In this paper, we study the problem of approximate query processing for trajectory aggregate queries. We show that it reduces to the distinct value estimation problem, which has been proven very hard, with strong negative results, when no index is built. By utilizing a well-established spatio-temporal index and introducing an inverted index over trajectory data, we design a random index sampling (RIS) algorithm that estimates the answers with a guaranteed error bound. To further improve system scalability, we extend RIS to a concurrent random index sampling (CRIS) algorithm that processes multiple trajectory aggregate queries arriving concurrently with overlapping spatio-temporal query regions. To demonstrate the efficacy and efficiency of our sampling and estimation methods, we applied them to a real large-scale user trajectory database collected from a cellular service provider in China. Our extensive evaluation results indicate that both RIS and CRIS outperform exhaustive search for single and concurrent trajectory aggregate queries by two orders of magnitude in query processing time, while keeping the relative error ratio below 10% at only 1% of the search cost of the exhaustive search method.
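The general idea of trading exactness for a guaranteed error bound can be illustrated with a much simpler estimator than RIS. Assuming trajectory IDs can be sampled uniformly (for instance, from an inverted index) and that a predicate can test whether a sampled trajectory passes the query region, a Hoeffding bound fixes the sample size needed for an additive error of ±t on the fraction, with confidence 1 − δ. This is a generic uniform-sampling sketch, not the paper's RIS or CRIS algorithm, and `passes_region`, `estimate_count`, and `hoeffding_sample_size` are hypothetical names.

```python
import math
import random

def hoeffding_sample_size(t, delta):
    """Samples so that a mean of [0, 1]-valued draws is within +/- t w.p. >= 1 - delta."""
    # Hoeffding: P(|mean - mu| >= t) <= 2 * exp(-2 * n * t^2).
    return math.ceil(math.log(2 / delta) / (2 * t * t))

def estimate_count(traj_ids, passes_region, t=0.01, delta=0.05, rng=random):
    """Estimate how many trajectories in traj_ids satisfy passes_region.

    passes_region is a hypothetical predicate (e.g., backed by an index lookup).
    Returns (estimate, additive_error_bound); the bound holds w.p. >= 1 - delta.
    """
    n = hoeffding_sample_size(t, delta)
    # Uniform sampling with replacement; each draw is a 0/1 indicator.
    hits = sum(passes_region(rng.choice(traj_ids)) for _ in range(n))
    return hits / n * len(traj_ids), t * len(traj_ids)
```

The sample size depends only on t and δ, not on the number of trajectories, which is what makes sampling attractive for very large disk-resident data; for t = 0.01 and δ = 0.05 it is 18,445 draws.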