
Communications of the ACM


Visualizing Personal Rhythms: A Critical Visual Analysis of Mental Health in Flux

Visualizations of personal data in self-tracking systems can make even subtle shifts in mental and physical states observable, greatly influencing how health and wellness goals are set, pursued, and achieved. At the same time, recent work in data ethics cautions that standardized models can have unintended negative consequences for some user groups. Through collaborative design and critical visual analysis, this study contrasts conventional visualizations of personal data with the ways that vulnerable populations represent their lived experiences. Participants self-tracked to manage bipolar disorder, a mental illness characterized by severe and unpredictable mood changes. During design sessions, each participant created a series of timeline drawings depicting their experiences with mental health. Examples of adaptive and vernacular design, these images combine normative standards with individualized graphic modifications. Analysis shows that conventional visual encodings can support facets of self-assessment while also imposing problematic normative standards onto deeply personal experiences.


A New Approach for Pedestrian Density Estimation Using Moving Sensors and Computer Vision

An understanding of pedestrian dynamics is indispensable for numerous urban applications, including the design of transportation networks and planning for business development. Pedestrian counting typically relies on manual or sensor-based means of counting individuals at each location of interest. However, such methods do not scale to the size of a city, and we propose a new approach to fill this gap. In this project, we used a large, dense dataset of images of New York City along with computer vision techniques to construct a spatio-temporal map of relative pedestrian density. Due to the limitations of state-of-the-art computer vision methods, such automatic person detection is inherently subject to errors. We model these errors as a probabilistic process, for which we provide theoretical analysis and thorough numerical simulations. We demonstrate that, within our assumptions, our methodology can supply a reasonable estimate of pedestrian densities and provide theoretical bounds on the resulting error.
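The probabilistic treatment of detection errors described in the abstract can be illustrated with a toy simulation (the recall and false-positive parameters below are illustrative assumptions, not values from the paper): if a detector finds each true pedestrian with probability `recall` and each of `windows` background regions yields a false positive with probability `fp_prob`, then the expected observed count is `recall * n + windows * fp_prob`, and inverting that relation recovers an estimate of the true count.

```python
import random

def simulate_detections(true_count, recall, windows, fp_prob, trials, rng):
    """Simulate noisy per-image pedestrian detections: each true person
    is found with probability `recall`, and each of `windows` background
    regions produces a false positive with probability `fp_prob`."""
    counts = []
    for _ in range(trials):
        hits = sum(rng.random() < recall for _ in range(true_count))
        false_pos = sum(rng.random() < fp_prob for _ in range(windows))
        counts.append(hits + false_pos)
    return counts

def estimate_true_count(counts, recall, windows, fp_prob):
    """Invert E[observed] = recall * n + windows * fp_prob."""
    mean_obs = sum(counts) / len(counts)
    return (mean_obs - windows * fp_prob) / recall

rng = random.Random(0)
obs = simulate_detections(30, 0.7, 50, 0.04, 2000, rng)
estimate = estimate_true_count(obs, 0.7, 50, 0.04)
```

Averaging over many images drives the variance of the estimator down, which mirrors the abstract's point that errors can be bounded analytically under a probabilistic model.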


Suffix rank: a new scalable algorithm for indexing large string collections

We investigate the problem of building a suffix array substring index for inputs significantly larger than main memory. This problem is especially important in the context of biological sequence analysis, where biological polymers can be thought of as very long contiguous strings. The objective is to index every substring of these long strings to facilitate efficient queries. We propose a simple, scalable, and inherently parallelizable algorithm, Suffix Rank, for building a suffix array over out-of-core strings. It scales to arbitrarily large inputs, using disk as a memory extension, and solves the problem in just O(log n) scans over the disk-resident data. We evaluate the practical performance of our algorithm and show that, for inputs significantly larger than the available RAM, it scales better than other state-of-the-art solutions such as eSAIS, SAscan, and eGSA.
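As a concrete, deliberately naive illustration of what a suffix array index provides (this sketch builds the array in memory, whereas Suffix Rank's whole point is handling inputs that do not fit in RAM), every substring query reduces to a binary search over the lexicographically sorted suffixes:

```python
def build_suffix_array(text):
    """Naive O(n^2 log n) construction: sort suffix start positions by the
    suffix text. Out-of-core algorithms such as Suffix Rank compute the
    same ordering without materializing the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def contains(text, sa, pattern):
    """Binary search for the first suffix having `pattern` as a prefix."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so the array is [5, 3, 1, 0, 4, 2]; each substring lookup then costs O(m log n) string comparisons.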


Effectively learning spatial indices

Machine learning, especially deep learning, is used increasingly to enable better solutions for data management tasks previously solved by other means, including database indexing. A recent study shows that a neural network can not only learn to predict the disk address of the data value associated with a one-dimensional search key but also outperform B-tree-based indexing, thus promising to speed up a broad range of database queries that rely on B-trees for efficient data access. We consider the problem of learning an index for two-dimensional spatial data. A direct application of a neural network is unattractive because there is no obvious ordering of spatial point data. Instead, we introduce a rank-space-based ordering technique to establish an ordering of point data and group the points into blocks for index learning. To enable scalability, we propose a recursive strategy that partitions a large point set and learns indices for each partition. Experiments on real and synthetic data sets with more than 100 million points show that our learned indices are highly effective and efficient. Query processing using our indices is more than an order of magnitude faster than using R-trees or a recently proposed learned index.
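The core idea can be sketched in miniature (the Morton-order key, the linear position model, and the assumption of distinct coordinate values below are simplifications for illustration, not the paper's exact design): replace each coordinate by its rank, derive a one-dimensional sort key, and fit a model that predicts a point's position in the sorted array, with lookups bounded by the model's worst-case prediction error.

```python
import bisect
import random

def morton_key(rx, ry, bits=16):
    """Interleave the bits of two rank values into one 1-D sort key."""
    key = 0
    for b in range(bits):
        key |= ((rx >> b) & 1) << (2 * b)
        key |= ((ry >> b) & 1) << (2 * b + 1)
    return key

class LearnedSpatialIndex:
    """Rank-space ordering plus a least-squares model of array position.
    Assumes distinct coordinate values so ranks are unambiguous."""

    def __init__(self, points):
        n = len(points)
        self.xs = sorted(p[0] for p in points)
        self.ys = sorted(p[1] for p in points)
        keyed = sorted((morton_key(bisect.bisect_left(self.xs, p[0]),
                                   bisect.bisect_left(self.ys, p[1])), p)
                       for p in points)
        self.keys = [k for k, _ in keyed]
        self.points = [p for _, p in keyed]
        # Least-squares fit of position ~ a * key + b.
        mk = sum(self.keys) / n
        mp = (n - 1) / 2
        var = sum((k - mk) ** 2 for k in self.keys)
        cov = sum((k - mk) * (i - mp) for i, k in enumerate(self.keys))
        self.a = cov / var if var else 0.0
        self.b = mp - self.a * mk
        # Worst-case prediction error bounds every search window.
        self.err = max(abs(i - (self.a * k + self.b))
                       for i, k in enumerate(self.keys))

    def contains(self, point):
        key = morton_key(bisect.bisect_left(self.xs, point[0]),
                         bisect.bisect_left(self.ys, point[1]))
        pred = self.a * key + self.b
        lo = max(0, int(pred - self.err) - 1)
        hi = min(len(self.points), int(pred + self.err) + 2)
        return point in self.points[lo:hi]

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
index = LearnedSpatialIndex(pts)
```

A tighter model (or the paper's recursive partitioning) shrinks `err` and hence the window each query must scan.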


Scalable Machine Learning on High-Dimensional Vectors: From Data Series to Deep Network Embeddings

Several applications in diverse domains have an increasingly pressing need for techniques able to analyze very large collections of static and streaming sequences (a.k.a. data series), often in real time. Examples of such applications come from Internet of Things installations, neuroscience, astrophysics, and a multitude of other scientific and application domains that need to apply machine learning techniques for knowledge extraction. It is not unusual for these applications, for which similarity search is a core operation, to involve numbers of data series on the order of hundreds of millions to billions, which are seldom analyzed in their full detail due to their sheer size. Such application requirements have driven the development of novel similarity search methods that can facilitate scalable analytics in this context. At the same time, a host of other methods have been developed for similarity search over high-dimensional vectors in general. All these methods are becoming increasingly important because of the growing popularity and size of sequence collections, as well as the growing use of high-dimensional vector representations of a large variety of objects (such as text, multimedia, images, audio and video recordings, graphs, database tables, and others), thanks to deep network embeddings. In this work, we review recent efforts in designing techniques for indexing and analyzing massive collections of data series, and we argue that they are the methods of choice even for general high-dimensional vectors. Finally, we discuss the challenges and open research problems in this area.
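The core operation the article refers to, similarity search over data series, can be sketched as follows (a brute-force scan under z-normalized Euclidean distance with early abandoning, a standard baseline in this literature, not any specific indexed method from the article):

```python
import math

def znorm(series):
    """Z-normalize: subtract the mean, divide by the standard deviation."""
    m = sum(series) / len(series)
    sd = math.sqrt(sum((v - m) ** 2 for v in series) / len(series)) or 1.0
    return [(v - m) / sd for v in series]

def nearest_neighbor(query, collection):
    """Exact 1-NN under z-normalized Euclidean distance, with early
    abandoning: stop accumulating once the partial squared distance
    already exceeds the best distance seen so far."""
    q = znorm(query)
    best, best_idx = float("inf"), -1
    for idx, series in enumerate(collection):
        acc = 0.0
        for a, b in zip(q, znorm(series)):
            acc += (a - b) ** 2
            if acc >= best:
                break
        else:  # loop completed without abandoning: new best match
            best, best_idx = acc, idx
    return best_idx, math.sqrt(best)
```

Index structures for data series exist precisely to avoid this linear scan, pruning most candidates via lower bounds instead of distance computations.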


Evolving robot software and hardware

This paper summarizes the keynote I gave at the SEAMS 2020 conference. Noting the power of natural evolution that makes living systems extremely adaptive, I describe how artificial evolution can be employed to solve design and optimization problems in software. Thereafter, I discuss the Evolution of Things, that is, the possibility of evolving physical artefacts, and zoom in on a (r)evolutionary way of creating 'bodies' and 'brains' of robots for engineering and fundamental research.


SLEMI: equivalence modulo input (EMI) based mutation of CPS models for finding compiler bugs in Simulink

Finding bugs in commercial cyber-physical system development tools (or "model-based design" tools) such as MathWorks' Simulink is important in practice, as these tools are widely used to generate embedded code that gets deployed in safety-critical applications such as cars and planes. Equivalence Modulo Input (EMI) based mutation is a new twist on differential testing that promises lower use of computational resources and has already been successful at finding bugs in compilers for procedural languages. To provide EMI-based mutation for differential testing of cyber-physical system (CPS) development tools, this paper develops several novel mutation techniques. These techniques deal with CPS language features that are not found in procedural languages, such as an explicit notion of execution time and zombie code, which combines properties of live and dead procedural code. In our experiments, the most closely related work (SLforge) found two bugs in the Simulink tool. In comparison, SLEMI found a superset of issues, including 9 confirmed as bugs by MathWorks Support.
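The EMI idea itself is easy to demonstrate in miniature (a hand-built Python analogy, not SLEMI's Simulink mutation machinery): profile a program on some inputs, rewrite only code those inputs never execute, and flag any output difference as a bug in the tool under test.

```python
def emi_check(program, mutant, inputs):
    """Differential test: an EMI mutant must agree with the original on
    every input it was derived from; a mismatch is a witness that the
    compiler/toolchain (or the mutation itself) broke equivalence."""
    for x in inputs:
        if program(x) != mutant(x):
            return x  # witness input
    return None

# Original program: the x < 0 branch is dead for the profiled
# (non-negative) inputs used below.
def original(x):
    if x < 0:
        return -x
    return 2 * x

# EMI mutant: the dead branch is rewritten arbitrarily, so the mutant is
# equivalent to the original *modulo* the non-negative inputs.
def mutant(x):
    if x < 0:
        return x ** 3  # arbitrary edit inside never-executed code
    return 2 * x
```

Compiling both variants and comparing their outputs on the profiled inputs is what turns this into a compiler test: any disagreement must come from the toolchain, since the source programs agree by construction.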


On Building an Automatic Identification of Country-Specific Feature Requests in Mobile App Reviews: Possibilities and Challenges

Mobile app stores are available in over 150 countries, allowing users from all over the world to leave public reviews of downloaded apps. Previous studies have shown that such reviews can serve as sources of requirements and have suggested that users from different countries have different needs and expectations regarding the same app. However, the tremendous quantity of reviews from multiple countries, as well as several other factors, complicates the identification of country-specific app feature requests. In this work, we present a simple NLP-based approach to address this problem and discuss some of the challenges involved in applying such analysis to this task.


Precfix: large-scale patch recommendation by mining defect-patch pairs

Patch recommendation is the process of identifying errors in software systems and suggesting suitable fixes for them. It can significantly improve developer productivity by reducing both debugging and repair time. Existing techniques usually rely on complete test suites and detailed debugging reports, which are often absent in practical industrial settings. In this paper, we propose Precfix, a pragmatic approach targeting large-scale industrial codebases that makes recommendations based on previously observed debugging activities. Precfix collects defect-patch pairs from development histories, performs clustering, and extracts generic reusable patching patterns as recommendations. We conducted an experimental study on an industrial codebase with 10K projects involving diverse defect patterns. We managed to extract 3K templates of defect-patch pairs, which have been successfully applied to the entire codebase. Our approach makes recommendations within milliseconds and achieves a false positive rate of 22%, as confirmed by manual review. The majority (10/12) of the interviewed developers appreciated Precfix, which has been rolled out at Alibaba to support various critical businesses.


Effective reinforcement learning through evolutionary surrogate-assisted prescription

There is now significant historical data available on decision making in organizations, consisting of the decision problem, what decisions were made, and how desirable the outcomes were. Using this data, it is possible to learn a surrogate model and, with that model, evolve a decision strategy that optimizes the outcomes. This paper introduces such a general approach, called Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for example, a random forest or a neural network trained with gradient descent, and the strategy is a neural network that is evolved to maximize the predictions of the surrogate model. ESP is further extended in this paper to sequential decision-making tasks, which makes it possible to evaluate the framework on reinforcement learning (RL) benchmarks. Because the majority of evaluations are done on the surrogate, ESP is more sample-efficient and has lower variance and lower regret than standard RL approaches. Surprisingly, its solutions are also better, because both the surrogate and the strategy network regularize the decision-making behavior. ESP thus forms a promising foundation for decision optimization in real-world problems.
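A toy end-to-end sketch of the ESP loop (the decision problem, the 1-nearest-neighbour surrogate, and the tiny (1+4) evolution strategy below are all illustrative assumptions, not the paper's setup): fit a surrogate to historical (context, action, outcome) triples, then evolve a policy that maximizes the surrogate's predicted outcome, never consulting the real outcome function during search.

```python
import random

rng = random.Random(0)

def true_outcome(context, action):
    """Hidden ground truth (unknown to ESP): best action = context."""
    return -(action - context) ** 2

# Historical decision data: (context, action, observed outcome).
history = [(c, a, true_outcome(c, a))
           for c, a in ((rng.random(), rng.random()) for _ in range(500))]

def surrogate(context, action):
    """1-nearest-neighbour outcome model fitted to the history."""
    return min(history,
               key=lambda d: (d[0] - context) ** 2 + (d[1] - action) ** 2)[2]

def policy(w, context):
    """Linear decision strategy, clamped to the action range [0, 1]."""
    return min(1.0, max(0.0, w[0] + w[1] * context))

contexts = [i / 19 for i in range(20)]

def surrogate_fitness(w):
    # All evaluations during search hit the surrogate, not the real world.
    return sum(surrogate(c, policy(w, c)) for c in contexts)

# (1+4) evolution strategy over the two policy weights.
best = [rng.random(), rng.random()]
for _ in range(80):
    children = [[g + rng.gauss(0, 0.1) for g in best] for _ in range(4)]
    best = max(children + [best], key=surrogate_fitness)

# Only now do we touch the true outcome, to see how the evolved
# strategy actually performs (the ideal score here is 0).
real_score = sum(true_outcome(c, policy(best, c)) for c in contexts)
```

The sample-efficiency claim in the abstract shows up here as the ratio of cheap surrogate evaluations to real-world ones: the search performs hundreds of surrogate calls but zero real evaluations.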