Communications of the ACM

Contributed articles

Community Sense and Response Systems: Your Phone as Quake Detector


Illustration credit: Hitandrun

The proliferation of smartphones and other powerful sensor-equipped consumer devices enables a new class of Web application: community sense and response (CSR) systems, distinguished from standard Web applications by their use of community-owned commercial sensor hardware. Just as social networks connect and share human-generated content, CSR systems gather, share, and act on sensory data from users' Internet-enabled devices. Here, we discuss the Caltech Community Seismic Network (CSN) as a prototypical CSR system harnessing accelerometers in smartphones and consumer electronics, including the systems and algorithmic challenges of designing, building, and evaluating a scalable network for real-time awareness of dangerous earthquakes.



Worldwide, approximately two million Android and iOS devices have been activated every day since 2012, each carrying numerous sensors and a high-speed Internet connection. Several recent sensing projects have sought to partner with the owners of these and other consumer devices to collect, share, and act on sensor data about phenomena that could affect millions of lives. Coupled with cloud computing platforms, these networks can achieve vast coverage previously beyond the reach of sensor networks.6 Boulos et al.5 includes an excellent overview of how the social and mobile Web facilitate crowdsourcing data from individuals and their sensor devices. Additional applications of community and participatory sensing include understanding traffic flows,4,14,16,20 identifying sources of environmental pollution,1,2 monitoring public health,18 and responding to natural disasters like hurricanes, floods, and earthquakes.8,9,11,15 These systems are made possible through volunteer sensors and low-cost Web solutions for data collection and storage. However, as the systems mature, they will undoubtedly extend beyond data collection and take real-time action on behalf of the community; for example, traffic networks can reroute traffic around a crash, and a seismic network can automatically slow trains to prevent derailment.


From Collection to Action

Acting on community sensor data is fundamentally different from acting on data in standard Web applications or scientific sensors. The potential volume of raw data is vast, even by the standards of large Web applications. Data recorded by community sensors often includes signals produced by the people operating them. And many of the desired applications involve understanding physical phenomena at a finer scale than that of previous scientific models.

A CSR network can produce an enormous volume of raw data. Smartphones and other consumer devices often have multiple sensors and can produce continuous streams of GPS position, acceleration, rotation, audio, and video data. Even if events of interest like traffic accidents, earthquakes, and disease outbreaks are rare, devices must still monitor continuously to detect them. Beyond obvious data heavyweights like video, rapidly monitoring even a single accelerometer or microphone produces hundreds of megabytes per day. Community sensing makes possible networks with tens of thousands or even millions of devices; for example, equipping taxis with GPS devices or air-quality sensors can easily yield a network of 50,000 sensors in a big city like Beijing. At this scale, collecting even a small set of summary statistics is daunting; if 50,000 sensors report a brief status update once per minute, the total number of messages would rival the daily load in the Twitter network.
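As a rough check on the scale described above, the arithmetic can be sketched in a few lines of Python. The sensor count and the once-per-minute reporting interval come from the text; the function names are illustrative only.

```python
# Rough message-load estimate for a city-scale sensor network.
# 50,000 sensors and one update per minute are the figures used in the text.
MINUTES_PER_DAY = 24 * 60


def messages_per_day(sensors: int, updates_per_minute: float) -> int:
    """Total status messages the server receives in one day."""
    return round(sensors * updates_per_minute * MINUTES_PER_DAY)


def messages_per_second(sensors: int, updates_per_minute: float) -> float:
    """Average sustained message rate seen by the server."""
    return sensors * updates_per_minute / 60.0


if __name__ == "__main__":
    print(f"{messages_per_day(50_000, 1):,} messages per day")
    print(f"{messages_per_second(50_000, 1):.0f} messages per second")
```

With these inputs the network generates 72 million status messages per day, which is why even brief summaries need a scalable collection path.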

Community devices also differ from their counterparts in traditional scientific and industrial applications. Beyond simply being less accurate than "professional" sensors, community sensors may be mobile, intermittently available, and affected by the unique environment of an individual user's home or workplace; for example, the accelerometer in a smartphone measures not only earthquakes but also the user's own motion.

By enabling sensor networks that densely cover cities, community sensors make it possible to measure and act on a range of important phenomena, including traffic patterns, pollution, and natural disasters. However, due to the existing lack of fine-grain data about these phenomena, CSR systems must simultaneously learn about the phenomena they are built to act on; for example, a community seismic network may need models learned from frequent, smaller quakes to estimate damage during rare, larger quakes.

Such challenges are complicated by the need to make reliable decisions in real time with performance guarantees; for example, choosing the best emergency-response strategies following a natural disaster could be aided by real-time sensor data. However, false alarms and inaccurate data can be costly. Rigorous performance estimates and system evaluations are prerequisites for automating real-world responses.


Caltech Community Seismic Network

The CSN project at Caltech (http://csn.caltech.edu/) aims to quickly detect earthquakes and provide real-time estimates of their effects through community-operated sensors. Large earthquakes are among the few scenarios that can threaten an entire city. CSN is built on a vision of people sharing accelerometer data from their personal devices to collectively produce the information needed for real-time and post-event responses to dangerous earthquakes. To that end, it has partnered with more than 1,000 volunteers in the Los Angeles area and others in cities around the world contributing real-time acceleration data from their Android smartphones and low-cost USB-connected sensors (see Figure 1).

Following an earthquake, firefighters, medical teams, and other first responders must build situational awareness before they are able to deploy their resources effectively. Due to variations in ground structure, two points that may be only, say, a kilometer apart can experience significantly different levels of shaking and damage (see Figure 2). Likewise, different buildings may receive different degrees of damage due to the types of motion they experience. If communication is lost in a city, it can take up to an hour for helicopter surveillance to provide the first complete picture of the damage it has sustained. Fortunately, as sensors can detect the moderate P-wave shaking that precedes damaging S-wave shaking, they are likely to report initial quake measurements before the communication or power networks are compromised. These measurements can provide localized estimates of shaking intensity and damage to emergency responders immediately after a quake strikes.

Another intriguing application is early warning of strong shaking. Early warning follows the principle that accelerometers near the origin of an earthquake detect initial shaking before locations further from the origin experience strong shaking. While the duration of warning people receive depends on the speed of detection and their distance from the epicenter, warning times of tens of seconds to a minute have been produced by early-warning systems in Japan, Mexico, and Taiwan. These warnings give time needed to evacuate elevators, stop trains, or halt medical procedures. Following the 1989 Loma Prieta earthquake in Northern California, emergency workers involved in clearing debris received advance warning of aftershocks.
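The early-warning principle can be illustrated with a simple travel-time estimate. The sketch below is an assumption-laden simplification, not the method of any deployed system: it assumes straight-line propagation at constant speeds, and the default P-wave and S-wave speeds (6.0 and 3.5 km/s) and the 2-second processing latency are illustrative values that do not come from the article.

```python
def warning_time_s(site_km: float, sensor_km: float,
                   latency_s: float = 2.0,
                   vp_km_s: float = 6.0, vs_km_s: float = 3.5) -> float:
    """Seconds between an alert and the arrival of strong (S-wave) shaking.

    site_km   -- distance from the quake origin to the place being warned
    sensor_km -- distance from the origin to the nearest detecting sensor
    latency_s -- assumed time to detect, decide, and deliver the alert
    vp_km_s, vs_km_s -- assumed constant P-wave and S-wave speeds
    """
    detect_s = sensor_km / vp_km_s + latency_s   # P-wave reaches sensor, then processing
    s_arrival_s = site_km / vs_km_s              # damaging S-wave reaches the site
    return max(0.0, s_arrival_s - detect_s)      # no warning if shaking arrives first


if __name__ == "__main__":
    for site in (5.0, 50.0, 100.0):
        print(site, "km ->", round(warning_time_s(site, sensor_km=10.0), 1), "s")
```

Under these assumptions, a site 100km from the origin gets a few tens of seconds of warning, while a site closer than the detecting sensor gets none; this is why dense networks with sensors near the origin matter.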

Community participation is ideal for seismic sensing for several reasons: First, community participation makes possible the densely distributed sensors needed to accurately measure shaking throughout a city; for example, instrumenting the greater Los Angeles area at a spatial resolution of one sensor per square kilometer would require more than 10,000 sensors. While traditional seismometer stations cost thousands of dollars per sensor to install and operate, the same number of sensors would be possible if only 0.5% of the area's population volunteered data from their smartphones. This way, community sensors can provide fine spatial coverage and complement existing networks of sparsely deployed, high-quality sensors.

Community sensors are also ideally situated for assisting the population through an emergency. In addition to collecting accelerometer data, community-sensing software on smartphones can report the last known location of family members or give instructions on where to gather for help from emergency teams; that is, community-sensing applications represent a new way for people to stay informed about the areas and people they care about.

CSN makes it easy for volunteers to participate through low-cost accelerometers and the sensors already in their Android phones. A free Android app called CrowdShake (http://csn.caltech.edu/crowdshake) makes volunteering data as easy as installing a new app. CSN also partners with Los Angeles-area schools and civic organizations to freely distribute 3,000 low-cost accelerometers from Phidgets, Inc. (http://www.phidgets.com) that interface through USB to a host PC, tablet, or other Internet-connected device. Phidgets sensors have also been installed in several high-rise buildings in the Los Angeles area to measure their structural responses to earthquakes, as in Figure 1.

Reliable, real-time inference of spatial events is a core task of seismic monitoring and a prototypical challenge for any application using physical sensors. Here, we outline a methodology developed by the CSN team to quickly detect quakes from thousands of community sensors, harnessing the computational power of community devices to overcome the noise in community-operated hardware and demonstrating that on-device learning yields a decentralized architecture that is scalable and accommodates heterogeneous sensors even as it provides rigorous performance guarantees.


Decentralized Event Detection

Suppose a strong earthquake begins near a metropolitan area and that 0.1% of the population contributes accelerometer data from a personally owned Internet-enabled device. In Los Angeles County, this would mean data from 10,000 noisy sensors located on a coastal basin of rock and sediment criss-crossed with fault lines and overlapped with vibration-producing freeways. How could a sensor network detect the quake and estimate its location and magnitude as quickly as possible? The "classic" approach is to collect all data centrally and declare an event has occurred when the following likelihood ratio test is true:

$$\frac{P[\text{all measurements} \mid \text{strong quake}]}{P[\text{all measurements} \mid \text{no quake}]} > \tau \tag{1}$$

The test would declare a detection if the ratio exceeds a predetermined threshold $\tau$. Not surprisingly, this involves transmitting a daunting amount of data; a global network of one million phones would be transmitting 30TB of acceleration data per day. Additionally, the likelihood-ratio test requires the distribution of all sensor data, conditioned on the occurrence or nonoccurrence of a strong earthquake. Each community sensor is unique, so modeling these distributions requires modeling each sensor individually.

A natural next step is a decentralized approach. Suppose each device transmits only a finite summary of its current data, or a "pick message." The central server again performs a hypothesis test but now uses the received pick messages instead of the entire raw data. Results from decentralized hypothesis-testing theory say that if the sensors' measurements are conditionally independent given whether an event has or has not occurred, and if the probability of the measurements is known in each case, then the asymptotically optimal strategy is to perform a hierarchical hypothesis test;21 each sensor individually performs a hypothesis test, for some threshold $\tau$, picking only when

$$\frac{P[\text{sensor's measurements} \mid \text{strong quake}]}{P[\text{sensor's measurements} \mid \text{no quake}]} \geq \tau \tag{2}$$

Similarly, the cloud server performs a hypothesis test on the number of picks $S$ received at a given time and declares a detection when

$$\frac{\mathrm{Bin}(S,\, r_T,\, N)}{\mathrm{Bin}(S,\, r_F,\, N)} \geq \tau' \tag{3}$$

The parameters $r_T$ and $r_F$ are the true positive and false positive pick rates for a single sensor, and Bin(·, p, N) is the probability mass function of the binomial distribution with success probability p and N trials (one per sensor). Decision rules (2) and (3) are asymptotically optimal for proper choice of the thresholds $\tau$ and $\tau'$.21 Additionally, collecting picks instead of raw data helps preserve user privacy.
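The server-level rule can be sketched as follows. Because the same binomial coefficient appears in the numerator and denominator, it cancels, and the test reduces to a log-likelihood ratio that is linear in the number of picks. The pick rates and the threshold used here are hypothetical values chosen for illustration, and the function names are not from CSN.

```python
from math import log
from typing import Optional


def log_likelihood_ratio(picks: int, n_sensors: int,
                         r_true: float, r_false: float) -> float:
    """log of Bin(S, r_T, N) / Bin(S, r_F, N); the binomial coefficients cancel.

    Requires 0 < r_false < r_true < 1.
    """
    misses = n_sensors - picks
    return (picks * log(r_true / r_false)
            + misses * log((1.0 - r_true) / (1.0 - r_false)))


def server_detects(picks: int, n_sensors: int, r_true: float, r_false: float,
                   log_tau_prime: float = 0.0) -> bool:
    """Declare an event when the log ratio reaches the (log) threshold."""
    return log_likelihood_ratio(picks, n_sensors, r_true, r_false) >= log_tau_prime


def min_picks_to_detect(n_sensors: int, r_true: float, r_false: float,
                        log_tau_prime: float = 0.0) -> Optional[int]:
    """Smallest pick count that triggers a detection, or None if none does."""
    for picks in range(n_sensors + 1):
        if server_detects(picks, n_sensors, r_true, r_false, log_tau_prime):
            return picks
    return None


if __name__ == "__main__":
    # Hypothetical network: 100 sensors, 50% true pick rate, 5% false pick rate.
    print(min_picks_to_detect(100, r_true=0.5, r_false=0.05))
```

Working in log space also avoids the numerical underflow that the raw binomial probabilities would cause for large N.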

Detecting rare events from community sensors presents three main challenges to this classical, decentralized detection approach:

Likelihood ratio tests. How can likelihood ratio tests be performed on each sensor's data when the data needed to accurately model sensor behavior during an event (such as measurements of large, rare quakes) is difficult to obtain?;

Modeling each sensor. How might each sensor be modeled? Server-side modeling scales poorly, and on-device learning involves computational and storage limits; and

Spatial dependencies. How can the (strong) assumption of conditionally independent sensors be overcome and spatial dependencies incorporated?

We next show how the abundance of normal data can be leveraged to detect rare events for which training data is limited; how new tools from computational geometry make it possible to compute the needed probabilistic models on resource-constrained devices; and, finally, how learning on the server side adapts data aggregation according to spatial dependencies.

Leveraging "normal" data. The sensor-level hypothesis test in (2) requires two conditional probability distributions. The numerator models a particular device's acceleration during a strong quake, a distribution that is impractical to obtain due to the rarity of large quakes. In contrast, the denominator can be estimated from abundantly available "normal" data. Is reliable quake detection still possible?

It turns out that, under mild assumptions, a simple approach to anomaly detection using only the probability of an acceleration time series in the absence of a quake can obtain the same asymptotically optimal performance. A given sensor now picks when

$$P[\text{sensor's measurements} \mid \text{no quake}] < \tau \tag{4}$$

For an appropriate choice of threshold, this technique can be shown to produce the same picks as the full hypothesis test, without requiring a model of sensor data during future unknown quakes; for more, see Faulkner et al.11
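A minimal sketch of this anomaly-detection idea follows. It is not the CSN picking algorithm: it assumes a single scalar feature per time window and models "normal" data with one Gaussian rather than a mixture, and the threshold is set empirically so that at most a chosen fraction of the normal training data would be picked. All names are illustrative.

```python
import math
import random


def fit_gaussian(xs):
    """Estimate mean and standard deviation of 'normal' (no-quake) features."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)


def log_density(x, mu, sigma):
    """Log of the Gaussian density, the stand-in for P[measurement | no quake]."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def choose_log_threshold(xs, mu, sigma, false_pick_rate):
    """Pick the threshold so that at most this fraction of normal data is flagged."""
    scores = sorted(log_density(x, mu, sigma) for x in xs)
    k = min(int(false_pick_rate * len(scores)), len(scores) - 1)
    return scores[k]


def is_pick(x, mu, sigma, log_tau):
    """A sensor 'picks' when the data is unlikely under the no-quake model."""
    return log_density(x, mu, sigma) < log_tau


if __name__ == "__main__":
    rng = random.Random(0)
    normal = [rng.gauss(0.0, 0.01) for _ in range(5000)]  # synthetic background noise
    mu, sigma = fit_gaussian(normal)
    log_tau = choose_log_threshold(normal, mu, sigma, false_pick_rate=0.01)
    print(is_pick(0.5, mu, sigma, log_tau), is_pick(0.0, mu, sigma, log_tau))
```

The point of the design is that nothing in the training step requires quake data; only the false pick rate on normal data is controlled directly.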

The anomaly-detection scheme makes use of the abundant "normal" data but still involves the challenge of computing the conditional distribution. In principle, each sensor could maintain a history of its observations, periodically estimating a probabilistic model describing that data. On a mobile device, this means logging approximately 3GB of acceleration data per month. Storing and estimating models on this much data is a burden on volunteers' smartphone resources. Is it possible to accurately model a sensor's data with (much) less storage?

The CSN system models accelerometer data as a Gaussian mixture model (GMM) over a feature vector of acceleration statistics from short-duration time windows, as in, say, phonemes in speech recognition. GMMs are a flexible family of multimodal distributions that can be estimated from data using the simple EM algorithm.3 Unlike a single Gaussian, which is specified by the mean and variance of the data, learning the optimal GMM requires access to all the data; GMMs do not admit finite sufficient statistics. Thus, a device must store its entire sensor history to produce the optimal GMM. Fortunately, it turns out the picture is drastically different for approximating a GMM, which can be fit to an arbitrary amount of data, with an arbitrary approximation guarantee, using a finite amount of storage.
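For readers unfamiliar with EM, the following is a compact sketch for a one-dimensional mixture. It is a simplification of what the text describes: CSN fits GMMs over feature vectors, whereas this example uses scalar data, a fixed number of iterations, and a quantile-based initialization chosen here only to keep the example deterministic.

```python
import math


def gauss_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialize means at evenly spaced quantiles, with a shared broad variance.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-9)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-9)
    return weights, mus, variances
```

Note that every iteration touches all of `xs`, which is exactly the storage burden the text describes and the reason an approximation from a small summary is attractive.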

A tool from computational geometry, called a "coreset," makes such approximations possible. A coreset for an algorithm is roughly a (weighted) subset of the input, such that running the algorithm on the coreset gives a constant-factor approximation to running the algorithm on the full input. Coresets have been used to obtain approximations for a variety of geometric problems, including k-means and k-medians clustering.

It turns out that many geometric coreset techniques also provide approximations for statistical problems. Given an input dataset $D$, the challenge is to find the maximum likelihood estimate for the means and variances of a GMM, collectively denoted $\theta$. A weighted set $C$ is a $(k, \varepsilon)$ coreset for GMMs if with high probability the log likelihood on the coreset, $\mathcal{L}(C \mid \theta)$, is an $\varepsilon$-approximation to the log likelihood on the full data, $\mathcal{L}(D \mid \theta)$, for any mixture of $k$ Gaussians:

$$(1 - \varepsilon)\,\mathcal{L}(D \mid \theta) \;\leq\; \mathcal{L}(C \mid \theta) \;\leq\; (1 + \varepsilon)\,\mathcal{L}(D \mid \theta)$$

Feldman et al.12 showed that given input $D$, it is possible to sample such a coreset $C$ whose size is independent of the size of input $D$ and depends only polynomially on the dimension of the input, the number of Gaussians $k$, and parameters $\varepsilon$, $\delta$, with probability at least $1 - \delta$ for all (nondegenerate) mixtures of $k$ Gaussians. This result implies that the mixture model learned from a constant-size coreset $C$ can obtain approximately the same likelihood as a model learned from the entire arbitrarily large $D$.

But where does C come from? Feldman et al.12 showed efficient algorithms to compute coresets for projective clustering problems (such as k-means and generalizations) can provide coresets for GMMs. A key insight is that while uniformly subsampling the input can miss "important" regions of data, an adaptive-sampling approach is likely to sample from "enough" regions to reliably estimate a mixture of k Gaussians; weighting the samples accounts for the sampling bias. Har-Peled and Mazumdar13 identified that coresets for many optimization problems can be computed efficiently in the parallel or streaming model, with several such results applying here. In particular, a stream of input data can be buffered to some constant size, then compressed into a coreset. Careful merging and compressing of these coresets provides an approximation of the entire stream so far while using space and update time polynomial in all the parameters and logarithmic in n, the number of data points seen so far.
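The buffer, merge, and compress bookkeeping can be sketched as a binary counter of summaries. This sketch illustrates only the streaming structure: the `compress` step below is plain weight-proportional resampling, a placeholder for the adaptive sampling of Feldman et al., so it carries none of their approximation guarantees. The class and function names are hypothetical.

```python
import random


def compress(points, m, rng):
    """Reduce a weighted set [(x, w), ...] to at most m points, preserving total weight.

    Placeholder only: samples proportionally to weight. A real coreset
    construction would use adaptive (importance) sampling instead.
    """
    if len(points) <= m:
        return list(points)
    total = sum(w for _, w in points)
    xs = [x for x, _ in points]
    ws = [w for _, w in points]
    sample = rng.choices(xs, weights=ws, k=m)
    return [(x, total / m) for x in sample]


class StreamingCoreset:
    """Merge-and-reduce summary of a stream using O(m log n) stored points."""

    def __init__(self, m, seed=0):
        self.m = m
        self.rng = random.Random(seed)
        self.buffer = []   # raw points not yet summarized
        self.levels = []   # levels[i] is None or a summary of m * 2**i raw points

    def add(self, x):
        self.buffer.append((x, 1.0))
        if len(self.buffer) < self.m:
            return
        summary, self.buffer = self.buffer, []
        i = 0
        # Carry upward like binary addition: merge equal-level summaries, then compress.
        while i < len(self.levels) and self.levels[i] is not None:
            summary = compress(self.levels[i] + summary, self.m, self.rng)
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = summary

    def summary(self):
        """Weighted points representing everything seen so far."""
        points = list(self.buffer)
        for level in self.levels:
            if level is not None:
                points.extend(level)
        return points
```

A model such as a GMM would then be fit to the weighted points returned by `summary()` rather than to the full history.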

Quake detection in community networks requires finding a complex spatiotemporal pattern in a large set of noisy sensor measurements. The start of a quake may affect only a small fraction of a network, so the event can be concealed in single-sensor measurements and in networkwide statistics. Recent high-density seismic studies, as in Figure 2, found localized variations in ground structure significantly affect the magnitude of shaking at locations only one kilometer apart. Consequently, effective quake detection requires algorithms that are able to learn subtle dependencies among sensor data and detect changes within groups of dependent sensors.

The classical approach outlined earlier assumes sensors provide independent, identically distributed measurements conditioned on the occurrence or nonoccurrence of an event. In this case, the system would declare an event has occurred if a sufficiently large number of sensors, regardless of location, report picks. However, in many practical applications, the particular spatial configuration of the sensors matters, and the independence assumption is violated. How can (qualitative) knowledge about the nature of the event be exploited to improve detection performance?




The start of an event (such as an earthquake, fire, or epidemic) may first be observed by small groups of sensors that are close to the event or are most sensitive to its effects. Viewed as transmitting a vector $x \in \{0, 1\}^p$ through a noisy channel, the signal is mostly zeros (sparse), but many bits in the received vector $y$ (picks) are flipped due to noise. Intuitively, the signal observed by these small groups will be lost among the environmental noise unless the system is aware of dependencies among the sensors. This intuition (and some desirable analytic properties) can be captured by learning an orthonormal change-of-basis matrix that projects the picks received by the server onto a coordinate system that, roughly, aggregates groups of strongly correlated sensors. Given such a matrix $B$ with columns $b_1, \ldots, b_p$, the server declares an event when

$$\max_i \; b_i^T y > \tau$$

To obtain reliable detection when the signal is weak (measured by the $\ell_0$ pseudo-norm, $\|x\|_0 < \sqrt{p}$), traditional hypothesis testing requires the error rate of each sensor (each element of $x$) to decrease as the number of sensors $p$ increases. This requirement is in stark contrast to the intuition that more sensors are better and is incompatible with the "numerous but noisy" approach of community sensing. However, Faulkner et al.10 found that if the matrix $B$ is "sparsifying," or $\|B^T x\|_0 = p^{\alpha}$, $\|x\|_0 < p^{\beta}$, $0 < \alpha < \beta < 1/2$, then the test $\max_i b_i^T y > \tau$ gives probability of miss and false alarm that decays to zero exponentially as a function of the "sparsification ratio" $\|x\|_0 / \|B^T x\|_0$, for any rate $r_F < 1/2$ of pick errors. Effectively, a change of basis allows large numbers of noisy sensors to contribute to reliable detection of signals that are observed by only a small fraction ($\|x\|_0$) of sensors.
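A toy example may make the effect of a change of basis concrete. The four-sensor basis below is hand-picked for illustration (it groups sensors 0 and 1, and sensors 2 and 3, as hypothetical neighbors); it is not a basis learned by CSN, and the threshold is arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_orthonormal(columns, tol=1e-9):
    """Check that the basis vectors have unit length and are mutually orthogonal."""
    for i, u in enumerate(columns):
        for j, v in enumerate(columns):
            target = 1.0 if i == j else 0.0
            if abs(dot(u, v) - target) > tol:
                return False
    return True


def detects(columns, picks, tau):
    """Declare an event when the largest projection of the picks exceeds tau."""
    return max(dot(b, picks) for b in columns) > tau


S = 1.0 / math.sqrt(2.0)
# Hypothetical basis: the first two vectors aggregate pairs of neighboring sensors.
GROUPED_BASIS = [[S, S, 0, 0], [0, 0, S, S], [S, -S, 0, 0], [0, 0, S, -S]]
IDENTITY_BASIS = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

if __name__ == "__main__":
    clustered = [1, 1, 0, 0]   # two neighboring sensors pick together
    scattered = [1, 0, 0, 1]   # two unrelated sensors pick (likely noise)
    for name, basis in (("grouped", GROUPED_BASIS), ("identity", IDENTITY_BASIS)):
        print(name, detects(basis, clustered, 1.2), detects(basis, scattered, 1.2))
```

In the identity basis both pick patterns look the same (the largest coordinate is 1 in each case), whereas the grouped basis concentrates the clustered picks into one large coefficient and spreads the scattered picks thinly.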

Learning to sparsify. These results depend on B's ability to concentrate weak signals.

A direct approach for learning B is to optimize

$$\min_{B} \; \|B^T X\|_0 \quad \text{subject to} \quad BB^T = I \tag{5}$$

where $X$ is a matrix containing binary per-sensor event occurrences as its columns and $\|\cdot\|_0$ is the number of non-zero elements in the matrix. The constraint $BB^T = I$ ensures $B$ remains orthonormal. Computing equation (5) can be impractical, as well as sensitive to noise or outliers in the data. It may thus be more practical to find a basis that sparsely represents "most of" the observations. More formally, let $Z$ be a latent matrix that can be viewed as the "cause" in the transform domain of the noise-free signals $X$, or $X = BZ$. $Z$ should be sparse and $BZ$ should be close to the observed signal $Y$. These conditions suggest the next optimization, originally introduced for text modeling7 as a heuristic for equation (5):

$$\min_{B,\, Z} \; \|Y - BZ\|_F^2 + \lambda \|Z\|_1 \quad \text{subject to} \quad BB^T = I \tag{6}$$

where $\|\cdot\|_F$ is the matrix Frobenius norm, and $\lambda > 0$ is a free parameter. Equation (6) essentially balances the difference between $Y$ and $X$ with the sparsity of $Z$; increasing $\lambda$ more strongly penalizes choices of $Z$ that are not sparse. For computational efficiency, the $\ell_0$-norm is replaced by the convex and heuristically "sparsity-promoting" $\ell_1$-norm.

Although equation (6) is non-convex, fixing either B or Z makes the objective function with respect to the other convex. The objective can then be efficiently solved (to a local minimum) through an iterative two-step convex-optimization process.
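One half of such a two-step scheme can be sketched under a simplifying assumption. If B is held fixed and is square and orthonormal, then the squared error between a pick vector y and its reconstruction equals the squared error between the projected picks and the coefficients, so the sparse coefficients have a closed form: project y onto each basis vector and soft-threshold the result at λ/2. The sketch below shows only this coefficient update for a single observation; the update of B, and the exact procedure used by CSN, are not shown. Function names are illustrative.

```python
def soft_threshold(a, t):
    """Shrink a toward zero by t; values within [-t, t] become exactly zero."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def z_step(basis_columns, y, lam):
    """Coefficients minimizing ||y - Bz||^2 + lam * ||z||_1 for fixed orthonormal B."""
    return [soft_threshold(sum(b_i * y_i for b_i, y_i in zip(b, y)), lam / 2.0)
            for b in basis_columns]


def objective(basis_columns, y, z, lam):
    """Value of the penalized reconstruction error for one observation."""
    recon = [sum(z[j] * basis_columns[j][i] for j in range(len(z)))
             for i in range(len(y))]
    fit = sum((yi - ri) ** 2 for yi, ri in zip(y, recon))
    return fit + lam * sum(abs(zj) for zj in z)
```

Larger values of `lam` zero out more coefficients, which matches the role of the free parameter in equation (6).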


Building CSN

Managing a CSR system and processing its data in real time is a challenge in terms of scalability and data security. Cloud computing platforms (such as Amazon EC2, Heroku, and Google App Engine) provide practical, cost-effective resources for reliably scaling Web applications. The CSN network is built on Google App Engine (see Figure 3). Heterogeneous sensors, including cellphones, standalone sensors, and accelerometers connected through USB to host computers, connect to the cloud; the cloud, in turn, performs event detection and issues notifications of potential seismic events. Additionally, a cloud infrastructure allows sensors anywhere in the world to connect just by specifying a URL.

The CSN network includes two kinds of sensor clients: a desktop client with a USB accelerometer and an Android app for phones and tablets, as in Figure 1. The internal data flow and the messaging between the cloud and an Android client are outlined in Figure 4; desktop clients differ primarily in their picking algorithm and lack of GPS (see Figure 5). At the core of the application is a suite of sensors, including a three-axis accelerometer and GPS. The Android client continually tests the accelerometer data for anomalies (reported as picks), logging raw data temporarily to a local database for post-event data collection. Clients listen for push notifications from the server implemented through Google's Cloud Messaging services.

Cloud computing services are well suited for the network maintenance and real-time response tasks of CSR systems. Figure 3 outlines the main data flows through the cloud: First, CSN writes client registration and heartbeat messages to multiple data centers via App Engine's high replication datastore. Next, incoming picks are spatially aggregated by geographic hashing into memcache, a distributed in-memory data cache. Although memcache is not persistent (objects can be ejected from the cache due to memory constraints), it is much faster than the datastore. Memcache is also ideal for computations that must occur quickly, and, because it allows an expiration time to be set on values, it is also ideal for data whose usefulness expires after a period of time. Finally, the CSN cloud performs event detection on the aggregated picks. Implementing this architecture on App Engine offers several advantages:

Dynamic scaling. Incoming requests are automatically load balanced between instances created and destroyed based on current demand levels, an arrangement that simplifies algorithmic development and reduces costs during idle periods;

Robust data. Datastore writes are automatically replicated to geographically separate data centers. Replication is prudent for any application but especially for disaster response systems; and

Easy deployment. Deploying applications on App Engine is fairly straightforward, as individual server instances need not be configured and coordinated. Additionally, by using the same front ends that power Google's search platform, App Engine applications can expect low latency from any geographical location in the world. Such scalability allows the network to readily include volunteers from new cities or countries.
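The pick-aggregation step described earlier in this section (geographic hashing into a cache whose entries expire) can be sketched as follows. This is an in-process stand-in, not the App Engine memcache API and not CSN's actual hashing scheme: the grid-cell hash, the cell size, and the class name are all assumptions made for illustration.

```python
import math
import time


def geocell(lat: float, lon: float, cell_deg: float = 0.01) -> str:
    """Hypothetical geographic hash: snap coordinates to a grid-cell identifier."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"


class ExpiringCounter:
    """Per-key counters that reset once their time-to-live has elapsed."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (count, expires_at)

    def incr(self, key: str, now: float = None) -> int:
        """Record one pick for the cell and return the current count."""
        now = time.time() if now is None else now
        count, expires_at = self._store.get(key, (0, 0.0))
        if now >= expires_at:            # stale or missing entry: start a new window
            count, expires_at = 0, now + self.ttl_s
        count += 1
        self._store[key] = (count, expires_at)
        return count

    def get(self, key: str, now: float = None) -> int:
        """Current pick count for the cell, or 0 if the entry has expired."""
        now = time.time() if now is None else now
        count, expires_at = self._store.get(key, (0, 0.0))
        return count if now < expires_at else 0
```

Picks from nearby sensors land in the same cell, and old picks disappear on their own, so the event detector only ever sees recent, spatially grouped counts.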


Experimental Evaluation

Evaluating a CSR system involves assessing both hardware and algorithms. For CSN, this means determining whether community hardware is able to detect strong quakes, evaluating detection algorithms on their ability to detect future quakes that cannot be modeled or predicted, and verifying that implementing the system is practical on mobile devices and cloud platforms (see Figure 6).

The CSN team evaluated community hardware and found that low-cost MEMS accelerometers are capable of measuring seismic motion. Experiments with a large actuator called a "shake table" expose sensors to accurate reproductions of historic, moderately large (magnitude 4.5–5.5) earthquakes. The shake table demonstrates that both USB sensors and the lower-quality phone accelerometers can detect the smaller initial shaking (P-wave) and stronger secondary shaking (S-wave) that produce the characteristic signature of an earthquake (see Figure 7). These laboratory experiments are confirmed by measurements of actual earthquakes observed by the CSN network; Figure 5 shows similar signatures involving a subset of measurements of a magnitude 3.6 quake. A second experiment assesses whether community sensors can detect changes in the motion of buildings caused by earthquakes. The CSN team oscillated the 10-story Millikan Library on the Caltech campus using a large eccentric weight on the roof. CSN sensors measured the resonant frequency of the building (approximately 1.7Hz), confirming low-cost sensors are able to perform structure monitoring.
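One simple way to recover a resonant frequency from an accelerometer trace is to look for the strongest peak in its spectrum. The sketch below uses a naive discrete Fourier transform on a synthetic signal; it is an illustration only, not the analysis the CSN team performed, and the sampling rate and test signal are assumptions.

```python
import cmath
import math


def dominant_frequency(samples, rate_hz):
    """Frequency (Hz) of the strongest non-DC component, via a naive O(n^2) DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]      # remove the constant offset (e.g., gravity)
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):              # positive frequencies up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * rate_hz / n


if __name__ == "__main__":
    rate = 25.0                                  # assumed sampling rate in Hz
    n = 500                                      # 20 seconds of data
    # Synthetic trace: a 1.7 Hz oscillation plus a weaker 5 Hz component.
    trace = [math.sin(2 * math.pi * 1.7 * t / rate)
             + 0.3 * math.sin(2 * math.pi * 5.0 * t / rate) for t in range(n)]
    print(dominant_frequency(trace, rate))
```

The frequency resolution is the sampling rate divided by the number of samples (0.05Hz here), so longer recordings give a finer estimate; for real traces an FFT library would replace the quadratic-time loop.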

Here, we evaluate the ability of community sensors to detect future quakes for which no training data is available. While earthquakes are rare, data gathered from community sensors may be plentiful. To characterize "normal" (background) data, seven student and faculty volunteers at Caltech carried Android phones during their daily routines to gather more than 7GB of phone accelerometer data; 20 desktop USB accelerometers recorded 55GB of acceleration. From this data, the CSN team estimated models for each sensor type's normal operating behavior, evaluating anomaly-detection performance on 32 historic records of moderately large (magnitude 5–5.5) events, as recorded by the Southern California Seismic Network (http://www.scsn.org). The ability of individual sensors to transmit "event" or "no event" to the cloud server is summarized in the form of receiver operating characteristic curves, showing anomaly detection outperforming several standard baselines (see Figure 8). The vertical axis is the attainable detection (pick) rate of a single sensor, plotted against the horizontal axis of allowable false detection (pick) rate.

Coresets provide a promising way to learn GMMs of accelerometer data on smartphones, yielding more accurate models for a given training set size than uniformly subsampling the available data. Combining the accuracy results for USB and Android sensors, Figure 8d outlines the trade-off of detecting with a mix of sensor types while limited to one false alarm per year. These results indicate approximately 50 phones or 10 Phidgets should be enough to detect a nearby magnitude 5 or larger event with close to 100% detection rate.

While earthquakes are inherently unpredictable, simulations provide a qualitative idea of spatial dependencies among sensors that can be used to train detectors. Using a prior distribution constructed from historic earthquakes in the U.S. Geological Survey database (http://www.data.scec.org), a simulator for community sensors similar to the one developed by Liu et al.17 simulated picks from 128 CSN sensors during 1,000 simulated quakes. These picks are used as training data for a sparsifying basis, a networkwide hypothesis test, and a spatial scan statistic. Each algorithm is evaluated on its ability to detect four recent events using real measurements recorded by the network (see Figure 9); for each event, the vertical bars give the time to detection for the learned bases, classical hypothesis testing, and a competitive scan statistic algorithm. The bases learned from simple simulations in general achieve faster detection (such as eight seconds faster than competitive algorithms detecting the September 3, 2012 Beverly Hills, CA, magnitude 3.2 event).


Conclusion

We have outlined several algorithmic and systems principles that facilitate detecting rare and complex spatial signals using large numbers of low-cost community sensors. Employing machine learning at each stage of a decentralized architecture allows efficient use of sensor-level and cloud-level resources, essential for providing performance guarantees when little can be said about a particular community sensor or when little is known about the events of interest. Community sensing is applicable to a variety of application domains, including fires, floods, radiation, epidemics, and traffic accidents, as well as monitoring pollution, pedestrian traffic, and acoustic noise levels in urban environments. In all cases, "responding" can range from taking physical action to merely devoting additional resources to an event of interest. While the CSN project is motivated by the public need to detect and react to strong earthquakes, CSR systems for these domains and others will require a similar blueprint for machine learning and scalable systems.


Acknowledgments

We would like to thank the Gordon and Betty Moore Foundation, the National Science Foundation (awards CNS-0932392, IIS-0953413), and European Research Council Starting Grant 307036. Andreas Krause was supported in part by a Microsoft Research Faculty Fellowship. We also thank Signal Hill Petroleum and Nodal Seismic for data from the Long Beach Network, and the Southern California Seismic Network for data from the permanent earthquake network in Southern California.


References

1. Aberer, K., Sathe, S., Chakraborty, D., Martinoli, A., Barrenetxea, G., Faltings, B., and Thiele, L. Opensense: Open community driven sensing of environment. In Proceedings of the ACM SIGSPATIAL International Workshop on GeoStreaming (San Jose, CA, Nov. 25). ACM Press, New York, 2010, 3942.

2. Aoki, P.M., Honicky, R.J., Mainwaring, A., Myers, C., Paulos, E., Subramanian, S., and Woodruff, A. A vehicle for research: Using street sweepers to explore the landscape of environmental community action. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (Boston, Apr. 4-9). ACM Press, New York, 2009, 375-384.

3. Bilmes, J. A Gentle Tutorial on the EM Algorithm Including Gaussian Mixtures and Baum-Welch. International Computer Science Institute Technical Report TR-97-021, May 1997.

4. Borokhov, P., Blandin, S., Samaranayake, S., Goldschmidt, O., and Bayen, A. An adaptive routing system for location-aware mobile devices on the road network. In Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems (Washington, D.C., Oct. 5-7). IEEE Computer Society Press, New York, 2011, 1839-1845.

5. Boulos, M.N.K., Resch, B., Crowley, D.N., Breslin, J.G., Sohn, G., Burtner, R., Pike, W.A., Jezierski, E., and Chuang, K.-Y.S. Crowdsourcing, citizen sensing and sensor Web technologies for public and environmental health surveillance and crisis management: Trends, OGC standards, and application examples. International Journal of Health Geographics 10, 1 (2011).

6. Burke, J., Estrin, D., Hansen, M., Parker, A., Ramanathan, N., Reddy, S., and Srivastava, M.B. Participatory sensing. In the Workshop on World Sensor Web Workshop (Boulder, CO, Oct. 31-Nov. 3, 2006), 1-5.

7. Chen, X., Qi, Y., Bai, B., Lin, Q., and Carbonell, J.G. Sparse latent semantic analysis. In Proceedings of the SIAM International Conference on Data Mining (Mesa, AZ, Apr. 28-30). SIAM, Philadelphia, 2011, 474-485.

8. Cochran, E.S., Lawrence, J.F., Christensen, C., and Jakka, R.S. The Quake-Catcher Network: Citizen science expanding seismic horizons. Seismological Research Letters 80, 1 (2009), 26-30.

9. Ervasti, M., Dashti, S., Reilly, J., Bray, J.D., Bayen, A., and Glaser, S. iShake: Mobile phones as seismic sensors, user study findings. In Proceedings of the 10th International Conference on Mobile and Ubiquitous Multimedia (Beijing, Dec. 7-9). ACM Press, New York, 2011, 43-52.

10. Faulkner, M., Liu, A., and Krause, A. A fresh perspective: Learning to sparsify for detection in massive noisy sensor networks. In Proceedings of the 12th ACM/IEEE International Conference on Information Processing in Sensor Networks (Philadelphia, Apr. 8-11). ACM Press, New York, 2013, 7-18.

11. Faulkner, M., Olson, M., Chandy, R., Krause, J., Chandy, K.M., and Krause, A. The next big one: Detecting earthquakes and other rare events from community-based sensors. In Proceedings of the 10th ACM/IEEE International Conference on Information Processing in Sensor Networks (Chicago, Apr. 12-14). ACM Press, New York, 2011, 13-24.

12. Feldman, D., Faulkner, M., and Krause, A. Scalable training of mixture models via coresets. In Proceedings of the Neural Information Processing Systems Annual Conference (Granada, Spain, Dec. 12-14, 2011).

13. Har-Peled, S. and Mazumdar, S. On coresets for k-means and k-median clustering. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (Chicago, June 13-15). ACM Press, New York, 2004, 291-300.

14. Hoh, B., Gruteser, M., Herring, R., Ban, J., Work, D., Herrera, J.C., Bayen, A.M., Annavaram, M., and Jacobson, Q. Virtual trip lines for distributed privacy-preserving traffic monitoring. In Proceedings of the International Conference on Mobile Systems, Applications, and Services (Breckenridge, CO, June 17-20, 2008), 17-20.

15. Kapoor, A., Eagle, N., and Horvitz, E. People, quakes, and communications: Inferences from call dynamics about a seismic event and its influences on a population. In Proceedings of AAAI Symposium on Artificial Intelligence for Development (Atlanta, July 11-15). AAAI, Palo Alto, CA, 2010, 51-56.

16. Krause, A., Horvitz, E., Kansal, A., and Zhao, F. Toward community sensing. In Proceedings of the ACM/IEEE International Conference on Information Processing in Sensor Networks (St. Louis, MO, Apr. 22-24). IEEE Computer Society Press, Washington, D.C., 2008, 481-492.

17. Liu, A., Olson, M., Bunn, J., and Chandy, K.M. Towards a discipline of geospatial distributed event-based systems. In Proceedings of the Sixth ACM International Conference on Distributed Event-Based Systems (Berlin, July 16-20). ACM Press, New York, 2012, 95-106.

18. Mun, M., Reddy, S., Shilton, K., Yau, N., Burke, J., Estrin, D., Hansen, M., Howard, E., West, R., and Boda, P. Peir: The personal environmental impact report as a platform for participatory sensing systems research. In Proceedings of the Seventh International Conference on Mobile Systems, Applications, and Services (Kraków, Poland, June 22-25). ACM Press, New York, 2009, 55-68.

19. Neill, D.B. Fast subset scan for spatial pattern detection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 (2012), 337-360.

20. Thiagarajan, A., Ravindranath, L., LaCurts, K., Madden, S., Balakrishnan, H., Toledo, S., and Eriksson, J. VTrack: Accurate, energy-aware road traffic delay estimation using mobile phones. In Proceedings of the Seventh ACM Conference on Embedded Networked Sensor Systems (Berkeley, CA, Nov. 4-6). ACM Press, New York, 2009, 85-98.

21. Tsitsiklis, J.N. Decentralized detection by a large number of sensors. Mathematics of Control, Signals, and Systems 1, 2 (1988), 167-182.

Authors

Matthew Faulkner (mfaulk@caltech.edu) is a Ph.D. candidate in computer science at Caltech, Pasadena, CA.

Robert Clayton (clay@gps.caltech.edu) is a professor of geophysics at Caltech, Pasadena, CA.

Thomas Heaton (heaton@gps.caltech.edu) is a professor of geophysics and civil engineering at Caltech, Pasadena, CA.

K. Mani Chandy (mani@cs.caltech.edu) is a professor of computer science at Caltech, Pasadena, CA.

Monica Kohler (kohler@caltech.edu) is a senior research fellow in mechanical and civil engineering at Caltech, Pasadena, CA.

Julian Bunn (julian.bunn@caltech.edu) is a principal computational scientist in the Center for Advanced Computing Research at Caltech, Pasadena, CA.

Richard Guy (rguy@gps.caltech.edu) is the Community Seismic Network project manager at Caltech, Pasadena, CA.

Annie Liu (aliu@cms.caltech.edu) is a software engineer at Facebook, Menlo Park, CA; her research for this article was part of her Ph.D. in computer science at Caltech, Pasadena, CA.

Michael Olson (molson@cs.caltech.edu) is a software engineer at Google, Mountain View, CA; his research for this article was part of his Ph.D. in computer science at Caltech, Pasadena, CA.

MingHei Cheng (mmhcheng@caltech.edu) conducted research for this article as part of his Ph.D. in mechanical and civil engineering at Caltech, Pasadena, CA.

Andreas Krause (krausea@ethz.ch) is an assistant professor of computer science at ETH Zürich, Zürich, Switzerland.

Figures

Figure 1. CSN volunteers contribute data from low-cost accelerometers (above) and from Android smartphones via a CSN app (below).

Figure 2. Differences in soil conditions and subsurface structures cause significant variations in ground shaking; data recorded by the Long Beach, CA, network.

Figure 3. The CSN cloud maintains the persistent state of the network in datastore, performs real-time processing of pick data via Memcache, and generates notifications and quake statistics.

Figure 4. The CrowdShake app processes sensor data locally on an Android phone or tablet, sends pick messages during potential quakes, receives alerts, and responds to data requests.

Figure 5. CSN sensors produced picks (blue and red bars) for both P-wave and S-wave of the March 2013 Anza M4.7 earthquake; time series plots are arranged by distance from the quake's epicenter.

Figure 6. Eccentric weights oscillate Millikan Library, showing CSN hardware is sensitive to resonant frequencies in buildings.

Figure 7. Android accelerometers accurately record strong shaking during a shake-table experiment: (a) shake-table experimental setup; (b) ground truth; (c) Android phone; and (d) Android phone in backpack.

Figure 8. Attainable true-positive and false-positive pick rates for: (a) USB accelerometer and (b) Android accelerometer; (c) coresets of accelerometer data require less storage to produce accurate acceleration models; and (d) estimated quake detection rates for a mixture of USB and mobile phone sensors in a given area.

Figure 9. Learned "sparsifying" outperforms standard spatial event-detection algorithms and provides faster detection of four Los Angeles area quakes recorded by CSN in 2012.

©2014 ACM  0001-0782/14/07

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2014 ACM, Inc.


 
