Technical Perspective: Smartphone Security ‘Taint’ What It Used to Be

There is something seductive about information flow as a security policy. You can state a very clear and concise policy (for example, "forbid my GPS location information from flowing to the network"), which seems to capture our intuition for right and wrong more closely than the sorts of policies that smartphone operating systems like iOS and Android seek to enforce today (more like "give this app your GPS location, yes or no, and you have no say over how it's used"). Information flow research dates back to the early 1970s. Although much of the original theory, and many of the original systems, were developed to model the military's rules for handling classified, secret, and top-secret data, information flow policies and techniques remain valuable today, and we can benefit from this earlier work.
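To make the idea concrete, here is a minimal sketch of how a policy like "forbid my GPS location information from flowing to the network" might be expressed. It assumes the common textbook model in which a value's label is the set of sources it was derived from, combining two values takes the union of their labels, and a flow is allowed only if the label is a subset of what the sink may receive. The source names ("gps", "contacts"), the sink names, and the `POLICY` table are hypothetical illustrations, not any real platform's API.

```python
from typing import Dict, FrozenSet

# A label is the set of sensitive sources a value was derived from.
# Labels are ordered by subset inclusion, a simple lattice whose join is union.
Label = FrozenSet[str]

# Hypothetical policy: for each sink, the sources it is allowed to receive.
POLICY: Dict[str, Label] = {
    "network": frozenset(),                    # nothing sensitive may leave the phone
    "screen": frozenset({"gps", "contacts"}),  # showing users their own data is fine
}


def join(a: Label, b: Label) -> Label:
    """Combining two values yields a value labeled with both of their sources."""
    return a | b


def may_flow(label: Label, sink: str) -> bool:
    """Permit the flow only if every source in the label is allowed at the sink."""
    return label <= POLICY[sink]


location = frozenset({"gps"})
print(may_flow(location, "screen"))   # True
print(may_flow(location, "network"))  # False
```

The appeal is that the policy is stated once, per sink, rather than as a yes-or-no permission granted to the app as a whole.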

An excellent foundational reference is Dorothy and Peter Denning's landmark 1977 paper, "Certifying Programs for Secure Information Flow," which pursued a static analysis strategy and appeared in Communications.^a At the same time, others were pursuing hardware- or runtime-based solutions. Much of the complexity, then as now, arises when the control flow of the program depends on sensitive values (for example, "if my GPS location is in Washington D.C., then behave differently"), never mind unusual control flows (for example, interrupt handlers, exceptions, indirect branches) and ambiguous data references (pointer dereferencing, array indexing). Tracking all of this in hardware requires extra computation and state, while analyzing it statically can raise false alarms about execution paths that may never occur at runtime.
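The difference between explicit flows (data copied or computed from a sensitive value) and implicit flows (behavior that merely depends on one) can be sketched as follows. The `Tainted` wrapper, the coordinates, and the rough Washington D.C. bounding box are hypothetical, for illustration only. A tracker that propagates labels only along data dependencies misses the implicit flow in `naive_behavior`; the usual remedy, in the spirit of Denning-style analyses, is to give every result produced under a branch the label of the branch condition as well (often called the program-counter label).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value paired with the set of sensitive sources it depends on."""
    value: object
    label: frozenset = frozenset()


# Hypothetical GPS reading, labeled with its source.
location = Tainted((38.90, -77.03), frozenset({"gps"}))


def in_washington(loc: Tainted) -> Tainted:
    # Explicit flow: the result is computed from loc's data, so it inherits loc's label.
    lat, lon = loc.value
    inside = 38.79 < lat < 39.00 and -77.12 < lon < -76.91  # rough bounding box
    return Tainted(inside, loc.label)


def naive_behavior(flag: Tainted) -> Tainted:
    # Implicit flow: no bits of flag are copied, so value-only tracking leaves the
    # result unlabeled, even though the result reveals flag.value.
    if flag.value:
        return Tainted("behave differently")
    return Tainted("behave normally")


def pc_label_behavior(flag: Tainted) -> Tainted:
    # Anything produced under the branch also carries the condition's label.
    pc = flag.label
    if flag.value:
        return Tainted("behave differently", pc)
    return Tainted("behave normally", pc)


flag = in_washington(location)
print(naive_behavior(flag).label)     # frozenset()
print(pc_label_behavior(flag).label)  # frozenset({'gps'})
```

The price of the conservative rule is precision: everything computed under a sensitive branch becomes tainted, which is one source of the false alarms mentioned above.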

Fast-forward to the present day. What's new? We now have insanely fast computers, even in our phones, running applications that typically spend most of their time idle, waiting for user input or network data. We can afford extra runtime CPU overhead without wasting too much battery or slowing down the user experience. Likewise, better algorithms and faster computers have turned whole-program static analysis, whether for bug finding or security verification, into a billion-dollar industry.

The TaintDroid project takes a runtime taint-tracking approach to analyzing Android apps. The team benefited from the fact that apps are distributed not as raw machine code but as bytecode for the Dalvik VM, which allowed them to modify the on-phone Dalvik VM to add runtime taint tracking at an entirely reasonable 14% performance overhead on CPU-bound workloads. They also added taint annotations to the Android file-system and IPC layers to track tainted data as it passes through files and interprocess messages. The TaintDroid team did cut some corners: they punted on native ARM code, which some Android apps include for better performance, and they do not track taint from a conditional expression through to the computations that depend on it. (Impact: malicious apps could easily be engineered to evade the current TaintDroid implementation, and souping up TaintDroid to handle these cases would require static analysis, more runtime overhead, or some combination of both.)
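The core idea of runtime taint tracking inside a VM can be sketched as a toy interpreter for a register-based bytecode (Dalvik is register-based) that keeps a shadow taint tag beside every register, propagates tags as instructions execute, and checks them when data reaches a network sink. The instruction set, the tag bits, and `TaintViolation` are hypothetical; representing tags as bit vectors combined with bitwise OR is a simplifying assumption, not a description of TaintDroid's actual code. The toy also has no branch instructions, so it cannot express the implicit flows that TaintDroid leaves untracked.

```python
# Hypothetical taint bits, one per sensitive source.
TAINT_CLEAR = 0
TAINT_GPS = 1 << 0
TAINT_IMEI = 1 << 1


class TaintViolation(Exception):
    """Raised when tainted data reaches the network sink."""


def run(program, nregs=8):
    """Interpret a toy register-machine program with a shadow taint tag per register."""
    regs = [0] * nregs
    tags = [TAINT_CLEAR] * nregs  # shadow state, kept in step with regs
    sent = []
    for op, *args in program:
        if op == "const":      # const dst, value: an untainted literal
            dst, val = args
            regs[dst], tags[dst] = val, TAINT_CLEAR
        elif op == "source":   # source dst, value, tag: e.g. the result of a GPS API call
            dst, val, tag = args
            regs[dst], tags[dst] = val, tag
        elif op == "move":     # move dst, src: the tag travels with the value
            dst, src = args
            regs[dst], tags[dst] = regs[src], tags[src]
        elif op == "add":      # add dst, a, b: the result carries both operands' tags
            dst, a, b = args
            regs[dst], tags[dst] = regs[a] + regs[b], tags[a] | tags[b]
        elif op == "send":     # send src: the network sink checks the tag first
            (src,) = args
            if tags[src] != TAINT_CLEAR:
                raise TaintViolation(f"register {src} carries taint {tags[src]:#x}")
            sent.append(regs[src])
        else:
            raise ValueError(f"unknown op {op!r}")
    return sent
```

For example, `run([("source", 0, 38_900_000, TAINT_GPS), ("const", 1, 7), ("add", 2, 0, 1), ("send", 2)])` raises `TaintViolation` because the sum inherits the GPS tag, while the same program with `("const", 0, 38_900_000)` in place of the `source` instruction returns `[38900007]`. Every instruction does a little extra bookkeeping on the shadow tags, which is the kind of cost that shows up as runtime overhead.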

Despite these limitations, the TaintDroid project identified a number of Android apps that were clearly behaving in ways users would find objectionable. (In hindsight, it should not be surprising that an advertising-supported free app wants to know as much about you as it can, the better to target its advertisements at you.) Since TaintDroid's publication in 2010, there has been an explosion of research into every possible method of enhancing Android security, whether through static analysis or dynamic runtime mechanisms. Even Google is in the game: in 2012 it launched its "Bouncer" system, which runs within its Android app store and combines static analysis with running each app inside a virtual machine to detect undesirable behavior. Google itself has said very little about how Bouncer works,^b but Miller and Oberheide did some clever reverse engineering suggesting that Google still has a way to go.^c

Certainly, whatever Google does, malicious developers will find ways around it. For example, code obfuscators do a great job of confusing static analyzers. Likewise, runtime systems can only check code that actually executes; if a malicious app can detect that it is being monitored, it might simply avoid misbehaving. Over the long term, it is easy to foresee a situation analogous to today's pattern-matching anti-virus systems, which require continuous updates. However, Google may well choose to err on the side of denying programs access to the store: apps with ambiguous behaviors could be preemptively forbidden, sidestepping some of the imprecision inherent in analyzing apps for malice. Information flow techniques will inevitably play a huge role in policing the app ecosystems on our phones and elsewhere in our computational world.