Highly classified information on personal computers in the U.S. today is largely protected from the attacks Daniel Genkin et al. described in their article "Physical Key Extraction Attacks on PCs" (June 2016). Here, I outline cost-effective defenses that will, in time, completely defeat such attacks while making even stronger cyberattacks extremely difficult.
For example, tiny Faraday cages can be constructed in a processor package so encryption/decryption can be performed without the possibility of inadvertent emanations that could be measured or exploited, because all external communication with a cage would be through optical fiber and the cage’s power supply would be filtered.1 This way, encryption keys and encryption/decryption processes would be completely protected against the attacks described in the article. In such a Faraday cage, advanced cryptography (such as Learning With Errors) could not be feasibly attacked through any known method, including quantum computing.
Hardware can likewise help protect software, including operating systems and applications, through RAM-processor package encryption. All traffic between a processor package and RAM can be encrypted using a Faraday cage to protect a potentially targeted app (which is technically a process) from operating systems and hypervisors, other apps, and other equipment, including baseband processors, disk controllers, and USB controllers. Even a cyberattack that compromises an entire operating system or hypervisor would permit only denial of service to its applications and would not give access to any application or related data storage.
Similarly, every-word-tagged extensions of ARM and x86 processors can be used to protect each Linux kernel object and each Java object in an app from other such objects, by placing on each word of memory a tag that controls how the word can be used. Tagged memory can make it much more difficult for a cyberattacker to find and exploit software vulnerabilities because compromising a Linux kernel object or a Java application object does not automatically give power over any other object. Security-access operations by such objects can be strengthened through "inconsistency robustness," which provides technology for valid formal inference on inconsistent information.1 Such inconsistency-robust inference is important because security-access decisions are often made on the basis of conflicting and inconsistent information.
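As a purely illustrative aside, the effect of a per-word tag can be modeled in software. The C++ sketch below is a toy model under the assumption of a single tag field per word; it is not a description of any actual ARM or x86 tagged-memory extension, in which the tag would be stored and checked transparently by the hardware.

#include <cstdint>
#include <iostream>
#include <stdexcept>

// Toy software model of one tagged memory word; real tagged-memory
// hardware would store and check the tag transparently on every access.
enum class Tag : uint8_t { Data, Pointer, KernelObject };

struct TaggedWord {
    uint64_t value;
    Tag      tag;
};

// A load the "hardware" permits only when the word's tag matches the
// kind of access the requesting object is entitled to make.
uint64_t checked_load(const TaggedWord& w, Tag expected) {
    if (w.tag != expected)
        throw std::runtime_error("tag violation: access denied");
    return w.value;
}

int main() {
    TaggedWord w{42, Tag::KernelObject};
    try {
        checked_load(w, Tag::Data);            // wrong tag: the check refuses
    } catch (const std::exception& e) {
        std::cout << e.what() << "\n";
    }
    std::cout << checked_load(w, Tag::KernelObject) << "\n";  // permitted access
}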
Such individual processor-package cyberdefense methods and technologies will make it possible, within, say, the next five years, to construct a highly secure board with more than 1,000 general-purpose coherent cores, greatly diminishing dependence on datacenters and thereby decreasing centralized points of vulnerability.1
These technologies promise to provide a comprehensive advantage over cyberattacks. The U.S., among all countries, has the most to lose through its current cyberdefense vulnerabilities. In light of its charter to defend the nation and its own multitudinous cyberdefense vulnerabilities, the Department of Defense is the logical agency to initiate an urgent program to develop and deploy strong cyberdefenses using these technologies and methods. Industry and other government agencies should then quickly incorporate them into their own operations.
Carl Hewitt, Palo Alto, CA
Authors Respond:
We agree that high-security designs can mitigate physical side channels. However, they are typically difficult to design, expensive to manufacture, reduce performance, and are unavailable in most commercial and consumer contexts. Moreover, their security depends on understanding the feasible attacks. Faraday cages, optical couplers, and power filters would not stop acoustic leakage through vents or self-amplification attacks that induce leakage at frequencies below the filter’s design specification. The problem of creating inexpensive hardware or software that adequately mitigates all feasible side-channel attacks thus remains open.
Daniel Genkin, Lev Pachmanov, Itamar Pipman, Adi Shamir, and Eran Tromer, Tel Aviv, Israel
Fewer Is Better Than More Samples When Tuning Software
The emphasis on visualizing large numbers of stack samples, as in, say, the flame graphs in Brendan Gregg’s article "The Flame Graph" (June 2016), actually works against finding some performance bottlenecks, resulting in suboptimal performance of the software being tuned. Any such visualization must necessarily discard information, resulting in "false negatives," or failure to identify some bottlenecks. For example, time can be wasted by lines of code that happen to be invoked in numerous places in the call tree. The call hierarchy, which is what flame graphs display, cannot draw attention to these lines of code.2 Moreover, one cannot assume such bottlenecks can be ignored; a bottleneck that starts small does not stay small, on a percentage basis, after other bottlenecks have been removed.
Gregg made much of 60,000 samples and how difficult they are to visualize. However, he also discussed finding and fixing a bottleneck that saved 40% of execution time. That means the fraction of samples displaying the bottleneck was at least 40%. The bottleneck would thus have been displayed, with near statistical certainty, in a purely human examination of 10 or 20 random stack samples, with no need for 60,000. This is generally true of any bottleneck big enough, on a percentage basis, to be worth fixing, and every bottleneck grows as others are removed. So, if truly serious performance tuning is being done, it is not necessary or even helpful to visualize thousands of samples.
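For readers who want the arithmetic behind that claim, a short calculation using the letter's own numbers (a bottleneck occupying at least 40% of execution time, examined with 10 random samples) is:

% Probability that a bottleneck occupying fraction p of execution time
% is missed by every one of n independent random stack samples:
\[
  \Pr[\text{missed}] = (1 - p)^{n} \le (1 - 0.4)^{10} \approx 0.006,
\]
% so with p >= 0.4 and only n = 10 samples the bottleneck shows up in at
% least one sample with probability above 99%; its expected number of
% appearances is
\[
  \mathbb{E}[\text{hits}] = n\,p \ge 10 \times 0.4 = 4.
\]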
Michael R. Dunlavey, Needham, MA
Author Responds:
More samples provide more benefits. One is that performance wins of all magnitudes can be accurately quantified and compared, including even the 1%, 2%, and 3% wins, so more wins are found and less engineering time is wasted investigating false negatives. Samples are cheap. Engineering time is not. Another benefit is quantifying the full code flow, illustrating more tuning possibilities. There are other benefits, too. As for lines of code invoked in numerous places, my article discussed two techniques for identifying them: searching and merging top-down.
Brendan Gregg, Campbell, CA
When Designing APIs, Take the User’s View
I doubt many software developers were surprised by Brad A. Myers and Jeffrey Stylos’s conclusions in their article "Improving API Usability" (June 2016). I anecdotally suspect most poor API designs are based on what the implementation does rather than on what the user needs.
API designers know the implementation too well, suffering from the curse of knowledge. The API makes sense to them because they understand the context. They do not consider how confusing an API can be to users who lack that context. Worse yet, implementation details can leak into the API without the designer noticing, since the line between the API and the feature implementation might be blurred.
I agree with Myers and Stylos that the API should be designed first. Design and code operational scenarios to confirm API usability before diving too deeply into feature design and implementation. Spend time in the mind-set of the user. The scenario code used to confirm the API can also be reused as the foundation of test code and user examples.
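As a minimal sketch of that practice, a usage scenario can be coded against the API before any feature work begins and later be reused as a test. The MessageQueue class and its operations below are hypothetical examples, not taken from the article.

#include <cassert>
#include <string>

// Hypothetical API, declared from the user's point of view first.
class MessageQueue {
public:
    void send(const std::string& msg);
    std::string receive();
};

// Operational scenario coded against the API to confirm its usability;
// it later doubles as test code and as a user-facing example.
void scenario_round_trip() {
    MessageQueue q;
    q.send("hello");
    assert(q.receive() == "hello");
}

// Throwaway stubs so the scenario links; the real implementation is
// designed only after the API reads well in the scenario.
namespace { std::string last; }
void MessageQueue::send(const std::string& msg) { last = msg; }
std::string MessageQueue::receive() { return last; }

int main() { scenario_round_trip(); }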
Myers and Stylos recommended avoiding patterns. Creation patterns involve API complications beyond usability. Gamma et al.3 defined creation-method names that suggest how the returned object was constructed; for example, factory-, singleton-, and prototype-created objects are acquired through create, instance, and clone, respectively. Not only do these different object-acquisition-method names create more API confusion, they violate encapsulation, since they suggest the creation mechanism to a user. This naming constraint limits the designer’s ability to switch to a different design-pattern mechanism. Gamma et al. also failed to define creation-pattern clean-up mechanisms that are critical in C++, one of their example languages.
I prefer to unify my creation-pattern objects by defining acquire to acquire an object and release to release an object when it is no longer being used. These unification methods encapsulate the creation pattern and provide for cleanup as well. They yield a consistent API based on user need rather than on an implementation-creation mechanism.
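A minimal C++ sketch of that unification, with hypothetical names not taken from the letter, might look like this:

#include <iostream>

// The caller sees only acquire/release; whether the object comes from a
// factory, a singleton registry, or a prototype clone is hidden behind them.
class Widget {
public:
    void draw() { std::cout << "drawing\n"; }
};

Widget* acquire()          { return new Widget(); }  // creation mechanism is an implementation detail
void    release(Widget* w) { delete w; }             // symmetric clean-up path

int main() {
    Widget* w = acquire();  // no create(), instance(), or clone() in the API
    w->draw();
    release(w);             // released when no longer being used
}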
They also affect usability, however, since they are not standard nomenclature, so I take it one step further for APIs used by others. I use the Pointer to IMPLementation, or PIMPL, idiom to create a thin API class that wraps the functionality. The API class calls acquire in its constructor and release in its destructor. Wrapping the unification methods within the constructor and destructor results in an API that uses the source language’s natural creation syntax yet still offers the advantages of the creation design patterns within the underlying implementation.
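Continuing the hypothetical sketch above, such a thin PIMPL-style wrapper might call the pair from its constructor and destructor roughly as follows:

#include <iostream>

// Hidden implementation plus the acquire/release pair from the sketch above.
class WidgetImpl {
public:
    void draw() { std::cout << "drawing\n"; }
};
WidgetImpl* acquire()          { return new WidgetImpl(); }
void release(WidgetImpl* impl) { delete impl; }

// Thin API class: construction acquires, destruction releases, so users get
// the language's natural creation syntax while the creation pattern stays
// encapsulated in the implementation.
class Widget {
public:
    Widget() : impl(acquire()) {}
    ~Widget() { release(impl); }
    Widget(const Widget&) = delete;             // keep ownership simple in the sketch
    Widget& operator=(const Widget&) = delete;
    void draw() { impl->draw(); }
private:
    WidgetImpl* impl;  // pointer to implementation (PIMPL)
};

int main() {
    Widget w;   // looks like an ordinary local object
    w.draw();   // the underlying creation pattern never shows through
}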
Jim Humelsine, Neptune, NJ