Passwords and the Evolution of Imperfect Authentication

Theory on passwords has lagged practice, where large providers use back-end smarts to survive with imperfect technology.

By Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano

Posted Jul 1 2015

Passwords have dominated human-computer authentication for 50 years despite consensus among researchers that we need something more secure and deserve something more user friendly. Much published research has focused on specific aspects of the problem that can be easily formalized but do not actually have a major influence on real-world design goals, which are never authentication per se, but rather protection of user accounts and sensitive data. As an example of this disconnect, academic research often recommends strict password-composition policies (such as length requirements and mandating digits and nonalphabetic characters) despite the lack of evidence they actually reduce harm.

Key Insights

Simplistic models of user and attacker behaviors have led the research community to emphasize the wrong threats.
Authentication is a classification problem amenable to machine learning, with many signals in addition to the password available to Web services.
Passwords will continue as a useful signal for the foreseeable future, where the goal is not impregnable security but reducing harm at cost.

We argue that critically revisiting authentication as a whole and passwords’ role therein is required to understand today’s situation and provide a meaningful look ahead. Passwords were originally deployed in the 1960s for access to time-shared mainframe computers, an environment unrecognizable by today’s Web users. Many practices have survived with few changes even if no longer appropriate.^9,19 While partly attributable to inertia, this also represents a failure of the academic literature to provide approaches that are convincingly better than current practices.

We identify as outdated two models that still underlie much of the current password literature. First is the model of a random user who draws passwords uniformly and independently from some set of possible passwords. It has resulted in overestimates of security against guessing and encouraged ineffectual policies aimed at strengthening users’ password choices. The second is that of an offline attack against the password file. This model has inflated the importance of unthrottled password guessing relative to other threats (such as client malware, phishing, channel eavesdropping, and plaintext leaks from the back-end) that are more difficult to analyze but significantly more important in practice. Together, these models have inspired an awkward jumble of contradictory advice that is impossible for humans to follow.^1,19,30,36

The focus of published research on clean, well-defined problems has caused the neglect of the messy complications of real-world Web authentication. This misplaced focus continues to hinder the applicability of password research to practice. Failure to recognize the broad range of usability, deployability, and security challenges in Web authentication has produced a long list of mutually incompatible password requirements for users and countless attempts by researchers to find a magic-bullet solution despite drastically different requirements in different applications. No single technology is likely to “solve” authentication perfectly for all cases; a synergistic combination is required. Industry has already moved in this direction. Many leading providers have bolstered, not replaced, passwords with multiple parallel complementary authentication mechanisms. These are combined, often using machine learning, so as to minimize cost and annoyance while providing enough security for e-commerce and online social interaction to prosper. We expect authentication to gradually become more secure and less obtrusive, even if perhaps technically inelegant under the hood.

This trend is not without downsides. It strongly favors the largest providers with extensive knowledge of their users’ habits. It makes authentication more privacy invasive and increasingly difficult to comprehend for users and researchers alike. We encourage researchers to acknowledge this trend and focus on addressing related security, privacy, and usability challenges.

Lessons from the Past

From the beginning, passwords have been a security band-aid. During development of the first time-sharing operating systems in the 1960s, passwords were added to protect against practical jokes and researchers using more resources than authorized. The 1961 Compatible Time-Sharing System at MIT was likely the first to deploy passwords in this manner. Security issues arose immediately. Multiple cases were reported of users guessing one another’s passwords and also at least one leak of the master password file that was stored in unencrypted form. Yet these issues were easily addressed administratively, as all users were part of the same academic organization.

With development of access control in MULTICS and Unix in the 1970s, passwords were adapted to protect sensitive data and computational resources. MULTICS protected passwords by storing them in hashed form, a practice invented by Roger Needham and Mike Guy at the University of Cambridge in the 1960s. Robert Morris’s and Ken Thompson’s seminal 1979 treatment of password security²⁵ described the evolution toward dedicated password hashing and salting via the crypt() function, along with the first analysis of dictionary attacks and brute-force guessing.

A decade later, the 1988 Morris Internet worm demonstrated the vulnerability of many systems to password guessing. Administrators adapted by storing password hashes in more heavily protected shadow password files²⁴ and, sometimes, proactively checking user passwords for guessability.³⁵

With the mid-1990s advent of the World Wide Web and e-commerce, early attempts were made to replace passwords with public-key cryptography via secure sockets layer (SSL) client certificates or the competing secure electronic transaction (SET) protocol. Ultimately, managing certificates and private keys client-side proved too burdensome and a market never developed. Instead, secure connections on the Web almost universally rely on one-way authenticated SSL. Servers are authenticated by a certificate at the SSL layer, while users are left to prove their identity later with no explicit protocol support. Text-based passwords entered in HTML forms in exchange for HTTP cookies have become the dominant, albeit never formally specified, protocol for user authentication.

As Web-based services proliferated, usability problems arose that had not existed for system passwords. Resetting forgotten passwords, previously a manual task for IT support staff, was automated through email, creating a common central-point-of-failure for most users. The increased number of accounts held by individual users scuttled the assumption of dedicated passwords per account, and password reuse became commonplace. Phishing grew into a major concern, but anti-phishing proposals requiring protocol or user-interface changes failed to gain adoption. Instead, the primary countermeasure involves blacklists of known phishing sites and machine-learning classifiers to recognize new phishing sites as they arise.¹⁶

Attempts to make a dedicated business out of authentication in the consumer space have failed consistently. While there has long been interest in deploying hardware tokens as a second factor, standalone tokens (such as RSA SecurID) have seen limited deployment outside enterprise environments, likely due to the cost of tokens relative to the value of free online accounts. Microsoft Passport and OpenID, among many attempts to offer single sign-on for the Web, have failed to gain mass adoption.

The widespread availability of smartphones may be changing the equation, however. In the early 2010s a number of online services, including Facebook, Google, and Twitter, deployed free smartphone applications to act as a second factor based on the emerging time-based one-time password (TOTP) standard.²⁷ Other services send codes via short message service (SMS) as a backup authentication mechanism. A few services have offered dedicated tokens as a second factor, typically in environments at greater risk of fraud (such as eBay and World of Warcraft).
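To make the TOTP mechanism mentioned above concrete, here is a minimal sketch of how a time-based code can be derived from a shared secret, following the HMAC-based construction published in RFC 4226 and RFC 6238. The function names `hotp` and `totp` and the default parameters (30-second step, six digits, SHA-1) are illustrative choices, not a description of any particular provider's application.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226 construction)."""
    # HMAC-SHA1 over the counter encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based variant: the counter is the number of elapsed time steps."""
    return hotp(secret, int(unix_time) // step, digits)


if __name__ == "__main__":
    # Shared secret taken from the RFC test vectors; real secrets are random.
    secret = b"12345678901234567890"
    print(totp(secret, unix_time=59, digits=8))
```

Because both the phone and the server can compute the same code from the shared secret and the current time, the code changes every step and a captured value is useful to an attacker only briefly, unlike a static password.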

Random models for user behavior. In addition to being regarded as the weak link in password systems, users are also typically the most difficult component to model. Ideally, they would choose passwords composed of random characters. But even researchers who acknowledge this is an idealized model have usually underestimated the gap between modeled behavior and reality. Security policies and research models have been slow to adjust to the magnitude of the inaccuracies revealed by new data sources (such as leaked password datasets and large-scale measurement studies).⁵

One of the best-known early sources on password policies is the U.S. Defense Department’s circa 1985 Green Book,³⁸ which specified detailed policies for mitigating the risk of password guessing, including rate limiting, hashing passwords at rest, and limiting the lifetime of passwords. The Green Book avoided the complexity of user behavior altogether, putting forth as one of three main principles: “Since many user-created passwords are particularly easy to guess, all passwords should be machine-generated.”

The same year, NIST published its Password Usage guidelines in the Federal Information Processing Standards (FIPS) series,³⁹ which were heavily derived from the Green Book. In addition to the recommended machine-chosen passwords, the FIPS guidelines allowed user-chosen passwords with the caveat that users “shall be instructed to use a password selected from all acceptable passwords at random, if possible, or to select one that is not related to their personal identity, history, or environment.” Today, nearly all nonmilitary applications allow user-chosen passwords due to the perceived difficulty of remembering machine-chosen passwords. Yet the FIPS guidelines retained most other recommendations from the Green Book unchanged, including calculations of password security based on allowed characters and length requirements, limits on password lifetime, and forced updates. This encouraged the unrealistically optimistic assumption, which has persisted to this day, that users choose passwords much as random password generators do.

Estimating password strength via “entropy.” The guessing resistance of user-chosen passwords is often estimated by modeling passwords as random choices from a uniform distribution. This enables straightforward calculations of expected guessing times in the tradition of the 1985 Green Book. To attempt to capture the fact many users choose from a relatively small number of common passwords (despite the very large theoretical space from which to choose text passwords), researchers often choose a relatively small uniform distribution. The logarithm of the size of this uniform distribution in this model is often called “entropy,” in reference to Claude Shannon’s famous measure H₁ of the minimum average number of bits needed to encode symbols drawn from a distribution. Unfortunately, Shannon entropy models the number of guesses needed by an attacker who can check in constant time if an unknown password is a member of an arbitrarily sized set of possibilities. This does not correspond to any real guessing attack.

A more direct metric is “guesswork,” G,²⁵ the expected number of queries by an adversary guessing individual passwords sequentially until all are found. It can be shown that H₁ provides a lower bound on G.²⁵ However, G is also problematic, as it can be highly skewed by rare but difficult-to-guess passwords.⁵ To address this bias, “partial” (or “marginal”) guessing metrics have been developed.^5,32 One formulation is partial guesswork (G_α), which models an attacker making only enough guesses to have a probability α of succeeding.⁵ This encapsulates the traditional G when α = 1, with lower values of α modeling more realistic attackers. Such metrics have been proven not to be lower-bounded in general by Shannon entropy.^5,32 While partial-guessing metrics provide an appropriate mathematical model of password-guessing difficulty, they require a very large sample to be estimated accurately, typically millions of passwords,⁵ and have thus found limited practical use.
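The difference between these metrics is easiest to see on a small, skewed distribution. The sketch below computes Shannon entropy H₁, guesswork G, and a partial guesswork G_α for a list of password probabilities. The partial-guesswork formula is an assumption made for illustration: the attacker guesses in decreasing order of probability, stops at the smallest number of guesses that covers probability α, and every account not cracked by then is charged that full number of guesses. The example distribution is invented, not measured data.

```python
import math


def shannon_entropy(probs):
    """H1 in bits: -sum(p * log2(p)) over the password distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def guesswork(probs):
    """G: expected guesses when trying passwords from most to least likely."""
    ordered = sorted(probs, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))


def partial_guesswork(probs, alpha):
    """G_alpha under the assumed formulation described in the lead-in."""
    ordered = sorted(probs, reverse=True)
    covered = 0.0   # probability mass cracked so far
    expected = 0.0  # guesses spent on accounts that were cracked
    for rank, p in enumerate(ordered, start=1):
        covered += p
        expected += rank * p
        if covered >= alpha - 1e-12:
            # Accounts not cracked cost `rank` guesses before the attacker stops.
            return expected + (1.0 - covered) * rank
    return expected


if __name__ == "__main__":
    # Hypothetical population: half of users share one password,
    # the rest are spread evenly over 1,000 other passwords.
    probs = [0.5] + [0.5 / 1000] * 1000
    print("H1      =", round(shannon_entropy(probs), 2), "bits")
    print("G       =", round(guesswork(probs), 2), "guesses")
    print("G_0.5   =", round(partial_guesswork(probs, 0.5), 2), "guesses")
```

In this toy distribution G is dominated by the long tail of rare passwords, while an attacker content with cracking half of the accounts needs a single guess per account, which is the skew the partial metrics are meant to expose.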

Heuristic measures of password strength are often needed for smaller datasets. NIST’s Electronic Authentication Guidelines⁸ (in many ways an update to the Green Book) acknowledged the mathematical unsoundness of Shannon entropy for guessing but still introduced a heuristic method for estimating “entropy” (NIST entropy) of password distributions under various composition policies. This model has since been used in many academic studies, though it has been found to produce relatively inaccurate estimates in practice.^23,40 The preferred empirical approach, albeit dependent on the configuration of a particular tool, is to simply run a popular open-source password-cracking library against a set of passwords and evaluate the average number of guesses needed to find a given proportion of them.^23,42 This approach can be applied to even a single password to evaluate its relative strength, though this clearly overestimates security relative to any real adversaries who use a more favorable cracking library.

Improving password strength. Simple measures like Shannon and NIST entropy make increases in password strength seem tantalizingly close. Composition policies that increase the minimum length or expand the classes of characters a password must contain seem to cause reliable increases in these measures if passwords are random; for example, the NIST guidelines suggest requiring at least one uppercase and non-alphabetic character. While acknowledging users may insert them in predictable places, the guidelines still estimate an increase in guessing difficulty of a password by six bits (or a factor of 64) compared to a policy of allowing any password. However, experiments have shown this is likely an overestimate by an order of magnitude.⁴⁰

Such password policies persist despite imposing a high usability cost, discussed later, though tellingly, their use is far less common at sites facing greater competition (such as webmail providers¹⁵) than at sites with little competition (such as universities and government services). Instead, research suggests the most effective policy is simply using a large blacklist³⁴ to limit the frequency of the most common passwords, bounding online guessing attacks to a predictable level, and conceding that many users will choose passwords vulnerable to offline guessing.
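A blacklist policy of this kind is simple to sketch. The example below rejects any candidate that appears in a set of common passwords and shows why capping the frequency of the most popular password bounds online guessing. The class name, the case-insensitive comparison, and the tiny sample list are assumptions made for illustration; a deployed blacklist would be far larger.

```python
class PasswordBlacklist:
    """Rejects candidate passwords found in a set of known-common passwords."""

    def __init__(self, common_passwords):
        # Case-insensitive matching is an illustrative choice, not a requirement.
        self._blocked = {p.casefold() for p in common_passwords}

    def is_allowed(self, candidate: str) -> bool:
        return candidate.casefold() not in self._blocked


def max_online_success(max_password_frequency: float, guesses_per_account: int) -> float:
    """Upper bound on the fraction of accounts an online guesser can expect to
    compromise when no single password is used by more than the given share."""
    return min(1.0, max_password_frequency * guesses_per_account)


if __name__ == "__main__":
    blacklist = PasswordBlacklist(["123456", "password", "qwerty"])  # hypothetical list
    print(blacklist.is_allowed("Password"))          # blocked variant of a common choice
    print(blacklist.is_allowed("correct horse 42"))  # not on the list
    # If no password is held by more than 0.01% of users, ten guesses per
    # account can succeed against at most 0.1% of accounts.
    print(max_online_success(0.0001, 10))
```

The bound only concerns online guessing; as the text notes, such a policy concedes that many permitted passwords remain vulnerable to offline guessing.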

A related goal has been to nudge users toward better passwords through feedback (such as graphical meters that indicate estimated strength of their password, as they choose it). In an experimental setting, very aggressive strength meters can make guessing user-chosen passwords dramatically more difficult.³⁷ However, in studies using meters typical of those found in practice, with users who were not prompted to consider their password, the impact of meters was negligible; many users failed to notice them at all.¹² An empirical data point comes from Yahoo!, where adding a password-strength meter did improve password security but only marginally.⁵

Independence when choosing multiple passwords. A random user model often further assumes every password will be independently chosen. In practice, this is rarely true on the Web, as users cope with the large number of accounts by password reuse, sometimes with slight modification; for example, a 2007 telemetry study estimated the median user has 25 password-protected accounts but only six unique passwords.¹⁴ This has direct security implications, as leaks at one website can compromise security at another. Even if a user has not exactly reused the password, attackers can guess small variations that may double their chances of success in an online-guessing scenario.¹⁰ Related password choices similarly undermine the security goals of forced password updates, as an attacker with knowledge of a user’s previous sequence of passwords can often easily guess the next password.⁴³

Offline vs. online threats. The security literature distinguishes between online attackers who must interact with a legitimate party to authenticate and offline attackers who are limited only in terms of their computational resources.

Superficially, offline attackers are far more powerful, as they typically can make an unbounded number of guesses and compare them against a known hash of the password. Yet many additional avenues of attack are available to the online attacker: stealing the password using client-side malware, phishing the password using a spoofed site, eavesdropping the password as it is transmitted, stealing the password from the authentication server, stealing the password from a second authentication server where the user has reused it, and subverting the automated password reset process.

A critical observation is that strong passwords do not help against any of these other attacks. Even the strongest passwords are still static secrets that can be replayed and are equally vulnerable to phishing, theft, and eavesdropping. Mandating stronger passwords does nothing to increase security against such attacks.

Offline guessing (cracking). Much attention has been devoted to devising strategies for picking passwords complex enough to resist offline cracking. Yet this countermeasure may stop real-world damage in at most a narrow set of circumstances.¹³ For an attacker without access to the password file, any guessing must be done online, which can be rate-limited. If passwords in a leaked file are unhashed, they are exposed in plaintext regardless of complexity; if hashed but unsalted, then large “rainbow tables”³¹ allow brute-force look-up up to some length.^a Only if the attacker has obtained a password file that had been properly hashed and salted do password-cracking efficiency and password strength make a real difference. And yet, while hashing and salting have long been considered best practice by security professionals, they are far from universal. Empirical estimates suggest over 40% of sites store passwords unhashed;⁷ recent large-scale password file leaks revealed many were plaintext (such as RockYou and Tianya), hashed but unsalted (such as LinkedIn), improperly hashed (such as Gawker), or reversibly encrypted (such as Adobe).

Finally, offline attackers may be interrupted if their breach is detected and administrators can force affected users to reset their passwords. Password resets are often not instituted at breached websites due to fear of losing users; they are even less commonly mandated for users whose password may have been leaked from a compromised third-party website where it may have been reused.
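For readers unfamiliar with what "properly hashed and salted" means in practice, the following is a minimal sketch using only the Python standard library. It stores a fresh random salt with each password and derives the stored value with an iterated hash (PBKDF2 with HMAC-SHA-256). The iteration count and salt length are illustrative assumptions, not recommendations drawn from the article.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor; tune to the deployment


def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, digest, iterations) for storage in the password file."""
    salt = secrets.token_bytes(16)  # unique per password, defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest, iterations


def verify_password(password: str, salt: bytes, digest: bytes, iterations: int) -> bool:
    """Recompute the salted hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    salt, digest, n = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest, n))
    print(verify_password("wrong guess", salt, digest, n))
```

Because every record carries its own salt, two users with the same password get different stored values, and an attacker holding the file must spend the full iterated-hash cost on each guess for each account.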

Online guessing. Online attackers can verify whether any given password guess is correct only by submitting it to the authentication server. The number of guesses that can be sent is limited. A crude “three strikes” model is an obvious way of throttling attacks, but relatively few sites implement such a deterministic policy,⁷ probably to avoid denial of service.
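One alternative to a hard "three strikes" lockout is to slow guessing down rather than block the account outright. The sketch below allows a few free failures and then doubles the required wait after each further failure, up to a cap. The class name, thresholds, and in-memory storage are hypothetical; the article does not describe any specific site's policy.

```python
class LoginThrottle:
    """Per-account exponential backoff on failed logins (illustrative sketch)."""

    def __init__(self, free_attempts=3, base_delay=1.0, max_delay=3600.0):
        self.free_attempts = free_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._failures = {}  # account -> (failure count, time of last failure)

    def record_failure(self, account: str, now: float) -> None:
        count, _ = self._failures.get(account, (0, 0.0))
        self._failures[account] = (count + 1, now)

    def record_success(self, account: str) -> None:
        # A successful login clears the failure history.
        self._failures.pop(account, None)

    def delay_remaining(self, account: str, now: float) -> float:
        """Seconds the caller must still wait before the next attempt is accepted."""
        count, last = self._failures.get(account, (0, 0.0))
        if count < self.free_attempts:
            return 0.0
        required = min(self.max_delay,
                       self.base_delay * 2 ** (count - self.free_attempts))
        return max(0.0, last + required - now)


if __name__ == "__main__":
    throttle = LoginThrottle()
    for _ in range(5):
        throttle.record_failure("alice", now=100.0)
    print(throttle.delay_remaining("alice", now=100.0))  # wait after five failures
```

Timestamps are passed in explicitly so the policy can be exercised without a real clock. A capped delay limits the guess rate while letting the legitimate owner back in eventually, which softens the denial-of-service concern raised above.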

Nonetheless, online guessing attacks are in some ways much more costly to mount than offline attacks, on a per-guess basis. Whereas offline, an attacker might check a billion guesses on a single host, online an attacker might need thousands of hosts. First, if we assume IP addresses that send millions of failed attempts will be blocked, the load must be distributed. Also, the load could exceed legitimate traffic; in a service with one million users where the average user logs in once per day, a total of one billion guesses (one thousand guesses per account) is as many login requests as the legitimate population generates in three years. If legitimate users fail 5% or so of the time (due to, say, typos or forgetting) the attacker will generate as many fail events as the legitimate population generates in 60 years.
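The arithmetic behind these figures can be reproduced directly. The short sketch below uses only the numbers given in the paragraph above (one million users, one login per user per day, one thousand guesses per account, a 5% legitimate failure rate); the function and variable names are ours.

```python
def online_attack_load(users, logins_per_user_per_day, guesses_per_account, failure_rate):
    """Express an online guessing campaign in years of legitimate traffic."""
    total_guesses = users * guesses_per_account
    legit_logins_per_day = users * logins_per_user_per_day
    legit_failures_per_day = legit_logins_per_day * failure_rate
    years_of_logins = total_guesses / legit_logins_per_day / 365
    years_of_failures = total_guesses / legit_failures_per_day / 365
    return total_guesses, years_of_logins, years_of_failures


if __name__ == "__main__":
    total, login_years, failure_years = online_attack_load(
        users=1_000_000,
        logins_per_user_per_day=1,
        guesses_per_account=1_000,
        failure_rate=0.05,
    )
    print(f"total guesses: {total:,}")
    print(f"equivalent legitimate login traffic: {login_years:.1f} years")
    print(f"equivalent legitimate failed logins: {failure_years:.1f} years")
```

The results, roughly 2.7 years of login traffic and about 55 years of failed logins, match the rounded figures of three years and 60 years quoted above, which shows how conspicuous such a campaign would be in server logs.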

Choosing a password to withstand an offline attack is thus much more difficult than choosing one to withstand an online attack. Yet the additional effort pays off only in the very restricted circumstances in which offline attacks can occur.¹³ It makes little sense to focus on this risk when offline attacks are dwarfed by other vectors (such as malware).

Today’s “Overconstrained” World

Passwords offer compelling economic advantages over the alternatives, with lowest start-up and incremental costs per user. Due largely to their status as the incumbent solution, they also have clear “deployability” advantages (such as backward compatibility, interoperability, and no migration costs). But it is not these factors alone that are responsible for their longevity; the “password replacement problem” is both underspecified and overconstrained.^6,20

It is underspecified in that there is no universally agreed set of concrete requirements covering diverse environments, technology platforms, cultures, and applications; for example, many authentication proposals become utterly unworkable on mobile devices with touchscreens, many Asian languages are now typed with constant graphical feedback that must be disabled during password entry, and many large websites must support both low-value forum accounts and important e-commerce or webmail accounts through a single system. It is simultaneously overconstrained in that no single solution can be expected to address all requirements, ranging from financial to privacy protection. The list of usability, deployability, and security requirements is simply too long (and rarely documented explicitly).

An in-depth review of 35 proposed password alternatives using a framework of 25 comparison criteria found no proposal beats passwords on all fronts.⁶ Passwords appear to be a Pareto equilibrium, requiring some desirable property X be given up to gain any new benefit Y, making passwords very difficult to replace.

Reviewing how categories of these password alternatives compare to regular passwords yields insight. Password managers—software that can remember and automatically type passwords for users—may improve security and usability in the common case but are challenging for users to configure across all user agents. This problem also affects some graphical password schemes,⁴ while others offer insufficient security gains to overcome change-resisting inertia. Biometric schemes, besides their significant deployment hurdles, appear poorly suited to the unsupervised environment of Web authentication; fraudsters can just replay digital representations of fingerprints or iris patterns. Schemes using hardware tokens or mobile phones to generate one-time access codes may be promising, with significant security advantages, but ubiquitous adoption remains elusive due to a combination of usability issues and cost.

Federated authentication, or “single sign-on” protocols, in which users are authenticated by delegation to central identity providers, could significantly reduce several problems with passwords without completely eliminating them. Yet, besides introducing serious privacy issues, they have been unable to offer a business model sufficiently appealing to relying sites. The most successful deployment to date, Facebook Connect (a version of OAuth), incentivizes relying parties with user data, mandating a central role for Facebook as sole identity provider, which does little for privacy.

With no clear winner satisfying all criteria, inertia becomes a substantial hurdle and the deck is stacked against technologies hoping to replace passwords entirely. A better choice is to prioritize competing requirements depending on organizational priorities and usage scenarios and aim for gradual adoption. Given their universal support as a base user-authentication mechanism, passwords are sensibly implemented first, offering the cheapest way to get things up and running when an idea is not yet proven and security is not yet critical, with no learning curve or interoperability hurdles. Low adoption costs also apply to users of new sites, who need low barriers when exploring new sites they are not sure they will return to. Financial websites are the rare exception, with offline capital and users whose accounts are all clearly valuable.

The list of challenges to would-be alternatives goes on. Improving security despite any decline in usability may mean losing potential new users (and sometimes existing users) to competitors. Some alternatives require server or client software modifications by various independent parties, which is often a showstopper; others expect large numbers of users to change their existing habits or be trained to use new mechanisms. Some are only partial solutions or address only a subset of security threats; some are even less user friendly, though in new and different ways. Moreover, as mentioned earlier, some are more costly and bring other deployment challenges (such as interoperability, compatibility with existing infrastructure, and painful migration).

Advice to users. Users face a plethora of advice on passwords: use a different one for each account; change them often; use a mix of letters, punctuation, symbols, and digits; make them at least eight characters long; avoid personal information (such as names and birthdays); and do not write them down. These suggestions collectively pose an unrealistic burden and are sometimes mutually incompatible; a person cannot be expected to memorize a different complex password for each of, say, 50 accounts, let alone change all of them on a rolling basis. Popular wisdom has summarized the password advice of the security experts as “Pick something you cannot remember, and do not write it down.”

Each bit of advice may be useful against a specific threat, motivating security professionals to offer them in an attempt to cover their bases and avoid blame for any potential security breaches regardless of the inconvenience imposed on users. This approach is predicted to lead to failure by the “compliance budget” model³ in which the willingness of each user to comply with annoying security requirements is a finite, exhaustible resource that should be managed as carefully as any other budget. Indeed, websites (such as online stores), whose users are free to vote with their wallets, are much more careful about not exceeding their customers’ compliance budget than sites (such as universities) whose users are “captive.”¹⁵

Useful security advice requires a mature risk-management perspective and rough quantification of the risks and costs associated with each countermeasure. It also requires acknowledging that, with passwords as deployed today, users have little control over the most important countermeasures. In particular, running a personal computer free of malware may be the most important step, though it is challenging and often ignored in favor of password advice, which is simpler to conceptualize but far less important. Likewise, good rate limiting and compromise detection at the server-side are critical, as discussed earlier, but users have no agency other than to patronize better-implemented sites.

Choosing extremely strong passwords, as is often advised, is of far more limited benefit; evidence it reduces harm in practice is elusive. As noted earlier, password cracking is rarely a critical step in attacks. Hence making passwords strong enough to resist dedicated cracking attacks seems an effort poorly rewarded for all but the most critical Web accounts. For important accounts, password-selection advice should encourage passwords not easily guessed by acquaintances and sufficient for withstanding a reasonable amount of online guessing, perhaps one million guesses. About half of users are already above this bar,⁵ but discouraging simple dictionary passwords via advice, strength meters, and blacklists remains advisable to help out the others.

Advice to avoid reusing passwords is also common. While it is a good defense against cross-site password compromise, it is, for most users, incompatible with remembering passwords. Better advice is probably to avoid reusing passwords for important accounts and not to worry about the large number of accounts of little value to an attacker (or their owner).

Moreover, we consider the advice against writing passwords to be outmoded for the Web. Stories involving “Post-it notes on the monitor” usually refer to corporate passwords where users feel no personal stake in their security. Most users understand written-down passwords should be kept in a wallet or other safe location generally not available to others, even acquaintances. With this caveat, written passwords are a worthwhile trade-off if they encourage users to avoid very weak passwords. Password managers can be an even better trade-off, improving usability (no remembering, no typing) and allowing a different strong password for each account. However, they introduce a single point of failure, albeit perhaps no more vulnerable than Web accounts already due to the prevalence of email-based password reset.

Multidimensional Future

We appear stuck between the intractable difficulty of replacing passwords and their ever-increasing insecurity and burden on users. Many researchers have predicted the dam will burst soon and the industry will simply have to pay the necessary costs to replace passwords. However, these predictions have been made for over a decade.

The key to understanding how large service providers manage with what appears to be a “broken” technology is that websites do not need perfection. The problem of compromised accounts is just one of many forms of abuse, along with spam, phishing, click fraud, bots, fake accounts, scams, and payment fraud. None of them has been completely defeated technologically, but all are managed effectively enough to keep things running.

In nearly every case, techniques that “solve” the problem technically have lost out to ones that manage it statistically; for example, despite many proposals to end spam, including cryptographic protocols to prevent domain spoofing and microcharges for each email message sent, most email providers have settled for approaches that classify mail based on known patterns of attacker behavior. These defenses are not free or easy to implement, with large Web operators often devoting significant resources toward keeping pace with abuse as it evolves. Yet, ultimately, this cost is typically far less than any approach requiring users to change behavior.

In the case of authentication, banks provide a ready example of living with imperfect technology. Even though credit-card numbers are essentially static secrets, which users make no attempt to conceal from merchants, fraud is kept to acceptable levels by back-end classifiers. Technologies like “chip and PIN” have not been a magic bullet where deployed.² Cards are still stolen, PINs can be guessed or observed, signature transactions still exist as a fallback, and online payments without a PIN, or “card not present” transactions, are still widespread.

Yet banks survive with a non-binary authentication model where all available information is considered for each transaction on a best-effort basis. Web authentication is converging on a similar model, with passwords persisting as an imperfect signal supplemented by many others.

Web authentication as classification. Behind the scenes, many large websites have already transitioned to a risk-based model for user authentication. This approach emerged by the early 2000s at online financial sites.⁴¹ While an incorrect password means access should be denied, a correct password is just one signal or feature that can be used by a classifier to determine whether or not the authentication attempt involves the genuine account owner.

The classifier can take advantage of many signals besides the password, including the user’s IP address; geolocation; browser information, including cookies; the time of login; how the password is typed; and what resources are being requested. Unlike passwords, these implicit signals are available with no extra effort by the user.²¹ Mobile devices introduce many new signals from sensors that measure user interaction.¹¹ While none of these signals is unforgeable, each is useful; for example, geolocation can be faked by a determined adversary,²⁸ and browser fingerprinting techniques appear to be an endless arms race.²⁹ Nonetheless, both may be valuable in multidimensional solutions, as the difficulty of forging all signals can be significant in practice; for example, by combining 120 such signals, Google reported a 99.7% reduction in 2013 in the rate of accounts compromised by spammers.¹⁸
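As a rough illustration of how a correct password can serve as one feature among several, the sketch below combines a handful of signals in a naive-Bayes style, summing per-signal log-likelihood ratios into a single probability. The signal names, weights, and the `genuine_probability` function are hypothetical and invented for this example; production classifiers use far more signals and learn their weights from data.

```python
import math

# Hypothetical log-likelihood ratios: log(P(signal | owner) / P(signal | attacker)).
# These values are invented for illustration, not measured from any real service.
SIGNAL_LLR = {
    "known_cookie": 3.0,
    "usual_country": 1.0,
    "usual_login_hour": 0.5,
}
PASSWORD_LLR = 2.0     # evidence contributed by a correct password (assumed)
MISSING_LLR = -1.0     # penalty when an expected signal is absent (assumed)
PRIOR_LOG_ODDS = 0.0   # assume even odds before any signal is seen


def genuine_probability(observed: set) -> float:
    """Return an estimate of P(attempt is from the genuine owner)."""
    # An incorrect password means access is denied outright.
    if "password_correct" not in observed:
        return 0.0
    log_odds = PRIOR_LOG_ODDS + PASSWORD_LLR
    for name, llr in SIGNAL_LLR.items():
        log_odds += llr if name in observed else MISSING_LLR
    # Logistic function converts log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With these made-up weights, a correct password arriving with no supporting signals scores below 0.5, while the same password from a familiar browser, country, and hour scores close to 1.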

Unlike traditional password authentication, the outcome is not binary but a real-valued estimate of the likelihood the attempt is genuine. Generally these results will be discretized, as users must be given access to some resource or not, and any classifier will inevitably make false accept and false reject errors. Sites will continue to develop their machine-learning techniques to reduce these errors and may deploy new technology (such as two-factor authentication and origin-bound certificates¹⁷) to increase the number (and quality) of signals available.

Web authentication is by no means an easy domain to address for machine learning. The trade-off between false accepts and false rejects is difficult to get right. For financial sites, false accepts translate to fraud but can usually be recovered from by reversing any fraudulent payments. However, for sites where false accepts result in disclosure of sensitive user data, the confidentiality violations can never be undone, making them potentially very costly. Meanwhile, false rejects annoy customers, who may switch to competitors.
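One simple way to reason about this trade-off is to compare expected costs. If p is the estimated probability that an attempt is genuine, accepting costs (1 − p) times the cost of a false accept in expectation, and rejecting costs p times the cost of a false reject. The sketch below derives the break-even threshold from that comparison; the function name and the cost figures are assumptions for illustration only.

```python
def accept_threshold(cost_false_accept: float, cost_false_reject: float) -> float:
    """Break-even P(genuine) above which accepting is cheaper in expectation.

    Accept when (1 - p) * cost_false_accept < p * cost_false_reject,
    which rearranges to p > cost_false_accept / (cost_false_accept + cost_false_reject).
    """
    return cost_false_accept / (cost_false_accept + cost_false_reject)


# Hypothetical relative costs: reversible fraud versus irreversible disclosure.
reversible_fraud_site = accept_threshold(cost_false_accept=10.0, cost_false_reject=1.0)
sensitive_data_site = accept_threshold(cost_false_accept=1000.0, cost_false_reject=1.0)
```

Under these assumed costs, the site whose false accepts cannot be undone needs much higher confidence before granting access, and so will reject more genuine users.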

Obtaining a large sample of ground truth to train the classifier is another challenge, as it is difficult to find examples of attacks administrators do not yet know about. Financially motivated attackers are again likely the easiest to deal with, as their attacks typically must be scalable, leading to a large volume of attacks and hence training data. Nonfinancially motivated attackers (such as ex-partners) may be more difficult to detect algorithmically, but users are far better positioned to deal with them in real life. Targeted attacks, including “advanced persistent threats,” which are technically sophisticated and aimed at a single user account, are the most difficult challenge, as attackers can tailor techniques to victims and leave relatively little signal available for classifiers.

New modes of operation. Authentication by classification enables fundamentally new methods of operation. Authentication can be a more flexible process, with additional information demanded as needed if the classifier’s confidence is low or the user attempts a particularly sensitive operation, a process called “progressive authentication”;³³ for example, a site may ask users to confirm their identity by SMS or phone call if it notices suspicious concurrent activity on their account from geographically distant locations. “Multi-level authentication” becomes possible, with users given limited access when the classifier’s confidence is relatively low. In the U.K., some banks offer users a read-only view of their account with just a password but require a security token to actually transfer money out.
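The progressive and multi-level modes described above can be sketched as a small decision function over the classifier's confidence. The thresholds, the `Decision` names, and the `decide` function are hypothetical choices for this example, not values reported by any site.

```python
from enum import Enum


class Decision(Enum):
    DENY = "deny"
    STEP_UP = "ask for a second factor"
    READ_ONLY = "limited access"
    FULL = "full access"


def decide(p_genuine: float, sensitive_operation: bool,
           deny_below: float = 0.20, limited_from: float = 0.70,
           full_from: float = 0.95, sensitive_from: float = 0.99) -> Decision:
    """Map classifier confidence to an action (all thresholds are assumed)."""
    if p_genuine < deny_below:
        return Decision.DENY
    # Sensitive operations demand more confidence than ordinary ones.
    required = sensitive_from if sensitive_operation else full_from
    if p_genuine >= required:
        return Decision.FULL
    if sensitive_operation:
        # Progressive authentication: request more evidence before proceeding.
        return Decision.STEP_UP
    if p_genuine >= limited_from:
        # Multi-level authentication: grant a restricted view only.
        return Decision.READ_ONLY
    return Decision.STEP_UP
```

For instance, `decide(0.97, sensitive_operation=False)` grants full access, whereas the same confidence on a money transfer triggers a request for a second factor.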

Sites may also ask for less information, including not requiring a password to be entered, when they have reasonable confidence, from secondary signals, that the correct user is present. A form of this is already in place: where persistent session cookies once allowed password-less login for a predetermined duration, the decision of when to recheck the password is now made dynamically by a risk classifier instead. A stronger version is opportunistic two-factor authentication, ensuring correct authentication when the second factor is present but enabling fallback security if the password is still correct and enough additional signals are presented.¹⁷

The limit of this evolution is “continual authentication.” Instead of simply checking passwords at the entrance gate, the classifier can monitor the actions of users after letting them in and refine its decision based on these additional signals. Ultimately, continual authentication may mean the authentication process grows directly intertwined with other abuse-detection systems.
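A minimal sketch of continual authentication, assuming each post-login action carries a log-likelihood ratio that nudges the session's confidence up or down, might look like the following. The action names, their weights, and the `update` and `monitor` helpers are invented for illustration.

```python
import math

# Hypothetical evidence per action: positive favors the owner, negative an attacker.
ACTION_LLR = {
    "read_inbox": 0.2,
    "change_recovery_email": -1.5,
    "bulk_send_to_all_contacts": -3.0,
}


def update(p_genuine: float, action: str) -> float:
    """Fold one observed action into the running confidence estimate."""
    # Clamp to avoid division by zero or log(0) at the extremes.
    p = min(max(p_genuine, 1e-9), 1.0 - 1e-9)
    log_odds = math.log(p / (1.0 - p)) + ACTION_LLR.get(action, 0.0)
    return 1.0 / (1.0 + math.exp(-log_odds))


def monitor(p_at_login: float, actions: list, floor: float = 0.5):
    """Return the index of the first action that drops confidence below floor."""
    p = p_at_login
    for index, action in enumerate(actions):
        p = update(p, action)
        if p < floor:
            return index  # point at which the session would be re-challenged
    return None
```

A session that starts with high confidence can thus still be challenged later if its behavior starts to resemble abuse, which is where authentication begins to overlap with other abuse-detection systems.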

Changes to the user experience. As sites aim to make correct authentication decisions “magically” on the back-end through machine learning, most changes to the user experience should be positive. Common cases will be increasingly streamlined; users will be asked to type passwords (or take other explicit action) as rarely as possible. However, users also face potential downsides, as systems grow increasingly opaque and difficult to understand.

First, users may see more requests for second factors (such as a one-time code over SMS) when the classifier’s confidence is low. Users may also face more cases (such as when traveling or switching to a new machine) where they are unable to access their own account despite correctly remembering their password, akin to unexpected credit-card rejections while abroad. Increased rejections may increase the burden on “fallback” authentication, to which we still lack a satisfactory solution.

As authentication systems grow in complexity, their automated decisions may cause users increased confusion and distress. Users are less likely to buy into any system that presents them with inconveniences they do not understand. Training users to respond with their credentials to asynchronous security challenges on alternative channels may also pave the way for novel phishing attacks. Even with careful user interface design, users may end up confused as to what the genuine authentication ceremony²² should be.

Another challenge is that better classifiers may break some access-control practices on top of passwords users have grown accustomed to; for example, users who share passwords with their spouses or their assistants may face difficulty if classifiers are able to (correctly) determine another human is using their password, even though this is what the user intended.

Finally, typing passwords less often could in fact decrease their usability, as users are more likely to forget them if they go long periods between needing to type them.

Advantages of scale. Authentication may represent a classic example of the winner-take-all characteristics that appear elsewhere on the Web, since it offers benefits to scale in two different ways. First, large services are more likely to be accepted by relying parties as an identification service. Being accepted by more relying parties in turn encourages users to register accounts, further enhancing the attractiveness of these identity providers to relying parties. The second “two-sided market,” or positive feedback loop, is for user data. Large services with more user data can provide more accurate authentication. This attracts users to interact with the services more frequently, providing even more data for authentication. Developing and maintaining a complex machine-learning-based approach to authentication requires a relatively large fixed technical cost and low marginal costs per user, further advantaging the largest identity providers.

One consequence of this consolidation is that, lacking access to the volumes of real-world data collected by larger service providers, independent researchers may be limited in their ability to contribute to several important research topics for which the limits of artificial datasets and mental extrapolation make empirically grounded research essential. Other areas of Web research (such as networking research, which requires massive packet captures, or search-engine research, which requires huge numbers of user queries) have likewise become difficult for researchers with access to only public data sources.

There are also troubling privacy implications if relying parties require users to sign up with a large service that, in turn, requires a significant amount of personal information to perform authentication well. This information may be inherently sensitive (such as time and location of login activity) or difficult to change if leaked (such as behavioral biometrics like typing patterns). Many users already trust highly sensitive information to large online services, but authentication may be a motivating factor to collect more data, store it longer, and share it with more parties.

Conclusion

Passwords offer plenty of examples of divergence between theory and practice; estimates of strength, models of user behavior, and password-composition policies that work well in theory generally remain unsupported by evidence of reduced harm in practice and have in some cases been directly contradicted by empirical observation. Yet large Web services appear to cope with insecure passwords largely because shortcomings can be covered up with technological smarts in the back-end. This is a crucial, if unheralded, evolution, driven largely by industry, which is well experienced in data-driven engineering. Researchers who adapt their models and assumptions to reflect this trend will be in a stronger position to deliver relevant results. This evolution is still in its early stages, and there are many important and interesting questions about the long-term results that have received little or no study to this point. There is also scope for even more radical rethinking of user authentication on the Web; clean-slate approaches may seem too risky for large companies but can be explored by more agile academic researchers. Tackling these novel challenges is important for ensuring published research is ahead of industry practice, rather than the other way around.

Acknowledgments

Joseph Bonneau is funded by a Secure Usability Fellowship from Simply Secure and the Open Technology Fund. Paul van Oorschot is funded by a Natural Sciences and Engineering Research Council of Canada Chair in Authentication and Computer Security and a Discovery Grant. Frank Stajano is partly supported by European Research Council grant 307224.

Passwords and the Evolution of Imperfect Authentication

Communications of the ACM, July 2015 Issue, Vol. 58 No. 7, Pages: 78-87

Published: July 1, 2015

DOI: 10.1145/2699390
