Consumer studies have shown that online users value personalized content. At the same time, providing personalization on Web sites also seems quite profitable for Web vendors. This win-win situation is marred, however, by potential privacy threats since personalizing people’s interaction entails gathering considerable amounts of data about them. Numerous consumer surveys have revealed that computer users are very concerned about their privacy online. Examples of privacy concerns in connection with valued personalized services include the following (the first three services are real; the fourth is on the horizon):
- Online shoppers who value an online bookstore that offers personalized recommendations based on books they bought in the past may wonder whether their purchase records will be kept truly confidential.
- Online searchers who are pleased that a search engine disambiguates their queries and delivers search results geared toward their genuine interests may feel uneasy that this procedure entails recording all their past search terms.
- Students who appreciate that a personalized tutoring system can provide individualized instruction based on a detailed model of each student’s understanding of the different learning concepts may wonder whether anyone else besides the system will have access to these models of what they know and don’t know.
- Office workers who value a help component from their word processor that gives them personalized advice based on a model of their individual word-processing skills may be concerned that the contents of their model become accessible to others in the company, specifically when negative consequences may arise from disclosing the skills they lack.
Other potential perceived privacy threats in the context of personalized systems include unsolicited marketing, computers “figuring things out” about the user, fear of price discrimination, information being revealed to other users of the same computer, unauthorized access to accounts, subpoenas by courts, and government surveillance [4].
In addition to individual privacy concerns, the collection of personal data is also subject to legal regulations in many countries and states (with the scope of some laws extending beyond the national boundaries), as well as to industry codes of conduct. Both user concerns and privacy regulations impact the type of data being collected as well as the methods employed for processing the data.
Since having less user data and fewer personalization methods available is generally regarded as detrimental to the quality of personalization, the existence of a “trade-off” between privacy and personalization and a need to “balance” privacy and personalization were postulated around the turn of the century. This perspective would suggest that, figuratively speaking, an increase in personalization would result in a decrease of privacy by about the same amount, and vice versa.
However, recent research has shown that more factors than the degree of privacy and personalization must be taken into account when looking at the overall acceptability of a personalized system from a privacy point of view. Moreover, even when considering privacy and personalization in isolation, there seem to be a number of personalization methods that afford a significantly higher degree of privacy than traditional methods for the same purpose, with nearly the same personalization quality. The field of privacy-enhanced personalization [9, 10] aims to reconcile the goals and methods of user modeling and personalization with privacy considerations, and to strive for best possible personalization within the boundaries set by privacy. The research area is widely interdisciplinary, relying on contributions from the information and computer sciences, information systems, marketing research, public policy, economics, and law. This article summarizes major results that were obtained to date and their implications for the design of personalized systems.
The Privacy Calculus
Current privacy theory regards people’s privacy-related behavior as the result of a situation-specific cost-benefit analysis, in which the potential risks of disclosing one’s personal information are weighed against potential benefits of disclosure. However, Internet users often lack sufficient information to be able to make educated privacy-related decisions. For instance, users often underestimate the probability with which they can be identified if they disclose certain data, or are unfamiliar with a site’s privacy practices since privacy statements are difficult to understand and rarely read. Like all complex probabilistic decisions, privacy-related decisions are also affected by systematic deviations from rationality. For instance, Acquisti [1] discusses the possibility of hyperbolic temporal discounting in such decisions, which may lead to an overvaluation of small but immediate benefits and an undervaluation of future negative privacy impacts.
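The effect of hyperbolic discounting on the cost-benefit analysis can be made concrete with a small numeric sketch. It assumes the common textbook form of hyperbolic discounting, value = amount / (1 + k · delay), which is not necessarily the exact model discussed in [1]. The function name, the discount rate `k`, and all amounts are hypothetical and chosen only for illustration.

```python
def hyperbolic_value(amount, delay, k):
    """Perceived present value of `amount` received or suffered after `delay`.

    Uses the common textbook form amount / (1 + k * delay); k >= 0 controls
    how steeply the future is discounted (k = 0 means no discounting).
    """
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    return amount / (1.0 + k * delay)


# Hypothetical numbers: a small benefit now versus a larger privacy cost
# that may materialize in a year (delay measured in days).
immediate_benefit = hyperbolic_value(amount=5.0, delay=0, k=0.05)
perceived_future_cost = hyperbolic_value(amount=20.0, delay=365, k=0.05)

# The user discloses if the perceived benefit exceeds the perceived cost,
# even though the undiscounted cost (20) is four times the benefit (5).
discloses = immediate_benefit > perceived_future_cost
```

With these assumed values the delayed cost is perceived as roughly 1 unit rather than 20, so the immediate benefit wins. This is the overvaluation of small immediate benefits described above.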
A number of factors have been identified that play a role in the privacy calculus of Internet users. These factors include personality- and culture-based privacy attitudes, the type of information to be disclosed and its deviance from the average, the recipient, the value being assigned to personalization benefits, the extent to which users know what information has been disclosed and can control its usage, and various trust-establishing factors. These factors are described in more detail below, and their consequences for the design of privacy-enhanced personalized systems are discussed.
Individual Privacy Attitudes
Various surveys established that age, education, and income are positively associated with the degree of stated Internet privacy concern. Gender effects on Internet privacy concerns have yet to be clearly established. Several surveys since the early 1980s were able to cluster respondents into roughly three groups: Privacy fundamentalists generally express extreme concern about any use of their data and an unwillingness to disclose information, even when privacy protection mechanisms would be in place. The privacy unconcerned tend to express mild concern for privacy, and mild anxiety about how other people and organizations use information about them. Privacy pragmatists are generally concerned about their privacy as well, but less than the fundamentalists. They are also far more willing to disclose personal information, for example, when they understand the reasons for its use, see benefits for doing so, or see privacy protections in place. The size ratio between these clusters is roughly 1:1:2, but the exact numbers differ noticeably across surveys and over time, with a slight decline of fundamentalists and the unconcerned over the past two decades and a corresponding increase in the number of pragmatists.
However, the predictive value of these attitudinal clusters is low. Several studies [8] have shown that privacy fundamentalists do not act much differently in situated data-disclosure decisions than the other groups. It would seem that the mitigating factors play a more important role in concrete privacy decisions than abstract attitudes solicited out of context. Fortunately, designers can address and bolster these mitigating factors in the design of privacy-enhanced personalized systems.
Type of Information to be Disclosed
Several surveys confirm that Internet users generally feel differently about the disclosure of different types of information [8]. They are usually quite willing to disclose basic demographic and lifestyle information as well as personal tastes and hobbies. They are slightly less willing to disclose details about their Internet behavior and purchases, followed by more extended demographic information. Financial information, contact information, and specifically credit card and Social Security numbers raise the highest privacy concerns. An experiment by Huberman et al. [7] suggests that people’s concern about disclosure depends not only on the data category, but also on the extent to which their data values deviate from the group average (this was verified for age, weight, salary, spousal salary, credit rating, and amount of savings). The results indicate that the more undesirable a trait is with respect to the group norm, the higher its privacy valuation.
The lesson from these findings for the design of personalized systems seems to be that highly sensitive data categories should never be requested without the presence of some of the mitigating factors discussed here. Values that deviate considerably from socially desired norms should preferably be solicited only as open intervals whose closed boundary does not deviate too much from the expected norm (such as “weight: 250 pounds and above” for male adults).
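One possible reading of the open-interval recommendation is a form that offers coarse answer options, with the outermost options left open-ended. The following minimal sketch assumes integer-valued answers. The function name and the cut points 150 and 200 are hypothetical; only the “250 pounds and above” option comes from the example above.

```python
from bisect import bisect_right


def to_disclosure_bucket(value, cut_points, unit=""):
    """Map a raw integer value to a coarse answer option.

    `cut_points` must be sorted in ascending order. Values below the first
    cut point and values at or above the last one fall into open intervals,
    so extreme values are never disclosed exactly.
    """
    suffix = f" {unit}" if unit else ""
    idx = bisect_right(cut_points, value)
    if idx == 0:
        return f"below {cut_points[0]}{suffix}"
    if idx == len(cut_points):
        return f"{cut_points[-1]}{suffix} and above"
    return f"{cut_points[idx - 1]} to {cut_points[idx] - 1}{suffix}"


# Hypothetical answer scale for the weight example.
weight_cuts = [150, 200, 250]
option = to_disclosure_bucket(310, weight_cuts, "pounds")
```

A user weighing 310 pounds would select the same option as a user weighing 250 pounds, so the system learns only that the value lies at or beyond a boundary that is still close to the expected norm.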
Value of Personalization
Recent surveys indicate that about 80% of Internet users are interested in personalization. While researchers today experiment with a myriad of personalization services that provide various potential benefits [2], users currently only seem to value a few of them: time savings, monetary savings, and to a lesser extent, pleasure received the highest approval in one survey, and customized content provision and remembering preferences in another. Chellappa and Sin found that “the consumers’ value for personalization is almost two times […] more influential than the consumers’ concern for privacy in determining usage of personalization services. This suggests that while vendors should not ignore privacy concerns, they are sure to reap benefits by improving the quality of personalized services that they offer” [3].
These findings imply that developers of personalized systems must clearly communicate the benefits of their services to users, and ascertain that these benefits are indeed desired. If users perceive value in personalized systems, they are considerably more likely to intend to use them and to supply the required personal information.
Awareness of and Control over the Use of Personal Information
Many privacy surveys indicate that Internet users find it important to know how their personal information is being used, and to have control over this usage. In one survey, 94% agree they should have a legal right to know everything a Web site knows about them. In another, 63% of those who indicated having provided false information to a Web site or declined to provide information at all said they would have supplied the information had the site provided notice about how the information would be used prior to disclosure, and if they were comfortable with these uses. In a behavioral experiment [11], Web site visitors disclosed significantly more information about themselves when, for every requested piece of personal information, the Web site explained the user benefits and the site’s privacy practices in connection with the requested data. In another study, 69% said that “controlling what information is collected about you” is extremely important, and 24% still regarded it as somewhat important.
These findings suggest that personalized systems should be able to explain to users what facts and assumptions about them are being stored, and how these are going to be used. Moreover, users should be given ample control over the storage and usage of this data. This is likely to foster users’ data disclosure, and at the same time complies with the rights of data subjects accorded by many privacy laws, industry and company privacy regulations, and Principles of Fair Information Practices. The accompanying figure shows an example from Amazon.com in which users become informed about what personal data is being used for generating recommendations, and are also given control over this use (however, they cannot remove data from the system).
Trust
Trust in a Web site is a very important motivational factor for the disclosure of personal information. In one survey, nearly 63% of consumers who declined to provide personal information to Web sites gave as the reason that they do not trust those who are collecting the data. Conversely, trust has been found to positively affect people’s stated willingness to provide personal information to Web sites, and their actual information disclosure to an experimental Web site.
Several antecedents to trust have been empirically established, and for many of them effects on disclosure have also been verified.
Positive past experience is an established factor for trust whose impact on the disclosure of personal information is well supported. Of specific importance are established, long-term relationships. Developers of personalized systems should not regard the disclosure of personal information as a one-time matter, as is currently often the case (remember the lengthy registration form you had to complete upon your first visit, with virtually all fields marked by an asterisk?). Users of personalized Web sites can be expected to become more forthcoming with personal details over time if they have positive experiences with the same or similar sites. Personalized Web sites should be designed in such a way that users can utilize them at least adequately with any amount of personal data they choose to disclose, and should allow users to incrementally supply more information later, whereupon their experience with the personalized system should improve.
Design and operation of a Web site. Various interface design elements and operational characteristics of a Web site have been found to increase users’ trust [5]: the absence of errors, the (professional) design and usability of a site, the presence of contact information, links from a believable Web site, links to outside sources and materials, updates since last visit, quick responses to customer service questions, and email confirmation for all transactions. Therefore, personalization should preferably be used in professionally designed and easy-to-use Web sites that possess some of these trust-enhancing design elements and operational characteristics.
Reputation of the Web site operator. Several studies found the reputation of the organization that operates a Web site is a crucial factor for users’ trust, and for their willingness to disclose personal information [8]. In an experiment, subjects were significantly less willing to provide personally identifiable information (specifically their phone numbers, home and email addresses, and Social Security and credit card numbers) to lower-traffic sites that were presumably less known to them.
The lesson for the design of personalized systems seems to be that everything else being equal, users’ information disclosure at sites of well-reputed companies is likely to be higher than at sites with lower reputation. Personalization is therefore likely to be more successful at more highly regarded sites, unless extra emphasis is put on other factors that foster the disclosure of personal data. Designers should not employ personalization features as a “gimmick” to increase the popularity of Web sites with low reputations since users may not take advantage of them if they must disclose personal data to such sites.
Presence of a privacy statement. Traditional privacy statements on Web sites (often called “privacy policies”) describe the privacy-related practices of these sites. The effects of privacy statements on users’ trust and disclosure behavior are unfortunately somewhat unclear as yet. In several studies [8], the mere presence of a privacy link had a positive effect on both trust and disclosure (one experiment found a negative effect though). Inconclusive results have so far been obtained on whether the level of privacy protection that a privacy statement affords also has an effect on trust and disclosure. This seems unlikely for current privacy statements “in the wild” since several reading-ease analyses revealed that the policies of major Web sites are too difficult in their wordings to be comprehensible to the majority of Web users. Not surprisingly, Web server logs indicate that only a fraction of Web visitors access privacy statements at all (less than 1% according to one source, and less than 0.5% according to another).
The preliminary lesson for the design of personalized systems seems to be that traditional privacy statements should not be posted with the expectation of increasing users’ trust and/or disclosure of personal information, even when statements describe good company privacy practices. There are, however, other good reasons for posting such statements, such as legal or self-regulatory requirements in many countries, or demonstrated good will. Evidence is mounting that privacy-minded company practices can have a positive effect if they are communicated to Web users in comprehensible forms, such as through logos that indicate the level of privacy protection (by analyzing a P3P-encoded version of the privacy policy) [6], or by explaining the implications of the privacy policy in a localized and contextualized manner [11]. The figure shows examples of such strategies. More research will be needed to find such better communication forms for corporate privacy practices.
Presence of a privacy seal. Several studies indicate that the meaning of privacy seals is not well understood by current Web users [9]. Other research found the largest seal-awarding organization in the U.S. lacks scrutiny in the selection of its seal holders, and that sites displaying its seal are more likely to use privacy-invasive practices than sites that have no seal. Nevertheless, the presence of privacy seals at a Web site has clear effects on Web visitors, particularly on their perception of trust in the Web site, their perception of its privacy policies, and their stated willingness to disclose data. For designers of personalized systems, the pragmatic conclusion at this point is to display privacy seals as long as Web users associate trust with them since doing so is likely to foster users’ disclosure behavior.
Normative Approaches
To date, more than 40 countries and numerous states have enacted privacy laws. Many companies and a few industry sectors additionally or alternatively adopted self-regulatory privacy guidelines. These laws and self-regulations are often based on more abstract principles of fair practices with regard to the use of personal information. Some of the effects of these regulatory instruments on personalized systems are described here.
Privacy laws. Since personalized systems collect personal data of individual people, they are subject to privacy laws and regulations if the respective individuals are in principle identifiable. As noted, more than 40 countries and numerous states have privacy laws. They lay out procedural, organizational, and technical requirements for the collection, storage, and processing of personal data, to ensure the protection of this data and the data subjects to whom it applies. General requirements include conditions for legitimate data acquisition, transfer and processing, and the rights of data subjects. Other legal stipulations address adequate security mechanisms, disclosure duties, and the supervision and audit of personal data processing.
Some requirements imposed by privacy laws also affect the permissibility of personalization methods. Several international privacy laws prohibit the use of popular personalization methods without the user’s consent. Several such provisions are described in the sidebar “Provisions from Various European Privacy Laws Affecting Popular Personalization Methods.”
Principles of fair information practices. Over the past three decades, several collections of basic principles have been defined for ensuring privacy when dealing with personal information. So-called Principles of Fair Information Practices have been drafted by a number of countries (such as Australia and Canada) as foundations for their national privacy laws, by supranational organizations (such as the OECD and APEC) as guidance for their member states, and by professional societies (such as the ACM) as recommendations for policymakers and as guidance for the professional conduct of their members.
Developers of personalized systems should take such privacy principles into account if those are not already indirectly considered through applicable privacy laws and industry or company guidelines. Many principles have direct implications on personalized systems. The sidebar “USACM Recommendations and their Effect on Privacy-Enhanced Personalized Systems” notes several implications of the recommendations of the ACM’s U.S. Public Policy Committee.
Privacy-Enhancing Technology for Personalized Systems
Several technical approaches are discussed here that may reduce privacy risks and make privacy compliance easier. They are by no means “complete technical solutions” to the privacy risks of personalized systems, and their presence is also unlikely to “charm away” users’ privacy concerns. Rather, these technologies should be employed as additional privacy protections in the context of a user-oriented system design that also takes normative aspects into account. These technologies are still in the research stage, but some of them seem deployable to practical applications.
Pseudonymous users and user models. It is possible for users of personalized systems to enjoy anonymity and at the same time receive full personalization [12]. In an anonymization infrastructure that supports personalization, users would be unidentifiable, unlinkable, and unobservable for third parties, but linkable for the personalized system through a pseudonym. A number of authors proposed infrastructures for pseudonymous yet personalized user interaction with Web sites based on some or all of these properties. Several authors expect that Internet users are more likely to provide information when they are not identified, which may improve the quality of personalization and the benefits that users receive from it. To date, however, this claim has not found much empirical substantiation. Designers should definitely allow for pseudonymous access and pseudonymous user models (and even allow for anonymization architectures with the above properties if one is readily available). This follows from the data minimization and security requirements of the Principles of Fair Information Practices discussed in the USACM recommendations sidebar. Some privacy laws also mandate or recommend the provision of pseudonymous access if it is technically possible and not unreasonable. An interesting side effect of pseudonymous access is that in most cases privacy laws do not apply when users cannot be identified with reasonable means.
Due to a lack of relevant studies, it is unclear whether increased anonymity will lead to more disclosure and better personalization. Anonymity is currently also difficult and/or tedious to preserve when payments, physical goods, and non-electronic services are being exchanged. It harbors the risk of misuse and hinders vendors from cross-channel marketing (for example, sending a products catalog to a Web customer by postal mail). Finally, research has shown that the anonymity of database entries, Web trails, query terms, ratings, and textual data can be surprisingly well defeated by a resourceful attacker who has identified data available that can be partly matched with the “anonymous” data.
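The linkability-through-a-pseudonym idea can be illustrated with a minimal sketch. The code assumes a client that holds one random pseudonym per site and a server that keys user models by pseudonym only. All class and method names are hypothetical, and the sketch deliberately omits the network-level anonymization (unidentifiability and unobservability toward third parties) that a real infrastructure would need.

```python
import secrets


class PseudonymWallet:
    """Client side: one random pseudonym per site, never derived from identity."""

    def __init__(self):
        self._by_site = {}

    def pseudonym_for(self, site):
        # A fresh random value per site keeps profiles unlinkable across sites.
        if site not in self._by_site:
            self._by_site[site] = secrets.token_hex(16)
        return self._by_site[site]


class PseudonymousUserModels:
    """Server side: user models are keyed by pseudonym and hold no identity."""

    def __init__(self):
        self._models = {}

    def record_interest(self, pseudonym, topic):
        model = self._models.setdefault(pseudonym, {})
        model[topic] = model.get(topic, 0) + 1

    def top_interests(self, pseudonym, n=3):
        model = self._models.get(pseudonym, {})
        # Most frequent topics first; ties are broken alphabetically.
        return sorted(model, key=lambda topic: (-model[topic], topic))[:n]
```

Because the same pseudonym is reused for a given site, the site can accumulate a user model across sessions, while a different site sees an unrelated pseudonym. As noted above, such a scheme by itself does not prevent re-identification from the content of the user model.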
Client-side personalization. A number of researchers have worked on personalized systems in which users’ data is located at the client side rather than the server side. Likewise, all personalization processes that rely on this data are exclusively carried out at the client side. From a privacy perspective, this approach has two major advantages:
- The privacy problem becomes smaller since very little, if any, personal data of users will be stored on the server. In fact, if a Web site with client-side personalization does not have control over any data that would allow for the identification of users with reasonable means, it will generally not be subject to privacy laws.
- Users may possibly be more inclined to disclose their personal data if personalization is performed locally upon locally stored data rather than remotely on remotely stored data, since they may feel more in control of their local physical environments.
However, client-side personalization also poses numerous challenges, including:
- Popular user modeling and personalization methods that rely on an analysis of data from the whole user population, such as collaborative filtering and stereotype learning, cannot be applied anymore or will have to be radically redesigned.
- Personalization processes will be required to operate at the client side since even a mere temporary or partial transmission of personal data to the server is likely to annul the above-mentioned advantages of client-side personalization. However, program code used for personalization often incorporates confidential business rules or methods, and must be protected from unauthorized access. Trusted computing platforms must be developed for this purpose.
If these drawbacks pose no problems in a specific application domain, then developers of personalized Web-based systems should definitely adopt client-side personalization as soon as suitable tools become available. Doing so would constitute a great step forward in terms of the data minimization principle (see sidebar on USACM recommendations) and is also likely to increase users’ trust.
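As a minimal sketch of the client-side approach, assume the server sends the same unpersonalized catalog to every visitor and the ranking is computed on the client from a profile that never leaves the device. The data layout (items with `title` and `tags`, a profile of tag weights) and the scoring rule are assumptions made for illustration only.

```python
def rank_locally(catalog, local_profile):
    """Order catalog items by how well their tags match the local profile.

    `catalog` is the unpersonalized list received from the server;
    `local_profile` maps tags to interest weights and stays on the client.
    """
    def score(item):
        return sum(local_profile.get(tag, 0.0) for tag in item["tags"])

    # Highest score first; ties are broken by title for a stable order.
    return sorted(catalog, key=lambda item: (-score(item), item["title"]))


# Hypothetical catalog and locally stored profile.
catalog = [
    {"title": "Cooking Basics", "tags": ["food"]},
    {"title": "Trail Running", "tags": ["sports", "outdoors"]},
    {"title": "Urban Gardening", "tags": ["outdoors", "food"]},
]
local_profile = {"outdoors": 2.0, "sports": 1.0, "food": 0.5}
ranked_titles = [item["title"] for item in rank_locally(catalog, local_profile)]
```

The sketch also makes the drawbacks visible: the ranking logic ships to the client in readable form, and nothing in it can draw on data from other users.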
Privacy-enhancing techniques for collaborative filtering. Traditional collaborative filtering systems collect large amounts of information about their users in a central repository (for example, users’ product ratings, purchased products, or visited Web pages) to find regularities that allow for future recommendations. These central repositories may not always be trustworthy, however, and may constitute attractive targets for unauthorized access. To some extent, central repositories can also be mined for individual user data by asking for recommendations using cleverly constructed profiles. A number of techniques have been proposed and (partially) technically evaluated that can help protect the privacy of users of collaborative-filtering based recommender systems:
- Distribution. This approach abandons central repositories containing the data of all users in favor of interacting distributed clusters that contain information about a few users only.
- Aggregation of encrypted data allows users to privately maintain their own individual ratings, and a community of such users to compute an aggregate of their private data without disclosing them by using homomorphic encryption and peer-to-peer communication. The aggregate then allows personalized recommendations to be generated at the client side using the client’s ratings.
- Perturbation. In this approach, users’ ratings are submitted to a central server, which performs all collaborative filtering. The ratings become systematically altered before submission though, to hide users’ true values from the server.
- Obfuscation. Here, a certain percentage of users’ ratings become replaced by different values before the ratings are submitted to a central server for collaborative filtering. Users can then “plausibly deny” the accuracy of any of their data should they become compromised.
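The last two techniques can be sketched in a few lines. The noise distribution, the share of replaced ratings, and the 1-to-5 rating scale below are assumptions for illustration and are not taken from any specific published scheme. Both functions would run on the client before ratings are submitted.

```python
import random


def perturb(ratings, noise_range=1.0, rng=random):
    """Perturbation: add zero-mean uniform noise to every rating."""
    return {
        item: rating + rng.uniform(-noise_range, noise_range)
        for item, rating in ratings.items()
    }


def obfuscate(ratings, fraction=0.3, scale=(1, 5), rng=random):
    """Obfuscation: replace a share of the ratings with random scale values."""
    items = list(ratings)
    how_many = round(fraction * len(items))
    replaced = set(rng.sample(items, how_many))
    low, high = scale
    return {
        item: (rng.randint(low, high) if item in replaced else rating)
        for item, rating in ratings.items()
    }


# Hypothetical ratings of one user on a 1-to-5 scale.
true_ratings = {"book_a": 5, "book_b": 2, "book_c": 4}
submitted = perturb(true_ratings, noise_range=1.0, rng=random.Random(42))
```

Because the added noise has zero mean, averages computed over many users stay close to the true averages even though each individual value is hidden from the server. With obfuscation, the server cannot tell which of a user’s ratings are genuine, which is the basis for the “plausible deniability” mentioned above.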
Conclusion
Research on privacy-enhanced personalization aims to reconcile the goals and methods of user modeling and personalization with privacy considerations, and to achieve the best possible personalization within the boundaries set by privacy. As illustrated throughout this article, no silver bullet exists for radically enhancing the privacy-friendliness of personalized systems, neither technical nor legal nor social/organizational. Instead, numerous small enhancements must be introduced that depend on the application domain as well as the types of data, users, and personalization goals involved. Many of the approaches described here are ready to be deployed to practical systems, and feedback from such deployments will, in turn, be very informative for research purposes. Other approaches still require further technical development or evaluation in user experiments and may yield fruitful solutions in the future.