As technologies available for collection and analysis of Web data have become more elaborate, data privacy concerns among Internet users have grown. In particular, users worry that Web merchants sell customer data to third parties, clog their mailboxes with unsolicited email, place persistent cookies on their PCs, or enable third parties to do so. To protect their privacy, users abort transactions, falsify personal details, or maintain several email accounts. Such practices deprive Web merchants of information critical to meeting customer needs and sustaining a competitive advantage.
To encourage users to participate in online transactions, Web merchants must ease people's concerns about data misuse. To earn users' trust, a Web site should make it explicit that customer data is treated in a fair and responsible manner. It has therefore become common practice for Web merchants to post privacy policies on their Web sites to inform users about data handling practices. Another factor conducive to user trust is the level of control users have over their data by means of opt-in or opt-out facilities.
For this knowledge and control to engender trust, it is crucial that users perceive an organization making commitments to user privacy as credible. To achieve this, companies supplement their privacy policies with privacy seals or make their Web sites P3P-compliant. The more trust users have in a Web site, the more likely they are to buy from the site, visit it again, or recommend it to others (as illustrated in the accompanying figure).
Previous research has found that U.S. online privacy policies do not address those areas of data handling that concern users. Rather, these documents are written in a manner that protects companies against privacy lawsuits by integrating privacy legislation that regulates, for example, information gathered from children (Children's Online Privacy Protection Act), financial data (Gramm-Leach-Bliley Act), and medical records (Health Insurance Portability and Accountability Act), or state legislation such as California's Online Privacy Protection Act. In addition, Internet users have been found not to read online privacy policies because they find them too legalistic and therefore difficult to understand. Another study has assessed privacy policies by means of readability formulae and found that readers would require at least some college education to understand the complex words and sentence structures in these texts. By discouraging users from reading policies, companies forgo the opportunity to ease privacy concerns and build trust.
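The readability formulae mentioned above are not named in this article. As an illustration of how such a formula works, the following minimal sketch computes the Flesch-Kincaid grade level, which is chosen here as an assumption and is not necessarily the formula used in the cited study. The function names are hypothetical, and the syllable counter is only a rough vowel-group heuristic.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Sentences are approximated by splitting on ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Longer sentences and longer words both raise the score, which is why legalistic policy text tends to be rated as requiring more years of schooling than short, plain statements.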
Since the manner in which a company communicates its data handling practices to Internet users has a bearing on its success in e-commerce, this article sheds light on why privacy policies fail to communicate data handling practices effectively. I conducted two separate studies, one examining what these documents say or do not say about data handling practices, and the other focusing on how companies describe their data handling practices in these documents. The purpose of these studies was not to describe the current state of privacy policies but to identify weaknesses and make suggestions for improvements.
The sample chosen was therefore not intended to be representative of commercial Web sites. Rather, 50 Web sites covering a broad spectrum of business models were chosen on the basis of their commercial success, since successful e-commerce sites may serve as lead innovators for other Web sites. Their success was determined using Alexa.com traffic rankings, a ranking of online retailers in Store magazine, and articles from the business press. The privacy policies were collected from the following commercial Web sites:
Of the 50 companies, 19 displayed at least one privacy seal at the time of data collection. Varying in length from 575 to 6,139 words, the privacy policies resulted in a text corpus of 108,570 words.
Overall, 39.4% of these questions could not be answered because the policies did not contain sufficient information. Table 2a gives a breakdown of these questions by category, showing the results obtained for all 50 companies and for those 19 companies among them that display privacy seals. The high proportion of unanswered questions pertaining to third-party data collection can be attributed to the fact that not every company allows third parties to collect data and therefore does not mention it in its policy. However, the high proportion of companies disclosing insufficient information about data collection, storage, sharing, and spam email clearly shows that privacy policies do not cover data handling practices in a satisfactory manner. Users cannot be sure whether companies do not engage in such practices or simply fail to mention that they do. It is also worth noting that the results for those companies displaying a privacy seal are only slightly better than those of the total sample, suggesting that privacy seals are no guarantee of comprehensive privacy policies.
Data storage stands out as one area of insufficient disclosure among both the total sample and those companies displaying privacy seals. While no company said it did not take steps to ensure secure offline storage and prevent unauthorized employee access or that users cannot delete their personal information, the level of disclosure on these aspects of data storage was never higher than 32% for the total sample. These results call for more detailed disclosure of data storage practices and users' control over data stored about them.
Since data sharing is one of the most prevalent concerns among Internet users, the coverage of this practice was examined in more detail. Table 2b indicates whether or not the companies share aggregate or personally identifiable information (PII) with either affiliated companies or third parties other than business agents. The proportion of companies providing no relevant information is alarming, particularly regarding the sharing of aggregate information and PII with affiliates. The high percentage of companies admitting to sharing personal data with affiliates is also worth noting, considering that affiliates may maintain completely different privacy policies.
Companies also admit to practices such as selling user data, sharing data obtained through sweepstakes and contests, allowing third parties to collect data, sending unsolicited communications to registered users, or sharing email addresses with third parties. This makes it all the more important that users read privacy policies to become aware of what can happen to their data and to be able to make an informed decision as to whether or not they want to disclose personal information on a Web site.
The analysis of the language of privacy policies was based on critical linguistics, a method that seeks to uncover how authors of texts use language to construct their own versions of reality. In the context of privacy policies, this "version of reality" refers to how companies present their data handling practices to their readers. The goal of this analysis was to determine why privacy policies are difficult to understand and why readers do not consider them worth reading [1, 9]. Among the parameters put forward in critical linguistics, the following four were suitable for the analysis of privacy policies:
Lexical Choice. The analysis of the vocabulary has revealed that companies sugar-coat data handling practices by foregrounding positive aspects and backgrounding privacy invasions. These enhancements of data handling practices occur, for example, when companies claim the email messages they send to registered users are of "interest to them" or that the parties they share information with are "carefully selected." Also, companies choose verbs that exclude themselves in order to remove themselves from statements disclosing unethical practices. For example, they state that "you receive" unsolicited email messages instead of "we send" them.
Lexical choice also plays a role when companies talk about opt-in/opt-out facilities for certain practices. The framing of opt-in or opt-out messages has been found to influence people's privacy preferences. In the policies examined, companies use phrases such as "only when authorized," "if you authorize us," or "not without your permission" to describe practices relating to unsolicited commercial email. However, these lexical choices do not make it clear whether this authorization is the result of opting in or of not opting out. Users may thus not be aware they have given authorization to a company by not opting out.
Syntactical Transformations. These were found in connection with data sharing, where companies avoid using "we share" and instead use nouns ("the sharing") or switch to passive structures in order to distance themselves from such practices. For example: "We want you to know about the personal information we collect, how we use that information and with whom it may be shared." Since such passive structures do not make it explicit who is responsible for an action, it seems that companies try to de-emphasize the fact they share information with third parties.
Negation. To deny certain practices, negations were used frequently throughout the corpus, but not frequently enough, as the content analysis here has revealed. "Not," for example, is the ninth most frequent word in the corpus, not counting grammatical words such as articles, prepositions, and pronouns. Although negative statements are generally more difficult to process for humans than positive ones, explicitly stating that a certain practice is not carried out is indispensable in easing users' privacy concerns. Negation is also used together with rhetorical hedges, as in "except as otherwise stated we do not [...]." Such phrases give carte blanche to the company to engage in any practice not expressly ruled out but provide little information about what actually happens with user data.
Modality. Essentially, modal verbs and adverbs make sentences vague. The corpus of privacy policies contains 948 instances of "may" and 123 instances of "might," "perhaps," "sometimes," "occasional(ly)," and "from time to time," all of which are instances of modality. "May" is, in fact, the fourth most frequent non-grammatical word in the corpus, topped only by "information," "use," and "site." These modality markers downplay the frequency with which companies carry out certain data handling practices. In addition, legal expressions such as "we reserve the right to" are reflections of modality, allowing several interpretations as to whether these practices are carried out or not.
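Counts of this kind can be obtained with a simple word-boundary search. The following is a minimal sketch, assuming the corpus is available as a list of plain-text policy strings; the article does not describe the tooling actually used, so the function names, the tokenization, and the short stop-word list are hypothetical. The marker list is taken from the paragraph above.

```python
import re
from collections import Counter

# Modality markers named in the text; the multi-word marker is matched as a phrase.
MODALITY_MARKERS = ["may", "might", "perhaps", "sometimes",
                    "occasional", "occasionally", "from time to time"]

# Illustrative stop-word list (assumption); a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or",
              "we", "you", "our", "your", "with", "for"}


def count_modality(policies):
    """Return a Counter mapping each marker to its number of occurrences."""
    counts = Counter()
    for text in policies:
        lowered = text.lower()
        for marker in MODALITY_MARKERS:
            # \b keeps "may" from matching inside "maybe" or "dismay".
            # Note: the month name "May" would still be counted.
            pattern = r"\b" + re.escape(marker) + r"\b"
            counts[marker] += len(re.findall(pattern, lowered))
    return counts


def rank_content_words(policies, top_n=5):
    """Most frequent words after dropping the (illustrative) grammatical words."""
    words = re.findall(r"[a-z]+", " ".join(policies).lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(top_n)
```

Calling count_modality on the list of policy texts yields per-marker totals, and rank_content_words gives a frequency ranking comparable in spirit to the one reported for "may."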
The rhetorical features that emerged from this analysis were grouped into four broad patterns (see Table 3), namely mitigation, enhancement, obfuscation, and omission. Two sample sentences containing these patterns illustrate how companies use language to construct their own privacy realities:
The main goal of the linguistic analysis was to identify realities created through language rather than to produce quantitative indices of language. However, such indices are necessary to examine differences in communicative quality between companies with privacy seals and those without seals. Table 3 compares the average number of instances of each pattern in the privacy policies of both types of companies, capturing occurrences of the textual realizations listed in the table as well as similarly worded phrases. The results indicate that the privacy policies of companies with seals are by no means less ambiguous than those of companies without seals, suggesting that compliance with a seal program impacts content but not language.
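The comparison described above amounts to averaging per-policy pattern counts within each group. The following minimal sketch shows that calculation; the pattern names come from the text, while the record layout and the function name are hypothetical and no figures from Table 3 are reproduced.

```python
from collections import defaultdict
from statistics import mean


def average_by_seal(policies):
    """Average per-policy pattern counts separately for sites with and without seals.

    `policies` is a list of dicts such as
    {"has_seal": True, "counts": {"mitigation": 3, "omission": 1}}
    (hypothetical layout, used only for illustration).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for policy in policies:
        for pattern, n in policy["counts"].items():
            grouped[policy["has_seal"]][pattern].append(n)
    # Result maps has_seal (True/False) -> pattern -> mean count per policy.
    return {seal: {pattern: mean(values) for pattern, values in patterns.items()}
            for seal, patterns in grouped.items()}
```

Similar averages in the two groups would correspond to the finding that seal programs affect what policies say but not how they say it.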
The findings noted here suggest that online privacy policies have been drafted with the threat of privacy litigation in mind rather than commitment to fair data handling practices. The content analysis has revealed that these documents fail to address important areas of user concern. We do not know whether companies simply do not mention practices they do not engage in or whether they abuse their knowledge about and control over user data by deliberately withholding information. Users would have more trust in a company's Web site if they could learn from a privacy statement not only what the company does with user data but also what it does not do. Thus, when drafting privacy policies, companies should focus not just on their own practices but also take into account the wider context of data handling on the Internet and address practices they do not engage in as well.
The linguistic analysis has shown that companies obscure privacy infringements by downplaying their frequency, mitigating or enhancing questionable practices, and omitting references to themselves when they talk about unethical data handling practices. One cannot safely say whether these rhetorical patterns are merely the chance product of poor writing skills or whether they are a manifestation of strategic ambiguity aimed at deceiving and confusing readers. At any rate, IS managers must be aware of the effects vague language has on readers and should tailor their privacy policies better to Internet users' information needs by representing data handling practices in a more accurate manner.
Changes are needed not only in the content and language of privacy policies but, most importantly, also in their presentation format. Tables would be a more suitable vehicle than narrative text, as they make content deficiencies evident right away and eliminate the problem of ambiguous language altogether. eBay, for example, posts a static chart summarizing parts of its text-based policy as an appendix, but does not exploit it to its fullest potential.
A more effective solution would be to present different types of data (for example, sales data, data from surveys, data from sweepstakes, click-stream data, and so on) and data handling methods (collecting, storing, sharing, selling, sending emails, and so on) in a matrix with each cell being clickable and leading the user to a plain-language explanation of when this data handling practice is carried out for this specific type of data. Chopping the information into manageable chunks and letting users decide which parts they want to read would make privacy policies more reader-friendly. This would, for example, spare users the need to read through the entire document just to check whether a company shares email addresses.
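The matrix proposed above can be modeled as a lookup keyed by data type and handling practice. The following is a minimal sketch of that data structure; the row and column labels follow the examples in the text, while the class name, method names, and the fallback message for empty cells are hypothetical.

```python
from dataclasses import dataclass, field

# Row and column labels taken from the examples in the text.
DATA_TYPES = ["sales data", "survey data", "sweepstakes data", "click-stream data"]
PRACTICES = ["collecting", "storing", "sharing", "selling", "sending emails"]


@dataclass
class PolicyMatrix:
    """Holds one plain-language explanation per (data type, practice) cell."""
    cells: dict = field(default_factory=dict)

    def set_cell(self, data_type: str, practice: str, explanation: str) -> None:
        if data_type not in DATA_TYPES or practice not in PRACTICES:
            raise KeyError(f"unknown cell: ({data_type!r}, {practice!r})")
        self.cells[(data_type, practice)] = explanation

    def explain(self, data_type: str, practice: str) -> str:
        """Text a user would see after clicking a cell."""
        return self.cells.get((data_type, practice), "Not disclosed in this policy.")

    def undisclosed(self) -> list:
        """Cells without an explanation, i.e. the gaps a table makes visible."""
        return [(d, p) for d in DATA_TYPES for p in PRACTICES
                if (d, p) not in self.cells]
```

A user who only wants to know whether click-stream data is shared would call explain("click-stream data", "sharing") instead of reading the whole document, and undisclosed() lists the cells a policy author has left empty.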
Certainly, narrative privacy policies cannot be eliminated altogether, as they protect businesses if privacy litigation is brought against them, but more reader-friendly alternatives to conventional privacy policies should be offered to prevent poor writing skills or strategic ambiguity from undermining user trust. It is up to IS managers and system designers to take a proactive stance and give Internet users the knowledge and control they need to make informed decisions about their personal data.
1. Antón, A.I., Earp, J.B., He, Q., Stufflebeam, W., Bolchini, D., and Jensen, C. The lack of clarity in financial privacy policies and the need for standardization. IEEE Security and Privacy 2, 2 (Mar. 2004), 36–45.
5. Earp, J.B., Antón, A.I., Aiman-Smith, L., and Stufflebeam, W.H. Examining Internet privacy policies within the context of user privacy values. IEEE Trans. Engineering Management 52, 2 (May 2005), 227–237.
11. Olivero, N., and Lunt, P. Privacy versus willingness to disclose in e-commerce exchanges: The effect of risk awareness on the relative role of trust and control. J. Economic Psychology 25, 2 (Apr. 2004), 243–262.
12. Turner, E.C., and Dasgupta, S. Privacy on the Web: An examination of user concerns, technology, and implications for business organizations and individuals. Information Systems Management (Winter 2003), 8–18.
©2007 ACM 0001-0782/07/0900 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.