Research and Advances
Computing Applications Privacy and security in highly dynamic systems

Personalization in Privacy-Aware Highly Dynamic Systems

Enabling novel ways to personalize the relationship with customers without sacrificing their privacy.

Fifteen years after Mark Weiser’s inspiring article on ubiquitous computing [11], his vision has become technically feasible. Objects of everyday use are becoming increasingly interconnected, and mobile communication involving devices of all sizes and bandwidths is used in various ways. Highly dynamic information systems (HDS) are emerging, bringing new challenges for the management of information systems: components enter and leave the system spontaneously and act autonomously. The changing and possibly conflicting requirements of the individual components must be taken into account, which demands dynamic negotiation of requirements. Moreover, such highly dynamic systems must be able to contend with the constant growth of communicated data, rapidly collected and accumulated in various forms.

Solving the challenges of HDS also promises considerable economic potential. A first realization is the ongoing rollout of RFID by major retail groups worldwide. Currently, cost savings through process automation are of prime importance, but the use of this technology in retailing goes beyond mere productivity improvements. Tagging items with RFID chips in combination with other wireless technologies, equipping customers with mobile communication devices, and using sensor networks allow, for example, the personalization of services that have so far been used successfully in client-server e-commerce scenarios [5].


From Anonymous to Personalized Shopping Experience

The Internet has substantially changed the nature of personalization. As depicted in Figure 1, three ways of tailoring services to customers can be distinguished. Online retailers today use the Internet on a large scale to recommend products to known customers according to their previous purchases or interests [9]. These personalized services build upon a one-to-one communication channel and require personal data as an essential input. Retailers also use the Internet to offer individualized services, which require not personal data but context data: for instance, products can be recommended according to the sequence of clicks, the pages requested, or the items added to the shopping cart. Since such individualized services can be realized without necessarily identifying customers, they allow an improved shopping experience while maintaining anonymity. A third means of tailoring services to customers involves universal services, such as a product search function or access to customer reviews, which need neither personal nor context data. Even so, these services are a form of personalization because a single customer can choose a service that meets his or her needs at a particular time. All three kinds of services can be part of a personalization strategy with the objective of building up customer relationships, increasing customer satisfaction, generating a “lock-in” situation, and ultimately realizing greater product or service turnover.

Today, consumers are faced with thousands of products in a physical store and must often search a large area to find a particular item. The introduction of highly dynamic systems in stationary retailing enables an electronic one-to-one communication channel and allows the collection of context data as cheaply and effectively as in current e-commerce environments. In grocery stores such as the Extra-Future-Store in Germany, computers with a touch screen attached to a shopping cart are deployed as personal shopping assistants (PSA) [4]. Today, these devices are equipped with a bar-code reader, and customers can interact with the retailer’s information system over WLAN. Future forms of interaction may include customers using their mobile phones to communicate with RFID-tagged products and the retailer’s information system [10]. Furthermore, sensors embedded in customers’ clothing or products might also become the subject of interactions. Such a technical infrastructure enables the context of each customer to be taken into account, for example the current position within the store or the products in the cart. By combining all this context data in real time with customers’ personal data and profiles already stored in the information system, the retailer can use the electronic interaction channels (PSA, mobile phone) to enrich customers’ shopping experiences.

Imagine a customer equipped with an appropriate mobile communication device entering a store. To find a certain product, the customer can enter its name into the device and have its location displayed. To obtain additional information, for example a list of possible recipes using this product or information about its origin, the customer scans the RFID-tagged article. Retailers are able to provide such universal services to all customers without necessarily taking the differences between them into account. Individualized services, however, additionally require data about the customer’s context as an input. For instance, a shopping list can be used to optimize the route through the store for time-sensitive or handicapped shoppers. Moreover, special offers or purchasing suggestions can be displayed according to the position of the cart within the store and the products in the cart. The mobile device can also show a running total of current purchases at any time, thereby enabling the customer to control expenditure. Finally, offering personalized services requires personal data such as name, age, purchasing history, or membership in a customer program. By identifying the customer, for example by means of an RFID-tagged customer card, the display can show further items as suggestions based on former purchases. Combining context and personal data is also useful. On the way through the store, special offers can be displayed on the screen according to position and personal needs such as fat-free or whole-food products. In this manner, allergy sufferers can, for instance, be warned about certain ingredients of products. And thanks to personalized automatic checkouts, the customer has no need to rummage for cash, pull out cards, or wait in line at the checkout counter [4].



Risks of Personalization

Although the economic potential of personalization in stationary retailing seems lucrative for retailers and customers alike, retail groups have slowed their activities in this area. While Wal-Mart combined RFID-tagged articles with video surveillance, the German Metro Group tried to establish customer loyalty cards with embedded RFID tags. After sharp criticism from privacy activists, however, Metro dropped the use of RFID tags in its cards, and Wal-Mart also stopped its RFID-based surveillance1 [2]. If customers were to refuse the processing of context data within the store altogether, neither individualized nor personalized services would ever come into being. An analysis of the decisive privacy concerns shows that it is the loss of control over personal data that worries customers. In a survey of more than 1,000 U.S. customers, two-thirds identified the likelihood that RFID would lead to their data being shared with third parties as a major concern.2


Extensive Data Collection Leads to a Loss of Control

Exploiting sensor networks, RFID identification, automatic video surveillance, localization technologies, and other technologies in HDS undermines the users’ desire to control personal data. Extensive and unobservable data collection is an inherent characteristic of HDS:

  • Data is increasingly collected without any indication; there will be no red indicator light on each device signaling the recording of data [3].
  • Data collection takes place without any predefined purpose; for example, the shopping cart continuously determines and reports its position to the retailer’s information system. This information can be used for optimizing the store arrangement, for generating purchase suggestions, as well as for identifying the customer.
  • Data, once collected, is persistent and will not be deleted, owing to the continuously decreasing cost of data storage.
  • Different devices record each event simultaneously from different viewpoints; for example, a customer browsing a product is recognized by the smart shelf as well as by the video surveillance or the shopping cart. Combined with further context data, these different views allow recognition, or even identification, of the customer.
  • Recording devices register multiple events simultaneously; for example, video surveillance can record customer A browsing a certain product, customer B passing through the aisle, and customer C stealing an item. The interpretation of the logged raw data for various purposes and the extraction of single events make it impossible in some cases to assign a valid privacy policy.

The realization of an HDS leads to a paradigm shift in data collection and facilitates relating context data to individuals. The borderline between context data and personal data increasingly blurs.


Today’s Privacy Technologies Support Obscurity

The inherent data collection in HDS undermines present-day privacy-enhancing technologies [3], as they are all based on concealing data—a privacy approach referred to as “obscurity” throughout this article. Today’s privacy mechanisms are thus incompatible with the objective of any retailer to provide both useful personalized services and assured privacy and security.

A classification of privacy mechanisms is given in the table “Privacy and Transparency.” In the rows, the mechanisms are classified according to what they control: access or usage. While access control is usually understood as ex ante defined authentication and authorization, usage control extends access control and encompasses all those mechanisms that deal with the runtime detection of privacy violations. In the columns, guidelines, mechanisms, and approaches for privacy are distinguished according to whether they enable each of the three forms of personalization.

Anonymity, for example, prevents personalized services that require identification of the customer. Pseudonyms and identity management, the solutions most favored by science and industry, allow personalized services. Both privacy mechanisms follow the obscurity approach and rely on controlled disclosure of data, reducing disclosure to the minimum necessary to perform a given transaction. As a result, personalization is limited to the amount of disclosed data. However, the extensive and unobservable collection of context data for providing individualized services already allows the recognition of customers, because transactions are part of a chained process: filling the shopping cart, walking through the aisles, scanning products, and payment.

Obscurity, as a privacy approach for personalization in HDS, is inadequate. Once access to data is granted, customers have no control over how the data is used—irrespective of the retailer’s initial intention. Proof of being an “honest” retailer acting according to data protection laws and the declared privacy policy can be produced by making data storage and data usage transparent. Institutions providing a first step toward transparency already exist: certification authorities, trusted third parties, privacy seals, codes of conduct, or privacy policies implemented as a predefined agreement regarding data usage. A promising approach is to supply tools to define individualized privacy and security policies and languages to express them. Currently, the most favored language for expressing privacy policies is P3P, the Platform for Privacy Preferences. P3P uses XML specifications that state what kind of data is to be stored; how the data is to be used; and its permanence and visibility, that is, how long the data is to be stored and the corresponding access rights. Customers, admittedly, can express their desires but are not able to control the usage of their data. On the retailer’s side, the rules for access are derived from the specified and possibly individualized privacy policies, for example by translating a valid P3P policy into EPAL (Enterprise Privacy Authorization Language), a formal language for expressing fine-grained enterprise privacy policies.
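To make the path from declared policy to access decision concrete, the following sketch derives an EPAL-style access rule from a P3P-style statement. The dictionary layout and the `access_allowed` helper are hypothetical simplifications for illustration only; real P3P policies are XML documents, and EPAL supports far finer-grained rules.

```python
# A hypothetical, simplified representation of a P3P-style statement:
# what data is stored, for which purposes, for how long, and who may see it.
policy = {
    "data": "purchase-history",
    "purpose": {"recommendation"},   # permitted usage purposes
    "retention_days": 90,            # permanence: how long data may be kept
    "visibility": {"retailer"},      # who is granted access
}

def access_allowed(policy, requester, purpose, age_days):
    """Derive an access decision from the declared policy (EPAL-style enforcement)."""
    return (requester in policy["visibility"]
            and purpose in policy["purpose"]
            and age_days <= policy["retention_days"])
```

A request by a third party, for an undeclared purpose, or after the retention period would be denied, while the retailer's recommendation service is served; the decision is mechanical once the policy is formalized.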

A highly dynamic system is only privacy-aware if it enforces formalized and personalized privacy policies. Such enforcement can be based upon past information (access control mechanisms) or present and derived information (usage control). Enforcement can be achieved by an information system that has been proven to fulfill the desired properties, in particular self-limitation, and can expect to gain customers’ trust by the resultant transparent access to personal data.

However, the characteristics of HDS restrict the effectiveness of formulated policies with regard to their adaptation. On the one hand, the autonomous components increase the complexity of modeling the system and hinder proofs of its behavior. On the other hand, the changing manner of data collection rules out assigning a formulated privacy policy to the personal data it governs, which is required for enforcement: for example, data may be collected outside the scope of a formulated policy; data collected by multiple devices is not integrated and related to a policy in real time; and data describing different, inherently interwoven events may lead to conflicting policies. Technically, research could pursue the development of an adaptive “P3P” or the control of the actual usage of data. First efforts attempt to prevent unintended usage of data in real time, as pursued, for example, by Park and Sandhu [6] or in the article by Pretschner et al. in this section.


Privacy Transparency by Evidence Creation

Instead of seeking an ex ante approach to privacy transparency, we introduce the concept of privacy evidence for ex post control of privacy policies. Transparency in HDS is provided by a cooperative mode between technology for detection and enforceable privacy contracts. The enforcement of privacy contracts requires that all involved parties be able to detect privacy violations—for example, by means of an audit—and to document them in a way that is acceptable as evidence, such as in a legal dispute. As depicted in Figure 2, the creation of evidence depends on two elements: policies as a reference for compliant usage of data, and log views that encompass all data about an individual stored in an information system.

Today’s state of the art for contract representation is P3P. However, P3P cannot express composed privacy policies, in particular policies involving multiple, hierarchical departments or enterprises. These limitations are addressed by NAPS, the Novel Algebraic Privacy Specification [7]. Analogously to P3P, NAPS offers conjunction, composition, and scoping operators for policies, but exhibits desirable algebraic properties. This extension is relevant in HDS because it allows a distributed evaluation of composed policies. Although a practical realization is not yet available, NAPS demonstrates that there is no lack of expressive power for representing contracts. In other words, we can—at least theoretically—adequately represent contracts.
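The algebraic flavor of such policy operators can be conveyed with a toy model. Here a policy is reduced to a set of allowed (data category, purpose) pairs and conjunction to set intersection; this merely sketches the idea of combining department-level policies and is not the formal NAPS semantics defined in [7].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """Toy policy: the set of (data category, purpose) pairs a party allows."""
    allowed: frozenset

    def conjunction(self, other: "Policy") -> "Policy":
        # A usage is permitted only if both component policies permit it.
        return Policy(self.allowed & other.allowed)

    def permits(self, data: str, purpose: str) -> bool:
        return (data, purpose) in self.allowed

# Two department-level policies combined into one enterprise policy.
marketing = Policy(frozenset({("purchases", "recommendation"), ("position", "routing")}))
legal     = Policy(frozenset({("purchases", "recommendation"), ("purchases", "audit")}))
combined  = marketing.conjunction(legal)
```

Because conjunction here is just intersection, it is associative and commutative, which hints at why algebraic properties matter: departments can evaluate and combine their policies in any order, supporting the distributed evaluation mentioned above.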


Secure Logging to Ensure Authenticity of Log Data

Log views are the second requisite for creating privacy evidence, as they are generated from log data. However, standard logging mechanisms—such as syslog or syslog-ng—cannot be used for evidence creation, as they fail to provide the necessary authenticity guarantees for log data. Secure logging is therefore required, and the central question concerns the characteristics log data must exhibit to be accepted as evidence.

Authenticity of log data comprises: confidentiality—log entries cannot be viewed or accessed by unauthorized individuals; integrity—the log entries are accurate (entries have not been modified), complete (entries have not been deleted), and compact (entries have not been illegally added to the log file); and uniqueness—log data shall not allow for parallel realities. To realize these properties, proposals such as reliable syslog or Schneier and Kelsey’s secure audit logs [8] are the only conceptual guidelines available today. Based on these existing guidelines, we developed a secure logging protocol that ensures the authenticity of log data in a way suitable for evidence creation.

Figure 3 illustrates how secure logging is realized; its details and an extension for the remote collection of log data can be found in [1]. Standard cryptographic techniques are employed. Evolving cryptographic keys—denoted by S—ensure not only confidentiality but also forward integrity, meaning that if an attacker succeeds in taking over a logging device at time t, the log data stored before t cannot be compromised. Hash chains, denoted by HC, guarantee integrity by creating interdependencies between entries. As a side effect, hash chains also provide tamper evidence and uniqueness guarantees for the log data. Finally, entry-level access rights, denoted by AR, provide a way of controlling who has access to the log data. These access rights could be derived, for example, from a user’s privacy policies.
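A minimal Python sketch of these three ingredients follows. It illustrates the general scheme (evolving key S, hash chain HC, entry-level access rights AR) using standard-library primitives; it is not the actual protocol of [1], and all names are ours.

```python
import hashlib
import hmac

def evolve_key(key: bytes) -> bytes:
    """Derive the next epoch key; the old key is discarded (forward integrity)."""
    return hashlib.sha256(b"evolve" + key).digest()

class SecureLog:
    def __init__(self, initial_key: bytes):
        self.key = initial_key          # evolving secret S
        self.last_hash = b"\x00" * 32   # head of the hash chain HC
        self.entries = []

    def append(self, message: bytes, access_rights: str) -> None:
        # Chain each entry to its predecessor so that modification, deletion,
        # or reordering breaks the chain (integrity, uniqueness, tamper evidence).
        entry_hash = hashlib.sha256(self.last_hash + message).digest()
        # Authenticate the entry under the current epoch key.
        tag = hmac.new(self.key, entry_hash, hashlib.sha256).digest()
        # access_rights (AR) records who may later read this entry.
        self.entries.append((message, access_rights, entry_hash, tag))
        self.last_hash = entry_hash
        self.key = evolve_key(self.key)  # past entries stay safe after takeover

def verify(entries, initial_key: bytes) -> bool:
    """Recompute chain and tags from the initial key (held by the verifier)."""
    key, last_hash = initial_key, b"\x00" * 32
    for message, _rights, entry_hash, tag in entries:
        if hashlib.sha256(last_hash + message).digest() != entry_hash:
            return False  # chain broken: entry modified, deleted, or reordered
        expected_tag = hmac.new(key, entry_hash, hashlib.sha256).digest()
        if not hmac.compare_digest(expected_tag, tag):
            return False  # tag forged or computed with the wrong epoch key
        last_hash, key = entry_hash, evolve_key(key)
    return True
```

Because the device only ever holds the current epoch key, an attacker who seizes it at time t cannot recompute valid tags for entries written before t; the verifier, starting from the initial key, detects any such rewriting.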


Log Views are a Basis for Evidence Creation

Albeit essential, secure logging is not enough to create evidence: views on logged data—conceptually similar to database views—are required but not yet available. Log views are compilations of log entries encompassing all data collected about a user. Where log data can be directly assigned to a user and the related policies, generating log views is straightforward. For instance, in a P3P/EPAL setting, where the recorded data and the corresponding policies are stored together, a log view is just a query on the log file parameterized by the user’s identification.
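In such a direct-assignment setting, a log view reduces to a filter over structured log entries, as the following sketch shows; the field names are illustrative assumptions, not a prescribed log format.

```python
def log_view(entries, user_id):
    """Compile all log entries recorded about one user (cf. a database view).

    Assumes each entry carries an explicit subject field and the policy it
    was collected under, as in a P3P/EPAL setting.
    """
    return [e for e in entries if e["subject"] == user_id]

# Hypothetical log entries with subject and governing policy stored together.
entries = [
    {"subject": "alice", "event": "entered store",    "policy": "p3p-basic"},
    {"subject": "bob",   "event": "scanned item",     "policy": "p3p-basic"},
    {"subject": "alice", "event": "paid at checkout", "policy": "p3p-loyalty"},
]
view = log_view(entries, "alice")  # every entry recorded about alice
```

Each entry in the view carries its governing policy, so the view can be checked entry by entry against what the customer actually consented to.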

However, there are cases where this assignment is not directly possible, or possible only with a certain probability. In HDS, a large variety of events are recorded as isolated pieces of information without any explicit relationship to the surrounding context. Moreover, HDS follow unspecified, unforeseen, and sometimes even chaotic patterns. This complicates the automated generation of precise log views. Techniques to generate log views include guessing particular situations and measuring their plausibility against known facts, as well as extensive data mining to search for specific patterns in recorded data. The results can, in some cases, be doubtlessly associated with the corresponding customer, but in the majority of cases a probabilistic estimation is the best one can get. Current efforts such as the Web Ontology Language provide an accurate description of context data and processes and could lead to more precise log views.

The completeness of evidence generated based on log views remains an unresolved issue. In particular, it is currently impossible to exclude the existence of “shadow” log files hidden from a user. For example, “covert channels” within the system could redirect data to secondary log files not considered when generating log views. While trusted computing platforms could be used to attest the behavior of a data collector, guaranteeing completeness in HDS is even more challenging because of the inherent extensive and unobservable data collection. In consequence, regulatory institutions such as certification standards or legal advisory boards may be the only solution.


Conclusion

Highly dynamic systems enable several novel ways to personalize the relationship with the customer in stationary retailing. For this, the extensive collection and use of personal and context data are essential, but they inherently raise privacy concerns: customers increasingly lose control over, and awareness of, what data is captured and how it is used. Such concerns considerably undermine the success of future personalization strategies. In highly dynamic systems, transparency with regard to the utilization of data is a reasonable way to maintain privacy. The concept of privacy evidence discussed in this article is an initial step in this direction, as it permits an objective view of the data collected about a customer. Evidence could be used as a “sword” for the customer to incriminate a retailer in the case of misuse, or as a “shield” for the retailer to prove privacy-compliant usage. Privacy evidence paves the way not only to transparency, but also to an acceptable deployment of highly dynamic systems.


Figures

Figure 1. The personalization pyramid.

Figure 2. Privacy evidence creation.

Figure 3. Realization of secure logging.


Tables

Table. Privacy and transparency.

References

    1. Accorsi, R. On the relationship of privacy and secure remote logging in dynamic systems. In S. Fischer-Hübner et al., Eds., Security and Privacy in Dynamic Environments, IFIP International Federation for Information Processing, Volume 201, Springer-Verlag, 2006, 329–338.

    2. Köpsell, S., Wendolsky, R., and Federrath, H. Revocable anonymity. In G. Müller, Ed., ETRICS 2006, Lecture Notes in Computer Science 3995, Springer-Verlag, 2006.

    3. Langheinrich, M. Personal privacy in ubiquitous computing—Tools and system support. Ph.D. dissertation, ETH Zurich, Switzerland, May 2005.

    4. Litfin, T. and Wolfram, G. New automated checkout systems. In M. Krafft and M.K. Mantrala, Eds., Retailing in the 21st Century: Current and Future Trends, 2006.

    5. Murthi, B.P.S. and Sarkar, S. The role of the management sciences in research on personalization. Management Science 49, 10 (Oct. 2003).

    6. Park, J. and Sandhu, R. The UCONABC usage control model. ACM Transactions on Information and System Security 7, 1 (Feb. 2004).

    7. Raub, D. and Steinwandt, R. An algebra for enterprise privacy policies closed under composition and conjunction. In G. Müller, Ed. ETRICS 2006, Lecture Notes in Computer Science 3995, Springer-Verlag, 2006.

    8. Schneier, B. and Kelsey, J. Security audit logs to support computer forensics. ACM Transactions on Information and System Security 2, 2 (May 1999), 159–176.

    9. Srikumar, K. and Bhasker, B. Personalised recommendations in e-commerce. Int. J. Electronic Business 3, 1 (2005).

    10. Strüker, J. and Sackmann, S. New forms of customer communication: Concepts and pilot projects. In Proceedings of the Americas Conference on Information Systems (AMCIS ’04), (Aug. 2004, New York).

    11. Weiser, M. The computer for the 21st century. Scientific American (Sept. 1991).

    12. Wohlgemuth, S. and Müller, G. Privacy with delegation of rights by identity management. In G. Müller, Ed., ETRICS 2006, Lecture Notes in Computer Science 3995, Springer-Verlag, 2006.

    1Chicago Sun-Times. Chipping away at your privacy (Nov. 9, 2003).

    2RFID and Consumers: Understanding Their Mindset. Commissioned by Capgemini and the National Retail Federation; www.nrf.com/download/NewRFID_NRF.pdf.
