Email in Personal Information Management

Email's conduit function means the inbox, folders, search, and sort are used to support core PIM functions of task management, personal archiving, and contact management.

By Steve Whittaker, Victoria Bellotti, and Jacek Gwizdka

Posted Jan 1 2006

For many of us, work is interpersonal rather than solitary, and email is the main conduit through which that work and its related information are distributed [1]. We tend to live in our email, as reflected in the amount of time we spend using it and our evaluation of its importance in everyday work. Email’s role as conduit naturally leads to it being used for three key functions in personal information management (PIM): task management, personal archiving, and contact management.

Task management involves reminding ourselves of current tasks, tracking task status, and maintaining relevant information. Email’s conduit function leads many of us to exploit our inbox for task management. We leave information about current tasks there, knowing that when we open it and scan its contents, we’ll be reminded about outstanding tasks [7, 12]. We even send ourselves email to put messages in our inbox as reminders and perhaps as links to useful information. Some of us also organize email relating to current tasks into active folders, returning to them as needed [1].

We also use email for personal archiving. Reference information delivered through email or information about completed tasks often ends up in email folders for future use [1, 12]. And for many of us, because email is our primary work conduit, it is also natural that we use it to store contact information [11].

The original email applications that emerged some 20 years ago were not designed for PIM, so they offer no direct support for the PIM activities we now carry out in them. For example, we may schedule meetings and appointments using email, but email itself does not provide dedicated support for calendaring functions.

Two information access problems—fragmentation and lack of direct support for PIM functions—arise from performing PIM functions in email. Fragmentation results when information delivered through email is left there rather than relocated to dedicated PIM applications (such as contact managers, calendars, to-do lists, and a user’s personal file system). Information may be left in email due either to the effort involved in relocating it to a separate application or to the feeling that it is more meaningful and accessible in email. For example, the sender—often the most salient retrieval cue for an email attachment—is unavailable if the attachment is accessed through the user’s file system [1]. By attempting to retain such contextual information, we may end up duplicating folder hierarchies, leading related documents to be stored partially in email and partially in the file system, making it difficult to collate information [3].

Two architectural and text-processing techniques—centralization and information extraction—have been proposed by interface researchers at PARC [1] and the University of Sheffield [11] and by information retrieval researchers at Microsoft [4] to address these problems. Centralization addresses fragmentation by locating all PIM in email and provides direct PIM support by explicitly building PIM functions into email. For example, Microsoft Outlook applies this approach to provide task management, contact management, and calendaring within a single application. Information extraction takes the opposite view, looking to migrate PIM functions and information from email into dedicated applications to provide direct PIM support. It addresses fragmentation by making email data accessible to those applications (see the article by Karger and Jones in this section).

Email As Unifying Application

One weakness of the centralization approach is that current email clients do not handle all PIM functions well [1, 12]. How then must an email application be modified to explicitly support the core PIM functions of task management, personal archiving, and contact management?

Task management. Although many of us leave task-related information in our inboxes as a way to manage tasks, this approach does not scale well when we receive lots of messages. The disorganized collection of messages that accumulate in an inbox decreases the salience and accessibility of individual messages, which are often pushed out of sight by incoming items. The alternative strategy of placing messages in active folders has the advantage of grouping messages so they can be worked on together more efficiently and coherently. But it works only if we develop the habit of returning to inspect our folders, as most of us do with our inbox.

Yet another way to support task management is to classify messages by task in the inbox itself. Classifying inbox messages makes it easier to process tasks; related items can be collapsed into a single list item seen every time we access the inbox. This reduces overall inbox clutter, increasing task salience and improving how the application reminds us about current tasks. A number of visualization applications have been developed to represent inbox tasks, including tree representations and flat representations of information related to specific tasks [5, 9, 10]. However, one notable limitation of these approaches is their reliance on threads to determine whether messages relate to a common task. Threads are known as a weak indicator of tasks due to topic drift and email responding practices. For this reason, the authors of [1] developed the idea of “thrasks,” or user-customizable collections based on threads. Users can add unthreaded items to the collection or remove them, so a thrask represents a task collection more than just a series of messages.
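The thrask idea can be sketched as a small data structure. This is an illustration only, not the implementation described in [1]: the names `Message`, `Thrask`, `seed_thread_id`, `added`, and `removed` are hypothetical, and we assume a thrask starts from one thread and is then adjusted by the user.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    msg_id: str
    thread_id: str
    subject: str


@dataclass
class Thrask:
    """A task collection seeded from one thread, then edited by the user (hypothetical model)."""
    name: str
    seed_thread_id: str
    added: set = field(default_factory=set)    # ids of unthreaded messages the user pulled in
    removed: set = field(default_factory=set)  # ids of threaded messages the user dropped

    def add(self, msg_id: str) -> None:
        self.removed.discard(msg_id)
        self.added.add(msg_id)

    def remove(self, msg_id: str) -> None:
        self.added.discard(msg_id)
        self.removed.add(msg_id)

    def members(self, mailbox):
        """Return the messages in this thrask, preserving mailbox order."""
        return [
            m for m in mailbox
            if m.msg_id not in self.removed
            and (m.thread_id == self.seed_thread_id or m.msg_id in self.added)
        ]


mailbox = [
    Message("m1", "t1", "Budget draft"),
    Message("m2", "t1", "Re: Budget draft"),
    Message("m3", "t1", "Re: Budget draft (lunch on Friday?)"),  # topic drift inside the thread
    Message("m4", "t2", "Final budget figures"),                  # related, but in another thread
]

budget = Thrask(name="Budget", seed_thread_id="t1")
budget.remove("m3")  # drop the message that drifted off topic
budget.add("m4")     # pull in the related unthreaded message
budget_ids = [m.msg_id for m in budget.members(mailbox)]
```

Keeping user additions and removals separate from the seed thread means new replies in the thread still join the collection automatically, while the user's corrections are preserved.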

Although search has been proposed as a solution to PIM [4], it represents only a partial solution to task management. Search can be effective for accessing information already identified as relevant to a given task. It cannot serve to remind the user about that task, as reminding is an extrinsic, rather than a user-initiated, process. It may indeed turn out that to effectively support reminding, we need new automatic methods built into email to detect and highlight critical tasks [6].

Personal archiving. Email is an important information repository for personal information, but archiving can be problematic. Users apply three main strategies—folders (containing manually classified messages), search, and sort—for accessing archived information. Manual classification into folders is primarily intended to organize information to make it more accessible later. But manual classification is a cognitively difficult task requiring users to be able to predict future usage contexts. As a result, users are often inconsistent in their classifications and may also forget the existence of their own long-term folders. A given folder may end up containing very different messages, or duplicate folders may end up containing very similar materials [12]. This situation is exacerbated when users update their folder definitions or add new folders in response to changes in their job responsibilities.

One proposed solution is assisted filing in which machine learning techniques are used to analyze message headers and content, derive folder definitions, and make recommendations to users about how they might categorize incoming inbox documents [8]. Although assisted filing has been shown to be effective in user tests, it works only if users have already created folders, and not all users do so. Moreover, for those who do create folders, assisted filing cannot identify and create new folders.
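A minimal sketch of the assisted filing idea follows. It is not the classifier evaluated in [8]; we assume a simple word-count profile per folder and cosine similarity as a stand-in for the machine learning step, and the function names are hypothetical. It also makes the stated limitation concrete: with no existing folders there is nothing to recommend.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately crude stand-in for real text analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())


def folder_profiles(filed):
    """Build a word-count profile for each existing folder from (folder, text) pairs."""
    profiles = {}
    for folder, text in filed:
        profiles.setdefault(folder, Counter()).update(tokens(text))
    return profiles


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_folders(message_text, profiles, top_n=3):
    """Recommend up to top_n existing folders; the user still makes the final choice."""
    query = Counter(tokens(message_text))
    scored = [(cosine(query, profile), folder) for folder, profile in profiles.items()]
    scored = [(score, folder) for score, folder in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [folder for _, folder in scored[:top_n]]


filed = [
    ("Travel", "flight itinerary to Boston conference hotel booking"),
    ("Budget", "quarterly budget figures spreadsheet forecast"),
    ("Travel", "hotel confirmation and taxi receipt"),
]
profiles = folder_profiles(filed)
suggestions = suggest_folders("Updated hotel booking for the conference", profiles)
no_folders = suggest_folders("Updated hotel booking for the conference", {})
```

Here `no_folders` is empty because the recommender can only rank folders that already exist; it cannot propose a new one.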

These difficulties have led some users to stop creating folders. Instead, they attempt to finesse the filing problem by relying on search or sorting, using message headers to access long-term buried email items. But search and sorting have limitations, too. Sorting by sender or date exploits users’ ability to remember partial information about a message, but access through sorting is an indirect way of finding information. For users who rely on search, new tools (such as Stuff I’ve Seen [4], Gmail, and Google Desktop) improve on earlier email and desktop search tools and greatly facilitate access to archives. They partially address fragmentation by accessing information from both email and the file system. Nevertheless, defining a search query can be as difficult as classifying information in folders. And because users who rely on sorting and search do not create folders, messages stay in the inbox as clutter, reducing the effectiveness of task management.

Contact management. Managing names and addresses associated with key contacts is another important PIM task. While most email systems can be customized to automatically extract email addresses into the address book, other information (such as phone numbers and physical addresses) must be extracted manually from messages—a tedious and error-prone process. But lots of information can be automatically extracted from email; for example, important contacts can be identified automatically through message-header information (such as frequency, longevity of communication, and likelihood of response) [10, 11]. Having identified contacts, it should also be possible for machine learning to automatically extract additional information from, say, signature files and Web pages that could then be used to populate contact address fields.
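To illustrate how header information alone might surface important contacts, the sketch below computes the three cues mentioned above (frequency, longevity of communication, and likelihood of response) from a log of sent messages. The `SentMessage` record, the field names, and the ranking rule are all assumptions made for illustration; they are not the methods of [10, 11].

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SentMessage:
    to: str
    sent_on: date
    got_reply: bool


def contact_stats(sent_log):
    """Per-contact frequency, longevity in days, and reply rate from header data only."""
    grouped = defaultdict(list)
    for msg in sent_log:
        grouped[msg.to].append(msg)
    stats = {}
    for contact, msgs in grouped.items():
        days = [m.sent_on for m in msgs]
        replies = sum(m.got_reply for m in msgs)
        stats[contact] = {
            "frequency": len(msgs),
            "longevity_days": (max(days) - min(days)).days,
            "replies": replies,
            "reply_rate": replies / len(msgs),
        }
    return stats


def rank_contacts(stats):
    """Hypothetical ordering: most replied-to exchanges first, then longest relationship."""
    return sorted(
        stats,
        key=lambda c: (-stats[c]["replies"], -stats[c]["longevity_days"], c),
    )


log = [
    SentMessage("ana@example.org", date(2005, 1, 10), True),
    SentMessage("ana@example.org", date(2005, 6, 10), True),
    SentMessage("ana@example.org", date(2005, 11, 10), False),
    SentMessage("list@example.org", date(2005, 3, 1), False),
    SentMessage("list@example.org", date(2005, 3, 2), False),
    SentMessage("bo@example.org", date(2005, 9, 1), True),
]
stats = contact_stats(log)
ranked = rank_contacts(stats)
```

The mailing list address is contacted more often than `bo@example.org` but never replies, so it ranks last; this is the kind of distinction that frequency alone would miss.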

Making Email Data Accessible for PIM

Data extraction, the information extraction approach introduced earlier, takes the opposite approach from centralization, aiming instead to extract information from email and make it accessible to dedicated PIM applications (such as contact managers and to-do lists). But can we replace email and resituate PIM in these dedicated applications? (See the article by Karger and Jones in this section for a look at user-centric data-extraction techniques.)

Users make little use of dedicated task-management tools (such as online to-do lists and workflow) [2]. Email’s role as de facto task manager arises in large part out of its role as information conduit. Users know they frequently access email to process new messages, and they exploit this access for opportunistic reminding about outstanding tasks and for identifying new, as yet undefined, tasks that may appear. But such opportunistic reminding is unlikely to occur with a dedicated task manager, because users must consciously remember to access the task manager, and new tasks must be identified and entered into it. So it seems highly unlikely that users will abandon email for dedicated task-management tools because such tools fail to support the important reminding aspects of task management.

Data extraction is far more likely to be useful for personal archiving and contact management because these functions are not as closely tied to the conduit function of email applications. But email offers significant benefits for both functions, making it unlikely that users will abandon it in favor of their contact managers or file systems. In particular, email provides important contextual information that may be lost when attachments or contacts are extracted from their original email context and integrated into dedicated PIM applications.

Users trying to retrieve archival or contact information first delivered in email often use associative reminding based on indirect social and temporal cues (such as sender, recipient, and date of the message) [1, 11]. Users exploit these cues by accessing email folders and their inbox, then sorting to view by sender, date, or a combination of sender and date to triangulate retrieval. The content of the email message is also an important cue for users trying to locate information. The salient cue for retrieval may be a keyword for a topic to which the contact or attachment is related. The message itself may contain explanatory information that assists in making sense of the contact or attachment [1]. The conduit function of email means that useful information is often first encountered in email—suggesting that users frequently want to relocate it in the original context (perhaps an entire thread of email) rather than through their file system or contact manager. This need for context potentially compromises the simplicity of the data extraction approach.
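The sort-and-scan strategy described above can be sketched in a few lines. The message records and the `triangulate` helper are hypothetical; the point is only that sender and date, taken together, narrow the candidates while leaving each attachment in its original message context.

```python
from datetime import date
from operator import itemgetter

# Hypothetical message records; a real client would read these from message headers.
messages = [
    {"sender": "kim", "date": date(2005, 3, 2), "subject": "Q1 report", "attachment": "q1.pdf"},
    {"sender": "lee", "date": date(2005, 3, 5), "subject": "Offsite agenda", "attachment": None},
    {"sender": "kim", "date": date(2005, 7, 19), "subject": "Q2 report", "attachment": "q2.pdf"},
    {"sender": "kim", "date": date(2005, 7, 21), "subject": "Re: Q2 report", "attachment": None},
]

# Sorting by sender, then date, groups each correspondent's messages chronologically.
by_sender_then_date = sorted(messages, key=itemgetter("sender", "date"))


def triangulate(msgs, sender, start, end):
    """Combine a social cue (sender) with a temporal cue (date range)."""
    hits = [m for m in msgs if m["sender"] == sender and start <= m["date"] <= end]
    return sorted(hits, key=itemgetter("date"))


summer_from_kim = triangulate(messages, "kim", date(2005, 7, 1), date(2005, 8, 31))
summer_subjects = [m["subject"] for m in summer_from_kim]
```

Because the result is a list of whole messages rather than bare files, the explanatory text and the rest of the thread remain available alongside the attachment.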

Both centralization and data extraction offer distinct benefits over the current situation—email used for functions it wasn’t designed for and information about a single task possibly spread across multiple PIM applications. But neither offers a complete solution to email and PIM problems. A combination of centralization and data extraction is needed to improve dedicated support for PIM within email itself, as well as to improve data extraction from email into other PIM applications.

Interpersonal Information Management

In addition to being a critical site for PIM, email presents a more complex set of problems than other PIM applications. One key difference between email and other aspects of PIM is that email is interpersonal, serving as a conduit for tasks involving two or more people [1, 12]. Email involves group information management (see the article by Erickson in this section); email information originates from and is also owed to our colleagues, who have expectations about what we’ll do with it. In contrast, other PIM tasks (such as information seeking and archiving) involve managing self-generated or self-discovered information, which does not usually require a response. Email information is more complex and time-consuming to process for three main reasons:

Effect on others. Email processing decisions have direct implications for other people’s work. Email is a work conduit, so failure to respond appropriately to a message may directly jeopardize someone else’s work. Conversely, interdependent tasks are often subject to delays due to waiting for a response from other people with different priorities. Such delays can leave messages languishing in the inbox (or, less often, in actionable folders) for extended periods, often drifting out of sight and mind. Users must track both obligations and message status for email information;

Constant processing. Failure to respond quickly to colleagues’ messages could compromise their work. And the typically enormous volume of messages we receive in work settings every day means that failure to deal with incoming messages can lead to a backlog of unopened and unresponded-to messages in the inbox, thus compromising its task management function. The pressure is unrelenting, as new messages constantly demand personal attention and processing. In contrast, filing personally generated digital files, contact addresses, and discovered Web resources tends to be at the user’s own discretion, with fewer externally imposed delays or deadlines; and

Lack of context. Email information may lack adequate context, making it more difficult to process. Much personal information is self-generated or self-discovered, arising in the context of specific user goals and interests. In contrast, email messages may not directly relate to a particular user’s goals or interests, being generated by others with their own objectives. Lack of context makes it more difficult to react appropriately to a message, judge its value, or place it in the right category.

The Future

We have argued that email is the critical PIM application, exploring two technical approaches—centralization and data extraction—to address the related challenges of using it to process, capture, store, and retrieve personal information. However, new developments in desktop search, machine learning, and text processing are beginning to generate distinct new possibilities. The improving ability of systems to analyze text and perform ever more powerful search functions is likely to lead to some profound changes in the task-management aspects of PIM, including:

Providing organization at the task level, rather than for individual messages;
Anticipating the importance of email and ordering email messages accordingly;
Detecting obligations and message urgency;
Providing visualizations that allow users to view and organize information from multiple related messages; and
Proposing actions based on email and making them easier to initiate.

These potentially useful email-supported PIM functions give rise to important issues that must be addressed through interaction design, particularly when introducing automated processes into such a critical application, where the cost of algorithmic error without human oversight is high.

At the same time, we expect little change in other more familiar aspects of email:

List views, because they are convenient for viewing, archiving, and sorting;
Attachments, because messages often concern discussion of and work around other content;
Folders, search, and sort, because, even if the system helps, users still need more than one way to find something; and
Information overload, because email is a ready means of sharing one message with many people, as more and more collaborative work processes are moved online.

Other major new developments in email information handling concern search-based email (Gmail) and desktop search tools (such as Stuff I’ve Seen [4] and Google Desktop). Unlike earlier desktop and email search tools, these systems are much faster and operate across entire archives and applications. Fast, efficient search will surely improve access to email archives and contact information. However, it will not tackle task management, as it does not support reminding, nor will it reduce inbox clutter the way filing and folders do for some users.

A final issue in email’s emerging role in PIM is its relation to other communication and collaboration technologies that may partially replace certain email uses. Some organizations now routinely use instant messaging for quick conversations that used to take place through email. Others rely on blogs or wikis to distribute and comment on public information, rather than on email attachments. One important advantage of these alternatives is their ability to siphon information from email—reducing the overload problem. At the same time, however, they introduce issues of intrusiveness and notification not experienced with email.

Whichever approach to supporting the PIM-email connection is ultimately widely accepted, email will continue to evolve. Flexibility should therefore be the key characteristic of any solution we create.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI: 10.1145/1107458.1107494

Published: January 1, 2006, in the January 2006 issue, Vol. 49 No. 1, Pages 68-73
