The potential of the Web portal market and its technology has attracted some of the biggest computer and software firms, including IBM, Oracle, and Microsoft. It has inspired the mutation of search engines (such as Yahoo!) and the establishment of new vendors (such as Hummingbird and Brio). Yet the meaning of "portal" is not well defined and its use, even within the industry, remains problematic. Originally coined to describe Web-based applications (such as Yahoo!, Google, and Infoseek) that provide organized access to the resources of the Internet through search engines and lists of Web sites, the term portal has been applied to systems that differ widely in capabilities and complexity, from static Web pages providing links to resources on a given topic to interorganizational systems providing access to multiple heterogeneous data sources and applications.
To facilitate discussion about portals and the promise of their technology, I propose instead a definition that distinguishes portals from all other types of information systems and a General Portal Model (GPM) for identifying and organizing the basic services portals provide.
I recently considered 17 definitions of portal and classes of portals published in 1999 and 2000 during the peak of the business media hype over the concept, including in trade journals, consulting firm reports [9, 12], and vendor literature [1, 5, 6, 11]. Several sources of confusion were apparent. The use of such phrases as "provides access to applications, data, and people" does not distinguish a portal from a well-designed desktop user interface, even if other features (such as the ability to customize) are included. Portals provide access not to people but to applications (such as email, calendaring, and Web-based front ends to databases). This was recognized in some definitions by way of such phrases as "an amalgamation of software applications" and "a collection of related functional components."
The portal concept was not always distinguished from specific implementations. Many definitions focused on the applications being provided or the intended markets [2, 10], while several listed specific kinds of applications a portal would be likely to provide. However, it is impractical to compare portal products on the basis of application suite or market, since a given product may support many configurations of features. The lack of a one-size-fits-all portal definition is demonstrated by the ease with which products like Yahoo! and AOL regularly add new features. The focus of an implementation can be conveyed using existing frameworks (such as the value chain) and competitive strategy.
Words like Internet and Web [1, 5, 9] were also frequently part of the portal definitions. For providing widespread access to information, HTTP/HTML makes sense because Web browsers are ubiquitous. However, such access ultimately rests on lower-level network services, as well as on whatever improved successor to today's Internet may emerge. The portal concept should not be limited to the Internet (TCP/IP) or to services using the Internet.
Finally, though the portal definitions generally included the concept of a single point of access for users, along with personalization and better decision making, all were essentially vendor-centric. They assumed the existence of a target audience for the content the vendor provides either as an employer or in exchange for subscription or advertising revenue linked to visits to specific Web sites. When discussing portals developed to meet user needs, a user-centric orientation would be useful.
I define portal as an infrastructure providing secure, customizable, personalizable, integrated access to dynamic content from a variety of sources, in a variety of source formats, wherever it is needed. Except for "wherever it is needed," these qualities are found in existing portal products, and the definition resembles others previously proposed. However, it is not specific about how these services are provided, why, or to whom. Nor does it assume a particular network or service infrastructure (such as the Internet or the Web). Most important, it does not limit a portal to a specific set of applications. Nevertheless, it can help derive a set of features and capabilities that would distinguish a portal from any other type of application.
Calls for customization and personalization in terms of content and presentation, as well as for references to content provided by a variety of partners within and outside the organization [4–6, 9, 12], imply features for identifying data sources and destinations.
The ability to dynamically interact with applications [1, 5, 12], not just to display data, involves coordinating data exchange between applications. Although such interactions could be a property of the applications themselves, they imply the existence of a way to maintain a connection between applications so the server can be polled periodically for updates or send the updates to the client.
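One minimal way to maintain such a connection is a shared subscription object that supports both update styles named above: the client polls periodically, or the server pushes updates as they occur. The sketch below is illustrative only; the class and method names are invented, not drawn from any portal product.

```python
# Sketch: a connection object supporting both update styles -- periodic
# polling by the client, or pushes from the server to the client.

class Connection:
    def __init__(self):
        self._pending = []          # updates queued by the server
        self._push_handlers = []    # client callbacks for pushed updates

    # --- server side ---
    def publish(self, update):
        if self._push_handlers:     # push mode: deliver immediately
            for handler in self._push_handlers:
                handler(update)
        else:                       # poll mode: queue until asked
            self._pending.append(update)

    # --- client side ---
    def poll(self):
        """Drain and return any updates queued since the last poll."""
        updates, self._pending = self._pending, []
        return updates

    def subscribe(self, handler):
        self._push_handlers.append(handler)


conn = Connection()
conn.publish("rev 1")               # no subscriber yet: update is queued
polled = conn.poll()                # the client polls and drains the queue

received = []
conn.subscribe(received.append)     # switch to push mode
conn.publish("rev 2")               # now delivered to the client at once
```

Either style keeps the interacting applications decoupled: neither needs to know whether the other is polling or pushing.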
The work of portals bears similarities to the services provided by TCP, IP, and the Domain Name System, which enable the worldwide interaction of applications. This universal compatibility and accessibility suggests that portals could be organized the way networks are: formally in layers, following models (such as the Internet and OSI reference models), with defined services and interfaces at each layer.
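Such a layered organization can be sketched as a chain of layer objects, each exposing a narrow service to the layer above and delegating the rest downward. The class names below are invented for illustration and are not part of any standard.

```python
# Sketch of a layered portal stack, in the spirit of the OSI model.
# Each layer processes data on the way down and hands it to the next
# layer; the bottom layer stands in for actual transmission.

class Layer:
    def __init__(self, lower=None):
        self.lower = lower          # next layer down, or None at the bottom

    def send(self, payload):
        payload = self.process(payload)
        if self.lower is not None:
            return self.lower.send(payload)
        return payload              # bottom of the stack: "transmit"

    def process(self, payload):
        return payload              # default: pass through unchanged


class Transformation(Layer):
    def process(self, payload):
        # e.g., normalize field names to a common content schema
        return {k.lower(): v for k, v in payload.items()}


class NetworkAccess(Layer):
    def process(self, payload):
        # format for transmission (here: a simple serialized form)
        return repr(payload)


# Assemble a two-layer stack: presentation-side transformation
# running over network access.
stack = Transformation(lower=NetworkAccess())
wire = stack.send({"Temp": 21, "Unit": "C"})
```

The point of the arrangement is the same as in network stacks: each layer can be replaced without the layers above noticing.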
The figure here outlines a software infrastructure consisting of three groups of layers: presentation, resource access, and network access. The layers are implemented as processes, the parts of a protocol stack that together constitute each node of the portal infrastructure. This approach to portal architecture is similar to the arrangement of agent functions in the InfoSleuth project (at Microelectronics and Computer Technology Corp. in Austin, TX) into a user layer, a planning and temporal layer, a query and analysis layer, and a resource layer.
The portal itself consists of software nodes that facilitate interaction among applications, whether they function as clients, servers, or peers. Node is a logical concept, and the framework implies logical, not physical, relationships among the processes implementing the layers. A process need not be implemented by a single piece of software, reside on the same hardware, or belong to the same organization as other processes.
Process Interface Layers. These two layers are concerned with portal access by applications functioning as clients, servers, or peers. Application-to-application connections might consist of anything from file transfers returning query results to interaction with a remotely served application, thus implementing Application Service Provision. Applications need not support interaction directly with humans, as with Web browsers, but could be any process that locates any other process and cooperates through the portal.
Process Identification. This layer insulates processes that use the portal from the mechanism of data exchange, appearing to provide a point-to-point connection between one or more receiving processes and one or more sending processes. Implemented here is functionality, possibly linked to individual users, governing access authorization, portal content, and presentation schemata. Since the layer is aware of all process identities, it could also facilitate charges for access, logging, and performance monitoring.
Transformation. This layer uses content and presentation schemata to ensure that data presented to processes by the portal and vice versa is compatible in terms of type, size, and format. Some data elements may represent aggregations, filtering, and other transformations of more basic elements implemented here. In the simplest case (such as remote application access through pop-up windows) no transformation may be needed. If it were needed, client and server nodes would have to agree on which node would perform it. This layer also implements rules describing what to do if content cannot be presented in the needed format.
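As an illustration, a transformation step might coerce incoming fields to the types and sizes a presentation schema expects, applying a fallback rule when a value cannot be presented in the needed format. The schema representation below is invented for the sketch.

```python
# Sketch: schema-driven transformation with a fallback rule for
# content that cannot be presented in the needed format.

SCHEMA = {
    "title":   {"type": str, "max_len": 10},   # illustrative schema
    "reading": {"type": float},
}

def transform(record, schema=SCHEMA, fallback="n/a"):
    out = {}
    for field, rule in schema.items():
        value = record.get(field)
        try:
            value = rule["type"](value)          # coerce to the schema type
        except (TypeError, ValueError):
            out[field] = fallback                # rule: substitute a marker
            continue
        limit = rule.get("max_len")
        if limit and len(str(value)) > limit:
            value = str(value)[:limit]           # truncate oversized text
        out[field] = value
    return out

row = transform({"title": "Plant floor temperature", "reading": "21.5"})
```

In the simple pop-up-window case described above, `transform` would reduce to the identity function; the client and server nodes would still have to agree on which of them runs it.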
Resource Discovery Layers. These three layers are the heart of the portal's functionality, resolving application-based descriptions of resource requirements and availability to common domain- or application-based metadata, finding resources that satisfy requirements, and coordinating interactions among nodes.
Resource Identification. Nodes cooperate at this layer to determine how to satisfy resource requirements. The layer itself implements a common understanding of resource capabilities distinct from how the layers above use them and how they are obtained by the layers below. Given a set of resource descriptions in the content schema, it resolves them to a common metadata format used by all nodes in the portal. The participating nodes must agree on a common metadata format to which each maps the description of the resources it requests or serves. Since the applications provided by a portal could range from online weather reports to accessing specific data marts, each of them must adhere to applicable domain- or application-based metadata standards (such as the metadata standard for the Unified Climate Access Network).
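A node's mapping from its local resource descriptions onto the agreed common metadata vocabulary might look like the following; the vocabulary and field names are invented for illustration, not taken from any metadata standard.

```python
# Sketch: each node maps its local resource descriptions onto a
# common metadata vocabulary agreed on by all portal nodes.

COMMON_VOCAB = {"subject", "format", "coverage"}

# One node's local-to-common field mapping (illustrative only).
LOCAL_TO_COMMON = {
    "topic":  "subject",
    "mime":   "format",
    "region": "coverage",
}

def resolve(local_description, mapping=LOCAL_TO_COMMON):
    """Translate a local description into the common metadata format,
    dropping fields the common vocabulary does not define."""
    resolved = {}
    for key, value in local_description.items():
        common_key = mapping.get(key)
        if common_key in COMMON_VOCAB:
            resolved[common_key] = value
    return resolved

meta = resolve({"topic": "weather", "mime": "text/xml", "internal_id": 7})
```

Each node keeps its own `LOCAL_TO_COMMON` table, so local naming conventions never leak into the portal-wide metadata format.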
Resource Location. This layer coordinates the work of the ensemble of resources whose descriptions were resolved by the Resource Identification Layer. Given a metadata description of resources, possibly pertaining to several domains, it determines the addresses of the resources. However, it is ignorant of the relationship between the data being sought and the data's ultimate use. It merely uses the metadata to match the description of what is required with descriptions of what is available at other nodes.
Directory services and search engines are implemented at this layer. At the client, the process could work sequentially through a list of valid sources or pass descriptions of needs to known servers simultaneously. When potential sources are identified, the process uses its own criteria to choose among them, say, reliability, availability, or simply whichever source responds first. To satisfy the requirements of a given schema, resources from multiple sources may be needed.
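The matching step itself is simple to sketch: given a metadata description of what is required, return the nodes whose advertised descriptions satisfy every required field. The catalog entries and field names below are fabricated for the example.

```python
# Sketch: the Resource Location layer matches a metadata description
# of what is required against descriptions of what other nodes offer.

CATALOG = [
    {"node": "srv-a", "subject": "weather", "format": "text/html"},
    {"node": "srv-b", "subject": "weather", "format": "text/xml"},
    {"node": "srv-c", "subject": "sales",   "format": "text/xml"},
]

def locate(required, catalog=CATALOG):
    """Return the nodes whose descriptions satisfy every required field."""
    return [
        entry["node"]
        for entry in catalog
        if all(entry.get(k) == v for k, v in required.items())
    ]

matches = locate({"subject": "weather", "format": "text/xml"})
```

Note that `locate` knows nothing about what the data will be used for; it only compares metadata descriptions, exactly as the layer description above requires.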
If a node acting as a server cannot fulfill a given request, it may return a failure code. However, it may instead assume a client role and pass the request on to several other nodes, as in the case of Gnutella, a popular file search and transfer service. Thus, the possibility of a "recursive" portal emerges at the Resource Location Layer. If a resource is located, the server might then either put the client directly in contact with the appropriate server or act as a go-between.
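The recursive behavior can be sketched in a few lines: a node that cannot serve a request forwards it to its neighbors, with a hop limit to bound the recursion (the hop limit and node names are illustrative assumptions, not details from Gnutella itself).

```python
# Sketch: a server that cannot satisfy a request assumes a client
# role and forwards it to its neighbors, Gnutella-style.  A hop
# limit keeps the recursion from wandering forever.

class Node:
    def __init__(self, name, resources=(), neighbors=()):
        self.name = name
        self.resources = set(resources)
        self.neighbors = list(neighbors)

    def request(self, resource, max_hops=3):
        if resource in self.resources:
            return self.name                 # serve it ourselves
        if max_hops == 0:
            return None                      # failure code
        for peer in self.neighbors:          # act as a client instead
            found = peer.request(resource, max_hops - 1)
            if found is not None:
                return found
        return None

# A three-node chain: only the far node holds the resource.
leaf = Node("leaf", resources={"weather.xml"})
mid  = Node("mid",  neighbors=[leaf])
root = Node("root", neighbors=[mid])
origin = root.request("weather.xml")
```

Whether `root` then hands the client `leaf`'s address or relays the data itself corresponds to the go-between choice described above.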
Resource Binding. This layer governs a connection between a single server resource and a single client, keeping the interaction separate from any others. A separate connection would be useful if, for example, server applications pushed data through the portal at different time intervals. Data could be buffered until called on by the Resource Location Layer. One could envision a portal activated with several resource bindings quietly polling servers for updates or receiving updates pushed by the servers until a process at a higher layer issues a call for an update.
Network Interface Layers. These two layers conceal the details of network access from the Resource Discovery Layers so they do not have to assume or even know anything about the network. General security and network access instructions from the layers above must also be resolved to implementation-specific instructions by these layers.
Security. This layer is the gatekeeper between the portal processes and the network. At all nodes, it verifies the identification of cooperating processes and their authorization to access resources, performing firewall functions for the portal. Verification is critical for minimizing the possibility that server nodes provide services or reveal details about the resources they provide to unauthorized clients.
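A minimal gatekeeper combines the two checks named above, identity and authorization, before anything crosses into the portal. The credential scheme below (a shared secret per process) is a deliberately simplified assumption for the sketch.

```python
# Sketch: the Security layer verifies both the identity of a calling
# process and its authorization for the requested resource.

KNOWN_PROCESSES = {"mail-client": "s3cret"}           # id -> shared secret
ACCESS_RIGHTS   = {"mail-client": {"inbox", "calendar"}}

def admit(process_id, secret, resource):
    """Gatekeeper: True only for an authenticated, authorized request."""
    if KNOWN_PROCESSES.get(process_id) != secret:
        return False                                  # identity check fails
    return resource in ACCESS_RIGHTS.get(process_id, set())

ok = admit("mail-client", "s3cret", "inbox")
```

Because the check runs at every node, a server node refuses even to reveal what resources it holds to a caller that fails either test.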
Network Access. This layer creates a channel between process nodes and formats data for transmission throughout the network. If, for example, a node were implemented above the Internet, this layer would initiate socket creation and govern socket use. Its objective is to make the portal independent of any particular network architecture, including even that of the Internet.
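If a node were implemented above TCP/IP, the Network Access layer would be the only place socket calls appear. The loopback sketch below uses a socket pair to stand in for a real connection; the function names are invented for illustration.

```python
import socket

# Sketch: only the Network Access layer touches sockets, keeping the
# layers above independent of the underlying network architecture.

def open_channel():
    """Create a channel between two process nodes (loopback here)."""
    return socket.socketpair()      # stands in for a real connection

def transmit(channel_end, data):
    channel_end.sendall(data.encode("utf-8"))   # format for transmission

def receive(channel_end, size=1024):
    return channel_end.recv(size).decode("utf-8")

a, b = open_channel()
transmit(a, "update: temp=21")
message = receive(b)
a.close()
b.close()
```

Swapping in a different transport would mean rewriting only these three functions, which is exactly the independence the layer is meant to provide.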
The proposed GPM infrastructure described earlier would support creation of a community of applications to serve the needs of a particular user, accommodating different models of interaction and architectures (such as the centralized index of resources typified by Napster and the peer-to-peer architecture of Gnutella).
The boundaries of a particular GPM implementation might be set at the boundaries of the owning organization or include suppliers and customers. One could imagine portal nodes supporting Web-based client applications throughout an enterprise and beyond, allowing users to select from a catalog of services, each governed by a portal node residing on one of a number of machines hosting a number of applications (such as email, calendaring, weather, and access to real-time production data or to data marts).
The most challenging problems associated with implementing the model involve the Resource Discovery Layers. However, XML-based protocols that might implement or serve as models for these layers exist today, including: the Resource Description Framework (RDF), a metadata framework for describing and querying any Internet resource; RDF Site Summary (RSS), a format for describing content available for distribution as a Web feed; and Universal Description, Discovery, and Integration (UDDI), a registry allowing organizations to find one another through the Web by searching for features (such as name, product, location, and Web services offered). The Lightweight Directory Access Protocol (LDAP) is another promising model.
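For instance, an RSS 1.0 item, RDF-based metadata describing one piece of available content, can be consumed with nothing more than a standard XML library. The feed fragment below is fabricated for the example.

```python
import xml.etree.ElementTree as ET

# A fabricated RSS 1.0 fragment: RDF-based metadata describing one
# piece of content available for distribution as a Web feed.
FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/report">
    <title>Daily production summary</title>
    <link>http://example.org/report</link>
  </item>
</rdf:RDF>
"""

NS = {"rss": "http://purl.org/rss/1.0/"}
root = ET.fromstring(FEED)
titles = [item.findtext("rss:title", namespaces=NS)
          for item in root.findall("rss:item", NS)]
```

A Resource Identification layer built on such feeds would map elements like `title` onto its common metadata vocabulary in just this way.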
Unlike the vendor-controlled, centralized portal models commonly implemented today, the GPM would enable users to draw content from any willing provider and use it however they wish. For users who are members of multiple communities (such as a university, a city, a home-owners' association, a given news feed, or a particular interest group) the model would support creation of a single point of access. Vendors would cease to be providers of Web pages and instead become pure content providers. However, this might trouble vendors (such as conventional media providers) whose revenue depends on clicks through banner or pop-up advertisements. Complete user control of content on their nodes also increases the potential for unauthorized syndication of content.
The proposed portal definition accommodates the products and technologies of major vendors without privileging features exclusive to any particular implementation. The GPM aims to identify and organize basic infrastructure capabilities. I hope discussion of portals will thus bend away from marketing classifications and toward consideration of underlying technical capabilities and development of standards enabling ubiquitous individualized application access, regardless of the applications served or the nature of the client. This view would supplement research on portal implementation based on other frameworks and classifications to help identify the capabilities needed to serve specific markets.
Next steps include: describing the operations of existing portal products in terms of the GPM framework; prototyping portal implementations based on the framework; and identifying existing protocols that can be modified to implement the services described at each layer. The GPM could also serve as a model for organizing other resource-discovery applications (such as automated logistics and supply-chain management).
A common portal definition and framework would encourage development of interoperable components based on standard services and interfaces. They would increase the number of options for IS sourcing and improve the environment supporting Application Service Provision. By achieving the same kind of standardization benefits that enabled development of the Internet, IS managers and end users might thus be able to assemble suites of applications for their particular needs from components provided by specialized vendors instead of having to buy a suite of packages from just one.
©2004 ACM 0001-0782/04/1000 $5.00