Research and Advances Digital government
Jan 1 2003
Bistro: a Scalable and Secure Data Transfer Service For Digital Government Applications
Government at all levels is a major collector and provider of data. Our project focuses on the collection of data over wide-area networks (WANs) and addresses the scalability issues that arise in Internet-based massive data collection applications. Security is also a central issue for data collection applications that use a public infrastructure such as the Internet, because the privacy and integrity of the data must be preserved.
Numerous digital government applications require data collection over WANs [5]. One compelling example is the Internal Revenue Service's electronic submission of income tax forms. Other digital government applications include collecting census data, federal statistics, and surveys; gathering and tallying electronic votes; collecting crime data for the U.S. Department of Justice; collecting data from sensors for disaster response; collecting data from geological surveys; collecting electronic filings of patents, permits, and securities (for the SEC); and accepting submissions of grant proposals and contract bids, among others. All these applications have scalability and security needs in common.
The poor performance that current digital government users may experience with existing technology (as in Figure 1a) is largely due to how independent data transfers using TCP/IP work over the Internet. TCP/IP is good at sharing bandwidth equally among data streams, which in large-scale applications can lead to poor performance for individual clients, because each receives only a very small share of the bandwidth.
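To make the bandwidth-sharing argument concrete, here is a minimal sketch of the arithmetic, assuming an idealized equal split of a single bottleneck link among all concurrent streams. The function names and the numbers (a 100 Mbit/s link, 10,000 simultaneous clients, a 1 MB submission) are illustrative assumptions, not measurements from the project. Real TCP/IP behavior also depends on round-trip times, packet loss, and competing traffic.

```python
def fair_share_bps(link_capacity_bps: float, num_streams: int) -> float:
    """Idealized per-stream rate when a bottleneck link is split equally."""
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    return link_capacity_bps / num_streams


def transfer_time_s(size_bytes: float, rate_bps: float) -> float:
    """Seconds needed to move size_bytes at a constant rate of rate_bps."""
    return size_bytes * 8 / rate_bps


# Hypothetical scenario: 10,000 clients submit at the same moment.
share = fair_share_bps(100e6, 10_000)   # bits per second per client
duration = transfer_time_s(1e6, share)  # seconds for a 1 MB submission
print(f"{share:.0f} bit/s per client, {duration:.0f} s per upload")
```

Under these assumptions each client gets about 10 kbit/s, so a 1 MB submission takes roughly 800 seconds, even though the link as a whole is fully used.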
Given that TCP/IP is here to stay for the foreseeable future, what is needed is a scalable yet cost-effective solution that can be easily deployed over existing Internet technology.
We are designing and developing a system called Bistro, which addresses the scalability needs of digital government data collection applications while allowing them to share the same infrastructure and resources efficiently, cost-effectively, and securely [1]. Bistro's basic approach is to introduce intermediate hosts, called bistros, which replace the traditional "synchronized client-push" approach with a "nonsynchronized combination of client-push and server-pull" approach (as depicted in Figure 1b). This in turn spreads the workload on the destination server and the network over time, eliminating hot spots and significantly improving performance for both clients and servers. Our ongoing research [2, 4] indicates that orders-of-magnitude improvements can be achieved with the Bistro architecture and the data collection algorithms it affords.
Bistro's design allows for gradual deployment and experimentation over the Internet (by simply downloading the Bistro server software and installing it on public servers). Bistro's security protocol and trust structure [3] are designed so that only encrypted data travels through bistros, which are not necessarily trusted. This means a government agency does not need to trust bistros installed by other agencies or commercial institutions. At the same time, these untrusted bistros can significantly improve the agency's data collection performance.
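The combination of client-push and server-pull can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Bistro implementation: the class `BistroHost`, the function `collect`, the round-robin client assignment, and the batch size are all hypothetical, and the payloads are treated as opaque bytes that the client is assumed to have encrypted beforehand (the actual security protocol is described in [3]).

```python
from dataclasses import dataclass, field


@dataclass
class BistroHost:
    """Hypothetical intermediate host that stores opaque, pre-encrypted blobs."""
    name: str
    stored: dict = field(default_factory=dict)

    def accept_push(self, client_id: str, ciphertext: bytes) -> None:
        # Client-push: the bistro never needs to read the payload.
        self.stored[client_id] = ciphertext

    def serve_pull(self, batch_size: int) -> dict:
        # Server-pull: hand over at most batch_size blobs and forget them.
        ids = list(self.stored)[:batch_size]
        return {cid: self.stored.pop(cid) for cid in ids}


def collect(bistros, batch_size):
    """Destination server drains the bistros at its own pace.

    Returns the collected blobs and the size of every pull, so the
    load seen by the destination can be inspected.
    """
    collected, pull_sizes = {}, []
    while any(b.stored for b in bistros):
        for b in bistros:
            batch = b.serve_pull(batch_size)
            if batch:
                collected.update(batch)
                pull_sizes.append(len(batch))
    return collected, pull_sizes


# Ten clients push to three bistros (round-robin is an assumption).
bistros = [BistroHost(f"bistro-{i}") for i in range(3)]
for n in range(10):
    bistros[n % 3].accept_push(f"client-{n}", b"<encrypted payload>")

collected, pull_sizes = collect(bistros, batch_size=2)
print(len(collected), max(pull_sizes))
```

In this model the destination never handles more than `batch_size` submissions in a single pull, regardless of how many clients pushed at the same moment, which is the load-spreading effect described above.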
Each application (within each agency) can have its own scalability, security, fault tolerance, and other data collection needs, and these applications and agencies can still share available resources, if so desired, across all Bistro servers.
We believe an appropriately designed single infrastructure such as Bistro can address all digital government wide-area data collection needs in a scalable, secure, and cost-effective manner. (For more information, see bourbon.usc.edu/iml/bistro/.)