
An Alternative Architecture For Financial Data Integration

Exploring a new paradigm for mediated data integration.
  1. Introduction
  2. An Architecture for Data Integration in the Financial Sector
  3. The Architecture Described
  4. References
  5. Authors
  6. Footnotes
  7. Figures
  8. Sidebar: Web Data Extraction Solutions

Real-time integration of disparate data and applications is a key challenge faced by the financial services industry today. Financial institutions are often composed of several service units that work in a relatively autonomous manner and have their own IT environments and policies. It is often necessary to combine customer data from these various services to produce an integrated financial picture of the customer. Moreover, today many institutions are reaching partnership agreements with other institutions with the goal of improving their competitiveness. Furthermore, like enterprises from virtually all sectors, financial institutions cannot ignore the data connectivity issues arising from the emergence of the Internet. Some scenarios that require real-time integration of data distributed in several heterogeneous and possibly cross-enterprise information sources are described here.

Mergers and Acquisitions. During mergers and acquisitions, rapid and efficient integration of the existing IT environments is one of the crucial issues facing financial institutions. Conventional approaches such as integrating one system into another or building a data warehouse consolidating all data are expensive, slow, and highly intrusive solutions.

Customer Data Consolidation. Obtaining a global view of customers’ financial positions is a valuable asset for activities such as targeted marketing. It often involves real-time access to several sources containing information about accounts in the different companies of the group, credit ratings, investment information, and other such information.

Risk Management. Global risk information is usually difficult to obtain because it requires real-time data about deals made in different departments and supported by disparate systems.

Straight Through Processing (STP). STP refers to fully automating the processing of transactions. Realizing the STP vision requires powerful real-time data integration and transformation capabilities.

Cross-Sector Partnerships. Many financial institutions establish partnership agreements with enterprises from other industries. For instance, some banks are reaching agreements with insurance companies to augment the value offered to their clients. Conventional warehousing-based integration approaches are not feasible in these environments: although organizations may be willing to share some internal information with their partners, they want to retain control over the queries performed against their autonomous information sources.

Business and Competency Watch. The Internet has fostered business research and observation activities to an unprecedented level. By extracting and combining information from specialized Web sources and content providers, institutions can track business sector figures (such as statistics about mortgages or loans), products from competitors, relevant news and reports, and many other sources of business-related information worldwide.

Account Aggregation. As access to Internet-based banking systems becomes increasingly common, many customers want to manage all their financial accounts through a single, unified Web interface.


An Architecture for Data Integration in the Financial Sector

Traditional data integration approaches, such as data warehousing, are costly to deploy and do not easily support either real-time access to data or dealing with autonomous sources. Here, an alternative architecture for data integration in the financial sector is outlined. This architecture has already been deployed in several real cases.

The architecture is based on two emerging paradigms: Web services [9] and Enterprise Information Integration (EII). Web services are rapidly becoming the de facto standard for interoperation between different software applications running on a variety of platforms and/or frameworks. EII systems are based on a wrapper-mediator architecture [10]. In this approach, the data from the sources is not transferred to a new central repository. Instead, data remains at the source and the EII system is responsible for providing users with virtual unified views over the source data. When the mediator receives a query, it decomposes it into sub-queries over the data sources, executes them in real time, and combines the sub-query results into a global answer. Providers of EII solutions include [4–7]. The proposed architecture, based on EII wrappers and Web services, is shown in the figure here. In this architecture the possible sources of data include internal databases, proprietary applications, systems from partner or member companies accessible through Web services interfaces, and autonomous Web sites from external organizations (specialized content providers, competitors’ Web sites, and so forth).
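The wrapper-mediator interaction described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and field names are invented, not drawn from any EII product): every wrapper exposes the same query interface, and the mediator fans a global query out to the sources at query time and unions the answers.

```python
# Minimal sketch of an EII mediator (illustrative names only).
# Data stays at the sources; the mediator builds the unified answer
# at query time instead of copying data into a central repository.

class Wrapper:
    """Wraps one source behind a common query interface."""
    def __init__(self, name, rows):
        self.name = name
        self._rows = rows          # stand-in for a live database or Web source

    def query(self, predicate):
        return [row for row in self._rows if predicate(row)]

class Mediator:
    """Decomposes a global query into sub-queries and unions the results."""
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, predicate):
        results = []
        for w in self.wrappers:    # a real system would run these in parallel
            results.extend(w.query(predicate))
        return results

# Two autonomous "sources" holding accounts for the same customer.
retail = Wrapper("retail_bank", [{"cust": "C1", "balance": 1200}])
broker = Wrapper("brokerage",   [{"cust": "C1", "balance": 5000},
                                 {"cust": "C2", "balance": 300}])

mediator = Mediator([retail, broker])
c1_rows = mediator.query(lambda r: r["cust"] == "C1")
total = sum(r["balance"] for r in c1_rows)   # unified view of C1's position
```

A query for customer C1 thus returns rows from both sources, giving the integrated financial picture without either source surrendering its data.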

The physical layer of the architecture is comprised of wrappers that make sources conform to a common model. Easily configurable wrappers to access databases, Web services, and popular back-office applications are included as standard components of most current EII systems.
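"Conforming to a common model" means each wrapper translates its source's native schema into one shared set of fields. The following sketch (all field names are hypothetical) shows two such translations: one for an internal core-banking record, one for a nested payload from a partner's Web service.

```python
# Hypothetical sketch: each wrapper maps its source's schema onto the
# common model, so the integration layer sees uniform records.

COMMON_FIELDS = ("customer_id", "product", "amount")

def wrap_core_banking(record):
    # Internal system with its own column names.
    return {"customer_id": record["CUST_NO"],
            "product": record["PROD"],
            "amount": record["BAL"]}

def wrap_partner_ws(payload):
    # Partner Web service returning nested data.
    return {"customer_id": payload["customer"]["id"],
            "product": payload["contract"]["type"],
            "amount": payload["contract"]["value"]}

a = wrap_core_banking({"CUST_NO": "C1", "PROD": "mortgage", "BAL": 90000})
b = wrap_partner_ws({"customer": {"id": "C1"},
                     "contract": {"type": "life-insurance", "value": 150}})
```

Both results carry exactly the common fields, so the integration layer can join or union them without knowing anything about the underlying schemas.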

Successfully integrating data from external Web sites usually requires screen-scraping techniques, which are often error-prone and can impose high maintenance costs. Fortunately, solutions to this problem (see the sidebar “Web Data Extraction Solutions”) have recently been proposed by both the academic [2] and industry [3, 8] sectors.
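To make the fragility concrete, here is a toy screen-scraping wrapper over an invented HTML fragment. It extracts structured rows with a regular expression; any change to the page layout silently breaks the pattern, which is precisely why hand-written wrappers are expensive to maintain and why the generation tools in the sidebar matter.

```python
# Toy Web wrapper over a hypothetical rates page (HTML invented here).
# A layout change at the source breaks the pattern, illustrating the
# maintenance problem of hand-coded screen scraping.
import re

HTML = """
<table id="rates">
<tr><td>30y mortgage</td><td>5.85%</td></tr>
<tr><td>15y mortgage</td><td>5.10%</td></tr>
</table>
"""

def extract_rates(html):
    pattern = re.compile(r"<tr><td>(.+?)</td><td>([\d.]+)%</td></tr>")
    return {name: float(rate) for name, rate in pattern.findall(html)}

rates = extract_rates(HTML)
```

The wrapper's output is an ordinary dictionary, so downstream code can treat the external site as if it were a local table, exactly as the architecture intends.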

The integration layer allows users to create unified “virtual” views over the source data and to execute queries written in a database-like language (such as XQuery and/or SQL). Thus, users can obtain answers very similar to those they could get from a hypothetical conventional database containing all the information from the sources. Since sources can be semistructured or even unstructured, the system should also support combining database-like queries with text-based searches. The architecture also includes a cache for partial data materialization, improving performance by pre-fetching data that does not need to be accessed in real time.
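The caching idea can be sketched as a wrapper that memoizes answers for a configurable freshness window. This is an illustrative design under assumed names (CachingWrapper, the TTL parameter), not a description of any particular EII product: data that tolerates some staleness, such as a credit rating, is served from the materialized copy, while a fresh fetch is made only when the cached entry expires.

```python
# Sketch of the architecture's cache layer: results for data that need
# not be real-time are memoized, so repeated queries skip the network
# round trip to the source. All names here are illustrative.
import time

class CachingWrapper:
    def __init__(self, fetch, ttl_seconds=300.0):
        self._fetch = fetch        # expensive call to the real source
        self._ttl = ttl_seconds
        self._cache = {}           # key -> (timestamp, value)
        self.misses = 0            # counts actual trips to the source

    def query(self, key):
        hit = self._cache.get(key)
        if hit and time.monotonic() - hit[0] < self._ttl:
            return hit[1]          # served from the materialized copy
        self.misses += 1
        value = self._fetch(key)
        self._cache[key] = (time.monotonic(), value)
        return value

slow_source = {"credit_rating:C1": "AA"}
w = CachingWrapper(lambda k: slow_source[k])
first = w.query("credit_rating:C1")    # goes to the source
second = w.query("credit_rating:C1")   # answered from the cache
```

Tuning the TTL per view is how such a system trades freshness for performance on a source-by-source basis.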

The access to the system will typically be made through a Web service interface (this also allows users to build a hierarchy of EII systems), although Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) access should also be supported for integration with popular reporting tools.


The Architecture Described

When compared to traditional data integration approaches, such as data warehousing, the proposed architecture has a number of advantages:

  • Since data is retrieved from the sources at query time, the architecture supports real-time data integration, as required by applications such as STP or customer data consolidation.
  • By avoiding a new, expensive, and difficult-to-maintain central repository, EII solutions are cheaper and faster to deploy, and they can scale to a larger number of sources. They are therefore well suited to the rapid deployment of a unified system after a merger or acquisition.
  • Autonomous sources (sources whose owner is different from the owner of the integration system) can be effectively dealt with since data remains in the sources and the source owner retains control of how and when its data is accessed. In this way, cross-sector partnerships are enabled.
  • By using smart Web wrapper generation techniques, even the information contained in external Web sites can be used as if it were stored in a local database. Thus, Web account aggregation and business watch applications can be easily created.


On the other hand, accessing information in real time over the network yields lower performance than materialized approaches. Nevertheless, advanced query optimization and caching techniques for these systems [1], along with continuous bandwidth improvements, make performance adequate for most applications.

Another issue is interoperation with other integration architectures such as Enterprise Application Integration (EAI), which takes a complementary, process-driven approach. More work is required before these technologies can be managed within a single framework and from a single interface. Languages for model-driven development and integration, such as OMG’s MOF, will be useful as work in this area continues.


UF1 Figure. Data integration architecture.


References

    1. Adali, S., Candan, K.S., Papakonstantinou, Y., and Subrahmanian, V.S. Query caching and optimization in distributed mediator systems. In Proceedings of the ACM SIGMOD Conference (1996).

    2. Laender, A.H.F. et al. A brief survey of Web data extraction tools. SIGMOD Record (June 2002).

    3. Baumgartner, R., Flesca, S., and Gottlob, G. Visual Web information extraction with Lixto. In Proceedings of the VLDB Conference (2001).

    4. BEA Systems. BEA Liquid Data.

    5. Enterprise Information Integration. MediaMatrix.

    6. IBM Corporation. IBM DB2 Information Integrator.

    7. Pan, A. et al. The Denodo data integration platform. In Proceedings of the 28th VLDB Conference (2002).

    8. Pan, A., Raposo, J., Álvarez, M., Hidalgo, J., and Viña, A. Semi-automatic wrapper generation for commercial Web sources. In Proceedings of the IFIP WG8.1 Conference on Engineering Information Systems in the Internet Context (EISIC) (2002).

    9. Web Services Activity at the W3C.

    10. Wiederhold, G. Mediators in the architecture of future information systems. IEEE Computer (Mar. 1992).

    Alberto Pan's research is partially funded by the Spanish Ministry of Science and Technology under the Ramón y Cajal program.
