Since the early to mid-1980s, the relational model of data has dominated the DBMS landscape. Moreover, descendants of the early relational prototypes (System R and Ingres) have become the primary commercial relational DBMSs. As such, the basic architecture sold by the commercial vendors is more than two decades old. In the meantime, the computers on which DBMSs are deployed have advanced dramatically. Grids (blades) have replaced shared-memory multiprocessors, CPU speeds have greatly increased, main memory has gotten much bigger and faster, and disks have gotten much bigger (but have lagged CPUs in bandwidth growth).
During the same period, several new major applications of DBMS technology have emerged to complement the business data processing market for which RDBMSs were originally designed. These include data warehouses, semi-structured data, and scientific data.
It now seems apparent that the traditional architecture of RDBMSs can be beaten significantly (by a factor of 25-50) by a specialized implementation in every major DBMS market. In the data warehouse area, this implementation appears to be a compressed column store. A column store represents data column-by-column rather than the traditional row-by-row. In a column architecture the execution engine must read only those data elements relevant to the query at hand, rather than all data elements. Also, data compression is much more effective in a column store because one is compressing only one type of data on a storage block rather than several. As a result, less data is brought from disk to main memory. Moreover, if the execution engine operates directly on compressed data, then there is less copying and better L2 cache utilization. Hence, CPU execution time is dramatically reduced. These savings have been realized in the original column stores from the 1990s (MonetDB and Sybase IQ) as well as by more recent commercial products from Vertica, Infobright, and ParAccel.
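The two layouts can be sketched in a few lines. The following is a toy illustration (not any product's actual implementation, and the table and query are invented for the example): the same table stored row-by-row and column-by-column, a query that touches only one column, and a simple run-length encoding to suggest why per-column compression works well.

```python
# Row store: each row is a tuple (id, region, sales).
rows = [
    (1, "east", 100),
    (2, "east", 150),
    (3, "west", 200),
    (4, "west", 250),
]

# Column store: one list per column.
columns = {
    "id":     [1, 2, 3, 4],
    "region": ["east", "east", "west", "west"],
    "sales":  [100, 150, 200, 250],
}

# Query: total sales. The row store must fetch every field of every row;
# the column store scans only the "sales" column.
total_row_store = sum(row[2] for row in rows)  # touches all three columns
total_col_store = sum(columns["sales"])        # touches one column

assert total_row_store == total_col_store == 700

# Compression is easier column-wise: a block holds values of one type, so a
# low-cardinality column run-length encodes compactly.
def rle(values):
    """Run-length encode a sequence into (value, count) pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

print(rle(columns["region"]))  # [('east', 2), ('west', 2)]
```

In a real column store the per-column blocks live on disk, so scanning one column also means reading far fewer pages; the in-memory sketch above only captures the "touch less data" intuition.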
The research presented here by Boncz, Manegold, and Kersten documents these advantages, and is definitely worth reading. It focuses on column execution and compression in main memory and complements other analyses of data warehouse disk behavior. As such, it is exemplary of a collection of recent papers on column store implementation techniques (in VLDB and SIGMOD) to which the interested reader can turn for other analyses.
In other database markets, including business data processing, specialized architectures offer similar advantages. Papers analyzing early prototypes in these areas are beginning to appear. In my opinion, we are seeing "the beginning of the end" of the "one-size-fits-all" systems sold by the major DBMS vendors. I expect specialized architectures to become dominant in several DBMS application areas over the next couple of decades for performance-conscious users. On the other hand, at the low end, open source systems such as MySQL, Postgres, and Ingres are gaining traction.
Expect to see a flurry of additional papers exploring facets of specialized architectures from the DBMS research community. Furthermore, a collection of recent DBMS startups have built specialized implementations, and I expect more to come.
It should be clear that the DBMS community is in transition from "the old" to "the new." The next decade should be a period of vibrant activity in our field.