Quite a few scientists who deal with the processing and storage of large amounts of data are unhappy with relational DBMSs. Here are several reasons why—and a possible solution.
Mike, why do you suppose no one has organized a major open source initiative around this idea? What is stopping the creation of a sci-mysql?
While I agree that an RDBMS is not an optimal technology for scientific applications and that an open source initiative may lead to some good innovation, I'd be cautious about separating the data model from the query and management language.
There are proprietary tools (such as kx.com) that have done so successfully. The speed and capacity of such tools are phenomenal (as are the licensing fees one must pay).
I don't understand the problem with using a table for an array. All I can think of is that a table with a schema is unnecessarily "complicated" for an array.
I agree that the traditional relational algebra (RA) operators are too limited. But it is hard to unify science operations into general high-level primitives the way the RA operators do for relational data.
One approach is to build machine learning algorithms on top of the database. Since these operations are usually computation-intensive and are used to handle uncertainty, I/O is no longer the only performance metric. However, a DBMS is for data storage and management; analysis functions should be left to the upper level.
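As one possible reading of the table-versus-array question above, here is a minimal Python sketch that holds the same small grid both ways and extracts the same rectangular window from each. The grid size, the (i, j, value) tuple layout, and the function names are all hypothetical, chosen only for illustration; nothing here reflects how any particular DBMS stores data.

```python
# Hypothetical 3x4 grid; in the array form, a cell's position is implicit in the layout.
ROWS, COLS = 3, 4
array = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]

# The same data as a relational-style table: one (i, j, value) tuple per cell,
# so every cell carries its own coordinates as ordinary columns.
table = [(i, j, array[i][j]) for i in range(ROWS) for j in range(COLS)]


def window_from_array(a, r0, r1, c0, c1):
    """Extract rows r0..r1-1 and columns c0..c1-1 by plain index slicing."""
    return [row[c0:c1] for row in a[r0:r1]]


def window_from_table(t, r0, r1, c0, c1):
    """Extract the same window by filtering tuples on their coordinate columns,
    then reassembling the rows (a full scan here, since this sketch has no index)."""
    cells = {(i, j): v for i, j, v in t if r0 <= i < r1 and c0 <= j < c1}
    return [[cells[(i, j)] for j in range(c0, c1)] for i in range(r0, r1)]


if __name__ == "__main__":
    print(window_from_array(array, 1, 3, 1, 3))
    print(window_from_table(table, 1, 3, 1, 3))
```

Both functions return the same window; the difference is that the table form stores and filters on explicit coordinates, while the array form gets them for free from position.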
This statement might surprise some people, but relational databases do require a relational approach to stored data. I'm all burned out after the recent few months' struggle of trying to squeeze GIS data into a relational database, so I do not think I have any religious flair related to the issue, but … before we even think of implementing anything in an RDBMS we need to consider learning and trying to understand relational concepts such as: relationships between data entities (known in RDBMS circles as analyzing the Universe of Discourse), the normalization process, transitive dependencies (!), Third Normal Form, and Fifth Normal Form. Yes, RDBMS usage carries a learning overhead and the curve is steep. I'm not sure if it is possible to design a software tool that would free us from the initial research, which in this case means the intricacies of the relational world, but we may try :-)
Having worked in both commercial and scientific spheres, and as the current curator of some 22 terabytes of climate science data, I would no more try to use an RDBMS to store vast amounts of data than try to jump over the moon. However, in commercial spheres, an RDBMS is just a tool; that is, an enormous amount of software development is required to produce customised applications which use the core facilities of an RDBMS to gain access to the required data, but it is still the application, per se, which does the grunt work.
In relation to scientific data, an RDBMS can be used to hold the metadata, plus pointers to the actual data, where by "pointers" I mean paths to files, URLs, etc. You still need to customise the application to suit the project requirements, but that's life!
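The metadata-plus-pointers arrangement described above might look roughly like the following sketch, which uses Python's standard sqlite3 module. The table name, column names, and file paths are all invented for illustration, not taken from any real archive; the bulk data stays in files, and the database only answers "which files do I need?".

```python
import sqlite3


def build_catalog(conn):
    """Create a small metadata table and fill it with made-up example rows."""
    conn.execute(
        """CREATE TABLE dataset (
               id       INTEGER PRIMARY KEY,
               variable TEXT    NOT NULL,   -- what was measured
               year     INTEGER NOT NULL,   -- period the file covers
               location TEXT    NOT NULL    -- the "pointer": a file path or URL
           )"""
    )
    rows = [  # hypothetical paths, for illustration only
        ("temperature", 2000, "/archive/temperature/2000.nc"),
        ("temperature", 2001, "/archive/temperature/2001.nc"),
        ("temperature", 2002, "/archive/temperature/2002.nc"),
        ("rainfall", 2001, "/archive/rainfall/2001.nc"),
    ]
    conn.executemany(
        "INSERT INTO dataset (variable, year, location) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def find_locations(conn, variable, first_year, last_year):
    """Return the pointers for one variable over an inclusive range of years."""
    cur = conn.execute(
        "SELECT location FROM dataset "
        "WHERE variable = ? AND year BETWEEN ? AND ? ORDER BY year",
        (variable, first_year, last_year),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_catalog(conn)
    # The application would then open these files itself with whatever
    # format-specific reader the project uses.
    for path in find_locations(conn, "temperature", 2001, 2002):
        print(path)
```

Opening and processing the files is still left to the application, which is the customisation work the comment refers to.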