BLOG@CACM

Understanding NoSQL Database Types: Column

NoSQL has grown into a popular supplement to traditional SQL database management approaches. By breaking out of the relational mould, NoSQL offers powerful features that would otherwise be unavailable or difficult to achieve. Namely, it scales well horizontally and supports flexible data models, which often suits budget-conscious or fast-moving CI/CD projects.

The four main types of NoSQL database are:

1. Key Value

The simplest type of NoSQL database is the key-value store, which stores every item as a value paired with a unique name, or key. Redis is one example of this type of system.

2. Document

Document stores hold data as semi-structured records, called documents, made up of key-value pairs. MongoDB is a well-known example.

3. Graph

Graph databases store entities as nodes and the relationships between them as edges, which makes them well-suited to highly connected data such as social networks. Neo4j is a popular example.

4. Column

Column stores organize data by column rather than by row, and otherwise function much like relational database tables. HBase and Apache Cassandra are two examples.

Definition: Column Databases

Although column stores organize data into columns, they operate in much the same way as tables in relational databases. They do, however, offer more flexibility, particularly for quickly computing aggregate insights across many or all tables without a heavy performance penalty: for example, finding the average age of all your male customers.

There is no need to create composite indexes on sex and age, which could add gigabytes or terabytes of extra index data; only the columns relevant to your query are read. By comparison, a row-oriented SQL database would scan through every tuple, reading entire rows (including columns the query never uses) across millions of records.
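To make the difference concrete, the sketch below stores the same small, hypothetical customer table in a row layout and in a column layout, then answers the "average age of male customers" query against each while counting how many stored values are read. The data, field names, and the "values read" counter are illustrative assumptions, not how any particular database measures I/O.

```python
# Hypothetical customer table, stored row by row: each record holds every field.
rows = [
    {"id": 1, "name": "Ana", "sex": "F", "age": 34, "city": "Leeds"},
    {"id": 2, "name": "Ben", "sex": "M", "age": 41, "city": "York"},
    {"id": 3, "name": "Cal", "sex": "M", "age": 29, "city": "Bath"},
    {"id": 4, "name": "Dee", "sex": "F", "age": 52, "city": "Hull"},
]

# The same data stored column by column: one list per field.
columns = {field: [record[field] for record in rows] for field in rows[0]}

def avg_male_age_row_store(rows):
    """Row layout: every field of every record is read to reach 'sex' and 'age'."""
    values_read = 0
    ages = []
    for record in rows:
        values_read += len(record)  # the whole row is loaded from storage
        if record["sex"] == "M":
            ages.append(record["age"])
    return sum(ages) / len(ages), values_read

def avg_male_age_column_store(columns):
    """Column layout: only the 'sex' and 'age' columns are read."""
    sex, age = columns["sex"], columns["age"]
    values_read = len(sex) + len(age)
    ages = [a for s, a in zip(sex, age) if s == "M"]
    return sum(ages) / len(ages), values_read

print(avg_male_age_row_store(rows))        # (35.0, 20)
print(avg_male_age_column_store(columns))  # (35.0, 8)
```

Both layouts return the same answer, but the column layout touches only the two columns the query needs; with dozens of columns and millions of rows, that gap grows accordingly.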

Columns are grouped into column families, which in turn live inside a keyspace (roughly the equivalent of a relational database). Each column family contains rows, and each row contains its own set of columns. One way to think of it is as a trunk of rows with columns branching off each one.

Each row can hold a different set of columns, each with its own name and size (no uniform standard is needed). Every column exists inside a specific row and consists of a name, a value, and a timestamp recording when the data was written to the database. The row key is the unique identifier that queries use to locate a row. Here is an up-to-date list of the most popular and current wide-column databases (see the first section of that link).
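The nesting described above (keyspace, column family, row key, and timestamped columns) can be modeled as nested dictionaries. The sketch below is a hypothetical, in-memory illustration rather than the API of HBase or Cassandra; the `Keyspace` class, its `put` and `get_row` methods, and the last-write-wins rule for repeated writes to the same column are all assumptions made for illustration.

```python
import time
from collections import defaultdict

# Hypothetical model of the wide-column layout:
# keyspace -> column family -> row key -> column name -> (value, timestamp)
class Keyspace:
    def __init__(self, name):
        self.name = name
        # Each column family maps row keys to that row's own set of columns.
        self.families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, row_key, column, value, timestamp=None):
        """Write one cell; every cell carries its own timestamp."""
        ts = time.time_ns() if timestamp is None else timestamp
        current = self.families[family][row_key].get(column)
        # Assumed conflict rule: keep the write with the newest timestamp.
        if current is None or ts >= current[1]:
            self.families[family][row_key][column] = (value, ts)

    def get_row(self, family, row_key):
        """Look up a row by its unique row key; return column -> value."""
        row = self.families[family].get(row_key, {})
        return {col: val for col, (val, _ts) in row.items()}

shop = Keyspace("shop")
shop.put("customers", "cust:1", "name", "Ana", timestamp=1)
shop.put("customers", "cust:1", "age", 34, timestamp=1)
# A second row with a different set of columns: no uniform schema is required.
shop.put("customers", "cust:2", "name", "Ben", timestamp=1)
shop.put("customers", "cust:2", "loyalty_tier", "gold", timestamp=1)
# A newer write to an existing column replaces the older value.
shop.put("customers", "cust:1", "age", 35, timestamp=2)

print(shop.get_row("customers", "cust:1"))  # {'name': 'Ana', 'age': 35}
print(shop.get_row("customers", "cust:2"))  # {'name': 'Ben', 'loyalty_tier': 'gold'}
```

Because each row is just a map of columns, the two customers can carry entirely different columns, and the row key alone is enough to retrieve either one.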

Advantages of Column Databases

A few key benefits that come with columnar databases:

  • Horizontal scalability is excellent with column-store databases. While relational systems are great at getting the most out of a rigid, row-based design for OLTP transactions and typically scale vertically, column databases scale horizontally across large clusters of machines, even into the thousands. This makes them well-suited to massively parallel processing (MPP) and OLAP.

  • Aggregation queries are, by extension, faster than in relational databases, which suits column stores to projects that need large numbers of queries answered quickly. Because only the needed columns are read, aggregations over billions of rows can complete in seconds, allowing near-instant querying.

  • Compression is exceptionally effective, because the values in a single column share a type and often repeat, which makes column stores an efficient form of storage. Enormous amounts of information can be kept inside a single column at reduced disk size and resource cost.

  • Flexibility is high. Rows do not need to share the same columns, which makes column stores a good complement to unstructured data: you can add or remove columns without disentangling the whole database. Note, however, that supporting an entirely new kind of query may require changes to all tables.
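The compression point can be illustrated with run-length encoding (RLE), one of several encodings columnar formats commonly use, alongside others such as dictionary and delta encoding. The sketch below applies RLE to a hypothetical, already sorted `country` column; the column and its values are assumptions for illustration, and real systems choose encodings per column based on the data.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column: each run of equal values becomes (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical 'country' column, sorted so that equal values sit next to each other.
country = ["FR"] * 4 + ["UK"] * 5 + ["US"] * 3
encoded = rle_encode(country)

print(encoded)                                           # [('FR', 4), ('UK', 5), ('US', 3)]
print(len(country), "values ->", len(encoded), "pairs")  # 12 values -> 3 pairs
```

This works well only because a column holds many similar values side by side; in a row layout, the same country codes would be interleaved with names, ages, and other fields, leaving few runs to collapse.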

Overall, column databases excel at reporting and analytics: you can store massive amounts of data without cost-prohibitive infrastructure and still query it quickly.
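Horizontal scaling for analytics usually relies on splitting columns into partitions and aggregating each partition independently. The sketch below is a hypothetical illustration of that idea: a thread pool stands in for the nodes of a cluster, and the partition contents are invented. Each "node" returns a partial (sum, count) for the average-age query, and the small partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: the 'sex' and 'age' columns split into partitions,
# as if each partition lived on a different machine in a cluster.
partitions = [
    {"sex": ["M", "F", "M"], "age": [41, 34, 29]},
    {"sex": ["F", "M"], "age": [52, 38]},
    {"sex": ["M", "M", "F"], "age": [45, 27, 60]},
]

def partial_aggregate(part):
    """Each 'node' computes (sum, count) over its own slice of the columns."""
    ages = [a for s, a in zip(part["sex"], part["age"]) if s == "M"]
    return sum(ages), len(ages)

def average_male_age(partitions):
    # Run the partial aggregations in parallel, then merge the small results.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(average_male_age(partitions))  # 36.0
```

Each node returns a sum and a count rather than its own average, because averages from partitions of different sizes cannot simply be averaged again; sums and counts merge correctly no matter how the data is split.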

Limits of Column Databases

Column databases naturally come with disadvantages in other contexts:

  • OLTP applications are a poor fit for columnar stores, because a typical transaction reads or writes whole rows, which a columnar design splits across many separate columns. For this reason, relational databases are still overwhelmingly preferred in banking and personal accounting, at least for typical daily transactions.

  • Designing and indexing the schema is laborious and complex, and the result can still be less capable than what an average relational database achieves more easily. Incremental data loading is also best avoided, as it performs poorly.

  • Security can suffer. Many NoSQL databases offer fewer native security features than mature relational systems, which can leave them more exposed to online attacks. If cybersecurity is a critically high priority, you should either consider using a relational model or make your schema as strictly defined as possible.
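Both the OLTP limitation and the warning about incremental loading come down to write cost. The hypothetical sketch below counts how many separate storage locations one new record touches in a row layout versus a purely columnar layout. It assumes no write buffering; production systems typically batch writes in memory before flushing them, which softens (but does not remove) this cost.

```python
# Hypothetical: the same 5-field table kept in a row layout and a column layout.
FIELDS = ["id", "name", "sex", "age", "city"]

row_store = []                                 # one entry per record
column_store = {field: [] for field in FIELDS}  # one list per column

def insert_row_store(record):
    """Row layout: the whole record is written in one place. Returns locations touched."""
    row_store.append(record)
    return 1

def insert_column_store(record):
    """Column layout: the record is split and appended to every column separately."""
    for field in FIELDS:
        column_store[field].append(record[field])
    return len(FIELDS)

new_customer = {"id": 5, "name": "Eve", "sex": "F", "age": 23, "city": "Kent"}
print(insert_row_store(new_customer))     # 1
print(insert_column_store(new_customer))  # 5
```

A single-row write fans out to every column, which is why loading data one row at a time, as transactional workloads do, costs more in a columnar layout than in a row-oriented one.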

Are Column Databases Only Available in NoSQL?

Before wrapping up, it's useful to note that column stores are not exclusive to NoSQL. You often hear that column databases diverge so far from relational models that they fall firmly into the NoSQL domain.

This isn't necessarily true, and the NoSQL vs SQL debate is, generally, quite complex.

In many respects, column-store databases mirror traditional SQL methodologies. For instance, keyspaces allow for schema management, even though NoSQL is often assumed to be schema-less, and the metadata can often be identical to that of the traditional relational model. Most ironic of all, many column stores support SQL or an SQL-like query language, and some provide ACID guarantees.

More strikingly, while most NoSQL databases use either the key-value or the document model, column stores use neither.

All in all, column-stores, like graph databases, tread that fine line between two worlds.

Conclusion

Column stores offer an extremely powerful solution, but with natural limits. For instance, essential financial transactions still rely on relational databases, which write each row to disk sequentially as a single unit. Column stores generally offer weaker consistency and isolation guarantees, because a row's values are spread across separate columns and may need to be rewritten several times.

That said, column stores are among the most widely used and popular data designs around.

Alex Williams is a full-stack developer with over 15 years of experience, and the owner of Hosting Data UK.
