BLOG@CACM
Computing Applications

NoSQL vs. SQL

Posted

The latest innovations in banking operational technologies and general private/public hybrid cloud data for use on and off premises can be serviced by SQL or NoSQL databases — buffered through relevant intermediaries/amalgams such as IBM Power Systems, PostgreSQL, and various GUIs. Traditionally, the non-relational (NoSQL) model of data storage was reasonably seen by some as the optimum partner for computational engineering and high-growth open source environments.

The Underlying Data Models

Let's call a database a well-organized data collection stored on a computer or cluster that allows you to store and retrieve data. When talking about a data model in this context, we are looking at a database's fundamental data storage architecture. As mentioned, this can be modified to a smaller or greater degree through intermediaries infrequently known as "middleware" — and these options are increasingly melding into one another.

Data scientists, developers, and administrators may first want to consider what capabilities they need, in relation to their preexisting architectures and systems that are exemplifying their sought-after operations, before whetting down the building's stones to either SQL or NoSQL. But it is still worth going over the conceptual idea of data models to — broadly speaking — discuss the two approaches to data storage in a database.

To be clear, developers who want to make good decisions between the two technologies will face a difficult challenge. They need a mature understanding of the full design space so that the benefits and compromises of a specific system are clear — and you do not forfeit a functionality that your application absolutely needs.

Relational/Non-Relational

The relational model and SQL are somewhat synonymous (PostgreSQL is an example of relational and non-relational roughly in one); data stored as a collection of entities which themselves contain attribute groups. Attributes are individually traceable to table columns, in a rigid architecture that maps out exactly; and entity/attribute relationships are precisely networked table to table — for this reason, regular transactional querying, such as financial transactions, happen very predictably/efficiently inside this architecture; this is the epitome of data storage logically organized through tables.

You're making a social platform or new app innovation. You want to minimize building time, competition is enormous; your app needs to come to market in weeks or months, not years. Unstructured data interactions are expected — audio, images, videos — all clashing in streams by the millions, created by perhaps thousands of users. For this, NoSQL shines (it's also known as Not Only SQL for the aforementioned "melding" reasons) — the epitome of unstructured, flexibly scalable data storage — e.g., the graph database type often used by social platforms.

Summary: Broadly, NoSQL has an absence of strict schemas for entities/attributes, while SQL rigidly relates/regulates the two.

SQL Pros And Cons

When talking about SQL, we will be referring to data that uses the relational architecture schema but keep in mind that some SQL databases can use both types. Treat this as an introductory rubric — the runway beginning to a much more thorough investigation into the fuller space.

PROs

●      Very reliable and predictable data integrity that can be very precisely modelled via ACID (Atomicity, Consistency, Isolation, Durability).

●      Roughly speaking, superior query flexibility for a larger scope of workloads; the fundamental limitation is abstracted by SQL, working with the engine to streamline queries so that they match the on-disk model.

●      More compact data due to the homogenised schema with more efficient performance and use of resources (of course, depending on the query type); leads to extra avenues for systematizing how you streamline your data.

CONs

●      Horizontal scalability isn't straightforward; indeed, the part-reason that NoSQL was developed in the early 1990s. There may be fully no support for this functionality, a dedicated workaround, or support, but still in its infancy years.

●      Works on non-distributed engines representing potential single points of failure which must be worked around by using failover and replication methods; as is, scalability is pricier rather than a natural expected feature.

●      Schemas cannot or should not be modified ad hoc, rather it should be carefully designed and well thought out beforehand; altering this design as a database evolves will still involve performance hits, perhaps downtime.

Not Only SQL Pros And Cons

To mitigate SQL's weaknesses that were found in relational systems, NoSQL was created to solve two problems: 1) Absence of horizontal scaling; 2) Prerequisite of highly lineated, rigid table structures. When referencing Not Only SQL, we will focus on the non-relational "schema-less" data model aspect, but keep in mind that there is a diverse ecosystem of NoSQL databases each with their own quirks.

PROs

●      Highly flexible data models acting in a non-relational system; developers do not need to commit at the time to one structure; the schema-less term is inaccurate, as there are schemas, but these can be altered ad hoc.

●      High availability and scalability without substantial single points of failure; NoSQL databases generally are architected for naturally high-integration online horizontal scaling.

●      NoSQL Databases are extremely high-performance when set with certain restrictions on their functionality — e.g,. by minimizing durability guarantees.

●      There is a good argument for NoSQL systems being naturally predisposed for the most advanced computer abstraction innovations in computer engineering that will take place in the future; these systems can offer very advanced APIs for high-level data structures.

CONs

●      Non-clear ACID constraint interpretations; while you can get some support for ACID, it's as yet immature, as the interpretation is so unrestrained that little in the way of useful insights can be discerned.

●      Access patterns are inflexible as NoSQL lacks abstractions of the underlying implementation — needed for its engine to streamline queries; the engine can optimize less, its on-disk data representations do not fit queries (instead, there is leakage).

●      There are specific problems that come with distributed systems, requiring some SME troubleshooting know-how that is very idiosyncratic to distributed application designs — these issues are not necessarily limited to NoSQL.

●      High-level APIs also come at the cost of vulnerabilities which should not be overlooked.

When To Use: SQL vs. NoSQL

SQL is a natural partner for relational data. It's a bit slippery to identify these days, except in certain instances, such as financial transactions in banking. Look for highly delineated entities with clear relationships with each other, wherein rigid rules should apply for their interaction. If that's the case, you're in luck: High-integrity, high-specificity querying across a broader dataset, with no great need for scalability. That said, something like PostgreSQL allows NoSQL-like workloads, excellent for mix-n-matching structured/unstructured data without complex data model integrations.

NoSQL is the natural fit for highly flexible data, or where specific solutions are needed that fit outside of the relational model. Something like MongoDB is excellent for lots of schema-less data. Redis allows for high-level data structures; get strong data integrity guarantees with fast key-value data querying. And Elasticsearch is — as it sounds — great for advanced flexible searching across great swaths of data. Look for this type if you need something with no single point of failure, exceptional availability, and naturally high scalability.

Conclusion

SQL vs. NoSQL is a useful conceptual rubric from which to begin a deeper journey into selecting a database from an adequately informed position. Doing so is no simple task however; you will not get around having to undertake considerable thought on what your application needs, what you are willing to sacrifice (pros and cons), and where the optimum balance lies,  to fit your availability/performance/growth objectives.

This is a design space full of opportunities for exploration and discovery.

Alex Williams is a full-stack developer with over 15 years of experience, and is the owner of Hosting Data UK.

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More