BLOG@CACM
Computing Applications

Understanding NoSQL Database Types: Graph Databases

Overview up to today

Although they originated as a subset of NoSQL, or "Not Only SQL," graph databases blur the line between SQL and NoSQL. The market for graph technologies is growing quickly as more companies and developers take up their hybrid flexibility. What they offer is intuitiveness plus scalability for highly connected data with robust patterns.

While I won't go into depth on the formation of the 'SQL vs NoSQL' debate, you could quite accurately say that SQL represents data stored in rows and tables, while high-growth NoSQL covers data stores arranged as nested documents, columnar schemas, or key-value pairs. One is relational, the other not so much.

Graph databases are formed from nodes, properties, and relationships, all in a highly interlinked data structure. And yet they support advanced, rich querying at scale. In this model, relationships matter just as much as the data itself. In a sense, a graph database combines the querying power of relational databases with the intuitive flexibility of non-relational databases, supporting agile development while also letting you gain deep insights.

Why use graph databases: The benefits

The graph model is a general-purpose data technology. Many know it for its social media implementations: companies like Facebook and Twitter, which are particularly focused on the Six Degrees of Separation concept, use it to perform social network analyses and to build social graphs. Data scientists sometimes call it an 'emerging shape' because it is a non-typical dataset. Yet graph databases are found in a wide variety of industries, ranging from finance to healthcare to emergency-response networks.
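
To make the Six Degrees of Separation idea concrete, here is a minimal Python sketch that counts the fewest hops between two people with a breadth-first search. The friendship data and the function name `degrees_of_separation` are hypothetical and chosen only for illustration; a real graph database would run this kind of traversal through its own query language.

```python
from collections import deque

# Hypothetical friendship data: each person maps to the people they know.
FRIENDS = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev"},
    "cho": {"ana"},
    "dev": {"ben", "eli"},
    "eli": {"dev"},
    "fay": set(),  # nobody is connected to fay
}


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops between start and goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None  # the two people are in separate parts of the graph
```

Breadth-first search visits people in order of distance, so the first time the goal is reached the hop count is already the smallest possible.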

The principal benefit of graph databases is their ability to assign values to links or connections. If your data has connections, whether for offline machine learning systems or online mobile applications, implementing this emerging shape will likely be beneficial.
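
As one illustration of why values on links matter, the following Python sketch stores a travel time on each connection and uses Dijkstra's algorithm to find the lowest total cost between two points. The road network, the minute values, and the name `cheapest_route_cost` are all assumptions made up for this example; it is a sketch of the idea, not how any particular graph database implements it.

```python
import heapq

# Hypothetical road network: each link carries a travel time in minutes.
ROADS = {
    "depot": {"north": 7, "east": 2},
    "east": {"north": 3, "hospital": 8},
    "north": {"hospital": 2},
    "hospital": {},
}


def cheapest_route_cost(graph, start, goal):
    """Dijkstra's algorithm: lowest total link value from start to goal."""
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return None  # goal cannot be reached from start
```

Without the values on the links, every route would look equally good; with them, the query can prefer the route through "east" and "north" over the direct but slower link.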

In short: build high-fidelity, highly interconnected networks made of bite-sized, scalable patterns (i.e., well suited to CI/CD development) that can together service, query, and manage sophisticated problem domains.

The 'label property graph' model: 3 essential components

The label property graph is the most commonly used graph database model. Specialists use the term to distinguish this type from more overtly mathematical models, such as hypergraphs. For nonspecialists, it may be helpful to explain each component of the label property graph model: nodes and relationships (otherwise known as vertices and edges), and constraints.

NODES

  • Nodes usually represent entities, such as a product, person, IP address, or medical history.
  • Nodes allow you to add labels to designate how that node functions within the graph. For instance, you could label a node representing a current business client as Business and Client, and a prospective individual as Lead and Client.
  • Highly targeted labels allow you to easily locate all clients, all individual leads, or all business clients and use them as the basis for graph queries.
  • Nodes also let you add data attributes, such as a first_name property on a node labelled Client, or an email_address property on a node labelled Lead.
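
The points above can be sketched in a few lines of Python. The `Node` class, the `nodes_with_labels` helper, and the sample data are hypothetical names invented for this illustration; the labels and property names (Client, Business, Lead, first_name, email_address) come from the examples in the list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)       # e.g. {"Client", "Lead"}
    properties: dict = field(default_factory=dict)  # e.g. {"first_name": "Dana"}


def nodes_with_labels(nodes, *labels):
    """Return the nodes that carry every one of the requested labels."""
    wanted = set(labels)
    return [n for n in nodes if wanted <= n.labels]


# Illustrative data only.
NODES = [
    Node(1, {"Client", "Business"}, {"first_name": "Priya"}),
    Node(2, {"Client", "Lead"}, {"first_name": "Dana", "email_address": "dana@example.com"}),
    Node(3, {"Lead"}, {"email_address": "sam@example.com"}),
]
```

Calling `nodes_with_labels(NODES, "Client", "Business")` narrows the result to business clients, while `nodes_with_labels(NODES, "Lead")` returns every lead regardless of its other labels.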

RELATIONSHIPS

  • Relationships connect nodes to each other, and properties can be attached to relationships.
  • The relationship type supplies the predicate, and the direction of the connection determines which node is the subject and which is the object (for instance, dishwashers require four wheelbases, not vice versa).
  • You can attach any number of relationships, of any type and in any direction, to a node.
  • This variability is built into the graph model. It's expected that some nodes will have few connections while others are densely connected.
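
A minimal Python sketch of directed, typed relationships that carry their own properties might look like the following. The `Relationship` tuple, the relationship types PURCHASED and REFERRED, and the helper names are assumptions introduced purely for illustration.

```python
from collections import namedtuple

# A relationship is directed: it runs from `start` (subject) to `end` (object).
Relationship = namedtuple("Relationship", "start rel_type end properties")

# Illustrative data only.
RELATIONSHIPS = [
    Relationship("dana", "PURCHASED", "dishwasher", {"quantity": 1}),
    Relationship("dana", "REFERRED", "sam", {}),
    Relationship("sam", "PURCHASED", "dishwasher", {"quantity": 2}),
]


def outgoing(relationships, node, rel_type=None):
    """Relationships that start at `node`, optionally filtered by type."""
    return [r for r in relationships
            if r.start == node and rel_type in (None, r.rel_type)]


def incoming(relationships, node, rel_type=None):
    """Relationships that end at `node`, optionally filtered by type."""
    return [r for r in relationships
            if r.end == node and rel_type in (None, r.rel_type)]
```

Because direction is stored on the relationship, "who purchased the dishwasher" (`incoming`) and "what did dana purchase" (`outgoing`) are different questions over the same data.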

CONSTRAINTS

  • Constraints are the meshing, so to speak, that you place over your graph model after securing your basic node/relationship foundation; i.e., they determine, fine-tune, and influence how your graph can evolve.
  • By setting constraints, you ask the database to enforce rules on specified node labels or relationship types. For instance, you could declare that the first_name property must be present on all nodes with the label Client.
  • One useful risk management constraint is a rule that requires certain fields to be unique. This can help when adding properties containing personal identification information, such as tax identification numbers (TINs) you've added to Client nodes.
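
The two constraints described above, a required first_name on Client nodes and a unique TIN, can be mimicked in plain Python. This is only a sketch of the behaviour: `ClientStore`, `ConstraintError`, and the `tin` property name are hypothetical, and a real graph database would declare such rules in its schema rather than in application code.

```python
class ConstraintError(ValueError):
    """Raised when a new node would violate a declared constraint."""


class ClientStore:
    """Holds Client nodes and enforces two illustrative constraints."""

    def __init__(self):
        self._clients = []
        self._tins = set()

    def add_client(self, properties):
        # Existence constraint: every Client must have a first_name.
        if not properties.get("first_name"):
            raise ConstraintError("Client nodes require a first_name property")
        # Uniqueness constraint: no two Clients may share a TIN.
        tin = properties.get("tin")
        if tin is not None:
            if tin in self._tins:
                raise ConstraintError("tin must be unique across Client nodes")
            self._tins.add(tin)
        self._clients.append(dict(properties))
        return len(self._clients)  # number of Client nodes stored so far
```

Rejecting the write at insertion time keeps bad data out of the graph, instead of relying on every later query to cope with it.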

Graph database implementation challenges to consider

Graph models mimic the surface-level chaos of real life. Any number or type of relationships can connect nodes, in any direction. There is no standardised structure that you need to commit to: perhaps high density is needed, or perhaps sparse paths are best, in order to accurately model your domain. Do what is pertinent.

Each entity is represented by a node, while each relationship joins two particular nodes. If you have a large inventory of data to store, expect to have lots of nodes designating each product, and more for those products' customers. You'll also have to create all of the relationships that link them.

This can look cluttered at first sight. Relational databases, by comparison, tidily sort information into separate tables with predefined joins. But note that graph databases have abstractions that reduce complexity. Consider first designing and planning out your graph database with the help of RDBMS tools. If you can derive an Entity Relationship Diagram (ERD) from this, you can use it to create your graph database faster.
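
One way to picture the step from an ERD to a graph is to treat each entity row as a node and each foreign-key row as a relationship. The Python sketch below assumes three hypothetical tables (customers, products, and an orders join table) and invented names such as `rows_to_graph` and `also_bought`; it illustrates the mapping rather than prescribing a migration procedure.

```python
# Hypothetical relational rows, as they might come out of an ERD-based schema.
CUSTOMERS = [{"id": 1, "first_name": "Dana"}, {"id": 2, "first_name": "Sam"}]
PRODUCTS = [{"id": 10, "name": "Dishwasher"}, {"id": 11, "name": "Kettle"}]
ORDERS = [  # join table holding foreign keys to customers and products
    {"customer_id": 1, "product_id": 10},
    {"customer_id": 1, "product_id": 11},
    {"customer_id": 2, "product_id": 10},
]


def rows_to_graph(customers, products, orders):
    """Entity rows become nodes; join-table rows become relationships."""
    nodes = {}
    for row in customers:
        nodes[("Customer", row["id"])] = {"label": "Customer", **row}
    for row in products:
        nodes[("Product", row["id"])] = {"label": "Product", **row}
    relationships = [
        (("Customer", o["customer_id"]), "PURCHASED", ("Product", o["product_id"]))
        for o in orders
    ]
    return nodes, relationships


def also_bought(relationships, product_key):
    """Other products purchased by anyone who purchased `product_key`."""
    buyers = {start for start, _, end in relationships if end == product_key}
    return sorted({end for start, _, end in relationships
                   if start in buyers and end != product_key})
```

Once the join table has become explicit PURCHASED relationships, a question like "what else did buyers of the dishwasher purchase" is a short walk along those links instead of a multi-table join.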

Once you get over the initial hurdle of the graph way of thinking, you will find that a network of nodes and relationships can outperform relational modelling for highly connected data.

Conclusion and when to use graph databases

Graph databases can serve as a general-purpose database when you need a simple build that is still very expressive. They deserve wider use.

There are also a few instances where a graph database is the better specialist solution: specifically, when you need a better bulk storage solution because your current database does not give you good insight into your data or, just as importantly, does not allow simple and fast querying.

This is a fast-growing technology that adds value in a wide range of situations.

Alex Williams is a full-stack developer with over 15 years of experience, and the owner of Hosting Data UK.
