Graph is the fastest-growing area in the biggest segment of enterprise software: databases. Case in point: a series of recent funding rounds, culminating in Neo4j's $325 million Series F, brought its valuation to over $2 billion.
Building bridges with openCypher
In addition, Neptune will be adding support for Bolt, Neo4j's binary protocol. What this hints at is letting customers keep using familiar, existing tooling: Neo4j's tooling, to be more specific. But there are more reasons why this is important.

There are two main data models used to model graphs: RDF and the Labeled Property Graph (LPG). Neptune supports both, with SPARQL serving as the query language for RDF and Gremlin serving as the query language for LPG. Gremlin has a lot going for it: it enjoys nearly ubiquitous support and offers a great deal of control over graph traversals. But that can also be a problem. Gremlin, part of the Apache TinkerPop project, is an imperative query language. This means that, as opposed to declarative query languages like SQL, Cypher, and SPARQL, Gremlin queries don't just express what to retrieve; they also need to specify how. In that respect, Gremlin is more akin to a programming language.

Neptune's support for LPG and RDF is possible because it hosts two different engines under its hood, one for each data model. Adding support for openCypher does not change that, at least not yet. But RDF* might. RDF*, also known as RDF Star, is an update to the RDF standard that enables it to model LPG graphs too.
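To make the declarative-versus-imperative distinction concrete, here is a minimal sketch of the same lookup expressed both ways. The schema is purely illustrative (Person vertices linked by knows edges, each with a name property) and is not taken from Neptune's documentation.

    // Cypher (declarative): describe the pattern, let the engine plan the traversal
    MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend:Person)
    RETURN friend.name

    // Gremlin (imperative): spell out the traversal step by step
    g.V().has('Person', 'name', 'Alice').
      out('knows').
      values('name')

As for why RDF* matters for bridging the two models: plain RDF has no natural place for a property on a relationship, whereas RDF* lets a triple itself be annotated, much like a property on an LPG edge. A hedged, Turtle-star-style illustration with made-up terms:

    << :alice :knows :bob >> :since "2020" .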
Data science and machine learning features: Notebooks and Graph Neural Networks
GQL still has some way to go. Standardization efforts are always complicated, and adoption is not guaranteed across the board either. But Neptune also exemplifies another important development in graph databases: the integration of data science and machine learning features.

Developing graph applications and navigating graph results is greatly facilitated by IDEs and visual exploration tools tailored to this purpose. While many graph database vendors have incorporated built-in tools for those purposes in their offerings, until recently Neptune relied exclusively on third-party integrations. The way Neptune's team chose to address this gap was by developing AWS Graph Notebook. Notebooks are very popular among data scientists and machine learning practitioners, enabling them to mix and match code, data, visualization, and documentation, and to work collaboratively. We'll have to wait and see whether that bet pays off. What is certain, however, is that offering notebook support strengthens Neptune's appeal for data science and machine learning use cases; a brief sketch of the notebook workflow follows at the end of this section.

But that's not all Neptune has to offer there: enter Neptune ML. GNNs are a relatively new branch of Deep Learning, with the interesting feature that they leverage the additional contextual information that modeling data as a graph captures in order to train Deep Learning algorithms. GNNs are considered state of the art in machine learning, and they can achieve better accuracy in making predictions than conventional neural networks.

Integrating GNNs with graph databases is a natural match. GNNs can be used for node-level and edge-level predictions, i.e. they can infer additional data and connections in graphs. They can be used to train models that infer properties for use cases like fraud prediction, ad targeting, customer 360, recommendations, identity resolution, and knowledge graph completion.

Again, Neptune is not the only one to incorporate notebooks and machine learning in its offering. Besides addressing the data science and machine learning crowd, these features can also upgrade the developer and end-user experience. Better tools, better data, better analytics: they all result in better end-user applications. That's what all vendors are striving for.
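As a taste of that notebook workflow, here is a minimal sketch of running queries from Jupyter cells with the open source graph-notebook package behind AWS Graph Notebook. It assumes an existing Neptune cluster already configured as the notebook's endpoint, and it uses the %%gremlin and %%oc (openCypher) cell magics the package registers; the labels and properties are placeholders, so treat this as an outline rather than a copy-paste recipe.

    pip install graph-notebook

Each of the following snippets then goes in its own notebook cell. The first is an imperative Gremlin traversal against the LPG engine; the second is a declarative openCypher equivalent of the same kind of lookup:

    %%gremlin
    g.V().hasLabel('Person').limit(5).valueMap()

    %%oc
    MATCH (p:Person) RETURN p.name LIMIT 5

The appeal is exactly what the notebook pitch promises: the queries, their tabular or visual results, and any surrounding documentation live in one shareable artifact.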