Benchmarking Neo4j and ArangoDB against Nebula Graph at Suning
The authors are from Suning's Algorithm Lab for Intelligent Monitoring and Operation and Maintenance (O&M). Suning is one of the largest online retailers in China.
- Special thanks to our Nebula Graph colleague for the beautiful translation!
Suning's e-commerce systems and infrastructure have grown increasingly complex over time. With large-scale anomaly detection deployed alongside traditional rule-based alarms, Suning urgently needs an effective alarm convergence mechanism, and identifying root alarms has become an even more pressing task.
This challenging task can be addressed effectively by building a dual knowledge graph that covers both O&M and alarms. Because a knowledge graph is, at its core, a graph, choosing a suitable distributed graph database is critical. We picked two popular graph databases in the industry and benchmarked them against Nebula Graph.
1. Database candidates
- Nebula Graph: A powerful, distributed, scalable, super-fast open-source graph database.
- Neo4j: The most widely used graph database, available as a Community Edition and an Enterprise Edition. We used the Community Edition, which does not support clustering.
- ArangoDB: An open-source multi-model database that works with both documents and graphs. It supports clustering and offers good read and write performance.
2. The dataset used for the benchmarking test
We used the nosql-tests dataset for the benchmarking test.
The dataset contains 1,632,803 vertices and 30,622,564 edges in total. On average, a one-hop query returns 18 neighbors per vertex, and a two-hop query returns 800 neighbors per vertex.
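The averages above can be related to the raw counts with a quick division, and the difference between a one-hop and a two-hop query can be sketched as a breadth-first search bounded at k hops. This is a minimal Python sketch: the adjacency-list representation and the `k_hop_neighbors` helper are hypothetical, and counting distinct vertices within 1..k hops (excluding the start vertex) is an assumption about how the averages were defined, not something the benchmark states.

```python
from collections import deque

# Counts quoted for the nosql-tests dataset.
VERTICES = 1_632_803
EDGES = 30_622_564

# Edges per vertex. Assuming each edge is stored once as an outgoing edge,
# this is the mean out-degree before any de-duplication of neighbors.
avg_out_degree = EDGES / VERTICES


def k_hop_neighbors(adj: dict[int, list[int]], start: int, k: int) -> set[int]:
    """Distinct vertices reachable from `start` in 1..k outbound hops.

    Hypothetical helper. Assumes the start vertex is excluded and that a
    vertex reachable by several paths is counted once (BFS with a global
    visited set).
    """
    visited = {start}
    queue = deque([(start, 0)])
    found: set[int] = set()
    while queue:
        vertex, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(vertex, []):
            if neighbor not in visited:
                visited.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


if __name__ == "__main__":
    print(f"average out-degree: {avg_out_degree:.2f}")  # 18.75
    toy = {1: [2, 3], 2: [3, 4], 3: [5], 4: [1]}
    print(sorted(k_hop_neighbors(toy, 1, 1)))  # [2, 3]
    print(sorted(k_hop_neighbors(toy, 1, 2)))  # [2, 3, 4, 5]
```

The quotient 30,622,564 / 1,632,803 is about 18.75 edges per vertex, in line with the roughly 18 one-hop neighbors quoted above. The two-hop count grows much faster because every first-hop neighbor contributes its own neighbors, which is why two-hop queries are far more expensive than one-hop queries.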
3. A summary of the testing environment
- CPU: 16 cores
- Memory: 128 GB
- Disk: 2400 GB
- OS: CentOS 7.3
Neo4j was deployed only on a single node. Nebula Graph and ArangoDB were deployed both on a single node and as a three-node cluster.
Database versions:
- Nebula Graph 1.1.0
- ArangoDB 3.7.2
- Neo4j 3.3.1
4. Benchmarking test results
4.1 Batch data import and one-hop & two-hop queries
Using the ArangoDB single-node test results as the baseline, the comparison is shown in the figure below:
The comparison of the query languages is shown in the table below:
The comparison shows that AQL, Cypher, and nGQL are all fairly concise. In terms of readability and syntax, however, nGQL is closer to what most users are used to because it is SQL-like.
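To give a feel for the three syntaxes, here is a minimal Python sketch that builds a k-hop neighbor query in each language. The schema names (`Profile`, `profiles`, `relation`, `relations`, `id`) and the `k_hop_queries` helper are hypothetical and not taken from the benchmark; the query shapes follow each language's documented traversal forms, but they should be checked against the versions listed above before use.

```python
# Hypothetical schema names, not from the benchmark:
# vertex label/collection "Profile"/"profiles", edge type/collection "relation"/"relations".


def k_hop_queries(start: int, hops: int) -> dict[str, tuple[str, dict]]:
    """Return (query text, bind parameters) for each language.

    `hops` is written into the query text because it is part of the
    traversal syntax, so it is validated as a positive integer first.
    """
    if not isinstance(hops, int) or hops < 1:
        raise ValueError("hops must be a positive integer")
    if not isinstance(start, int):
        raise TypeError("start must be an integer vertex ID")
    return {
        # AQL: traversal over an edge collection; start vertex passed as a bind parameter.
        "AQL": (
            f"FOR v IN {hops}..{hops} OUTBOUND @start relations "
            "RETURN DISTINCT v._key",
            {"start": f"profiles/{start}"},
        ),
        # Cypher: fixed-length pattern; start ID passed as a query parameter.
        "Cypher": (
            f"MATCH (s:Profile {{id: $start}})-[:RELATION*{hops}]->(n:Profile) "
            "RETURN DISTINCT n.id",
            {"start": start},
        ),
        # nGQL 1.x: this sketch writes the integer vertex ID inline rather
        # than assuming parameter support.
        "nGQL": (
            f"GO {hops} STEPS FROM {start} OVER relation "
            "YIELD DISTINCT relation._dst",
            {},
        ),
    }
```

For example, `k_hop_queries(100, 2)["nGQL"][0]` yields `GO 2 STEPS FROM 100 OVER relation YIELD DISTINCT relation._dst`. The three queries are close but not guaranteed to return identical sets, because each engine applies its own path-uniqueness rules during traversal.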
4.2 Other comparison items
Using the ArangoDB single-node test results as the baseline, the comparison is shown in the figure below:
The following charts compare each item:
The benchmark results show that Nebula Graph outperforms ArangoDB in batch import, but it is slower than both ArangoDB and Neo4j in one-hop and two-hop queries. We will look into this further in the future. Neo4j Community Edition does not support clustering, and the partially open-source ArangoDB is slow in batch import. What's more, ArangoDB's SmartGraph feature is closed source. That is why we chose Nebula Graph in the end.
We are glad to see Nebula Graph earn a place among graph databases alongside Neo4j. Going forward, we will continue to share our experience of running Nebula Graph in production with the community, and together we will help the Nebula Graph community thrive.
This article is co-authored by:
- Yong Tang, Director, Suning SmartEyeLab
- Chuangqi Hu, Suning SmartEyeLab
- Dan Xia, Suning SmartEyeLab
- Bo Zhang, Suning SmartEyeLab