
Nebula capabilities for deeply recursive queries

I have been working on a research tool for several years now, and I keep running into database performance constraints. So far, SQL common table expressions (SQL Server) have given me the best performance, but it often takes over 30 minutes to process hierarchical data that is rather less challenging than what I anticipate in production.
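For reference, the query in question is essentially a transitive closure over a parent/child relation, which is what a recursive CTE computes by iterating to a fixpoint. A minimal Python sketch of those semantics (the table rows here are invented for illustration):

```python
# Rows of a hypothetical (parent, child) table the CTE would recurse over.
rows = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]

def closure(root):
    """Iterate to a fixpoint the way WITH RECURSIVE does: seed with the
    root's children (anchor member), then re-join against the table
    (recursive member) until no new rows appear."""
    reached = {c for p, c in rows if p == root}
    while True:
        new = {c for p, c in rows if p in reached} - reached
        if not new:
            return reached
        reached |= new

print(sorted(closure("a")))  # -> ['b', 'c', 'd', 'e']
```

Each loop iteration corresponds to one level of recursion in the CTE, which is why deep hierarchies mean many join passes.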

With high hopes, I have spent the last few months learning graph databases (JanusGraph with BerkeleyDB), only to discover that a query comparable to my 30-40 minute CTE query crashes after about 10 hours…

I asked about this on Stack Overflow at https://stackoverflow.com/questions/63112168/graph-database-or-relational-database-common-table-extensions-comparing-acyclic where the details of my use case are given.

Is it true that graph databases are not a good solution for my use case, or will Nebula Graph be orders of magnitude faster when the query language is updated to support the kind of query I will need to construct?

Thank you for your assessment.


I think it depends on your recursion depth and how much data you will retrieve. In general, Nebula Graph is much friendlier to BFS than to DFS because of its storage format. In your case, Nebula needs to start from vertex f, collect all of its neighbors along edges of the specified type, and then go one step further. Since your query will touch 500,000+ edges, you need to take memory into account. (JanusGraph probably crashed because of an OOM.)
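The level-by-level expansion described above can be sketched as follows; note that at every step the whole frontier plus everything already visited sits in memory, which is where a 500,000+-edge result starts to hurt (the graph here is a tiny invented example):

```python
def bfs_levels(adj, start):
    """Expand one hop at a time, returning the frontier at each depth.
    Peak memory is driven by the largest frontier plus the visited set."""
    seen = {start}
    frontier = [start]
    levels = []
    while frontier:
        nxt = []
        for v in frontier:
            for w in adj.get(v, []):
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        if nxt:
            levels.append(nxt)
        frontier = nxt
    return levels

# Invented graph: f fans out, then the two branches converge on i.
adj = {"f": ["g", "h"], "g": ["i"], "h": ["i"], "i": []}
print(bfs_levels(adj, "f"))  # -> [['g', 'h'], ['i']]
```

The `seen` set is also what keeps a convergent DAG from being expanded more than once per vertex.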

Thank you for your prompt response.

My first concern is with the processing time. Even with no memory limitations, why do graph databases take so long to process “joins” when that is advertised as their forte? Why do relational databases appear to be so much faster?

Having a recursion depth that may extend beyond 30, and an eventual inter-related group likely to exceed 6 million vertices with attendant result sets of multiple millions, it would seem that memory is also a limiter. It was a memory error that crashed the test, but once it took longer than 30 minutes, it had already lost to common table expressions!

I was hoping this was a problem due to Java being the programming language instead of C or C++, but it sounds like Nebula would also be slower than SQL Server?

I wonder how you define your edges. A graph DB may have an advantage if you model your “join links” as edges; then your query just scans the edges.

Please refer to this article. If you have enough machines and the data is partitioned evenly, memory is not a limiter. I don’t think it will take too long.

liuyu85cn, this was my thinking exactly. It is how I designed it, and yet the performance using JanusGraph is horrible.

critical27, thank you for the article reference. I had planned to scale out and cluster closely related nodes together. However, my test use case is quite a small subset of my potential data and should be very quick. Unfortunately, it is very slow (though I understand that the number of iterations is in the billions).

Mmmm, I’m not too familiar with JanusGraph, so maybe you are right; let’s blame Java. Maybe you can give Nebula a try, since it’s written in C++.

Also, regarding your question about why SQL Server runs so fast: maybe your data is lucky enough to fit in memory. Hierarchical queries plus joins (and a join has to be done on one box) can easily spill to external storage. If that happens, it’s a different story.

But a graph DB only does scans, and those scans can be distributed, so the memory pressure is much lower.

I was considering it, but I thought I would need to wait for a release that addresses the issue referred to in Getting all vertices emerging from one node in DAG (infinite steps).
