Nebula and ML

Hi, glad to have found Nebula and to be here! A question: is there any case study or material on integrating Nebula with an ML pipeline (e.g., for building Graph Neural Networks)? Thanks!

Can you describe your application scenario? We'd like to learn about it as well.

We're working with two partners on GNN applications.
But as a graph database, our main target is OLTP scenarios (e.g., BFS).

Thanks for the reply! I'll describe the scenario in detail today. Simply put, I'm building a unified graph / knowledge graph system that loads/stores data and runs topology-related graph computations (including training Graph Neural Networks). On the data side, my team and I are evaluating Nebula; that's another story, but so far the benchmark results are great compared with other graph databases such as ArangoDB.

On the compute & machine learning side, especially for GNNs, I ended up with the following solution:
load batches of data back and forth onto different nodes for ML, then start training and inference. That made me think there should be a better way. In other words, moving the data around isn't great, because the processing can't happen where the data is, and the data isn't where the processing happens. Is there any plan, or possibility, that Nebula itself could offer 1) a GNN-related library and 2) training / inference?

In reality, this may also relate to separating storage from computation.

I have seen a similar fusion (graph + DGL) in Amazon's DGL-KE [1] [2]:

Training Knowledge Graph Embeddings at Scale
[1] https://github.com/awslabs/dgl-ke
[2] https://arxiv.org/pdf/2004.08532.pdf

No specific plans yet.

That said, you can read/write the Nebula storage layer directly with the Java/Golang/C++/Python clients and, perhaps, do some ML/DL training on top of that.

Here is an example of how to access Nebula storage with the Python client:

https://nebula-graph.io/posts/game-of-thrones-relationship-networkx-gephi-nebula-graph-part-two/
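
For reference, here is a minimal sketch of scanning vertices and edges straight from the storage layer. It assumes the nebula-python storage client (MetaCache / GraphStorageClient); the metad address, space name, tag, and edge type below are placeholders, and the exact API may differ with your client version.

```python
# Minimal sketch of a full scan over Nebula storage via the nebula-python
# storage client. 'my_space', 'player', and 'follow' are placeholder names.
from nebula3.mclient import MetaCache
from nebula3.sclient.GraphStorageClient import GraphStorageClient

meta_cache = MetaCache([('127.0.0.1', 9559)], 50000)  # your metad endpoint(s)
storage_client = GraphStorageClient(meta_cache)

# Scan all vertices of one tag.
resp = storage_client.scan_vertex(space_name='my_space', tag_name='player')
while resp.has_next():
    for vertex_data in resp.next():
        print(vertex_data)

# Scan all edges of one edge type.
resp = storage_client.scan_edge(space_name='my_space', edge_name='follow')
while resp.has_next():
    for edge_data in resp.next():
        print(edge_data)
```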

A GraphX example will be provided later.

Anyway, in a distributed system it's usually impossible to process (train on) all the edges/vertices on one machine because of the memory footprint. So consider fetching the edges/vertices concurrently, partition by partition, from Nebula storage into your processing framework, as in the sketch below.
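
Here is a rough sketch of that pattern, again assuming the nebula-python storage client and a per-partition scan method such as scan_edge_with_part; the space name, edge type, partition count, and that method name are assumptions to verify against the client version you use.

```python
# Hedged sketch: pull edges partition-by-partition in parallel, then merge
# them into one edge list for a GNN framework (e.g. DGL or PyTorch Geometric).
from concurrent.futures import ThreadPoolExecutor

from nebula3.mclient import MetaCache
from nebula3.sclient.GraphStorageClient import GraphStorageClient

SPACE, EDGE, NUM_PARTS = 'my_space', 'follow', 10  # placeholders for your space

meta_cache = MetaCache([('127.0.0.1', 9559)], 50000)
storage_client = GraphStorageClient(meta_cache)

def scan_partition(part):
    """Collect the edge records stored in a single partition."""
    # Depending on the client's thread-safety, you may prefer one client
    # instance per worker instead of sharing `storage_client`.
    records = []
    resp = storage_client.scan_edge_with_part(SPACE, part, EDGE)
    while resp.has_next():
        for edge_data in resp.next():
            records.append(edge_data)
    return records

# Partitions are numbered from 1; scan several of them concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    edge_list = [e for part_edges in pool.map(scan_partition,
                                              range(1, NUM_PARTS + 1))
                 for e in part_edges]

# edge_list now holds the raw edge records; map their endpoints to integer
# ids before building a DGL / PyTorch Geometric graph for training.
```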

Sure, I understand the point. Thanks very much for the reply!
