
Some Questions about Storage Layer

Hello Everyone,
I am new to Nebula Graph.
1- According to the documentation, if we insert an edge that has the same src, dest, and rank but different properties, we can read only the last inserted data. Does that mean it is overwritten?
2- For example, I want to add multiple edges of the same type between two vertices, but with different properties such as a date. How can I achieve this?
3- When inserting an edge, can we add a constraint to check whether the src or dest vertex exists? If they don’t exist, I don’t want to insert the edge.
Thanks

@rwer81 Thanks for all the questions. Before answering them, let me explain how we store an edge. Edges are stored as key/value pairs. Each edge has a unique key, which is composed of four elements: src, dst, edge_type, and rank/weight. The value is the encoding of all properties. The value is overwritten only when all four of these elements are the same.

So, back to your questions:

  1. If all four elements are the same, the new edge will overwrite the old one.

  2. Multiple edges of the same type can be created by using different Rank/Weight values. Rank/Weight exists precisely for this scenario.

  3. Yes, this constraint can be added easily, but it will hurt insertion performance.
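
The overwrite and rank behavior in answers 1 and 2 can be modeled with a small Python sketch. This is only an illustration under stated assumptions: `EdgeKey`, `store`, and `insert_edge` are hypothetical names, and an in-memory dict stands in for the storage engine. It is not Nebula Graph's actual key encoding.

```python
from typing import Any, Dict, NamedTuple


class EdgeKey(NamedTuple):
    """Hypothetical stand-in for the four-element edge key."""
    src: int
    dst: int
    edge_type: str
    rank: int


# An in-memory dict standing in for the key/value store (assumption).
store: Dict[EdgeKey, Dict[str, Any]] = {}


def insert_edge(src: int, dst: int, edge_type: str, rank: int,
                props: Dict[str, Any]) -> None:
    """Write the encoded properties under the four-element key."""
    store[EdgeKey(src, dst, edge_type, rank)] = dict(props)


insert_edge(1, 2, "visits", 0, {"date": "2020-01-01"})
insert_edge(1, 2, "visits", 0, {"date": "2020-02-01"})  # same key: overwrites
insert_edge(1, 2, "visits", 1, {"date": "2020-03-01"})  # new rank: second edge

print(len(store))
print(store[EdgeKey(1, 2, "visits", 0)]["date"])
```

In this model the second insert replaces the first because all four key elements match, while the third insert creates a separate edge because only the rank differs.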


Thanks for the reply @shermanye
Is there any way to update rank as a counter? For example, each time a new edge is added between the same vertices, the rank increases by 1 automatically.
Also, can you share an example for question 3?
Thank you

Is there any way to update rank as a counter? For example, each time a new edge is added between the same vertices, the rank increases by 1 automatically.

It is possible to implement this feature, but I don’t think it’s necessary. An auto-incrementing rank would make it hard to remove any history. I would suggest that the application control the rank value. For example, a timestamp or a transaction ID could be used as the rank value, so that the application can easily figure out the key for a specific edge.
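
As a rough sketch of the timestamp suggestion, the application could derive the rank from the event time, so it can later rebuild the same key without looking anything up. The helper name `rank_from_time` and the vertex IDs below are hypothetical, and the choice of whole seconds since the Unix epoch is an assumption made for illustration.

```python
from datetime import datetime, timezone


def rank_from_time(when: datetime) -> int:
    """Derive a rank from an event time (whole seconds since the Unix epoch)."""
    return int(when.timestamp())


# Hypothetical example: one visit, identified by when it happened.
visit = datetime(2020, 1, 15, 9, 30, tzinfo=timezone.utc)
rank = rank_from_time(visit)

# The application can recompute this key later from the visit time alone.
edge_key = ("person_1", "hospital_7", "visits", rank)
print(edge_key)
```

Because the rank is a pure function of data the application already holds, removing or updating one specific edge does not require remembering a counter value.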

Also, can you share an example for question 3?

We currently don’t support this, but it could be easily implemented.
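
Until such a constraint exists, the check would have to live in the application. The following is a minimal sketch of that idea using in-memory stand-ins; `vertices`, `edges`, and `insert_edge_checked` are hypothetical names and do not correspond to any Nebula Graph API.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical in-memory stand-ins for the vertex and edge stores.
vertices: Set[str] = {"person_1", "hospital_7"}
edges: Dict[Tuple[str, str, str, int], Dict[str, Any]] = {}


def insert_edge_checked(src: str, dst: str, edge_type: str, rank: int,
                        props: Dict[str, Any]) -> bool:
    """Insert the edge only if both endpoints exist; report whether it was written."""
    if src not in vertices or dst not in vertices:
        return False  # reject: at least one endpoint is missing
    edges[(src, dst, edge_type, rank)] = dict(props)
    return True


print(insert_edge_checked("person_1", "hospital_7", "visits", 0, {"date": "2020-01-01"}))
print(insert_edge_checked("person_1", "hospital_9", "visits", 0, {"date": "2020-01-02"}))
```

The two extra lookups per insert are the kind of cost behind the earlier remark that such a constraint would hurt insertion performance.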

Thank you @shermanye

Thanks for the replies @shermanye,
You said that if the same edge is inserted, it will be overwritten.
But today I tested this case: I inserted the same edge 1 million times using different methods, such as row-by-row insert and batch insert from CSV. I observed that the data folder size increased.
As mentioned in the documentation, “An edge can be inserted/wrote multiple times. Only the last written values can be read.” So whether or not they are the same, each edge is written to disk, but only the last one can be read.
If we look at the edge format in storage, it is “Type-Part ID-Vertex ID-Edge Type-Rank-Vertex ID-Timestamp”. Because of the last part of the edge key, the timestamp, an existing row cannot be overwritten. Each insert is a new row.
In conclusion, disk space is wasted.
In my case, people go to the same hospital again and again. I want to set one link between a person and a hospital, but now I have to check whether this link/edge exists, and this check decreases performance. If I don’t check, I store extra data that is never used.
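
The effect described here can be modeled with a short Python sketch. It assumes, as this post does, that a write timestamp is the last component of the stored key. The names `rows`, `write_edge`, and `read_edge` are hypothetical, a simple counter stands in for the timestamp, and the part ID and type prefix are omitted for brevity.

```python
import itertools
from typing import Any, Dict, Optional, Tuple

_clock = itertools.count(1)  # stand-in for a write timestamp (assumption)
rows: Dict[Tuple[str, str, int, str, int], Dict[str, Any]] = {}


def write_edge(src: str, dst: str, edge_type: str, rank: int,
               props: Dict[str, Any]) -> None:
    """Store the edge under a key that ends with the write timestamp."""
    ts = next(_clock)
    rows[(src, edge_type, rank, dst, ts)] = dict(props)


def read_edge(src: str, dst: str, edge_type: str,
              rank: int) -> Optional[Dict[str, Any]]:
    """Return the properties of the most recently written version, if any."""
    prefix = (src, edge_type, rank, dst)
    versions = [(key[4], value) for key, value in rows.items() if key[:4] == prefix]
    if not versions:
        return None
    return max(versions, key=lambda item: item[0])[1]


for day in ("2020-01-01", "2020-01-02", "2020-01-03"):
    write_edge("person_1", "hospital_7", "visits", 0, {"date": day})

print(len(rows))
print(read_edge("person_1", "hospital_7", "visits", 0))
```

In this model three writes of the same logical edge leave three stored rows, yet a read returns only the latest one, which matches the growth in data folder size reported above.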
Thanks.

Additionally, in some specific cases we should be able to ignore the “timestamp”. This may be useful to avoid excessive disk usage.
Thanks.
