Storage Error - not enough RAM - Could Kubernetes solve my problem?

Hi everyone.
I’m running Nebula Graph v2.0.0 on Windows 10 by using Docker Desktop on a notebook with 16GB RAM.
For context, I’ve imported a large amount of data (to give an idea, the content of several CSVs).

My issue is that whenever I run complex queries, I always get the following error:
[ERROR (-8)]: Storage Error: part: 4, error: E_RPC_FAILURE(-3).
As a matter of fact, my RAM usage reaches 98%, so I’m undoubtedly facing a memory problem.
I was wondering whether distributing Nebula across more than one node by means of Kubernetes could solve this problem.
Otherwise, could anyone suggest another solution? I would really appreciate that. Thanks

@luca Thanks for asking here.

@wey Could you please help look at this issue?

Dear @luca ,
Welcome to the nebula graph community!
By Kubernetes, do you mean 1. a cluster running outside your laptop, or 2. Kubernetes running inside your laptop?

I guess you meant 2. and would like to know whether a more distributed placement with the same total spec (16 GiB RAM overall) would help. I think not.
But if you meant 1., putting the Nebula cluster in a higher-spec k8s environment, it will certainly help (I guess that is not the case you were asking about).

Could you please share the graphd logs and check whether all services stay up during the complex query? Also, which query did you run?

Thank you!
BR, Wey

Dear @wey ,
thanks for your support.
Actually I meant 1. My idea is to deploy Nebula in pods on more than one node (for instance 2), each on a different PC: one node on my PC and the other on a different PC in the same local network. To give you an idea, I wanted to try the guide in this blog post.
Regarding your requests, could you please tell me how I can get the graphd logs?
I’m not sure the following is the correct procedure:

I’ve already checked all services are running.
Finally the query I submitted is the following:

MATCH (u:user)-[r:tweeted]->(t:tweet)-[r1:in_reply_to*0]->(t1:tweet) WHERE t.created_at > datetime("{start}T00:00:00.000") AND t.created_at <= datetime("{end}T23:59:59.999") RETURN,count(r)AS OT1 ORDER BY OT1 DESC;

It is parameterized using f-strings because I submitted it through nebula-python (note that I also tried it directly in nebula console, and the problem persists).
start and end are two strings in “%Y-%m-%d” format (e.g. “1900-01-01”).
Performing the same query on a smaller amount of data works.
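
For anyone building the same kind of time filter in Python, here is a minimal sketch of the f-string step with the dates validated first. The helper name build_time_filter and its prop argument are made up for illustration; only the datetime("...") literals and the “%Y-%m-%d” format come from the query above. It builds the WHERE clause only, not the full query.

```python
from datetime import datetime


def build_time_filter(start: str, end: str, prop: str = "t.created_at") -> str:
    """Return the WHERE clause for a [start, end] date window.

    `start` and `end` must be "%Y-%m-%d" strings; anything else raises
    ValueError before the text can reach the query.
    """
    start_date = datetime.strptime(start, "%Y-%m-%d").date()
    end_date = datetime.strptime(end, "%Y-%m-%d").date()
    if start_date > end_date:
        raise ValueError("start must not be after end")
    # isoformat() re-emits the dates zero-padded, e.g. 1900-01-01.
    return (
        f'WHERE {prop} > datetime("{start_date.isoformat()}T00:00:00.000") '
        f'AND {prop} <= datetime("{end_date.isoformat()}T23:59:59.999")'
    )


if __name__ == "__main__":
    print(build_time_filter("1900-01-01", "1900-01-31"))
```

Parsing the strings before formatting them means a malformed date fails in Python rather than as a confusing error from the database.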

Dear @luca
Got it. For a docker-compose deployment, there is a logs folder under the nebula-docker-compose folder that you can check :-).

The query you provided is essentially a full scan, which can end up consuming a lot of RAM.

Could you check whether storageD exited or was stopped (from the pod status and logs)?

If no pod/process exited due to OOM or other reasons, this may simply be an RPC failure caused by a timeout.

In that case, changing the timeout in graphd.conf should be enough:

In nebula-graphd.conf, modify or add the item --storage_client_timeout_ms=60000 to change the timeout (in ms).

ref: FAQ - Nebula Graph Database Manual
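
The "modify or add" step can be sketched in Python as below. This is only an illustration: the helper name set_flag, the sample config text, and the value 120000 are assumptions, and it treats the file as plain --name=value lines where lines starting with # are comments.

```python
def set_flag(conf_text: str, name: str, value: str) -> str:
    """Set --name=value in gflags-style config text.

    An existing (uncommented) line for the flag is replaced; if there is
    none, the flag is appended at the end.
    """
    new_line = f"--{name}={value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Commented-out lines start with "#", so they never match here.
        if line.strip().startswith(f"--{name}="):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative config text, not a real nebula-graphd.conf.
    sample = "# graphd settings\n--storage_client_timeout_ms=60000\n"
    print(set_flag(sample, "storage_client_timeout_ms", "120000"))
```

The service has to be restarted afterwards for the new value to be read.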

If instead it exited for a reason such as insufficient RAM, the only ways forward are to revise the query or to scale up the node.

Using more resources with a larger cluster of nodes could help, but I’m not sure it will go well on Windows (as docker/k8s on Windows is not that native).

For a containerised multi-server deployment there are actually three options. For production on K8s, the operator solution is recommended and is what we are focusing on now. For your case, though, for test purposes and with limited resources, the operator comes with a larger footprint and may not be the best fit; you can try k8s helm or swarm.

Also, if your other PC/node has a higher spec than your laptop and runs Linux, you can deploy the cluster on that node alone first.

Dear @wey ,
First, I checked the storageD logs and found that it was a timeout error.
For this reason, I tried changing the timeout by adding --storage_client_timeout_ms=600000 (60000 is the default value) for each graphd container in the docker-compose.yaml file as follows:
Now I’m getting another error that I don’t understand:

Once again, the storaged logs don’t show any issue.
Looking at the graphd logs, I got:


Dear @luca
It looks like graphD crashed due to RAM/OOM [0]. Could you try this query with GO/LOOKUP?

This may make a difference because GO does not actually fetch all properties, only the ID, when scanning the data, whereas MATCH fetches all node information.

LOOKUP ON user | \
    GO FROM $-.VertexID OVER tweeted \
    WHERE $$.tweet.created_at is not empty and $$.tweet.created_at > datetime("{start}T00:00:00.000") AND $$.tweet.created_at <= datetime("{end}T23:59:59.999") \
    YIELD $-.VertexID as userId | \
    GO FROM $-.userId OVER in_reply_to YIELD $-.userId as $-.userId, count(in_reply_to._src) AS OT1 | \

BTW, the Nebula team is working on optimizing MATCH queries now; please expect improvements in the near future :)

[0]: You can verify this by checking docker ps to see whether graphD restarted.
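
As a complement to eyeballing docker ps, here is a rough Python sketch that reads the JSON printed by docker inspect and pulls out the fields that hint at a restart or an OOM kill (RestartCount, State.OOMKilled, State.Running). Note that it uses docker inspect rather than docker ps; the helper name summarize_containers and the sample JSON, including the container name, are made up for illustration. In practice you would paste in the output of docker inspect for your graphd container.

```python
import json
from typing import Dict, List


def summarize_containers(inspect_json: str) -> List[Dict[str, object]]:
    """Extract restart/OOM hints from `docker inspect` JSON output."""
    summary = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        summary.append({
            # docker inspect reports names with a leading slash.
            "name": container.get("Name", "").lstrip("/"),
            "restart_count": container.get("RestartCount", 0),
            "oom_killed": bool(state.get("OOMKilled", False)),
            "running": bool(state.get("Running", False)),
        })
    return summary


# Hypothetical, heavily trimmed sample; real output has many more fields.
SAMPLE = """
[{"Name": "/graphd-example", "RestartCount": 2,
  "State": {"Running": true, "OOMKilled": false}}]
"""

if __name__ == "__main__":
    for row in summarize_containers(SAMPLE):
        print(row)
```

A non-zero restart count while the container is running would suggest graphD crashed and was brought back up. The OOMKilled flag describes the container's most recent state, so it may not reflect an earlier crash.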
