
Building on aarch64 and running tests

Hello!
First I was trying to build nebulagraph on 4GB Raspberry Pi 4 and selected the latest branch looking like stable from github. It was 1.2.0. However there were issues - cmake was downloading some x86_64 (prebuilt?) stuff instead of aarch64. Fortunately I was more successful with master branch - it compiled successfully from the first time.

However, I'm a bit stuck running the tests. They report:

         69 - query_engine_test (Failed)
         71 - data_test (Failed)
         76 - delete_vertices_sentence_test (Failed)
         78 - update_test (Failed)
         84 - index_test (Failed)
        109 - checkpoint_test (Failed)

I'm building and running inside a Docker container that has all the dependencies mentioned in the build manual.
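Roughly, the workflow looks like this (just a sketch: nebula-dev-aarch64 is my local tag for the image, the paths are illustrative, and the build steps are the usual cmake/make/ctest ones from the manual):

docker run -ti --rm -v "$(pwd)":/home/nebula -w /home/nebula nebula-dev-aarch64 bash
# inside the container:
mkdir -p build && cd build
cmake ..
make -j2      # keeping parallelism low on the 4GB board
ctest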

I thought: okay, it's probably either an aarch64 or a Docker issue, so I built master on x86_64 using the same Docker image I had created for aarch64. There are failing tests there as well:

         69 - query_engine_test (Failed)
         71 - data_test (Failed)
         78 - update_test (Failed)

Running the tests outside Docker on x86_64 produces a slightly different list of failures:

         14 - network_utils_test (Failed)
         71 - data_test (Failed)
         78 - update_test (Failed)

The master branch revision used for building (both for aarch64 and x86_64):
b7582e28fc31528aa0104ad15c9dfd58b5dd4799

BTW, there are lots of publications and discussions about a v2 RC, but I didn't find any branches or tags containing 2.0 on GitHub. Moreover, I don't even know which version I was building: ./bin/nebula --version prints just a git revision.

Thanks for your questions!
The query_engine_test / data_test / update_test suites may have failed because they include timezone tests: the test data uses a fixed timezone, so if the test environment's timezone is not +08:00, the checks will fail.
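For example, setting TZ to a +08:00 zone before running those suites should make the checks pass (a minimal sketch; Asia/Shanghai is one such zone):

export TZ=Asia/Shanghai   # any UTC+08:00 zone matches the fixed-timezone test data
./bin/test/data_test
./bin/test/update_test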

Could you provide the error logs of the other failed tests? You can run a failed test directly and it will print the failing cases, for example:

./bin/test/network_utils_test
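If it is easier, CTest itself can rerun only the failed suites and print their output (these are standard CTest options, nothing NebulaGraph-specific):

cd build
ctest --rerun-failed --output-on-failure
# or pick one suite by name
ctest -R network_utils_test --output-on-failure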

Hm, today running on aarch64 (inside the Docker container) gives me fewer failures, which is quite strange:
69 - query_engine_test (Failed)
71 - data_test (Failed)
78 - update_test (Failed)
84 - index_test (Failed)

Detailed:
> bin/test/query_engine_test
[ FAILED ] 3 tests, listed below:
[ FAILED ] SchemaTest.issue2009
[ FAILED ] SchemaIssue1987/SchemaTestIssue1987.issue1987/0, where GetParam() = "TAG"
[ FAILED ] SchemaIssue1987/SchemaTestIssue1987.issue1987/1, where GetParam() = "EDGE"
> bin/test/data_test
[ FAILED ] 2 tests, listed below:
[ FAILED ] DataTest.InsertTest
[ FAILED ] DataTest.InsertWithDefaultValueTest
> bin/test/update_test
[ FAILED ] 1 test, listed below:
[ FAILED ] UpsertTest.EdgeNotExists
> bin/test/index_test
[ PASSED ] 8 tests. // WHAT ??? 🙂

Running ctest without -j4 indeed produces only 3 failing suites; index_test is now OK.
Maybe it failed with -j4 because of a lack of memory, for example.
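For reference, these are the two plain CTest invocations I'm comparing (nothing project-specific):

cd build
ctest -j4                   # parallel: extra suites fail, possibly due to memory pressure
ctest --output-on-failure   # serial: only the 3 suites listed above fail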

Thank you for your reply. data_test and update_test failed because of the timezone problem. However, the query_engine_test failure is not due to the timezone. I will test it locally.

Indeed, after propagating TZ=Asia/Shanghai into the Docker container, all tests except query_engine_test are now passing. I'll try to pull the latest changes and rebuild.
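In case it helps anyone else, the timezone is passed in with a standard Docker option (the image name below is just my local tag):

docker run -ti --rm -e TZ=Asia/Shanghai -v "$(pwd)":/home/nebula -w /home/nebula nebula-dev-aarch64 bash
# or, inside an already running container:
export TZ=Asia/Shanghai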

It's the same with the latest revision: query_engine_test fails.
I'll try to compare with x86_64; it's probably an architecture issue.
I've also found that different tests fail randomly when run in parallel with ctest -jN.
Only a single-threaded run is successful.

I've compiled for x86_64 with the same Docker image, so the same gcc and library versions.
query_engine_test is failing on aarch64.
All tests pass successfully on x86_64.

I haven't figured out why -j4 mattered before. Currently it does not make any difference for either x86_64 or aarch64. On aarch64 I'm using an old laptop HDD, so it could be a timeout issue due to I/O wait. Are there any time-sensitive tests?

Are there any time-sensitive tests

Yes. Thanks for your questions. Can you provide the failed tests? I'll try my best to explain why.


Generally they are different every time. You can see some of them in the opening message above.
I can imagine why it's happening. In the project I'm developing at my job we have similar issues with some services, for example an integration test that covers logic sending data through RabbitMQ: there is no way to ensure that a message sent to an exchange will appear in the output queue the next moment, so we have to use a plain sleep(N). Of course it will fail if the testing environment is really slow.

So don't bother providing a detailed explanation; I'll just remember that it's okay for tests to fail on a really slow system under certain circumstances (for example, using an HDD instead of an SSD, or asking ctest to run too many jobs).

So the only test that fails consistently on aarch64 is query_engine_test, even though I was using an absolutely identical Docker image (apart from the architecture) to build and test on both architectures (x86_64 and aarch64).

BTW, it looks like query_engine_test is also time-sensitive. I listed 3 failed tests in the query_engine_test suite above, but I've just run it a few more times and it ends up with a different number of failed tests each time, currently either one or two (see the repeat-run command below). This one fails from time to time:

[ FAILED ] SchemaIssue1987/SchemaTestIssue1987.issue1987/0, where GetParam() = "TAG"
And this one always fails:
[ FAILED ] SchemaIssue1987/SchemaTestIssue1987.issue1987/1, where GetParam() = "EDGE"
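A quick way to see the flakiness is to rerun just those parameterized cases with the standard googletest flags:

./bin/test/query_engine_test --gtest_filter='SchemaIssue1987/*' --gtest_repeat=10
# different runs report a different subset of failures, which looks like a timing issue rather than a stable assertion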

So, to eventually wrap up this discussion, I'll just ask whether it would be reasonable to consider my build stable/successful even though it fails those tests. Also, will the DB fail in the same random way if I run my project's tests, which involve querying NebulaGraph? Or are those time-sensitive failures only an issue of NebulaGraph's tests and not of NebulaGraph itself?

I think the reason the tests fail randomly may be related to the old laptop HDD. Our docs describe the considerations around HDDs. As the docs say, Nebula Graph is designed for NVMe SSDs and 10-gigabit networks; there is no special adaptation for HDDs and gigabit networks, so you can run into problems like this. If you want Nebula to run stably, please use an SSD instead of an HDD.

Of course, we are an open source project, so you are welcome to join us to find the reason and fix it 😬

OK, I've got it.
I should either run the tests on decent storage hardware or ignore a few failures.
Thank you for the explanation!
