Partially executed lookup statement

After I inserted around 40M vertices with the Go importer, the following query is only partially executed:
lookup on Storage_StorageAccount where Storage_StorageAccount.Id != "sdfdsf" | limit 10

========================
| VertexID             |
========================
| 2304661549333979001  |
| ...                  |
| 8120226020803304301  |
========================

Got 10 rows (Time spent: 4.85478/4.85611 s)

[WARNING]: Lookup executor was partially performed.

This is the setup for the importer:
postStart:
  commands: |
    UPDATE CONFIGS storage:wal_ttl=3600;
    UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = true };
    CREATE SPACE IF NOT EXISTS test(partition_num=100, replica_factor=3);
    USE test;
    CREATE TAG Storage_StorageAccount( Id string, …);
    CREATE TAG INDEX byStorageAccountId ON Storage_StorageAccount (Id);

  • Nebula Graph version
    1.X
  • Deployment type (Cluster/Single-Host/Docker/DBaaS)
    3 node cluster
  • Hardware info
    • Disk in use (SSD/HDD)
      SSD
    • CPU and memory size
      32 cores 128GB

These are the errors I have seen in the storaged error logs:

On node1:
E0310 01:15:16.041659 9446 LookUpIndexProcessor.cpp:39] Execute Execution Plan! ret = -5, spaceId = 1387, partId = 51

Node2 is fine.

On node3:
E0310 01:15:12.347851 21925 LookUpIndexProcessor.cpp:39] Execute Execution Plan! ret = -5, spaceId = 1387, partId = 27

By the way, the Go batch importer reported success without any issues.

Can someone kindly provide some pointers on how to debug it? Looks like there are issues with some partitions.
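One way to narrow this down is to pull the failing partition IDs out of the storaged error logs on each node. The following is a minimal Python sketch that assumes the log line format shown above; the function name failed_lookup_parts is made up for illustration and is not part of Nebula Graph.

```python
import re
from collections import defaultdict

# Assumed line shape, copied from the storaged error log excerpts above:
# E0310 ... LookUpIndexProcessor.cpp:39] Execute Execution Plan! ret = -5, spaceId = 1387, partId = 51
LOOKUP_ERROR = re.compile(
    r"LookUpIndexProcessor\.cpp:\d+\] .*?ret = (-?\d+), spaceId = (\d+), partId = (\d+)"
)

def failed_lookup_parts(lines):
    """Map (spaceId, ret) to the sorted list of partition IDs that failed."""
    parts = defaultdict(set)
    for line in lines:
        m = LOOKUP_ERROR.search(line)
        if m:
            ret, space, part = (int(g) for g in m.groups())
            parts[(space, ret)].add(part)
    return {key: sorted(value) for key, value in parts.items()}

sample = [
    "E0310 01:15:16.041659 9446 LookUpIndexProcessor.cpp:39] Execute Execution Plan! ret = -5, spaceId = 1387, partId = 51",
    "E0310 01:15:12.347851 21925 LookUpIndexProcessor.cpp:39] Execute Execution Plan! ret = -5, spaceId = 1387, partId = 27",
]
print(failed_lookup_parts(sample))  # {(1387, -5): [27, 51]}
```

Running this over the logs of all three nodes would show whether the failures are limited to a few partitions or spread across the space.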

Also, these are errors logged on node3 during the ingestion, which might help identify the cause of the issue.

E0309 07:36:37.565825 22167 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 32] Receive response about askForVote from [10.0.4.13:44501], error code is -5
E0309 07:36:37.568286 22167 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 53] Receive response about askForVote from [10.0.4.13:44501], error code is -5
E0309 07:36:37.752787 22164 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 14] Receive response about askForVote from [10.0.4.13:44501], error code is -5
E0309 07:37:07.418205 21911 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 22] The partition is not a leader
E0309 07:37:07.418511 21911 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 22] Cannot append logs, clean the buffer
E0309 07:37:07.508111 21928 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 44] The partition is not a leader
E0309 07:37:07.508348 21928 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 44] Cannot append logs, clean the buffer
E0309 07:37:07.533210 21953 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 25] The partition is not a leader
E0309 07:37:07.533401 21953 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 25] Cannot append logs, clean the buffer
E0309 07:53:57.636727 21935 RaftPart.cpp:909] [Port: 44501, Space: 1387, Part: 97] processAppendLogResponses failed!
E0309 07:54:00.676102 21928 RaftPart.cpp:909] [Port: 44501, Space: 1387, Part: 26] processAppendLogResponses failed!
E0309 07:54:02.456835 21948 RaftPart.cpp:909] [Port: 44501, Space: 1387, Part: 2] processAppendLogResponses failed!
E0309 07:54:32.462111 22167 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 78] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:54:32.462599 22167 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 78] Receive response about askForVote from [10.0.4.13:44501], error code is -6

Similarly, on node2 I got the following storaged errors:

E0309 07:36:57.412503 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 31] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.414577 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 8] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.414638 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 8] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.417548 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 1] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.417690 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 1] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.419178 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 61] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.419268 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 61] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.421118 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 45] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.421151 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 45] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.443406 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 40] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.443442 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 40] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.465037 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 49] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.465075 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 49] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.467965 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 82] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.468001 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 82] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.484309 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 2] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.484392 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 2] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.500111 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 75] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.500145 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 75] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.505946 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 69] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.506034 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 69] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.542008 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 32] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.542173 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 32] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.546768 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 25] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.546804 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 25] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.552219 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 42] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.552255 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 42] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.572082 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 34] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.572116 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 34] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.585803 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 60] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.585868 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 60] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.593748 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 94] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.593855 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 94] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.599562 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 30] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.599596 4538 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 30] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.618669 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 99] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.618698 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 99] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.624469 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 74] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.624511 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 74] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.696851 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 63] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.696943 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 63] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.700453 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 59] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.700541 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 59] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.789592 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 19] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.789675 4537 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 19] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.818164 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 4] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.818200 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 4] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.915560 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 92] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.915593 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 92] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:57.951822 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 18] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:57.951856 4540 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 18] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:36:58.099823 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 46] Receive response about askForVote from [10.0.4.12:44501], error code is -6
E0309 07:36:58.099938 4539 RaftPart.cpp:1075] [Port: 44501, Space: 1387, Part: 46] Receive response about askForVote from [10.0.6.4:44501], error code is -6
E0309 07:37:07.394817 4402 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 31] The partition is not a leader
E0309 07:37:07.395031 4402 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 31] Cannot append logs, clean the buffer
E0309 07:37:07.414168 4388 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 11] The partition is not a leader
E0309 07:37:07.414484 4388 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 11] Cannot append logs, clean the buffer
E0309 07:37:07.458024 4386 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 9] The partition is not a leader
E0309 07:37:07.458156 4386 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 9] Cannot append logs, clean the buffer
E0309 07:37:07.482432 4400 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 3] The partition is not a leader
E0309 07:37:07.482638 4400 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 3] Cannot append logs, clean the buffer
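Since the Raft logs are long, it may help to tally them before reading them line by line. The sketch below groups RaftPart errors by the source line in RaftPart.cpp that emitted them and collects the affected partitions. It assumes the line format shown above, and summarize_raft_errors is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Assumed line shape, copied from the excerpts above:
# E0309 ... RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 22] The partition is not a leader
RAFT_ERROR = re.compile(
    r"RaftPart\.cpp:(\d+)\] \[Port: \d+, Space: \d+, Part: (\d+)\]"
)

def summarize_raft_errors(lines):
    """Return (error count per RaftPart.cpp line, affected parts per RaftPart.cpp line)."""
    counts = Counter()
    parts = {}
    for line in lines:
        m = RAFT_ERROR.search(line)
        if not m:
            continue
        src_line, part = m.groups()
        counts[src_line] += 1
        parts.setdefault(src_line, set()).add(int(part))
    return counts, {key: sorted(value) for key, value in parts.items()}

sample = [
    "E0309 07:37:07.418205 21911 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 22] The partition is not a leader",
    "E0309 07:37:07.418511 21911 RaftPart.cpp:635] [Port: 44501, Space: 1387, Part: 22] Cannot append logs, clean the buffer",
    "E0309 07:37:07.508111 21928 RaftPart.cpp:365] [Port: 44501, Space: 1387, Part: 44] The partition is not a leader",
]
counts, parts = summarize_raft_errors(sample)
print(dict(counts))  # {'365': 2, '635': 1}
print(parts)         # {'365': [22, 44], '635': [22]}
```

Comparing the partitions listed here with the partitions that fail in the lookup could show whether the two sets of errors are related.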

Another detail: following some advice from Slack, I restarted the services on node1 and the partial execution message went away. But when I ran the query on node2, I got the same partial execution error, so I ended up restarting the services on every node. This definitely seems to be a bug. Please take a look. I ran the ingestion twice and got the same error both times. Let me know if you need more information.

@yee Please help with this one.

@pandasheep @CPWstatic @kyle Please help explain this issue.

If something goes wrong, nebula-importer writes the error messages to the log file configured by logPath in your config.yaml. At the same time, the data that failed to insert into Nebula is saved in the failDataPath CSV file.

So if there is no data in that CSV file, there were no errors during the import. You can configure these options in your config.yaml as in the following image:

You can try restarting the graphd service on all nodes.
Then connect to the cluster and execute
use <space_name>
balance leader

Wait two minutes or so until the cluster has balanced the leaders for the parts.
You can validate the state with
show parts

Then try to execute the query again.
Hope this helps.
I suppose it is a bug and that it will be resolved in a future release.
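Instead of waiting a fixed two minutes, the check could be polled until every part reports a leader. This is only a rough Python sketch: wait_until and all_parts_have_leader are hypothetical helpers, the (part_id, leader) pairs are an assumed shape for parsed show parts output, and treating an empty string as "no leader" is also an assumption. The snapshots are stand-in data; a real script would query the cluster with whatever client is in use.

```python
import time

def wait_until(check, timeout_s=120.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

def all_parts_have_leader(rows):
    """rows: assumed list of (part_id, leader) pairs; an empty leader means none elected."""
    return all(leader for _part, leader in rows)

# Stand-in for two successive "show parts" results (made-up data for illustration).
snapshots = iter([
    [(1, "10.0.4.12:44501"), (2, "")],
    [(1, "10.0.4.12:44501"), (2, "10.0.4.13:44501")],
])
ready = wait_until(
    lambda: all_parts_have_leader(next(snapshots)),
    timeout_s=10, interval_s=0, sleep=lambda seconds: None,
)
print(ready)  # True
```

The sleep and clock parameters are injectable only so the loop can be exercised without real waiting.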


Thanks for the response. Yee, that is exactly the config I have, and the import was 100% successful. Following goranc's suggestion, I restarted the services and got it to work. This definitely indicates a bug. For offline importing, it is fine to restart the services after importing, but I am concerned that it might happen in the online case as well. I recommend that you try to reproduce it and get to the bottom of it. These distributed system issues are tricky, but handling them is what differentiates good systems from mediocre ones.

I tried a few more times to figure out a pattern, and I found that sometimes even restarting the services does not work. Sometimes I got “[ERROR (-8)]: Lookup vertices failed” and other times I got “Lookup executor was partially performed.” This seems like a big problem to me.

@haocornell-ms Thanks a lot for the detailed information! I will report this issue to the team and have them reproduce it. I will keep you posted when I have more information.