Importing large data with many tags and edge types using Nebula Importer

To better locate and solve your problem, please follow the template below and provide the relevant information when asking questions on the forum:

  • Nebula Graph version
    v2.5.0

  • Deployment type (Cluster/Single-Host/Docker/DBaaS)
    Single-Host

  • Hardware info

    • Disk in use (SSD/HDD)
      AWS EBS gp2 1024GB
    • CPU and memory size
      4 CPUs, 61GB memory
  • How you have created the graph space in question. Execute DESCRIBE SPACE xxx to get the information
    +----+----------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+---------+
    | ID | Name                 | Partition Number | Replica Factor | Charset | Collate    | Vid Type           | Atomic Edge | Group     | Comment |
    +----+----------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+---------+
    | 40 | "importer_load_test" | 100              | 2              | "utf8"  | "utf8_bin" | "FIXED_STRING(20)" | false       | "default" |         |
    +----+----------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+---------+

  • Detailed description of the problem
    Besides specifying CREATE TAG xxx and CREATE EDGE xxx hundreds of times (since we may have several hundred tags and edge types in the CSV data) and mapping them all in the YAML config file, is there a better way to achieve this?

For example: we may have 300 distinct edge types, each with the same set of attributes. How should we define them and import the data?

Currently, the schema has to exist before you import data. You can certainly create the tags and edge types with a script before running Nebula Importer. However, the mappings between the CSV files and the tag/edge types still have to appear in the YAML file, so you may want to generate the YAML programmatically as well (see the sketches below). In addition, adding headers to the CSV files will make your life easier: with withHeader set to true, the importer reads the column-to-property mapping from each file's header row, so the YAML does not have to list every property.
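
For the schema side, a small script can emit all of the CREATE statements in one pass. Here is a minimal sketch; the edge type names (relation_0 … relation_299) and the shared properties (weight, created_at) are hypothetical placeholders for your actual data:

```python
# Write one CREATE EDGE statement per edge type into an nGQL script.
# All names below are illustrative, not taken from your data.

edge_types = [f"relation_{i}" for i in range(300)]      # 300 edge types
shared_props = "(weight double, created_at timestamp)"  # same attribute set for all

with open("schema.ngql", "w") as f:
    f.write("USE importer_load_test;\n")
    for name in edge_types:
        f.write(f"CREATE EDGE IF NOT EXISTS `{name}` {shared_props};\n")
```

You can then run the generated file once with nebula-console (it can execute a statement file via -f) and wait a couple of heartbeat cycles for the schema to take effect before starting the import.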
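
For the importer side, the same list of edge types can drive the YAML generation, so none of the 300 file entries is written by hand. A minimal sketch, assuming one CSV file per edge type, hypothetical paths and connection settings, and a nebula-importer header row in each CSV (e.g. :SRC_VID,:DST_VID,relation_0.weight:double,relation_0.created_at:timestamp), which is what lets the per-file schema section omit the property list:

```python
import yaml  # pip install pyyaml

edge_types = [f"relation_{i}" for i in range(300)]  # hypothetical names, as above

config = {
    "version": "v2",  # importer config version for Nebula Graph 2.x
    "description": "import 300 edge types that share one attribute set",
    "clientSettings": {
        "concurrency": 10,
        "channelBufferSize": 128,
        "space": "importer_load_test",
        "connection": {
            "user": "root",          # placeholder credentials
            "password": "nebula",
            "address": "127.0.0.1:9669",
        },
    },
    # one file entry per edge type; adjust the paths to your layout
    "files": [
        {
            "path": f"./edges/{name}.csv",
            "type": "csv",
            "batchSize": 128,
            "csv": {"withHeader": True, "withLabel": False},
            "schema": {"type": "edge", "edge": {"name": name, "withRanking": False}},
        }
        for name in edge_types
    ],
}

with open("importer.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```

Because both scripts are driven by the same edge_types list, adding or renaming an edge type only means updating that one list and rerunning them.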