
Is it safe to change rocksdb parameters after data is loaded

  • Nebula Graph: 1.2.0
  • Deployment type: Cluster with 4 nodes
  • Hardware info
    • Disk in use: 4 x 3.5 TB NVMe
    • CPU and memory size: 64 cores, 256 GB RAM
  • Graph space: 128 partitions, 2 replicas

The space is loaded with 24 billion vertices and 350 billion edges.

I have the following settings for nebula-storaged:

> # rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
> --rocksdb_db_options={"max_subcompactions":"4","max_background_jobs":"4","stats_dump_period_sec":"200","write_thread_max_yield_usec":"600"}
> # rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
> --rocksdb_column_family_options={"disable_auto_compactions":"false","write_buffer_size":"67108864","max_write_buffer_number":"8","max_bytes_for_level_base":"268435456","level0_file_num_compaction_trigger":"10","min_write_buffer_number_to_merge":"2","max_write_buffer_number_to_maintain":"1"}
> # rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
> --rocksdb_block_based_table_options={"block_size":"8192","block_restart_interval":"2"}

> # Whether or not to enable rocksdb's prefix bloom filter, disabled by default.
> --enable_rocksdb_prefix_filtering=false
> # Whether or not to enable the whole key filtering.
> --enable_rocksdb_whole_key_filtering=true
> # The prefix length for each key to use as the filter value.
> # can be 12 bytes (PartitionId + VertexID), or 16 bytes (PartitionId + VertexID + TagID/EdgeType).
> --rocksdb_filtering_prefix_length=12
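
For reference, my reading of the two prefix lengths mentioned in the comment (assuming a 4-byte PartitionId, an 8-byte VertexID, and a 4-byte TagID/EdgeType):

PartitionId (4 B) + VertexID (8 B)                          = 12-byte prefix
PartitionId (4 B) + VertexID (8 B) + TagID/EdgeType (4 B)   = 16-byte prefix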

Is it safe to change these parameters after the space is loaded with data?

Specifically, the parameters:

"max_subcompactions":"4","max_background_jobs":"4"

e.g. changing the values to 8 to increase the number of background workers for compactions.

and the combination of

--enable_rocksdb_prefix_filtering=false
--enable_rocksdb_whole_key_filtering=true

What will happen if we set both of the above to true?
Is it even possible to combine both filters?
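
Concretely, the change I am considering would look roughly like this in nebula-storaged.conf (a sketch of the intended edit only, all other values left as above):

# proposed change (sketch): raise compaction parallelism and turn on the prefix bloom filter
--rocksdb_db_options={"max_subcompactions":"8","max_background_jobs":"8","stats_dump_period_sec":"200","write_thread_max_yield_usec":"600"}
--enable_rocksdb_prefix_filtering=true
--enable_rocksdb_whole_key_filtering=true
--rocksdb_filtering_prefix_length=12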

  1. The CONFIGS statements support modifying some options while the server is running; other options can only be changed in the configuration files and require a restart.
  2. enable_rocksdb_prefix_filtering determines whether the prefix key extractor is built, and enable_rocksdb_whole_key_filtering determines the actual filter policy.
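
For the runtime route, a minimal sketch via the console, assuming the CONFIGS syntax from the 1.x manual (which options are actually registered as mutable should be checked with SHOW CONFIGS first; the option and value below are only illustrative):

# list the storage options registered with the meta service and whether they are mutable
SHOW CONFIGS storage;
# change a mutable rocksdb column-family option at runtime, without a restart
UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = false };
# read the option back to confirm the change
GET CONFIGS storage:rocksdb_column_family_options;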

Thanks for the answers.
I asked whether it is safe to change some rocksdb parameters after the data is loaded into the cluster.

The reason for asking is that I have issues with the data after I changed
"max_subcompactions" and "max_background_jobs" from 4 to 8
and changed enable_rocksdb_prefix_filtering from false to true.

After restarting the nodes, not all data is available for queries, and I get errors and warnings. Queries return an incomplete number of records, and some of them return nothing even though there is data for those vertices. It is described in

I guess it's caused by requests going to a non-leader partition. In version 1.2.0, after a restart the client still sends requests to the original partition replica, but that replica may not have been re-elected as leader. If you retry the request multiple times, does it still report an error?

I suppose enable_rocksdb_prefix_filtering should not be changed after data has been loaded. Or at least, the read options would need to be specified accordingly; see the rocksdb docs.

Requests to the same node return incomplete data.
Actually, 2 of the 4 nodes return partial data, while the other 2 nodes return complete data.
When checking with SHOW PARTS, I see some parts without a leader. It looks like the inability to get a leader for a part causes this issue, so it is not safe to have 2 replicas on a cluster; that should be noted somewhere in the documentation.
It should also be prevented in the CREATE SPACE statement, because at some point the user can lose data and not be able to query it.

A 2-replica raft group is safe, just without fault tolerance.
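
As I understand it, the quorum arithmetic behind this (a raft group needs a strict majority, floor(n/2) + 1, to elect a leader and commit writes) works out as:

replica_factor = 2  ->  majority = 2  ->  tolerates 0 replica failures
replica_factor = 3  ->  majority = 2  ->  tolerates 1 replica failure
replica_factor = 5  ->  majority = 3  ->  tolerates 2 replica failures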

@goranc In the documentation, it is suggested that you always set the replica number as an odd number for the quorum protocol to work properly. :slight_smile:


Link to the doc: CREATE SPACE - Nebula Graph Database Manual
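
For illustration, with an odd replica factor the statement for a space like this one would look as follows (a sketch using this thread's space name and partition count, not the statement originally used):

CREATE SPACE flgraph(partition_num=128, replica_factor=3);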

Good suggestion! I will submit this requirement to the team for evaluation. @steam

As for your problem of incomplete data being returned, @critical27 please help take a look.

I don't think 2 replicas are the root cause. Please reboot graphd only and retry the query.

Restarted only the graphd service on all nodes.
SHOW PARTS now shows that all parts have a leader.
Some of the queries still don't return data for all TAGs.

Tried to restart metad on all nodes, but now not all parts have a leader.

If you run SHOW HOSTS several times, does the leader distribution differ between runs?
Please paste your graph/storage/meta conf here.

If it is safe, then we should have fault tolerance with two copies.

When one node goes down, the other node takes over as leader and we have a healthy system,
but with only one replica, so at that moment we are without fault tolerance until we replace the faulty node.

If we can live with just two replicas, then we can lower the storage requirements by 1/3 compared to the 3 replicas required for fault tolerance.
Having 3 replicas is more robust because we can have one node down and still have fault tolerance.

No change in leader distribution across several executions of SHOW HOSTS.

SHOW HOSTS
=============================================================================================
| Ip          | Port | Status | Leader count | Leader distribution | Partition distribution |
=============================================================================================
| 10.20.33.29 | 9779 | online | 31           | flgraph: 31         | flgraph: 96            |
---------------------------------------------------------------------------------------------
| 10.20.33.30 | 9779 | online | 33           | flgraph: 33         | flgraph: 96            |
---------------------------------------------------------------------------------------------
| 10.20.33.31 | 9779 | online | 31           | flgraph: 31         | flgraph: 96            |
---------------------------------------------------------------------------------------------
| 10.20.33.32 | 9779 | online | 33           | flgraph: 33         | flgraph: 96            |
---------------------------------------------------------------------------------------------
| Total       |      |        | 128          | flgraph: 128        | flgraph: 384           |
---------------------------------------------------------------------------------------------

Configuration files for the cluster (example from the first node; the other nodes are the same, only the IP address differs).

metad.conf

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-metad.pid

########## logging ##########
# The directory to host logging files, which must already exists
--log_dir=/data/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=1
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0

########## networking ##########
# Comma separated Meta Server addresses
--meta_server_addrs=10.20.33.29:9559,10.20.33.30:9559,10.20.33.31:9559
# Local IP used to identify the nebula-metad process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.20.33.29
# Meta daemon listening port
--port=9559
# HTTP service ip
--ws_ip=10.20.33.29
# HTTP service port
--ws_http_port=19559
# HTTP2 service port
--ws_h2_port=19560

########## storage ##########
# Root data path, here should be only single path for metad
--data_path=/data/meta

########## Misc #########
# The default number of parts when a space is created
--default_parts_num=100
# The default replica factor when a space is created
--default_replica_factor=1

--heartbeat_interval_secs=10

############## rocksdb Options ##############
--rocksdb_wal_sync=true

graphd.conf

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-graphd.pid
# Whether to enable optimizer
--enable_optimizer=false

########## logging ##########
# The directory to host logging files, which must already exists
--log_dir=/data/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=1
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0
# Whether to redirect stdout and stderr to separate output files
--redirect_stdout=true
# Destination filename of stdout and stderr, which will also reside in log_dir.
--stdout_log_file=graphd-stdout.log
--stderr_log_file=graphd-stderr.log
# Copy log messages at or above this level to stderr in addition to logfiles. The numbers of severity levels INFO, WARNING, ERROR, and FATAL are 0, 1, 2, and 3, respectively.
--stderrthreshold=2

########## networking ##########
# Comma separated Meta Server Addresses
--meta_server_addrs=10.20.33.29:9559,10.20.33.30:9559,10.20.33.31:9559
# Local IP used to identify the nebula-graphd process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.20.33.29
# Network device to listen on
--listen_netdev=any
# Port to listen on
--port=9669
# To turn on SO_REUSEPORT or not
--reuse_port=false
# Backlog of the listen socket, adjust this together with net.core.somaxconn
--listen_backlog=1024
# Seconds before the idle connections are closed, 0 for never closed
--client_idle_timeout_secs=0
# Seconds before the idle sessions are expired, 0 for no expiration
# --session_idle_timeout_secs=60000
--session_idle_timeout_secs=0
# The number of threads to accept incoming connections
--num_accept_threads=32
# The number of networking IO threads, 0 for # of CPU cores
--num_netio_threads=0
# The number of threads to execute user queries, 0 for # of CPU cores
--num_worker_threads=0
# HTTP service ip
--ws_ip=10.20.33.29
# HTTP service port
--ws_http_port=19669
# HTTP2 service port
--ws_h2_port=19670

# Heartbeat interval of communication between meta client and graphd service
--heartbeat_interval_secs=10

########## authorization ##########
# Enable authorization
--enable_authorize=false

########## authentication ##########
# User login authentication type, password for nebula authentication, ldap for ldap authentication, cloud for cloud authentication
--auth_type=password

storaged.conf

########## basics ##########
# Whether to run as a daemon process
--daemonize=true
# The file to host the process id
--pid_file=pids/nebula-storaged.pid

########## logging ##########
# The directory to host logging files, which must already exists
--log_dir=/data/logs
# Log level, 0, 1, 2, 3 for INFO, WARNING, ERROR, FATAL respectively
--minloglevel=1
# Verbose log level, 1, 2, 3, 4, the higher of the level, the more verbose of the logging
--v=0
# Maximum seconds to buffer the log messages
--logbufsecs=0

########## networking ##########
# Comma separated Meta server addresses
--meta_server_addrs=10.20.33.29:9559,10.20.33.30:9559,10.20.33.31:9559
# Local IP used to identify the nebula-storaged process.
# Change it to an address other than loopback if the service is distributed or
# will be accessed remotely.
--local_ip=10.20.33.29
# Storage daemon listening port
--port=9779
# HTTP service ip
--ws_ip=10.20.33.29
# HTTP service port
--ws_http_port=19779
# HTTP2 service port
--ws_h2_port=19780
# heartbeat with meta service
--heartbeat_interval_secs=10

######### Raft #########
# Raft election timeout
# --raft_heartbeat_interval_secs=30
--raft_heartbeat_interval_secs=20
# RPC timeout for raft client (ms)
# --raft_rpc_timeout_ms=500
--raft_rpc_timeout_ms=2000
## recycle Raft WAL
# --wal_ttl=14400
--wal_ttl=86400

########## Disk ##########
# Root data path. split by comma. e.g. --data_path=/disk1/path1/,/disk2/path2/
# One path per Rocksdb instance.
# --data_path=data/storage
--data_path=/data01,/data02,/data03,/data04

# indicate whether to remove data from a deleted graph space
--auto_remove_invalid_space=true

# The default reserved bytes for one batch operation
# --rocksdb_batch_size=4096
--rocksdb_batch_size=8192
# The default block cache size used in BlockBasedTable. (MB)
# recommend: 1/3 of all memory
# --rocksdb_block_cache=4096
--rocksdb_block_cache=81920

# The default block cache size used in BlockBasedTable.
# The unit is MB.
# --rocksdb_block_cache=1024

# Compression algorithm, options: no,snappy,lz4,lz4hc,zlib,bzip2,zstd
# For the sake of binary compatibility, the default value is snappy.
# Recommend to use:
#   * lz4 to gain more CPU performance, with the same compression ratio with snappy
#   * zstd to occupy less disk space
#   * lz4hc for the read-heavy write-light scenario
--rocksdb_compression=lz4

# Set different compressions for different levels
# For example, if --rocksdb_compression is snappy,
# "no:no:lz4:lz4::zstd" is identical to "no:no:lz4:lz4:snappy:zstd:snappy"
# In order to disable compression for level 0/1, set it to "no:no"

--rocksdb_compression_per_level=

############## rocksdb Options ##############
--rocksdb_disable_wal=true
# rocksdb DBOptions in json, each name and value of option is a string, given as "option_name":"option_value" separated by comma
# --rocksdb_db_options={"max_subcompactions":"4","max_background_jobs":"4"}
--rocksdb_db_options={"max_subcompactions":"4","max_background_jobs":"4","stats_dump_period_sec":"200","write_thread_max_yield_usec":"600"}
# rocksdb ColumnFamilyOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
# --rocksdb_column_family_options={"disable_auto_compactions":"false","write_buffer_size":"67108864","max_write_buffer_number":"4","max_bytes_for_level_base":"268435456"}
--rocksdb_column_family_options={"disable_auto_compactions":"false","write_buffer_size":"67108864","max_write_buffer_number":"8","max_bytes_for_level_base":"268435456","level0_file_num_compaction_trigger":"10","min_write_buffer_number_to_merge":"2","max_write_buffer_number_to_maintain":"1"}
# rocksdb BlockBasedTableOptions in json, each name and value of option is string, given as "option_name":"option_value" separated by comma
# --rocksdb_block_based_table_options={"block_size":"8192"}
--rocksdb_block_based_table_options={"block_size":"8192","block_restart_interval":"2"}

# Whether or not to enable rocksdb's statistics, disabled by default
--enable_rocksdb_statistics=false

# Statslevel used by rocksdb to collection statistics, optional values are
#   * kExceptHistogramOrTimers, disable timer stats, and skip histogram stats
#   * kExceptTimers, Skip timer stats
#   * kExceptDetailedTimers, Collect all stats except time inside mutex lock AND time spent on compression.
#   * kExceptTimeForMutex, Collect all stats except the counters requiring to get time inside the mutex lock.
#   * kAll, Collect all stats
--rocksdb_stats_level=kExceptHistogramOrTimers

# Whether or not to enable rocksdb's prefix bloom filter, disabled by default.
--enable_rocksdb_prefix_filtering=false
# Whether or not to enable the whole key filtering.
--enable_rocksdb_whole_key_filtering=true
# The prefix length for each key to use as the filter value.
# can be 12 bytes(PartitionId + VertexID), or 16 bytes(PartitionId + VertexID + TagID/EdgeType).
--rocksdb_filtering_prefix_length=12

############### misc ####################
--max_handlers_per_req=48
--heartbeat_interval_secs=10

############# edge samplings ##############
# --enable_reservoir_sampling=false
# --max_edge_returned_per_vertex=2147483647
--enable_partitioned_index_filter=true
--max_edge_returned_per_vertex=1048576