Helm on EKS

Hello,
We are trying to evaluate Nebula by installing the Helm chart 0.9.0 on Kubernetes (EKS).
The pods do not start, and there is no information in the pod logs or in describe to understand why.
Any ideas?
Best, thanks

@MegaByte875 could you help with this?

@ronenl could you please provide more context before @MegaByte875 comes here tomorrow?

The operator pods themselves aren’t up, right?

Hi Wey,
We installed the Nebula Operator
https://docs.nebula-graph.io/2.6.1/nebula-operator/2.deploy-nebula-operator/
and then the NebulaGraph cluster
https://docs.nebula-graph.io/2.6.1/nebula-operator/3.deploy-nebula-graph-cluster/3.2create-cluster-with-helm/
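Roughly, the commands from those two pages are (a sketch, not verbatim; chart repo and names as documented for 2.6.1, operator version and namespaces per our setup):

helm repo add nebula-operator https://vesoft-inc.github.io/nebula-operator/charts
helm repo update
helm install nebula-operator nebula-operator/nebula-operator --namespace nebula-operator-system --version 0.9.0
helm install nebula nebula-operator/nebula-cluster --namespace nebula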

None of the pods are running…

nebula-graphd-0 0/1 CrashLoopBackOff 65 5h15m
nebula-graphd-1 0/1 CrashLoopBackOff 65 5h15m
nebula-metad-0 0/1 Running 1 5h15m
nebula-metad-1 0/1 Running 0 5h15m
nebula-metad-2 0/1 Running 1 5h15m
nebula-storaged-0 0/1 CrashLoopBackOff 65 5h15m
nebula-storaged-1 0/1 CrashLoopBackOff 65 5h15m
nebula-storaged-2 0/1 CrashLoopBackOff 65 5h15m

describe gives this:
Normal Scheduled 36m default-scheduler Successfully assigned nebula/nebula-storaged-1 to ip-192-168-31-155.eu-central-1.compute.internal
Normal SuccessfulAttachVolume 36m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-ecf52365-d710-445f-887f-35645bf88e8c"
Normal SuccessfulAttachVolume 36m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-c085c89c-04dd-4a6f-b930-66e07193bb1b"
Normal Pulled 36m kubelet Successfully pulled image "vesoft/nebula-storaged:v2.6.1" in 7.668887623s
Normal Pulled 36m kubelet Successfully pulled image "vesoft/nebula-storaged:v2.6.1" in 1.145997653s
Normal Pulled 36m kubelet Successfully pulled image "vesoft/nebula-storaged:v2.6.1" in 1.157818732s
Normal Created 35m (x4 over 36m) kubelet Created container storaged
Normal Started 35m (x4 over 36m) kubelet Started container storaged
Normal Pulled 35m kubelet Successfully pulled image "vesoft/nebula-storaged:v2.6.1" in 1.155159126s
Normal Pulling 34m (x5 over 36m) kubelet Pulling image "vesoft/nebula-storaged:v2.6.1"
Normal Pulled 34m kubelet Successfully pulled image "vesoft/nebula-storaged:v2.6.1" in 1.155928s
Warning BackOff 103s (x159 over 36m) kubelet Back-off restarting failed container

Note: in the nebula-operator chart we had to configure
admissionWebhook:
  create: false
for it to work.
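(For reference, that corresponds to overriding the chart value at install time, something like the following sketch, with the chart name taken from the operator docs above:

helm install nebula-operator nebula-operator/nebula-operator --namespace nebula-operator-system --set admissionWebhook.create=false)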
best

Thanks @ronenl !

@MegaByte875 it seems the Nebula cluster pods are crashing without useful info shown in kubectl describe. Any further checkpoints?

Dear @ronenl

Would you mind sharing your Nebula cluster configuration (the YAML)?

Thanks!

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-graphd
    version: v2.6.1
    service:
      type: NodePort
      externalTrafficPolicy: Local
    logVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  metad:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-metad
    version: v2.6.1
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
    logVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  storaged:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 3
    image: vesoft/nebula-storaged
    version: v2.6.1
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
    logVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  imagePullPolicy: Always
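(For context, this is the NebulaCluster resource itself. Assuming it is saved as nebula-cluster.yaml, it can be applied directly with:

kubectl apply -f nebula-cluster.yaml -n nebula

the nebula-cluster Helm chart from the docs above should render an equivalent object.)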

Hi @ronenl, can you provide the output of the commands below?

kubectl get events --sort-by=.metadata.creationTimestamp

kubectl logs nebula-storaged-1

kubectl logs --previous nebula-storaged-1

kubectl get pod nebula-storaged-1 -o yaml

kubectl get events --sort-by=.metadata.creationTimestamp -n nebula
LAST SEEN TYPE REASON OBJECT MESSAGE
7s Warning BackOff pod/nebula-storaged-0 Back-off restarting failed container
12s Warning BackOff pod/nebula-storaged-1 Back-off restarting failed container
13s Warning BackOff pod/nebula-storaged-2 Back-off restarting failed container
5m19s Warning BackOff pod/nebula-graphd-0 Back-off restarting failed container
6s Warning BackOff pod/nebula-graphd-1 Back-off restarting failed container
17s Warning Unhealthy pod/nebula-metad-1 Readiness probe failed: Get "http://192.168.94.192:19559/status": dial tcp 192.168.94.192:19559: connect: connection refused
7s Warning Unhealthy pod/nebula-metad-0 Readiness probe failed: Get "http://192.168.49.251:19559/status": dial tcp 192.168.49.251:19559: connect: connection refused
16s Warning Unhealthy pod/nebula-metad-2 Readiness probe failed: Get "http://192.168.45.33:19559/status": dial tcp 192.168.45.33:19559: connect: connection refused
40m Normal Pulling pod/nebula-storaged-0 Pulling image "vesoft/nebula-storaged:v2.6.1"
25m Normal Pulling pod/nebula-storaged-1 Pulling image "vesoft/nebula-storaged:v2.6.1"
20m Normal Pulling pod/nebula-graphd-1 Pulling image "vesoft/nebula-graphd:v2.6.1"
12s Normal Pulling pod/nebula-graphd-0 Pulling image "vesoft/nebula-graphd:v2.6.1"

kubectl logs nebula-storaged-1 -n nebula
++ hostname
++ hostname

+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=nebula-storaged-1.nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=nebula-storaged-1.nebula-storaged-headless.nebula.svc.cluster.local --daemonize=false

kubectl logs --previous nebula-storaged-1 -n nebula
++ hostname
++ hostname

+ exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=nebula-storaged-1.nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=nebula-storaged-1.nebula-storaged-headless.nebula.svc.cluster.local --daemonize=false

kubectl get pod nebula-storaged-1 -o yaml -n nebula
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
    nebula-graph.io/cm-hash: 63f7f6396f569654
  creationTimestamp: "2022-01-16T12:34:46Z"
  generateName: nebula-storaged-
  labels:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: storaged
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
    controller-revision-hash: nebula-storaged-75cf76b988
    statefulset.kubernetes.io/pod-name: nebula-storaged-1
  name: nebula-storaged-1
  namespace: nebula
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nebula-storaged
    uid: c65611b4-6070-4ab6-952f-f789868a732c
  resourceVersion: "12439792"
  uid: b759988f-635b-40d7-8bf6-5935d2d4fc47
spec:
  containers:
  - command:
    - /bin/bash
    - -ecx
    - exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
      --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-1.nebula-metad-headless.nebula.svc.cluster.local:9559,nebula-metad-2.nebula-metad-headless.nebula.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local
      --daemonize=false
    image: vesoft/nebula-storaged:v2.6.1
    imagePullPolicy: Always
    name: storaged
    ports:
    - containerPort: 9779
      name: thrift
      protocol: TCP
    - containerPort: 19779
      name: http
      protocol: TCP
    - containerPort: 19780
      name: http2
      protocol: TCP
    - containerPort: 9778
      name: admin
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /status
        port: 19779
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 500Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/local/nebula/logs
      name: storaged-log
      subPath: logs
    - mountPath: /usr/local/nebula/data
      name: storaged-data
      subPath: data
    - mountPath: /usr/local/nebula/etc
      name: nebula-storaged
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2c5zj
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nebula-storaged-1
  nodeName: ip-192-168-31-155.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: nebula-storaged-headless
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/cluster: nebula
        app.kubernetes.io/component: storaged
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: storaged-log
    persistentVolumeClaim:
      claimName: storaged-log-nebula-storaged-1
  - name: storaged-data
    persistentVolumeClaim:
      claimName: storaged-data-nebula-storaged-1
  - configMap:
      defaultMode: 420
      items:
      - key: nebula-storaged.conf
        path: nebula-storaged.conf
      name: nebula-storaged
    name: nebula-storaged
  - name: kube-api-access-2c5zj
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-01-16T12:34:53Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-01-16T12:34:53Z"
    message: 'containers with unready status: [storaged]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-01-16T12:34:53Z"
    message: 'containers with unready status: [storaged]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-01-16T12:34:53Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://ff4c008a852c92b977857090a4aef30b4fb3b87d8cb831172afde3eae7585f2f
    image: vesoft/nebula-storaged:v2.6.1
    imageID: docker-pullable://vesoft/nebula-storaged@sha256:ba25257ed618f363171b680a2beb6a9e62c8a88040da315d2de318893e2c653c
    lastState:
      terminated:
        containerID: docker://ff4c008a852c92b977857090a4aef30b4fb3b87d8cb831172afde3eae7585f2f
        exitCode: 1
        finishedAt: "2022-01-18T06:46:08Z"
        reason: Error
        startedAt: "2022-01-18T06:46:05Z"
    name: storaged
    ready: false
    restartCount: 493
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=storaged pod=nebula-storaged-1_nebula(b759988f-635b-40d7-8bf6-5935d2d4fc47)
        reason: CrashLoopBackOff
  hostIP: 192.168.31.155
  phase: Running
  podIP: 192.168.5.157
  podIPs:
  - ip: 192.168.5.157
  qosClass: Burstable
  startTime: "2022-01-16T12:34:53Z"

I found that the readiness probe failed. We will analyze further.
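In the meantime, one way to check the metad status endpoint by hand, as a sketch (port and path taken from the probe/events above; port-forward avoids needing curl inside the image):

kubectl port-forward -n nebula nebula-metad-0 19559:19559
curl http://127.0.0.1:19559/status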

I think we can look at Nebula's own logs to locate the problem. There are log files saved in /usr/local/nebula/logs, mounted via PVC, such as nebula-metad.ERROR. Can you show some of metad's error log?
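For example, something along these lines (a sketch, assuming the metad containers stay up long enough to exec into):

kubectl exec -n nebula nebula-metad-0 -- ls /usr/local/nebula/logs
kubectl exec -n nebula nebula-metad-0 -- cat /usr/local/nebula/logs/nebula-metad.ERROR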

Hi, I don't see
volumeName: pvc-80cbef35-8762-4857-95f9-71c23d6ddc22
in the list of EBS volumes.

Found it,
but nothing is there:
sudo ls /usr/local
bin etc games include lib lib64 libexec sbin share src

Found some files:
[root@ip-192-168-56-47 ~]# find / -name nebula
/var/lib/docker/overlay2/0923700835d6577448a7b79a33cac3f93edbe73869588a3f52f1c7b9ea2956b6/diff/usr/local/nebula
/var/lib/docker/overlay2/b9928f264df3969b9b97ec245b585b2b7e836787176d41c22fe104f165af0dde/diff/usr/local/nebula
/var/lib/docker/overlay2/5eeb4ead1497a8581d82884e7c5cf5fd9aa7bbf4eccd4ff84e42a3811aa22538/diff/usr/local/nebula
/var/lib/docker/overlay2/1d69e6d8970aeafe332b6747fc8a9dc7a13f196b2ae3963d9ac5e4fdae26441c/diff/usr/local/nebula
/var/lib/docker/overlay2/5bebb5a86ceb106e2e86f9f41dfdb1ebaf01610a8e765cb1d07f3667e5dcdde4/diff/usr/local/nebula
/var/lib/docker/overlay2/5bebb5a86ceb106e2e86f9f41dfdb1ebaf01610a8e765cb1d07f3667e5dcdde4/merged/usr/local/nebula
/var/lib/kubelet/pods/b65680a5-77f6-4ca3-8f2c-117063ecbc8e/volumes/kubernetes.io~aws-ebs/pvc-80114d9b-9966-475d-a697-2736ef3ff241/data/meta/nebula
/var/lib/kubelet/pods/b65680a5-77f6-4ca3-8f2c-117063ecbc8e/volume-subpaths/pvc-80114d9b-9966-475d-a697-2736ef3ff241/metad/1/meta/nebula
/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-central-1b/vol-0f710f8143ebfc41e/data/meta/nebula

but no errors in the logs.

@veezhang @kqzh does this point to a volume mounting issue?
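A quick way to check the claims, as a sketch (the claim name is taken from the pod spec above):

kubectl get pvc -n nebula
kubectl get pv
kubectl describe pvc storaged-data-nebula-storaged-1 -n nebula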

I see that there are some /var/lib/kubelet/pods/xxx/volumes dirs, but they do not contain nebula-metad.ERROR files.

It seems that the logs have not been successfully written, or writing has not yet started.

@ronenl Can you provide more information?

For example, the Kubernetes version, the cloud platform (AWS?), the VM type, the region, etc.

We have never encountered such a problem and wonder whether it can be reproduced.
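For example, output along these lines would help (a sketch; the instance-type and region labels are the standard well-known node labels):

kubectl version --short
kubectl get nodes -o wide
kubectl get nodes -L node.kubernetes.io/instance-type,topology.kubernetes.io/region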

This topic was automatically closed 45 days after the last reply. New replies are no longer allowed.