nebula-operator: graphd cyclic restart #108
Comments
Version 0.9.0 |
I'd like to disable the readiness check as soon as possible to test — what is the correct way to write the `readinessProbe: {}` field? |
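For context, a readiness probe failure only marks the pod unready; it never restarts the container, so an empty `readinessProbe: {}` would not stop a CrashLoopBackOff. A hypothetical sketch of relaxing the probe instead, assuming the NebulaCluster CRD exposes a readinessProbe field under spec.graphd (not confirmed for operator 0.9.0):
```
# Hypothetical NebulaCluster excerpt; the field placement under spec.graphd
# is an assumption, not confirmed for operator 0.9.0. An empty map would not
# remove the probe; relax its thresholds instead:
spec:
  graphd:
    readinessProbe:
      initialDelaySeconds: 60  # wait longer before the first check
      periodSeconds: 10
      failureThreshold: 30     # tolerate more failures before marking unready
```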
If it's only graphD, my guess is this is caused by vesoft-inc/nebula#3278 — the node host OS is on cgroupv2, right? Could you check whether the graphD log contains the same error as that issue? The cgroupv2 problem has already been fixed on master, so 2.6.2 and later 3.x releases won't have it. |
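A quick way to answer the cgroup question on the node (standard Linux, not specific to Nebula):
```
# On the node host OS: which cgroup version is mounted?
stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" => cgroup v2 (the case that triggers nebula#3278)
# "tmpfs"     => cgroup v1
```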
Right — so how should the cgroupv2 issue be resolved? Haha, I've been stuck on this for days. |
Sorry — if you're set up to test, could you cherry-pick vesoft-inc/nebula#3419 yourself and build a container image from it? If you're in a hurry, you can fall back to 2.5 for now (on that version you must set the memory watermark to 1.0 in the config, otherwise you hit a different problem in container environments); the cgroupv2 incompatibility hadn't been introduced yet back then 😭. If you can wait, the 2.6.2 hotfix and the 3.0 release are both on the way. |
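A minimal sketch of the 2.5 workaround mentioned above, using the flag name quoted later in this thread; it would go into nebula-graphd.conf (the file the operator renders into the nebula-graphd ConfigMap):
```
# nebula-graphd.conf excerpt: raise the memory high watermark to 1.0 so
# graphd does not reject queries in container environments on 2.5.x
--system_memory_high_watermark_ratio=1.0
```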
OK, looking forward to the new release. Haha |
The link you posted, vesoft-inc/nebula#3419, returns a 404 — do you have another one? I'll build the image myself. |
I rolled just this one image back to 2.5.0 and it still restarts. |
Sorry, it's this one: vesoft-inc/nebula#3419. You can't roll back a single component — 2.5.0 isn't compatible with meta/storage from other versions. |
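A generic sketch of the cherry-pick-and-rebuild route suggested earlier; the tag and PR ref follow standard GitHub conventions, and the final image-build step is left open because the thread doesn't specify the project's build procedure:
```
git clone --branch v2.6.1 https://github.com/vesoft-inc/nebula.git
cd nebula
git fetch origin pull/3419/head   # fetch the fix from the PR's ref
git cherry-pick FETCH_HEAD        # apply it on top of the 2.6.1 sources
# ...then build nebula and package a graphd image from the resulting binaries
```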
So if I re-pull the master branch — with image version 2.6.1 and nebula-operator 0.9.0 — can this problem be solved? |
The problem is exactly on 2.6.1 😭 |
So if I pull down master now and rebuild all three images, will that fix the problem?
|
master is actually the 3.x release candidate right now, and the operator probably hasn't been adapted to 3.x yet 😭 — otherwise, the nightly tag on Docker Hub is exactly a daily build of master. cc @veezhang @MegaByte875 — does the operator not support current master? Is there a better option right now (with the node host OS on cgroupv2)? |
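For reference, the nightly tag mentioned above can be pulled directly (the image name follows the vesoft/nebula-graphd pattern seen in the pod YAML later in this thread):
```
docker pull vesoft/nebula-graphd:nightly   # daily build of master
```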
What should I do to get this restart problem fixed? For example, which operator version should I pair with which graphd — does operator 0.8.0 pair with nebula-graphd 2.5.0, or something else?
|
For example, which version is currently stable enough for production use, and how should the operator version be matched with the nebula-graphd version? We're in an evaluation phase right now. |
nebula-operator 0.8.0 supports nebula 2.5.0; nebula-operator 0.9.0 supports nebula v2.5.0 & v2.6.0. @dockerxu123 You can try nebula-operator 0.9.0 with nebula 2.5.0. @wey-gu We do not support master now. |
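A hypothetical NebulaCluster excerpt pinning the suggested pairing; the image/version field names are assumptions based on the operator's conventions, not verbatim from its docs:
```
spec:
  graphd:
    image: vesoft/nebula-graphd
    version: v2.5.0
  metad:
    image: vesoft/nebula-metad
    version: v2.5.0
  storaged:
    image: vesoft/nebula-storaged
    version: v2.5.0
```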
I switched, and it's still the same: nebula-operator 0.9.0 with nebula 2.5.0
|
--system_memory_high_watermark_ratio=1 — @wey-gu I don't think this parameter can resolve the problem; graphd should support cgroupv2. |
This is the file, for reference: |
On 2.5.0 graphD still won't come up, while metaD and storageD are fine?
Can you see any logs?
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl logs <nebula-graphd-pod-id>
kubectl logs --previous <nebula-graphd-pod-id>
kubectl get pod <nebula-graphd-pod-id> -o yaml
Also, is there anything in the volume that backs the log directory? |
Yes — metaD and storageD are fine; nebula-graphd keeps restarting.
The pod hasn't come up, so there are no logs to look at yet.
|
```
Last login: Tue Jan 25 18:29:40 2022 from 192.168.2.2
***@***.*** ~]# kubectl get events --sort-by=.metadata.creationTimestamp
LAST SEEN TYPE REASON OBJECT MESSAGE
8m53s Warning listen tcp4 :31544: bind: address already in use node/192.168.5.133 can't open port "nodePort for default/nebula-graphd-nodeport-svc:thrift" (:31544/tcp4), skipping it
109s Normal CreateResourceSuccess cdi/cdi Successfully ensured SecurityContextConstraint exists
***@***.*** ~]# kubectl get events --sort-by=.metadata.creationTimestamp -n kube-nebula
LAST SEEN TYPE REASON OBJECT MESSAGE
31s Warning BackOff pod/nebula-graphd-0 Back-off restarting failed container
***@***.*** ~]# kubectl logs nebula-graphd-0 -n kube-nebula
++ hostname
++ hostname
+ exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.kube-nebula.svc.cluster.local:9559 --local_ip=nebula-graphd-0.nebula-graphd-svc.kube-nebula.svc.cluster.local --ws_ip=nebula-graphd-0.nebula-graphd-svc.kube-nebula.svc.cluster.local --daemonize=false
***@***.*** ~]# kubectl logs --previous nebula-graphd-0 -n kube-nebula
++ hostname
++ hostname
+ exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.kube-nebula.svc.cluster.local:9559 --local_ip=nebula-graphd-0.nebula-graphd-svc.kube-nebula.svc.cluster.local --ws_ip=nebula-graphd-0.nebula-graphd-svc.kube-nebula.svc.cluster.local --daemonize=false
***@***.*** ~]# kubectl get pod nebula-graphd-0 -o yaml -n kube-nebula
apiVersion: v1
kind: Pod
metadata:
annotations:
nebula-graph.io/cm-hash: ec8affde7a701107
creationTimestamp: "2022-01-25T06:10:44Z"
generateName: nebula-graphd-
labels:
app.kubernetes.io/cluster: nebula
app.kubernetes.io/component: graphd
app.kubernetes.io/managed-by: nebula-operator
app.kubernetes.io/name: nebula-graph
controller-revision-hash: nebula-graphd-76bd6446fb
statefulset.kubernetes.io/pod-name: nebula-graphd-0
name: nebula-graphd-0
namespace: kube-nebula
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: nebula-graphd
uid: 52e62f1c-5b6f-4b45-90e6-b5c032b06d7b
resourceVersion: "11688234"
uid: 07d470a2-a639-40cc-a0c4-cf4f29fbfb77
spec:
containers:
- command:
- /bin/bash
- -ecx
- exec /usr/local/nebula/bin/nebula-graphd --flagfile=/usr/local/nebula/etc/nebula-graphd.conf
--meta_server_addrs=nebula-metad-0.nebula-metad-headless.kube-nebula.svc.cluster.local:9559
--local_ip=$(hostname).nebula-graphd-svc.kube-nebula.svc.cluster.local --ws_ip=$(hostname).nebula-graphd-svc.kube-nebula.svc.cluster.local
--daemonize=false
image: vesoft/nebula-graphd:v2.5.0
imagePullPolicy: IfNotPresent
name: graphd
ports:
- containerPort: 9669
name: thrift
protocol: TCP
- containerPort: 19669
name: http
protocol: TCP
- containerPort: 19670
name: http2
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /status
port: 19669
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 500m
memory: 500Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/local/nebula/logs
name: graphd-log
subPath: logs
- mountPath: /usr/local/nebula/etc
name: nebula-graphd
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-w8p6s
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostname: nebula-graphd-0
nodeName: 192.168.5.141
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
subdomain: nebula-graphd-svc
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
topologySpreadConstraints:
- labelSelector:
matchLabels:
app.kubernetes.io/cluster: nebula
app.kubernetes.io/component: graphd
app.kubernetes.io/managed-by: nebula-operator
app.kubernetes.io/name: nebula-graph
maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
volumes:
- name: graphd-log
persistentVolumeClaim:
claimName: graphd-log-nebula-graphd-0
- configMap:
defaultMode: 420
items:
- key: nebula-graphd.conf
path: nebula-graphd.conf
name: nebula-graphd
name: nebula-graphd
- name: kube-api-access-w8p6s
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2022-01-25T06:10:46Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2022-01-25T06:10:46Z"
message: 'containers with unready status: [graphd]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2022-01-25T06:10:46Z"
message: 'containers with unready status: [graphd]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2022-01-25T06:10:46Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://bfbb389ae1d343e2497ccab31d9db57c2a34ad3fd4d41d00e59eb25bad142033
image: vesoft/nebula-graphd:v2.5.0
imageID: ***@***.***:d2ab96e28e9e6ef96e679fdcdeaab90b7b78ace38b94fdbae55f7ff69a9a710b
lastState:
terminated:
containerID: docker://bfbb389ae1d343e2497ccab31d9db57c2a34ad3fd4d41d00e59eb25bad142033
exitCode: 139
finishedAt: "2022-01-25T12:09:22Z"
reason: Error
startedAt: "2022-01-25T12:09:22Z"
name: graphd
ready: false
restartCount: 79
started: false
state:
waiting:
message: back-off 5m0s restarting failed container=graphd pod=nebula-graphd-0_kube-nebula(07d470a2-a639-40cc-a0c4-cf4f29fbfb77)
reason: CrashLoopBackOff
hostIP: 192.168.5.141
phase: Running
podIP: 172.20.119.119
podIPs:
- ip: 172.20.119.119
qosClass: Burstable
startTime: "2022-01-25T06:10:46Z"
```
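One detail worth pulling out of the dump above: the container's lastState shows exitCode: 139, i.e. 128 + 11, meaning the process died from signal 11 (SIGSEGV) — a segfault, consistent with the cgroupv2 crash in vesoft-inc/nebula#3278 rather than a probe failure:
```
# Decode the exit code from the pod status above: 139 - 128 = 11
kill -l 11   # prints SEGV
```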
|
nebula-metad error log:
|
hello all:
I just switched the nebula-graphd version to 2.0.1 while keeping everything else on 2.5.0. nebula-graphd no longer restarts, but the service can't be connected to, and the container status still isn't right.
I feel the version compatibility still needs more testing.
|
Very strange — perhaps 2.5.0 graphD also reads the cgroupv1 sysfs for memory-usage accounting, causing the same problem. Mixing graphD with metaD/storageD from different versions is not supported; the 2.0.1 image probably needs the operator version from that era, which should be something before 0.8.0. If you want to deploy 2.0.1, clean everything out (including the CRDs) and deploy 2.0.1 across the board. Thanks for your effort and the feedback! Also, I'd suggest waiting for 2.6.2 — it should be out very soon. |
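A hedged sketch of the "clean everything out" step for an all-2.0.1 redeploy; the release names are examples, the CRD name is an assumption based on the operator's API group, and the namespace and PVC label come from the pod YAML earlier in this thread:
```
helm uninstall nebula -n kube-nebula               # example cluster release name
helm uninstall nebula-operator -n nebula-operator  # example operator release name
kubectl delete crd nebulaclusters.apps.nebula-graph.io
kubectl delete pvc -n kube-nebula -l app.kubernetes.io/cluster=nebula
```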
I can confirm it's this file that's breaking nebula-graphd: I rewrote the Dockerfile for 2.5.0's nebula-graphd to add the file in, and it still restarts.
|
Roughly when will 2.6.2 come out?
|
Soon — I'll come back and ping you when it's out. Barring anything unexpected, probably within a day or two. |
2.6.2 has been released; the image is already up on Docker Hub :) |
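To pick up the fix, pull the released image and point the cluster's graphd version at it (the exact version-bump mechanics depend on your NebulaCluster spec):
```
docker pull vesoft/nebula-graphd:v2.6.2   # hotfix image published on Docker Hub
```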
Strange. Can you see the files in the log volume inside graphd-0, and what do they say? Also, does the graphd-0 pod/container itself have any logs? |
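A sketch of two ways to inspect that log volume; the PVC name and mount path are taken from the pod YAML above, and the debug-pod override is a generic kubectl technique, not something prescribed by the operator:
```
# If the container stays up long enough:
kubectl exec -n kube-nebula nebula-graphd-0 -- ls /usr/local/nebula/logs

# If it crash-loops too fast for exec, mount the same PVC in a throwaway pod.
# The PVC name graphd-log-nebula-graphd-0 is from the YAML above; logs land
# under the "logs" subPath, so look in /mnt/logs inside the debug pod:
kubectl run logviewer --rm -it --image=busybox -n kube-nebula --overrides='
{"apiVersion":"v1","spec":{"volumes":[{"name":"log",
 "persistentVolumeClaim":{"claimName":"graphd-log-nebula-graphd-0"}}],
 "containers":[{"name":"logviewer","image":"busybox","stdin":true,"tty":true,
 "command":["sh"],"volumeMounts":[{"name":"log","mountPath":"/mnt"}]}]}}'
```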
No logs to be seen. Would it be convenient to add each other on WeChat? If you have time tomorrow, could you take a look remotely?
|
I'm not too familiar with k8s, but sure — my WeChat ID is sivvei. Or you can find me on Slack. |
Talked offline: @dockerxu123 figured out by himself that it's caused by the config: |
After nebula-operator is installed successfully on k8s and the cluster is installed via helm, the nebula-graphd pod keeps restarting. Asking for help.