Hot For Coding

Mount error after a new node joins a Kubernetes cluster


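The cluster was getting overloaded, so I added a node to share the load. After kubeadm join finished, the pods scheduled onto the new node stayed in ContainerCreating for a long time. Running kubectl describe po <pod-name> to find the cause showed the following message repeating: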

...
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/45e5a290-077b-4f9b-889b-71de0c7f17bc/volumes/kubernetes.io~nfs/json --scope -- mount -t nfs nfs.server.com:/data /var/lib/kubelet/pods/45e5a290-077b-4f9b-889b-71de0c7f17bc/volumes/kubernetes.io~nfs/json
Output: Running scope as unit: run-rf893345a476c4ac2bebef49266fc63c6.scope
mount: /var/lib/kubelet/pods/45e5a290-077b-4f9b-889b-71de0c7f17bc/volumes/kubernetes.io~nfs/json: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
  Warning  FailedMount  33s (x4 over 11m)  kubelet, node15  Unable to attach or mount volumes: unmounted volumes=[json], unattached volumes=[config json default-token-5hhj7]: timed out waiting for the condition
...

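In short, the mount failed, and the message suggests that a /sbin/mount.<type> helper program for this filesystem type is missing. The Pod uses an NFS volume, so the new node most likely has no NFS client support. I'm running Debian, where installing a single package adds it: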

apt install nfs-common -y
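The error message above points at a missing /sbin/mount.<type> helper, so a quick check before and after installing the package can confirm the diagnosis. The following is only a minimal sketch: has_mount_helper is a hypothetical function name, and the optional second argument (a directory to search, defaulting to /sbin) is my own addition to make it easy to try out.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): succeeds if an executable
# mount helper for the given filesystem type exists in the given directory.
# Assumption: helpers live in /sbin, as the mount error message suggests.
has_mount_helper() {
    fstype="$1"
    dir="${2:-/sbin}"
    [ -x "$dir/mount.$fstype" ]
}

if has_mount_helper nfs; then
    echo "mount.nfs found"
else
    echo "mount.nfs missing, the NFS client package is probably not installed"
fi
```

On a node where this reports the helper as missing, pods with NFS volumes would be expected to fail in the way shown above.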

Then test a mount directly on the node:

mount -t nfs -o vers=4.0,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport nfs.server.com:/data /data

The mount succeeded. Checking the pods again, they are all Running now:

# kubectl get po  
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-54f57cf6bf-bhkg9   1/1     Running   0          22h
nginx-deployment-54f57cf6bf-sszl2   1/1     Running   0          20h
nginx-deployment-54f57cf6bf-www55   1/1     Running   0          20h

TITLE: Mount error after a new node joins a Kubernetes cluster

LINK: https://www.qttc.net/517_kubernetes_join_error_on_mount.html

NOTE: Please credit the source when reposting.