I am running a Ceph cluster (Quincy) with a few nodes and several OSDs. I tried to remove/delete an OSD, and now the cluster still seems to hold on to some of the old metadata. I cannot re-add the existing OSD.
Checking the device list, my disk is marked as available:
$ sudo ceph orch device ls
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
ceph-1 /dev/nvme0n1 ssd SAMSUNG_MZVLB512HAJQ-00000_S3W8NX0M401212 512G Yes 54s ago
.....
But when I try to add the disk with:
$ sudo ceph orch daemon add osd ceph-1:/dev/nvme0n1
I get the error "entity osd.2 exists but key does not match":
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1755, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs) # noqa: E731
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 840, in _daemon_add_osd
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/mon.ceph-1/config
Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=ceph-1 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/run/ceph:z -v /var/log/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/log/ceph:z -v /var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmps_lu2_l_:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpr41pagls:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 lvm batch --no-auto /dev/nvme0n1 --yes --no-systemd
/usr/bin/docker: stderr --> passed data devices: 1 physical, 0 LVM
/usr/bin/docker: stderr --> relative data size: 1.0
/usr/bin/docker: stderr Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new b481a987-1835-4844-965f-9cfdc0f7fc88
/usr/bin/docker: stderr stderr: Error EEXIST: entity osd.2 exists but key does not match
/usr/bin/docker: stderr Traceback (most recent call last):
/usr/bin/docker: stderr File "/usr/sbin/ceph-volume", line 11, in <module>
/usr/bin/docker: stderr load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
/usr/bin/docker: stderr self.main(self.argv)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
/usr/bin/docker: stderr return f(*a, **kw)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
/usr/bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
/usr/bin/docker: stderr instance.main()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
/usr/bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
/usr/bin/docker: stderr instance.main()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
/usr/bin/docker: stderr return func(*a, **kw)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 441, in main
/usr/bin/docker: stderr self._execute(plan)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 460, in _execute
/usr/bin/docker: stderr c.create(argparse.Namespace(**args))
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
/usr/bin/docker: stderr return func(*a, **kw)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create
/usr/bin/docker: stderr prepare_step.safe_prepare(args)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare
/usr/bin/docker: stderr self.prepare()
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
/usr/bin/docker: stderr return func(*a, **kw)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 292, in prepare
/usr/bin/docker: stderr self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
/usr/bin/docker: stderr File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 176, in create_id
/usr/bin/docker: stderr raise RuntimeError('Unable to create a new OSD id')
/usr/bin/docker: stderr RuntimeError: Unable to create a new OSD id
Traceback (most recent call last):
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9468, in <module>
main()
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9456, in main
r = ctx.func(ctx)
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2083, in _infer_config
return func(ctx)
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1999, in _infer_fsid
return func(ctx)
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2111, in _infer_image
return func(ctx)
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1986, in _validate_fsid
return func(ctx)
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 6093, in command_ceph_volume
out, err, code = call_throws(ctx, c.run_cmd(), verbosity=CallVerbosity.QUIET_UNLESS_ERROR)
File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1788, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=ceph-1 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/run/ceph:z -v /var/log/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/log/ceph:z -v /var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmps_lu2_l_:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpr41pagls:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 lvm batch --no-auto /dev/nvme0n1 --yes --no-systemd
How can I bring the cluster back into a consistent state? How do I remove the stale OSD id when running on Docker?
Answer 1
The problem here is that the old OSD's authentication details are still stored in the cluster. Removing the OSD entry together with its authorization key resolved the issue:
$ sudo ceph osd rm osd.<YOUR-OSD-ID>
removed osd.0
$ sudo ceph auth del osd.<YOUR-OSD-ID>
updated
After this, the OSD can be added to the cluster again.
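For reference, the more thorough cleanup sequence (take the OSD out, drop it from the CRUSH map, delete the stale key, remove the entry) can be sketched as below. This is a dry-run helper that only prints the commands; the OSD id 2 is just an example, and on a live admin node you would pipe the output to sh (or run the commands by hand):

```shell
# Hypothetical helper: print the full cleanup commands for a stale OSD id.
# Dry run only -- review the output, then execute it on an admin node.
osd_cleanup_cmds() {
  id="$1"
  echo "ceph osd out osd.${id}"           # stop mapping data to the OSD
  echo "ceph osd crush remove osd.${id}"  # remove it from the CRUSH map
  echo "ceph auth del osd.${id}"          # delete the stale auth key (the EEXIST culprit)
  echo "ceph osd rm osd.${id}"            # remove the OSD entry itself
}

osd_cleanup_cmds 2
```

On recent releases, `ceph osd purge osd.<ID> --yes-i-really-mean-it` combines the CRUSH removal, auth deletion, and OSD removal into a single step.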