ceph quincy: unable to create a new OSD id

I am running a Ceph cluster (Quincy) with a few nodes and several OSDs. I tried to remove/delete some OSDs, and now the cluster still seems to know about some of the old metadata: I cannot add the existing disks back as OSDs.

Checking the device list, my disk is marked as available:

$ sudo ceph orch device ls
HOST         PATH          TYPE  DEVICE ID                                   SIZE  AVAILABLE  REFRESHED  REJECT REASONS                                                 
ceph-1  /dev/nvme0n1  ssd   SAMSUNG_MZVLB512HAJQ-00000_S3W8NX0M401212   512G  Yes        54s ago 
.....

But if I try to add the disk with:

$ sudo ceph orch daemon add osd ceph-1:/dev/nvme0n1

I get the error entity osd.2 exists but key does not match:

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1755, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 840, in _daemon_add_osd
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
    raise e
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/mon.ceph-1/config
Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=ceph-1 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/run/ceph:z -v /var/log/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/log/ceph:z -v /var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmps_lu2_l_:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpr41pagls:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 lvm batch --no-auto /dev/nvme0n1 --yes --no-systemd
/usr/bin/docker: stderr --> passed data devices: 1 physical, 0 LVM
/usr/bin/docker: stderr --> relative data size: 1.0
/usr/bin/docker: stderr Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new b481a987-1835-4844-965f-9cfdc0f7fc88
/usr/bin/docker: stderr  stderr: Error EEXIST: entity osd.2 exists but key does not match
/usr/bin/docker: stderr Traceback (most recent call last):
/usr/bin/docker: stderr   File "/usr/sbin/ceph-volume", line 11, in <module>
/usr/bin/docker: stderr     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
/usr/bin/docker: stderr     self.main(self.argv)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
/usr/bin/docker: stderr     return f(*a, **kw)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
/usr/bin/docker: stderr     terminal.dispatch(self.mapper, subcommand_args)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
/usr/bin/docker: stderr     instance.main()
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
/usr/bin/docker: stderr     terminal.dispatch(self.mapper, self.argv)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
/usr/bin/docker: stderr     instance.main()
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
/usr/bin/docker: stderr     return func(*a, **kw)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 441, in main
/usr/bin/docker: stderr     self._execute(plan)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 460, in _execute
/usr/bin/docker: stderr     c.create(argparse.Namespace(**args))
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
/usr/bin/docker: stderr     return func(*a, **kw)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 26, in create
/usr/bin/docker: stderr     prepare_step.safe_prepare(args)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 252, in safe_prepare
/usr/bin/docker: stderr     self.prepare()
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
/usr/bin/docker: stderr     return func(*a, **kw)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 292, in prepare
/usr/bin/docker: stderr     self.osd_id = prepare_utils.create_id(osd_fsid, json.dumps(secrets), osd_id=self.args.osd_id)
/usr/bin/docker: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 176, in create_id
/usr/bin/docker: stderr     raise RuntimeError('Unable to create a new OSD id')
/usr/bin/docker: stderr RuntimeError: Unable to create a new OSD id
Traceback (most recent call last):
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9468, in <module>
    main()
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 9456, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2083, in _infer_config
    return func(ctx)
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1999, in _infer_fsid
    return func(ctx)
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 2111, in _infer_image
    return func(ctx)
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1986, in _validate_fsid
    return func(ctx)
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 6093, in command_ceph_volume
    out, err, code = call_throws(ctx, c.run_cmd(), verbosity=CallVerbosity.QUIET_UNLESS_ERROR)
  File "/var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e", line 1788, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=ceph-1 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/run/ceph:z -v /var/log/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4:/var/log/ceph:z -v /var/lib/ceph/e1b81d86-73be-11ed-a7c8-4c52620b9cf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmps_lu2_l_:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpr41pagls:/var/lib/ceph/bootstrap-osd/ceph.keyring:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 lvm batch --no-auto /dev/nvme0n1 --yes --no-systemd
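The decisive line in the trace is the Error EEXIST from osd new. For reference, the stale cephx entry can be inspected directly (osd.2 here is the id from my error message; substitute your own):

```shell
# Show the stale cephx entry that blocks "osd new" for this id.
sudo ceph auth get osd.2
# The id may no longer appear in the CRUSH map even though its key still exists:
sudo ceph osd tree
```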

How can I bring the cluster back to a consistent state? How do I remove the stale OSD id when running under Docker?

Answer 1

The problem here is that the old OSD's authentication details are still stored in the cluster. Removing the OSD entry and its auth key solved the problem:

$ sudo ceph osd rm osd.<YOUR-OSD-ID>
removed osd.0

$ sudo ceph auth del osd.<YOUR-OSD-ID>
updated

The OSD can now be added to the cluster again.
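As an aside, on Luminous and later (which includes Quincy), ceph osd purge combines the removal steps (CRUSH entry, cephx key, and OSD map entry) into a single command, which should avoid leaving the cluster in this half-removed state in the first place:

```shell
# Remove the OSD from the CRUSH map, delete its cephx key,
# and remove it from the OSD map in one step.
sudo ceph osd purge <YOUR-OSD-ID> --yes-i-really-mean-it
```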
