Ceph RGW: `list_bucket` requests are slow

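I have a ceph-rgw installation with one large bucket (~60M objects) and 16 OSDs; the bucket index is sharded into 997 shards. In this environment, a single directory listing takes more than 30 seconds: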

$ time rclone lsd t:bucket/non/existent/path/ --contimeout=1h --timeout=1h
real    0m34.816s

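This is very annoying, and clients (rclone itself, for example) may perform a list-dir operation before a PUT to check or verify something. (Preventing clients from sending list_objects/list_bucket is not a good option.)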

The rgw daemon log looks normal; part of it is shown below:

08:57:45.267+0000 7f0492db2700  1 ====== starting new request req=0x7f05039a9620 =====
08:57:45.267+0000 7f0492db2700 20 req 412648 0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/bucket
08:57:45.267+0000 7f0492db2700 10 req 412648 0.000000000s canonical request = GET
08:57:45.267+0000 7f0492db2700  2 req 412648 0.000000000s s3:list_bucket verifying op params
08:57:45.267+0000 7f0492db2700  2 req 412648 0.000000000s s3:list_bucket pre-executing
08:57:45.267+0000 7f0492db2700  2 req 412648 0.000000000s s3:list_bucket executing
08:57:45.267+0000 7f0492db2700 20 req 412648 0.000000000s s3:list_bucket RGWRados::Bucket::List::list_objects_ordered starting attempt 1
08:57:45.267+0000 7f0492db2700 10 req 412648 0.000000000s s3:list_bucket RGWRados::cls_bucket_list_ordered: :bucket[e6fb9c7c-74a2-4819-a0ed-e740d4eb590c.4751590.1]) start_after="[]", prefix="/non/existent/path/" num_entries=1001, list_versions=0, expansion_factor=1
08:57:45.271+0000 7f0492db2700 10 req 412648 0.004000000s s3:list_bucket RGWRados::cls_bucket_list_ordered request from each of 997 shard(s) for 8 entries to get 1001 total entries
08:58:07.495+0000 7f04efe6c700 10 librados: Objecter returned from call r=0
08:58:08.779+0000 7f04cd627700  4 rgw rados thread: no peers, exiting
08:58:18.803+0000 7f0492db2700  2 req 412648 33.535980225s s3:list_bucket completing
08:58:18.803+0000 7f047bd84700  2 req 412648 33.535980225s s3:list_bucket op status=0
08:58:18.803+0000 7f047bd84700  2 req 412648 33.535980225s s3:list_bucket http status=200
08:58:18.803+0000 7f047bd84700  1 ====== req done req=0x7f05039a9620 op status=0 http_status=200 latency=33.535980225s ======
08:58:18.803+0000 7f047bd84700  1 beast: 0x7f05039a9620: 192.168.1.1 - rgwuser [10/Nov/2021:08:57:45.267 +0000] "GET /bucket?delimiter=%2F&max-keys=1000&prefix=non%2Fexistent%2Fpath%2F HTTP/1.1" 200 413 - "rclone/v1.57.0" - latency=33.535980225s
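
To see where the 33.5 seconds go, the gaps between the timestamps in the excerpt above can be computed. The following is only a rough sketch: the timestamps are copied by hand from the log, the labels are my own shorthand, and attributing the `librados: Objecter returned` line to this request is an assumption, since that line is logged on a different thread and carries no request id.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above (req 412648).
# The labels are shorthand; the librados line is only assumed
# to belong to this request.
EVENTS = [
    ("08:57:45.267", "s3:list_bucket executing"),
    ("08:57:45.271", "request from each of 997 shard(s)"),
    ("08:58:07.495", "librados: Objecter returned from call r=0"),
    ("08:58:18.803", "s3:list_bucket completing"),
]


def gaps(events):
    """Return (seconds, from_label, to_label) for each consecutive pair."""
    parsed = [
        (datetime.strptime(ts, "%H:%M:%S.%f"), label) for ts, label in events
    ]
    return [
        ((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(parsed, parsed[1:])
    ]


for seconds, start, end in gaps(EVENTS):
    print(f"{seconds:7.3f}s  {start} -> {end}")
```

Almost nothing is spent before the fan-out to the index shards (4 ms); roughly 22.2 s pass before the next logged line and another 11.3 s before the request completes, so the time appears to be spent waiting on the shard reads rather than in request parsing or authentication.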

Environment details: Ceph version 16.2.5, installed with rook; each OSD is about 4T, with a 256G SSD metadata device.
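
For a sense of scale, here is a back-of-the-envelope calculation using only the numbers in this post (about 60M objects, 997 shards, and the log line "request from each of 997 shard(s) for 8 entries to get 1001 total entries"). The function name and dictionary keys are made up for illustration, and the per-shard figure assumes objects are spread evenly across shards, which I have not verified.

```python
# Figures taken from the post; the object count is approximate.
TOTAL_OBJECTS = 60_000_000
NUM_SHARDS = 997
ENTRIES_PER_SHARD_REQUEST = 8   # from the cls_bucket_list_ordered log line
ENTRIES_WANTED = 1001           # num_entries in the same log line


def listing_fanout(total_objects, num_shards, per_shard, wanted):
    """Summarise how much index work one ordered listing implies."""
    requested = num_shards * per_shard
    return {
        # Assumes an even spread of objects over the index shards.
        "index_entries_per_shard": total_objects // num_shards,
        # One read per shard, as the log line states.
        "shard_reads_per_listing": num_shards,
        "entries_requested_per_listing": requested,
        "overfetch_ratio": round(requested / wanted, 2),
    }


for key, value in listing_fanout(
    TOTAL_OBJECTS, NUM_SHARDS, ENTRIES_PER_SHARD_REQUEST, ENTRIES_WANTED
).items():
    print(f"{key}: {value}")
```

So even a listing of a non-existent prefix touches all 997 index shards and asks for up to 7976 entries in total, against shards holding about 60,000 entries each.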
