One ColumnFamily places data on only 3 out of 4 nodes


I have posted this to the cassandra-user mailing list but have not received any replies yet, so I was wondering whether anyone here on serverfault.com has any ideas.

I seem to have come across a rather weird (at least to me!) problem/behaviour with Cassandra.

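I am running a 4-node cluster on Cassandra 0.8.7. For the keyspace in question, I have RF=3, SimpleStrategy, and multiple ColumnFamilies inside the keyspace. One of the ColumnFamilies, however, seems to have its data distributed across only 3 out of the 4 nodes.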

Apart from the problematic ColumnFamily, the data on the cluster looks more or less equal and even.

# nodetool -h localhost ring
Address         DC          Rack        Status State   Load            Owns    Token                                       
                                                                               127605887595351923798765477786913079296     
192.168.81.2    datacenter1 rack1       Up     Normal  7.27 GB         25.00%  0                                           
192.168.81.3    datacenter1 rack1       Up     Normal  7.74 GB         25.00%  42535295865117307932921825928971026432      
192.168.81.4    datacenter1 rack1       Up     Normal  7.38 GB         25.00%  85070591730234615865843651857942052864      
192.168.81.5    datacenter1 rack1       Up     Normal  7.32 GB         25.00%  127605887595351923798765477786913079296     
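The four tokens in this ring output match an evenly balanced ring over a token space of 0 to 2^127, which is the space RandomPartitioner uses. The partitioner is not stated in the question, so treat that as an assumption. A minimal sketch of the arithmetic, using a hypothetical helper named balanced_tokens:

```python
# Sketch only: evenly spaced initial tokens over a 0 .. 2**127 token space.
# Assumes RandomPartitioner; the question does not say which partitioner is in use.
def balanced_tokens(node_count):
    """Return node_count tokens that split the ring into equal ranges."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


tokens = balanced_tokens(4)
for token in tokens:
    print(token)
```

For 4 nodes this reproduces the Token column above, so the 25.00% ownership per node is what one would expect, and the ring layout itself does not look like the cause of the imbalance.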

The schema for the relevant bits of the keyspace is as follows:

[default@A] show schema;
create keyspace A
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = [{replication_factor : 3}];
[...]
create column family UserDetails
  with column_type = 'Standard'
  and comparator = 'IntegerType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and memtable_operations = 0.571875
  and memtable_throughput = 122
  and memtable_flush_after = 1440
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and keys_cached = 200000.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';

Now the symptoms: the output of 'nodetool -h localhost cfstats' on each node. Please note the figures on node1.

node1

Column Family: UserDetails
SSTable count: 0
Space used (live): 0
Space used (total): 0
Number of Keys (estimate): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

node2

Column Family: UserDetails
SSTable count: 3
Space used (live): 112952788
Space used (total): 164953743
Number of Keys (estimate): 384
Memtable Columns Count: 159419
Memtable Data Size: 74910890
Memtable Switch Count: 59
Read Count: 135307426
Read Latency: 25.900 ms.
Write Count: 3474673
Write Latency: 0.040 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 120
Key cache hit rate: 0.999971684189041
Row cache: disabled
Compacted row minimum size: 42511
Compacted row maximum size: 74975550
Compacted row mean size: 42364305

node3

Column Family: UserDetails
SSTable count: 3
Space used (live): 112953137
Space used (total): 112953137
Number of Keys (estimate): 384
Memtable Columns Count: 159421
Memtable Data Size: 74693445
Memtable Switch Count: 56
Read Count: 135304486
Read Latency: 25.552 ms.
Write Count: 3474616
Write Latency: 0.036 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 109
Key cache hit rate: 0.9999716840888175
Row cache: disabled
Compacted row minimum size: 42511
Compacted row maximum size: 74975550
Compacted row mean size: 42364305

node4

Column Family: UserDetails
SSTable count: 3
Space used (live): 117070926
Space used (total): 119479484
Number of Keys (estimate): 384
Memtable Columns Count: 159979
Memtable Data Size: 75029672
Memtable Switch Count: 60
Read Count: 135294878
Read Latency: 19.455 ms.
Write Count: 3474982
Write Latency: 0.028 ms.
Pending Tasks: 0
Key cache capacity: 200000
Key cache size: 119
Key cache hit rate: 0.9999752235777154
Row cache: disabled
Compacted row minimum size: 2346800
Compacted row maximum size: 62479625
Compacted row mean size: 42591803

When I go to the 'data' directory on node1, there are no files for the UserDetails ColumnFamily.

I tried running a manual repair in the hope that it would fix the situation, but without any luck.

# nodetool -h localhost repair A UserDetails
 INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
 INFO 15:19:54,647 Sending AEService tree for #<TreeRequest manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2, (A,UserDetails), (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
 INFO 15:19:54,742 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
 INFO 15:19:54,750 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
 INFO 15:19:54,751 Repair session manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec (on cfs [Ljava.lang.String;@3491507b, range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]) completed successfully
 INFO 15:19:54,816 Sending AEService tree for #<TreeRequest manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd, /192.168.81.2, (A,UserDetails), (42535295865117307932921825928971026432,85070591730234615865843651857942052864]>
 INFO 15:19:54,865 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
 INFO 15:19:54,874 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
 INFO 15:19:54,874 Repair session manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd (on cfs [Ljava.lang.String;@7e541d08, range (42535295865117307932921825928971026432,85070591730234615865843651857942052864]) completed successfully
 INFO 15:19:54,909 Sending AEService tree for #<TreeRequest manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243, /192.168.81.2, (A,UserDetails), (127605887595351923798765477786913079296,0]>
 INFO 15:19:54,967 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
 INFO 15:19:54,974 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
 INFO 15:19:54,975 Repair session manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243 (on cfs [Ljava.lang.String;@48c651f2, range (127605887595351923798765477786913079296,0]) completed successfully
 INFO 15:19:54,975 Repair command #8 completed successfully

Since I am using SimpleStrategy, I would expect the keys to be split more or less equally between the nodes, but this does not seem to be the case.

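Has anyone come across similar behaviour before? Can anyone suggest what I could do to get some data onto node1? Obviously, this kind of data split means that node2, node3 and node4 have to do all the read work, which is not ideal.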

Any suggestions are much appreciated.

Kind regards, Bart

Answer 1

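SimpleStrategy means that Cassandra distributes data without taking racks, data centers, or other geography into account. That is important for understanding data distribution, but it is not enough to fully analyse your situation.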

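How rows are distributed across the cluster also depends on the partitioner you use. The RandomPartitioner hashes row keys before deciding which cluster members should own them. An order-preserving partitioner does not, which can create hotspots on the cluster (including nodes that are not used at all!) even if your nodes divide the ring equally. To see which nodes Cassandra thinks different keys (real or hypothetical) belong to, you can run the following command on one of the nodes: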

nodetool -h localhost getendpoints <keyspace> <cf> <key>

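If the other column families distribute their data properly across the cluster, I would look into the partitioner and the keys you are using.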
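To make the placement logic concrete, here is a rough sketch of how a hashing partitioner combined with SimpleStrategy picks replicas. It is a simplified model, not Cassandra's own code. It assumes RandomPartitioner-style tokens (the MD5 digest of the key read as a signed integer, absolute value taken) and reuses the ring tokens from the question. The row keys and the names token_for and replicas are made up for illustration.

```python
import hashlib
from bisect import bisect_left

# Ring taken from the `nodetool ring` output in the question: (token, address).
RING = [
    (0, "192.168.81.2"),
    (42535295865117307932921825928971026432, "192.168.81.3"),
    (85070591730234615865843651857942052864, "192.168.81.4"),
    (127605887595351923798765477786913079296, "192.168.81.5"),
]
TOKENS = [token for token, _ in RING]


def token_for(key):
    """Assumed RandomPartitioner-style token: abs() of the signed MD5 digest."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replicas(key, rf=3):
    """Pick the first node whose token is >= the key's token (wrapping around
    the ring), then the next rf - 1 nodes clockwise, ignoring racks and DCs."""
    start = bisect_left(TOKENS, token_for(key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


# A single row key (hypothetical) always lands on exactly rf nodes.
print(replicas(b"some-row-key"))

# Many distinct row keys (hypothetical) end up touching every node.
touched = set()
for i in range(1000):
    touched.update(replicas("user-{}".format(i).encode()))
print(len(touched))
```

With RF=3 on a 4-node ring, every individual row lives on three nodes and skips the fourth. A column family only spreads over all four nodes if it has enough distinct row keys to fall into different token ranges, so the number and shape of the row keys is worth checking alongside the partitioner.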

Answer 2

It turned out to be a schema issue: instead of having multiple rows (one row per user), we had a single huge row with more than 800,000 columns.

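What I suspect was happening:

  • This row was always cached by the OS, so we were not seeing any IO
  • Cassandra then used up all the CPU time serializing the massive row over and over again to fetch data from it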

We have changed the way the application works so that it stores a single row for each user's details, and the problem has gone away.
