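I am running a 6-node cluster with Apache Cassandra 2.1.2 and DataStax OpsCenter 5.0.2, launched from the AWS EC2 AMI "DataStax Auto-Clustering AMI 2.5.1-hvm" (DataStax Community AMI). When I try to run a repair on the rollups60 column family in the OpsCenter keyspace, I get errors in the Cassandra system log about a failure during snapshot creation. The repair appears to keep going, but it has not finished.
I would like to know whether this invalidates the repair, or whether I can expect it to complete in full.
I am running the command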
nodetool repair OpsCenter rollups60
on one of the nodes (10.63.74.70). So far, I have gotten the following output from the command:
[2015-01-23 19:36:06,261] Starting repair command #9, repairing 511 ranges for keyspace OpsCenter (seq=true, full=true)
Here is a sample of what I am seeing in the log:
INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,235 RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Received merkle tree for rollups60 from /10.63.74.70
INFO [AntiEntropySessions:9] 2015-01-23 19:38:28,236 RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
INFO [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190 and /10.63.74.70 have 1 range(s) out of sync for rollups60
INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,237 ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,238 Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized bytes, 1395 ops, 0%/0% of on/off-heap limit)
INFO [RepairJobTask:3] 2015-01-23 19:38:28,239 StreamingRepairTask.java:68 - [streaming task #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1 ranges with /10.13.157.190
INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,262 Memtable.java:364 - Completed flushing /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db (29998 bytes) for commitlog position ReplayPosition(segmentId=1422038939094, position=31047766)
ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 - Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) ~[apache-cassandra-2.1.2.jar:2.1.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
INFO [AntiEntropySessions:10] 2015-01-23 19:38:39,068 RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068 RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] session completed with the following error
java.io.IOException: Failed during snapshot creation.
at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070 CassandraDaemon.java:153 - Exception in thread Thread[AntiEntropySessions:9,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.io.IOException: Failed during snapshot creation.
at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
... 3 common frames omitted
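The errors repeat many times. The IP address in the log, 10.63.74.70, is the node on which I am running the repair. I was able to repair the rest of the OpsCenter column families; they completed quickly and without errors.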
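To gauge how widespread the failures are, the log can be tallied by repair session and by the host named in the snapshot error. The following is only a minimal sketch: the function name `summarize_repair_errors` is hypothetical, and the regular expressions assume the exact message wording shown in the excerpt above, which may differ in other Cassandra versions.

```python
import re
from collections import Counter

# Assumption: failed sessions are logged at ERROR level with this wording,
# as in the excerpt above (RepairSession.java:303).
FAILED_SESSION = re.compile(
    r"^ERROR .*\[repair #(?P<session>[0-9a-f-]+)\] "
    r"session completed with the following error"
)

# Assumption: the snapshot failure names the host as "/<IPv4 address>".
SNAPSHOT_FAILURE = re.compile(r"Could not create snapshot at /(?P<host>[\d.]+)")


def summarize_repair_errors(lines):
    """Return (set of failed repair session IDs, Counter of snapshot-failure hosts)."""
    failed_sessions = set()
    snapshot_hosts = Counter()
    for line in lines:
        session_match = FAILED_SESSION.search(line)
        if session_match:
            failed_sessions.add(session_match.group("session"))
        host_match = SNAPSHOT_FAILURE.search(line)
        if host_match:
            snapshot_hosts[host_match.group("host")] += 1
    return failed_sessions, snapshot_hosts
```

It could be fed an open log file, for example `summarize_repair_errors(open(path))`, where `path` points at the node's system log (its location depends on the installation). Comparing the number of failed sessions against the 511 ranges reported by the repair command would show how much of the repair is affected.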
I also tried creating a snapshot myself, and it completed successfully with nothing logged:
nodetool snapshot OpsCenter
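The disks have plenty of free space. Are these errors a problem? Should I just let the repair continue, however long it takes? The cluster is not currently being used by any application, but it is under some load, so it is not sitting idle (it has no load when I am not running a repair).
Thanks for your help.
By the way, there is no datastax-community tag here, so I had to use the datastax-enterprise tag.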