Some version information:
The operating system is Ubuntu 11.10 on EC2, the kernel is 3.0.0-16-virtual, and the application info is:
Version: 8.3.11 (api:88)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by buildd@allspice, 2011-07-05 19:51:07
There are also some strange errors in dmesg (shown below), and no replication is taking place. I have made the first node the primary, and it shows:
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----s ext3
My secondary node shows:
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
m:res cs ro ds p mounted fstype
0:r0 StandAlone Secondary/Unknown Inconsistent/DUnknown r----s
On the master, /proc/drbd shows:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----s
ns:0 nr:0 dw:4 dr:1073 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
On the slave, /proc/drbd shows that nothing has been transferred...
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----s
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
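Both /proc/drbd listings report cs:StandAlone, and oos (the out-of-sync amount, in KiB) equals the full device size of 262135964 KB that dmesg reports, so nothing has been replicated yet. A minimal sketch for pulling one of these fields out of /proc/drbd on each node; drbd_field is a hypothetical helper name, not part of drbd-utils, and it only looks at the first occurrence of the key (minor 0 here).

```shell
#!/bin/sh
# Hypothetical helper (not part of drbd-utils): print the value of one
# status key (cs, ro, ds, oos, ...) from /proc/drbd-style text.
# Only the first occurrence is printed, i.e. the first minor (drbd0 here).
# Usage: drbd_field KEY [FILE]    (FILE defaults to /proc/drbd)
drbd_field() {
    key=$1
    file=${2:-/proc/drbd}
    # Status entries are whitespace-separated "key:value" tokens.
    awk -v key="$key" '
        {
            for (i = 1; i <= NF; i++) {
                n = index($i, ":")
                if (n > 0 && substr($i, 1, n - 1) == key) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }' "$file"
}
```

For example, drbd_field cs should print StandAlone on both nodes in the state shown above, and drbd_field oos should start dropping toward 0 once a resync is actually running.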
Here is my config...
resource r0 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "test123";
    }
    on drbd01 {
        device /dev/drbd0;
        disk /dev/xvdm;
        address 23.XX.XX.XX:7788; # blocked out ip
        meta-disk internal;
    }
    on drbd02 {
        device /dev/drbd0;
        disk /dev/xvdm;
        address 184.XX.XX.XX:7788; # blocked out ip
        meta-disk internal;
    }
}
I ran the following command on the master:
sudo drbdadm -- --overwrite-data-of-peer primary all
There is no firewall between the systems.
Here is the dmesg output, which contains some errors:
[2285172.969955] drbd: initialized. Version: 8.3.11 (api:88/proto:86-96)
[2285172.969960] drbd: srcversion: DA5A13F16DE6553FC7CE9B2
[2285172.969962] drbd: registered as block device major 147
[2285172.969965] drbd: minor_table @ 0xffff88000276ea00
[2285173.000952] block drbd0: Starting worker thread (from drbdsetup [1300])
[2285173.003971] block drbd0: disk( Diskless -> Attaching )
[2285173.006150] block drbd0: No usable activity log found.
[2285173.006154] block drbd0: Method to ensure write ordering: flush
[2285173.006158] block drbd0: max BIO size = 4096
[2285173.006165] block drbd0: drbd_bm_resize called with capacity == 524271928
[2285173.008512] block drbd0: resync bitmap: bits=65533991 words=1023969 pages=2000
[2285173.008518] block drbd0: size = 250 GB (262135964 KB)
[2285173.079566] block drbd0: bitmap READ of 2000 pages took 17 jiffies
[2285173.081189] block drbd0: recounting of set bits took additional 1 jiffies
[2285173.081194] block drbd0: 250 GB (65533991 bits) marked out-of-sync by on disk bit-map.
[2285173.081203] block drbd0: Suspended AL updates
[2285173.081210] block drbd0: disk( Attaching -> UpToDate )
[2285173.081214] block drbd0: attached to UUIDs 1C1291D39584C1D1:0000000000000004:0000000000000000:0000000000000000
[2285173.095016] block drbd0: conn( StandAlone -> Unconnected )
[2285173.095046] block drbd0: Starting receiver thread (from drbd0_worker [1301])
[2285173.099297] block drbd0: receiver (re)started
[2285173.099304] block drbd0: conn( Unconnected -> WFConnection )
[2285173.099330] block drbd0: bind before connect failed, err = -99
[2285173.099346] block drbd0: conn( WFConnection -> Disconnecting )
[2285173.295788] block drbd0: Discarding network configuration.
[2285173.295815] block drbd0: Connection closed
[2285173.295826] block drbd0: conn( Disconnecting -> StandAlone )
[2285173.295840] block drbd0: receiver terminated
[2285173.295844] block drbd0: Terminating drbd0_receiver
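The line that matters is "bind before connect failed, err = -99". The kernel reports errors as negated errno values, and on x86 Linux errno 99 is EADDRNOTAVAIL ("Cannot assign requested address"): DRBD could not bind its socket to the local address from this node's on section, after which it dropped back to StandAlone. A minimal sketch for extracting that code from dmesg; drbd_bind_errors is a hypothetical helper name, and the errno-to-name table covers only the one code seen here and assumes x86 Linux numbering.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg-style text on stdin, find DRBD
# "bind before connect failed, err = -N" lines and print N with its name.
# The name table assumes x86 Linux errno numbering and only knows 99.
drbd_bind_errors() {
    sed -n 's/.*bind before connect failed, err = -\([0-9][0-9]*\).*/\1/p' |
    while read -r code; do
        case $code in
            99) echo "$code EADDRNOTAVAIL: Cannot assign requested address" ;;
            *)  echo "$code (look this up in errno.h)" ;;
        esac
    done
}
```

Usage: dmesg | drbd_bind_errors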
EDIT:
While reading some other similar questions, I saw someone suggest running "drbdadm dump all", so I figured it couldn't hurt.
ubuntu@drbd01:~$ drbdadm dump all
/etc/drbd.conf:19: in resource r0, on drbd01:
IP 23.XX.XX.XX not found on this host.
On the slave:
root@drbd02:~# drbdadm dump all
/etc/drbd.conf:25: in resource r0, on drbd02:
IP 184.XX.XX.XX not found on this host.
It's strange that it can't find its own IP; however, this is an Amazon EC2 system using Elastic IPs... Here is the ifconfig output from both...
Master:
ubuntu@drbd01:~$ ifconfig
eth0 Link encap:Ethernet HWaddr 22:00:0a:1c:27:11
inet addr:10.28.39.17 Bcast:10.28.39.63 Mask:255.255.255.192
inet6 addr: fe80::2000:aff:fe1c:2711/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1569 errors:0 dropped:0 overruns:0 frame:0
TX packets:1169 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:124409 (124.4 KB) TX bytes:213601 (213.6 KB)
Interrupt:26
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Slave:
root@drbd02:~# ifconfig
eth0 Link encap:Ethernet HWaddr 12:31:3f:00:14:9d
inet addr:10.160.27.107 Bcast:10.160.27.255 Mask:255.255.254.0
inet6 addr: fe80::1031:3fff:fe00:149d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:915 errors:0 dropped:0 overruns:0 frame:0
TX packets:774 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:75381 (75.3 KB) TX bytes:109673 (109.6 KB)
Interrupt:26
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
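Neither ifconfig listing contains the 23.XX or 184.XX address used in drbd.conf; eth0 carries only a 10.x private address. On EC2 an Elastic IP is mapped to the instance's private address outside the instance, so it does not appear on any interface, which is consistent with the "IP ... not found on this host" message from drbdadm dump. A minimal sketch of the same check done by hand; ip_is_local is a hypothetical helper name, and it assumes the iproute2 ip command is installed (it normally is on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 address $1 is assigned to an
# interface, judging by `ip -o -4 addr show` output read from stdin.
# Usage: ip -o -4 addr show | ip_is_local 10.28.39.17
ip_is_local() {
    # In one-line output, field 4 is ADDRESS/PREFIX, e.g. 10.28.39.17/26.
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }'
}
```

Run it on each node with the address from that node's own on section. If it fails, DRBD cannot bind to that address either; the address lines would have to name addresses that actually exist on eth0 (the 10.x addresses above), assuming the two instances can reach each other on those addresses.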
Answer 1
You really didn't need to run sudo drbdadm -- --overwrite-data-of-peer primary all.
For /dev/drbd, you should simply have done the following:
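Step 01) sudo service mysql stop
on the DRBD Primary, so that additional changes don't pile up for DRBD to sync
Step 02) sudo drbdadm connect all
on the DRBD Secondary
Step 03) sudo cat /proc/drbd
on the DRBD Secondary, to make sure the connection state is WFConnection
Step 04) sudo drbdadm connect all
on the DRBD Primary
Step 05) sudo cat /proc/drbd
on the DRBD Primary, to make sure the connection state is SyncSource
Step 06) sudo service mysql start
on the DRBD Primary, so that MySQL comes back up. DRBD synchronization will continue in the background; you do not have to wait for DRBD to fully sync in Step 05 before re-enabling MySQL.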
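Steps 03 and 05 amount to watching the cs: field of /proc/drbd. While a resync runs, the node holding the UpToDate data reports SyncSource and its peer reports SyncTarget; both switch to Connected when the resync completes. A minimal polling sketch, assuming a hypothetical helper name wait_for_cstate and a one-second poll interval; it only reads /proc/drbd and never runs drbdadm itself.

```shell
#!/bin/sh
# Hypothetical helper: poll a /proc/drbd-style file until the first
# connection state (cs:) equals one of the space-separated wanted values,
# or until TIMEOUT seconds have passed. Returns 0 on match, 1 on timeout.
# Usage: wait_for_cstate "WANTED..." TIMEOUT [FILE]   (FILE defaults to /proc/drbd)
wait_for_cstate() {
    wanted=$1
    timeout=$2
    file=${3:-/proc/drbd}
    elapsed=0
    while :; do
        # First "cs:Name" token in the file, without the "cs:" prefix.
        state=$(grep -o 'cs:[A-Za-z]*' "$file" | head -n 1 | cut -d: -f2)
        for w in $wanted; do
            [ "$state" = "$w" ] && return 0
        done
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

For example, after Step 02 on the Secondary: wait_for_cstate "WFConnection" 30, and after Step 04 on the Primary: wait_for_cstate "SyncSource Connected" 60. A timeout means the connection never came up, so dmesg is the next place to look.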
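Warning #1
DRBD should not be used across geographic distances. It is intended for setups where the DRBD pair is connected by a crossover cable on 192.168.x.x or some other purely LAN network.
Warning #2
DRBD should not be used at all. Even the latest V9 can easily go into "split brain" mode.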
Answer 2
Try the following:
On the primary node:
drbdadm connect all
On the secondary node:
drbdadm -- --discard-my-data connect all
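The point of Answer 2 is that --discard-my-data belongs only on the node whose data is to be overwritten, here the Inconsistent secondary. A minimal sketch, using a hypothetical helper name suggest_connect_cmd, that reads the local role from the ro: field of /proc/drbd and prints (but does not run) the matching command from Answer 2. Printing instead of executing is a deliberate choice, since discarding a node's data cannot be undone.

```shell
#!/bin/sh
# Hypothetical helper: print, without running, the Answer 2 connect
# command that fits this node's local role (first "ro:Local/Peer" field).
# Usage: suggest_connect_cmd [FILE]    (FILE defaults to /proc/drbd)
suggest_connect_cmd() {
    file=${1:-/proc/drbd}
    # "ro:Secondary/Unknown" -> "Secondary" (local role only).
    role=$(grep -o 'ro:[A-Za-z]*' "$file" | head -n 1 | cut -d: -f2)
    case $role in
        Primary)   echo "drbdadm connect all" ;;
        Secondary) echo "drbdadm -- --discard-my-data connect all" ;;
        *)         echo "unknown role: $role" >&2; return 1 ;;
    esac
}
```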