I installed mssql-server and mssql-server-ha on Ubuntu Server 16.04 LTS. I am using drbd on two nodes, with pacemaker and corosync to try to control automatic failover between them. crm status
shows 2 errors:
Failed Actions:
* res_mssql_monitor_5000 on hostname2 'invalid parameter' (2): call=57, status=complete, exitreason='2017/11/09 12:33:01 Expected local server name to be res_mssql but it was hostname1',
last-rc-change='Thu Nov 9 12:33:01 2017', queued=0ms, exec=5241ms
* res_mssql_start_0 on hostname2 'unknown error' (1): call=6086, status=complete, exitreason='SQL Server crashed during startup.',
last-rc-change='Thu Nov 9 12:32:39 2017', queued=0ms, exec=24329ms
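(Actual hostnames replaced with "hostname1" and "hostname2".)
Summary: If anyone has successfully configured a two-node pacemaker/corosync/drbd SQL Server 2017 on Linux setup with a floating IP, I would love to know what I am doing wrong. Let me know if you need other configuration or log files.
I don't know where it is finding the actual hostname1, rather than res_mssql, as the expected server name. The errors above occur on hostname2, so I think this may have happened during the initial setup when I copied the configuration files from hostname1 to hostname2.
My crm configuration:
(Note: I have not resolved the IPaddr2 issue yet; my regular IP addresses are on ens160 and ens192, and I would like to configure an IP alias as ip_mssql later so that the SQL Server can be reached through a public IP.)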
node 1: hostname1 \
attributes
node 2: hostname2 \
attributes
primitive ip_mssql IPaddr2 \
params ip=(virt IP addr) iflabel=ip_mssql \ #I think iflabel is wrong
op monitor interval=5s nic=ip_mssql \
meta target-role=Stopped
primitive res_drbd_mssql ocf:linbit:drbd \
params drbd_resource=mssql \
op start interval=0 timeout=240s \
op stop interval=0 timeout=120s
primitive res_fs_mssqlData Filesystem \
params device="/dev/drbd0" directory="/var/opt/mssql/data" fstype=xfs \
op start interval=0 timeout=60s \
op stop interval=0 timeout=120s
primitive res_fs_mssqlLog Filesystem \
params device="/dev/drbd1" directory="/var/opt/mssql/log" fstype=xfs \
op start interval=0 timeout=60s \
op stop interval=0 timeout=120s
primitive res_fs_mssqlTempDB Filesystem \
params device="/dev/drbd2" directory="/var/opt/mssql/tempDB" fstype=xfs \
op start interval=0 timeout=60s \
op stop interval=0 timeout=120s
primitive res_mssql ocf:mssql:fci \
op monitor interval=5s timeout=30s \
op start interval=0 timeout=60s \
op stop interval=0 timeout=60s
group mssqlserver res_fs_mssqlData res_fs_mssqlLog res_fs_mssqlTempDB ip_mssql
ms ms_drbd_mssql res_drbd_mssql \
meta notify=true master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
colocation col_mssql_drbd inf: mssqlserver ms_drbd_mssql:Master
order ord_mssql inf: ms_drbd_mssql:promote mssqlserver:start
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=mssqlserver \
stonith-enabled=false \
start-failure-is-fatal=false \
last-lrm-refresh=1510177588 \
startup-fencing=true \
enable-startup-probes=true \
symmetric-cluster=true \
stop-orphan-actions=true \
stonith-action=reboot \
remove-after-stop=false \
stop-all-resources=false \
stop-orphan-resources=true \
no-quorum-policy=ignore \
is-managed-default=true
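On the IPaddr2 note above the configuration: in ocf:heartbeat:IPaddr2, nic is a resource parameter that names an existing base interface (it is not an attribute of the monitor op), and iflabel is an optional short label appended to that interface name. Below is a minimal sketch of what the primitive could look like, assuming ens160 is the interface that should carry the floating IP and that a /24 netmask applies. 192.0.2.10 is a placeholder documentation address, and the function name is only illustrative. The script prints the crm command for review and does not change the cluster.

```shell
#!/bin/sh
# Sketch only: prints a crm command for review, does not touch the cluster.
# VIP, NIC and CIDR are assumptions; replace them with real values.
VIP="${VIP:-192.0.2.10}"   # placeholder address from the documentation range
NIC="${NIC:-ens160}"       # assumed base interface for the floating IP
CIDR="${CIDR:-24}"         # assumed netmask length

ipaddr2_command() {
    # nic goes under params; iflabel is left out because it is optional.
    printf '%s\n' "crm configure primitive ip_mssql ocf:heartbeat:IPaddr2 params ip=$VIP cidr_netmask=$CIDR nic=$NIC op monitor interval=10s timeout=20s"
}

ipaddr2_command
```

Since ip_mssql already exists in this configuration, the same params line would probably be applied through crm configure edit ip_mssql rather than by creating the primitive again.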
I can start mssql-server manually:
sudo systemctl start mssql-server
sudo systemctl status mssql-server
mssql-server.service - Microsoft SQL Server Database Engine
Loaded: loaded (/lib/systemd/system/mssql-server.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2017-11-09 12:49:21 CST; 1s ago
Docs: https://docs.microsoft.com/en-us/sql/linux
Main PID: 3368 (sqlservr)
Tasks: 62
Memory: 171.0M
CPU: 1.770s
CGroup: /system.slice/mssql-server.service
3368 /opt/mssql/bin/sqlservr
3371 /opt/mssql/bin/sqlservr
Nov 09 12:49:21 hostname2 systemd[1]: Started Microsoft SQL Server Database Engine.
These are the only actual errors I found in /var/opt/mssql/log/errorlog:
2017-11-09 12:49:28.17 spid4s Service Master Key could not be decrypted using one of its encryptions. See sys.key_encryptions for details.
2017-11-09 12:49:28.17 spid4s An error occurred during Service Master Key initialization. SQLErrorCode=33095, State=8, LastOsError=0.
2017-11-09 12:49:31.14 spid22s The Service Broker endpoint is in disabled or stopped state.
2017-11-09 12:49:31.14 spid22s The Database Mirroring endpoint is in disabled or stopped state.
2017-11-09 12:49:31.17 spid22s Service Broker manager has started.
2017-11-09 12:49:31.37 spid4s Recovery is complete. This is an informational message only. No user action is required.
Manual drbd failover works by running umount /dev/drbd0 /dev/drbd1 /dev/drbd2 and drbdadm secondary mssql, then reversing the process on the new primary node (drbdadm primary mssql and mounting the devices again).
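The manual failover steps just described can be sketched as two shell functions, one per node. The mount points are taken from the Filesystem resources in the crm configuration above. Stopping and starting mssql-server around the unmount is an assumption (the post does not mention it), and the function names and the DRY_RUN switch are illustrative only. DRY_RUN defaults to 1, so the commands are printed rather than executed.

```shell
#!/bin/sh
# Manual DRBD failover sketch. DRY_RUN defaults to 1: commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

release_node() {
    # On the current primary: stop SQL Server (assumed step), unmount, demote.
    run systemctl stop mssql-server
    for dev in /dev/drbd0 /dev/drbd1 /dev/drbd2; do
        run umount "$dev"
    done
    run drbdadm secondary mssql
}

takeover_node() {
    # On the new primary: promote, mount, then start SQL Server (assumed step).
    run drbdadm primary mssql
    run mount /dev/drbd0 /var/opt/mssql/data
    run mount /dev/drbd1 /var/opt/mssql/log
    run mount /dev/drbd2 /var/opt/mssql/tempDB
    run systemctl start mssql-server
}
```

Calling release_node on the old primary and then takeover_node on the other node prints the sequence; setting DRY_RUN=0 (as root) would run it for real.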
My /etc/drbd.d/mssql.res configuration (/etc/drbd.d/global_common.conf is the same as the one shipped in the repository):
resource mssql {
handlers {
split-brain "/usr/lib/drbd/notify-split-brain.sh root";
}
net {
after-sb-0pri discard-least-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
volume 0 {
device minor 0;
disk /dev/VG-SqlData/LV-SqlData;
meta-disk internal;
}
volume 1 {
device minor 1;
disk /dev/VG-SqlLogs/LV-SqlLogs;
meta-disk internal;
}
volume 2 {
device minor 2;
disk /dev/VG-TempDB/LV-TempDB;
meta-disk internal;
}
syncer {
rate 35M;
verify-alg md5;
}
on hostname1 {
address <ip addr1>:7788;
}
on hostname2 {
address <ip addr2>:7788;
}
}
Answer 1
Try using systemd to start the service:
crm configure edit res_mssql
Edit the configuration so that it looks like this:
primitive res_mssql systemd:mssql-server \
op monitor interval=30s timeout=30s \
op start interval=0 timeout=60s \
op stop interval=0 timeout=60s
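This should accomplish the same task. However, I think the resource agent accepts some additional parameters, and those may be all that is needed to make it work the way you were originally trying.
I would suggest checking the RA info to see whether you can work out which parameters you are missing: crm ra info ocf:mssql:fci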
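The monitor failure quoted in the question ("Expected local server name to be res_mssql but it was hostname1") suggests the agent compares the name SQL Server reports for itself with an expected value, which in that log is the same as the resource name. Whether that holds in general is an assumption here, so it may be worth checking what the instance reports before changing parameters. A rough sketch, assuming sqlcmd from mssql-tools is on PATH and an SA_PASSWORD environment variable is set; the helper names are hypothetical and not part of mssql-server-ha.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the Pacemaker resource name with the
# name the SQL Server instance reports for itself.
RESOURCE_NAME="${RESOURCE_NAME:-res_mssql}"

query_server_name() {
    # Ask the running instance for its configured server name.
    # Assumes sqlcmd is installed and SA_PASSWORD is exported.
    sqlcmd -S localhost -U SA -P "$SA_PASSWORD" -h -1 -W \
        -Q "SET NOCOUNT ON; SELECT @@SERVERNAME"
}

names_match() {
    # Succeeds when the resource name ($1) equals the server name ($2).
    [ "$1" = "$2" ]
}

report_names() {
    # $1 = Pacemaker resource name, $2 = name reported by SQL Server.
    if names_match "$1" "$2"; then
        echo "match: $1"
    else
        echo "mismatch: resource=$1 server=$2"
    fi
}
```

On the node where the instance is running, report_names "$RESOURCE_NAME" "$(query_server_name)" would show whether the two names differ, which can then be read alongside the output of crm ra info ocf:mssql:fci.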