Following this guide:
https://jabriffa.wordpress.com/2015/02/11/installing-torquepbs-job-scheduler-on-ubuntu-14-04-lts/
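I installed TORQUE on Ubuntu 16.04 LTS (the author says the procedure is the same on 16.04).
Here is a short summary of the installation instructions, so that this question is self-contained: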
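# install the server, client, MOM and PAM packages
apt-get install torque-server torque-client torque-mom torque-pam
# stop the daemons the packages started, so they can be reconfigured
/etc/init.d/torque-mom stop
/etc/init.d/torque-scheduler stop
/etc/init.d/torque-server stop
# create a fresh server database, then stop the server this launches
pbs_server -t create
killall pbs_server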
# point the server name, host ACL, operator/manager ACLs, node list and MOM config at the server host
echo SERVER.DOMAIN > /etc/torque/server_name
echo SERVER.DOMAIN > /var/spool/torque/server_priv/acl_svr/acl_hosts
echo root@SERVER.DOMAIN > /var/spool/torque/server_priv/acl_svr/operators
echo root@SERVER.DOMAIN > /var/spool/torque/server_priv/acl_svr/managers
echo "SERVER.DOMAIN np=4" > /var/spool/torque/server_priv/nodes
echo SERVER.DOMAIN > /var/spool/torque/mom_priv/config
# start the daemons again with the new configuration
/etc/init.d/torque-server start
/etc/init.d/torque-scheduler start
/etc/init.d/torque-mom start
# set scheduling properties
qmgr -c 'set server scheduling = true'
qmgr -c 'set server keep_completed = 300'
qmgr -c 'set server mom_job_sync = true'
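The steps above hard-code SERVER.DOMAIN in six places, and a typo in any one of them is easy to miss. Here is a minimal sketch that derives every entry from a single host name. It assumes the file paths used by the Ubuntu torque packages in the guide. `write_torque_config` and its optional PREFIX argument are hypothetical; PREFIX exists only so the function can be dry-run into a scratch directory instead of /etc and /var:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the TORQUE config files from the steps above
# using one server name, so the copies cannot drift apart.
# Usage: write_torque_config HOST NP [PREFIX]
#   PREFIX defaults to "" (the real filesystem); pass a scratch
#   directory to dry-run without touching /etc or /var.
write_torque_config() {
    local host="$1" np="$2" prefix="${3:-}"
    local spool="$prefix/var/spool/torque"
    mkdir -p "$prefix/etc/torque" \
             "$spool/server_priv/acl_svr" \
             "$spool/mom_priv"
    echo "$host"        > "$prefix/etc/torque/server_name"
    echo "$host"        > "$spool/server_priv/acl_svr/acl_hosts"
    echo "root@$host"   > "$spool/server_priv/acl_svr/operators"
    echo "root@$host"   > "$spool/server_priv/acl_svr/managers"
    echo "$host np=$np" > "$spool/server_priv/nodes"
    echo "$host"        > "$spool/mom_priv/config"
}
```

For example, `write_torque_config master.node 4` would write the same values as the master.node configuration in this question. Run it as root, then restart the daemons.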
After following his instructions, when I run:
qmgr -c 'set server scheduling = true'
I get the error message:
qmgr obj=master.node svr=master.node: Unauthorized Request
I grepped the logs as he describes and found only this unhelpful snippet:
grep Unauthorized /var/spool/torque/server_logs/*
08/25/2018 15:48:43;0080;PBS_Server;Req;req_reject;Reject reply code=15007(Unauthorized Request ), aux=0, type=Manager, from [email protected]
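The log line does record which user@host the server rejected, and that identity is what needs to match the ACL files. Here is a minimal sketch that extracts it and checks it against an ACL file. It assumes each rejection line ends with `, from user@host` as in the snippet above. The function names are hypothetical, and `in_acl` only checks for an exact line match, so any wildcard entries in the ACL are not evaluated:

```shell
#!/usr/bin/env bash
# List the distinct user@host identities that pbs_server rejected,
# as recorded in its logs (one per line).
rejected_identities() {
    local logdir="${1:-/var/spool/torque/server_logs}"
    grep -h 'Unauthorized Request' "$logdir"/* \
        | sed -n 's/.*, from \([^ ]*\).*/\1/p' \
        | sort -u
}

# Report whether an identity appears verbatim (exact whole-line match)
# in an ACL file such as acl_svr/managers.
in_acl() {
    local id="$1" file="$2"
    if grep -qxF "$id" "$file"; then
        echo "$id: listed in $file"
    else
        echo "$id: NOT listed in $file"
    fi
}
```

Running `rejected_identities` and then `in_acl IDENTITY /var/spool/torque/server_priv/acl_svr/managers`, with IDENTITY replaced by the value the first command prints, shows whether the rejected identity is listed as a manager.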
This is my hostname:
master
This is my hosts file:
127.0.1.1 master master
127.0.0.1 localhost
10.136.7.155 master.node
10.136.7.155 master
10.136.65.29 slave1
10.136.73.247 slave2
10.136.44.128 slave3
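In the hosts file above, `master` appears on two lines with different addresses (127.0.1.1 and 10.136.7.155), while `master.node` appears only with 10.136.7.155. A small sketch that lists every address a hosts(5)-format file assigns to a name makes duplicates like this easy to spot. `hosts_addrs` is a hypothetical helper, and it reads the file directly rather than querying the resolver:

```shell
#!/usr/bin/env bash
# Print every address that a hosts(5)-format file assigns to NAME,
# one per line, skipping comments. More than one line of output means
# the name is mapped to several addresses.
hosts_addrs() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }              # drop comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)   # fields 2..NF are names/aliases
                if ($i == n) { print $1; break }
        }' "$file"
}
```

To see which address the system resolver actually returns for a name, `getent hosts master` is the standard check.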
This is how I set up the various configuration files:
echo master.node > /etc/torque/server_name
echo master.node > /var/spool/torque/server_priv/acl_svr/acl_hosts
echo root@master.node > /var/spool/torque/server_priv/acl_svr/operators
echo root@master.node > /var/spool/torque/server_priv/acl_svr/managers
echo "master.node np=4" > /var/spool/torque/server_priv/nodes
echo master.node > /var/spool/torque/mom_priv/config
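Because the server name has to match across several files, a quick consistency check can rule out a mismatch between them. Here is a sketch that uses the same path assumptions as the guide. `check_torque_config` is hypothetical. It only compares the files against /etc/torque/server_name and cannot tell you whether that name is the one qmgr's requests will appear to come from:

```shell
#!/usr/bin/env bash
# Hypothetical consistency check: every file written above should name
# the host in /etc/torque/server_name. Prints one line per mismatch and
# returns non-zero if any were found. PREFIX (optional) is prepended to
# all paths, for testing against a scratch directory.
check_torque_config() {
    local prefix="${1:-}"
    local spool="$prefix/var/spool/torque" host bad=0
    host=$(cat "$prefix/etc/torque/server_name")
    # each ACL file should contain the expected entry as a whole line
    grep -qxF "$host" "$spool/server_priv/acl_svr/acl_hosts" \
        || { echo "acl_hosts lacks $host"; bad=1; }
    grep -qxF "root@$host" "$spool/server_priv/acl_svr/operators" \
        || { echo "operators lacks root@$host"; bad=1; }
    grep -qxF "root@$host" "$spool/server_priv/acl_svr/managers" \
        || { echo "managers lacks root@$host"; bad=1; }
    # some line of the nodes file should start with the host name
    awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' \
        "$spool/server_priv/nodes" \
        || { echo "nodes has no entry for $host"; bad=1; }
    return "$bad"
}
```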
Every time I tinker with it, I restart the daemons with:
/etc/init.d/torque-server restart
/etc/init.d/torque-scheduler restart
/etc/init.d/torque-mom restart
I am currently running as root.
I have no idea what TORQUE wants. Why am I not authorized?
Also, even though /var/spool/torque/server_priv/nodes exists, qmgr thinks there are no nodes. Why?
Qmgr: list node
No Active Nodes, nothing done.
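To confirm what the nodes file actually declares, independently of what the running server reports, a small parser can help. This sketch is hypothetical. It assumes the `NAME np=N [properties]` line format used above and assumes np defaults to 1 when omitted:

```shell
#!/usr/bin/env bash
# Print "NAME NP" for each node declared in a TORQUE nodes file,
# skipping blank lines and comments. NP falls back to 1 when no
# np=N field is present (assumed default).
declared_nodes() {
    local file="${1:-/var/spool/torque/server_priv/nodes}"
    awk '
        { sub(/#.*/, "") }              # drop comments
        NF == 0 { next }                # skip blank lines
        {
            np = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) np = substr($i, 4)
            print $1, np
        }' "$file"
}
```

Its output can then be compared with `pbsnodes -a`, which lists the nodes the server currently knows about.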
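Answer 1
I followed the instructions from the same link and got the same error.
The problem is that the server runs on localhost, so if you specify an FQDN other than localhost, requests appear to come from an unauthorized user.
I had to change the server domain to localhost: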
echo localhost > /etc/torque/server_name
echo localhost > /var/spool/torque/server_priv/acl_svr/acl_hosts
echo root@localhost > /var/spool/torque/server_priv/acl_svr/operators
...
...