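When using ABAQUS 6.14 (and also ABAQUS 2018) on Ubuntu 18.04, everything works except that the process standard never terminates (it performs the implicit analysis; it doesn't matter if you are not familiar with this).
The analysis itself does work: the log file (the .sta file, for those familiar with Abaqus) shows the message THE ANALYSIS HAS COMPLETED SUCCESSFULLY, and the output database contains the analysis results. However, after the analysis has finished, the standard process stays in a sleeping state, using 0% CPU and holding the same amount of RAM as during the run.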
With strace I get:
[pid 23191] close(8) = 0
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23193] <... futex resumed> ) = 0
[pid 23191] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23191] munmap(0x7f3ab130b000, 327680) = 0
[pid 23191] munmap(0x7f3ab136b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab16db000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0fbb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0ddb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0a0b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab03fb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab050b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab00cb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab02eb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab14eb000, 1114112) = 0
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 8, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 12, NULL <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
It looks as if the two processes are deadlocked. In addition, the commands
pid -p 7002
and
pid -p 7010
return empty output, and the directories /proc/7002 and /proc/7010 do not exist.
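To check these observations per thread rather than per process, the state of every thread can be read from /proc. The following is a minimal, Linux-only sketch; the helper name thread_states is hypothetical. It reads the State: line of /proc/<pid>/task/<tid>/status for each thread and returns an empty dict when /proc/<pid> does not exist. A thread blocked in a futex wait without a timeout normally shows up as S (sleeping).

```python
import os
from typing import Dict


def thread_states(pid: int) -> Dict[int, str]:
    """Map each thread id of `pid` to its /proc State value (Linux only).

    Hypothetical helper. Returns {} if /proc/<pid> does not exist, i.e. the
    process or thread is gone.
    """
    task_dir = '/proc/%d/task' % pid
    states = {}
    try:
        tids = os.listdir(task_dir)
    except OSError:
        return states  # no such process
    for tid in tids:
        try:
            with open(os.path.join(task_dir, tid, 'status')) as f:
                for line in f:
                    if line.startswith('State:'):
                        # e.g. "S (sleeping)" or "R (running)"
                        states[int(tid)] = line.split(':', 1)[1].strip()
                        break
        except OSError:
            continue  # thread exited while we were reading it
    return states


if __name__ == '__main__':
    import sys
    # usage: python thread_states.py <pid>
    for tid, state in sorted(thread_states(int(sys.argv[1])).items()):
        print(tid, state)
```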
The only Abaqus-related processes still running are:
david 6995 0.0 0.1 295428 51388 pts/0 S 17:00 0:00 /opt/abaqus/6.14-1/code/bin/python /opt/abaqus/6.14-1
david 6998 0.0 0.2 368744 97948 pts/0 S 17:00 0:00 /opt/abaqus/6.14-1/code/bin/python std_inst.com
david 7001 0.1 0.0 122076 20096 pts/0 Sl 17:00 0:03 /opt/abaqus/6.14-1/code/bin/eliT_DriverLM -job std_in
david 7008 0.4 0.5 735812 185364 pts/0 Sl 17:00 0:07 /opt/abaqus/6.14-1/code/bin/standard -standard -acade
On Ubuntu 16.04 the exact same version runs fine. Here is the strace output on Ubuntu 16.04 (with the same kernel version as on my 18.04 machine, i.e. 4.15.0-29):
3890 close(8) = 0
3892 <... select resumed> ) = 0 (Timeout)
3892 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3892 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3892 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3892 <... futex resumed> ) = 0
3890 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 select(7, [4 5 6], NULL, NULL, {0, 20000} <unfinished ...>
3890 munmap(0x7f29c7adb000, 327680) = 0
3890 munmap(0x7f29c7b3b000, 1114112) = 0
3890 munmap(0x7f29c7eab000, 1114112) = 0
3890 munmap(0x7f29c778b000, 1114112) = 0
3890 munmap(0x7f29c75ab000, 1114112) = 0
3890 munmap(0x7f29c71db000, 1114112) = 0
3890 munmap(0x7f29c6bcb000, 1114112) = 0
3890 munmap(0x7f29c6cdb000, 1114112) = 0
3890 munmap(0x7f29c689b000, 1114112) = 0
3890 munmap(0x7f29c6abb000, 1114112) = 0
3890 munmap(0x7f29c7cbb000, 1114112) = 0
3890 exit_group(0) = ?
3891 +++ exited with 0 +++
3893 +++ exited with 0 +++
3892 +++ exited with 0 +++
3890 +++ exited with 0 +++
3880 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3890
3880 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3890, si_uid=1000, si_status=0, si_utime=107, si_stime=7} ---
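Comparing the two traces, the difference is at the end: on 16.04 thread 3890 calls exit_group(0) right after the munmap calls, whereas on 18.04 thread 23191 instead enters futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 12, NULL), a wait with no timeout that never returns, while the other threads keep polling with select. The hypothetical helper below picks out such threads from a strace -f log in either prefix style shown above ("[pid N] ..." or "N ..."): it records the last event seen per thread and reports the threads whose last event is an unfinished FUTEX_WAIT with a NULL timeout. This is only a heuristic over the captured part of the trace; it shows which thread is waiting, not which thread failed to release the lock.

```python
import re
from typing import Dict, List

# Matches "[pid 23191] rest" (strace -f to a terminal) and "3890 rest" (strace -f -o file).
PREFIX = re.compile(r'^(?:\[pid\s+(\d+)\]|(\d+))\s+(.*)$')


def last_event_per_thread(lines: List[str]) -> Dict[int, str]:
    """Return the last traced event (text after the pid prefix) for each thread id."""
    last = {}
    for line in lines:
        m = PREFIX.match(line.strip())
        if m:
            tid = int(m.group(1) or m.group(2))
            last[tid] = m.group(3)
    return last


def blocked_in_futex_wait(lines: List[str]) -> List[int]:
    """Thread ids whose last event is a FUTEX_WAIT without timeout that never resumed."""
    return sorted(
        tid for tid, event in last_event_per_thread(lines).items()
        if event.startswith('futex(')
        and 'FUTEX_WAIT' in event
        and 'NULL <unfinished ...>' in event
    )
```

Typical use: save the trace with strace -f -o trace.txt -p <pid>, then call blocked_in_futex_wait(open('trace.txt').readlines()).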
Does anyone have a good idea how to solve this, or which direction I should investigate further?
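Answer 1
I found a solution that avoids the deadlock by using a Singularity container, as suggested by Will Furnass here: http://learningpatterns.me/posts-output/2018-01-30-abaqus-singularity/
It is a bit complicated at first, but once it is set up correctly it works like a charm. On the host system (Manjaro/Arch Linux) I changed the Abaqus aliases so that they point to the installation inside the Singularity container and execute the commands in the container environment. However, since I need the Intel Fortran compiler, instead of using the .def script suggested by Will Furnass, I generated a basic CentOS 7 container and then modified it to install the compiler and Abaqus (v2019 in this case).
The setup takes some time, but now I have a container image that I can use on any system running Singularity, which is really nice :)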
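The alias idea above amounts to prefixing every Abaqus call with singularity exec <image>. A minimal sketch of such a host-side launcher in Python follows; the image path and the launcher name inside the container are hypothetical and must match your own build, and it assumes the job directory is visible inside the container (common Singularity configurations bind $HOME and the current directory by default).

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical values: adjust to where your image lives and to the
# Abaqus launcher name installed inside the container.
IMAGE = '/opt/containers/abaqus-centos7.sif'
ABAQUS_IN_CONTAINER = 'abaqus'


def container_command(args: Sequence[str], image: str = IMAGE) -> List[str]:
    """Build the argv that runs the containerised Abaqus with the given arguments."""
    return ['singularity', 'exec', image, ABAQUS_IN_CONTAINER] + list(args)


if __name__ == '__main__':
    # e.g. "<this script> job=Job-1" runs the job inside the container
    sys.exit(subprocess.call(container_command(sys.argv[1:])))
```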
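Edit: I also tested copying a working installation to a newer Linux system (to avoid a fresh Abaqus installation), and I can confirm that this does not work in my case (a CentOS 7 installation copied to a Manjaro system).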
Answer 2
Dassault Systèmes released a bug fix this month: you need to update Abaqus 2018 to Abaqus 2018-HF16, available from https://software.3ds.com/. More details at https://github.com/willfurnass/abaqus-2017-centos-7-singularity/issues/5#issue-713025844
I tried updating Abaqus 2020 to Abaqus 2020-HF5, and it works on Ubuntu 20.04 and Fedora 32.
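Answer 3
I'd like to share my workaround for this problem. I wrote a Python wrapper for the abq2018 solver that checks whether the .sta file is complete. As soon as the .sta file is complete, every process named standard is killed. I found that when standard is killed after the analysis has completed, the solver exits normally.
This workaround is not a perfect solution. Current problems with it:
- It is not a drop-in replacement for the abq2018 solver call
- It cannot be run from the GUI, only from the shell
- Only the job= argument is parsed
- Only one analysis can run at a time, because all standard processes are killed
- If the .sta file is never created or modified, abq hangs forever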
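The last limitation can be softened by giving up once the .sta file has not been created or modified for a while. A minimal sketch, assuming the same two termination messages as the wrapper below; the names sta_finished and wait_for_sta and the one-hour default are hypothetical. It also compares the last non-empty line with whitespace stripped, so an empty file or a missing trailing newline does not break the check.

```python
import os
import time

# The two final messages the wrapper treats as "analysis finished".
TERMINATION_LINES = ('THE ANALYSIS HAS COMPLETED SUCCESSFULLY',
                     'THE ANALYSIS HAS NOT BEEN COMPLETED')


def sta_finished(sta_path: str) -> bool:
    """True if the last non-empty line of the .sta file is a termination message."""
    try:
        with open(sta_path, 'r') as f:
            lines = [ln.strip() for ln in f if ln.strip()]
    except OSError:
        return False  # not created yet, or deleted
    return bool(lines) and lines[-1] in TERMINATION_LINES


def wait_for_sta(sta_path: str, stale_s: float = 3600.0, poll_s: float = 5.0) -> bool:
    """Poll until the .sta file reports termination.

    Returns False instead of waiting forever if the file has not been
    created or modified for stale_s seconds.
    """
    last_change = time.time()
    while True:
        if sta_finished(sta_path):
            return True
        try:
            last_change = max(last_change, os.path.getmtime(sta_path))
        except OSError:
            pass  # file does not exist (yet)
        if time.time() - last_change > stale_s:
            return False
        time.sleep(poll_s)
```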
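How to use this workaround:
- Create a Python file named abq with the code given below. If you are using a solver other than abq2018, replace the solver name in the line cmd = 'abq20xx.. with the solver you use.
- Make abq executable and available on your PATH. I put abq in the Abaqus Commands folder and then ran
chmod +x abq
- Run an Abaqus Standard job by executing
abq job=Job-1
This runs Job-1.inp and then kills the standard solver once Job-1.sta is complete.
The code for abq is: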
#!/usr/bin/python
import subprocess
import sys
import time

# Only the job= argument is parsed, e.g. "abq job=Job-1"
arguments = sys.argv
jobname = arguments[1].split('job=')[-1]
# Replace abq2018 with the solver command you use
cmd = 'abq2018 cpus=4 ask_delete=OFF background job=' + jobname
p = subprocess.call(cmd, shell=True)

complete = False
termination_criteria = [' THE ANALYSIS HAS COMPLETED SUCCESSFULLY\n',
                        ' THE ANALYSIS HAS NOT BEEN COMPLETED\n']
while complete is False:
    # wait every 5 seconds
    time.sleep(5)
    try:
        with open(jobname + '.sta', 'r') as f:
            last = f.readlines()[-1]
        if last in termination_criteria:
            # this will kill any process named standard
            # (-r: do not run kill at all if pgrep finds nothing)
            subprocess.call('pgrep standard | xargs -r kill', shell=True)
            complete = True
    except (IOError, IndexError):
        # model.sta has been deleted, doesn't exist yet, or is still empty
        # try again in 5 seconds
        time.sleep(5)
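The wrapper above only reads argv[1] and hard-codes cpus=4. A hypothetical variant of its command-building step looks for job= anywhere on the command line and passes every other option (for example cpus=4) through unchanged. ask_delete=OFF and background are kept from the original wrapper, and returning a list lets subprocess.call(cmd) run the solver without going through the shell.

```python
from typing import List, Sequence, Tuple


def build_solver_command(argv: Sequence[str],
                         solver: str = 'abq2018') -> Tuple[str, List[str]]:
    """Return (jobname, argv list for subprocess.call) for the wrapped solver.

    Hypothetical helper: job=<name> may appear anywhere in argv; all other
    arguments are passed through unchanged.
    """
    jobs = [a.split('=', 1)[1] for a in argv if a.startswith('job=')]
    if len(jobs) != 1 or not jobs[0]:
        raise ValueError('expected exactly one job=<name> argument')
    extra = [a for a in argv if not a.startswith('job=')]
    # Options kept from the original wrapper.
    cmd = [solver, 'job=' + jobs[0]] + extra + ['ask_delete=OFF', 'background']
    return jobs[0], cmd
```

In the wrapper this would replace the jobname/cmd lines: jobname, cmd = build_solver_command(sys.argv[1:]) followed by subprocess.call(cmd).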
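Answer 4
I also ran into this problem on Linux Mint 19 with Abaqus 6.14-5 installed. The process does not terminate on its own, although the .sta file shows that the analysis has completed. I think the problem is kernel-related. By the way, have you found a solution by now?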