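Versions: Ubuntu 22.04.3, GCC 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
Hello,
I am trying to write a program that uses a very large shared memory object. It runs on an AWS EC2 instance with 130GB of RAM. Creating a shared memory object of up to 120GB succeeds: shm_open(), ftruncate() and mmap() all return without error. However, when I read through every memory location of the shared memory object in sequence, a bus error (SIGBUS) occurs. I have written the small test program included below, which reproduces the problem every time.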
Note that shmmax = 18446744073692774399, shmall = 18446744073692774399 and shmmni = 8092.
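For anyone who wants to compare these limits on their own machine, here is a minimal sketch that reads them from /proc/sys/kernel/ on Linux. The helper names parse_u64 and read_kernel_limit are hypothetical and are not part of the test program; the sketch assumes each file holds a single unsigned decimal number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: parses the leading unsigned decimal number in `text`.
// Returns false if there is no leading digit or the value overflows 64 bits.
inline bool parse_u64(const std::string& text, std::uint64_t& out) {
    std::uint64_t value = 0;
    std::size_t digits = 0;
    for (char c : text) {
        if (c < '0' || c > '9') break;  // stop at newline or any non-digit
        const std::uint64_t d = static_cast<std::uint64_t>(c - '0');
        if (value > (UINT64_MAX - d) / 10) return false;  // would overflow
        value = value * 10 + d;
        ++digits;
    }
    if (digits == 0) return false;
    out = value;
    return true;
}

// Hypothetical helper: reads /proc/sys/kernel/<name>, e.g. "shmmax",
// "shmall" or "shmmni". Returns false if the file cannot be read or parsed.
inline bool read_kernel_limit(const std::string& name, std::uint64_t& out) {
    std::ifstream in("/proc/sys/kernel/" + name);
    std::string line;
    if (!in || !std::getline(in, line)) return false;
    return parse_u64(line, out);
}
```

Calling read_kernel_limit("shmmax", value) from main() and printing value on success should show the same number as reported above.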
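When reading upward from the bottom of the shared memory object, the bus error occurs at offset 66,936,954,880. When reading downward from the top (79,999,999,999), the bus error occurs after 13,063,041,023 reads, that is, at offset 66,936,958,976 from the bottom. So the two locations where the bus error occurs are exactly one page (4096 bytes) apart.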
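The arithmetic behind these offsets can be checked at compile time. This is only a sketch of the calculation, using the numbers quoted in this post and the stated page size of 4096 bytes; the names below are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Page size as stated in the post (an assumption for other systems).
constexpr std::uint64_t kPageSize = 4096;

// Offset from the bottom reached after `reads` reads going down from `top`.
constexpr std::uint64_t offset_from_bottom(std::uint64_t top, std::uint64_t reads) {
    return top - reads;
}

// Fault offset seen when reading upward.
constexpr std::uint64_t kUpFault = 66936954880ULL;
// Fault offset seen when reading downward from 79,999,999,999.
constexpr std::uint64_t kDownFault = offset_from_bottom(79999999999ULL, 13063041023ULL);

static_assert(kDownFault == 66936958976ULL, "downward fault offset");
static_assert(kDownFault - kUpFault == kPageSize, "faults are one page apart");
static_assert(kUpFault % kPageSize == 0, "upward fault is page-aligned");

// Same checks at run time, for use from a debugger or a test.
inline bool offsets_are_consistent() {
    return kDownFault - kUpFault == kPageSize && kUpFault % kPageSize == 0;
}
```

The last check also shows that the upward fault lands exactly on a page boundary, i.e. on the first byte of a page.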
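Any idea what is going on?
Thanks,
Gene
A very simple C/C++ test program that shows the problem follows. The size of the shared memory object is hard-coded to 80GB. Swap the commented-out loop line with the active one to walk the shared memory object in decreasing rather than increasing order.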
// g++ -std=c++20 -O3 test2.cpp -W -Wall -Wextra -pedantic -pthread -o test2
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int main() {
    // Size of the shared memory object, hard-coded to 80 GB.
    uint_fast64_t mem_amt = 80000000000;
    std::cout << "mem_amt = " << mem_amt << "\n";
    int fd;
    std::string shmpath = "/foo";

    // Remove any existing shared memory object
    shm_unlink(shmpath.c_str());

    // Create the shared memory object with read-write access.
    fd = shm_open(shmpath.c_str(), O_CREAT | O_EXCL | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        std::cerr << "\nshm_open shmbuf failure. Exiting program.\n\n";
        return EXIT_FAILURE;
    }

    // Truncate (set) the size.
    if (ftruncate64(fd, mem_amt) == -1) {
        std::cerr << "\nftruncate shmbuf failure. Exiting program.\n\n";
        return EXIT_FAILURE;
    }

    // Map the shared memory object.
    char* pool = (char*)mmap(NULL, mem_amt, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pool == MAP_FAILED) {
        std::cerr << "\nmmap pool failure. Exiting program.\n\n";
        return EXIT_FAILURE;
    }
    std::cout << "pool = " << reinterpret_cast<std::uintptr_t>(pool) << "\n";

    // Read every byte of the mapping; the bus error is raised inside this loop.
    char temp = 0;
    for (uint_fast64_t i = 0; i < mem_amt; i++) {
    // Decreasing order (this form also reads index 0, which "i>0; i--" skipped):
    // for (uint_fast64_t i = mem_amt; i-- > 0; ) {
        temp = pool[i];
        if (i % 5000000000 == 0) {
            std::cout << "i = " << i << "\n";
        }
    }
    std::cout << "temp = " << temp << "\n";

    // Release the mapping and remove the shared memory object.
    munmap(pool, mem_amt);
    close(fd);
    shm_unlink(shmpath.c_str());
}
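Since the two fault locations sit on page boundaries, a variant that reads only one byte per page should reach the failing page much sooner than the byte-by-byte loop. The following is a minimal sketch of such a loop; touch_each_page is a hypothetical helper and not part of the test program above, and I have not run it against the 80GB mapping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads one byte from each page of [base, base + len)
// and returns the number of pages touched. The volatile access keeps the
// compiler from optimising the reads away.
inline std::uint64_t touch_each_page(const char* base, std::uint64_t len,
                                     std::uint64_t page_size) {
    assert(page_size > 0);
    std::uint64_t touched = 0;
    for (std::uint64_t off = 0; off < len; off += page_size) {
        const volatile char* p = base + off;
        char c = *p;  // on the failing page, SIGBUS would be raised here
        (void)c;
        ++touched;
    }
    return touched;
}
```

In the test program this could replace the read loop with something like touch_each_page(pool, mem_amt, sysconf(_SC_PAGESIZE)), printing the returned count if the call completes.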
gdb output for the core files from the increasing and decreasing runs, respectively:
Core was generated by `./test2'.
Program terminated with signal SIGBUS, Bus error.
#0 0x00005570b7fd1373 in main () at test2.cpp:47
47 temp = pool[i];
(gdb) bt full
#0 0x00005570b7fd1373 in main () at test2.cpp:47
i = 66936954880
mem_amt = 80000000000
fd = <optimized out>
shmpath = "/foo"
pool = 0x7fa09da0e000 ""
temp = <optimized out>
(gdb)
Core was generated by `./test2'.
Program terminated with signal SIGBUS, Bus error.
#0 0x000055e242fdc379 in main () at test2.cpp:47
47 temp = pool[i];
(gdb) bt full
#0 0x000055e242fdc379 in main () at test2.cpp:47
i = 13063041023
mem_amt = 80000000000
fd = <optimized out>
shmpath = "/foo"
pool = 0x7f7366a0e000 ""
temp = <optimized out>
(gdb)