I want to use RAM on the host as swap space for the Xeon Phi (mic0).
On the host:
# free -m
total used free shared buffers cached
Mem: 129022 60312 68710 0 1092 50078
-/+ buffers/cache: 9141 119880
Swap: 0 0 0
I ran these commands on the host:
# mount -t ramfs ramfs /mnt/ramfs/
# dd bs=512M if=/dev/zero of=/mnt/ramfs/ram1 count=48
# echo /mnt/ramfs/ram1 >/sys/class/mic/mic0/virtblk_file
# df -a | grep ramfs
/mnt/ramfs 0 0 0 - /mnt/ramfs
# vim /etc/mpss/default.conf # add:
ExtraCommandLine "vfs_read_optimization=on"
ExtraCommandLine "vfs_write_optimization=on"
# service mpss stop
# micctrl --resetconfig
# service mpss start
Then on mic0 I ran:
# modprobe mic_virtblk
# mkswap /dev/vda
# swapon /dev/vda
# free -m
total used free shared buffers cached
Mem: 7697 574 7123 0 0 145
-/+ buffers/cache: 428 7268
Swap: 24575 0 24575
How can I make sure that the swap is actually backed by the host's RAM?
If it is backed by the host's RAM, why does the Xeon Phi become so slow once it starts swapping?
Test code:
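#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Milliseconds between two clock() readings.
 * Note: clock() measures processor time used by this process, not wall-clock time. */
long timediff(clock_t t1, clock_t t2) {
    return (long)((double)(t2 - t1) / CLOCKS_PER_SEC * 1000);
}

int main(int argc, char** argv) {
    clock_t t1, t2;
    int max = 100;          /* number of chunks to allocate (argv[1] overrides) */
    int mb = 0;             /* chunks allocated so far */
    size_t size = 256;      /* chunk size in MB */
    char* buffer;

    if (argc > 1)
        max = atoi(argv[1]);

    t1 = clock();
    /* Check the limit first so no extra, untouched chunk is allocated. */
    while (mb != max && (buffer = malloc(size * 1024 * 1024)) != NULL) {
        /* Touch every page so the memory is really committed (and swapped once RAM runs out). */
        memset(buffer, 0, size * 1024 * 1024);
        ++mb;
        t2 = clock();
        printf("Allocated %.2f GB in %ld ms\n", mb * size / 1024.0, timediff(t1, t2));
        t1 = t2;
        /* buffer is intentionally never freed, to keep increasing memory pressure. */
    }
    return 0;
}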
Compiled with: icc swaptest.c -o swaptest -mmic
Results:
# ./swaptest
Allocated 0.25 GB in 260 ms
Allocated 0.50 GB in 269 ms
...
Allocated 6.75 GB in 269 ms
Allocated 7.00 GB in 260 ms
Allocated 7.25 GB in 470 ms
Allocated 7.50 GB in 1819 ms
Allocated 7.75 GB in 2060 ms
Allocated 8.00 GB in 2420 ms
Allocated 8.25 GB in 2820 ms
Allocated 8.50 GB in 2750 ms
Allocated 8.75 GB in 2300 ms
Allocated 9.00 GB in 1380 ms
Allocated 9.25 GB in 1530 ms
Allocated 9.50 GB in 3400 ms
Allocated 9.75 GB in 3800 ms
Allocated 10.00 GB in 3940 ms
Allocated 10.25 GB in 3579 ms
Allocated 10.50 GB in 5050 ms
Allocated 10.75 GB in 5029 ms
Allocated 11.00 GB in 5130 ms
Allocated 11.25 GB in 4770 ms
Allocated 11.50 GB in 3719 ms
Allocated 11.75 GB in 2300 ms
Allocated 12.00 GB in 3619 ms
and so on.
Compared with the host system:
$ ./a.out
Allocated 0.25 GB in 140 ms
Allocated 0.50 GB in 170 ms
Allocated 0.75 GB in 160 ms
Allocated 1.00 GB in 160 ms
...
Allocated 23.75 GB in 130 ms
Allocated 24.00 GB in 130 ms
Allocated 24.25 GB in 130 ms
Allocated 24.50 GB in 130 ms
Allocated 24.75 GB in 130 ms
Allocated 25.00 GB in 120 ms
Without swapping: 256 MB in 269 ms is about 951 MB/s.
With swapping: 256 MB in 5.13 s is about 49.9 MB/s. That is much slower than the benchmark shown at https://software.intel.com/en-us/blogs/2014/01/07/improving-file-io-performance-on-intel-xeon-phi (at least ~360 MB/s). Is this expected?
I am using icc (ICC) 14.0.2 20140120
(parallel_studio_xe_2013_sp1_update2) and mpss-3.2.1
on a Xeon Phi 5110P (5100 series, rev 11).