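We use a customized opsi.org boot image to automatically install Windows on our customers' client computers. The userland of that boot image is based on the upstream boot image with a few modifications of our own, and the kernel is taken from Ubuntu.
Since we upgraded the kernel to Linux 4.8.0-42.45 from Ubuntu yakkety, customers have been complaining that installations stop because lshw segfaults: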
[7] [Apr 27 23:29:08] Expecting compressed data from server (JSONRPC.py|660)
[5] [Apr 27 23:29:08] Running hardware inventory (setup.py|140)
[7] [Apr 27 23:29:08] Command 'lshw' found at: '/usr/bin/lshw' (Posix.py|640)
[6] [Apr 27 23:29:08] Executing: /usr/bin/lshw -xml 2>/dev/null (Posix.py|660)
[6] [Apr 27 23:29:08] Using encoding 'UTF-8' (Posix.py|691)
[7] [Apr 27 23:29:08] Exit code: 139 (Posix.py|748)
[2] [Apr 27 23:29:09] Traceback: (Logger.py|742)
[2] [Apr 27 23:29:09] line 1390 in '<module>' in file '/usr/local/bin/master.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 141 in '<module>' in file '/tmp/setup.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 2482 in 'auditHardware' in file '/usr/lib/pymodules/python2.6/OPSI/System/Posix.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 2526 in 'hardwareInventory' in file '/usr/lib/pymodules/python2.6/OPSI/System/Posix.py' (Logger.py|742)
[2] [Apr 27 23:29:09] line 755 in 'execute' in file '/usr/lib/pymodules/python2.6/OPSI/System/Posix.py' (Logger.py|742)
[2] [Apr 27 23:29:09] ==>>> Command '/usr/bin/lshw -xml 2>/dev/null' failed (139):
(master.py|1438)
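The exit code 139 in the log above follows the usual shell convention of 128 plus the signal number, and signal 11 is SIGSEGV on Linux, which matches the "Segmentation fault" message seen later. A minimal Python 3 sketch of that decoding (the helper name `describe_exit_status` is made up for illustration and is not part of opsi):

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status in the range 0-255.

    Shells report "killed by signal N" as 128 + N.
    """
    if status > 128:
        signum = status - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return "killed by unknown signal %d" % signum
    return "exited with status %d" % status


# 139 - 128 = 11, which is SIGSEGV on Linux.
print(describe_exit_status(139))
```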
At the same time, the following error is logged to dmesg:
[ 69.852348] usercopy: kernel memory exposure attempt detected from c0080000 (dma-kmalloc-512) (4096 bytes)
[ 69.852365] ------------[ cut here ]------------
[ 69.852367] kernel BUG at /build/linux-7qXOmc/linux-4.8.0/mm/usercopy.c:75!
[ 69.852370] invalid opcode: 0000 [#1] SMP
[ 69.852371] Modules linked in: arc4 md4 nls_utf8 cifs fscache joydev rtsx_usb_ms memstick snd_hda_intel rtsx_usb_sdmmc snd_hda_codec snd_hda_core acer_wmi snd_hwdep rtsx_usb r8169 fjes video sparse_keymap snd_pcm mii mei_txe wmi input_leds snd_timer mac_hid snd mei lpc_ich ahci libahci intel_smartconnect soundcore
[ 69.852399] CPU: 0 PID: 1528 Comm: lshw Not tainted 4.8.0-42-generic #45-Ubuntu
[ 69.852400] Hardware name: Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014
[ 69.852402] task: f6d16f00 task.stack: f6cc6000
[ 69.852405] EIP: 0060:[<dd1f7543>] EFLAGS: 00010282 CPU: 0
[ 69.852411] EIP is at __check_object_size+0x123/0x12c
[ 69.852413] EAX: 0000005e EBX: c0080000 ECX: 00000247 EDX: 00000247
[ 69.852414] ESI: 00001000 EDI: dda5944f EBP: f6cc7ee0 ESP: f6cc7eb8
[ 69.852416] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 69.852418] CR0: 80050033 CR2: bf910000 CR3: 36883e40 CR4: 001006f0
[ 69.852419] Stack:
[ 69.852420] dda5f86c dda628ce dda97569 c0080000 f1402080 00001000 c0081000 c0080000
[ 69.852427] 00090000 00001000 f6cc7f1c dd5074d6 00000000 00001000 bf900678 00010000
[ 69.852434] 00080000 00000000 00090000 00000000 00090000 00000000 dd507430 f6cc7f60
[ 69.852440] Call Trace:
[ 69.852447] [<dd5074d6>] read_mem+0xa6/0x1f0
[ 69.852451] [<dd507430>] ? write_mem+0x1f0/0x1f0
[ 69.852454] [<dd1fb15f>] __vfs_read+0x1f/0x50
[ 69.852457] [<dd1fb85f>] vfs_read+0x7f/0x140
[ 69.852461] [<dd80a0a0>] ? down_write+0x10/0x40
[ 69.852465] [<dd1fc9e9>] SyS_read+0x49/0xb0
[ 69.852469] [<dd0037cd>] do_fast_syscall_32+0x8d/0x140
[ 69.852472] [<dd80c07a>] sysenter_past_esp+0x47/0x75
[ 69.852473] Code: 89 74 24 14 0f 44 ca ba ce 28 a6 dd 89 44 24 10 0f 44 d7 89 5c 24 0c 89 4c 24 08 89 54 24 04 c7 04 24 6c f8 a5 dd e8 15 91 f8 ff <0f> 0b b8 97 28 a6 dd eb b9 55 89 e5 57 56 53 83 ec 1c 3e 8d 74
[ 69.852516] EIP: [<dd1f7543>] __check_object_size+0x123/0x12c SS:ESP 0068:f6cc7eb8
[ 69.852523] ---[ end trace 5b12719d45b0befe ]---
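The first line of the trace already carries the key facts: a 4096-byte copy to user space was attempted from an object in the dma-kmalloc-512 slab cache, i.e. the copy is larger than the 512-byte object it starts in, which is the kind of copy the usercopy check rejects. The sketch below pulls those fields out of the line. The regular expression is written against the message shown above, and the 0xc0000000 offset is an assumption (the default 3G/1G split on 32-bit x86); under that assumption the address corresponds to physical 0x80000, which also shows up in the stack dump.

```python
import re

# Pattern written against the dmesg line in this post.
USERCOPY_RE = re.compile(
    r"usercopy: kernel memory (?P<kind>exposure|overwrite) attempt detected "
    r"(?:from|to) (?P<addr>[0-9a-f]+) \((?P<cache>[^)]+)\) "
    r"\((?P<size>\d+) bytes\)"
)

# Assumption: default PAGE_OFFSET for 32-bit x86 (3G/1G split).
ASSUMED_PAGE_OFFSET = 0xC0000000


def parse_usercopy(line):
    """Return the fields of a usercopy report, or None if it does not match."""
    m = USERCOPY_RE.search(line)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "address": int(m.group("addr"), 16),
        "cache": m.group("cache"),
        "size": int(m.group("size")),
    }


line = ("[ 69.852348] usercopy: kernel memory exposure attempt detected "
        "from c0080000 (dma-kmalloc-512) (4096 bytes)")
info = parse_usercopy(line)
physical = info["address"] - ASSUMED_PAGE_OFFSET
print(info["cache"], info["size"], hex(physical))
```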
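I figured that this is either a bug in Linux or in lshw, or a hardware problem. However, the issue affects many machines, so I ruled out defective hardware. It seems to occur only on Linux >= 4.8; at least Linux 4.4 is not affected. This might be due to the usercopy hardening introduced in Linux 4.8.
The issue does not affect all machines (for example, lshw works fine in my VirtualBox VM; the affected machine I am currently testing with is an Acer Extensa 2508 notebook). We do use what is probably a very old version of lshw: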
root@testnb:~# uname -a
Linux testnb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:05:25 UTC 2017 i686 GNU/Linux
root@testnb:~# lshw -version
B.02.14
root@testnb:~# lshw
Segmentation fault
I suspected that this might be the cause, so I statically compiled lshw 02.17-1.1 from Debian jessie, but that does not work either:
root@testnb:~# uname -a
Linux testnb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:05:25 UTC 2017 i686 GNU/Linux
root@testnb:~# ./lshw-02.17-static -version
B.02.17
root@testnb:~# ./lshw-02.17-static
Segmentation fault
I tried a newer Linux 4.8 package from Ubuntu yakkety:
root@testnb:~# uname -a
Linux testnb 4.8.0-49-generic #52-Ubuntu SMP Thu Apr 20 09:39:42 UTC 2017 i686 GNU/Linux
root@testnb:~# lshw
Segmentation fault
root@testnb:~# ./lshw-02.17-static
Segmentation fault
Linux 4.10 from Ubuntu:
root@testnb:~# uname -a
Linux testnb 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:16 UTC 2017 i686 GNU/Linux
root@testnb:~# lshw
Segmentation fault
root@testnb:~# ./lshw-02.17-static
Segmentation fault
I am at a loss as to what to do now. Any ideas?
EDIT: I compiled a list of affected machines from our logs:
martin@dogmeat ~/pssh/lshw-segfault-bootimage/output % cat *.out | sed 's/.*DMI: //' | sort | uniq
Acer Extensa 2508/Extensa 2508, BIOS V1.09 10/24/2014 (Posix.py|741)
Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014 (Posix.py|741)
Dell Inc. Latitude D630 /0KU184, BIOS A17 01/04/2010 (Posix.py|741)
Dell Inc. Latitude E5500 /0DW634, BIOS A15 11/05/2009 (Posix.py|741)
Dell Inc. Vostro 1015 /047MWF, BIOS A03 09/01/2010 (Posix.py|741)
FUJITSU ESPRIMO P910/D3162-A1, BIOS V4.6.5.3 R1.19.0 for D3162-A1x 12/17/2012 (Posix.py|741)
FUJITSU ESPRIMO P910/D3162-A1, BIOS V4.6.5.3 R1.22.0 for D3162-A1x 10/15/2013 (Posix.py|741)
Hewlett-Packard HP Compaq 6730b (GW687AV)/30DD, BIOS 68PDD Ver. F.10 07/31/2009 (Posix.py|741)
Hewlett-Packard HP Compaq 8510p /30C5, BIOS 68MVD Ver. F.0F 02/05/2008 (Posix.py|741)
Hewlett-Packard HP EliteBook 2540p/7008, BIOS 68CSU Ver. F.24 09/12/2013 (Posix.py|741)
Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.42 05/20/2013 (Posix.py|741)
Hewlett-Packard HP ProBook 4720s/1411, BIOS 68AZZ Ver. F.0B 09/16/2010 (Posix.py|741)
IBM 1860W25/1860W25, BIOS 70ET40WW (1.04 ) 06/02/2005 (Posix.py|741)
IBM 1860WR7/1860WR7, BIOS 70ET66WW (1.26 ) 05/18/2006 (Posix.py|741)
LENOVO 80ES/Lenovo B50-30, BIOS 9CCN21WW(V1.06) 04/09/2014 (Posix.py|741)
Quanta TW8/SW8/DW8/TW8/SW8/DW8, BIOS A3B92 10/07/2008 (Posix.py|741)
To Be Filled By O.E.M. To Be Filled By O.E.M./ALiveNF6G-GLAN, BIOS P1.70 03/06/2009 (Posix.py|741)
TOSHIBA Satellite Pro R50-B/Satellite Pro R50-B, BIOS Version 1.40 09/25/2014 (Posix.py|741)
TOSHIBA TECRA M10/Portable PC, BIOS Version 3.00 09/08/2009 (Posix.py|741)
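Several entries in the list differ only in BIOS version. For a per-model view, the same log lines can be reduced further. The following is a rough Python equivalent of the sed/sort/uniq pipeline that also drops the BIOS part and the trailing "(Posix.py|741)" marker; the function name and the sample input are illustrative only, with the "DMI: " prefix taken from the sed expression above.

```python
import re


def unique_models(lines):
    """Return the sorted set of 'vendor model/board' strings."""
    models = set()
    for line in lines:
        # Same as the sed expression: drop everything up to "DMI: ".
        line = re.sub(r"^.*DMI: ", "", line)
        # Drop the trailing logger source marker, e.g. "(Posix.py|741)".
        line = re.sub(r"\s*\(Posix\.py\|\d+\)\s*$", "", line)
        # Keep only the part before the BIOS version.
        model = line.split(", BIOS ", 1)[0].strip()
        if model:
            models.add(model)
    return sorted(models)


sample = [
    "x DMI: Acer Extensa 2508/Extensa 2508, BIOS V1.09 10/24/2014 (Posix.py|741)",
    "x DMI: Acer Extensa 2508/Extensa 2508, BIOS V1.10 12/15/2014 (Posix.py|741)",
    "x DMI: Dell Inc. Latitude D630 /0KU184, BIOS A17 01/04/2010 (Posix.py|741)",
]
for model in unique_models(sample):
    print(model)
```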
EDIT2: I tried lshw from Debian stretch, as well as the latest upstream version:
root@testnb:~# uname -a
Linux testnb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:05:25 UTC 2017 i686 GNU/Linux
root@testnb:~# ./lshw-02.18-static
Segmentation fault
root@testnb:~# ./lshw-static-b1eab6372d
Segmentation fault
EDIT3: I have now tested this in a VirtualBox VM with a clean Linux distribution (Ubuntu 17.04 live CD), and I can confirm that the issue is reproducible there, but only with a 32-bit lshw:
First try, with the 64-bit live CD. The built-in lshw works:
root@ubuntu:~# uname -a
Linux ubuntu 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~# dpkg -l | grep lshw
ii lshw 02.18-0.1ubuntu3 amd64 information about hardware configuration
root@ubuntu:~# lshw | wc -l
231
One of my static 32-bit builds of lshw does not:
root@ubuntu:~# ./lshw-02.18-static
Segmentation fault
Second try, with the 32-bit live CD. Now even the built-in lshw does not work:
root@ubuntu:~# uname -a
Linux ubuntu 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:03:14 UTC 2017 i686 i686 i686 GNU/Linux
root@ubuntu:~# dpkg -l | grep lshw
ii lshw 02.18-0.1ubuntu3 i386 information about hardware configuration
root@ubuntu:~# lshw
Segmentation fault
lshw does not segfault when I run it without root privileges:
ubuntu@ubuntu:~$ lshw | wc -l
WARNING: you should run this program as super-user.
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
168
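We tested this on two different machines (one with an ASUSTeK H170-PRO/USB 3.1 board, the other with an ASUSTeK P8H77-M board) and with several different VM types (Microsoft Windows -> Windows 7 (32-bit), Microsoft Windows -> Windows 10 (64-bit), Linux -> Ubuntu (32-bit)), and the issue was always reproducible.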
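A plausible explanation for the root-only crash: the call trace ends in read_mem, the read handler of /dev/mem, and opening /dev/mem requires root privileges, so an unprivileged lshw never reaches that code path. To take lshw out of the picture, the same kind of read can be issued directly. This is only a sketch under the assumption that the offending read is the 4096-byte read at physical offset 0x80000 suggested by the dmesg output; `read_physical` is a hypothetical helper, and on an affected kernel running it as root against /dev/mem is expected to trigger the same kernel BUG and kill the process.

```python
import os


def read_physical(offset, length, path="/dev/mem"):
    """Read `length` bytes at `offset` from `path` with a single read() call.

    The path is a parameter so the helper can be tried on an ordinary
    file first; pointing it at /dev/mem needs root.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)


# Example (as root, on a test machine only):
#   data = read_physical(0x80000, 4096)
```

If this alone reproduces the BUG on a 32-bit kernel, the problem lies in the kernel's /dev/mem handling rather than in any particular lshw version, which would fit the observation that every lshw build tested behaves the same.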
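EDIT4: For some reason, I can no longer reproduce the issue with our boot image in VirtualBox. Maybe it depends on the VM configuration? It is still reproducible on the Acer Extensa 2508 machine, though. Since the issue seems to affect only 32-bit builds, we are now working around it by using a 64-bit boot image and a static 64-bit build of lshw on 64-bit-capable machines.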
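One way to decide whether a machine is 64-bit-capable from a running 32-bit Linux is to look for the `lm` (long mode) flag in /proc/cpuinfo. The sketch below shows that check; the function names and the image names are placeholders for illustration and do not describe how opsi actually selects a boot image.

```python
def supports_64bit(cpuinfo_text):
    """Return True if the x86 CPU flags in cpuinfo_text include 'lm'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "lm" in value.split()
    return False


def choose_bootimage(cpuinfo_text):
    # Placeholder names; substitute whatever the deployment really uses.
    if supports_64bit(cpuinfo_text):
        return "bootimage-x64"
    return "bootimage-x86"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(choose_bootimage(f.read()))
```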