Hung system call

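I'm working with a custom kernel module for which I'm writing a Python front end. The kernel module works and adds a framebuffer device file at /dev/fb1. I can read from and write to it just fine. I've been using Python's mmap module to map the device buffer, and that seems to work well.

Now I'm trying to bring in numpy, using numpy's memmap function on the assumption that it should work in a similar way. The problem is that opening the device file with numpy's memmap function hangs the kernel (I think).

This is what I did to open the file initially: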

self.surface = np.memmap(dev, dtype=np.uint16, mode='r+', shape=(320,240))

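The process hangs, and I can't kill python except with killall python, which probably leaves the file resource open. Any subsequent attempt to open the file again hangs indefinitely, even just doing the following: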

f = open('/dev/fb1', 'r+b')

I get this in dmesg:

[ 1081.480104] INFO: task python2.6:2834 blocked for more than 120 seconds.
[ 1081.480109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.480113] python2.6     D 0000000100004eec     0  2834      1 0x00000004
[ 1081.480118]  ffff88020a175db8 0000000000000086 0000000000000000 0000000000015980
[ 1081.480122]  ffff88020a175fd8 0000000000015980 ffff88020a175fd8 ffff88022b69adc0
[ 1081.480127]  0000000000015980 0000000000015980 ffff88020a175fd8 0000000000015980
[ 1081.480131] Call Trace:
[ 1081.480142]  [<ffffffff81049b17>] ? mutex_spin_on_owner+0x97/0xc0
[ 1081.480148]  [<ffffffff81589477>] __mutex_lock_slowpath+0xf7/0x180
[ 1081.480151]  [<ffffffff8158935b>] mutex_lock+0x2b/0x50
[ 1081.480157]  [<ffffffff812f3bcf>] fb_release+0x1f/0x60
[ 1081.480161]  [<ffffffff81154825>] __fput+0xf5/0x210
[ 1081.480164]  [<ffffffff81154965>] fput+0x25/0x30
[ 1081.480168]  [<ffffffff81123d35>] remove_vma+0x45/0x90
[ 1081.480171]  [<ffffffff81126179>] do_munmap+0x309/0x3a0
[ 1081.480174]  [<ffffffff81126266>] sys_munmap+0x56/0x80
[ 1081.480180]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[ 1081.480183] INFO: task ipython:2856 blocked for more than 120 seconds.
[ 1081.480185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.480187] ipython       D ffff880256e92018     0  2856   1841 0x00000000
[ 1081.480191]  ffff88022341fb58 0000000000000086 ffffffff81625f10 0000000000015980
[ 1081.480196]  ffff88022341ffd8 0000000000015980 ffff88022341ffd8 ffff88022b7c16e0
[ 1081.480200]  0000000000015980 0000000000015980 ffff88022341ffd8 0000000000015980
[ 1081.480204] Call Trace:
[ 1081.480207]  [<ffffffff81589477>] __mutex_lock_slowpath+0xf7/0x180
[ 1081.480210]  [<ffffffff8158935b>] mutex_lock+0x2b/0x50
[ 1081.480214]  [<ffffffff812f3cd8>] fb_open+0xc8/0x200
[ 1081.480217]  [<ffffffff8115657d>] ? cdev_get+0x2d/0xb0
[ 1081.480221]  [<ffffffff81156e6a>] chrdev_open+0x10a/0x200
[ 1081.480225]  [<ffffffff810878a1>] ? in_group_p+0x31/0x40
[ 1081.480228]  [<ffffffff81156d60>] ? chrdev_open+0x0/0x200
[ 1081.480232]  [<ffffffff811512c5>] __dentry_open+0xe5/0x330
[ 1081.480237]  [<ffffffff81260e4f>] ? security_inode_permission+0x1f/0x30
[ 1081.480240]  [<ffffffff81151624>] nameidata_to_filp+0x54/0x70
[ 1081.480244]  [<ffffffff8115e398>] finish_open+0xe8/0x1d0
[ 1081.480248]  [<ffffffff8116701f>] ? dput+0xdf/0x1b0
[ 1081.480251]  [<ffffffff8115f7f6>] do_last+0x86/0x460
[ 1081.480254]  [<ffffffff81161b2b>] do_filp_open+0x21b/0x660
[ 1081.480259]  [<ffffffff8112117f>] ? handle_mm_fault+0x32f/0x440
[ 1081.480263]  [<ffffffff8116d33a>] ? alloc_fd+0x10a/0x150
[ 1081.480266]  [<ffffffff81151069>] do_sys_open+0x69/0x170
[ 1081.480270]  [<ffffffff811511b0>] sys_open+0x20/0x30
[ 1081.480273]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b

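I guess my question is: can I terminate the system call manually? Or unlock the mutex somehow? Or am I completely missing what the error is telling me?

Oddly, even just the memmap call corrupts the framebuffer and writes garbage to my display. I'm guessing numpy just doesn't handle device files well.

Update:

Here is the output of ps -l. The first python is the one that originally ran the numpy memmap call (at least I'm fairly sure). The second, ipython, is a plain Python open call run after the first process hung.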
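Since the plain mmap module reportedly works on /dev/fb1, one possible way to get a numpy array without going through np.memmap is to wrap the existing mapping with np.frombuffer. The following is only a minimal sketch under assumptions: map_surface is a hypothetical helper, a temporary regular file stands in for the device so the code runs without the driver, and the shape and dtype are taken from the memmap call above. Whether this avoids the hang on the real device is untested.

```python
import mmap
import os
import tempfile

import numpy as np

SHAPE = (320, 240)                    # shape used in the question
SIZE = SHAPE[0] * SHAPE[1] * 2        # uint16 pixels, 2 bytes each


def map_surface(path):
    """Map `path` with the mmap module and wrap it in a uint16 array.

    Hypothetical helper. Returns (array, mapping, file object); the caller
    must release them in that order.
    """
    f = open(path, "r+b")
    mm = mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_WRITE)
    surface = np.frombuffer(mm, dtype=np.uint16).reshape(SHAPE)
    return surface, mm, f


# Stand-in for /dev/fb1 (assumption: a zero-filled regular file of the same size).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * SIZE)
    demo_path = tmp.name

surface, mm, f = map_surface(demo_path)
shape = surface.shape
surface[0, 0] = 0xF800                # writes go straight into the mapping
mm.flush()
first_two_bytes = mm[:2]

del surface                           # drop the array before closing the mapping
mm.close()
f.close()
os.unlink(demo_path)
```

The array shares memory with the mapping, so the mapping cannot be closed while the array is still alive; deleting the array first avoids a BufferError from mm.close().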

0 D  1000  2834     1  0  80   0 - 22101 fb_rel ?        00:00:00 python2.6
0 D  1000  2856  1841  0  80   0 - 15065 fb_ope pts/1    00:00:00 ipython
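For reference, the S column (D, uninterruptible sleep) and the WCHAN column (the kernel function the task is sleeping in) are the interesting parts of that output. A small sketch that pulls them out of ps -l text, using the two lines above as sample input; blocked_tasks is a hypothetical helper name and the column order is assumed to be the usual procps ps -l layout:

```python
PS_OUTPUT = """\
0 D  1000  2834     1  0  80   0 - 22101 fb_rel ?        00:00:00 python2.6
0 D  1000  2856  1841  0  80   0 - 15065 fb_ope pts/1    00:00:00 ipython
"""

# Assumed column order of `ps -l`.
FIELDS = ["F", "S", "UID", "PID", "PPID", "C", "PRI", "NI",
          "ADDR", "SZ", "WCHAN", "TTY", "TIME", "CMD"]


def blocked_tasks(ps_text):
    """Return (pid, command, wchan) for every task in state D."""
    tasks = []
    for line in ps_text.splitlines():
        # Split into at most len(FIELDS) parts so CMD may contain spaces.
        parts = line.split(None, len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue
        row = dict(zip(FIELDS, parts))
        if row["S"].startswith("D"):
            tasks.append((int(row["PID"]), row["CMD"], row["WCHAN"]))
    return tasks


tasks = blocked_tasks(PS_OUTPUT)
for pid, cmd, wchan in tasks:
    print(f"{cmd} (pid {pid}) is blocked in {wchan}")
```

The truncated WCHAN values fb_rel and fb_ope line up with fb_release and fb_open in the dmesg traces.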

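Answer 1

The stack trace indicates that the earlier python process is blocked in mutex_lock(), called from fb_release() while its mapping was being torn down (sys_munmap → do_munmap → remove_vma → fput → fb_release). Something has gone badly wrong: whoever holds that mutex never released it. The later ipython process is blocked in mutex_lock() from fb_open(), presumably waiting on the same lock. Both tasks are in uninterruptible sleep (state D), which is why signals do not terminate them. A kernel mutex can only be released by the task that acquired it, so it cannot be unlocked from outside. The next step is to find out what is causing the mutex to stay held. From: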

linux-source-2.6.38/kernel/mutex.c:
 * The mutex must later on be released by the same task that
 * acquired it. Recursive locking is not allowed. The task
 * may not exit without first unlocking the mutex. Also, kernel
 * memory where the mutex resides mutex must not be freed with
 * the mutex still locked. The mutex must first be initialized
 * (or statically defined) before it can be locked. memset()-ing
 * the mutex to 0 is not allowed.

You can start by tracing python, or by enabling dtrace on the driver.
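To make the call trace from the question easier to read, the frames can be extracted and listed from the system call entry down to the point where the task blocks. In kernel stack dumps, entries marked with "?" are addresses found on the stack that may not belong to the actual call chain, so the sketch below skips them. The helper name reliable_frames is hypothetical, and the input is the first trace from the dmesg output.

```python
import re

TRACE = """\
[ 1081.480142]  [<ffffffff81049b17>] ? mutex_spin_on_owner+0x97/0xc0
[ 1081.480148]  [<ffffffff81589477>] __mutex_lock_slowpath+0xf7/0x180
[ 1081.480151]  [<ffffffff8158935b>] mutex_lock+0x2b/0x50
[ 1081.480157]  [<ffffffff812f3bcf>] fb_release+0x1f/0x60
[ 1081.480161]  [<ffffffff81154825>] __fput+0xf5/0x210
[ 1081.480164]  [<ffffffff81154965>] fput+0x25/0x30
[ 1081.480168]  [<ffffffff81123d35>] remove_vma+0x45/0x90
[ 1081.480171]  [<ffffffff81126179>] do_munmap+0x309/0x3a0
[ 1081.480174]  [<ffffffff81126266>] sys_munmap+0x56/0x80
[ 1081.480180]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
"""

# [<address>] optional "?" marker, then symbol+offset/size
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(text):
    """Return symbol names of frames not marked with '?', innermost first."""
    frames = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


# Reverse so the chain reads from the syscall entry to the blocking call.
call_chain = list(reversed(reliable_frames(TRACE)))
print(" -> ".join(call_chain))
```

Read this way, the chain shows munmap releasing the file, which calls fb_release, which then waits in mutex_lock.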
