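I'm using a custom kernel module and writing a Python front end for it. The kernel module works and adds a framebuffer device file at /dev/fb1. I can read from and write to it just fine. I have been using Python's mmap module to map the device buffer, and that seems to work well.
Now I'm trying to bring in numpy, using numpy's memmap function, on the assumption that it should work in a similar way. The problem is that opening the device file with numpy's memmap function hangs the kernel (I think).
This is what I do when I first open the file: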
self.surface = np.memmap(dev, dtype=np.uint16, mode='r+', shape=(320,240))
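The process hangs, and I can't kill python except with killall python
It probably leaves the file resource open. Any later attempt to open the file again hangs indefinitely, even something as simple as: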
f = open('/dev/fb1', 'r+b')
I get this in dmesg:
[ 1081.480104] INFO: task python2.6:2834 blocked for more than 120 seconds.
[ 1081.480109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.480113] python2.6 D 0000000100004eec 0 2834 1 0x00000004
[ 1081.480118] ffff88020a175db8 0000000000000086 0000000000000000 0000000000015980
[ 1081.480122] ffff88020a175fd8 0000000000015980 ffff88020a175fd8 ffff88022b69adc0
[ 1081.480127] 0000000000015980 0000000000015980 ffff88020a175fd8 0000000000015980
[ 1081.480131] Call Trace:
[ 1081.480142] [<ffffffff81049b17>] ? mutex_spin_on_owner+0x97/0xc0
[ 1081.480148] [<ffffffff81589477>] __mutex_lock_slowpath+0xf7/0x180
[ 1081.480151] [<ffffffff8158935b>] mutex_lock+0x2b/0x50
[ 1081.480157] [<ffffffff812f3bcf>] fb_release+0x1f/0x60
[ 1081.480161] [<ffffffff81154825>] __fput+0xf5/0x210
[ 1081.480164] [<ffffffff81154965>] fput+0x25/0x30
[ 1081.480168] [<ffffffff81123d35>] remove_vma+0x45/0x90
[ 1081.480171] [<ffffffff81126179>] do_munmap+0x309/0x3a0
[ 1081.480174] [<ffffffff81126266>] sys_munmap+0x56/0x80
[ 1081.480180] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[ 1081.480183] INFO: task ipython:2856 blocked for more than 120 seconds.
[ 1081.480185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.480187] ipython D ffff880256e92018 0 2856 1841 0x00000000
[ 1081.480191] ffff88022341fb58 0000000000000086 ffffffff81625f10 0000000000015980
[ 1081.480196] ffff88022341ffd8 0000000000015980 ffff88022341ffd8 ffff88022b7c16e0
[ 1081.480200] 0000000000015980 0000000000015980 ffff88022341ffd8 0000000000015980
[ 1081.480204] Call Trace:
[ 1081.480207] [<ffffffff81589477>] __mutex_lock_slowpath+0xf7/0x180
[ 1081.480210] [<ffffffff8158935b>] mutex_lock+0x2b/0x50
[ 1081.480214] [<ffffffff812f3cd8>] fb_open+0xc8/0x200
[ 1081.480217] [<ffffffff8115657d>] ? cdev_get+0x2d/0xb0
[ 1081.480221] [<ffffffff81156e6a>] chrdev_open+0x10a/0x200
[ 1081.480225] [<ffffffff810878a1>] ? in_group_p+0x31/0x40
[ 1081.480228] [<ffffffff81156d60>] ? chrdev_open+0x0/0x200
[ 1081.480232] [<ffffffff811512c5>] __dentry_open+0xe5/0x330
[ 1081.480237] [<ffffffff81260e4f>] ? security_inode_permission+0x1f/0x30
[ 1081.480240] [<ffffffff81151624>] nameidata_to_filp+0x54/0x70
[ 1081.480244] [<ffffffff8115e398>] finish_open+0xe8/0x1d0
[ 1081.480248] [<ffffffff8116701f>] ? dput+0xdf/0x1b0
[ 1081.480251] [<ffffffff8115f7f6>] do_last+0x86/0x460
[ 1081.480254] [<ffffffff81161b2b>] do_filp_open+0x21b/0x660
[ 1081.480259] [<ffffffff8112117f>] ? handle_mm_fault+0x32f/0x440
[ 1081.480263] [<ffffffff8116d33a>] ? alloc_fd+0x10a/0x150
[ 1081.480266] [<ffffffff81151069>] do_sys_open+0x69/0x170
[ 1081.480270] [<ffffffff811511b0>] sys_open+0x20/0x30
[ 1081.480273] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
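I guess my question is: can I terminate the system call manually, or somehow unlock the mutex? Or am I completely missing what the error is telling me?
The strange thing is that even just the memmap call corrupts the framebuffer and writes garbage to my display. My guess is that numpy simply doesn't handle device files well.
Update:
Here is the output of ps -l. The first entry, python2.6, is the process that originally ran the numpy memmap call (at least I'm fairly sure). The second, ipython, ran a plain Python open call after the first process had hung.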
0 D 1000 2834 1 0 80 0 - 22101 fb_rel ? 00:00:00 python2.6
0 D 1000 2856 1841 0 80 0 - 15065 fb_ope pts/1 00:00:00 ipython
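For reference, the state and wait channel that ps -l prints can also be read straight from /proc. This is only a minimal sketch, assuming a Linux /proc layout in which each process directory has a stat file and a wchan file; parse_stat_line, read_text and blocked_tasks are names made up for this example.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file, returning '' if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        return ""


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for tasks in uninterruptible sleep (state D)."""
    found = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        stat = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue  # process exited between listdir() and open()
        pid, comm, state = parse_stat_line(stat)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return found
```

Calling blocked_tasks() and printing each tuple gives one line per task stuck in state D. Whether wchan contains a useful symbol name depends on the kernel configuration, so treat that column as a hint only.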
Answer 1
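The stack trace shows that the earlier python process is stuck in fb_release, blocked in mutex_lock while munmap tears down the mapping (something went badly wrong). As with POSIX mutexes, a kernel mutex must be released by the same task that acquired it. The next step is to find out which resource caused the mutex to stay held. From: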
linux-source-2.6.38/kernel/mutex.c:
 * The mutex must later on be released by the same task that
 * acquired it. Recursive locking is not allowed. The task
 * may not exit without first unlocking the mutex. Also, kernel
 * memory where the mutex resides mutex must not be freed with
 * the mutex still locked. The mutex must first be initialized
 * (or statically defined) before it can be locked. memset()-ing
 * the mutex to 0 is not allowed.
You could start by tracing python or by enabling dtrace on the driver.
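While narrowing this down, one thing that might be worth trying is to keep the plain mmap module, which the question says already works on /dev/fb1, and wrap that mapping in a numpy array rather than calling np.memmap. The following is a sketch only: map_surface is a hypothetical helper, the 320x240 shape and uint16 pixel type are taken from the question, and whether this avoids the hang with this particular driver is untested.

```python
import mmap

import numpy as np


def map_surface(path, shape=(320, 240), dtype=np.uint16):
    """Map `path` with the mmap module and expose it as a numpy array.

    Returns (surface, mapping, file). The array is a view onto the mapping,
    so writes to it go to the mapped file or device.
    """
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    f = open(path, "r+b")
    # Default access on Unix is a shared, read/write mapping.
    mm = mmap.mmap(f.fileno(), nbytes)
    # frombuffer does not copy; reshape returns a view of the same memory.
    surface = np.frombuffer(mm, dtype=dtype).reshape(shape)
    return surface, mm, f
```

Usage would look like surface, mm, f = map_surface('/dev/fb1'), followed by ordinary array writes such as surface[0, 0] = 0xFFFF. Drop every reference to the array (del surface) before calling mm.close(), since an mmap object cannot be closed while a buffer exported from it is still alive.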