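I was trying to understand the difference between the mm and active_mm fields of task_struct, and came across an email written by Linus Torvalds 20 years ago that refers to the concept of an "anonymous process":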
- we have "real address spaces" and "anonymous address spaces". The
difference is that an anonymous address space doesn't care about the
user-level page tables at all, so when we do a context switch into an
anonymous address space we just leave the previous address space
active.
[...]
- "tsk->mm" points to the "real address space". For an **anonymous process**,
tsk->mm will be NULL, for the logical reason that an **anonymous process**
really doesn't _have_ a real address space at all.
- however, we obviously need to keep track of which address space we
"stole" for such an anonymous user. For that, we have "tsk->active_mm",
which shows what the currently active address space is.
The rule is that for a process with a real address space (ie tsk->mm is
non-NULL) the active_mm obviously always has to be the same as the real
one.
For a **anonymous process**, tsk->mm == NULL, and tsk->active_mm is the
"borrowed" mm while the **anonymous process** is running. When the
**anonymous process** gets scheduled away, the borrowed address space is
returned and cleared.
Answer 1
This is more or less explained by the part of the email that you left out:
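The obvious use for a "anonymous address space" is any thread that doesn't need any user mappings - all kernel threads basically fall into this category, but even "real" threads can temporarily say that for some amount of time they are not going to be interested in user space, and that the scheduler might as well try to avoid wasting time on switching the VM state. Currently only the old-style bdflush sync does that.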
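Kernel threads only access kernel memory, so they do not care what is mapped in user-space memory. The "anonymous process" is an optimization for these tasks.
When the scheduler switches to a kernel thread, it can skip the relatively expensive step of setting up new memory mappings and simply leave the previous process's address space in place. The kernel part of the address space is mapped identically for every process, so it makes no difference which mapping is active while such a task runs.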
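To make the borrowing rule concrete, here is a minimal user-space sketch of the decision described above. It is not kernel code: `struct mm`, `struct task`, `toy_context_switch` and `page_table_switches` are hypothetical names invented for this illustration, and the real scheduler also deals with reference counting, lazy TLB handling and architecture details that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields discussed here. */
struct mm { int id; };

struct task {
    struct mm *mm;        /* real address space; NULL for an anonymous task */
    struct mm *active_mm; /* address space actually loaded while running */
};

static int page_table_switches; /* counts simulated page-table reloads */

/* Hypothetical model of switching the CPU from prev to next. */
static void toy_context_switch(struct task *prev, struct task *next)
{
    if (next->mm == NULL) {
        /* Anonymous task: borrow whatever is loaded, skip the reload. */
        assert(prev->active_mm != NULL);
        next->active_mm = prev->active_mm;
    } else {
        /* Real address space: active_mm must equal mm. */
        if (prev->active_mm != next->mm)
            page_table_switches++;
        next->active_mm = next->mm;
    }
    if (prev->mm == NULL)
        prev->active_mm = NULL; /* borrowed mm is returned and cleared */
}
```

In this model, switching from a user task to a kernel thread and back to the same user task leaves `page_table_switches` unchanged. Only a switch to a task with a different real mm increments it.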
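The same optimization can also be applied temporarily to a user-space task while it is running kernel-space code, for example while it waits for the sync system call to complete, because the real address space only has to be restored before returning to user-space code. As the email mentions, only the old-style bdflush sync did this, and it no longer seems to be done, at least since bdflush was replaced by the pdflush kernel threads.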
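Answer 2
It refers to anonymous memory, which is a memory mapping with no file or device backing it. This is how programs allocate memory from the operating system for use by things like the heap and the stack. Initially, an anonymous mapping only allocates virtual memory. The new mapping starts out as a redundant copy-on-write mapping of the zero page.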
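As a small illustration of anonymous memory from user space, the sketch below creates an anonymous mapping with `mmap` and checks that it initially reads back as zeros. It assumes a Linux or other POSIX-like system where `MAP_ANONYMOUS` is available; the helper name `anonymous_mapping_demo` is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Returns 1 if a fresh anonymous mapping reads as zeros and accepts writes,
 * 0 if it does not, and -1 if mmap() or munmap() fails. */
static int anonymous_mapping_demo(size_t len)
{
    assert(len > 0); /* mmap() rejects zero-length mappings */

    /* No file descriptor backs this mapping, hence fd = -1 and offset = 0. */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int ok = 1;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0) /* untouched pages read back as zeros */
            ok = 0;

    p[0] = 42; /* the first write gives this process its own private page */
    if (p[0] != 42)
        ok = 0;

    if (munmap(p, len) != 0)
        return -1;
    return ok;
}
```

Calling `anonymous_mapping_demo(4096)` should return 1 on a typical Linux system. Note that this kind of "anonymous" memory is a different concept from the "anonymous process" in the email, which is about a task having no user address space of its own.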