Find likely processes from a time snapshot

Introduction

I'm trying to catch processes that were started during a limited time window.

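I've created a script (ps-suspects.sh) that is used like this:

  • ps-suspects.sh is run from a terminal.
  • I start and close an application, say the desktop calculator.
  • I press Ctrl+C to end ps-suspects.sh.
  • I want to know what the calculator's process name is.
  • I don't want to list all the other process names that were running for the entire snapshot period.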

Problem

I have a snippet of code that needs fine-tuning:

$ sort -k15 ~/pid.log | uniq -f14 -c

Here is what it produces:

$ head ~/pid.tmp
  1 /mnt/e/bin/ps-suspects.sh Possible suspects causing problems
 63 1 S root       127     2  0  60 -20 -     0 -      Sep08 ?        00:00:00 [acpi_thermal_pm]
 63 1 S root        75     2  0  60 -20 -     0 -      Sep08 ?        00:00:00 [ata_sff]
 63 1 S root       447     2  0  60 -20 -     0 -      Sep08 ?        00:00:00 [ath10k_aux_wq]
 63 1 S root       446     2  0  60 -20 -     0 -      Sep08 ?        00:00:00 [ath10k_wq]
 63 1 S avahi      922   910  0  80   0 - 11195 -      Sep08 ?        00:00:00 avahi-daemon: chroot helper
 63 4 S avahi      910     1  0  80   0 - 11228 -      Sep08 ?        00:00:00 avahi-daemon: running [alien.local]
126 0 S rick      2902  2867  0  80   0 -  7409 wait_w Sep08 pts/18   00:00:00 bash
 63 0 S rick     25894  5775  0  80   0 -  4908 wait   10:43 pts/2    00:00:00 /bin/bash /mnt/e/bin/ps-suspects.sh
 63 0 S root       980   976  0  80   0 -  4921 -      Sep08 ?        00:00:01 /bin/bash /usr/local/bin/display-auto-brightness

I want to eliminate every line whose count is 63 or higher (such as 126).
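
For readers less familiar with `uniq -f14 -c`, the tally step can be approximated in Python. This is only a rough sketch: the names `command_key` and `tally_by_command` are made up for illustration, and Python's string ordering may differ from the locale-aware ordering that `sort` uses.

```python
import itertools

def command_key(line, skip=14):
    """Return the text after the first `skip` whitespace-separated fields."""
    parts = line.split(None, skip)
    # Lines with too few fields (such as the header line) get an empty key.
    return parts[skip].rstrip("\n") if len(parts) > skip else ""

def tally_by_command(lines, skip=14):
    """Yield (count, first_line) for each distinct command key."""
    # Sorting by key makes equal keys adjacent, which groupby requires.
    ordered = sorted(lines, key=lambda l: command_key(l, skip))
    for _, group in itertools.groupby(ordered, key=lambda l: command_key(l, skip)):
        group = list(group)
        yield len(group), group[0]
```

A process that was running for the whole capture shows up once per sample, so its count equals the number of samples taken. A process started partway through gets a lower count.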

Desired output

$ ps-suspects.sh
20 times / second ps -elf is captured to /home/rick/pid.log

Type Ctrl+C when done capturing

~/pid.log is sorted and uniq counted on column 15
which is full path and program name.

Then all matches with same unique count (the headings)
are stripped and only new processes started are printed.

This function can help you trace down what processes are
causing you grief for lid close events, hot plugging, etc.
^C 
wc of ~/pid.log :   17288  343162 2717102 /home/rick/pid.log

HighCnt: 63
      1 /mnt/e/bin/ps-suspects.sh Possible suspects causing problems
     26 0 R rick     25976  2051  0  80   0 - 120676 -     10:43 ?        00:00:00 gnome-calculator
     62 0 S root     22561   980  0  80   0 -  3589 -      10:42 ?        00:00:00 sleep 60

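Question

In this example, 63 appears in column 1 on 90%-99% of the lines, and those lines need to be removed. Every line with a count of 126 can be removed as well. So any line whose count is equal to or greater than the most frequent count can be removed.

Can anyone come up with the missing awk and/or uniq and/or grep to finish the job?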

Answer 1

Python to the rescue:

python3 -c 'import sys,collections;l=[(int(L.split(None,1)[0]),L)for L in sys.stdin.readlines()];m=collections.Counter(x[0]for x in l).most_common(1)[0][0];print(*[x[1]for x in l if x[0]<m],sep="",end="")'

Uncompressed version, usable as a script file:

#!/usr/bin/env python3
import sys
import collections

# read lines from stdin (with trailing \n) and extract the number in their first column
items = [(int(line.split(None, 1)[0]), line) for line in sys.stdin]
# find the most common number from the first column
most_common = collections.Counter(item[0] for item in items).most_common()[0][0]
# print input lines in order, but only those with their number lower than the most common
print(*[item[1] for item in items if item[0] < most_common], sep="", end="")

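The only assumption this script makes about its input (which is expected to be piped to stdin) is that every line has a valid integer in its first whitespace-separated column. The lines do not need to be sorted in any way.

Note: If several different numbers in the first column are tied for most common, which of them gets picked is arbitrary, but it should be constant for the same input. If that is not desired, replace the line that finds the most common value with something like the following, which picks the highest of the most common values: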

most_common = sorted(collections.Counter(item[0] for item in items).most_common(),
                     key=lambda x:x[::-1])[-1][0]
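
To see the tie-breaking in isolation, here is a small self-contained sketch that wraps the same expression in a function. The name `highest_most_common` is hypothetical and used only for this illustration.

```python
import collections

def highest_most_common(numbers):
    """Return the largest value among those tied for the highest count."""
    counts = collections.Counter(numbers)
    # Reversing each (value, count) pair sorts by count first, then by value,
    # so the last element has the highest count and, on a tie, the highest value.
    return sorted(counts.most_common(), key=lambda pair: pair[::-1])[-1][0]

# 2 and 3 both occur twice; the higher of the two is chosen.
print(highest_most_common([2, 3, 2, 3, 1]))  # prints 3
```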

Example input:

1 foo
3 bar
2 baz
3 apple
3 banana
2 cherry
4 beep

Example output:

1 foo
2 baz
2 cherry

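Answer 2

awk to the rescue

Once sort and uniq had done their job, I finished the project with awk. Byte's answer remains accepted because he was the first to answer and offered great insight using Python. In the end I cobbled together a working solution from 3 or 4 different Stack Exchange questions and answers.

For those interested, here is the full script: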

#!/bin/bash

echo 20 times / second ps -elf is captured to ~/pid.log
echo
echo "Type Ctrl+C when done capturing"
echo
echo "~/pid.log is sorted and uniq counted on column 15"
echo "which is full path and program name."
echo
echo "Then all matches with same unique count (the headings)"
echo "are stripped and only new processes started are printed."
echo
echo "This function can help you trace down what processes are"
echo "causing you grief for lid close events, hot plugging, etc."

trap ctrl_c INT

function ctrl_c() {
    echo " "
    printf "wc of ~/pid.log : " 
    wc ~/pid.log
    echo
    # Tally occurrences of same Command Name always running
    sort -k15 ~/pid.log | uniq -f14 -c > ~/pid.tmp
    # What is the most frequent occurrence?
    <~/pid.tmp awk '{print $1}' | sort -n | uniq -c > ~/pid.wrk
    HighCnt=$(awk -v max=0 '{if($1>max){want=$2; max=$1}}END{print want} '\
        ~/pid.wrk)
    echo HighCnt: $HighCnt
    # Exclude all processes >= highest count
    awk '($1 < var) ' var="$HighCnt" ~/pid.tmp
    rm ~/pid.wrk ~/pid.tmp ~/pid.log
    
    exit 0
}

echo "$0 Possible suspects causing problems" > ~/pid.log

while true ; do

    ps -elf >> ~/pid.log
    sleep .05 # 20 samples / second
    
done

The missing piece

This is the code I added to make it work.

# What is the most frequent occurrence?
<~/pid.tmp awk '{print $1}' | sort -n | uniq -c > ~/pid.wrk
HighCnt=$(awk -v max=0 '{if($1>max){want=$2; max=$1}}END{print want} '\
    ~/pid.wrk)
echo HighCnt: $HighCnt
# Exclude all processes >= highest count
awk '($1 < var) ' var="$HighCnt" ~/pid.tmp
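
For comparison, the same steps can be mirrored in Python. This is a rough sketch rather than part of the script, and the function name `filter_below_high_count` is invented for illustration. One detail it makes visible: because the work file is sorted numerically and the awk test uses a strict `>`, a tie between two counts resolves to the lower value here, whereas the alternative in Answer 1 picks the higher one.

```python
import collections

def filter_below_high_count(tally_lines):
    """Return (high_cnt, kept_lines) for lines that start with a count."""
    counts = [int(line.split(None, 1)[0]) for line in tally_lines]
    freq = collections.Counter(counts)
    high_cnt, best = None, 0
    # Walk the distinct counts in ascending order, as `sort -n | uniq -c` emits
    # them; the strict ">" keeps the lowest value when two values tie.
    for value in sorted(freq):
        if freq[value] > best:
            high_cnt, best = value, freq[value]
    # Same test as awk '($1 < var)': keep only counts below the high count.
    kept = [line for line, c in zip(tally_lines, counts) if c < high_cnt]
    return high_cnt, kept
```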

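Sample output

The gnome-calculator process (PID 25976) shows up as expected. There is one extra line, sleep 60, which is unwanted but appears because a daemon woke up and went back to sleep during the test.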

$ ps-suspects.sh
10 times / second ps -elf is captured to /home/rick/pid.log

Type Ctrl+C when done capturing

~/pid.log is sorted and uniq counted on column 15
which is full path and program name.

Then all matches with same unique count (the headings)
are stripped and only new processes started are printed.

This function can help you trace down what processes are
causing you grief for lid close events, hot plugging, etc.
^C 
wc of ~/pid.log :   17288  343162 2717102 /home/rick/pid.log

HighCnt: 63
      1 /mnt/e/bin/ps-suspects.sh Possible suspects causing problems
     26 0 R rick     25976  2051  0  80   0 - 120676 -     10:43 ?        00:00:00 gnome-calculator
     62 0 S root     22561   980  0  80   0 -  3589 -      10:42 ?        00:00:00 sleep 60
