Count all programs/commands invoked in ~/.bash_history

2024-9-1

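I want to extract all commands/programs from ~/.bash_history and count how many times each one was called. The arguments of the programs are irrelevant and should be ignored. I use this bash one-liner to do that: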

cut -f 1 -d ' ' ~/.bash_history | sort | uniq -c | sort -h

But this approach misses some commands. For example, the commands mount, tee, mktemp and cut in the following lines should be counted:

sudo mount /dev/sdb2 /foo/bar
candump can0 | tee canlog
T=$(mktemp -d)
diff <(cut -f 2 -d ' ' ./foo ) ./bar

But they are not. How can I count how many times each command was called? Is there a reasonable way to do it?
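To see why the one-liner misses these commands, the four example lines can be fed to the same cut call. This is only an illustration; first_fields is a hypothetical helper name that wraps the cut invocation from the one-liner.

```shell
#!/bin/bash
# Hypothetical helper: the same cut call the one-liner uses, reading stdin.
first_fields() {
  cut -f 1 -d ' '
}

# The four example lines from the question.
printf '%s\n' \
  'sudo mount /dev/sdb2 /foo/bar' \
  'candump can0 | tee canlog' \
  'T=$(mktemp -d)' \
  "diff <(cut -f 2 -d ' ' ./foo ) ./bar" |
  first_fields
# Prints one field per line: sudo, candump, T=$(mktemp, diff
```

cut only ever returns the text before the first space, so anything that follows sudo, a pipe, an assignment or a process substitution never reaches sort and uniq.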

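I don't care whether mandatory builtins (like exit) are counted, but other builtins (like echo) that also exist as normal programs should be counted, and aliases should be counted too. There are no functions and no if/else branches. It doesn't matter whether loop keywords and/or commands on the same line as a loop are counted, since there are very few of them and the numbers don't need to be exact.

I want to do this so that I know how often each command is called, so that I can use the shortest aliases for the most frequently used commands.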

Edit

For this example ~/.bash_history file:

git commit
sudo mount /dev/sdb2 /foo/bar
gcc -Wall -o foo ./bar.c
git add bar.c
T=$(mktemp -d)
diff <(cut -f 2 -d ' ' ./foo )  <(cut -f 2 -d ' ' ./bar )
diff <(cut -f 2 -d ' ' ./foo1 )  <(cut -f 2 -d ' ' ./bar )
diff <(cut -f 2 -d ' ' ./foo2 )  <(cut -f 2 -d ' ' ./bar )
Vd <(cut -f 2 -d ' ' ./foo2 )  <(cut -f 2 -d ' ' ./bar )
T2=$(mktemp -d)

I expect output like this:

1  sudo
1  gcc
1  Vd
1  mount
2  mktemp
2  git
3  diff
8  cut

But I get this:

1  sudo
1  gcc
1  Vd
1  T=$(mktemp
1  T2=$(mktemp
2  git
3  diff

In this case Vd is an alias for vimdiff.
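One possible way to get closer to the expected output is to split every history line at command boundaries before taking the first word. The following is a rough sketch, not a shell parser: it assumes GNU sed (for \n in the replacement), and list_commands is a hypothetical helper name. It splits at |, ||, &&, ;, backticks, $( and <( or >(, drops leading VAR=value assignments, and counts sudo as well as the command that follows it. Separators inside quoted strings are split too, so the counts are approximate.

```shell
#!/bin/bash
# Hypothetical helper: print one command name per line for history file $1.
list_commands() {
  # Step 1: put every command boundary on its own line (GNU sed: \n = newline).
  sed -E 's/\$\(|[<>]\(|\|\||&&|[|;`]/\n/g' "$1" |
  # Step 2: trim leading blanks, drop leading VAR=value assignments,
  #         and split "sudo cmd" so both sudo and cmd are seen.
  sed -E \
    -e 's/^[[:space:]]+//' \
    -e 's/^([A-Za-z_][A-Za-z0-9_]*=[^[:space:]]*[[:space:]]*)+//' \
    -e 's/^sudo[[:space:]]+/sudo\n/' |
  # Step 3: keep the first word of every non-empty line.
  awk 'NF { print $1 }'
}

# Usage (counts per command, least used first):
#   list_commands ~/.bash_history | sort | uniq -c | sort -n
```

Aliases such as Vd are counted like any other first word, because the sketch never checks whether a name actually exists as a program.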

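Edit 2

I came up with a script that is probably more accurate, but it still has some problems. It now checks every possible program and alias and counts how often it occurs in ~/.bash_history. The remaining problem: if an argument of any other program matches the name of a program or alias, it is counted even though it should not be.

For the example above I get this output: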

<many programs with count 0 that i removed, they don't matter>
1 gcc
1 mount
1 sudo
1 Vd
2 c
2 git
2 mktemp
3 diff
8 cut

The alias c is listed because it matches the c in bar.c. Apart from that, the output is as expected.
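The false match can be reproduced in isolation. grep -w treats every character other than letters, digits and underscore as a word boundary, so the dot in bar.c makes the trailing c a whole word. A minimal sketch; count_word is a hypothetical helper that uses the same grep options as the script:

```shell
#!/bin/bash
# Hypothetical helper: count whole-word occurrences of $1 in stdin,
# with the same grep options the script uses.
count_word() {
  grep -o -w -F -- "$1" | wc -l
}

# Two lines from the example history that mention bar.c.
printf '%s\n' 'gcc -Wall -o foo ./bar.c' 'git add bar.c' | count_word c
# Counts 2 matches: the "c" after the dot in each line. The "c" letters
# inside "gcc" do not count, because they are surrounded by word characters.
```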

This is the script I wrote:

#!/bin/bash

#Exit script when a program encounters an error or a variable is used which was not defined
set -eu

#Use ~/.bash_history if no argument is given, otherwise use $1
INPUT="~/.bash_history"
if [ $# -ge 1 ]
then
  INPUT="$1"
fi



#Store the code that counts the occurrences of a program in an input file in the variable FUNCTIONCODE.
#Do this because we cannot call a function from find, so we have to store the function code and
# pass it to find to work around this limitation.
#The code must be passed as "$FUNCTIONCODE <input file> <programToCount>" to a shell interpreter
#
#The code counts how many times a program occurs in the given input file and prints the count + word.
#
#The -d '' argument means read until the end of the input. read then returns a non-zero status, so
# the || true at the end makes bash ignore that status (set -e would otherwise abort the script)
read -r -d '' FUNCTIONCODE << '.EOT'   || true
#Needs the input file as $1
#Needs the program name (or its full path) as $2
count()
{
  #Get the program name in case we got the complete path
  program="$(basename "$2")"
  #grep -w will only search for full words. -F will ignore any regex and -o will print only the
  # matching word. We can't use -c because that will only count one per line, even when there are
  # multiple matches on a single line.
  printf "%5i %s\n" "$(grep -o -w -F "$program" "$1" | wc -l)" "$program"
}
#Call the function. Since this FUNCTIONCODE string will be combined with the program argument, the
# program is given to the function count as an argument
count 
.EOT


#Print how many times each program is called
PrintAll()
{
  #Check every program in every PATH location
  for DIR in $(echo "$PATH" | tr ":" "\n")
  do
    find "$DIR/" -type f,l -executable -exec bash -c "$FUNCTIONCODE $INPUT {}" \;
  done
  #Check every alias in ~/.bashrc . We assume all aliases start at the beginning of a line.
  for ALIAS in $(grep '^alias' ~/.bashrc | cut -d ' ' -f 2 | cut -d '=' -f 1)
  do
    bash -c "$FUNCTIONCODE $INPUT $ALIAS"
  done
  #Check every alias in ~/.alias
  for ALIAS in $(grep '^alias' ~/.alias | cut -d ' ' -f 2 | cut -d '=' -f 1)
  do
    bash -c "$FUNCTIONCODE $INPUT $ALIAS"
  done
}
#Use uniq because some programs are listed twice, this happens for example when an alias matches the
# name of a program.

PrintAll | sort -h | uniq
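The script's comments say a function cannot be called from find. A possible alternative, sketched here under the assumption that bash and GNU find are available, is to export the function with export -f and hand the file names to bash -c as positional parameters rather than pasting them into the code string. count_dir is a hypothetical wrapper name; the counting logic is the same as in the script.

```shell
#!/bin/bash
# Same counting logic as in the script, written as an ordinary function.
# $1 = history file, $2 = program name or full path.
count() {
  local program
  program="$(basename "$2")"
  printf '%5i %s\n' "$(grep -o -w -F -- "$program" "$1" | wc -l)" "$program"
}
# Make the function visible to child bash processes (bash-specific).
export -f count

# Hypothetical wrapper: count every executable in directory $2
# against the history file $1.
count_dir() {
  find "$2/" \( -type f -o -type l \) -executable \
    -exec bash -c 'count "$1" "$2"' _ "$1" {} \;
}

# Usage: count_dir ~/.bash_history /usr/bin
```

Because the history file and each program path arrive as "$1" and "$2", names containing spaces or shell metacharacters are not re-parsed as code.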
