Shell script to get file contents

Question 1

With input.log as your input, I have this working with gawk, but I am still trying to work it out with plain awk:

gawk -F'|' '
    # print the header
    BEGIN { print "Type, Number,ID,submitted,notsubmitted" }
    # only work on non-empty lines
    NF > 0 {
        # create an ID from the first three fields
        id = $1 "," $2 "," $3
        # every time the ID pops up, increment subindex 1 or 2
        # depending on the value of field 4
        if ($4 == "S:1")
            array[id][2]++
        else
            array[id][1]++
    }
    # print the final array
    END {
        for (i in array)
            # adding 0 turns a count that was never seen into 0
            print i "," array[i][1]+0 "," array[i][2]+0
    }
' input.log
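
As for the plain awk version: the gawk-only part of the script above is the array of arrays (array[...][1] and array[...][2]). A minimal sketch that should run in any POSIX awk keeps two flat arrays instead, assuming the same |-separated input. The function name summarize and the array names seen, submit and notsub are only illustrative.

```shell
# summarize: hypothetical helper; reads |-separated log lines on stdin.
summarize() {
    awk -F'|' '
        BEGIN { print "Type, Number,ID,submitted,notsubmitted" }
        # only work on non-empty lines
        NF > 0 {
            id = $1 "," $2 "," $3
            seen[id]                 # referencing the element records the ID
            if ($4 == "S:1")
                notsub[id]++         # S:1 goes to the last column, as above
            else
                submit[id]++
        }
        END {
            for (id in seen)
                print id "," submit[id]+0 "," notsub[id]+0
        }
    '
}
```

Call it as summarize < input.log. The order of the data lines is whatever order awk walks the array in, so pipe through sort if you need it stable.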

Question 2

To find log files modified within the last 5 minutes, you can use find. For example:

find data_logs/ -type f -name 'abc.log.*' -mmin -6

This finds log files modified less than 6 minutes ago, which should be good enough for most purposes. If you need the modification time to be exact, use:

find data_logs/ -type f -name 'abc.log.*' \( -mmin -5 -o -mmin 5 \)

This finds files modified either less than 5 minutes ago or exactly 5 minutes ago.

From man find:

-mmin n
    File's data was last modified n minutes ago.

and:

Numeric arguments can be specified as

 +n     for greater than n,
 -n     for less than n,
  n     for exactly n.
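
To see the -n and +n forms in action without waiting for real logs, here is a small sketch. It assumes GNU touch (for -d '... ago') and GNU find; the temporary directory and the two file names are made up for the test.

```shell
# Assumes GNU touch and GNU find; file names are hypothetical.
dir=$(mktemp -d)
touch -d '2 minutes ago'  "$dir/abc.log.recent"
touch -d '30 minutes ago' "$dir/abc.log.old"

# -mmin -5: modified less than 5 minutes ago, so only the recent file.
recent=$(find "$dir" -type f -name 'abc.log.*' -mmin -5)
echo "$recent"

# -mmin +5: modified more than 5 minutes ago, so only the old file.
old=$(find "$dir" -type f -name 'abc.log.*' -mmin +5)
echo "$old"
```

Remove the directory with rm -rf "$dir" when you are done.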

Question 3

The cross-posted question https://stackoverflow.com/q/57377173/3220113 has been put on hold. I will copy the accepted answer here; the other question can be deleted.

For one file: first make the stream easy to process with awk (this could all be done in awk for slightly better performance):

sed -nr 's/\|/,/g;s/(^R_MT,.*),S:([^ ]) *$/\1 \2/p' <(zcat abc.log.2019041607.gz)

Result (after adding extra test lines):

R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 1
R_MT,D:1234,ID:413 0
R_MT,D:1234,ID:413 0
R_MT,D:1234,ID:413 0
R_MT,D:1234,ID:413 0
R_MT,D:1234,ID:413 0
R_MT,D:1234,ID:414 1
R_MT,D:1234,ID:414 1
R_MT,D:1235,ID:413 1
R_MT,D:1235,ID:413 1

Now count them in awk, using array a to keep track of the field names (the keys).

sed -nr 's/\|/,/g;s/(^R_MT,.*),S:([^ ]) *$/\1 \2/p' <(zcat abc.log.2019041607.gz) |
   awk '{a[$1]; if ($2>0) notsub[$1]++; else submit[$1]++;}
        END {for (i in a) print i "," submit[i]+0 "," notsub[i]+0;}
       '
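
If you want to try the pipeline without the real .gz file, the same sed and awk steps can be wrapped in a function that reads stdin. This is only a sketch: the name count_flags is made up, and the sample lines assume the raw format is four |-separated fields ending in S:<flag>, possibly with trailing spaces. The sort at the end is an addition to make the output order predictable.

```shell
# count_flags: hypothetical helper; reads raw log lines on stdin.
count_flags() {
    sed -nr 's/\|/,/g;s/(^R_MT,.*),S:([^ ]) *$/\1 \2/p' |
    awk '{a[$1]; if ($2>0) notsub[$1]++; else submit[$1]++;}
         END {for (i in a) print i "," submit[i]+0 "," notsub[i]+0;}' |
    sort
}

# Assumed sample input; the X_MT line is dropped by the sed filter.
printf '%s\n' \
    'R_MT|D:1234|ID:413|S:1' \
    'R_MT|D:1234|ID:413|S:0  ' \
    'R_MT|D:1234|ID:414|S:1' \
    'X_MT|D:1234|ID:413|S:1' |
count_flags
# prints:
# R_MT,D:1234,ID:413,1,1
# R_MT,D:1234,ID:414,0,1
```

The two numbers are the submit and notsub counters in that order; a flag greater than 0 is counted in notsub.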

For 5 files, first decide what result you want. For a separate output per file, use a loop like:

while IFS= read -r filename; do
   ... <( zcat "${filename}") ...
done < <(find datalogs -type f -name "abc*" -mmin -5)

For the results of the 5 files added together into one total:

... <( find datalogs -type f -name "abc*" -mmin -5 -exec zcat {} \;) ...
