Parsing columnar data

Question 1

Edit 1:

One approach could be:

getStats | grep ESTABLISHED | column -t | sed \
-e 's/\(<-\|->\)[ ]\+/\1 /g' \
-e 's/[ ]\+\([^ ]\+$\)/\t\1/' | column -t -s $'\t'
all  tcp  117.54.56.131:80     <- 10.42.100.211:63752                        ESTABLISHED:ESTABLISHED
all  tcp  10.42.120.201:63752  -> 219.224.67.112:31180  -> 137.51.59.141:80  ESTABLISHED:ESTABLISHED
all  tcp  77.221.237.24:443    <- 10.42.100.117:59999                        ESTABLISHED:ESTABLISHED

First run column -t. Then, after <- and ->, collapse each run of spaces to a single space. Next, separate the last column with a TAB and run a second column -t -s '<TAB>'.

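On the command line, give column its separator as -s "<Ctrl+V TAB>" (that is, a literal tab). In bash, -s $'\t' does the same. Optionally, tr can be used first to replace spaces with tabs.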
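To try the Edit 1 pipeline without getStats, here is a minimal sketch that feeds it sample lines instead. Several pieces are assumptions:

- The sample_stats and align_established functions are hypothetical. The sample lines are copied from the output above, with single spaces.
- The TAB is written as bash's $'\t' rather than a literal Ctrl+V TAB.
- The extra s/ *$// step is my own addition. It is a precaution in case a column version leaves trailing padding on short lines.
- It assumes bash, GNU sed (for \| and \+) and a column that supports -t and -s.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for getStats: sample lines taken from the output above.
sample_stats() {
  printf '%s\n' \
    'all tcp 117.54.56.131:80 <- 10.42.100.211:63752 ESTABLISHED:ESTABLISHED' \
    'all tcp 10.42.120.201:63752 -> 219.224.67.112:31180 -> 137.51.59.141:80 ESTABLISHED:ESTABLISHED' \
    'all tcp 77.221.237.24:443 <- 10.42.100.117:59999 ESTABLISHED:ESTABLISHED'
}

# Reads stats lines on stdin and prints the ESTABLISHED ones aligned.
# 1. column -t aligns every field on whitespace.
# 2. sed (GNU syntax): strip any trailing padding (my addition), squeeze the
#    spaces after each <- or -> to one, and replace the spaces before the
#    last field with a TAB.
# 3. column -t -s TAB realigns using only that TAB as separator.
align_established() {
  grep ESTABLISHED |
    column -t |
    sed -e 's/ *$//' \
        -e 's/\(<-\|->\)[ ]\+/\1 /g' \
        -e 's/[ ]\+\([^ ]\+$\)/\t\1/' |
    column -t -s $'\t'
}

sample_stats | align_established
```

With the real command, the call would be getStats | align_established.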

Doing everything in a single sed, skipping grep, with a few other tweaks:

getStats | column -t | \
sed '/ESTABLISHED/!d;s/\(<-\|->\) */\1  /g;s/ *\([^ ]*\)$/\t \1/' | \
column -t -s $'\t'

Edit 2:

Even if you find awk and printf messy, I am adding this as an option. With this script you can say:

getStats | script_name ESTABLISHED

One advantage is that it is easy to customize.

Either way, you have to either parse the data twice or save metadata about it and print everything at the end.
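The "parse twice" option can be sketched as follows. This is a hypothetical two-pass variant, not the script from this answer. Instead of storing every field, it reads the same file twice and uses the common awk idiom NR == FNR to tell the first pass from the second. The function name align_two_pass is made up. Because a pipe cannot be rewound, the input must be a regular file.

```shell
#!/usr/bin/env bash

# Hypothetical two-pass variant: align_two_pass PATTERN FILE
# Pass 1 measures, pass 2 prints; FILE is read twice, so it cannot be a pipe.
align_two_pass() {
  local pat=$1 file=$2
  awk -v pat="$pat" '
    # Pass 1 (NR == FNR only while reading the first copy of FILE):
    # record the max width of every column except the last field of each
    # line, and the highest number of columns.
    NR == FNR {
      if ($0 !~ pat) next
      if (NF > cols) cols = NF
      for (i = 1; i < NF; ++i)
        if (length($i) > w[i]) w[i] = length($i)
      next
    }
    # Pass 2: print matching lines using the widths from pass 1.
    $0 ~ pat {
      for (j = 1; j < NF; ++j) printf("%-*s ", w[j], $j)
      # Pad the missing columns so the last field lines up.
      for (; j < cols; ++j) printf("%-*s ", w[j], "")
      print $NF
    }
  ' "$file" "$file"
}
```

For example: getStats > stats.txt && align_two_pass ESTABLISHED stats.txt (stats.txt is a hypothetical file name). The trade-off is memory versus I/O. The stored-fields script below works on a pipe but keeps every matching line in memory, while this variant keeps only the widths but needs a real file.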

In short, it does the following:

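Record the maximum width of each column.
Record the maximum number of columns.
Save each field, line by line, into an array.
At the end, print every field except the last, padded to that column's maximum width.
Pad with spaces up to the maximum number of columns - 1.
Print the last field.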

#!/bin/bash

# Argument 1 is the pattern to match against; any remaining
# arguments are input files. With no files, awk reads stdin (a pipe).
pat=$1
shift

awk -v pat="$pat" '
# Only lines matching pat.
$0 ~ pat {
    # Highest number of columns seen so far.
    if (NF > cols)
        cols = NF
    # Increment number of lines.
    ++nl
    # Number of fields on this line.
    lines[nl] = NF

    for (i = 1; i <= NF; ++i) {
        # If this is not the last field and its width is greater
        # than the current width of column i, store it in wc_a.
        if (i < NF && (wc = length($i)) > wc_a[i])
            wc_a[i] = wc
        # Save fields in the array lines[LINE, COLUMN] = FIELD_DATA.
        lines[nl, i] = $i
    }
}

END {
    # Loop over the saved lines.
    for (i = 1; i <= nl; ++i) {
        # Print all fields but the last, left-aligned to the column width.
        for (j = 1; j < lines[i]; ++j)
            printf("%-*s ", wc_a[j], lines[i, j])
        # Pad the "missing" columns with spaces.
        for (; j < cols; ++j)
            printf("%-*s ", wc_a[j], "")
        # Print the last field of the line.
        printf("%s\n", lines[i, lines[i]])
    }
}
' "$@"

Old:

Removed; it can be found here.

Question 2

The real problem here is that the number of columns differs: some lines have six columns, others have eight.

So what you need to do is add empty fields in place of the missing x-th and y-th fields (x and y might be 5 and 6, or perhaps 3 and 4).

You can do it like this:

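# One field plus the whitespace after it (\S and \s are GNU sed extensions).
F="\\(\\S\\S*\\)\\s*\\s"
# G is U+00A0 (decimal 160), a non-breaking space, written as its UTF-8 bytes.
# column -t treats it as field content, not as a separator.
G=$'\xc2\xa0'

getStats \
| sed -e "s/^$F$F$F$F$F$F*$/\\1 \\2 \\3 \\4 \\5 $G $G \\6/g" \
| column -t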

The sed recognizes the lines that have only six fields and, where appropriate, adds the two extra fields. With the above, I get:

all  tcp  117.54.56.131:80     <-  10.42.100.211:63752                         ESTABLISHED:ESTABLISHED
all  tcp  10.42.120.201:63752  ->  219.224.67.112:31180  ->  137.51.59.141:80  ESTABLISHED:ESTABLISHED
all  tcp  77.221.237.24:443    <-  10.42.100.117:59999                         ESTABLISHED:ESTABLISHED
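The same padding idea can also be sketched in awk instead of sed, which avoids the GNU-only \S and \s. This is my own variant, not the answer's code. It assumes the six-field lines are the ones missing the second hop (new fields 6 and 7, as in the sed replacement above). PLACEHOLDER is a hypothetical visible marker; the non-breaking space trick above could be used instead to keep those cells looking blank.

```shell
#!/usr/bin/env bash

# Hypothetical placeholder for the missing cells. It must be non-empty:
# column -t treats a run of spaces as one separator, so an empty field
# would simply disappear.
PLACEHOLDER='-'

# Reads stats lines on stdin and prints them aligned.
pad_fields() {
  awk -v ph="$PLACEHOLDER" '
    # Six-field lines lack the second hop: put two placeholders before the state field.
    NF == 6 { $6 = ph " " ph " " $6 }
    # Print every line; the assignment above rebuilt $0 using OFS (a single space).
    { print }
  ' | column -t
}

printf '%s\n' \
  'all tcp 117.54.56.131:80 <- 10.42.100.211:63752 ESTABLISHED:ESTABLISHED' \
  'all tcp 10.42.120.201:63752 -> 219.224.67.112:31180 -> 137.51.59.141:80 ESTABLISHED:ESTABLISHED' |
  pad_fields
```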

Question 3

Here is a Perl script that does what you want:

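$ getStats | grep ESTABLISHED | \
perl -ne '
# split " " splits on runs of whitespace, so the newline adds no field
@a = split(" ", $_);
# The first five fields are always present.
print "$_," for @a[0..4];
# A sixth field without ">" means there is no second hop:
# emit two blank placeholder fields to fill those columns.
print " , ," if $a[5] !~ m/>/;
# Remaining fields (the second hop, if any, and the state).
print "$_," for @a[5..$#a];
print "\n";
' | column -t -s ','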

The result is:

all  tcp  117.54.56.131:80     <-  10.42.100.211:63752                         ESTABLISHED:ESTABLISHED
all  tcp  10.42.120.201:63752  ->  219.224.67.112:31180  ->  137.51.59.141:80  ESTABLISHED:ESTABLISHED
all  tcp  77.221.237.24:443    <-  10.42.100.117:59999                         ESTABLISHED:ESTABLISHED

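I took a slightly different approach from yours with column -t and changed my Perl output to put a comma "," between fields. So the output before the column command looks like this: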

all,tcp,117.54.56.131:80,<-,10.42.100.211:63752, , ,ESTABLISHED:ESTABLISHED,
all,tcp,10.42.120.201:63752,->,219.224.67.112:31180,->,137.51.59.141:80,ESTABLISHED:ESTABLISHED,
all,tcp,77.221.237.24:443,<-,10.42.100.117:59999, , ,ESTABLISHED:ESTABLISHED,

Then column -t -s ',' makes column split on the comma delimiter, which I found easier to handle than plain whitespace.
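To see the comma step on its own, here is a small sketch that feeds the three comma-separated lines shown above straight to column -t -s ','. The function name align_commas is hypothetical. The single-space placeholders between the commas keep those cells non-empty. This matters because some column implementations merge adjacent delimiters, which would drop an empty cell and shift the state column.

```shell
#!/usr/bin/env bash

# Split only on commas, so a field may contain spaces (or be a single space).
align_commas() {
  column -t -s ','
}

printf '%s\n' \
  'all,tcp,117.54.56.131:80,<-,10.42.100.211:63752, , ,ESTABLISHED:ESTABLISHED,' \
  'all,tcp,10.42.120.201:63752,->,219.224.67.112:31180,->,137.51.59.141:80,ESTABLISHED:ESTABLISHED,' \
  'all,tcp,77.221.237.24:443,<-,10.42.100.117:59999, , ,ESTABLISHED:ESTABLISHED,' |
  align_commas
```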

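Putting commas into every line feels a bit hacky to me, but it gets the job done. It could probably be simplified further, but it is a working solution.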
