How do I turn ugly output into nice, useful information?

Answer 1

A somewhat scary sed one-liner:

sed -n '
# we divide our incoming text into small parts,
# each one, as you mentioned, running from /---.*box.*/ to /profile/
/---.*box.*/,/profile/{
     # inside each part we do the following:
     # if a line matches one of our patterns, we extract
     # the value and give it an identifier (as you
     # can see, "ij", "st" and so on),
     # and we copy that value with its identifier to the hold buffer;
     # we do not replace the content of the hold buffer,
     # we just append (capital H) the new variable to it
     /insert_job/{s/[^:]*: /ij"/;s/ .*/",/;H};
     /start_times/{s/[^:]*: /st/;s/$/,/;H};
     /days_of_week/{s/[^:]*: /dw"/;s/$/",/;H};
     /machine/{s/[^:]*: /ma"/;s/$/",/;H};
     /description/{s/[^:]*: /de/;s/$/,/;H};
     /command/{s/[^:]*: /co"/;s/$/",/;H};
     # when a line matches the closing pattern (profile),
     # we take it as the end of our part,
     # so we delete the whole line (s/.*//;)
     # and exchange the pattern and hold buffers (x;).
     # Now the pattern space holds several lines with all the needed variables,
     # so we can remove all the newline characters (s/\n//g;),
     # leaving a single line with a list of variables;
     # then we just need to rearrange them into the order we want,
     # which this section does with several s commands.
     # After that we print the result (p)
     /profile/{s/.*//;x;s/\n//g;s/ij\("[^"]*box[^"]*",\)/\1/;
          s/,\(.*\)st\("[^"]*",\)\(.*ij"[^"]*",\)/,\2\1\3\2/;
          s/\([^,]*,[^,]*,\)\(.*\)dw\("[^"]*",\)\(.*ij"[^"]*",[^,]*,\)/\1\3\2\4\3/;
          s/de/"",/;s/ij/""\n/;
          s/\(\n[^,]*,[^,]*,[^,]*,\)\(.*\)ma\("[^"]*",\)/\1\3\2/;
          s/co\("[^"]*"\),\(.*\)/\2\1/;s/de//;p}
     };
# the last command just adds the table header and nothing more.
# note: if you want to add some new commands,
# add them before this one (the one-line "1i text" form is a GNU sed
# extension and takes the rest of the line as the text to insert)
1i"Job Name", "Time", "Schedule", "Machine", "Description", "Command"'

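I wrote it this way because the order of the fields may differ from box to box, but profile is always the last one. If the order were always the same, it would be a lot simpler.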
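The core idea of the sed script is to gather each part's fields in whatever order they arrive (the hold buffer) and only then emit them in a fixed column order. Here is a Python sketch of that idea. It is a simplified illustration, not a port: it handles one block at a time and does not reproduce the script's splitting of a box and its command job into two rows. The field names are the ones the sed script matches. The function name block_to_row is hypothetical.

```python
import re

# Output column order; the field names are the ones the sed script matches.
COLUMNS = ("insert_job", "start_times", "days_of_week",
           "machine", "description", "command")

# "key: value" lines, possibly indented
FIELD_RE = re.compile(r"\s*([A-Za-z_]+):\s*(.*)")


def block_to_row(block_lines):
    """Collect the fields of one block in whatever order they appear
    (the role of the hold buffer), then return them in COLUMNS order,
    using '' for fields the block does not contain."""
    fields = {}
    for line in block_lines:
        m = FIELD_RE.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip()
        if key == "insert_job":
            # like s/ .*/ in the sed script: keep only the first word
            value = (value.split() or [""])[0]
        fields[key] = value
    return [fields.get(col, "") for col in COLUMNS]


# Two blocks with the same fields in a different order give the same row.
a = ["insert_job: J1   job_type: c", "machine: m1", "start_times: 10:00"]
b = ["start_times: 10:00", "machine: m1", "insert_job: J1"]
print(block_to_row(a) == block_to_row(b))
```

Because the output order comes from COLUMNS rather than from the input, adding a column only means adding one name to that tuple.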

Answer 2

I'd use Perl, or at least awk.

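perl -ne '
    BEGIN {
        print "\"Job Name\", \"Time\", \"Schedule\", \"Machine\", \"Description\", \"Command\"\n";
    }
    chomp; s/^\s+//; s/\s+$//;
    if (/^([^ :]+): *(.*)/) {
        my ($key, $val) = ($1, $2);
        $val =~ s/^"(.*)"$/$1/;                    # drop surrounding quotes (added back on output)
        $val =~ s/\s.*// if $key eq "insert_job";  # keep only the job name
        $fields{$key} = $val;
    }
    if (($_ eq "" || eof) && exists $fields{insert_job}) {
        print "\"", join("\", \"", @fields{qw(insert_job start_times days_of_week machine description command)}), "\"\n";
        delete @fields{qw(insert_job)};
    }
'

Explanation:

The BEGIN block runs once at the start of the script; the rest runs for every input line.
The line beginning with chomp removes the newline and strips leading and trailing whitespace.
The first if stores each "key: value" field, dropping surrounding quotes and anything after the job name on the insert_job line. Storing comes first so that, on the last line of the input, that line's field is recorded before eof prints the record.
The second if fires on a blank line (the paragraph separator), or at the end of the input, provided an insert_job field has been seen, and prints one record.
The delete line removes the insert_job field. Add the names of any other fields that you don't want to spill over from one paragraph to the next.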
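The record logic of the Perl script works like this: fields persist across paragraphs, only the names listed in delete are cleared, and a blank line or the end of input flushes a record. The same logic can be mirrored in Python to make the spill-over behavior explicit. This is a sketch under assumptions. The function name paragraphs_to_rows and the RESET tuple are hypothetical, values are kept exactly as stored (no quote handling), and the sample input is invented for illustration.

```python
import re

COLUMNS = ("insert_job", "start_times", "days_of_week",
           "machine", "description", "command")
# Fields cleared after each record, like the Perl delete line.
# Anything not listed here carries over into the next paragraph.
RESET = ("insert_job",)

FIELD_RE = re.compile(r"([^ :]+): *(.*)")  # same pattern as the Perl script


def paragraphs_to_rows(text):
    fields = {}
    rows = []
    # A trailing "" stands in for the eof test, flushing the last record.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
        if line == "" and "insert_job" in fields:
            rows.append([fields.get(c, "") for c in COLUMNS])
            for key in RESET:
                fields.pop(key, None)
    return rows


sample = """insert_job: BOX1
start_times: 16:15
days_of_week: su

insert_job: CMD1
machine: m1
command: /bin/true
"""
for row in paragraphs_to_rows(sample):
    print(row)
# The CMD1 row inherits start_times and days_of_week from BOX1.
```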

Answer 3

Using the TXR language:

@(bind inherit-time nil)
@(bind inherit-sched nil)
@(collect)
@  (all)
@indent/* ---------- @jobname ---------- */
@  (and)
@/ *//* ---------- @nil#@type#@nil ---------- */
@  (end)

@  (bind is-indented @(> (length indent) 0))
@  (gather :vars ((time "") (sched "") (mach "") (descr "") (cmd "")))
@/ */start_times: "@*time"
@/ */days_of_week: @sched
@/ */machine: @mach
@/ */description: "@*descr"
@/ */command: @cmd
@  (until)

@  (end)
@  (cases)
@    (bind type "box")
@    (set (inherit-time inherit-sched) (time sched))
@  (or)
@    (bind type "cmd")
@    (bind is-indented t)
@    (set (time sched) (inherit-time inherit-sched))
@  (end)
@(end)
@(output)
"Job Name", "Time", "Schedule", "Machine", "Description", "Command"
@  (repeat)
"@jobname", "@time", "@sched", "@mach", "@descr", "@cmd"
@  (end)
@(end)

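This is a very naive approach. From each record we extract all the fields we are interested in, substituting blanks for missing ones (the defaults in the :vars argument of @(gather)). We pay attention to the job type (box or cmd) and to the indentation. When we see a box, we copy some of its attributes into global variables; when we see an indented cmd, it copies those attributes back. (We blindly assume they were set by an earlier box.)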
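The box-to-command inheritance described above can be sketched in Python as a single pass over already-parsed records. This sketch rests on three assumptions. First, each record is a hypothetical dict with name, indented, time and sched keys. Second, the job type is taken from between the first two '#' marks of the name, as in the TXR header pattern. Third, records that match neither case are skipped, which is our reading of what a failed @(cases) does inside @(collect).

```python
def inherit_box_schedule(jobs):
    """Copy a box's time/schedule into the indented cmd jobs that follow it.

    jobs: list of dicts with keys 'name', 'indented', 'time', 'sched'
    (a hypothetical structure). Returns new dicts; input is not modified.
    """
    inherit_time = inherit_sched = None  # like inherit-time / inherit-sched
    kept = []
    for job in jobs:
        job = dict(job)
        parts = job["name"].split("#")
        job_type = parts[1] if len(parts) >= 3 else None  # "TA#box#AbC_p" -> "box"
        if job_type == "box":
            inherit_time, inherit_sched = job["time"], job["sched"]
        elif job_type == "cmd" and job["indented"]:
            # blindly assumes an earlier box set these, as the text says
            job["time"], job["sched"] = inherit_time, inherit_sched
        else:
            continue  # neither case matched: the record is dropped
        kept.append(job)
    return kept


jobs = [
    {"name": "TA#box#AbC_p", "indented": False, "time": "16:15", "sched": "su"},
    {"name": "TA#cmd#EfGJob_p", "indented": True, "time": "", "sched": ""},
]
for job in inherit_box_schedule(jobs):
    print(job["name"], job["time"], job["sched"])
```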

Running it:

$ txr jobs.txr jobs
"Job Name", "Time", "Schedule", "Machine", "Description", "Command"
"TA#box#AbC_p", "16:15", "su", "", "Job AbC that runs at 4:15PM on Sundays, and should end before 5:30PM", ""
"TA#cmd#EfGJob_p", "16:15", "su", "vm_machine1", "job EfG that runs within box AbC", "/path/to/shell/script.sh"

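Note that the output is comma-separated quoted fields, but nothing is done about the possibility of the data containing quotes. If quotes are somehow escaped in description:, they will of course be preserved. The @*descr notation is a greedy match, so description: "a b"c\"d" causes descr to take the characters a b"c\"d, which are reproduced verbatim in the output.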
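If the data can contain quotes, the usual CSV convention is to double each embedded quote inside a quoted field. A minimal Python sketch follows, using the standard csv module, which does exactly that with QUOTE_ALL. Note that it separates fields with a bare comma rather than the ", " used in the output above. The helper name to_csv_row is hypothetical.

```python
import csv
import io


def to_csv_row(fields):
    """Quote every field and double any embedded double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


# The description value from the example above: a b"c\"d
print(to_csv_row(["TA#cmd#EfGJob_p", 'a b"c\\"d']), end="")
# "TA#cmd#EfGJob_p","a b""c\""d"
```

A CSV reader then recovers the original value, backslash included, because only the doubled quotes are treated specially.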

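The nice thing about this solution is that, even without a data sample, we could guess most of the data format from the structure of the code, because it expresses an ordered pattern match over the whole file. We can see that each collected section begins with a /* --- ... --- */ line with the job name embedded in it, and that the job name has a type field in the middle, between two hash marks. Then comes a mandatory blank line, after which properties are gathered until another blank line, and so on.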
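The section header that the TXR pattern matches can likewise be written as a regular expression. It is an optionally indented /* ---------- name ---------- */ line whose name carries a type between two '#' marks. The Python sketch below assumes the dashes are separated from the name by single spaces, as in the TXR code. It accepts any run of dashes, which is slightly looser than the exactly ten the TXR code spells out. parse_header is a hypothetical name.

```python
import re

# indentation, then /* -----... NAME -----... */
HEADER_RE = re.compile(r"^(\s*)/\* -+ (\S+) -+ \*/\s*$")


def parse_header(line):
    """Return (is_indented, job_name, job_type), or None if the line
    is not a section header. job_type is None when the name lacks
    the NAME#TYPE#NAME shape."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    indent, name = m.groups()
    parts = name.split("#")
    job_type = parts[1] if len(parts) >= 3 else None
    return len(indent) > 0, name, job_type


print(parse_header("   /* ---------- TA#cmd#EfGJob_p ---------- */"))
print(parse_header("insert_job: TA#box#AbC_p"))
```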
