Splitting and formatting a file with a one-liner in Linux

Question

Using the text formatter par (and GNU sed) (see the end for a solution that does not use par):

$ tr -s ' ' <file.in | awk '{ print $0, "\n" }' | par 50p8h | sed -r -e '/^$/d' -e 's/^ {8}/\t/'
11430.00 SH: gry to dk gry, firm to mod hd, plty,
        flk, ea to gt, abd LCM; SLTST: gry, sft to
        firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry, firm to mod hd, plty,
        flk, ea to gt, abd LCM; SLTST: gry, sft to
        firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod hd, blky, occ flk, wxy;
        SH: gry to dk gry, firm to mod hd, plty,
        occ blky, ea to gt; SLTST: gry to dk gry,
        mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry, firm to mod hd, plty,
        blky, ea to gt, tr MRL, occ LCM; SLTST:
        gry, occ brnsh rd, firm, amor to blky, ea
        to g

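tr -s ' ' squeezes runs of consecutive spaces into a single space.
The awk code simply adds an extra newline after each input line, so that par treats every line as a separate paragraph.
par 50p8h formats the resulting text to a width of 50 characters, with a hanging indent of 8 characters.
The sed expressions delete empty lines and replace the 8 spaces at the start of a line with a single tab.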
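
To see what the first two stages hand over to the formatter, they can be run on their own. This is only a small sketch: the two-line sample and the variable names `sample` and `prepared` are made up for illustration and are not taken from the original file.

```shell
# Hypothetical sample input: two records containing runs of spaces.
sample='11430.00  SH:   gry to dk gry
11460.00  SH:  gry'

# Squeeze repeated spaces, then follow every line with an empty line so
# that a paragraph formatter sees each record as its own paragraph.
# The comma in print also inserts awk's output field separator (a space)
# before the "\n", so each text line ends with one trailing space.
prepared=$(printf '%s\n' "$sample" | tr -s ' ' | awk '{ print $0, "\n" }')

printf '%s\n' "$prepared"
```

Each record now sits on a line of its own with an empty line after it, which is the paragraph layout that the par stage relies on.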

For a solution that does not use GNU sed, you will have to insert a literal tab character in place of \t in the last sed expression.
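
One way to get a literal tab into the sed expression without typing it is to let printf produce it and splice it into a double-quoted expression. A minimal sketch, assuming a POSIX shell; the variable names `tab` and `result` and the sample lines are illustrative only.

```shell
# Hold a literal tab in a variable; printf interprets \t portably.
tab=$(printf '\t')

# The same two expressions as in the pipeline above, written in basic
# regular expression syntax (\{8\} rather than -r with {8}) so that no
# GNU extension is needed.
result=$(printf '%s\n' '11430.00 SH: gry' '' '        flk, ea to gt' |
  sed -e '/^$/d' -e "s/^ \{8\}/$tab/")

printf '%s\n' "$result"
```

The empty line is dropped and the eight leading spaces on the continuation line are replaced by one tab.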

For a solution that uses spaces for indentation and makes the hanging indent exactly as wide as in the screenshot (9 spaces):

$ tr -s ' ' <file | awk '{ print $0, "\n" }' | par 50p9h | sed -e '/^$/d'
11430.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod hd, blky, occ flk, wxy;
         SH: gry to dk gry, firm to mod hd, plty,
         occ blky, ea to gt; SLTST: gry to dk gry,
         mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry, firm to mod hd, plty,
         blky, ea to gt, tr MRL, occ LCM; SLTST:
         gry, occ brnsh rd, firm, amor to blky, ea
         to g

Adding j to 50p9h justifies the paragraphs nicely:

11430.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk, ea  to gt, abd LCM;  SLTST: gry, sft
         to firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk, ea  to gt, abd LCM;  SLTST: gry, sft
         to firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod  hd, blky, occ flk, wxy;
         SH: gry to dk gry,  firm to mod hd, plty,
         occ blky, ea to gt; SLTST: gry to dk gry,
         mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry,  firm to mod hd, plty,
         blky, ea  to gt, tr MRL,  occ LCM; SLTST:
         gry, occ brnsh rd, firm, amor to blky, ea
         to g

...and adding l to that also forces justification of the last line of each paragraph (not so nice):

11430.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk,  ea  to  gt, abd  LCM;  SLTST:  gry,
         sft  to firm,  amor to  blky, slty  to ea
11460.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk,  ea  to  gt, abd  LCM;  SLTST:  gry,
         sft  to firm,  amor to  blky, slty  to ea
11490.00 MRL: lt gry, mod  hd, blky, occ flk, wxy;
         SH: gry to dk gry,  firm to mod hd, plty,
         occ  blky, ea  to  gt; SLTST:  gry to  dk
         gry, mod  firm to firm, amor,  blky, slty
11520.00 SH:   gry  to   dk  gry,   firm  to   mod
         hd,  plty,  blky,  ea   to  gt,  tr  MRL,
         occ   LCM;   SLTST:    gry,   occ   brnsh
         rd,  firm,   amor  to   blky,  ea   to  g

par is available from most package managers on most Unices, but you can also find its source code (to compile it yourself) at http://www.nicemice.net/par/

Solution using fmt instead of par

$ tr -s ' ' <file.in | awk '{ print $0, "\n" }' | fmt -w 50 |
  awk '/^[^0-9]/  { $0 = "         " $0 }
                  { print }' | fmt -w 50 | sed '/^$/d'
11430.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod hd, blky, occ flk, wxy;
         SH: gry to dk gry, firm to mod hd, plty,
         occ blky, ea to gt; SLTST: gry to dk gry,
         mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry, firm to mod hd, plty,
         blky, ea to gt, tr MRL, occ LCM; SLTST:
         gry, occ brnsh rd, firm, amor to blky, ea
         to g

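fmt is less flexible in its formatting, so here we need to run it twice to get the expected result. We also rely on the fact that every original line starts with a digit.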

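tr -s ' ', as before.
awk '{ print $0, "\n" }', as before.
The first fmt invocation (fmt -w 50) formats the first line of each paragraph to the correct width (50 characters).
The awk script indents every line that does not start with a digit by 9 spaces.
The second fmt invocation reformats the whole text to 50 characters, but now the indented lines stay indented.
The sed expression deletes empty lines.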
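
The indentation step can be checked in isolation. This is a small sketch with made-up input lines; the variable name `indented` is illustrative, and the nine-space string is built with sprintf here only to make the count explicit (it is equivalent to the literal used in the pipeline above).

```shell
# Hypothetical output of the first fmt pass: one record wrapped onto two
# lines, an empty separator line, and the start of the next record.
indented=$(printf '%s\n' '11430.00 SH: gry to dk gry,' 'flk, ea to gt' '' '11460.00 SH: gry' |
  awk '/^[^0-9]/ { $0 = sprintf("%9s", "") $0 }   # prepend nine spaces
                 { print }')

printf '%s\n' "$indented"
```

Lines that start with a digit pass through unchanged. Empty lines do not match /^[^0-9]/ either, because the pattern needs at least one character, so the paragraph breaks stay intact for the second fmt pass.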

Answer 1