Splitting and formatting a file with a one-liner in Linux

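Please help me find a solution to the following problem. I am trying to write a one-line command in Linux that gives me the output shown below (image attached) for the Example.txt file.

Input - Example.txt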

11430.00    SH: gry to dk gry, firm to mod hd, plty, flk, ea to gt, abd LCM; SLTST: gry, sft to firm, amor to blky, slty to ea
11460.00    SH: gry to dk gry, firm to mod hd, plty, flk, ea to gt, abd LCM; SLTST: gry, sft to firm, amor to blky, slty to ea
11490.00    MRL: lt gry, mod hd, blky, occ flk, wxy; SH: gry to dk gry, firm to mod hd, plty, occ blky, ea to gt; SLTST: gry to dk gry, mod firm to firm, amor, blky, slty
11520.00    SH: gry to dk gry, firm to mod hd, plty, blky, ea to gt, tr MRL, occ LCM; SLTST: gry, occ brnsh rd, firm, amor to blky, ea to g

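I used fmt -w 50 -u Example.txt > FMT_Output.txt, but it did not produce the required output. I need to add spaces or a tab to all lines except those that start with a number (as shown under "Required output"). I also tried the sed 's/^/ /' command, but that is a multi-step process, and it did not give the required output either. Can you tell me whether there is a way to do this in a single step?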

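Output

Updated question

Unfortunately, it does not work when I try to load the formatted file. When I load the file, the system should recognize the numbers as the first column and the text as the second column. Instead, the whole formatted first line goes into the first column and the remaining text goes into the second column. Can you think of a way to format the file according to the column headings? Please see the new image.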

DEEP    Description
(ft)    -
12370.0 LS: Mdst, blsh gry, sft, occ mod firm, crpxln, prly, arg, SLTST: blk, firm-mod hd, amor, gt, mod calc, CLST: lt gry-m gry, sft, amor, wxy
12400.0 LS: Mdst, blsh gry, mod firm, crpxln, chky, arg, SLTST: blk-dk gry, firm-mod hd, amor, gt, mod calc, CLST: lt gry-m gry, occ rdsh gry, mod firm, amor, wxy, tr CHK
12430.0 LS: Mdst, blsh gry, mod firm, crpxln, chky, arg, SLTST: blk-dk gry, firm-mod hd, amor, gt, mod calc, SH: blk-dk gry, mod firm, blky-plty, occ brit, wxy
12460.0 SH: blk-dk gry, mod firm, blky-plty, occ brit, ea, SLTST: blk-dk gry, firm-mod hd, amor, gt, mod calc, SST: gry-dk gry, wl consol, v f, ang, p srt, cotd, slily calc cmt, no fluor

Answer 1

Using the text formatter par (and GNU sed); see the end for a solution that does not use par:

$ tr -s ' ' <file.in | awk '{ print $0, "\n" }' | par 50p8h | sed -r -e '/^$/d' -e 's/^ {8}/\t/'
11430.00 SH: gry to dk gry, firm to mod hd, plty,
        flk, ea to gt, abd LCM; SLTST: gry, sft to
        firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry, firm to mod hd, plty,
        flk, ea to gt, abd LCM; SLTST: gry, sft to
        firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod hd, blky, occ flk, wxy;
        SH: gry to dk gry, firm to mod hd, plty,
        occ blky, ea to gt; SLTST: gry to dk gry,
        mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry, firm to mod hd, plty,
        blky, ea to gt, tr MRL, occ LCM; SLTST:
        gry, occ brnsh rd, firm, amor to blky, ea
        to g
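  1. tr -s ' ' squeezes multiple consecutive spaces into one.
  2. The awk code just adds an extra newline after each line of input.
  3. par 50p8h formats the resulting text to a width of 50 characters, with a hanging indent of 8 characters.
  4. The sed expressions delete the empty lines and replace the 8 spaces at the start of a line with a single tab.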
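
To make the first two steps concrete, here is a small sketch that runs them on a made-up sample line (the sample text and the variable names are illustrative assumptions, not part of the original files). It only shows what tr and awk do before par sees the data:

```shell
# Sample line with runs of spaces, standing in for one line of Example.txt.
sample='11430.00    SH: gry to dk gry,   firm'

# Step 1: tr -s ' ' squeezes each run of spaces down to a single space.
squeezed=$(printf '%s\n' "$sample" | tr -s ' ')
printf '%s\n' "$squeezed"

# Step 2: awk prints the line, then the output field separator (a space),
# then "\n", then the usual record terminator, so every input line is
# followed by an empty line. Two input lines therefore become four.
line_count=$(printf '%s\n%s\n' "$sample" "$sample" |
  tr -s ' ' | awk '{ print $0, "\n" }' | wc -l | tr -d ' ')
printf '%s\n' "$line_count"
```

The empty line after each record is what makes the formatter treat every record as its own paragraph.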

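For a solution that does not use GNU sed, you have to insert a literal tab in place of \t in the last sed expression.

For a solution that uses spaces for the indentation and makes the left hanging indent exactly the same as in the screenshot (9 spaces):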
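
One way to get a literal tab into the sed script without typing it is to let the shell supply it. The following is a minimal sketch, assuming a POSIX shell and a POSIX sed; the variable names and the two sample lines are made up for illustration:

```shell
# Store a literal tab character in a variable.
tab=$(printf '\t')

# Two lines shaped like par's output: a first line, then a continuation
# line indented with 8 spaces.
indented=$(printf '%s\n%s\n' '11430.00 SH: gry' '        flk, ea to gt')

# \{8\} is the basic-regular-expression form of "exactly 8 of the
# preceding character". The double quotes let the shell substitute the
# literal tab into the replacement part of the sed expression.
result=$(printf '%s\n' "$indented" | sed -e '/^$/d' -e "s/^ \{8\}/$tab/")
printf '%s\n' "$result"
```

This avoids both the -r option and the \t escape, which are the parts of the original command that not every sed implementation accepts.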

$ tr -s ' ' <file | awk '{ print $0, "\n" }' | par 50p9h | sed -e '/^$/d'
11430.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod hd, blky, occ flk, wxy;
         SH: gry to dk gry, firm to mod hd, plty,
         occ blky, ea to gt; SLTST: gry to dk gry,
         mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry, firm to mod hd, plty,
         blky, ea to gt, tr MRL, occ LCM; SLTST:
         gry, occ brnsh rd, firm, amor to blky, ea
         to g

Adding j to 50p9h justifies the paragraphs nicely:

11430.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk, ea  to gt, abd LCM;  SLTST: gry, sft
         to firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk, ea  to gt, abd LCM;  SLTST: gry, sft
         to firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod  hd, blky, occ flk, wxy;
         SH: gry to dk gry,  firm to mod hd, plty,
         occ blky, ea to gt; SLTST: gry to dk gry,
         mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry,  firm to mod hd, plty,
         blky, ea  to gt, tr MRL,  occ LCM; SLTST:
         gry, occ brnsh rd, firm, amor to blky, ea
         to g

...and adding l to this also forces the last line of each paragraph to be justified (not as nice):

11430.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk,  ea  to  gt, abd  LCM;  SLTST:  gry,
         sft  to firm,  amor to  blky, slty  to ea
11460.00 SH: gry to dk gry,  firm to mod hd, plty,
         flk,  ea  to  gt, abd  LCM;  SLTST:  gry,
         sft  to firm,  amor to  blky, slty  to ea
11490.00 MRL: lt gry, mod  hd, blky, occ flk, wxy;
         SH: gry to dk gry,  firm to mod hd, plty,
         occ  blky, ea  to  gt; SLTST:  gry to  dk
         gry, mod  firm to firm, amor,  blky, slty
11520.00 SH:   gry  to   dk  gry,   firm  to   mod
         hd,  plty,  blky,  ea   to  gt,  tr  MRL,
         occ   LCM;   SLTST:    gry,   occ   brnsh
         rd,  firm,   amor  to   blky,  ea   to  g

par is available from most package managers on most Unices, but you can also find its source code (to compile it yourself) at http://www.nicemice.net/par/


Solution using fmt instead of par

$ tr -s ' ' <file.in | awk '{ print $0, "\n" }' | fmt -w 50 |
  awk '/^[^0-9]/  { $0 = "         " $0 }
                  { print }' | fmt -w 50 | sed '/^$/d'
11430.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11460.00 SH: gry to dk gry, firm to mod hd, plty,
         flk, ea to gt, abd LCM; SLTST: gry, sft
         to firm, amor to blky, slty to ea
11490.00 MRL: lt gry, mod hd, blky, occ flk, wxy;
         SH: gry to dk gry, firm to mod hd, plty,
         occ blky, ea to gt; SLTST: gry to dk gry,
         mod firm to firm, amor, blky, slty
11520.00 SH: gry to dk gry, firm to mod hd, plty,
         blky, ea to gt, tr MRL, occ LCM; SLTST:
         gry, occ brnsh rd, firm, amor to blky, ea
         to g

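fmt is less flexible in how it formats text, so here we need to use it twice to get the intended result. We also rely on the fact that every original line starts with a digit.

  1. tr -s ' ', as before.
  2. awk '{ print $0, "\n" }', as before.
  3. The first fmt invocation (fmt -w 50) wraps the text so that the first line of each paragraph has the correct width (50 characters).
  4. The awk script indents every line that does not start with a digit by 9 spaces.
  5. The second fmt invocation reformats the whole text to 50 characters, but now the indented lines stay indented.
  6. The sed expression deletes the empty lines.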
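
The indenting awk step can be tried in isolation. This is a small sketch on made-up input (the sample lines and the variable name are assumptions for illustration), shaped like the output of the first fmt pass:

```shell
# Lines starting with a non-digit get 9 leading spaces. Lines starting
# with a digit are printed unchanged, and so are empty lines, because
# /^[^0-9]/ needs at least one character to match.
indent_result=$(printf '%s\n' \
    '11430.00 SH: gry to dk gry,' \
    'flk, ea to gt' \
    '' \
    '11460.00 SH: gry' |
  awk '/^[^0-9]/ { $0 = "         " $0 }
                 { print }')
printf '%s\n' "$indent_result"
```

Because the empty lines pass through untouched, the paragraph boundaries are still in place for the second fmt pass.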
