Multi-line pattern/data extraction

Question 1

With awk:

awk -F': ' 'BEGIN{ORS=" "}$1=="MIME-Version"{exit}{print $2}END{print "\n"}' file
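
As a rough illustration of what this one-liner does, the sketch below runs it on a tiny made-up header file. The file name sample.rsmf and every header name other than MIME-Version are assumptions for the demo, not taken from the question's data.

```shell
# Hypothetical sample input; only MIME-Version is a name the one-liner relies on.
tmp=$(mktemp -d)
cat > "$tmp/sample.rsmf" <<'EOF'
X-RSMF-Version: 1.0.0
X-RSMF-EventCount: 53
MIME-Version: 1.0
Content-Type: text/plain
EOF

# -F': '   split each line into a name ($1) and a value ($2)
# ORS=" "  print the values on one line, separated by spaces
# exit     stop at the MIME-Version header (the END rule still runs)
out=$(awk -F': ' 'BEGIN{ORS=" "}$1=="MIME-Version"{exit}{print $2}END{print "\n"}' "$tmp/sample.rsmf")
printf '%s\n' "$out"
rm -rf "$tmp"
```

With this input the values 1.0.0 and 53 come out on a single line, and nothing after MIME-Version is read. Note that a continuation line with no ': ' in it has an empty $2, so the text of multi-line values is not carried over by this approach.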

Answer

Question 2

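The following is fairly hackish and brute-force, and there are probably better ways to do it, but it does work with the sample data you provided. (To suggest something better I'd need to know more about your data and why you even mentioned Excel: if the data was originally in a spreadsheet, there are Perl modules for extracting data directly from Excel, OpenOffice/LibreOffice, and similar formats.)

It can handle any number of input files.

It is written to use a TAB (\t, Ctrl-I, or ^I) as the output field separator rather than a space, because your field data can contain spaces.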
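
A small sketch of why TAB is the safer separator here: a value that contains spaces stays in one piece when the line is split on tabs, but not when it is split on spaces. The two values are only illustrative.

```shell
# Two illustrative field values; the first one contains spaces.
line=$(printf '%s\t%s' 'RSMF Generator Sample Library' '1.0.0')

# Splitting on TAB (cut's default delimiter) recovers the first field intact.
by_tab=$(printf '%s\n' "$line" | cut -f1)

# Splitting on spaces cuts the same value off after its first word.
by_space=$(printf '%s\n' "$line" | cut -d' ' -f1)

printf 'tab:   %s\nspace: %s\n' "$by_tab" "$by_space"
```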

#!/usr/bin/perl

while (<>) {
  chomp;
  s/^\s*|\s*$//g;  # strip any leading and trailing whitespace
  next if /^$/;    # ignore all blank lines

  # split input line into @F array
  # $F[0] will contain the field name and
  # $F[1] will contain the field data
  # The field separator is a quite-forgiving zero-or-more spaces followed by
  # a colon followed by one-or-more spaces. This should cope with most minor
  # variants caused by manual extraction from Excel. 
  my @F = split /\s*:\s+/, $_, 2;  # limit of 2 keeps any later ": " inside the data

  # print the data at end of each input record (file)
  if (/^MIME-Version/) {
    # add space-separated @participants array to end of @record array
    push @record, join(" ", @participants);

    # print @record array, tab-separated
    print join("\t", @record), "\n";

    # clear both arrays, ready for next input file
    @record=();
    @participants=();
    next;
  };

  # fix up the date format
  if (/^X-RSMF-(Begin|End)Date/) {
    $F[1] =~ s/T/ /;
    $F[1] =~ s/-0[45]:00$//;
  };

  if (/^X-RSMF-Participants/) {
    # participants need to be handled differently because this field can
    # be multi-line.  Store in a separate @participants array
    push @participants, $F[1];

  } elsif ($#F == 0) {
    # lines without a field name get added to @participants array
    push @participants, $_;

  } else {
    # all other fields get added to @record array
    push @record, $F[1];
  }
}

Save it to a file, e.g. rsmf2tab.pl, make it executable with chmod +x rsmf2tab.pl, and then run it, e.g.

./rsmf2tab.pl /mnt/c/Temp/rsmf/*.rsmf

Or, if your .rsmf files are in multiple subdirectories:

find /mnt/c/Temp/rsmf/ -name '*.rsmf' -exec /path/to/rsmf2tab.pl {} +
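
To see what the "{} +" form of -exec does, here is a minimal sketch on a throwaway directory tree. The directory and file names are hypothetical, and a tiny sh -c command stands in for the real script so that the example is self-contained.

```shell
# Build a throwaway tree with .rsmf files in two subdirectories.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b c"
: > "$tmp/a/one.rsmf"
: > "$tmp/b c/two words.rsmf"
: > "$tmp/b c/notes.txt"

# "{} +" appends as many matching paths as fit onto one command line, so
# with a handful of files the command runs once and receives all of them.
# The stand-in command just reports how many file names it was given.
count=$(find "$tmp" -name '*.rsmf' -exec sh -c 'echo "$#"' sh {} +)
echo "file names passed in one invocation: $count"
rm -rf "$tmp"
```

Each path is handed over as a separate argument, so names containing spaces arrive intact and the non-matching notes.txt is never passed.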

Sample output, using two copies of your sample data (as file1.rsmf and file2.rsmf) as input and piped through cat -A to show the tabs as ^I:

$ ./rsmf2tab.pl *.rsmf | cat -A
RSMF Generator Sample Library^I1.0.0^I53^I2022-09-20 04:33:11^I2022-09-20 16:47:56^IGRP000000118^IGRP000000118_D_20220920^IFalse^INative Messages^IPerson One <5156242756> Person two, Person three [email protected] <21243210277> Person four <345278652345>$
RSMF Generator Sample Library^I1.0.0^I53^I2022-09-20 04:33:11^I2022-09-20 16:47:56^IGRP000000118^IGRP000000118_D_20220920^IFalse^INative Messages^IPerson One <5156242756> Person two, Person three [email protected] <21243210277> Person four <345278652345>$
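
The date fix-up in the script can be tried on its own. The sketch below applies the same two substitutions with sed instead of Perl (a swapped-in tool for the demo). The raw input value is an assumption, inferred from the sample output above rather than copied from the original data.

```shell
# Assumed raw value: an ISO 8601 timestamp with a -04:00 or -05:00 offset.
raw='2022-09-20T04:33:11-04:00'

# Same two substitutions as in the Perl script: replace the first "T"
# with a space, then strip a trailing -04:00 or -05:00 offset.
fixed=$(printf '%s\n' "$raw" | sed -e 's/T/ /' -e 's/-0[45]:00$//')
printf '%s\n' "$fixed"
```

This only removes the offset text; it does not convert the time to another time zone.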

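By the way, you really don't want to keep doing FILES=/mnt/c/Temp/rsmf/*.rsmf followed by for f in $FILES. The unquoted $FILES is subject to word splitting, so it breaks as soon as the stored path contains any whitespace characters. It isn't needed anyway: just run for f in /mnt/c/Temp/rsmf/*.rsmf or (depending on what you're running) pass all the filename arguments to the command you're running, without using a loop.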
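
A minimal sketch of the failure mode and the fix, assuming the whitespace sits in a directory name. The paths are hypothetical and live in a throwaway temporary directory.

```shell
# Throwaway directory whose name contains a space.
tmp=$(mktemp -d)
mkdir "$tmp/my rsmf"
: > "$tmp/my rsmf/one.rsmf"
: > "$tmp/my rsmf/two.rsmf"

# Fragile: the unquoted $FILES is split at the space before globbing,
# so the loop iterates over two broken words instead of the two files.
FILES=$tmp/my\ rsmf/*.rsmf
fragile=0
for f in $FILES; do
  if [ -f "$f" ]; then fragile=$((fragile + 1)); fi
done

# Robust: glob directly in the for loop and quote "$f" wherever it is used.
robust=0
for f in "$tmp/my rsmf"/*.rsmf; do
  if [ -f "$f" ]; then robust=$((robust + 1)); fi
done

echo "fragile loop found $fragile files, robust loop found $robust"
rm -rf "$tmp"
```

When a loop is still wanted, the second form is the one to copy: no intermediate variable, and every use of "$f" quoted.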

Answer