Regular expression from two files

Answer 1

#!/usr/bin/perl

use strict;

my $f1 = shift;
my $f2 = shift;

open(F1,"<",$f1) || die "couldn't open '$f1' for read: $!\n";
open(F2,"<",$f2) || die "couldn't open '$f2' for read: $!\n";

# set the input record separator (IRS) to '@'
$/='@';

# Normally the IRS is found at the END of a record, but your input
# files START with the input record separator, so we need to throw
# away the first (bogus) input record, i.e. everything from the start
# of each file up to and including its first @ character. In other
# words, the leading @ character of both files.
my $junk = <F1>;
$junk = <F2>;

while (!eof(F1) && !eof(F2)) {
  my @record1 = split(/\n/, <F1>);
  my @record2 = split(/\n/, <F2>);

  printf "%s%s\n", $/, $record1[0];  # prepend the IRS
  printf "%s%s\n", substr($record2[1],0,4), $record1[1];
  printf "%s\n",   $record2[2];
  printf "%s%s\n", $record2[3], $record1[3];
};

close(F1);
close(F2);

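This opens both files for reading and sets perl's input record separator variable, $/, to a single @ character.

Then, while neither file has reached EOF, it reads one record from each file, splits each record into an array (using the newline \n as the delimiter), and prints the merged record as specified.

Note that perl arrays start at 0, not 1, so, for example, $record1[0] is the first line of a record from file1.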

Save the script to a file (e.g. hassan.pl), make it executable with chmod +x hassan.pl, and run it as follows.

Sample output:

$ ./hassan.pl file1.txt file2.txt  
@NB551168:120:HTKN2BGX5:1:11101:3598:1051 2:N:0:NATC
NATCCAATCTCTAAAGTTT
+
#EEEAA/A/EEEE///EEE
@NB551168:120:HTKN2BGX5:1:11101:24202:1051 2:N:0:NTCG
NTCGTGAGACCGGGTGTTG
+
#EEEAAAAAAEEE///<AA
@NB551168:120:HTKN2BGX5:1:11101:4381:1051 2:N:0:NCTT
NCTTGCTACTCCTAAGGCA
+
#EEAA////6/////EE//

(I verified with diff that this output exactly matches your desired output.)

Answer 2

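Let me put it another way: except for the + lines and the lines starting with @, paste each line of I in front of the corresponding line of R.

Seen that way, the solution becomes very simple:

sed '/^[@+]/s/.*//' I | paste -d '' - R

/^[@+]/ selects the lines starting with @ or +
s/.*// empties those lines
paste -d '' - R pastes the result (- stands for the piped input) in front of the lines of file R, with no delimiter (-d '')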
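To see the pipeline at work, here is a minimal sketch on made-up data. The file contents and the read name read1 are assumptions for illustration only; just the file names I and R come from the command above. The empty delimiter written as -d '' is accepted by GNU paste; other implementations may want -d '\0' instead.

```shell
#!/bin/sh
# Hypothetical one-record demo; the file contents are invented for illustration.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# I holds the short index read, R holds the read it belongs to.
printf '%s\n' '@read1' 'NATC' '+' '#EEE' > I
printf '%s\n' '@read1' 'CAATCTC' '+' 'AA/A/EE' > R

# Blank the header and + lines of I, then glue every I line
# in front of the matching R line with an empty delimiter.
merged=$(sed '/^[@+]/s/.*//' I | paste -d '' - R)
printf '%s\n' "$merged"
```

The header and + lines come through unchanged because an empty string was pasted in front of them, while the sequence and quality lines gain the I prefix.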

Answer 3

If you have access to GNU sed, you can do the following:

$ sed -e 'R I' R | sed -ne 'p;n;n;h;n;G;s/\n//p'
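The one-liner is dense, so here is a sketch that runs its two stages separately on made-up data, assuming GNU sed (the R command is a GNU extension). The file contents and the read name read1 are invented for illustration. The first sed interleaves the files, emitting one line of I after each line of R. The second sed then works through those lines in groups of four.

```shell
#!/bin/sh
# Hypothetical one-record demo; requires GNU sed for the R command.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' '@read1' 'NATC' '+' '#EEE' > I
printf '%s\n' '@read1' 'CAATCTC' '+' 'AA/A/EE' > R

# Stage 1: after each line of R, append the next line of I.
interleaved=$(sed -e 'R I' R)

# Stage 2, for each group of four interleaved lines:
#   p        print the R line (the header or the + line)
#   n;n      step over the matching I line and load the next R line
#   h        save that R line (sequence or quality) in the hold space
#   n        load the matching I line
#   G        append the held R line after it
#   s/\n//p  join the two parts and print the result
merged=$(printf '%s\n' "$interleaved" | sed -ne 'p;n;n;h;n;G;s/\n//p')
printf '%s\n' "$merged"
```

Each cycle of the second sed therefore consumes one plain line and one merged line, which is why two cycles cover a full four-line record.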
