Parsing a file with text-processing tools

Answer 1

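Use awk. The command below checks every field on every line and writes it to one of two files, out1 and out2 in my example: non-zero fields go to out1 and zero fields go to out2. At the end of each input line, a newline is written to both output files, so the line structure of the input is preserved.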

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

Example

Input file

cat foo

1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0

Command

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

Output files

cat out1

1140.271257 0.002288454025 0.002763420728 0.004142512599 
1479.704769 0.00146621631 0.003190634646 0.003672029231 
1663.276205 0.003379552854 0.04643209167 0.0539399155

cat out2

0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
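
The same idea can also be laid out as a multi-line awk program, which may be easier to adapt. This is only a sketch: the function name split_zeros and its three arguments are made up for illustration, and unlike the one-liner above it joins the fields with single spaces, so it writes no trailing space.

```shell
# Hypothetical helper (not part of the original answer):
#   split_zeros INPUT NONZERO_OUT ZERO_OUT
split_zeros() {
  awk -v nz="$2" -v z="$3" '
    {
      a = ""; b = ""; sa = ""; sb = ""     # reset accumulators for each line
      for (i = 1; i <= NF; i++) {
        if ($i != 0) { a = a sa $i; sa = " " }   # collect non-zero fields
        else         { b = b sb $i; sb = " " }   # collect zero fields
      }
      print a > nz                         # one output line per input line
      print b > z
    }
  ' "$1"
}
```

Calling split_zeros foo out1 out2 should produce the same split as the one-liner, minus the trailing spaces.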

Answer 2

You can indeed use a text-processing tool for this, but if the goal is simply to separate the first 4 fields from the rest, cut is enough:

 cut -d ' ' -f 1-4 infile > outfile1
 cut -d ' ' -f 5- infile > outfile2

user@debian ~/tmp % cat infile
1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0 
user@debian ~/tmp % cut -d ' ' -f 1-4 infile
1140.271257 0.002288454025 0.002763420728 0.004142512599
1479.704769 0.00146621631 0.003190634646 0.003672029231
1663.276205 0.003379552854 0.04643209167 0.0539399155
user@debian ~/tmp % cut -d ' ' -f 5- infile 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
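
One way to convince yourself that the two cut calls lose nothing is to glue the halves back together and compare the result with the original. A minimal sketch, assuming the fields are separated by exactly one space (as cut -d ' ' requires) and every line has at least 5 fields; the names split_at_4 and roundtrip_ok are made up for illustration.

```shell
# Hypothetical helper: split_at_4 INFILE OUT1 OUT2
split_at_4() {
  cut -d ' ' -f 1-4 "$1" > "$2"   # first four fields
  cut -d ' ' -f 5-  "$1" > "$3"   # field 5 through the end of the line
}

# Hypothetical check: roundtrip_ok INFILE OUT1 OUT2
# paste re-joins line N of both files with one space; cmp -s exits 0
# only if the re-joined text is byte-for-byte identical to INFILE.
roundtrip_ok() {
  paste -d ' ' "$2" "$3" | cmp -s - "$1"
}
```

After split_at_4 infile outfile1 outfile2, a zero exit status from roundtrip_ok infile outfile1 outfile2 indicates that the two files together still contain every field.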

Answer 3

I'd suggest using perl for this. Save your input as input.txt and run the following command:

cat input.txt | perl -ane 'foreach (@F) {   # -a splits each line into the array @F; loop over its fields
  chomp;              # strip a trailing newline, if any
  if ($_ == 0) {      # field is numerically zero: print it to STDOUT
    print $_, " "
  }
  else {              # field is non-zero: print it to STDERR
    print STDERR $_, " "
  }
}
print "\n"; print STDERR "\n";   # end the line on both streams
' > x2.txt 2> x1.txt             # redirect STDOUT to x2.txt and STDERR to x1.txt

Here it is as a one-liner for copy and paste:

cat input.txt | perl -ane 'foreach(@F){chomp;if($_ == 0){print $_," "}else{print STDERR $_," "}};print "\n"; print STDERR "\n";' > x2.txt 2> x1.txt

Answer 4

Another approach using Perl:

perl -lne '/(.*?)\s(0\s.*)/; print "$1"; print STDERR "$2"' file > filex1 2> filex2

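The regular expression matches everything up to the first 0 that is surrounded by whitespace, and then everything from that 0 to the end of the line. The parentheses capture these two groups as $1 and $2 respectively. -l enables automatic removal of the trailing newline (chomp) and appends a \n to each print call. So we print $1 to standard output and $2 to standard error, and then redirect each stream to a different file.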
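
To see what the two groups capture, the regular expression can be tried on a single line. The sample line and the helper name show_groups below are made up for illustration, and the trailing if is an addition so that lines without a standalone 0 are simply skipped.

```shell
# Hypothetical helper: reads lines on stdin and labels the two captures.
show_groups() {
  perl -lne 'print "1=[$1] 2=[$2]" if /(.*?)\s(0\s.*)/'
}

printf '%s\n' '3.5 0.25 0 0 0 ' | show_groups
# prints: 1=[3.5 0.25] 2=[0 0 0 ]
```

Note that the leading 0 of 0.25 does not end the first group, because it is followed by a dot rather than whitespace.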

Since this is Perl, there is more than one way to do it. The following uses the same idea as Wayne_Yux's answer, but simplified:

perl -lane '@A=grep{$_==0}@F; @B=grep{$_!=0}@F;print STDERR "@A"; print "@B"' file > filex1 2>filex2

Or, a little more simply, with grep -P:

grep -oP '^.+?(?=\s0\s)' file > filex1
grep -oP ' \K0 .*' file > filex2
