Find lines in a text file and replace them with output from another file

2024-5-29

text-processing awk sed grep bioinformatics


I have two files, A and B.

File A
>Node1                  
...
>Node2
...

File B
>gb|KY551314.1| Influenza A virus (A/mallard/Idaho/AH0011522/2015(H7N7)) segment 
2 polymerase PB1 (PB1) and PB1-F2 protein (PB1-F2) genes, 
complete cds
Length=2316

>gb|KY561069.1| Influenza A virus (A/American green-winged teal/Missouri/15OS6591/2015(H11N9)) 
segment 1 polymerase PB2 (PB2) gene, complete 
cds
Length=2341

How can I replace each NodeX line in File A with the next entry from File B, in order? The result should look like this:

File A
>gb|KY551314.1| Influenza A virus (A/mallard/Idaho/AH0011522/2015(H7N7)) segment 2 polymerase PB1 (PB1) and PB1-F2 protein (PB1-F2) genes, complete cds Length=2316
...

>gb|KY561069.1| Influenza A virus (A/American green-winged teal/Missouri/15OS6591/2015(H11N9)) segment 1 polymerase PB2 (PB2) gene, complete cds Length=2341
...

Answer 1

One way is with awk:

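awk 'NR==FNR && /^>Node/ {
    # Replace the >Node header with the next FileB record: 5 lines
    # (3 description lines, the Length line, the blank separator).
    $0 = ""
    for (i = 0; i <= 4; i++) {
        if ((getline s < ARGV[2]) <= 0) break   # stop at end of FileB
        sub(/[ \t]+$/, "", s)                   # drop trailing whitespace
        if (s != "") $0 = ($0 == "" ? s : $0 " " s)
    }
}
# Print FileA lines only; FileB is consumed through getline above.
NR==FNR' FileA FileB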
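
A fixed read count only works while every FileB record spans the same number of lines. As a minimal sketch of an alternative, assuming the records in FileB are separated by blank lines as in the sample, awk's paragraph mode (RS set to the empty string) can load each record whole. The function name `replace_headers` is a hypothetical helper, not part of the original answer.

```shell
# Sketch: paragraph mode (RS="") makes awk read each blank-line-separated
# block of FileB as one record, however many lines it spans.
replace_headers() {   # hypothetical helper: replace_headers FileA FileB
    awk '
        # First file on the command line (FileB), read in paragraph mode:
        # join the lines of each record with single spaces and store it.
        NR == FNR {
            gsub(/[ \t]*\n/, " ")
            sub(/[ \t]+$/, "")
            hdr[++n] = $0
            next
        }
        # Second file (FileA), read line by line:
        # swap each >Node line for the next stored header, if any is left.
        /^>Node/ && k < n { $0 = hdr[++k] }
        { print }
    ' RS= "$2" RS='\n' "$1"
}
```

It could be called as `replace_headers FileA FileB > FileA.new`. If FileA has more >Node lines than FileB has records, the extra >Node lines are left unchanged.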

Answer 2

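perl -lMautodie -pe 'BEGIN{ open FILE_B, "<", pop; }
   # on each >Node line: blank it, then append the next 5 lines of FileB
   s/^>Node.*// && do{for my $k (0..4) { s/$/<FILE_B> =~ s|\n| |r/e }}
' FileA FileB

Explanation

Perl options: -l strips the trailing newline from each input line and adds one back on output (input and output record separators are both \n); -p wraps the code in an implicit read loop over the input file and prints each record automatically.
-Mautodie loads the autodie pragma, so a failed open aborts with an error message instead of failing silently.
The BEGIN block pops FileB off the argument list and opens it, which leaves only FileA for the -p loop to read.
For each line of FileA that starts with >Node, we read five lines from FileB (the four lines of a record plus the blank separator, as in the sample) and replace each newline with a space.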
