如何在多个 txt 文件中传递字符串

如何在多个 txt 文件中传递字符串

我有一个类似的数据,但要大得多。所以我有一个 df1.txt 如下

sp|O15304|SIVA_HUMAN    MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET 
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU  MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVGMESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIAITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFFFSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIWPVSIGLIAGALCIFIVAIVVFKKRDLPL    

我有 df2.txt 如下

sp|O15304|SIVA_HUMAN    IGPDGR

我试图将它们结合在一起,所以我执行以下操作

join df1.txt df2.txt | awk '{gsub($3, tolower($3), $2) ; print $1 "\t" $2}' > out.txt

我希望有这个

sp|O15304|SIVA_HUMAN    MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU  MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVGMESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIAITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFFFSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIWPVSIGLIAGALCIFIVAIVVFKKRDLPL    

但我却有这个

sp|O15304|SIVA_HUMAN    MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET

,我该如何解决?

答案1

问题出在 join 命令上。你需要使用-a 1.

man join

-a FILENUM
       also print unpairable lines from file FILENUM, where
       FILENUM is 1 or  2,  corresponding to FILE1 or FILE2

即最终命令是

join -a 1 df1.txt df2.txt | awk '{gsub($3, tolower($3), $2) ; print $1 "\t" $2}' > out.txt

背景

排除故障时,应依次测试管道的各个部分。join df1.txt df2.txt只输出两个文件中的行。要包含 中 中df1.txt没有匹配项的行df2.txt,请-a 1按上述方式使用。

相关内容