我有一个类似的数据,但要大得多。所以我有一个 df1.txt 如下
sp|O15304|SIVA_HUMAN MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVGMESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIAITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFFFSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIWPVSIGLIAGALCIFIVAIVVFKKRDLPL
我有 df2.txt 如下
sp|O15304|SIVA_HUMAN IGPDGR
我试图将它们结合在一起,所以我执行以下操作
join df1.txt df2.txt | awk '{gsub($3, tolower($3), $2) ; print $1 "\t" $2}' > out.txt
我希望有这个
sp|O15304|SIVA_HUMAN MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
tr|A0A1B1L9R9|A0A1B1L9R9_BACTU MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVGMESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIAITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFFFSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIWPVSIGLIAGALCIFIVAIVVFKKRDLPL
但我却有这个
sp|O15304|SIVA_HUMAN MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDEGCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRAVDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
,我该如何解决?
答案1
问题出在 join 命令上。你需要使用-a 1
.
从man join
-a FILENUM
also print unpairable lines from file FILENUM, where
FILENUM is 1 or 2, corresponding to FILE1 or FILE2
即最终命令是
join -a 1 df1.txt df2.txt | awk '{gsub($3, tolower($3), $2) ; print $1 "\t" $2}' > out.txt
背景
排除故障时,应依次测试管道的各个部分。join df1.txt df2.txt
只输出两个文件中的行。要包含 中 中df1.txt
没有匹配项的行df2.txt
,请-a 1
按上述方式使用。