How to remove duplicate rows based on the second column


I have the following file:

chr11_pilon3.g3568.t1   transcript:OIT01734 transcript:OIT01734 1.1e-107    389.8   1000    218 992 1   216 130 345 MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDA    MDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDA    MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDAR*  MKVWERVVEARVREMTSISVNQFGFMPGRSTTEAIHLVRRLVEHFRDKKKDLHMVFIDLENAYDKVPREVLWRCLEAKSVPEAYIRVIKDMYDGAKTRVRTVGGDSDHFPVVMGLHQGSALSPLLFALVMDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDAPVRIYKSAILGHLNSHGSQNALAGPVEAEENRQKTKKEVMEEIIQKSKFFKAQKAKDREENDELTEQLDKDFTSLVESKALLSLTQPDKINALKALVNKNISVGNVKKDEVADVPRKASIGKEKPDTYEMLVSEMALDMRARPSDRTKTPEEIAQEEKERLELLEQEXXXXXXXXXXXXXXDGNASDDNSKLVKDPRTVSGDDLGDDLEEVPRTKLGWIGEILRRKENELESEDAASSGDSDDGEDEGXXXXXXXXXXXXXXXXXXXXDEEQGKTQTIKDWEQSDDDIIDTELEDDDEGFGDDAKKVVKIKDHKEENLSITVAAENKKKMQVFYGVLLQYFAVLANKKPLNSKLLNLLVKPLMEMSAVSPYFAAICARQRLQRTRAQFCEDLKNTGKSSWPSLKTIFLLRLWSMIFPCSDFRHCVMTPAILLMCEYLMRCTIISGRDIAIASFLCSLLLSVIKQSQKFCPEAIVFIQTLLMAALDRKQRSNSQLDNLMEIKELGPLLCIRSSKVEMDSLDFLTLMDLPEDSQYFHSDNYRTSMLVTVLETLQGFVNVYKELISFPEIFMLISKLLCKMAGENHIPDALREKIKDVSQLIDTKAQEHHMLRQPLKMRKKKPVPIRMLNPKFEENFVKGRDYDPDRERA    389.8   1000    216 85.6    185 31  200 0   0   92.6    0   22IV6AV2SN4IV11IL12GSDA1PS1GE3ED1MK4AV6VF9DE29IV1HQ6FY2MV5FL1EG10IV14CR1HL4KR1KR5QE5PL2KE2GR6FY6GR3 85.6    1.1e-107    99.1
gene.9403.0.4.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8
gene.9403.0.5.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8
gene.69001.9.9.p1   NisylKD955766g0010.1    NisylKD955766g0010.1    1.4e-294    1011.9  2615    531 530 1   530 1   530 MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT* 
MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  1011.9  2615    530 96.6    512 18  519 0   0   97.9    0   21HR2LP9VA7GD29HP5EDSR4SA20MV25ED1FY40IL74HD62ED11MK10HR40TM127IT25 96.6    1.4e-294    99.8
gene.9403.9.5.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8

The file above contains some similar IDs:

gene.9403.0.4.p1
gene.9403.0.5.p1
gene.9403.9.5.p1    

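If only the gene.9403 prefix is kept, these IDs become identical. The remaining columns of the gene.9403 rows are identical as well, so I want to remove the duplicates.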

I used awk -F"\t" '!seen[$2, $3, $4, $5, $6, $7,$8, $9,$10,$11,$12, $13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31]++' select-results2.txt, which gives me the correct output for the example above:

chr11_pilon3.g3568.t1   transcript:OIT01734 transcript:OIT01734 1.1e-107    389.8   1000    218 992 1   216 130 345 MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDA    MDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDA    MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDAR*  MKVWERVVEARVREMTSISVNQFGFMPGRSTTEAIHLVRRLVEHFRDKKKDLHMVFIDLENAYDKVPREVLWRCLEAKSVPEAYIRVIKDMYDGAKTRVRTVGGDSDHFPVVMGLHQGSALSPLLFALVMDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDAPVRIYKSAILGHLNSHGSQNALAGPVEAEENRQKTKKEVMEEIIQKSKFFKAQKAKDREENDELTEQLDKDFTSLVESKALLSLTQPDKINALKALVNKNISVGNVKKDEVADVPRKASIGKEKPDTYEMLVSEMALDMRARPSDRTKTPEEIAQEEKERLELLEQEXXXXXXXXXXXXXXDGNASDDNSKLVKDPRTVSGDDLGDDLEEVPRTKLGWIGEILRRKENELESEDAASSGDSDDGEDEGXXXXXXXXXXXXXXXXXXXXDEEQGKTQTIKDWEQSDDDIIDTELEDDDEGFGDDAKKVVKIKDHKEENLSITVAAENKKKMQVFYGVLLQYFAVLANKKPLNSKLLNLLVKPLMEMSAVSPYFAAICARQRLQRTRAQFCEDLKNTGKSSWPSLKTIFLLRLWSMIFPCSDFRHCVMTPAILLMCEYLMRCTIISGRDIAIASFLCSLLLSVIKQSQKFCPEAIVFIQTLLMAALDRKQRSNSQLDNLMEIKELGPLLCIRSSKVEMDSLDFLTLMDLPEDSQYFHSDNYRTSMLVTVLETLQGFVNVYKELISFPEIFMLISKLLCKMAGENHIPDALREKIKDVSQLIDTKAQEHHMLRQPLKMRKKKPVPIRMLNPKFEENFVKGRDYDPDRERA    389.8   1000    216 85.6    185 31  200 0   0   92.6    0   22IV6AV2SN4IV11IL12GSDA1PS1GE3ED1MK4AV6VF9DE29IV1HQ6FY2MV5FL1EG10IV14CR1HL4KR1KR5QE5PL2KE2GR6FY6GR3 85.6    1.1e-107    99.1
gene.9403.0.4.p1    transcript:OIT35479 transcript:OIT35479 8.5e-191    667.5   1721    690 406 1   378 1   378 MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQV  MLSAPRVSPPAVAVAAPARFKFPNVCVNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIIWGGTEDDDSSIPSKEVLSWKPLASTPXXXXXXXXXXXXXDEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHNKHNIADASSRSSFSSYNEPDQLKEQQTLSLPRGRAKIQQLDDKKNFQKLIRVEDEDRGIAIENVSKHFAGYSIDSHAQSARVVHPGSKASASPLRGWGGGSSHYSLKRDEIFRERQNLGDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVLSTCRSFSKSGVPFHSMVVTGGFCQRTQLENLRQELDILIATPGRFMFLIKEGYLQLTNLKCAVLDEVDILFSDEDFETAFQCLINSSPITTQYLFVTATLPMDIYNKLVESFPDCELVSGPGMHRTSPGLEEFLVDCSGDETAEKSPDTAFINKKNALLHLVEDSPVPKTIVFCNKIDSCRKVENALKRFDRKGFSIKILPFHAALDQRRRLANMEEFRRSKMENVSLFLVCTDRASRGIDFEGVDHVVLFDYPRDPSEYVRRVGRTARGAGGKGKAFIFAVGKQVSLARRIMERNKKGHPVHDVPSILT*  MLSAPRAPPPAVAVAAPARFKFQNVCGNPVNLLLLHRNVGSSCKRVVVSTKAAYSRMPMDTPGAYQLIDKESGDKFIVWGGTEDDDSSIPSKEVLSWKPLASTSPDNNHPPPTQSSSNEASTRGLTGNFGRLKFRRMRDLVRKSYTKNKERDVIDHDKHNTTDASSRSSFSSYNEPGQLKEQQTLSLPRGRAKIQQLEDRKNSQKLIRVEDEDRDIAIENVSKHFAGYSSDSHAHSARVVHPGSKASASPLRGWGGGSSHYSLKREEIFRQRRNLDDENNFFSRKSFQELGCSDYMIESLRNQHFVRPSHIQAMTFGPIIAGKSCIISDQSGSGKTLAYLLPLIQRLRQEELQGLSKPSSQSPRVVVLAPTAELASQVCQISSSIKGTFATYSPYCSATTHTKRKK  667.5   1721    378 91.0    
344 34  352 0   0   93.1    0   6VASP14PQ3VG50IV25PSXPXDXNXNXHXPXPXPXTXQXSXSXSDN38ND3ITAT14DG20DE1KR2FS11GD14IS4QH30DE4EQ1QR2GD102  91.0    8.5e-191    54.8
gene.69001.9.9.p1   NisylKD955766g0010.1    NisylKD955766g0010.1    1.4e-294    1011.9  2615    531 530 1   530 1   530 MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  MKEMCLAVAPLPFRLGNNLIFHNPLSIGSSSHMDVTRLNSMGGTTTSLYAESAEKDLSDTVSSSRSEGVPLLHMISENESNNWISGDAVVRESEDDEILSLDGDQMSCSLSVVSDSSSLCGDDFIGFEVASEIFGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKIEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGHRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQEQWKKAFTNCFLMVDDEVGGTGNHEAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPTALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRAIQKGSKDNITVIVVDLKAQRKFKSKT* 
MKEMCLAVAPLPFRLGNNLIFRNPPSIGSSSHMDATRLNSMGDTTTSLYAESAEKDLSDTVSSSRSEGVPLLPMISENDRNNWIAGDAVVRESEDDEILSLDGDQVSCSLSVVSDSSSLCGDDFIGFEVASDIYGQNFVDAEKSICSVELIAKPGDLVESGVEDDNVSKPFAVKLEEQITDGSSSKSSQVVVQLPLNKGLSAAVSRSVFEVDYIPLWGFTSVCGRRPEMEDALATVPRFLRIPLQMLVGDRVPDGVSRCLSHLTAHFFGVYDGHGGSQVANYCRDRVHAVLAEELEKFMANLNDESIRQNCQDQWKKAFTNCFLKVDDEVGGTGNREAVAAETVGSTAVVAIVCSSHIIVANCGDSRAVLCRGKEPMALSVDHKPNREDEYARIEAAGGKVIQWNGHRVFGVLAMSRSIGDRYLKPWIIPDPEVMFIPRTKDDECLILASDGLWDVMSNEEACELARKRILLWHKKNGVTLTLERGQGIDPAAQAAAECLSNRATQKGSKDNITVIVVDLKAQRKFKSKT  1011.9  2615    530 96.6    512 18  519 0   0   97.9    0   21HR2LP9VA7GD29HP5EDSR4SA20MV25ED1FY40IL74HD62ED11MK10HR40TM127IT25 96.6    1.4e-294    99.8

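However, I am worried that if I do not take gene.9403 into account, I might delete the wrong entries. Is there a way to take the first column into account as well?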

Thank you in advance.

Answer 1

Try this:

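awk '
  # Reduce the first field to its first two dot-separated parts (e.g. gene.9403).
  # "\\1" is the backreference; a bare "\1" would be the control character \001.
  # gensub() requires GNU awk (gawk).
  {line = gensub(/^([^.]+\.[^.]+)[^[:blank:]]*/, "\\1", 1, $0)}
  !seen[line]++
' file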
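
If GNU awk is not available, the same idea can be sketched with POSIX awk alone. This is only an illustration under stated assumptions: the function name dedup_by_prefix is made up for this example, the ID is assumed to be the first whitespace-delimited token on each line, and "similar" is assumed to mean that the first two dot-separated parts of the ID match (gene.9403). The key is that shortened ID plus the untouched remainder of the line, so rows are dropped only when both agree.

```shell
# Hypothetical helper (not from the original answer): drop rows whose
# shortened ID and remaining columns have both been seen before.
dedup_by_prefix() {
  awk '
    {
      first = $1                               # ID token, e.g. gene.9403.0.4.p1
      n = split(first, parts, "[.]")           # split the ID on literal dots
      prefix = (n >= 2) ? (parts[1] "." parts[2]) : first
      # Everything after the ID, including the separator that follows it.
      rest = substr($0, index($0, first) + length(first))
      if (!seen[prefix rest]++) print          # print the original line once per key
    }
  ' "$@"
}

# Usage on a tiny made-up sample (tab-separated):
printf 'gene.9403.0.4.p1\tA\t1\ngene.9403.0.5.p1\tA\t1\ngene.9403.9.5.p1\tB\t1\ngene.69001.9.9.p1\tA\t1\n' |
  dedup_by_prefix
# Prints the 1st, 3rd and 4th sample lines; the 2nd is a duplicate of the 1st.
```

Because the shortened ID is part of the key, a row from a different gene (gene.69001 in the sample) is kept even when all of its other columns match a row that was already printed.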
