Remove a set of characters from a .txt file


I have a text file (over 1 GB in size) that contains lines like these:

10830110bcdf9002a6ade209c5cafbc02e90f84696b04c166c7029c427d1ef4a56580dbbce84a0574ba1fc17c8035ccec4679e5dcb6a6a331ebdb15d6cc0661378f409c3
1083021106e581c71003b987a75f18543cf5858b9fcfc5e04c0dddd79cd18764a865ba86d027de6d1900dc171e4d90a0564abbce99b812b821bd0d7d37aad72ead19c17
10840110dbd43121ef0c51a8ba62193eac247f57f1909e270eeb53d68da60ad61519f19cfb0511ec2431ca54e2fcabf6fa985615ec06def5ba1b753e8ad96d0564aa4c
1084011028375c62fd132d5a4e41ffef2419da345b6595fba8a49b5136de59a884d878fc9789009843c49866a0dc97889242b9fb0b8c112f1423e3b220bc04a2d7dfbdff
10880221005f0e261be654e4c52034d8d05b5c4dc0456b7868763367ab998b7d5886d64fbb24efd14cea668d00bfe8048eb8f096c3306bbb31aaea3e06710fa8c0bb8fca71
108501103461fca7077fc2f0d895048606b828818047a64611ec94443e52cc2d39c968363359de5fc76df48e0bf3676b73b1f8fea5780c2af22c507f83331cc0fbfe6ea9
1085022100a4ce8a09d1f28e78530ce940d6fcbd3c1fe2cb00e7b212b893ce78f8839a11868281179b4f2c812b8318f8d3f9a598b4da750a0ba6054d7e1b743bb67896ee62
1086022100638681ade4b306295815221c5b445ba017943ae59c4c742f0b1442dae4902a56d173a6f859dc6088b6364224ec17c4e2213d9d3c96bd9992b696d7c13b234b50

I also have these strings:

10830110
1083021
10840110
10840110
1088022100
10850110
1085022100
1086022100

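I need to remove these strings from the beginning of each line of the text file and save the result to a new file, so that the lines above end up looking like this: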

bcdf9002a6ade209c5cafbc02e90f84696b04c166c7029c427d1ef4a56580dbbce84a0574ba1fc17c8035ccec4679e5dcb6a6a331ebdb15d6cc0661378f409c3
106e581c71003b987a75f18543cf5858b9fcfc5e04c0dddd79cd18764a865ba86d027de6d1900dc171e4d90a0564abbce99b812b821bd0d7d37aad72ead19c17
dbd43121ef0c51a8ba62193eac247f57f1909e270eeb53d68da60ad61519f19cfb0511ec2431ca54e2fcabf6fa985615ec06def5ba1b753e8ad96d0564aa4c
28375c62fd132d5a4e41ffef2419da345b6595fba8a49b5136de59a884d878fc9789009843c49866a0dc97889242b9fb0b8c112f1423e3b220bc04a2d7dfbdff
5f0e261be654e4c52034d8d05b5c4dc0456b7868763367ab998b7d5886d64fbb24efd14cea668d00bfe8048eb8f096c3306bbb31aaea3e06710fa8c0bb8fca71
3461fca7077fc2f0d895048606b828818047a64611ec94443e52cc2d39c968363359de5fc76df48e0bf3676b73b1f8fea5780c2af22c507f83331cc0fbfe6ea9
a4ce8a09d1f28e78530ce940d6fcbd3c1fe2cb00e7b212b893ce78f8839a11868281179b4f2c812b8318f8d3f9a598b4da750a0ba6054d7e1b743bb67896ee62
638681ade4b306295815221c5b445ba017943ae59c4c742f0b1442dae4902a56d173a6f859dc6088b6364224ec17c4e2213d9d3c96bd9992b696d7c13b234b50

Answer 1

If I understand correctly, you have a file named file1 that contains lines like these:

10830110bcdf9002a6ade209c5cafbc02e90f84696b04c166c7029c427d1ef4a56580dbbce84a0574ba1fc17c8035ccec4679e5dcb6a6a331ebdb15d6cc0661378f409c3
1083021106e581c71003b987a75f18543cf5858b9fcfc5e04c0dddd79cd18764a865ba86d027de6d1900dc171e4d90a0564abbce99b812b821bd0d7d37aad72ead19c17
10840110dbd43121ef0c51a8ba62193eac247f57f1909e270eeb53d68da60ad61519f19cfb0511ec2431ca54e2fcabf6fa985615ec06def5ba1b753e8ad96d0564aa4c
1084011028375c62fd132d5a4e41ffef2419da345b6595fba8a49b5136de59a884d878fc9789009843c49866a0dc97889242b9fb0b8c112f1423e3b220bc04a2d7dfbdff
10880221005f0e261be654e4c52034d8d05b5c4dc0456b7868763367ab998b7d5886d64fbb24efd14cea668d00bfe8048eb8f096c3306bbb31aaea3e06710fa8c0bb8fca71
108501103461fca7077fc2f0d895048606b828818047a64611ec94443e52cc2d39c968363359de5fc76df48e0bf3676b73b1f8fea5780c2af22c507f83331cc0fbfe6ea9
1085022100a4ce8a09d1f28e78530ce940d6fcbd3c1fe2cb00e7b212b893ce78f8839a11868281179b4f2c812b8318f8d3f9a598b4da750a0ba6054d7e1b743bb67896ee62
1086022100638681ade4b306295815221c5b445ba017943ae59c4c742f0b1442dae4902a56d173a6f859dc6088b6364224ec17c4e2213d9d3c96bd9992b696d7c13b234b50

and a file named file2 that contains lines like these:

10830110
1083021
10840110
10840110
1088022100
10850110
1085022100
1086022100

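and you want to create a new file named file3 that contains each line of file1 with the string on the corresponding line of file2 removed from its beginning, like this: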

bcdf9002a6ade209c5cafbc02e90f84696b04c166c7029c427d1ef4a56580dbbce84a0574ba1fc17c8035ccec4679e5dcb6a6a331ebdb15d6cc0661378f409c3
106e581c71003b987a75f18543cf5858b9fcfc5e04c0dddd79cd18764a865ba86d027de6d1900dc171e4d90a0564abbce99b812b821bd0d7d37aad72ead19c17
dbd43121ef0c51a8ba62193eac247f57f1909e270eeb53d68da60ad61519f19cfb0511ec2431ca54e2fcabf6fa985615ec06def5ba1b753e8ad96d0564aa4c
28375c62fd132d5a4e41ffef2419da345b6595fba8a49b5136de59a884d878fc9789009843c49866a0dc97889242b9fb0b8c112f1423e3b220bc04a2d7dfbdff
5f0e261be654e4c52034d8d05b5c4dc0456b7868763367ab998b7d5886d64fbb24efd14cea668d00bfe8048eb8f096c3306bbb31aaea3e06710fa8c0bb8fca71
3461fca7077fc2f0d895048606b828818047a64611ec94443e52cc2d39c968363359de5fc76df48e0bf3676b73b1f8fea5780c2af22c507f83331cc0fbfe6ea9
a4ce8a09d1f28e78530ce940d6fcbd3c1fe2cb00e7b212b893ce78f8839a11868281179b4f2c812b8318f8d3f9a598b4da750a0ba6054d7e1b743bb67896ee62
638681ade4b306295815221c5b445ba017943ae59c4c742f0b1442dae4902a56d173a6f859dc6088b6364224ec17c4e2213d9d3c96bd9992b696d7c13b234b50

This can be done in bash like this:

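# Read one prefix from file2 (stdin) and one line from file1 (fd 4) per iteration.
while IFS= read -r p && IFS= read -r l <&4; do
        # Strip the literal prefix $p from the start of $l, if it is there.
        printf '%s\n' "${l#"$p"}"
done < file2 4< file1 > file3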
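On a file larger than 1 GB, a shell read loop can be slow because every line is handled by the shell itself. As a possible alternative, here is a rough sketch that pairs the two files with paste and does the stripping in awk. The function name strip_paired_prefixes is made up for illustration, and the sketch assumes neither file contains tab characters (true for the hex lines shown above).

```shell
# Hypothetical helper: $1 = file of prefixes, $2 = file of data lines.
# Line N of $1 is stripped from the start of line N of $2.
strip_paired_prefixes() {
    # paste joins the two files line by line with a tab between them
    # (assumption: no tabs in the data itself).
    paste "$1" "$2" | awk -F'\t' '{
        p = $1; l = $2
        # Remove the prefix only when it really is at the start of the line.
        if (length(p) > 0 && index(l, p) == 1)
            l = substr(l, length(p) + 1)
        print l
    }'
}
```

It would be called as: strip_paired_prefixes file2 file1 > file3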

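If, on the other hand, you do not have file2 but instead have a set of strings such as 10830110, 1083021, ... and 1086022100 that you want to remove whenever they appear at the beginning of any line of file1, saving the modified lines to a new file named file3, you can do it with sed like this: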

sed -E 's/^(10830110|1083021|10840110|1088022100|10850110|1085022100|1086022100)//' file1 > file3
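If the list of strings is long, typing it into a sed expression gets unwieldy. A minimal sketch of another way, assuming the strings are kept one per line in a separate file: awk loads them first and then strips the first one that matches at the start of each line. The function name strip_any_prefix is hypothetical. The strings are compared literally, not as regular expressions, and they are tried in the order listed, so if one string is a prefix of another, the longer one should come first.

```shell
# Hypothetical helper: $1 = file with one prefix per line, $2 = data file.
strip_any_prefix() {
    awk '
        # First file: remember every non-empty prefix, in order.
        NR == FNR { if (length($0) > 0) prefixes[++n] = $0; next }
        # Second file: strip the first listed prefix found at the line start.
        {
            for (i = 1; i <= n; i++) {
                p = prefixes[i]
                if (index($0, p) == 1) {
                    $0 = substr($0, length(p) + 1)
                    break
                }
            }
            print
        }
    ' "$1" "$2"
}
```

With the strings saved in a file called, say, prefixes.txt, it would be run as: strip_any_prefix prefixes.txt file1 > file3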
