Split a file in UNIX into multiple files based on a matching pattern

My file's contents are as follows:

#2211000000031#####{1:F01BKXXXX0AXXX0000000000}{2:I103BOTKJPJTXXXXN}{3:{121:faffba68-3ebe-4653-93fe-8b082ff226a5}}
{4:@@:20:EDCAK0010245@@:23B:CRED@@:32A:220303JPY10000,@@:33B:JPY10000,@@:50K:ABC@@WLG@@:52A:BKNZNZ20XXX@@:59:SUPER SERVICES LTD@@PO BOX 9999@@XX@@NEW YORK@@:70:/RFB/AUTOTEST-020356@@:71A:SHA@@-}   
#2211000002311#####< Saa:Body>< AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02">< Fr>< FIId>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ FIId></ Fr>< To>< FIId>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ FIId></ To>< BizMsgIdr>2_1 Generic pacs 008</ BizMsgIdr>< MsgDefIdr>pacs.008.001.08</ MsgDefIdr>< BizSvc>swift.cbprplus.02</ BizSvc>< CreDt>2022-03-01T21:40:01+13:00</ CreDt></ AppHdr>< Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08">< FIToFICstmrCdtTrf>< GrpHdr>< MsgId>Generic Pacs 008</ MsgId>< CreDtTm>2021-12-09T07:08:54+12:00</ CreDtTm>< NbOfTxs>1</ NbOfTxs>< SttlmInf>< SttlmMtd>INDA</ SttlmMtd></ SttlmInf></ GrpHdr>< CdtTrfTxInf>< PmtId>< InstrId>Generic Pacs 008</ InstrId>< EndToEndId>Generic Pacs 008</ EndToEndId>< UETR>a19e9375-3e20-41ed-b75c-bb40d5afe540</ UETR></ PmtId>< IntrBkSttlmAmt Ccy="NZD">65.00</ IntrBkSttlmAmt>< IntrBkSttlmDt>2022-04-20</ IntrBkSttlmDt>< InstdAmt Ccy="NZD">1.00</ InstdAmt>< ChrgBr>SHAR</ ChrgBr>< PrvsInstgAgt1>< FinInstnId>< BICFI>NATAUS33</ BICFI></ FinInstnId></ PrvsInstgAgt1>< InstgAgt>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ InstgAgt>< InstdAgt>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ InstdAgt>< Dbtr>< Nm>REMITTING PERSON </ Nm>< PstlAdr>< StrtNm>A STREET NAME</ StrtNm>< BldgNb>999</ BldgNb>< BldgNm>THE BIG BUILDING</ BldgNm>< Flr>1</ Flr>< PstCd>1234</ PstCd>< TwnNm>A TOWN</ TwnNm>< TwnLctnNm>A COUNTY</ TwnLctnNm>< DstrctNm>WESTERN DISTRICT</ DstrctNm>< CtrySubDvsn>A STATE IN THE USA</ CtrySubDvsn>< Ctry>US</ Ctry></ PstlAdr></ Dbtr>< DbtrAgt>< FinInstnId/></ DbtrAgt>< CdtrAgt>< FinInstnId/ ></ CdtrAgt>< Cdtr>< Nm>A BENEFIARY PERSON</ Nm>< PstlAdr>< StrtNm>A BENEFICIARY ADDRESS</ StrtNm>< BldgNb>77</ BldgNb>< BldgNm>THE BUILDING WITH NO NAME</ BldgNm>< Flr>50</ Flr>< Room>4566</ Room>< PstCd>4556</ PstCd>< TwnNm>A BENEFICIARY TOWN</ TwnNm>< TwnLctnNm>A BENEFICIARY SUBURB</ TwnLctnNm>< DstrctNm>A DISTRICT</ DstrctNm>< CtrySubDvsn>A 
PROVINCE </ CtrySubDvsn>< Ctry>Cnty</ Ctry></ PstlAdr></ Cdtr>< CdtrAcct>< Id>< Othr>< Id>0209750998907040</ Id></ Othr></ Id></ CdtrAcct>< RmtInf>< Ustrd>REMITTANCE INFORMATION</ Ustrd></ RmtInf></ CdtTrfTxInf></ FIToFICstmrCdtTrf></ Document></ Saa:Body></ Saa:DataPDU> 
#2223700000031#####<AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02"><Fr><FIId><FinInstnId><BICFI>BKNZ22985</BICFI></FinInstnId></FIId></Fr><To><FIId><FinInstnId><BICFI>ASBBNZ2AXXX</BICFI></FinInstnId></FIId></To><BizMsgIdr>AVP0000676232</BizMsgIdr><MsgDefIdr>pacs.004.001.10</MsgDefIdr><BizSvc>pnz.hvcs.01</BizSvc><CreDt>2022-08-25T09:36:45+12:00</CreDt></AppHdr><Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.004.001.10"><PmtRtr><GrpHdr><MsgId>BNZAVP0000676232</MsgId><CreDtTm>2022-08-25T09:36:45+12:00</CreDtTm><NbOfTxs>1</NbOfTxs><SttlmInf><SttlmMtd>CLRG</SttlmMtd><ClrSys><Cd>AVP</Cd></ClrSys></SttlmInf></GrpHdr><TxInf><RtrId>BNZAVP0000676232</RtrId><OrgnlGrpInf><OrgnlMsgId>ESAS.03808250935</OrgnlMsgId><OrgnlMsgNmId>pacs.008.001.09</OrgnlMsgNmId><OrgnlCreDtTm>2022-08-25T09:35:43+12:00</OrgnlCreDtTm></OrgnlGrpInf><OrgnlInstrId>ESAS.03808250935</OrgnlInstrId><OrgnlEndToEndId>E2ET.136</OrgnlEndToEndId><OrgnlUETR>875ddac6-d7f2-430c-86c0-a0f63cfdd387</OrgnlUETR><OrgnlIntrBkSttlmAmt Ccy="NZD">38.00</OrgnlIntrBkSttlmAmt><OrgnlIntrBkSttlmDt>2022-08-25</OrgnlIntrBkSttlmDt><RtrdIntrBkSttlmAmt Ccy="NZD">38</RtrdIntrBkSttlmAmt><IntrBkSttlmDt>2022-08-25</IntrBkSttlmDt><RtrdInstdAmt Ccy="XX">38.00</RtrdInstdAmt><ChrgBr>DEBT</ChrgBr><InstgAgt><FinInstnId><BICFI>BKNZNZ22985</BICFI><ClrSysMmbId><MmbId>BKNZNZ22985</MmbId></ClrSysMmbId></FinInstnId></InstgAgt><InstdAgt><FinInstnId><BICFI>2AXXX</BICFI></FinInstnId></InstdAgt><RtrChain><Dbtr><Pty><Nm>Test Customer</Nm><PstlAdr><TwnNm>XXXXXXX</TwnNm><Ctry>XX</Ctry></PstlAdr></Pty></Dbtr><DbtrAcct><Id><Othr><Id>0205730000000000</Id><SchmeNm><Cd>BBAN</Cd></SchmeNm></Othr></Id><Nm>Test Customer</Nm></DbtrAcct><Cdtr><Pty><Nm>Johnny Bravo</Nm><PstlAdr><AdrLine>12 Jellicoe Street</AdrLine><AdrLine>XXXXXX</AdrLine><AdrLine>NZ</AdrLine></PstlAdr></Pty></Cdtr><CdtrAcct><Id><Othr><Id>123166075056900</Id><SchmeNm><Cd>BBAN</Cd></SchmeNm></Othr></Id><Nm>PNZ 
Default</Nm></CdtrAcct></RtrChain><RtrRsnInf><Rsn><Cd>AC01</Cd></Rsn></RtrRsnInf></TxInf></PmtRtr></Document>

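Background: I have 3 input records, and each of the 3 should go into its own output file. The differences between the 3 records are:

First record: contains curly braces ({}).
Second record: begins with (< Saa:Body>< AppHdr) as its opening XML tags.
Third record: begins with only (< AppHdr) as its opening XML tag.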

The output files should contain data like this:

First file:

#2211000000031#####{1:F01BKXXXX0AXXX0000000000}{2:I103BOTKJPJTXXXXN}{3:{121:faffba68-3ebe-4653-93fe-8b082ff226a5}}{4:@@:20:EDCAK0010245@@:23B:CRED@@:32A:220303JPY10000,@@:33B:JPY10000,@@:50K:ABC@@WLG@@:52A:BKNZNZ20XXX@@:59:SUPER SERVICES LTD@@PO BOX 9999@@XX@@NEW YORK@@:70:/RFB/AUTOTEST-020356@@:71A:SHA@@-}  

Second file:

#2211000002311#####< Saa:Body>< AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02">< Fr>< FIId>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ FIId></ Fr>< To>< FIId>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ FIId></ To>< BizMsgIdr>2_1 Generic pacs 008</ BizMsgIdr>< MsgDefIdr>pacs.008.001.08</ MsgDefIdr>< BizSvc>swift.cbprplus.02</ BizSvc>< CreDt>2022-03-01T21:40:01+13:00</ CreDt></ AppHdr>< Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08">< FIToFICstmrCdtTrf>< GrpHdr>< MsgId>Generic Pacs 008</ MsgId>< CreDtTm>2021-12-09T07:08:54+12:00</ CreDtTm>< NbOfTxs>1</ NbOfTxs>< SttlmInf>< SttlmMtd>INDA</ SttlmMtd></ SttlmInf></ GrpHdr>< CdtTrfTxInf>< PmtId>< InstrId>Generic Pacs 008</ InstrId>< EndToEndId>Generic Pacs 008</ EndToEndId>< UETR>a19e9375-3e20-41ed-b75c-bb40d5afe540</ UETR></ PmtId>< IntrBkSttlmAmt Ccy="NZD">65.00</ IntrBkSttlmAmt>< IntrBkSttlmDt>2022-04-20</ IntrBkSttlmDt>< InstdAmt Ccy="NZD">1.00</ InstdAmt>< ChrgBr>SHAR</ ChrgBr>< PrvsInstgAgt1>< FinInstnId>< BICFI>NATAUS33</ BICFI></ FinInstnId></ PrvsInstgAgt1>< InstgAgt>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ InstgAgt>< InstdAgt>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ InstdAgt>< Dbtr>< Nm>REMITTING PERSON </ Nm>< PstlAdr>< StrtNm>A STREET NAME</ StrtNm>< BldgNb>999</ BldgNb>< BldgNm>THE BIG BUILDING</ BldgNm>< Flr>1</ Flr>< PstCd>1234</ PstCd>< TwnNm>A TOWN</ TwnNm>< TwnLctnNm>A COUNTY</ TwnLctnNm>< DstrctNm>WESTERN DISTRICT</ DstrctNm>< CtrySubDvsn>A STATE IN THE USA</ CtrySubDvsn>< Ctry>US</ Ctry></ PstlAdr></ Dbtr>< DbtrAgt>< FinInstnId/></ DbtrAgt>< CdtrAgt>< FinInstnId/ ></ CdtrAgt>< Cdtr>< Nm>A BENEFIARY PERSON</ Nm>< PstlAdr>< StrtNm>A BENEFICIARY ADDRESS</ StrtNm>< BldgNb>77</ BldgNb>< BldgNm>THE BUILDING WITH NO NAME</ BldgNm>< Flr>50</ Flr>< Room>4566</ Room>< PstCd>4556</ PstCd>< TwnNm>A BENEFICIARY TOWN</ TwnNm>< TwnLctnNm>A BENEFICIARY SUBURB</ TwnLctnNm>< DstrctNm>A DISTRICT</ DstrctNm>< CtrySubDvsn>A 
PROVINCE </ CtrySubDvsn>< Ctry>Cnty</ Ctry></ PstlAdr></ Cdtr>< CdtrAcct>< Id>< Othr>< Id>0209750998907040</ Id></ Othr></ Id></ CdtrAcct>< RmtInf>< Ustrd>REMITTANCE INFORMATION</ Ustrd></ RmtInf></ CdtTrfTxInf></ FIToFICstmrCdtTrf></ Document></ Saa:Body></ Saa:DataPDU>

Third file:

#2223700000031#####< AppHdr  xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02">< Fr>< FIId>< FinInstnId>< BICFI>BKNZ22985</ BICFI></ FinInstnId></ FIId></ Fr>< To>< FIId>< FinInstnId>< BICFI>ASBBNZ2AXXX</ BICFI></ FinInstnId></ FIId></ To>< BizMsgIdr>AVP0000676232</ BizMsgIdr>< MsgDefIdr>pacs.004.001.10</ MsgDefIdr>< BizSvc>pnz.hvcs.01</ BizSvc>< CreDt>2022-08-25T09:36:45+12:00</ CreDt></ AppHdr>< Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.004.001.10">< PmtRtr>< GrpHdr>< MsgId>BNZAVP0000676232</ MsgId></ Document>

My knowledge of UNIX shell scripting is very limited. The number of lines will vary.

Answer 1

Using any awk:

$ awk '{print > ((/{/ ? "x" : "y") ".txt")}' file

$ head *.txt
==> x.txt <==
#2211000000031#####{1:F01BKXXXX0AXXX0000000000}{2:I103BOTKJPJTXXXXN}{3:{121:faffba68-3ebe-4653-93fe-8b082ff226a5}}
{4:@@:20:EDCAK0010245@@:23B:CRED@@:32A:220303JPY10000,@@:33B:JPY10000,@@:50K:ABC@@WLG@@:52A:BKNZNZ20XXX@@:59:SUPER SERVICES LTD@@PO BOX 9999@@XX@@NEW YORK@@:70:/RFB/AUTOTEST-020356@@:71A:SHA@@-}
#2211000000038#####{1:F01XXXX20AXXX0000000000}{2:I103BOTKJPJTXXXXN}{3:{121:50c659ec-6fb2-44a7-8312-26a270330aed}}{4:@@:20:ELCAK0020721@@:23B:CRED@@:32A:220303JPY1000,@@:33B:JPY1000,@@:50K:TESTAPP@@:52A:BKNZNZ20XXX@@:59:XYZ SERVICES LTD@@PO BOX 16130@@MARS@@CAL@@:70:/RFB/AUTOTEST-021013@@:71A:SHA@@-}

==> y.txt <==
#2211000002311#####< Saa:Body>< AppHdr xmlns="urn:iso:std:iso:20022:tech:xsd:head.001.001.02">< Fr>< FIId>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ FIId></ Fr>< To>< FIId>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ FIId></ To>< BizMsgIdr>2_1 Generic pacs 008</ BizMsgIdr>< MsgDefIdr>pacs.008.001.08</ MsgDefIdr>< BizSvc>swift.cbprplus.02</ BizSvc>< CreDt>2022-03-01T21:40:01+13:00</ CreDt></ AppHdr>< Document xmlns="urn:iso:std:iso:20022:tech:xsd:pacs.008.001.08">< FIToFICstmrCdtTrf>< GrpHdr>< MsgId>Generic Pacs 008</ MsgId>< CreDtTm>2021-12-09T07:08:54+12:00</ CreDtTm>< NbOfTxs>1</ NbOfTxs>< SttlmInf>< SttlmMtd>INDA</ SttlmMtd></ SttlmInf></ GrpHdr>< CdtTrfTxInf>< PmtId>< InstrId>Generic Pacs 008</ InstrId>< EndToEndId>Generic Pacs 008</ EndToEndId>< UETR>a19e9375-3e20-41ed-b75c-bb40d5afe540</ UETR></ PmtId>< IntrBkSttlmAmt Ccy="NZD">65.00</ IntrBkSttlmAmt>< IntrBkSttlmDt>2022-04-20</ IntrBkSttlmDt>< InstdAmt Ccy="NZD">1.00</ InstdAmt>< ChrgBr>SHAR</ ChrgBr>< PrvsInstgAgt1>< FinInstnId>< BICFI>NATAUS33</ BICFI></ FinInstnId></ PrvsInstgAgt1>< InstgAgt>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ InstgAgt>< InstdAgt>< FinInstnId>< BICFI>BKNZNZ22985</ BICFI></ FinInstnId></ InstdAgt>< Dbtr>< Nm>REMITTING PERSON </ Nm>< PstlAdr>< StrtNm>A STREET NAME</ StrtNm>< BldgNb>999</ BldgNb>< BldgNm>THE BIG BUILDING</ BldgNm>< Flr>1</ Flr>< PstCd>1234</ PstCd>< TwnNm>A TOWN</ TwnNm>< TwnLctnNm>A COUNTY</ TwnLctnNm>< DstrctNm>WESTERN DISTRICT</ DstrctNm>< CtrySubDvsn>A STATE IN THE USA</ CtrySubDvsn>< Ctry>US</ Ctry></ PstlAdr></ Dbtr>< DbtrAgt>< FinInstnId/></ DbtrAgt>< CdtrAgt>< FinInstnId/ ></ CdtrAgt>< Cdtr>< Nm>A BENEFIARY PERSON</ Nm>< PstlAdr>< StrtNm>A BENEFICIARY ADDRESS</ StrtNm>< BldgNb>77</ BldgNb>< BldgNm>THE BUILDING WITH NO NAME</ BldgNm>< Flr>50</ Flr>< Room>4566</ Room>< PstCd>4556</ PstCd>< TwnNm>A BENEFICIARY TOWN</ TwnNm>< TwnLctnNm>A BENEFICIARY SUBURB</ TwnLctnNm>< DstrctNm>A DISTRICT</ DstrctNm>< CtrySubDvsn>A 
PROVINCE </ CtrySubDvsn>< Ctry>Cnty</ Ctry></ PstlAdr></ Cdtr>< CdtrAcct>< Id>< Othr>< Id>0209750998907040</ Id></ Othr></ Id></ CdtrAcct>< RmtInf>< Ustrd>REMITTANCE INFORMATION</ Ustrd></ RmtInf></ CdtTrfTxInf></ FIToFICstmrCdtTrf></ Document></ Saa:Body></ Saa:DataPDU>

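If your real input can contain `{`s in the `<` blocks, or vice versa, then edit the example in your question to include such cases, along with any other non-sunny-day cases, so that we have something to test against. We can then adjust the regexps to suit.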
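The one-liner above sorts lines into two files, while the question asks for three. A minimal sketch of a three-way split follows, under some assumptions that are not confirmed by the question: every record starts with `#`, digits, and `#####`; any line without that prefix is a continuation of the previous record; and the space after `<` is optional. The file names (`sample.txt`, `braces.txt`, `saa_body.txt`, `app_hdr.txt`, `other.txt`) and the shortened sample records are made up for illustration.

```shell
# Small stand-in for the real input; the records are shortened, hypothetical samples.
cat > sample.txt <<'EOF'
#2211000000031#####{1:F01BKXXXX0AXXX0000000000}{2:I103BOTKJPJTXXXXN}
{4:@@:20:EDCAK0010245@@-}
#2211000002311#####< Saa:Body>< AppHdr xmlns="urn:x">< Fr>A
PROVINCE</ Fr></ AppHdr></ Saa:Body>
#2223700000031#####<AppHdr xmlns="urn:x"><Nm>PNZ
Default</Nm></AppHdr>
EOF

awk '
    # A new record starts with "#", digits, then "#####"; classify it by what follows.
    /^#[0-9]+#####/ {
        if      ($0 ~ /^#[0-9]+#####[{]/)           out = "braces.txt"
        else if ($0 ~ /^#[0-9]+#####< ?Saa:Body>/)  out = "saa_body.txt"
        else if ($0 ~ /^#[0-9]+#####< ?AppHdr/)     out = "app_hdr.txt"
        else                                        out = "other.txt"
    }
    # Continuation lines keep the destination chosen for the record they belong to.
    out != "" { print > out }
' sample.txt
```

Because the destination is remembered between lines, a continuation line such as `PROVINCE</ Fr>...` lands in the same file as the record it completes, even though it contains neither a brace nor the opening tag. Records that match none of the three prefixes go to `other.txt` so they are not silently lost.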

Answer 2

This is an inefficient approach, because it reads the file twice, but if this is a one-off job it will get it done.

grep -e '#[0-9]\{13\}#####{.\+}' inputfilename >file1
grep -e '#[0-9]\{13\}#####<.\+>' inputfilename >file2

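No doubt someone will offer an awk solution, which would be better because it can create both files in a single pass over the input file. That is useful if each of the grep commands above takes a long time.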
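As a rough illustration of that single-pass idea, the two grep patterns can be carried over to awk more or less directly. This is only a sketch: `[0-9]+` replaces the 13-digit interval because not every awk supports `{13}` intervals, and the shortened sample records written to `inputfilename` are hypothetical.

```shell
# Shortened, hypothetical records standing in for the real input file.
cat > inputfilename <<'EOF'
#2211000000031#####{1:F01BKXXXX0AXXX0000000000}{2:I103BOTKJPJTXXXXN}
#2211000002311#####< Saa:Body>< AppHdr xmlns="urn:x"></ AppHdr></ Saa:Body>
#2223700000031#####<AppHdr xmlns="urn:x"></AppHdr>
EOF

awk '
    # Same tests as the two grep patterns, with "[0-9]+" in place of the 13-digit interval.
    /#[0-9]+#####[{].+[}]/ { print > "file1" }
    /#[0-9]+#####<.+>/     { print > "file2" }
' inputfilename
```

As with the grep commands, a line that matches neither pattern (for example a continuation line that lacks the `#...#####` prefix) is written to neither file.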
