Extract line ranges from a file and save each range to a separate file

I have a file that looks like this:

740*02/01/2016*00:00*
EJ LOG COPIED OK

AUTO INIT COPY DRIVE NOT CONFIGURED

E1EF3901
[020t*741*02/01/2016*05:45*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************5845
DATE 01-02-16    TIME 05:45:52
 05:46:26 GENAC 1 : ARQC
EXTERNAL AUTHENTICATE: NO ARPC
 05:46:30 GENAC 2 : AAC
 01 FEB 2016     05:47:41      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
INVALID TRANSCATION ON TERMINAL.
-----------------------
[020t 05:47:05 CARD TAKEN
[020t 05:47:07 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*742*02/01/2016*05:47*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************2584
DATE 01-02-16    TIME 05:47:27
 05:48:00 GENAC 1 : ARQC
 05:48:05 GENAC 2 : TC
[020t 05:48:16 CARD TAKEN
[020t 05:48:22 NOTES PRESENTED 0,0,2,0
 01 FEB 2016     05:48:52      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
AMT   GHC40.00
[020t 05:48:31 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*743*02/01/2016*05:57*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************3862
DATE 01-02-16    TIME 05:57:28
 01 FEB 2016     05:58:33      10160021

     INQUIRY
     FROM XXXXXXXX90018
AVAIL          GHC1260.20  
LEDGER         GHC1260.20  
[020t 05:58:06 CARD TAKEN
[020t 05:58:11 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*744*02/01/2016*06:43*
     *TRANSACTION STARTED*
[020t CARD INSERTED
[020tCARD: *************1972
DATE 01-02-16    TIME 06:43:53
 01 FEB 2016     06:44:56      10160021
5029110111271972
4490    4490
     INQUIRY
     FROM XXXXXXXX23013
AVAIL          GHC14.28
LEDGER         GHC14.28
[020t 06:44:25 CARD TAKEN
[020t 06:44:29 TRANSACTION END
[0r(1)2[000p[040qe1w3h162[020t*745*02/01/2016*06:56*

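I need to extract the content between *TRANSACTION STARTED* and TRANSACTION END, ignoring everything else, and create a new file for each range.
Each new file should contain only: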

    [020t CARD INSERTED
    [020tCARD: *************2584
    DATE 01-02-16    TIME 05:47:27
     05:48:00 GENAC 1 : ARQC
     05:48:05 GENAC 2 : TC
    [020t 05:48:16 CARD TAKEN
    [020t 05:48:22 NOTES PRESENTED 0,0,2,0
     01 FEB 2016     05:48:52      10160021

         WITHDRAW
         FROM XXXXXXXX    ?
    AMT   GHC40.00
    [020t 05:48:31

This is what I have so far:

#! /usr/bin/perl/ -w

print "Content-type: text/html\n\n";

use strict;


my $somefile = "/home/lord-ivan/Soures_Code/Perl/projects/Data/EJDATA.LOG";

if(open (my $fh, '<:encoding(UTF-8)', $somefile))
{
    print " $somefile is opened   $!";
}else
{
    die "Could not open file '$somefile' $!";
}

while (<$fh>) {

    if (/TRANSACTION STARTED/ .. /TRANSACTION END/) 
{
     next if /TRANSACTION\s*(STARTED|END)/;
    print $_;   
}
}

close ($somefile);



my $outputfile = "/home/lord-ivan/Soures_Code/Perl/projects/EJ Transport/Queue/";

if(open (my $ofh, '>>:encoding(UTF-8)',print $ofh $outputfile))
{
    print " $outputfile worked   $!";
}else
{
    die "Could not write to  $outputfile  $!";
}


close ($outputfile);

Answer 1

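Here is a quick script that opens the output files and writes the contents. It uses the return value of the flip-flop (range) operator to determine whether the current line is the first line of a range (the value is 1) or the last line (the value ends in "E0"). Replace /start/ and /stop/ with your own patterns.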

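use strict;
use warnings;

my $file = "a001";   # first output file name; ++ steps it to a002, a003, ...
my $fh;

while (<>) {
    # In scalar context the range operator returns a sequence number
    # (1, 2, ...) inside a range, with "E0" appended on the last line.
    my $l = /start/ .. /stop/;
    if ($l && $l == 1) {
        # First line of a range: open the next output file.
        open $fh, ">", $file++ or die "Cannot open file: $!";
    } elsif ($l && $l !~ /E0$/) {
        # Any line strictly between the start and stop lines.
        print $fh $_;
    }
}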

Answer 2

With awk, where in.log is the input file containing the log:

cat in.log | awk '/TRANSACTION STARTED/{getline;filenum++;print " ">filenum".out";f=1;}; /TRANSACTION END/{gsub(/TRANSACTION END/,"");print $0>>filenum".out";f=0} ; {if(f==1){print $0>>filenum".out";};}'

This creates one file per range, starting with 1.out, for example:

[020t CARD INSERTED
[020tCARD: *************5845
DATE 01-02-16    TIME 05:45:52
 05:46:26 GENAC 1 : ARQC
EXTERNAL AUTHENTICATE: NO ARPC
 05:46:30 GENAC 2 : AAC
 01 FEB 2016     05:47:41      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
INVALID TRANSCATION ON TERMINAL.
-----------------------
[020t 05:47:05 CARD TAKEN
[020t 05:47:07 

Answer 3

Rather than using the flip-flop operator as you do, I would do it like this:

#!/usr/bin/perl

use warnings;
use strict;
#set record separator
local $/ = 'TRANSACTION END'; 
#output file starts number 0. 
my $output_file_count = 0; 
#iterate filehandle - <> is the magic FH, so reads STDIN or files 
#specified as args to the script. 
while ( <> ) { 
    #discard anything before 'TRANSACTION STARTED'
    s/.*\*TRANSACTION STARTED\*\s*\n//ms;
    #skip unless there's an 'END' here (so trailing junk gets discarded)
    next unless m/TRANSACTION END/; 
    #open a new output file. 
    open ( my $output, '>', "transaction_".$output_file_count++.".log" ) or die $!;
    #set it as the location to print by default
    select $output; 
    #print this record (to $output, because of select)
    print; 
    #close it
    close ( $output );
}

Answer 4

This one-liner should do the trick:

$ perl -ne 'BEGIN{$fname=0};if ((/TRANSACTION STARTED/ .. /TRANSACTION END/) && $_ !~ /TRANSACTION\s*(STARTED|END)/){open FILE, ">>${fname}.txt";print FILE $_;}else{close(FILE);$fname++}' file

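The file names are just numbers with a "txt" suffix. The numbers are not consecutive, because the counter is incremented on every line that is not written out, including the marker lines themselves. My output looks like this: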

$ head -50 [0-9]*.txt
==> 11.txt <==
[020t CARD INSERTED
[020tCARD: *************2584
DATE 01-02-16    TIME 05:47:27
 05:48:00 GENAC 1 : ARQC
 05:48:05 GENAC 2 : TC
[020t 05:48:16 CARD TAKEN
[020t 05:48:22 NOTES PRESENTED 0,0,2,0
 01 FEB 2016     05:48:52      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
AMT   GHC40.00

==> 14.txt <==
[020t CARD INSERTED
[020tCARD: *************3862
DATE 01-02-16    TIME 05:57:28
 01 FEB 2016     05:58:33      10160021

     INQUIRY
     FROM XXXXXXXX90018
AVAIL          GHC1260.20  
LEDGER         GHC1260.20  
[020t 05:58:06 CARD TAKEN

==> 17.txt <==
[020t CARD INSERTED
[020tCARD: *************1972
DATE 01-02-16    TIME 06:43:53
 01 FEB 2016     06:44:56      10160021
5029110111271972
4490    4490
     INQUIRY
     FROM XXXXXXXX23013
AVAIL          GHC14.28
LEDGER         GHC14.28
[020t 06:44:25 CARD TAKEN

==> 8.txt <==
[020t CARD INSERTED
[020tCARD: *************5845
DATE 01-02-16    TIME 05:45:52
 05:46:26 GENAC 1 : ARQC
EXTERNAL AUTHENTICATE: NO ARPC
 05:46:30 GENAC 2 : AAC
 01 FEB 2016     05:47:41      10160021

     WITHDRAW
     FROM XXXXXXXX    ?
INVALID TRANSCATION ON TERMINAL.
-----------------------
[020t 05:47:05 CARD TAKEN
