How can I convert this timestamp format to another one in Perl?

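I am trying to design a Perl/... method that converts my timestamp format (ddMMyyyy-HHmm+0300) to the timestamp/time/... format used by the WEKA data analysis system (yyyy-MM-dd'T'HH:mm:00). I originally built the WEKA data file with the paste command and AWK. There should be no restrictions that make the problem harder than it is, except possibly the quotes in the first variable. I think approach (3) is the most feasible, i.e. using the POSIX::strftime function directly (Deathgrip).

  1. The hard approach in Section 1
  2. The easier approach in Section 2, with no quotes in the data
  3. The POSIX::strftime approach, and the similar thread "Perl strptime format differs from strftime"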

Example input

23072017-2200+0300

Expected output

2017-07-23'T'22:00:00


Full example of CSV lines, with unquoted timestamps and underscores in the header names, which may make this harder

 Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
 "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
 "Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

Expected output

 Ni, Aika, Aika_l, Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

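1. Attempt at a script that can be called as script.pl filename

I think using the Text::CSV parser is overkill here, because my dataset is simpler than its typical use case. So I think a simple regex approach should be possible.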

#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964

## Data prepared like this for the script
# paste -d" " log.csv data.csv | awk '{$1=""; print $0}' > weka.data.csv
# cp $HOME/Data/weka.data.csv $HOME/Workspace/
#
# Maybe, this all could be integrated into Perl script

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );

while ( my $row = $csv->getline( \*ARGV ) ) {
    s/\n/ /g for @$row;
    $csv->print( \*STDOUT, $row );

    # TODO regex
    #convert ddMMyyyy-HHmm+0300 to yyyy-MM-dd'T'HH:mm:00    
}

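2. Perl regex approach

Pseudocode of the approach. It does not work, because nothing is substituted from the captured values (e.g. carrying the dd result over).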

# TODO s/ddMMyyyy-HHmm+0300/$3-$2-$1'T'$4:$5:00/;
perl -pe s/([0-3][0-9])(([0-1][0-9]))(20[0-9]{2})([0-2][0-9])([0-5][0-9])+0300/$3-$2-$1'T'$4:$5:00/;

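where

  • dd is matched by ([0-3][0-9]) / $3
  • MM similarly by ([0-1][0-9]) / $2
  • yyyy similarly by (20[0-9]{2}) / $1
  • - literally
  • HH (24-hour time) by ([0-2][0-9]) / $4
  • mm by ([0-5][0-9]) / $5
  • +0300 / simply dropped

It would be nice if the regex had a more readable format.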
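
One way to get a more readable pattern is Perl's /x modifier together with named captures. The following is only a sketch under a few assumptions: the helper name convert_timestamps and the capture names are made up for illustration, the year is assumed to be 20xx as in the pseudocode above, and the offset is matched as any signed four-digit value rather than the literal +0300.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): rewrite every
# ddMMyyyy-HHmm+ZZZZ timestamp in a string as yyyy-MM-dd'T'HH:mm:00.
sub convert_timestamps {
    my ($text) = @_;
    $text =~ s{
        \b
        (?<day>    [0-3][0-9] )    # dd
        (?<month>  [01][0-9]  )    # MM
        (?<year>   20[0-9]{2} )    # yyyy, assumed to be 20xx
        -
        (?<hour>   [0-2][0-9] )    # HH on a 24-hour clock
        (?<minute> [0-5][0-9] )    # mm
        [+-][0-9]{4}               # UTC offset, matched and then dropped
        \b
    }{$+{year}-$+{month}-$+{day}'T'$+{hour}:$+{minute}:00}gx;
    return $text;
}

print convert_timestamps('"Masi", 23072010-2200+0300, 24072010-0600+0300, 70'), "\n";
```

With /x, whitespace and # comments inside the pattern are ignored, so each field can sit on its own line. The named captures in %+ remove the need to count parentheses when writing the replacement.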

Testing Sundeep's proposal from the comments

Code

#!/bin/bash
# https://stackoverflow.com/a/33995620/54964

s='"Masi", 23072010-2200+0300, 24072010-0600+0300, 70, 7h40'

echo "$s" | perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g'

The output for one line is as expected

"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40

Replacing the content of the variable s with a whole line works as well; the output is as expected

"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010

TODO: a complete approach that handles multiple lines and can skip the header

Testing Deathgrip's proposal

Code

#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964

use strict;
use warnings;
# https://stackoverflow.com/a/20007784/54964
# http://perldoc.perl.org/POSIX.html
use Time::Piece;
use POSIX;

# TODO breaks because of false brackets
#my $input = '"Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00, 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010'

my $str = '23072017-2200+0300';
my $f = '%d%m%Y-%H%M+0300';
#my $t = POSIX::strftime($str, $f); # fails!
my $t = strftime($str, $f); # fails!

print "$t\n";

Output

Usage: POSIX::strftime(fmt, sec, min, hour, mday, mon, year, wday = -1, yday = -1, isdst = -1) at prepare.data3.pl line 22.

OS: Debian 9

Answer 1

$ perl -pe 's/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g' ip.csv
 Ni, Aika, Aika_l Un, Unen, Unen_kesto, Uniluokat_R, Uniluokat_k, Uniluokat_s, HRV_RMSSD_a, HRV_RMSSD_i, Kokonaisp, Palautumisen_k, Hermoston_t, Syke_ave_m, Syke_a, Syke_l, Hengitystiheys_ave_m, Hengitystiheys_a, Hengitystiheys_min_a, Liikeaktiivisuus_l, Liikeaktiivisuus_a, Paivamaara_l
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
 "Masi", 2010-07-23'T'22:00:00, 2010-07-24'T'06:00:00 70, 7h40, 6h30, 1h40, 3h40, 1h10, 67.0, 43.0, 24.0, 430, 30, 70, 50, 40, 20, 10, 10, 150, 260, 24.10.2010
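  • \b is a word boundary
  • (\d\d) captures two consecutive digits, (\d{4}) captures four of them, and so on
  • \x27 stands for the single quote. If unrelated digits might follow it, it may be better to use the octal representation \047
  • Because the search and replace targets only the specific ddMMyyyy-HHmm+0300 format, the header is not affected. If needed, though, just add if $.>1 after the substitute command

Perhaps the paste+awk commands used to create the input could easily be merged into this command, but that information would need to be added to the question.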
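
For a script rather than a one-liner, the header check mentioned above could look roughly like this. It is a sketch only: the helper name convert_line is hypothetical, and the two sample rows are a shortened, in-memory stand-in for the real file, which a real script would read with <> instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: convert the timestamps on one line, leaving
# line 1 (the header row) untouched.
sub convert_line {
    my ( $line, $line_number ) = @_;
    return $line if $line_number == 1;
    $line =~ s/\b(\d\d)(\d\d)(\d{4})-(\d\d)(\d\d)\+\d{4}\b/$3-$2-$1\x27T\x27$4:$5:00/g;
    return $line;
}

# Shortened sample data held in memory (an assumption for this sketch).
my $sample = <<'END';
Ni, Aika, Aika_l, Un
"Masi", 23072010-2200+0300, 24072010-0600+0300, 70
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    print convert_line( $line, $. );    # $. holds the current input line number
}
close $fh;
```

Passing the line number in as an argument keeps the substitution testable on its own, while the loop still relies on $. exactly as the one-liner does.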

Answer 2

Here is what I would do:

#!/usr/bin/env perl
# https://stackoverflow.com/a/33995620/54964

use strict;
use warnings;
# https://stackoverflow.com/a/20007784/54964
# http://perldoc.perl.org/POSIX.html
use POSIX qw(strftime);
use DateTime;
use DateTime::Format::Strptime qw(strptime);

my $str = '23072017-2200+0300';
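# Parse the input string into a DateTime object
my $dtime = strptime( '%d%m%Y-%H%M%z', $str );
my $f = '%Y-%m-%d\'T\'%H:%M:%S';
# POSIX::strftime takes broken-down time fields; the last argument is the isdst flag (-1 = unknown), not a time zone object
my $t = strftime( $f, 0, $dtime->minute, $dtime->hour, $dtime->day, $dtime->month-1, $dtime->year-1900, -1, -1, -1 );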

print "$t\n";

The output for the time field is as expected

2017-07-23'T'22:00:00
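
If installing DateTime is not an option, a similar result may be possible with Time::Piece, which ships with Perl. The following is a minimal sketch, not part of the answer above: the helper name reformat_stamp is hypothetical, and the UTC offset is stripped rather than interpreted, on the assumption that the wall-clock time should be kept as in the expected output.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: reformat one ddMMyyyy-HHmm+ZZZZ stamp, keeping the
# wall-clock time and discarding the UTC offset.
sub reformat_stamp {
    my ($stamp) = @_;
    ( my $local = $stamp ) =~ s/[+-]\d{4}\z//;    # drop the trailing offset
    my $t = Time::Piece->strptime( $local, '%d%m%Y-%H%M' );    # parse the string
    return $t->strftime(q{%Y-%m-%d'T'%H:%M:%S});               # format the result
}

print reformat_stamp('23072017-2200+0300'), "\n";
```

The split between the two calls mirrors the point of the failed attempt in the question: strptime parses a string into a time object, and strftime only formats an existing one.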
