Why does an extended regular expression work when typed at the command line but not when reading from a file?

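I'm on Windows 10 in developer mode. My regular expression works perfectly when I run it at the command prompt: in a comma-separated string, it replaces mm/dd/yyyy hh:mm with yyyy-mm-dd. When I read the input from a file, it does not work.

Running it on a single line is fine.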

cka_ubuntu@AFSWWM102QEIQ1:/mnt/c/tst$ echo '12639519F0011,P00001,89813.83,10/10/2018,10/10/2018 0:00,10/18/2018 0:00,10/18/2018 0:00,,12,10/10/2018 12:26' | sed -E 's,([0-9]{1}|[0-9]{2})/([0-9]{1}|[0-9]{2})/([0-9]{4}),\3-\2-\1,g;s,\s([0-9]{1}|[0-9]{2}):([0-9]{1}|[0-9]{2}),,g'
12639519F0011,P00001,89813.83,2018-10-10,2018-10-10,2018-18-10,2018-18-10,,12,2018-10-10

The problem: when the file contains multiple lines, it does not work. The command I used (reading from input.csv and writing to test01.csv):

cka_ubuntu@AFSWWM102QEIQ1:/mnt/c/tst$ sed -E 's,([0-9]{1}|[0-9]{2})/([0-9]{1}|[0-9]{2})/([0-9]{4}),\3-\2-\1,g;s,\s([0-9]{1}|[0-9]{2}):([0-9]{1}|[0-9]{2}),,g' input.csv >  test01.csv

Input file:

award_id_piid,modification_number,potential_total_value_of_award,action_date,period_of_performance_start_date,period_of_performance_current_end_date,period_of_performance_potential_end_date,ordering_period_end_date,awarding_agency_code,last_modified_date
68HE0418F0516,P00001,48876.44,10/10/2018,10/10/2018 0:00,12/1/2019 0:00,12/1/2019 0:00,,68,10/10/2018 8:13
12639519F0011,P00001,89813.83,10/10/2018,10/10/2018 0:00,10/18/2018 0:00,10/18/2018 0:00,,12,10/10/2018 12:26
GS35F497CA,PM0011,475000,10/10/2018,10/10/2018 6:03,,,9/16/2020,47,10/10/2018 6:39
15B41918PTP440004,P00004,617912.96,10/10/2018,10/10/2018 0:00,10/10/2018 0:00,10/10/2018 0:00,,15,10/10/2018 12:36
15B31019PUA130001,0,23925,10/10/2018,10/1/2018 0:00,10/10/2018 0:00,10/10/2018 0:00,,15,10/10/2018 14:03

Am I doing something wrong?

Answer 1

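The problem was that I was checking the CSV file by dragging and dropping it into Excel. Excel must have been reformatting the dates according to its default configuration. The sed substitution works correctly: when I view the file at the command prompt, the data is displayed as expected.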
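Since the apparent failure came from Excel reformatting the dates on import, one way to check the result without a spreadsheet is to inspect the raw file in the shell. A minimal sketch, assuming a POSIX shell with grep -E available; the temporary file and its sample rows are made up here to stand in for test01.csv:

```shell
# Hypothetical sample standing in for test01.csv (not the real data)
tmp=$(mktemp)
printf '%s\n' 'id,action_date' 'A1,2018-10-10' 'A2,2018-18-10' > "$tmp"

# Count lines that still contain a slash-style date such as 10/18/2018.
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -cE '[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}' "$tmp" || true)
echo "lines with unconverted dates: $remaining"

# Show the first rows exactly as stored on disk
head -n 3 "$tmp"

rm -f "$tmp"
```

A count of 0 suggests every slash-style date was rewritten, whatever a spreadsheet later chooses to display.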

Answer 2

With Miller (http://johnkerl.org/miller/doc/), using regular expressions:

mlr --csv put '$last_modified_date=gsub($last_modified_date,"^([0-9]{1,2})(/)([0-9]{1,2})(/)([0-9]{4})(.*)$","\5-\3-\1");
$period_of_performance_start_date=gsub($period_of_performance_start_date,"^([0-9]{1,2})(/)([0-9]{1,2})(/)([0-9]{4})(.*)$","\5-\3-\1");
$period_of_performance_potential_end_date=gsub($period_of_performance_potential_end_date,"^([0-9]{1,2})(/)([0-9]{1,2})(/)([0-9]{4})(.*)$","\5-\3-\1");
$period_of_performance_current_end_date=gsub($period_of_performance_current_end_date,"^([0-9]{1,2})(/)([0-9]{1,2})(/)([0-9]{4})(.*)$","\5-\3-\1")' input.csv

you get:

award_id_piid,modification_number,potential_total_value_of_award,action_date,period_of_performance_start_date,period_of_performance_current_end_date,period_of_performance_potential_end_date,ordering_period_end_date,awarding_agency_code,last_modified_date
68HE0418F0516,P00001,48876.44,10/10/2018,2018-10-10,2019-1-12,2019-1-12,,68,2018-10-10
12639519F0011,P00001,89813.83,10/10/2018,2018-10-10,2018-18-10,2018-18-10,,12,2018-10-10
GS35F497CA,PM0011,475000,10/10/2018,2018-10-10,,,9/16/2020,47,2018-10-10
15B41918PTP440004,P00004,617912.96,10/10/2018,2018-10-10,2018-10-10,2018-10-10,,15,2018-10-10
15B31019PUA130001,0,23925,10/10/2018,2018-1-10,2018-10-10,2018-10-10,,15,2018-10-10
