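I have a large non-standard xhtml file that I am running through sed (about 4 passes) to strip it down to the bare data I need to load into a MySQL database. I'm stuck on the last step. The file is formatted like this: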
 Tue Aug 18 2015
0,0,0,0,0
0,0,0,2,275
0,0,0,3,287
0,0,0,0,327
0,0,0,3,335
0,0,0,0,413
 Wed Aug 19 2015
0,0,0,0,0
0,0,0,2,275
0,0,0,3,287
0,0,0,2,308
 Thu Aug 20 2015
0,0,0,0,0
0,0,0,2,458
0,0,0,3,469
0,0,0,0,472
0,0,0,3,503
0,0,0,2,534
There is always a space before the day name. A date can be followed by any number of lines of CSV values.
What I would like to end up with is:
Tue Aug 18 2015,0,0,0,0,0
Tue Aug 18 2015,0,0,0,2,275
Tue Aug 18 2015,0,0,0,3,287
Tue Aug 18 2015,0,0,0,0,327
Tue Aug 18 2015,0,0,0,3,335
Tue Aug 18 2015,0,0,0,0,413
Wed Aug 19 2015,0,0,0,0,0
Wed Aug 19 2015,0,0,0,2,275
Wed Aug 19 2015,0,0,0,3,287
Wed Aug 19 2015,0,0,0,2,308
Thu Aug 20 2015,0,0,0,0,0
Thu Aug 20 2015,0,0,0,2,458
Thu Aug 20 2015,0,0,0,3,469
Thu Aug 20 2015,0,0,0,0,472
Thu Aug 20 2015,0,0,0,3,503
Thu Aug 20 2015,0,0,0,2,534
If possible, I would also like to drop the day name and add some commas so the data is easier to handle in a PHP script, e.g.:
Aug,18,2015,0,0,0,0,0
Aug,18,2015,0,0,0,2,275
Aug,18,2015,0,0,0,3,287
Aug,18,2015,0,0,0,0,327
Aug,18,2015,0,0,0,3,335
Aug,18,2015,0,0,0,0,413
Aug,19,2015,0,0,0,0,0
Aug,19,2015,0,0,0,2,275
Aug,19,2015,0,0,0,3,287
Aug,19,2015,0,0,0,2,308
Aug,20,2015,0,0,0,0,0
Aug,20,2015,0,0,0,2,458
Aug,20,2015,0,0,0,3,469
Aug,20,2015,0,0,0,0,472
Aug,20,2015,0,0,0,3,503
Aug,20,2015,0,0,0,2,534
Is there a command I could use for this?
Answer 1
Here's one way:
sed '/,/!{                            # if there is no comma on this line (a date line)
    y/ /,/                            # translate spaces to commas
    h                                 # copy pattern space over the hold buffer
    d                                 # delete pattern space and start the next cycle
}
//{                                   # if the line contains commas (a data line)
    G                                 # append hold space content to pattern space
    s/\(.*\)\n,[^,]*,\(.*\)/\2,\1/    # swap the two lines: drop the newline, the leading
}                                     # comma and the day name, and add a comma after the year
' infile
Or, if you prefer a gnu sed one-liner:
sed -E '/,/!{y/ /,/;h;d};//{G;s/(.*)\n,[^,]*,(.*)/\2,\1/}' infile
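To see why the swap in the s command works, it can help to look at the pattern space right after G has run. The following is only a sketch, assuming gnu sed; the two-line sample and the function name show_pattern_space are made up for illustration. The l command prints the pattern space unambiguously, showing the embedded newline as \n and marking the end with $.

```shell
# Hypothetical helper: run only the first half of the script, then dump
# the pattern space with `l` instead of performing the substitution.
show_pattern_space() {
    printf ' Tue Aug 18 2015\n0,0,0,2,275\n' |
        sed -n '/,/!{y/ /,/;h;d};G;l'
}

show_pattern_space
```

With the sample above this should print 0,0,0,2,275\n,Tue,Aug,18,2015$ which is the string the substitution then splits at \n, dropping the leading comma and the day name from the second half.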
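It's similar with awk: if the line contains no comma, format the date with sprintf, save the result in a variable, e.g. dt, and skip to the next record. Otherwise, just prepend dt to $0 (i.e. the current line):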
awk '!/,/{dt=sprintf("%s,%s,%s,", $2, $3, $4);next};$0=dt$0' infile
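The question also asks for a first format in which the whole date, day name included, is kept as a single leading field. A minimal sketch of that variant, assuming the same input layout; the function name prefix_full_date is hypothetical, and it reads from stdin rather than from infile:

```shell
# Hypothetical helper: prefix every data line with the full, untouched date.
prefix_full_date() {
    awk '
        # date line (no comma): trim leading blanks and remember the whole line
        !/,/ { sub(/^[ \t]+/, ""); dt = $0; next }
        # data line: prepend the remembered date and a comma
        { print dt "," $0 }
    '
}

printf ' Tue Aug 18 2015\n0,0,0,0,0\n0,0,0,2,275\n' | prefix_full_date
```

Calling it as prefix_full_date < infile would process the real file. The [ \t] bracket expression is used instead of [[:blank:]] only because some older awk implementations do not support POSIX character classes.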
Answer 2
awk -F, -v OFS=, '/^[[:blank:]]+/ {
    str=gensub(/ /,",","g",$0);
    sub(/^,+[^,]+,/,"",str);
    next
  };
  !/^[[:blank:]]+/ {print str,$0}' nick.txt
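(Of course, this could all be written on one line. I wrote and tested it as a one-liner, then added line breaks and indentation to make it more readable here.)
For lines beginning with one or more blank characters (i.e. spaces or tabs), this awk script converts all spaces to commas, saves the modified line in a variable called str, and then removes the leading comma(s) plus all text up to and including the next comma, i.e. the day name. Note that gensub is a GNU awk extension, so this needs gawk.
For lines that do not begin with a blank character, it prints the line prefixed with the current value of str.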
Caveat: if there are any CSV data lines before the first date line, those lines will be printed with only a single comma as a prefix.
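One way to avoid that is to print data lines only once a date has been seen. The following is a sketch rather than a drop-in replacement: it builds the prefix from awk's default whitespace field splitting instead of gensub, the function name date_prefix_guarded is hypothetical, and silently skipping the orphaned lines (rather than warning about them) is an assumption about what is wanted.

```shell
# Hypothetical helper: same output format, but data lines that appear
# before the first date line are dropped instead of getting a bare comma.
date_prefix_guarded() {
    awk '
        /^[ \t]+/ {
            # date line: with default splitting the fields are
            # day-name, month, day, year
            str = $2 "," $3 "," $4
            next
        }
        # data line: print only if a date has already been stored
        str != "" { print str "," $0 }
    '
}

printf '9,9,9,9,9\n Tue Aug 18 2015\n0,0,0,2,275\n' | date_prefix_guarded
```

Here the made-up 9,9,9,9,9 line stands in for stray data ahead of the first date. It is skipped, and the remaining data line comes out as Aug,18,2015,0,0,0,2,275.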
Output:
Aug,18,2015,0,0,0,0,0
Aug,18,2015,0,0,0,2,275
Aug,18,2015,0,0,0,3,287
Aug,18,2015,0,0,0,0,327
Aug,18,2015,0,0,0,3,335
Aug,18,2015,0,0,0,0,413
Aug,19,2015,0,0,0,0,0
Aug,19,2015,0,0,0,2,275
Aug,19,2015,0,0,0,3,287
Aug,19,2015,0,0,0,2,308
Aug,20,2015,0,0,0,0,0
Aug,20,2015,0,0,0,2,458
Aug,20,2015,0,0,0,3,469
Aug,20,2015,0,0,0,0,472
Aug,20,2015,0,0,0,3,503
Aug,20,2015,0,0,0,2,534