Take a string and prepend it to every following line until the next such string is found

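I have a large, non-standard xhtml file that I am running through sed (about 4 passes) to strip it down to the bare data I need to load into a MySQL database. This is the last bit I am struggling with. The file now looks like this: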

 Tue Aug 18 2015
0,0,0,0,0
0,0,0,2,275
0,0,0,3,287
0,0,0,0,327
0,0,0,3,335
0,0,0,0,413
 Wed Aug 19 2015
0,0,0,0,0
0,0,0,2,275
0,0,0,3,287
0,0,0,2,308
 Thu Aug 20 2015
0,0,0,0,0
0,0,0,2,458
0,0,0,3,469
0,0,0,0,472
0,0,0,3,503
0,0,0,2,534

There is always a single space before the day name. A date line can be followed by any number of CSV lines.

What I would like to end up with is:

Tue Aug 18 2015,0,0,0,0,0
Tue Aug 18 2015,0,0,0,2,275
Tue Aug 18 2015,0,0,0,3,287
Tue Aug 18 2015,0,0,0,0,327
Tue Aug 18 2015,0,0,0,3,335
Tue Aug 18 2015,0,0,0,0,413
Wed Aug 19 2015,0,0,0,0,0
Wed Aug 19 2015,0,0,0,2,275
Wed Aug 19 2015,0,0,0,3,287
Wed Aug 19 2015,0,0,0,2,308
Thu Aug 20 2015,0,0,0,0,0
Thu Aug 20 2015,0,0,0,2,458
Thu Aug 20 2015,0,0,0,3,469
Thu Aug 20 2015,0,0,0,0,472
Thu Aug 20 2015,0,0,0,3,503
Thu Aug 20 2015,0,0,0,2,534

Or, if possible, drop the day name and add some commas so the data is easier to handle in a PHP script, e.g.:

Aug,18,2015,0,0,0,0,0
Aug,18,2015,0,0,0,2,275
Aug,18,2015,0,0,0,3,287
Aug,18,2015,0,0,0,0,327
Aug,18,2015,0,0,0,3,335
Aug,18,2015,0,0,0,0,413
Aug,19,2015,0,0,0,0,0
Aug,19,2015,0,0,0,2,275
Aug,19,2015,0,0,0,3,287
Aug,19,2015,0,0,0,2,308
Aug,20,2015,0,0,0,0,0
Aug,20,2015,0,0,0,2,458
Aug,20,2015,0,0,0,3,469
Aug,20,2015,0,0,0,0,472
Aug,20,2015,0,0,0,3,503
Aug,20,2015,0,0,0,2,534

Is there a command I can use for this?

Answer 1

Here is one way:

sed '/,/!{                       # if there's no comma on this line
y/ /,/                           # translate spaces to commas
h                                # copy pattern space over the hold buffer
d                                # delete pattern space
}
//{                              # if the line contains commas
G                                # append hold space content to pattern space
s/\(.*\)\n,[^,]*,\(.*\)/\2,\1/   # swap lines removing newline, the day part and
}                                # first two commas and adding a comma after year
' infile

Or, if you prefer a GNU sed one-liner:

sed -E '/,/!{y/ /,/;h;d};//{G;s/(.*)\n,[^,]*,(.*)/\2,\1/}' infile
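
Both commands above produce the second format (day name dropped). For the first format in the question, where the whole date is kept, the same hold-space idea can be sketched as below. The function name keep_full_date is only an illustrative choice, and the script assumes, as stated in the question, that date lines never contain a comma:

```shell
# keep_full_date: hypothetical helper, reads the mixed date/CSV text on stdin.
keep_full_date() {
  sed '
    # Date line (no comma): strip leading spaces, store it in the hold space
    # and print nothing for this line.
    /,/!{
      s/^ *//
      h
      d
    }
    # Data line: append a newline plus the stored date, then swap both parts.
    G
    s/\(.*\)\n\(.*\)/\2,\1/
  '
}

# Example run on a few lines of the sample data
printf ' Tue Aug 18 2015\n0,0,0,0,0\n0,0,0,2,275\n Wed Aug 19 2015\n0,0,0,2,308\n' | keep_full_date
```

The only differences from the script above are that the spaces are not translated to commas and that the substitution no longer cuts out the day name.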

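awk can do much the same.
If the line does not contain a comma, format the date with sprintf, save the result in a variable, e.g. dt, and skip to the next record. Otherwise just prepend dt to $0 (i.e. the current line):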

awk '!/,/{dt=sprintf("%s,%s,%s,", $2, $3, $4);next};$0=dt$0' infile

Answer 2

awk -F, -v OFS=, '/^[[:blank:]]+/ {
                      str=gensub(/ /,",","g",$0);
                      sub(/^,+[^,]+,/,"",str);
                      next
                  };

                  !/^[[:blank:]]+/ {print str,$0}' nick.txt

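(Of course, this can all be written on one line. I wrote and tested it as a one-liner, then added line breaks and indentation to make it more readable here.)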

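For lines that start with one or more blank characters (i.e. spaces or tabs), this awk script converts every space to a comma, saves the modified line in a variable named str, and then removes from it the leading comma(s) and all text up to and including the next comma (i.e. the day name).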

For lines that do not start with a blank character, it prints the line prefixed with the current value of str.
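
Note that gensub is a GNU awk extension, so the script needs gawk. A minimal sketch of the same logic for other awk implementations copies the line into str and uses gsub with a target argument instead. The function name prefix_date_posix is hypothetical, and [ \t] is used in place of [[:blank:]] on the assumption that some older awks lack POSIX character classes:

```shell
# prefix_date_posix: hypothetical helper, reads the mixed date/CSV text on stdin.
prefix_date_posix() {
  awk -v OFS=, '
    /^[ \t]+/ {                  # date line, e.g. " Tue Aug 18 2015"
      str = $0                   # work on a copy of the line
      gsub(/ /, ",", str)        # every space becomes a comma
      sub(/^,+[^,]+,/, "", str)  # drop the leading comma and the day name
      next                       # date lines are not printed themselves
    }
    { print str, $0 }            # data line: date, OFS, then the CSV values
  '
}

# Example run on a few lines of the sample data
printf ' Tue Aug 18 2015\n0,0,0,0,0\n Thu Aug 20 2015\n0,0,0,2,534\n' | prefix_date_posix
```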

Warning: if there are any CSV data lines before the first date line, those lines will be printed prefixed with just a single comma.
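
If that case can occur in your input, one possible guard is to skip such lines and report them on stderr instead of printing them. This is only a sketch: the function name prefix_date_strict and the wording of the message are made up for illustration, and skipping (rather than aborting) is an assumption about what you would want:

```shell
# prefix_date_strict: hypothetical helper, reads the mixed date/CSV text on stdin.
prefix_date_strict() {
  awk '
    NF == 0 { next }             # ignore empty or whitespace-only lines
    /^[ \t]+/ {                  # date line, e.g. " Tue Aug 18 2015"
      dt = $2 "," $3 "," $4      # default field splitting ignores leading blanks
      next
    }
    dt == "" {                   # data line, but no date line has been seen yet
      print "line " NR ": no date line seen yet, skipped: " $0 | "cat 1>&2"
      next
    }
    { print dt "," $0 }          # normal case: prepend the remembered date
  '
}

# Example run: the first data line has no date and is reported on stderr
printf '0,0,0,9,9\n Tue Aug 18 2015\n0,0,0,0,0\n' | prefix_date_strict
```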

Output:

Aug,18,2015,0,0,0,0,0
Aug,18,2015,0,0,0,2,275
Aug,18,2015,0,0,0,3,287
Aug,18,2015,0,0,0,0,327
Aug,18,2015,0,0,0,3,335
Aug,18,2015,0,0,0,0,413
Aug,19,2015,0,0,0,0,0
Aug,19,2015,0,0,0,2,275
Aug,19,2015,0,0,0,3,287
Aug,19,2015,0,0,0,2,308
Aug,20,2015,0,0,0,0,0
Aug,20,2015,0,0,0,2,458
Aug,20,2015,0,0,0,3,469
Aug,20,2015,0,0,0,0,472
Aug,20,2015,0,0,0,3,503
Aug,20,2015,0,0,0,2,534
