我想问一个关于在 Linux 上连接文件中的行的问题。
这是文件的一部分:
"64f94e8e-4bd2-4721-95e7-7151cfbf3eb6"|"fanel"|""|""|""|"2020-05-20 12:23:54.611000+00:00"
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:
199M May 20 12:58 calico-cni.tar
54M May 20 12:58 calico-kube-controllers.tar
252M May 20 12:58 calico-node.tar
108M May 20 12:58 calico-pod2daemon-flexvol.tar
40M May 20 12:58 cluster-proportional-autoscaler.tar
40M May 20 12:58 coredns-coredns.tar
82M May 20 12:58 coreos-etcd.tar
1.5G May 20 12:58 hyperkube.tar
40M May 20 12:58 metrics-server.tar
5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar
488M May 20 12:58 nginx-ingress-controller.tar
737K May 20 12:58 pause.tar
129M May 20 12:58 rke-tools.tar
"|"2020-05-20 09:56:43.388000+00:00"
"853f2f8b-563b-4ea3-a465-7be4510ce66f"|"SPOT_NEWSCAN_747"|""|""|""|"2020-06-09 08:11:02.563000+00:00"
"f3d735a7-7883-48c1-91cc-28075d300d62"|"kinesis-stream-manager"|""|""|""|"2020-05-21 06:50:15.440000+00:00"
行以 开头"
并以 结束"
,但以 开头的行"d955466b
不以 结尾,"
并继续到其他行。
我可以找到有问题的行,sed -n '/\"$/!p' PROJECT2.csv
但我不知道如何仅加入有问题的行。
最终结果需要是:
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:199M May 20 12:58 calico-cni.tar54M May 20 12:58 calico-kube-controllers.tar252M May 20 12:58 calico-node.tar108M May 20 12:58 calico-pod2daemon-flexvol.tar40M May 20 12:58 cluster-proportional-autoscaler.tar40M May 20 12:58 coredns-coredns.tar82M May 20 12:58 coreos-etcd.tar1.5G May 20 12:58 hyperkube.tar40M May 20 12:58 metrics-server.tar5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar488M May 20 12:58 nginx-ingress-controller.tar737K May 20 12:58 pause.tar129M May 20 12:58 rke-tools.tar"|"2020-05-20 09:56:43.388000+00:00"
答案1
尝试计算双引号并添加行,直到计数为偶数:
awk '{while (gsub(/"/, "&")%2) {getline X; $0 = $0 X}} 1' file
答案2
删除不在 后面的所有换行符"
:
$ perl -pe 's/(?<!")\n//' file
"64f94e8e-4bd2-4721-95e7-7151cfbf3eb6"|"fanel"|""|""|""|"2020-05-20 12:23:54.611000+00:00"
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:199M May 20 12:58 calico-cni.tar54M May 20 12:58 calico-kube-controllers.tar252M May 20 12:58 calico-node.tar108M May 20 12:58 calico-pod2daemon-flexvol.tar40M May 20 12:58 cluster-proportional-autoscaler.tar40M May 20 12:58 coredns-coredns.tar82M May 20 12:58 coreos-etcd.tar1.5G May 20 12:58 hyperkube.tar40M May 20 12:58 metrics-server.tar5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar488M May 20 12:58 nginx-ingress-controller.tar737K May 20 12:58 pause.tar129M May 20 12:58 rke-tools.tar"|"2020-05-20 09:56:43.388000+00:00"
"853f2f8b-563b-4ea3-a465-7be4510ce66f"|"SPOT_NEWSCAN_747"|""|""|""|"2020-06-09 08:11:02.563000+00:00"
"f3d735a7-7883-48c1-91cc-28075d300d62"|"kinesis-stream-manager"|""|""|""|"2020-05-21 06:50:15.440000+00:00"
或者
$ perl -pe 's/([^"])\n/$1/' file
"64f94e8e-4bd2-4721-95e7-7151cfbf3eb6"|"fanel"|""|""|""|"2020-05-20 12:23:54.611000+00:00"
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:199M May 20 12:58 calico-cni.tar54M May 20 12:58 calico-kube-controllers.tar252M May 20 12:58 calico-node.tar108M May 20 12:58 calico-pod2daemon-flexvol.tar40M May 20 12:58 cluster-proportional-autoscaler.tar40M May 20 12:58 coredns-coredns.tar82M May 20 12:58 coreos-etcd.tar1.5G May 20 12:58 hyperkube.tar40M May 20 12:58 metrics-server.tar5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar488M May 20 12:58 nginx-ingress-controller.tar737K May 20 12:58 pause.tar129M May 20 12:58 rke-tools.tar"|"2020-05-20 09:56:43.388000+00:00"
"853f2f8b-563b-4ea3-a465-7be4510ce66f"|"SPOT_NEWSCAN_747"|""|""|""|"2020-06-09 08:11:02.563000+00:00"
"f3d735a7-7883-48c1-91cc-28075d300d62"|"kinesis-stream-manager"|""|""|""|"2020-05-21 06:50:15.440000+00:00"
或者,与 GNU 相同sed
:
$ sed -Ez 's/([^"])\n/\1/g' file
"64f94e8e-4bd2-4721-95e7-7151cfbf3eb6"|"fanel"|""|""|""|"2020-05-20 12:23:54.611000+00:00"
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:199M May 20 12:58 calico-cni.tar54M May 20 12:58 calico-kube-controllers.tar252M May 20 12:58 calico-node.tar108M May 20 12:58 calico-pod2daemon-flexvol.tar40M May 20 12:58 cluster-proportional-autoscaler.tar40M May 20 12:58 coredns-coredns.tar82M May 20 12:58 coreos-etcd.tar1.5G May 20 12:58 hyperkube.tar40M May 20 12:58 metrics-server.tar5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar488M May 20 12:58 nginx-ingress-controller.tar737K May 20 12:58 pause.tar129M May 20 12:58 rke-tools.tar"|"2020-05-20 09:56:43.388000+00:00"
"853f2f8b-563b-4ea3-a465-7be4510ce66f"|"SPOT_NEWSCAN_747"|""|""|""|"2020-06-09 08:11:02.563000+00:00"
"f3d735a7-7883-48c1-91cc-28075d300d62"|"kinesis-stream-manager"|""|""|""|"2020-05-21 06:50:15.440000+00:00"
答案3
其中一种方法是根据行尾是否存在双引号来分配 ORS 值。
awk 'BEGIN { a[1] = ORS }
{ ORS = a[/"$/] };1' file
另一种方法是使用 GNU ed 编辑器的 join 命令和行寻址:
ed -s file <<\eof
v/"$/ .,//j
,p
Q
eof
awk 可以以一对一的方式移植到 Perl:
$ perl -lpe '$\ = ($,,$/)[/"$/]' file
这是在扩展正则表达式模式 (-E) 下使用 GNU sed 的方法
sed -E '
/"$/b
:loop
$q;N;s/([^"])\n/\1/
tloop
P;D
' file
这是使用 csplit 实用程序的另一种方法:
cat - file <<\eof |\
csplit -sz - '/"$/' '{*}'
"."
eof
for f in xx*; do {
head -n 1 -
cat - | tr -d '\n'
} < "$f"
done | sed 1d
答案4
如果您的文件是从 Excel 等 MS 工具输出的,那么很可能真正的行结尾是\r\n
,换行符中间字段只是\n
s,那么您所需要的就是以下内容,使用 GNU awk 进行多字符 RS:
$ awk -v RS='\r\n' '{gsub(/\n/,"")}1' file
"64f94e8e-4bd2-4721-95e7-7151cfbf3eb6"|"fanel"|""|""|""|"2020-05-20 12:23:54.611000+00:00"
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:199M May 20 12:58 calico-cni.tar54M May 20 12:58 calico-kube-controllers.tar252M May 20 12:58 calico-node.tar108M May 20 12:58 calico-pod2daemon-flexvol.tar40M May 20 12:58 cluster-proportional-autoscaler.tar40M May 20 12:58 coredns-coredns.tar82M May 20 12:58 coreos-etcd.tar1.5G May 20 12:58 hyperkube.tar40M May 20 12:58 metrics-server.tar5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar488M May 20 12:58 nginx-ingress-controller.tar737K May 20 12:58 pause.tar129M May 20 12:58 rke-tools.tar"|"2020-05-20 09:56:43.388000+00:00"
"853f2f8b-563b-4ea3-a465-7be4510ce66f"|"SPOT_NEWSCAN_747"|""|""|""|"2020-06-09 08:11:02.563000+00:00"
"f3d735a7-7883-48c1-91cc-28075d300d62"|"kinesis-stream-manager"|""|""|""|"2020-05-21 06:50:15.440000+00:00"
上面的代码是在此输入文件上运行的:
$ cat -Ev file
"64f94e8e-4bd2-4721-95e7-7151cfbf3eb6"|"fanel"|""|""|""|"2020-05-20 12:23:54.611000+00:00"^M$
"d955466b-125d-42dd-a1f6-a31e1ff6889b"|"Rancher_rke_1_0_8_Linux"|""|""|"Rancher 13 images scan:$
199M May 20 12:58 calico-cni.tar$
54M May 20 12:58 calico-kube-controllers.tar$
252M May 20 12:58 calico-node.tar$
108M May 20 12:58 calico-pod2daemon-flexvol.tar$
40M May 20 12:58 cluster-proportional-autoscaler.tar$
40M May 20 12:58 coredns-coredns.tar$
82M May 20 12:58 coreos-etcd.tar$
1.5G May 20 12:58 hyperkube.tar$
40M May 20 12:58 metrics-server.tar$
5.0M May 20 12:58 nginx-ingress-controller-defaultbackend.tar$
488M May 20 12:58 nginx-ingress-controller.tar$
737K May 20 12:58 pause.tar$
129M May 20 12:58 rke-tools.tar$
"|"2020-05-20 09:56:43.388000+00:00"^M$
"853f2f8b-563b-4ea3-a465-7be4510ce66f"|"SPOT_NEWSCAN_747"|""|""|""|"2020-06-09 08:11:02.563000+00:00"^M$
"f3d735a7-7883-48c1-91cc-28075d300d62"|"kinesis-stream-manager"|""|""|""|"2020-05-21 06:50:15.440000+00:00"^M$