Properly format a CSV file so that data can be extracted from it correctly


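I have a CSV file like the one shown below.

I want to delete the INITIAL OFFER blocks from this file and keep only the FINAL OFFER blocks. I also want to remove the comma (,) from the first field and the extra spaces from the last column, so that these columns are easier to search.

Input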

500076592,      INITIAL OFFER
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500076592,
500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500076592,
500028952,      INITIAL OFFER
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,

Output

500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,

Answer 1

sed -e '/FINAL OFFER/p;/INITIAL OFFER/,/FINAL OFFER/ d' input.csv  > output.csv

This prints the FINAL OFFER line explicitly before the range /INITIAL OFFER/,/FINAL OFFER/ deletes it.
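As a quick check, here is a minimal sketch that runs the command on a cut-down sample. The temporary file and the second -e expression are my additions, not part of the original answer: without that expression, the ID-only lines (such as 500076592,) that follow each FINAL OFFER block would still be printed.

```shell
# Cut-down sample in the same layout as the question's input (temporary file, hypothetical).
sample=$(mktemp)
printf '%s\n' \
  '500076592,      INITIAL OFFER' \
  '500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR' \
  '500076592,' \
  '500076592,|FINAL OFFER' \
  '500076592,|2|1|1|4096 MB|NA|1.5 SAR' \
  '500076592,' > "$sample"

# First -e: the command from the answer.
# Second -e (an addition): delete lines that hold nothing but digits and a comma.
result=$(sed -e '/FINAL OFFER/p;/INITIAL OFFER/,/FINAL OFFER/d' \
             -e '/^[0-9]*,$/d' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On this sample only the FINAL OFFER header and its single data row should remain.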

Answer 2

If you treat the pipe as the field separator, you can easily filter the data with awk based on the number of fields, for example:

awk -F'|' 'NF==2 { f=1 } NF==1 { f=0 } f' infile

Golfed:

awk -F\| 'NF==1{f=0}NF==2{f=1}f'
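The question also asks to drop the comma from the first field and the extra spaces from the last column. One possible extension of the same awk filter is sketched below. It assumes the extra spaces are trailing blanks on the last field and that the pipe should stay as the output separator. The sample file is hypothetical.

```shell
# Cut-down sample; the last data row carries two trailing spaces on purpose.
sample=$(mktemp)
printf '%s\n' \
  '500076592,      INITIAL OFFER' \
  '500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR' \
  '500076592,' \
  '500076592,|FINAL OFFER' \
  '500076592,|2|1|1|4096 MB|NA|1.5 SAR  ' \
  '500076592,' > "$sample"

result=$(awk -F'|' -v OFS='|' '
  NF == 2 { f = 1 }          # "ID,|FINAL OFFER" header: start printing
  NF == 1 { f = 0 }          # "ID," or "ID,   INITIAL OFFER": stop printing
  f {
    sub(/,$/, "", $1)        # drop the trailing comma from the first field
    sub(/[ \t]+$/, "", $NF)  # trim trailing blanks from the last field
    print                    # record is rebuilt with OFS, so "|" is kept
  }
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Setting OFS matters here: assigning to a field makes awk rebuild the record, and without OFS='|' the fields would be joined with spaces instead of pipes.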

Answer 3

You can use sed to delete everything from each INITIAL OFFER line through the next line that contains only digits and a comma:

$ sed '/INITIAL OFFER/,/^[0-9][0-9]*,$/d' file
500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500076592,
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR
500028952,

If you don't want to keep the 500076592, and 500028952, lines either, use @cas's simpler approach, or do this:

$ sed '/INITIAL OFFER/,/^[0-9][0-9]*,$/d; /^[0-9][0-9]*,$/d' file
500076592,|FINAL OFFER
500076592,|2|1|1|4096 MB|NA|1.5 SAR
500076592,|2|2|3|6144 MB|NA|2.0 SAR
500076592,|2|4|7|10240 MB|NA|4.0 SAR
500076592,|5|1|1|4096 MB|NA|2.0 SAR
500076592,|5|2|3|6144 MB|NA|2.5 SAR
500076592,|5|4|7|10240 MB|NA|5.0 SAR
500076592,|1|1|1|100 MB|NA|0.5 SAR
500076592,|1|2|3|300 MB|NA|1.5 SAR
500076592,|1|4|7|1000 MB|NA|5.0 SAR
500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR
500076592,|11|2|3|300 MB|5 Minutes|3.0 SAR
500076592,|6|1|1|NA|2 Minutes|0.5 SAR
500076592,|6|2|3|NA|5 Minutes|1.5 SAR
500076592,|6|4|7|NA|10 Minutes|3.0 SAR
500028952,|FINAL OFFER
500028952,|2|1|1|4096 MB|NA|1.5 SAR
500028952,|2|2|3|6144 MB|NA|2.0 SAR
500028952,|2|4|7|10240 MB|NA|4.0 SAR
500028952,|1|1|1|250 MB|NA|2.5 SAR
500028952,|1|2|3|650 MB|NA|6.5 SAR
500028952,|1|4|7|1550 MB|NA|15.5 SAR
500028952,|11|1|1|250 MB|2 Minutes|3.0 SAR
500028952,|11|2|3|650 MB|10 Minutes|8.0 SAR
500028952,|11|4|7|1550 MB|30 Minutes|18.5 SAR
500028952,|5|1|1|4096 MB|NA|2.0 SAR
500028952,|5|2|3|6144 MB|NA|2.5 SAR
500028952,|5|4|7|10240 MB|NA|5.0 SAR
500028952,|6|1|1|NA|2 Minutes|0.5 SAR
500028952,|6|2|3|NA|10 Minutes|1.5 SAR
500028952,|6|4|7|NA|30 Minutes|3.0 SAR

Answer 4

Using GNU sed with extended regular expressions enabled via -E:

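sed -En '
  # Save any line with at most one "|" (a block header or an ID-only line) in hold space.
  /^[^|]*\|?[^|]*$/h
  # Append hold space; print the current line only if the saved line contains "|",
  # that is, the most recent header seen was a FINAL OFFER line.
  G;/\n.*\|/P
' file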

Notes:

  • Stop
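To see what the hold-space script prints, here is a small sketch that runs it on a cut-down sample. The temporary file is hypothetical, and the run assumes a sed that accepts -E (GNU sed does).

```shell
# Cut-down sample in the same layout as the question's input.
sample=$(mktemp)
printf '%s\n' \
  '500076592,      INITIAL OFFER' \
  '500076592,|11|1|1|100 MB|2 Minutes|1.0 SAR' \
  '500076592,' \
  '500076592,|FINAL OFFER' \
  '500076592,|2|1|1|4096 MB|NA|1.5 SAR' \
  '500076592,' > "$sample"

# Lines with at most one "|" are remembered in hold space; a line is printed
# only while the remembered line contains a "|" (the FINAL OFFER header).
result=$(sed -En '
  /^[^|]*\|?[^|]*$/h
  G;/\n.*\|/P
' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the plain range deletion in Answer 1, this also skips the ID-only lines, because each of them replaces the remembered header with a line that has no pipe.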
