我有来自工具供应商的几个文件,我需要将它们合并在一起。但是,仅使用 csv 文件中的少数列。所以我想删除不相关的列。我曾经sed
对文件进行一些其他格式化:sed -i '1,4d
删除前 4 行,sed -i '/^\"/d'
删除以 开头的所有行"
,sed -i 's/ */ /g'
用一个空格替换多个空格,最后sed -i '\$d'
删除最后一行。
例如:
2608577312,2608577312,"Metal drill bits HSS-PointTeq",AC,"Metal drill bits HSS-PointTeq","HSS PointTeQ 17,0 mm (reduced shank)",197,,102.78,,102.78,,102.78,,,,
06019H2100,06019H2100,"Drill Driver","BE ","Bosch Go","3.6V, 5/ 2.5 Nm, 360 rpm, inbuilt battery",1149,1,819.2869565217393,3,789.313043478261,10,759.3391304347826,,0,0,
06019K30K1,06019K30K1,"Drill Driver","BE ","GSR 185-LI (1 bat.)","18V, 1x 2.0Ah, 50/21 Nm, 35/10mm, brushless, Case",2399,1,1710.5913043478265,5,1648.0086956521743,20,1585.426086956522,,0,0,
06019K7020,06019K7020,"Drywall Screwdriver","BE ","GTB 185-LI","18V, 2x 2.0Ah, max. screw 6mm, brushless, reversible depth gauge, case",4699,1,3350.5913043478267,2,3228.0086956521745,4,3105.426086956522,,0,0,
06019G81K2,06019G81K2,"Impact Drill","BE ","GSB 120-LI + 23 pc acc","12V, 2x 2.0Ah, 30/14 Nm, drilling (wood/steel/masonry) 20/10/8mm, case",2049,1,1461.026086956522,4,1407.5739130434783,8,1354.121739130435,,0,0,
06019F83K6,06019F83K6,"Impact Drill","BE ","GSB 180-LI (1 bat.)","18V, 1x 2.0Ah, 54/21 Nm, drilling (wood/steel/masonry) 35/10/10mm, case",1799,1,1423.5565217391306,6,1376.626086956522,20,1314.0521739130436,,0,0,
06019F83K0,06019F83K0,"Impact Drill","BE ","GSB 180-LI","18V, 2x 2.0Ah, 54/21 Nm, drilling (wood/steel/masonry) 35/10/10mm, case",2799,1,1995.8086956521743,4,1922.7913043478263,10,1849.7739130434784,,0,0,
06019K31K1,06019K31K1,"Impact Drill","BE ","GSB 185-LI (1 bat.)","18V, 1x 2.0Ah, 50/21 Nm, drilling (wood/steel/masonry) 35/10/10mm, brushless, case",2599,1,1853.2,6,1785.4,12,1717.6,,0,0,
06019K31K2,06019K31K2,"Impact Drill","BE ","GSB 185-LI + 23 acc","18V, 2x 2.0Ah, 50/21 Nm, drilling (wood/steel/masonry) 35/10/10mm, brushless, acc., case",3399,1,2423.634782608696,5,2334.9652173913046,10,2246.295652173913,,0,0,
06019H5100,06019H5100,"Impact Drill","BE ","GSB 18V-50","18V, 2x 2.0Ah, 50/28 Nm, drilling (wood/steel/masonry) 35/13/13mm, brusless, case",3999,1,2851.4608695652178,6,2747.139130434783,12,2642.8173913043483,,0,0,
06019H5101,06019H5101,"Impact Drill","BE ","GSB 18V-50","18V, 2x 5.0Ah, 50/28 Nm, drilling (wood/steel/masonry) 35/13/13mm, brushless, L-Boxx",7199,1,5133.200000000001,3,4945.400000000001,6,4757.6,,0,0,
06019K6106,06019K6106,"Impact Drill","BE ","GSB 18V-90 C ","18V, 2x 5.0Ah, 64/36 Nm, drilling (wood/steel) 68/13mm, metal chuck, brushless, connectivity, L-Boxx",9399,1,6701.895652173914,3,6456.7043478260875,6,6211.513043478261,,0,0,
06019F0001,06019F0001,"Impact Driver","BE ","GDR 120-LI","12V, 2x 2.0Ah, 100 Nm, 1/4'' Internal hexagon, case",2499,1,1781.895652173913,3,1716.704347826087,6,1651.513043478261,,0,0,
06019G5223,06019G5223,"Impact Driver/Wrench","BE ","GDX 180-LI","18V, 2x 2.0Ah, 180 Nm, 1/4'' Internal Hexagon / 1/2'' Square, case",3999,1,2851.4608695652178,3,2747.139130434783,6,2642.8173913043483,,0,0,
06019G61K0,06019G61K0,"Impact Wrench","BE ","GDS 250-LI ","18V, 2x 4.0Ah, 250 Nm, 1/2'' Square, case",5599,1,3992.330434782609,2,3846.2695652173916,4,3700.208695652174,,0,0,
06019K0020,06019K0020,"Impact Wrench","BE ","GDS 18V-400","18V, 2x 5.0Ah, 400 Nm, 1/2'' Square, brushless, case",7699,1,5489.721739130436,3,5288.8782608695665,6,5088.034782608696,,0,0,
06019J8502,06019J8502,"Impact Wrench","BE ","GDS 18V-1050 H","18V, 2x 8.0Ah ProCore, 1,050 Nm, 3/4'' Square, biturbo, brushless, L-Boxx",13999,1,9981.895652173915,2,9616.704347826088,5,9251.51304347826,,0,0,
06119231K0,06119231K0,"Rotary Hammer","BE ","GBH 187-LI ","18V, 2x 5.0Ah, 2.4 J, drilling Ø (concrete/steel/wood) 24/13/30 mm, brushless, connectivity, case",10699,1,7628.852173913045,2,7349.7478260869575,3,7070.64347826087,,0,0,
061190600A,061190600A,"Rotary Hammer","BE ","GBH 36 V-LI Plus","36V, 2x 6.0Ah, Max drilling in concrete: 28 mm, SDS-Plus, 3.2 J, case",17699,1,12620.15652173913,2,12158.44347826087,3,11696.730434782608,,0,0,
06019H9021,06019H9021,"Angle Grinder","BE ","GWS 180-LI ","18V, 2x 4.0Ah, 11000 rpm, 125mm, M14, brushless, case",5799,1,4134.939130434783,3,3983.6608695652176,6,3832.3826086956524,,0,0,
06016B0072,06016B0072,Router,"BE ","GKF 12V-8 (solo)","12V, 13 000 rpm, includes a 1/4"" collet",2999,1,2138.4173913043483,2,2060.1826086956526,4,1981.9478260869566,,0,0,
06014A1100,06014A1100,Light,"BE ","GLI 18V-300","18V, LED, 300 lumen, 300 min/Ah",749,1,534.0695652173914,3,514.5304347826087,6,494.99130434782614,,0,0,
0601446400,0601446400,Light,"BE ","GLI 18V-1900 ","18V, LED, 1,900 lumen, 100 min/Ah ",1899,1,1354.0695652173915,2,1304.5304347826088,5,1254.9913043478261,,0,0,
06019F5100,06019F5100,Blower,"BE ","GBL 18V-120","18V, 17000 rpm , 120 m³/h",1149,1,819.2869565217393,3,789.313043478261,5,759.3391304347826,,0,0,
06019D0200,06019D0200,"Orbital Sander ","BE ","GSS 18V-10","18V, sanding sheet, width 115mm, 22 000 opm",1499,1,1068.8521739130435,3,1029.7478260869566,5,990.6434782608696,,0,0,
06019C6200,06019C6200,"Vacuum Cleaner","BE ","GAS 18V-1","18V, 0.7l container volume, 10 l/s",1499,1,1068.8521739130435,3,1029.7478260869566,5,990.6434782608696,,0,0,
06019H9022,06019H9022,"Angle Grinder","BE ","GWS 180-LI","18V, 115 mm, 11000 rpm, M14, brushless",1799,1,1298.408695652174,6,1259.3000000000002,20,1220.1913043478262,,0,0,
06019K3183,06019K3183,"Impact Drill","BE ","GSB 185-LI ","18V, 50/21 Nm, drilling (wood/steel/masonry) 35/10/10mm, brushless, case",1799,1,1282.7652173913045,3,1235.8347826086958,5,1188.9043478260871,,0,0,
06019G5226,06019G5226,"Impact Driver/Wrench","BE ","GDX 180-LI","18V,180 Nm, 1/4'' Internal hexagon / 1/2'' Square holder",2199,1,1567.9826086956525,3,1510.617391304348,5,1453.2521739130436,,0,0,
06013A5020,06013A5020,"Random Orbital Sander","BE ","GEX 185-LI ","18V, sanding pad dia. 125mm, Orbit dia. 2.5mm, brushless",2199,1,1567.9826086956525,3,1510.617391304348,6,1453.2521739130436,,0,0,
06119111K0,06119111K0,"Rotary Hammer","BE ","GBH 180-LI","18V, Max drilling in concrete: 20 mm, SDS-Plus, 2 J, brushless",2699,1,1947.9739130434782,4,1889.3,10,1830.626086956522,,0,0,
06019K7021,06019K7021,"Drywall Screwdriver","BE ","GTB 185-LI ","18V, max. screw 6mm, reversible depth gauge, brushless, case",2999,1,2138.4173913043483,3,2060.1826086956526,6,1981.9478260869566,,0,0,
06019K0021,06019K0021,"Impact Wrench","BE ","GDS 18V-400","18V, 400 Nm, 1/2'' Square, brushless",2999,1,2138.4173913043483,3,2060.1826086956526,6,1981.9478260869566,,0,0,
06015B3021,06015B3021,"Jig Saw","BE ","GST 185-LI","18V, cutting depth in wood: 125 mm, brushless, LED",2999,1,2138.4173913043483,3,2060.1826086956526,6,1981.9478260869566,,0,0,
06016A2200,06016A2200,"Circular Saw","BE ","GKS 18V-57","18V, cutting depth: 57/39mm, saw blade dia.: 165mm",2999,1,2138.4173913043483,3,2060.1826086956526,6,1981.9478260869566,,0,0,
上面的行有 17 列,如何删除第 1,5 和 9 列
编辑:添加了更大的样本量。是的,列总是一致的。
答案1
您的 CSV 数据具有包含嵌入逗号的字段,因此您必须使用正确的 CSV 解析器,而不仅仅是简单的逗号分割。此外,您的数据包含多字节(utf8?)字符,因此您需要使用可以处理它们的东西。
Perl 和磨坊主是不错的选择。如果我需要在打印之前对 CSV 文件进行更多处理或将 CSV 导入 SQL 数据库(一旦每一行都在数组$row
引用中,您可以使用它做任何您想做的事情 - 在您的情况下,我会使用 perl) perl 脚本可以合并多个源文件,也许可以删除重复的条目或清理格式奇怪的内容,例如使用''
而不是"
英寸),并且miller
如果我需要做的只是剪切一些字段。
使用磨坊主:
$ mlr --csv --hi --ho cut -x -f 1,5,9 input.csv
使用 Perl 和 Perl 的文本::CSV模块:
$ perl -MText::CSV -CSDA -e '
my $file=shift;
my $csv=Text::CSV->new();
open(my $fh, "<", $file) or die "error opening $file:$!\n";
while(my $row = $csv->getline($fh)) {
foreach my $i (8,4,0) {splice @$row, $i, 1};
$csv->say(STDOUT, $row)
}' input.csv
注意:perl 数组从零开始,而不是从一开始,我以相反的顺序删除数组元素(使用 perlsplice
函数),以避免在删除元素时索引重新编号的问题。
另请注意:Text::CSV
它不包含在 perl 中,它需要从发行包(例如apt-get install libtext-csv-perl
在 Debian 以及 Ubuntu 和 Mint 等衍生产品上)或使用cpan
.
答案2
由于您使用的是 GNU sed (for -i
),我假设您也有 GNU awk。您可以摆脱所有 sed 命令,因为当您使用 awk 时,您永远不需要 sed,只需运行此 1 awk 脚本(未经测试,因为没有磁力RE在问题中):
awk -v FPAT='([^,]*)|("([^"]|"")*")' -v OFS=',' '
(NR > 5) && (prev !~ /^"/) {
gsub(/ +/," ",prev)
nf = patsplit(prev,p)
out = sep = ""
for ( i=1; i<=nf; i++ ) {
if ( i !~ /^[159]$/ ) {
out = out sep p[i]
sep = OFS
}
}
print out
}
{ prev = $0 }
' file
例如,如果您想删除第 1、5、17 和 35 列,则只需更改i !~ /^[159]$/
为i !~ /^([15]|17|35)$/
或(如果您愿意)i !~ /^(1|5|17|35)$/
。
如果您希望能够传递要跳过的字段列表,而不是将其硬编码在脚本中,请将上面的内容更改为:
awk -v skip='1,5,9' -v FPAT='([^,]*)|("([^"]|"")*")' -v OFS=',' '
BEGIN {
patsplit(skip,tmp)
for ( i in tmp ) {
skipFlds[tmp[i]]
}
}
(NR > 5) && (prev !~ /^"/) {
gsub(/ +/," ",prev)
nf = patsplit(prev,p)
out = sep = ""
for ( i=1; i<=nf; i++ ) {
if ( !(i in skipFlds) ) {
out = out sep p[i]
sep = OFS
}
}
print out
}
{ prev = $0 }
' file
完成测试并对结果感到满意后,更改awk
为awk -i inplace
更新输入文件(如)。sed -i
以上需要 GNU awk 来处理FPAT
、patsplit()
和-i inplace
。
看使用 awk 高效解析 csv 的最稳健方法是什么有关使用 awk 解析 CSV 的更多信息。
答案3
用于sed
删除第 1,5 和 9 列。
$ sed -E 's/^[^,]*,(([^,]*,){3})[^,]*,(([^,]*,){3})[^,]*,/\1\3/' input_file
2608577312,"Metal drill bits HSS-PointTeq",AC,"HSS PointTeQ 17,0 mm (reduced shank)",197,102.78,,102.78,,102.78,,,,
您可以将所有sed
命令链接到一个命令中;
$ sed -Ei.bak '1,4d;$d;/^"/d;s/ +/ /g;s/^[^,]*,(([^,]*,){3})[^,]*,(([^,]*,){3})[^,]*,/\1\3/' input_file
答案4
由于仅需要选定的列,因此awk
可能是另一个不错的选择。以下是仅选择所需列的方法:
数据 :
os_user:~$ cat csv_file.csv
Tom,Jerry,23,US
luca,brasi,55,UK
使用字段分隔符和打印功能选择所需的列:
os_user:~$ awk -F"," '{print $1","$3}' csv_file.csv
Tom,23
luca,55
索引awk
从 1 开始。在上面的演示中,第 1 列和第 3 列已被打印。此外,还可以进行进一步复杂的编程来删除条件行、处理数据等。
更多详情参考: AWK 参考
编辑注释:正如问题中没有提到的,我没有意识到某些列中有逗号。感谢埃德·莫顿的评论。它需要正确处理,而我的上述解决方案没有考虑到这一点。