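I have a CSV file with multiple columns and 1000 records. I need to prepend an apostrophe (') to every value in one column (say, the second column) on every line except the first (header) line. There is probably a simple one-liner for this. How can I do it with awk or sed? Note that I may have multiple commas inside values that are enclosed in double quotes.
Sample data: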
"col1","col2","col3","col4","col5"
"value11","value12","value13","value14","value15"
"value21","value22","value23","value24","value25"
"value31","value32","value33","value34","value35"
Expected output:
"col1","col2","col3","col4","col5"
"value11","'value12","value13","value14","value15"
"value21","'value22","value23","value24","value25"
"value31","'value32","value33","value34","value35"
Answer 1
sed:
sed '2,$s/^\("[^"]*","\)/\1'"'"/ test.in
Using ERE to eliminate some of the escaping:
sed -E '2,$s/^("[^"]*",")/\1'"'"/ test.in
awk:
awk -F, -v OFS=, 'NR>1{sub(/^"/,"\"'"'"'",$2)}1' test.in
If you don't want to worry about quoting, use the escape code:
awk -F, -v OFS=, 'NR>1{sub(/^"/,"\"\x27",$2)}1' test.in
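Because -F, splits on every comma, the awk variants assume there are no commas inside the first field. The anchored sed pattern does not have that limitation, since [^"]* consumes the whole first quoted field. A minimal sketch to check this, assuming a hypothetical helper named prefix_second_field that reads CSV on stdin and that fields contain no embedded double quotes:

```shell
# Hypothetical helper (not from the answers above): prefix the second field
# of every non-header line with an apostrophe. Reads CSV on stdin.
prefix_second_field() {
    q="'"   # keep the apostrophe in a variable to avoid nested quoting
    # ^"[^"]*"," matches the entire first quoted field plus the opening
    # quote of the second one, even if the first field contains commas.
    sed '2,$s/^\("[^"]*","\)/\1'"$q"'/'
}

# The first data field contains a comma; only the second field is changed.
printf '%s\n' '"col1","col2"' '"a,b","c,d"' | prefix_second_field
```

The header line passes through unchanged because the substitution is restricted to the address range 2,$.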
Answer 2
Using Perl:
perl -pi -e '
BEGIN{
$column_number = 2; # Change as needed
$column_number--;
$apostrophe = chr 39;
}
next unless $this_is_data++; # Skip the first line
s@ ^((?:"[^"]+"\s*,){$column_number}) "@$1"$apostrophe@x
' your_file
This assumes that your fields do not contain backslash-escaped quotes.
Answer 3
Here's a gawk one:
$ gawk -F'","' -v var="'" -v OFS='","' 'NR>1{$2=var$2;} 1' foo.csv
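The -v option lets you define variables that are accessible to the gawk script. In this case, var is ' and OFS (the output field separator) is ",", the same as the input field separator (-F). We then check that this is not the first line (NR>1) and prepend the value of var to the second column. Finally, the 1 is just a trick: it evaluates to true, which makes gawk print the line. It is equivalent to adding print; but shorter.
If you want to run this on a different column, just change $2=var$2; to $N=var$N, where N is the number of the column you are interested in.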
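The column number can also be passed in as a variable instead of editing the script. The following is a minimal sketch under stated assumptions: the helper name prefix_column is hypothetical, the input arrives on stdin, and N is between 2 and the number of columns (with "," as the separator, the first field still starts with its opening quote, so N=1 would put the apostrophe outside the quotes).

```shell
# Hypothetical helper: prefix column N (N >= 2) with an apostrophe on every
# line except the header. Usage: prefix_column N < file.csv
prefix_column() {
    # FS and OFS are both the three-character string ","
    # col holds the column number, q holds the apostrophe.
    awk -F'","' -v OFS='","' -v col="$1" -v q="'" \
        'NR > 1 { $col = q $col } 1'
}

# Example: mark the third column; commas inside a field are left alone.
printf '%s\n' '"h1","h2","h3"' '"a","b,x","c"' | prefix_column 3
```

This relies only on standard awk features, so it should behave the same in gawk and in other awk implementations.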
You can also do this in Perl (of course, you can do everything in Perl):
$ perl -F'\",\"' -ane '$.>1 && do{$F[1]=chr(39).$F[1]};
print join("\",\"",@F)' foo.csv
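The -a switch makes perl split input lines the way gawk does, except that it saves the fields in the array @F (Perl arrays start at 0, so the second column is $F[1], the third is $F[2], and so on). -F (again, as in gawk) sets the input field separator. So we check whether the line number is greater than one ($.>1) and, if so, prepend a ' (chr 39, thanks @josephR) to the value of $F[1]. Finally, we use join to connect the elements of the @F array with "," and print the resulting string.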
Answer 4
A simple sed will do:
$ sed 's/","/","\x27/' afile
"col1","'col2","col3","col4","col5"
"value11","'value12","value13","value14","value15"
"value21","'value22","value23","value24","value25"
"value31","'value32","value33","value34","value35"
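Details
We are searching for the first occurrence of "," and replacing it with ","'. However, escaping the apostrophe inside a single-quoted script can be tricky, so just use its equivalent hexadecimal escape code, \x27. Note that \x27 is understood by GNU sed; other sed implementations may not support it.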
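Since the substitution targets the first occurrence of ",", the same idea can reach other columns through sed's numeric flag, which replaces only the n-th match on a line. A minimal sketch, assuming a hypothetical helper named prefix_nth_column, input on stdin, N of at least 2, and no field that itself contains the sequence ",":

```shell
# Hypothetical helper: prefix column N (N >= 2) with an apostrophe on every
# line except the header. Usage: prefix_nth_column N < file.csv
prefix_nth_column() {
    n=$(( $1 - 1 ))   # column N begins right after the (N-1)th "," separator
    q="'"             # apostrophe kept in a variable instead of \x27
    # The trailing number is the s command's occurrence flag.
    sed '2,$s/","/","'"$q"'/'"$n"
}

# Example: mark the third column.
printf '%s\n' '"h1","h2","h3"' '"a","b","c"' | prefix_nth_column 3
```

Passing the apostrophe through a shell variable avoids the \x27 escape, so the sketch does not depend on GNU sed.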
Your problem
The command can be adapted like this to limit the change to only the lines you want:
$ cat <(head -n +1 afile) <(tail -n +2 afile | sed 's/","/","\x27/')
"col1","col2","col3","col4","col5"
"value11","'value12","value13","value14","value15"
"value21","'value22","value23","value24","value25"
"value31","'value32","value33","value34","value35"
Or, if you know sed's tricks 8-), you can skip the first line entirely:
$ sed '2,$s/","/","\x27/' afile
"col1","col2","col3","col4","col5"
"value11","'value12","value13","value14","value15"
"value21","'value22","value23","value24","value25"
"value31","'value32","value33","value34","value35"
This tells sed to take only the second line through the last line ($) and run the search and replace on them.