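I have a lot of .csv files containing customer information. In all of these files, I want to add an additional column, FIRSTNAME, next to the FULLNAME column. The first name can be generated by taking the first word of FULLNAME.
There are no two-word first names such as Jean Paul. In the last column, commas are used within the field text.
Input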
COMPANY,FULLNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,[email protected],Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Expected output
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix infix2 Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
How can I do this with awk, sed, or something else?
Answer 1
Using the CSV-aware utility Miller (mlr):
mlr --csv \
put '$FIRSTNAME = sub($FULLNAME," .*","")' then \
reorder -f COMPANY,FULLNAME,FIRSTNAME file
...given the data in the question, the result is
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
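This use of Miller first creates a new field, FIRSTNAME, through a regular-expression substitution that removes everything from the first space character onward in the FULLNAME field.
Since the new field is placed last, the fields are then reordered so that the first few fields are COMPANY, FULLNAME, and FIRSTNAME, in that order. The remaining fields keep their original order.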
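Instead of the put expression using sub(), you could use put with the splitnv() function to split the value of the FULLNAME field on spaces and assign the first resulting string to FIRSTNAME: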
mlr --csv \
put '$FIRSTNAME = splitnv($FULLNAME," ")[1]' then \
reorder -f COMPANY,FULLNAME,FIRSTNAME file
For prettier output:
$ mlr --icsv --opprint --barred put '$FIRSTNAME = splitnv($FULLNAME," ")[1]' then reorder -f COMPANY,FULLNAME,FIRSTNAME file
+--------------+---------------------------------+-----------+--------------------------------+-------------------+----------+-------------+----------------------+
| COMPANY | FULLNAME | FIRSTNAME | EMAIL | FUNCTION | CITY | INDUSTRY | COMMENT |
+--------------+---------------------------------+-----------+--------------------------------+-------------------+----------+-------------+----------------------+
| Company name | Firstname Lastname | Firstname | [email protected] | Marketing Manager | New York | Health Care | home, work |
| Company name | Firstname infix Lastname | Firstname | [email protected] | Marketing Manager | New York | Health Care | home, workhome, work |
| Company name | Firstname infix infix2 Lastname | Firstname | [email protected] | Marketing Manager | New York | Health Care | home, work |
+--------------+---------------------------------+-----------+--------------------------------+-------------------+----------+-------------+----------------------+
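For comparison, the same transformation (take the first space-delimited word of FULLNAME and place it directly after FULLNAME) can be sketched with Python's standard csv module, which also handles the quoted last column. This is only a minimal sketch, not part of the Miller answer: the function name add_firstname, the shortened sample header, and the placeholder address user@example.com are all assumptions made for illustration.

```python
import csv
import io


def add_firstname(src, dst):
    """Copy CSV rows from src to dst, inserting FIRSTNAME right after FULLNAME."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader)
    pos = header.index("FULLNAME") + 1  # index where the new column goes
    writer.writerow(header[:pos] + ["FIRSTNAME"] + header[pos:])

    for row in reader:
        # Keep only the text before the first space, like sub($FULLNAME, " .*", "").
        first = row[pos - 1].split(" ", 1)[0]
        writer.writerow(row[:pos] + [first] + row[pos:])


# Hypothetical sample data (shortened header, placeholder e-mail address).
sample = (
    "COMPANY,FULLNAME,EMAIL,COMMENT\n"
    'Company name,Firstname infix Lastname,user@example.com,"home, work"\n'
)

out = io.StringIO()
add_firstname(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Because the position is looked up from the header, the sketch does not depend on FULLNAME being the second column.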
Answer 2
Using Raku (formerly known as Perl_6)
~$ raku -MText::CSV -e 'my @a = csv(in => $*IN); \
my @b = [Z] @a>>[0..1], @a>>[1].map(*.words.[0]), @a>>[2..*]; \
@b = @b>>.[*;*]>>.Array; @b[0][2] = "FIRSTNAME"; \
csv(in => @b, out => $*OUT);' file
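If the OP is willing to have fields double-quoted throughout the CSV output, this might be the simplest approach (fields with embedded commas and/or whitespace come out quoted, see RFC4180).
The Perl(5) module Text::CSV_XS is highly regarded, and the long-time author/maintainer of that module developed the Raku Text::CSV module (H. Merijn Brand, personal communication). Here, Raku's Text::CSV functionality is confined to IO operations, in particular the quoting of the last column. Otherwise, the column manipulation above is done with standard Raku arrays. One advantage of this code, though, is that a double-quoted file path can be used in place of $*IN, which reads STDIN.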
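Above, the file is read into the array @a in the first line. In the second line, the first word of zero-indexed column 1 is taken and assigned to zero-indexed column 2; that column and the remaining columns are assigned row-wise to the @b array. The third line does some housekeeping (flattening the array, making the @b elements mutable, and correcting the column header). Finally, the file is output in the fourth line.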
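The column-wise assembly described above can be mirrored in Python with zip, purely as an illustration of the idea rather than a translation of the Raku code. The sample rows, the shortened header, and the placeholder address user@example.com are assumptions.

```python
# Hypothetical rows as they might look after CSV parsing.
rows = [
    ["COMPANY", "FULLNAME", "EMAIL"],
    ["Company name", "Firstname Lastname", "user@example.com"],
    ["Company name", "Firstname infix Lastname", "user@example.com"],
]

# Transpose rows into columns.
columns = [list(col) for col in zip(*rows)]

# Build the new column from the first word of each value in column 1,
# then correct its header cell.
firstname = [value.split()[0] for value in columns[1]]
firstname[0] = "FIRSTNAME"

# Splice the new column in at index 2 and transpose back into rows.
columns.insert(2, firstname)
result = [list(row) for row in zip(*columns)]

for row in result:
    print(row)
```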
Sample input:
COMPANY,FULLNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,[email protected],Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Sample output (from above):
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
"Company name","Firstname Lastname",Firstname,[email protected],"Marketing Manager","New York","Health Care","home, work"
"Company name","Firstname infix Lastname",Firstname,[email protected],"Marketing Manager","New York","Health Care","home, workhome, work"
"Company name","Firstname infix infix2 Lastname",Firstname,[email protected],"Marketing Manager","New York","Health Care","home, work"
In fact, re-quoting only the last column takes a bit more work. Replace the final statement above, csv(in => @b, out => $*OUT), with the following:
.join(",").put for @b[0]; \
.join(",").put for [Z] @b[1..*]>>.[0..*-2]>>.join(","), @b[1..*]>>.[*-1].map(*.raku);'
Sample output (revised code example):
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
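The difference between quoting every field and quoting only where needed can be illustrated with Python's csv module. RFC 4180 asks for quotes only around fields containing commas, double quotes, or line breaks, which is what csv.QUOTE_MINIMAL does. This is a side illustration with hypothetical sample values and a hypothetical helper named render; it is not how the Raku module decides what to quote.

```python
import csv
import io

# Hypothetical row; only the last field contains a comma.
row = ["Company name", "Firstname Lastname", "Firstname", "home, work"]


def render(fields, quoting):
    """Return one CSV line for the given fields and quoting policy."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


minimal = render(row, csv.QUOTE_MINIMAL)  # quote only fields that need it
everything = render(row, csv.QUOTE_ALL)   # quote every field

print(minimal)
print(everything)
```

Both lines parse back to the same four fields; only the surface form differs.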
https://datatracker.ietf.org/doc/html/rfc4180
https://github.com/Tux/CSV/blob/master/doc/Text-CSV.md
https://raku.org
Answer 3
Using sed
$ sed -E '2,$s/[^,]*,([^ ]*) [^,]*,/&\1,/;1s/([^,]*,){2}/&FIRSTNAME,/' input_file
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,Firstname,[email protected],Marketing Manager,New York,Health Care,"home, work"
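To make the two substitutions easier to follow, here is a rough Python equivalent using the re module, where \g<0> plays the role of sed's &. It is a sketch under the same assumptions the sed command appears to rely on: the first field contains no commas or quotes, and FULLNAME always contains at least one space. The names HEADER_RE, ROW_RE, and insert_firstname, as well as the sample lines, are hypothetical.

```python
import re

# Header line: match the first two comma-terminated fields.
HEADER_RE = re.compile(r"(?:[^,]*,){2}")
# Data lines: first field, then FULLNAME with its first word captured.
ROW_RE = re.compile(r"[^,]*,([^ ]*) [^,]*,")


def insert_firstname(line, is_header=False):
    """Insert the FIRSTNAME field after the second field of one CSV line."""
    if is_header:
        # Like 1s/([^,]*,){2}/&FIRSTNAME,/
        return HEADER_RE.sub(r"\g<0>FIRSTNAME,", line, count=1)
    # Like 2,$s/[^,]*,([^ ]*) [^,]*,/&\1,/
    return ROW_RE.sub(r"\g<0>\1,", line, count=1)


header = "COMPANY,FULLNAME,EMAIL,COMMENT"
row = 'Company name,Firstname infix Lastname,user@example.com,"home, work"'

print(insert_firstname(header, is_header=True))
print(insert_firstname(row))
```

Note that with a one-word FULLNAME the capture group would run on past the field boundary up to the next space on the line, so this pattern-based approach is only suitable when every full name has at least two words.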