I'm new to bash and to using awk in bash scripts, and I have the following text in a file:
"Index", "Year", "Age", "Name", "Movie"
1, 1928, 44, "Emil Jannings", "The Last Command, The Way of All Flesh"
2, 1929, 41, "Warner Baxter", "In Old Arizona"
3, 1930, 62, "George Arliss", "Disraeli"
4, 1931, 53, "Lionel Barrymore", "A Free Soul"
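I need to create a function that produces the output below: sorted by actor name, with some of the header names changed, and with tab-separated columns.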
Actor	Year	Age	Film
Emil Jannings	1928	44	The Last Command, The Way of All Flesh
George Arliss	1930	62	Disraeli
Lionel Barrymore	1931	53	A Free Soul
Warner Baxter	1929	41	In Old Arizona
How would you do this? I'm still a beginner and I can't find the right way to get what I want.
Thanks
Answer 1
Using csvcut from the Python-based csvkit, together with Miller:
$ csvcut -S -c Name,Year,Age,Movie file.csv |
mlr --icsv --opprint sort -f Name then rename Name,Actor,Movie,Film
Actor            Year Age Film
Emil Jannings    1928 44  The Last Command, The Way of All Flesh
George Arliss    1930 62  Disraeli
Lionel Barrymore 1931 53  A Free Soul
Warner Baxter    1929 41  In Old Arizona
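Although I think Miller should be able to do this on its own, it seems to mis-parse quoted fields when the delimiter is not a single character (here, a comma followed by a space).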
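For context on why a CSV-aware parser is used here at all, the following small sketch (not part of the answer) shows what happens when one data line from the question is split on every comma with plain awk. The variable names are illustrative only.

```shell
# One data line from the question; the last field has a comma inside the quotes.
line='1, 1928, 44, "Emil Jannings", "The Last Command, The Way of All Flesh"'

# Naive split on every comma: awk counts 6 fields instead of the intended 5.
naive_count=$(printf '%s\n' "$line" | awk -F',' '{ print NF }')
echo "fields seen by awk -F',': $naive_count"

# The 5th naive field holds only the first half of the film title,
# still carrying its leading space and opening quote.
naive_fifth=$(printf '%s\n' "$line" | awk -F',' '{ print $5 }')
echo "5th field: [$naive_fifth]"
```

The quoted comma in the film title is what breaks the naive split, which is why both answers use a parser that understands quoting.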
Answer 2
Using GNU awk for FPAT:
$ cat tst.awk
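BEGIN {
    # Map input header names to the output names that replace them.
    in2outTag["Name"] = "Actor"
    in2outTag["Movie"] = "Film"
    # Output columns, in the order they are printed.
    numOutFlds = split("Actor Year Age Film",outTags)
    # A field is a run of non-commas or a double-quoted string,
    # optionally surrounded by white space.
    FPAT = "\\s*(([^,]*)|(\"[^\"]+\"))\\s*"
    OFS = "\t"
}
{
    # Strip surrounding white space and quotes from every field.
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        gsub(/^[[:space:]]*"?|"?[[:space:]]*$/,"",$inFldNr)
    }
}
NR == 1 {
    # Header line: record which input field holds each output tag.
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        tag = ( $inFldNr in in2outTag ? in2outTag[$inFldNr] : $inFldNr )
        tag2inNr[tag] = inFldNr
    }
    # Prefix the header with 0 so that sort keeps it above the data lines.
    printf "%d%s", 0, OFS
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        tag = outTags[outFldNr]
        out2inNr[outFldNr] = tag2inNr[tag]
        printf "%s%s", tag, (outFldNr < numOutFlds ? OFS : ORS)
    }
    next
}
{
    # Data line: prefix with 1, then print the fields in output order.
    printf "%d%s", 1, OFS
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2inNr[outFldNr]
        val = $inFldNr
        printf "%s%s", val, (outFldNr < numOutFlds ? OFS : ORS)
    }
}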
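To illustrate what the FPAT assignment in the script does, here is a minimal sketch that applies the same pattern and the same trimming gsub() to a single line from the question. It calls gawk explicitly because FPAT is a GNU awk extension; the variable names line and parsed are illustrative, not part of the answer.

```shell
# One data line from the question, with a comma inside the quoted film title.
line='1, 1928, 44, "Emil Jannings", "The Last Command, The Way of All Flesh"'

# FPAT is only understood by GNU awk, so check for gawk before using it.
if command -v gawk >/dev/null 2>&1; then
    parsed=$(printf '%s\n' "$line" | gawk '
        # Same field pattern as in tst.awk: FPAT describes the fields
        # themselves rather than the separators between them.
        BEGIN { FPAT = "\\s*(([^,]*)|(\"[^\"]+\"))\\s*" }
        {
            # Trim surrounding white space and quotes, as tst.awk does.
            for (i = 1; i <= NF; i++) {
                gsub(/^[[:space:]]*"?|"?[[:space:]]*$/, "", $i)
            }
            # Report the field count and the cleaned-up 5th field.
            printf "%d|%s\n", NF, $5
        }
    ')
    printf '%s\n' "$parsed"
else
    echo "gawk not found; FPAT requires GNU awk" >&2
fi
```

With GNU awk this should report 5 fields and the complete film title, because the quoted alternative in the pattern lets the comma inside the quotes stay part of the field.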
$ awk -f tst.awk file | sort -t$'\t' -k1,1n -k2,2 | cut -f2-
Actor	Year	Age	Film
Emil Jannings	1928	44	The Last Command, The Way of All Flesh
George Arliss	1930	62	Disraeli
Lionel Barrymore	1931	53	A Free Soul
Warner Baxter	1929	41	In Old Arizona
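The 0/1 prefix in the pipeline above is a decorate-sort-undecorate step: the header gets key 0 and every data line gets key 1, so sort keeps the header first while ordering the rest by name, and cut then removes the key. Since the question asked for a function, here is a minimal sketch of that step wrapped in a shell function. The name sort_keep_header is hypothetical and not part of the answer, and the sketch assumes tab-separated input whose sort key is the first column.

```shell
# sort_keep_header (hypothetical name): sort tab-separated lines from stdin
# by their first column while keeping the first line (the header) on top.
sort_keep_header() {
    tab=$(printf '\t')    # a literal tab, without relying on bash's $'\t'
    # 1. Decorate: prefix the header with 0 and every other line with 1.
    # 2. Sort numerically on that prefix, then on the original first column.
    # 3. Undecorate: drop the prefix again.
    awk '{ printf "%d\t%s\n", (NR == 1 ? 0 : 1), $0 }' |
        sort -t "$tab" -k1,1n -k2,2 |
        cut -f2-
}

# Assumed sample input: the header plus two rows taken from the question.
sorted=$(printf 'Actor\tYear\nWarner Baxter\t1929\nEmil Jannings\t1928\n' | sort_keep_header)
printf '%s\n' "$sorted"
```

The same function could be placed at the end of a pipeline whose earlier stage already emits the columns in the desired order, so that stage would not need to print the 0/1 keys itself.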