How to change columns, remove quotes, and add tabulation to a text file using bash

I'm new to bash and to using awk in bash scripts. I have a file containing this text:

"Index", "Year", "Age", "Name", "Movie"
1, 1928, 44, "Emil Jannings", "The Last Command, The Way of All Flesh" 
2, 1929, 41, "Warner Baxter", "In Old Arizona"
3, 1930, 62, "George Arliss", "Disraeli"
4, 1931, 53, "Lionel Barrymore", "A Free Soul"

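I need to create a function that produces the output below: sorted by actor name, with some header names changed and the columns tabulated.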

Actor               Year    Age   Film
Emil Jannings       1928    44    The Last Command, The Way of All Flesh
George Arliss       1930    62    Disraeli
Lionel Barrymore    1931    53    A Free Soul
Warner Baxter       1929    41    In Old Arizona

How would you do it? I'm still a beginner and can't find the right way to get what I want.

Thanks

Answer 1

Using csvcut from the Python-based csvkit, together with Miller:

$ csvcut -S -c Name,Year,Age,Movie file.csv | 
     mlr --icsv --opprint sort -f Name then rename Name,Actor,Movie,Film
Actor            Year Age Film
Emil Jannings    1928 44  The Last Command, The Way of All Flesh
George Arliss    1930 62  Disraeli
Lionel Barrymore 1931 53  A Free Soul
Warner Baxter    1929 41  In Old Arizona

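I think Miller should be able to do this on its own, but it seems to parse the input incorrectly when the separator between quoted fields is not a single character (here it is a comma followed by a space).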
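
One possible workaround is to normalize the separator before handing the file to a CSV tool. The following is only a minimal sketch under that assumption: `normalize_csv` is a hypothetical helper name, not part of csvkit or Miller, and whether Miller then parses the result as intended has not been verified here. It drops the spaces that follow a comma outside double quotes and leaves quoted text untouched.

```shell
# normalize_csv (hypothetical helper): turn ", " separators into ","
# without touching commas or spaces inside double-quoted fields.
normalize_csv() {
    awk '
    {
        sub(/[ \t]+$/, "")              # drop trailing blanks on the line
        out = ""; inq = 0; skip = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") inq = !inq   # toggle the inside-quotes state
            if (skip && c == " ") continue   # space right after a separator
            skip = (!inq && c == ",")   # only unquoted commas are separators
            out = out c
        }
        print out
    }'
}
```

It reads standard input, so it could be used as `normalize_csv < file.csv` in front of the tool of your choice.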

Answer 2

Using GNU awk for FPAT:

$ cat tst.awk
BEGIN {
    in2outTag["Name"]  = "Actor"
    in2outTag["Movie"] = "Film"
    numOutFlds = split("Actor Year Age Film",outTags)
    # A field is a run of non-commas or a double-quoted string,
    # optionally surrounded by whitespace.
    FPAT = "\\s*(([^,]*)|(\"[^\"]+\"))\\s*"
    OFS = "\t"
}

{
    # Strip surrounding whitespace and quotes from every field.
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        gsub(/^[[:space:]]*"?|"?[[:space:]]*$/,"",$inFldNr)
    }
}

NR == 1 {
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        tag = ( $inFldNr in in2outTag ? in2outTag[$inFldNr] : $inFldNr )
        tag2inNr[tag] = inFldNr
    }

    # Prefix the header with sort key 0 so it stays first after sorting.
    printf "%d%s", 0, OFS
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        tag = outTags[outFldNr]
        out2inNr[outFldNr] = tag2inNr[tag]
        printf "%s%s", tag, (outFldNr < numOutFlds ? OFS : ORS)
    }
    next
}

{
    # Prefix data rows with sort key 1; cut -f2- removes the key afterwards.
    printf "%d%s", 1, OFS
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2inNr[outFldNr]
        val = $inFldNr
        printf "%s%s", val, (outFldNr < numOutFlds ? OFS : ORS)
    }
}

$ awk -f tst.awk file | sort -t$'\t' -k1,1n -k2,2 | cut -f2-
Actor   Year    Age     Film
Emil Jannings   1928    44      The Last Command, The Way of All Flesh
George Arliss   1930    62      Disraeli
Lionel Barrymore        1931    53      A Free Soul
Warner Baxter   1929    41      In Old Arizona
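
The output above is tab-separated, so the columns do not line up when names differ in length. If space-padded columns like those in the question are wanted, one option is to pad each column to its widest cell. This is a rough sketch, not part of the original answer: `align_tsv` is a hypothetical helper name, it uses only plain awk, and it measures width with `length()`, which may miscount multi-byte characters in some awk implementations.

```shell
# align_tsv (hypothetical helper): read tab-separated lines and pad every
# column except the last to its widest cell plus two spaces.
align_tsv() {
    awk -F'\t' '
    {
        nf[NR] = NF
        for (i = 1; i <= NF; i++) {
            cell[NR, i] = $i
            if (length($i) > width[i]) width[i] = length($i)
        }
    }
    END {
        for (r = 1; r <= NR; r++) {
            line = ""
            for (i = 1; i <= nf[r]; i++) {
                if (i < nf[r])
                    # Build the format dynamically, e.g. "%-18s".
                    line = line sprintf("%-" (width[i] + 2) "s", cell[r, i])
                else
                    line = line cell[r, i]
            }
            print line
        }
    }'
}
```

It would go at the end of the pipeline, for example `awk -f tst.awk file | sort -t$'\t' -k1,1n -k2,2 | cut -f2- | align_tsv`.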
