Move a specific column of a CSV file to the front (selecting the column by name)

Question 1

With the great Miller (see its documentation) this is very simple:

mlr --csv reorder -f " value, price,group" input.csv

This gives:

 value, price,group
 3.21, 3.21,1
 3.42, 4.11,1
 3.5, 1.22,1
 4.1, 9.2,2
 4.2, 2.11,2

Note: I have edited my command to account for the spaces in the CSV field names in the question.

Question 2

If you don't mind the value column being duplicated, you can do something like this with csvtool:

$ csvtool paste <(csvtool namedcol value example_file.txt) example_file.txt 
value,group,value,price
3.21,1,3.21,3.21
3.42,1,3.42,4.11
3.5,1,3.5,1.22
4.1,2,4.1,9.2
4.2,2,4.2,2.11

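But as far as I know, csvtool cannot move (or remove) a namedcol.

If you can't find a dedicated CSV tool, you can roll your own in a general-purpose language such as Awk or Perl. The idea is to search the fields of the first line for the index of the matching column, then slice and dice the fields into the desired order.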

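For example, using the Perl Text::CSV module, together with the trick from "How do I get the index of a specific element (value) in an array?":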

$ perl -MText::CSV -lpe '
  BEGIN{ $p = Text::CSV->new({ allow_whitespace => 1 }) };
  @f = $p->fields() if $p->parse($_);
  ($i) = grep { $f[$_] eq "value" } (0..$#f) if $. == 1; 
  $_ = join ", ", splice(@f, $i, 1),  @f
' example_file.txt
value, group, price
3.21, 1, 3.21
3.42, 1, 4.11
3.5, 1, 1.22
4.1, 2, 9.2
4.2, 2, 2.11
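
The same index-lookup idea can also be sketched in Awk, wrapped in a small shell function. This is only a minimal sketch: the function name move_col_first is made up for illustration, and it assumes simple CSV with no quoted fields that contain commas (for those, a real CSV parser such as Text::CSV is the safer route).

```shell
#!/bin/sh
# move_col_first LABEL [FILE...]
# Hypothetical helper: print the CSV input with the column named LABEL
# moved to the front. Assumes plain CSV without quoted, comma-bearing fields.
move_col_first() {
    label=$1
    shift
    awk -v label="$label" '
        BEGIN { FS = " *, *"; OFS = ", " }

        # Trim leading/trailing spaces; changing $0 re-splits the fields
        { gsub(/^ +| +$/, "") }

        # On the header line of each file, look up the index of the column
        FNR == 1 {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == label) col = i
            if (col == 0) exit 1    # no such column: give up
        }

        # Print the selected field first, then the rest in original order
        {
            out = $col
            for (i = 1; i <= NF; i++) if (i != col) out = out OFS $i
            print out
        }
    ' "$@"
}
```

Called as move_col_first value example_file.txt, this should print the value column first, followed by the remaining columns in their original order. It exits with a non-zero status if the header has no column with that name.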

Question 3

My suggestion is the following script:

#!/bin/bash

# Set a default value of the LABEL of the target column that must become first column
if [[ -z ${LABEL+x} ]]; then LABEL='value'; fi

# Process a single FILE
move_the_label_column_first() {
    # Read the LABELS on the first line of the input file as an array
    IFS=', ' read -r -a LABELS < <(head -n1 "$FILE" 2>/dev/null)

    # Find the number of the target column
    for ((COL = 0; COL < ${#LABELS[@]}; ++COL))
    do
        if [[ ${LABELS[$COL]} == "$LABEL" ]]
        then
            break
        fi
    done

    # Skip the file if no column carries the requested label
    if ((COL == ${#LABELS[@]}))
    then
        echo "Column '$LABEL' not found in $FILE" >&2
        return 1
    fi

    # Read each LINE from the input file as an array and output it in the new order
    while IFS=', ' read -r -a LINE
    do
        printf '%s, ' "${LINE[$COL]}" "${LINE[@]:0:$COL}" "${LINE[@]:$((COL + 1))}" | \
        sed 's/, $/\n/'
    done < "$FILE"
}

# Process all input files, exclude the current script filename
for FILE in "$@"
do
    if [[ -f $FILE ]] && [[ $FILE != $(basename "$0") ]]
    then
        #echo "Input file: $FILE"
        move_the_label_column_first
    fi
done

Let's name the script reorder.sh. To illustrate what it does, assume we have the following files to process, located in the same directory as the script.

$ cat in-file-1 
group, value, price
1, 3.21, 3.21
1, 3.42, 4.11
1, 3.5, 1.22

$ cat in-file-2
price, group, value, other
3.21, 1, 3.21, 7
4.11, 1, 3.42, 13
1.22, 1, 3.5, -1

Process a single input file:

$ ./reorder.sh in-file-1 
value, group, price
3.21, 1, 3.21
3.42, 1, 4.11
3.5, 1, 1.22

Process two input files, and change the label of the column that must become the first column to price:

$ LABEL='price' ./reorder.sh in-file-1 in-file-2 
price, group, value
3.21, 1, 3.21
4.11, 1, 3.42
1.22, 1, 3.5
price, group, value, other
3.21, 1, 3.21, 7
4.11, 1, 3.42, 13
1.22, 1, 3.5, -1

Process all files in the directory:

$ ./reorder.sh *
value, group, price
3.21, 1, 3.21
3.42, 1, 4.11
3.5, 1, 1.22
value, price, group, other
3.21, 3.21, 1, 7
3.42, 4.11, 1, 13
3.5, 1.22, 1, -1

Process recursively:

$ shopt -s globstar
$ ./reorder.sh **/*
value, group, price
3.21, 1, 3.21
...

Answer