Extracting columns from comma-separated text

Answer 1

awk -F , -v OFS='\t' 'NR == 1 || $6 > 4 {print $1, $6, $7, $8}' input.txt
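The one-liner above comes with no explanation, so here is a minimal sketch of what each part does. The sample file is hypothetical (the thread does not show the real input.txt); the column names c1 to c8 and the row values are assumptions made only for illustration.

```shell
# Hypothetical 8-column sample; the real input.txt is not shown in the thread.
input=$(mktemp)
cat > "$input" <<'EOF'
c1,c2,c3,c4,c5,c6,c7,c8
r1,id1,b,c,d,3.5,10,20
r2,id2,b,c,d,4.2,11,21
EOF

# -F ,         split each input line on commas
# -v OFS='\t'  separate printed fields with a tab
# NR == 1      always keep the first (header) line
# $6 > 4       keep rows whose sixth field is numerically greater than 4
result=$(awk -F , -v OFS='\t' 'NR == 1 || $6 > 4 {print $1, $6, $7, $8}' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

With this sample, only the header and the r2 row are printed, because 3.5 is not greater than 4.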

Answer 2

I agree that awk is the best solution. You can also do this in bash with just a few other tools:

cut -d , -f 2,6,7,8 filename | {
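    # read the header line as-is (no backslash processing, no whitespace trimming)
    IFS= read -r header
    tr , $'\t' <<< "$header"
    while IFS=, read -r id num4 num5 num6; do
        # bash can only do integer arithmetic, so let bc do the floating-point comparison
        if [[ $(bc <<< "$num4 >= 4.0") = 1 ]]; then
            printf "%s\t%s\t%s\t%s\n" "$id" "$num4" "$num5" "$num6"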
        fi
    done
}

Answer 3

It is hard to beat the awk script above, but here is a Ruby solution:

#!/usr/bin/env ruby

puts "id\tnumber4\tnumber5\tnumber6"

ARGF.each_line do |line|
  arr = line.split(',')
  puts "#{arr[1]}\t#{arr[5]}\t#{arr[6]}\t#{arr[7]}" if arr[5].to_f > 4.0
end

To use the script, invoke it with a filename or pipe the file into it.

Answer 4

Perl solution:

perl -F, -le '$, = "\t"; print @F[1,5,6,7] if $F[5] > 4 || $. == 1' file

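-F, specifies the pattern to split on (here, a comma). -F implicitly sets -a (as of Perl 5.20).

-a turns on autosplit mode when used with -n. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by -n. -a implicitly sets -n (as of Perl 5.20).

-n causes Perl to assume a loop around the program, which makes it iterate over the filename arguments, somewhat like sed -n or awk.

-l enables automatic line-ending processing. It has two separate effects. First, it automatically chomps the input record separator (\n). Second, it assigns \n to the output record separator.

-e is used to enter one line of program.

So perl -F, -le '$, = "\t"; print @F[1,5,6,7] if $F[5] > 4 || $. == 1' does something like this: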

use English;

$OUTPUT_RECORD_SEPARATOR = $INPUT_RECORD_SEPARATOR;

while (<>) { # iterate over each line of each file
    chomp;
    @F = split(',');
    $OUTPUT_FIELD_SEPARATOR = "\t";
    print @F[1,5,6,7] if $F[5] > 4 || $INPUT_LINE_NUMBER == 1;
}
