How to create a script that writes a .csv file

Answer 1

Taking the same approach as bu5hman, i.e. assuming that the sample ID is the part of the filename before the first dot:
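The "part before the first dot" rule maps onto the POSIX parameter expansion `${name%%.*}`, which removes the longest suffix matching `.*`. The single-`%` form removes only the shortest such suffix. A small sketch comparing the two on one of the file names from the test listing:

```shell
# Example file name taken from the test listing.
fastq=sample-1.R1.fq

# %% strips the longest suffix matching ".*", i.e. everything from the first dot.
printf '%s\n' "${fastq%%.*}"   # prints: sample-1

# % strips the shortest matching suffix, i.e. only the last extension.
printf '%s\n' "${fastq%.*}"    # prints: sample-1.R1
```

This is also why the assumption matters: a sample name that itself contains a dot (for example a hypothetical `my.sample.R1.fq`) would yield the ID `my`.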

#!/bin/sh

csv_print_row () {
    # Outputs a CSV-formatted row of an arbitrary number of fields.
    # Will quote fields containing commas. That's all.

    for field do
        case $field in
            *,*) set -- "$@" "\"$field\"" ;;
            *)   set -- "$@" "$field"
        esac
        shift
    done

    # The fields are now (possibly quoted) in the list of positional parameters.
    # Print this list as a comma-delimited string:
    ( IFS=,; printf "%s\n" "$*" )
}

# Output header
csv_print_row "sample_id" "absolute-filepath" "direction"

# Loop over the *.fq files in the current directory
for fastq in *.fq; do
    # The sample ID is the filename up to the first dot.
    sample_id=${fastq%%.*}

    # Figure out the direction of the sample
    case $fastq in
        *.R1.*) dir=forward ;;
        *.R2.*) dir=reverse ;;
        *)      dir=unknown
    esac

    # Output row for this sample
    csv_print_row "$sample_id" "$PWD/$fastq" "$dir"
done
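The last line of `csv_print_row` relies on `"$*"` expanding to all positional parameters joined by the first character of `IFS`. Running it in a `( ... )` subshell keeps the modified `IFS` from leaking into the rest of the script. A minimal sketch isolating that step; the function name `join_with_commas` and the path are made up for illustration:

```shell
# Hypothetical name; the body is the same subshell trick csv_print_row uses.
join_with_commas () {
    # IFS=, only applies inside the ( ... ) subshell, where "$*" joins the
    # arguments using the first character of IFS.
    ( IFS=,; printf '%s\n' "$*" )
}

saved_ifs=$IFS
join_with_commas sample-1 /tmp/sample-1.R1.fq forward
# prints: sample-1,/tmp/sample-1.R1.fq,forward

# The caller's IFS is untouched because the assignment happened in a subshell.
[ "$IFS" = "$saved_ifs" ] && echo "IFS unchanged"
```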

Testing:

$ ls -l
total 4
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-1.R1.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-1.R2.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-2.R1.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-2.R2.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-3.R1.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-3.R2.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-4.R1.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:01 sample-4.R2.fq
-rw-r--r--  1 kk  wheel  629 Mar 13 18:00 script.sh
-rw-r--r--  1 kk  wheel    0 Mar 13 18:02 strange, sample.R1.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:02 strange, sample.R2.fq
-rw-r--r--  1 kk  wheel    0 Mar 13 18:02 strange, sample.R3.fq

$ sh script.sh
sample_id,absolute-filepath,direction
sample-1,/tmp/shell-yash.zm5cvzG6/sample-1.R1.fq,forward
sample-1,/tmp/shell-yash.zm5cvzG6/sample-1.R2.fq,reverse
sample-2,/tmp/shell-yash.zm5cvzG6/sample-2.R1.fq,forward
sample-2,/tmp/shell-yash.zm5cvzG6/sample-2.R2.fq,reverse
sample-3,/tmp/shell-yash.zm5cvzG6/sample-3.R1.fq,forward
sample-3,/tmp/shell-yash.zm5cvzG6/sample-3.R2.fq,reverse
sample-4,/tmp/shell-yash.zm5cvzG6/sample-4.R1.fq,forward
sample-4,/tmp/shell-yash.zm5cvzG6/sample-4.R2.fq,reverse
"strange, sample","/tmp/shell-yash.zm5cvzG6/strange, sample.R1.fq",forward
"strange, sample","/tmp/shell-yash.zm5cvzG6/strange, sample.R2.fq",reverse
"strange, sample","/tmp/shell-yash.zm5cvzG6/strange, sample.R3.fq",unknown

Creating the manifest:

sh script.sh >manifest-file.csv

Note that this will produce invalid CSV output if any filename contains a double quote.
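To see the failure concretely, here is the comma-only `csv_print_row` from the script above applied to two made-up field values: one containing a double quote but no comma (left unquoted, which RFC 4180 does not allow) and one containing both a comma and double quotes (wrapped in quotes, but with the inner quotes not doubled).

```shell
# Comma-only version of csv_print_row, copied from the script above.
csv_print_row () {
    for field do
        case $field in
            *,*) set -- "$@" "\"$field\"" ;;
            *)   set -- "$@" "$field"
        esac
        shift
    done
    ( IFS=,; printf "%s\n" "$*" )
}

# A double quote without a comma: the field is left unquoted.
csv_print_row 'say "hi"' plain
# prints: say "hi",plain

# A comma plus double quotes: the field is wrapped, but "b" is not escaped.
csv_print_row 'a,"b"'
# prints: "a,"b""
```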

To properly handle quoting of fields that contain double quotes, you would have to use something like:

csv_print_row () {
    # Outputs a CSV-formatted row of an arbitrary number of fields.

    # Quote fields that need quoting
    for field do
        case $field in
            *[,\"]*) set -- "$@" "\"$field\"" ;;
            *)       set -- "$@" "$field"
        esac
        shift
    done

    # Double up internal double quotes in fields that have been quoted
    for field do
        case $field in
            '"'*'"'*'"')
                field=$( printf '%s\n' "$field" | sed 's/"/""/g' )
                # Now remove the extra quote at the start and end
                field=${field%\"}
                field=${field#\"}
        esac
        set -- "$@" "$field"
        shift
    done

    ( IFS=,; printf "%s\n" "$*" )
}
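A usage sketch of this improved version with made-up field values: a field with embedded double quotes gets them doubled and is wrapped, a field with only a comma is wrapped, and a plain field is passed through unchanged.

```shell
csv_print_row () {
    # Quote fields that need quoting (comma or double quote).
    for field do
        case $field in
            *[,\"]*) set -- "$@" "\"$field\"" ;;
            *)       set -- "$@" "$field"
        esac
        shift
    done

    # Double up internal double quotes in fields that have been quoted.
    for field do
        case $field in
            '"'*'"'*'"')
                field=$( printf '%s\n' "$field" | sed 's/"/""/g' )
                # Remove the extra quote that sed added at each end.
                field=${field%\"}
                field=${field#\"}
        esac
        set -- "$@" "$field"
        shift
    done

    ( IFS=,; printf "%s\n" "$*" )
}

csv_print_row 'say "hi"' 'a,b' plain
# prints: "say ""hi""","a,b",plain

csv_print_row '"'
# prints: """"
```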

This still does not do the right thing for fields that contain newlines, but handling that would take us beyond the scope of this question.
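If newline-containing fields do need handling, one possible direction is sketched below, assuming a POSIX sh. The hypothetical helper `csv_quote_field` quotes a field when it contains a comma, a double quote or a newline, and doubles inner double quotes with parameter expansion instead of `sed`, so no command substitution can strip trailing newlines from the field. Carriage returns are not treated specially, and `csv_print_row_nl` is likewise a made-up name.

```shell
# Hypothetical helper: print one field, quoting it per RFC 4180 when it
# contains a comma, a double quote or a newline. POSIX sh has no "local",
# so nl, val, out and rest are ordinary global variables.
csv_quote_field () {
    nl='
'
    val=$1
    case $val in
        *[,\"]* | *"$nl"*) ;;              # needs quoting: handled below
        *) printf '%s' "$val"; return ;;   # safe to print as-is
    esac

    # Build the quoted text, replacing each " with "".
    out=
    rest=$val
    while :; do
        case $rest in
            *\"*)
                out=$out${rest%%\"*}\"\"   # text before the first ", then ""
                rest=${rest#*\"}           # carry on after that "
                ;;
            *)
                out=$out$rest              # no quotes left
                break
                ;;
        esac
    done
    printf '"%s"' "$out"
}

# Hypothetical row printer: fields separated by commas, one newline at the end.
csv_print_row_nl () {
    sep=
    for field do
        printf '%s' "$sep"
        csv_quote_field "$field"
        sep=,
    done
    printf '\n'
}

csv_print_row_nl 'line one
line two' 'say "hi"' plain
# prints:
# "line one
# line two","say ""hi""",plain
```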

See also:

RFC 4180

Answer 2

Is this close to what you are looking for?

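echo "sample-id,absolute-filepath,direction" > manifest
for f in *.fq; do
  dir="forward"
  # Pull out the digit after ".R" (e.g. 2 in sample-1.R2.fq); -P needs GNU grep
  g=$(printf '%s\n' "$f" | grep -Po '(?<=\.R)[0-9](?=\.fq)')
  # String comparison avoids a "[" error when $g is empty
  if [ "$g" = 2 ]; then
    dir="reverse"
  fi
  printf '%s\n' "${f%%.*},$PWD/$f,$dir"
done >> manifest
cat manifest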

This assumes that there are only R1 and R2 files and that you run it from the directory that contains them.
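The loop above labels every file that is not R2 as forward, so an R3 file, or a name without an `.R<n>` part, would be silently mislabelled. A sketch of a pre-check under the same assumptions (run from the directory holding the .fq files) that lists any file breaking the R1/R2 assumption on standard error; it uses a portable `case` statement rather than GNU `grep -P`, and the function name `check_r1_r2` is hypothetical.

```shell
# List .fq files whose names are not of the form *.R1.fq or *.R2.fq.
# Returns non-zero if any such file is found.
check_r1_r2 () {
    status=0
    for f in *.fq; do
        [ -e "$f" ] || continue          # no .fq files: the glob stays literal
        case $f in
            *.R1.fq | *.R2.fq) ;;
            *)
                printf 'unexpected file name: %s\n' "$f" >&2
                status=1
                ;;
        esac
    done
    return "$status"
}

check_r1_r2 || echo "fix or remove the files listed above before building the manifest" >&2
```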
