从 CSV 创建具有不同字段数量的变量

Question 1

在此脚本中，该行仅被读入默认变量$REPLY。然后用空格替换逗号${REPLY//,/ }并放入数组中declare -a COL=()。然后使用循环处理部分部分，其中使用以下方法计算列索引$((idx+2))：

#! /bin/bash
while read; do
    declare -a COL=( ${REPLY//,/ } )
    echo -e "container=${COL[0]}\nrow=${COL[1]}\nshelf=${COL[2]}"
    idx=1
    while [ $idx -lt 10 ]; do
        echo "section$idx=${COL[$((idx+2))]}"
        let idx=idx+1
    done
done

Answer

在此脚本中，该行仅被读入默认变量$REPLY。然后用空格替换逗号${REPLY//,/ }并放入数组中declare -a COL=()。然后使用循环处理部分部分，其中使用以下方法计算列索引$((idx+2))：

#! /bin/bash
while read; do
    declare -a COL=( ${REPLY//,/ } )
    echo -e "container=${COL[0]}\nrow=${COL[1]}\nshelf=${COL[2]}"
    idx=1
    while [ $idx -lt 10 ]; do
        echo "section$idx=${COL[$((idx+2))]}"
        let idx=idx+1
    done
done

Question 2

我会为每个 csv 记录使用一个关联数组：假设您的数据位于名为的文件中input.csv

#!/usr/bin/env bash

counter=1          # provides index for each csv record
while read 
do
    IFS=',' a=( $REPLY )               # numeric array containing current row
    eval "declare -A row$counter"      # declare an assoc. array representing
                                       # this row   

    eval "row$counter+=( ['row']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['shelf']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section1']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section2']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section3']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section4']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section5']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section6']=${a[0]} )"
    a=( "${a[@]:1}" )

    declare -p row$counter

    (( counter = counter + 1 ))
done < <( cat input.csv )

# access arbitrary element
printf "\n---------\n%s\n" ${row3["section4"]}

这给了我这样的输出：

declare -A row1='([section6]="6" [section5]="5" [section4]="4" [section3]="4" [section2]="2" [section1]="1" [shelf]="12" [row]="PL3" )'
declare -A row2='([section6]="" [section5]="" [section4]="" [section3]="2" [section2]="1" [section1]="4" [shelf]="13" [row]="PL4" )'
declare -A row3='([section6]="" [section5]="" [section4]="3" [section3]="2" [section2]="1" [section1]="5" [shelf]="14" [row]="PL5" )'
declare -A row4='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="6" [shelf]="15" [row]="PL6" )'
declare -A row5='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="7" [shelf]="16" [row]="PL7" )'
declare -A row6='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="8" [shelf]="15" [row]="PL8" )'
declare -A row7='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="7" [shelf]="16" [row]="PL9" )'

---------
3

Answer

我会为每个 csv 记录使用一个关联数组：假设您的数据位于名为的文件中input.csv

#!/usr/bin/env bash

counter=1          # provides index for each csv record
while read 
do
    IFS=',' a=( $REPLY )               # numeric array containing current row
    eval "declare -A row$counter"      # declare an assoc. array representing
                                       # this row   

    eval "row$counter+=( ['row']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['shelf']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section1']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section2']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section3']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section4']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section5']=${a[0]} )"
    a=( "${a[@]:1}" )
    eval "row$counter+=( ['section6']=${a[0]} )"
    a=( "${a[@]:1}" )

    declare -p row$counter

    (( counter = counter + 1 ))
done < <( cat input.csv )

# access arbitrary element
printf "\n---------\n%s\n" ${row3["section4"]}

这给了我这样的输出：

declare -A row1='([section6]="6" [section5]="5" [section4]="4" [section3]="4" [section2]="2" [section1]="1" [shelf]="12" [row]="PL3" )'
declare -A row2='([section6]="" [section5]="" [section4]="" [section3]="2" [section2]="1" [section1]="4" [shelf]="13" [row]="PL4" )'
declare -A row3='([section6]="" [section5]="" [section4]="3" [section3]="2" [section2]="1" [section1]="5" [shelf]="14" [row]="PL5" )'
declare -A row4='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="6" [shelf]="15" [row]="PL6" )'
declare -A row5='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="7" [shelf]="16" [row]="PL7" )'
declare -A row6='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="8" [shelf]="15" [row]="PL8" )'
declare -A row7='([section6]="5" [section5]="4" [section4]="3" [section3]="2" [section2]="1" [section1]="7" [shelf]="16" [row]="PL9" )'

---------
3

Question 3

我会这样开始：

while IFS=, read -ra fields; do
    for (( i = ${#fields[@]} - 1; i >= 0; i-- )); do
        [[ -z "${fields[i]}" ]] && unset fields[i] || break
    done
    declare -p fields
done < file

declare -a fields='([0]="PL3" [1]="12" [2]="3" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6")'
declare -a fields='([0]="PL4" [1]="13" [2]="4" [3]="1" [4]="2")'
declare -a fields='([0]="PL5" [1]="14" [2]="5" [3]="1" [4]="2" [5]="3")'
declare -a fields='([0]="PL6" [1]="15" [2]="6" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8")'
declare -a fields='([0]="PL7" [1]="16" [2]="7" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8" [11]="9")'
declare -a fields='([0]="PL8" [1]="15" [2]="8" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8")'
declare -a fields='([0]="PL9" [1]="16" [2]="7" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8" [11]="9")'

确保文件中没有任何尾随空格。

我质疑你是否需要有数字递增的变量名称。听起来你需要二维数组，这是 bash 没有的数据结构。您确定 bash 是适合这项工作的工具吗？

Answer

我会这样开始：

while IFS=, read -ra fields; do
    for (( i = ${#fields[@]} - 1; i >= 0; i-- )); do
        [[ -z "${fields[i]}" ]] && unset fields[i] || break
    done
    declare -p fields
done < file

declare -a fields='([0]="PL3" [1]="12" [2]="3" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6")'
declare -a fields='([0]="PL4" [1]="13" [2]="4" [3]="1" [4]="2")'
declare -a fields='([0]="PL5" [1]="14" [2]="5" [3]="1" [4]="2" [5]="3")'
declare -a fields='([0]="PL6" [1]="15" [2]="6" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8")'
declare -a fields='([0]="PL7" [1]="16" [2]="7" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8" [11]="9")'
declare -a fields='([0]="PL8" [1]="15" [2]="8" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8")'
declare -a fields='([0]="PL9" [1]="16" [2]="7" [3]="1" [4]="2" [5]="3" [6]="4" [7]="5" [8]="6" [9]="7" [10]="8" [11]="9")'

确保文件中没有任何尾随空格。

我质疑你是否需要有数字递增的变量名称。听起来你需要二维数组，这是 bash 没有的数据结构。您确定 bash 是适合这项工作的工具吗？

Question 4

假设数据采用“简单”CSV 格式（意味着不需要特殊的 CSV 引用字段），我们可以相对轻松地将无标头 CSV 数据转换为结构化 JSON 文件。以下代码创建一组独立的 JSON 对象，每个对象对应 CSV 文件中的每一行输入：

$ jq -R 'split(",") | {container:.[0], typeA:.[1], typeB:.[2], typeC:.[3:]}' file file.csv
{
  "container": "PL3",
  "typeA": "12.1.4.5-77",
  "typeB": "13.6.4.5-20",
  "typeC": [
    "17.3.577.9-29",
    "17.3.779.12-33",
    "17.3.802.12-60",
    "17.3.917.12-45",
    "17.3.956.12-63",
    "17.3.993.12-42"
  ]
}
{
  "container": "PL4",
  "typeA": "12.1.4.5-78",
  "typeB": "13.6.4.5-21",
  "typeC": [
    "17.3.577.9-30",
    "17.3.779.12-34"
  ]
}
[...] # output truncated for brevity

假设此 JSON 数据存储在中file.json，我们可以通过不同的方式查询它：

$ jq -r --arg container PL7 --arg type typeA 'select(.container==$container)[$type]' file.json
12.1.4.5-81

$ jq -r --arg container PL8 --arg type typeB 'select(.container==$container)[$type]' file.json
13.6.4.5-25

$ jq -r --arg container PL6 --arg type typeC 'select(.container==$container)[$type][]' file.json
17.3.577.9-32
17.3.779.12-36
17.3.802.12-63
17.3.917.12-48
17.3.956.12-66

（请注意，我[]在上面表达式的末尾添加了将数组扩展为单独的元素。）

$ jq -r --arg container PL3 --arg type typeC --arg sub 60 'select(.container==$container)[$type][] | select(endswith("-"+$sub))' file.json
17.3.802.12-60

Answer

假设数据采用“简单”CSV 格式（意味着不需要特殊的 CSV 引用字段），我们可以相对轻松地将无标头 CSV 数据转换为结构化 JSON 文件。以下代码创建一组独立的 JSON 对象，每个对象对应 CSV 文件中的每一行输入：

$ jq -R 'split(",") | {container:.[0], typeA:.[1], typeB:.[2], typeC:.[3:]}' file file.csv
{
  "container": "PL3",
  "typeA": "12.1.4.5-77",
  "typeB": "13.6.4.5-20",
  "typeC": [
    "17.3.577.9-29",
    "17.3.779.12-33",
    "17.3.802.12-60",
    "17.3.917.12-45",
    "17.3.956.12-63",
    "17.3.993.12-42"
  ]
}
{
  "container": "PL4",
  "typeA": "12.1.4.5-78",
  "typeB": "13.6.4.5-21",
  "typeC": [
    "17.3.577.9-30",
    "17.3.779.12-34"
  ]
}
[...] # output truncated for brevity

假设此 JSON 数据存储在中file.json，我们可以通过不同的方式查询它：

$ jq -r --arg container PL7 --arg type typeA 'select(.container==$container)[$type]' file.json
12.1.4.5-81

$ jq -r --arg container PL8 --arg type typeB 'select(.container==$container)[$type]' file.json
13.6.4.5-25

$ jq -r --arg container PL6 --arg type typeC 'select(.container==$container)[$type][]' file.json
17.3.577.9-32
17.3.779.12-36
17.3.802.12-63
17.3.917.12-48
17.3.956.12-66

（请注意，我[]在上面表达式的末尾添加了将数组扩展为单独的元素。）

$ jq -r --arg container PL3 --arg type typeC --arg sub 60 'select(.container==$container)[$type][] | select(endswith("-"+$sub))' file.json
17.3.802.12-60

从 CSV 创建具有不同字段数量的变量

答案1

答案2

答案3

答案4

相关内容