Converting tab-separated values to YAML

Question 1

Here is one way to do it:

$ cat inf
your-email  your-order-id   PayPal-transaction-id   your-first-name your-second-name
[email protected]   12345   54321   sooky   spooky
[email protected]   23456   23456   kiki    dee
[email protected] 34567   76543   cheeky  chappy
$ cat mkf.sh
awk -F'\t' '
BEGIN {
  print "---\n"
}
NR == 1 {
  nc = NF
  for(c = 1; c <= NF; c++) {
    h[c] = $c
  }
}
NR > 1 {
  for(c = 1; c <= nc; c++) {
    printf "%s: %s\n", h[c], $c
  }
  print ""
}' "$@"
$ ./mkf.sh inf
---

your-email: [email protected]
your-order-id: 12345
PayPal-transaction-id: 54321
your-first-name: sooky
your-second-name: spooky

your-email: [email protected]
your-order-id: 23456
PayPal-transaction-id: 23456
your-first-name: kiki
your-second-name: dee

your-email: [email protected]
your-order-id: 34567
PayPal-transaction-id: 76543
your-first-name: cheeky
your-second-name: chappy

Question 2

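Have you tried defining an integer counter set to zero in the BEGIN block and running an if/else statement? If "iter==0", save the field names into the elements of an array and increment the counter; otherwise, print the records as you have already written (except using the loop index i to print each field). (See more on awk arrays.)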

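I have not tested this code at all (and I am generally terrible at awk), but it should serve as a concrete illustration of the general programming/scripting concept: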

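#!/usr/bin/awk -f
BEGIN {
   FS = "\t"      # fields are tab-separated
   print "---"
   iter = 0
}
{
   if (iter == 0) {
      # First line: remember the field names
      for (i = 1; i <= NF; i++)
         newArr[i] = $i
      iter++
   } else {
      # Later lines: print each value next to its field name
      for (i = 1; i <= NF; i++)
         print newArr[i] ": " $i
   }
}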

Question 3

I'm sure this can be done in awk, but if a Perl answer is acceptable, this should do what you need:

#!/usr/bin/env perl
print "---\n";
while (<>) {
    chomp;
    ## This splits the line at one or more tab characters
    ## into the array @fields.
    @fields=split(/\t+/);
    ## Get the column names if this is the 1st line
    if ($.==1){@cols=@fields}
    ## Print the data if it is not the first line
    else {
      print "\n";
      for ($i=0;$i<=$#fields;$i++){
        print "$cols[$i]: $fields[$i]\n";
      }
    }
}

For example:

$ ./foo.pl input_text.txt
---

your-email: [email protected]
your-order-id: 12345
PayPal-transaction-id: 54321
your-first-name: sooky
your-second-name: spooky

your-email: [email protected]
your-order-id: 23456
PayPal-transaction-id: 23456
your-first-name: kiki
your-second-name: dee

your-email: [email protected]
your-order-id: 34567
PayPal-transaction-id: 76543
your-first-name: cheeky
your-second-name: chappy

This can be condensed into a one-liner using Perl's -a option, which splits each line into the array @F:

echo "---"; perl -F'\t' -lane 'if ($.==1){@c=@F}else{
 print "";for ($i=0;$i<=$#F;$i++){print "$c[$i]: $F[$i]"}}' input_text.txt

Question 4

csvjson -t file | yq -y .

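Assuming the fields of the original data are tab-delimited, this uses csvjson (from the csvkit toolkit) to convert the data to JSON. The yq parser (from https://kislyuk.github.io/yq/) is then used to transcode the JSON to YAML.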

Given the data in the question, the final output will be the YAML document

- your-email: [email protected]
  your-order-id: 12345
  PayPal-transaction-id: 54321
  your-first-name: sooky
  your-second-name: spooky
- your-email: [email protected]
  your-order-id: 23456
  PayPal-transaction-id: 23456
  your-first-name: kiki
  your-second-name: dee
- your-email: [email protected]
  your-order-id: 34567
  PayPal-transaction-id: 76543
  your-first-name: cheeky
  your-second-name: chappy

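I note that the expected output in the question makes little sense, as it is a single section with multiple duplicated keys (each key's value is overwritten by later instances of the same key). I have therefore chosen to ignore it and instead produce a document without duplicate keys (the document above contains a list of three objects).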

Instead of csvjson -t file, you can use

mlr --itsv --ojson --jlistwrap cat file

...which uses Miller (mlr, https://miller.readthedocs.io/en/latest/) to convert the tab-delimited input to JSON.

Instead of yq -y ., you can use

yj -jy

...which uses yj from https://github.com/sclevine/yj to convert the JSON to YAML.

Any combination of the four tools mentioned for the TSV-->JSON and JSON-->YAML transcoding will ultimately give you the same (or equivalent) result.
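
If none of these tools is installed, the list-of-objects layout can also be approximated with plain awk. This is only a rough sketch, not a replacement for a real YAML emitter: the function name tsv2yaml_list is made up for illustration, and it assumes the first line holds the column names and that no value needs YAML quoting (no embedded ": ", leading "-", "#", and so on).

```shell
#!/bin/sh
# tsv2yaml_list: hypothetical helper, not part of any of the tools above.
# Reads tab-separated input (files given as arguments, or stdin) and prints
# each data row as one item of a YAML list.
# Assumption: values are plain scalars that need no quoting or escaping.
tsv2yaml_list() {
  awk -F'\t' '
    NR == 1 {
      # Header line: remember the column names and how many there are.
      nc = NF
      for (c = 1; c <= NF; c++) h[c] = $c
      next
    }
    {
      for (c = 1; c <= nc; c++) {
        # The first key of a row opens a new list item ("- ");
        # the remaining keys are indented to line up under it.
        printf "%s%s: %s\n", (c == 1 ? "- " : "  "), h[c], $c
      }
    }
  ' "$@"
}
```

Once the function is defined, it is called as tsv2yaml_list file. Prefixing the first key of each row with "- " is what turns the repeated keys into separate list items rather than one mapping with duplicates.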
