合并具有一个公共字段的多行数据

Question 1

也许还有更优雅的方式awk，但这里是一个可能的脚本。

BEGIN { FS=";" ; OFS=";" }
NR==1 { print $0 }
NR>1 {
    if ( b[$5]=="" ) {
        a[$5]=$0
        b[$5]=$3
    }
    else {
        b[$5]=b[$5]","$3
        $3=b[$5]
        a[$5]=$0
    }
}
END {
    for (c in a) {
        print a[c]
    }
}

解释：

BEGIN设置分号作为输入和输出字段分隔符
NR==1只需打印第一行（标题），无需执行任何操作
NR>1对于其他线路：
- b[$5]是一个由字段 5 值索引的数组，包含字段 3 条目的（不断增长的）逗号分隔列表
- a[$5]是一个由字段 5 值索引的数组，包含修改的行（即包含以逗号分隔的字段 3 值）
- 如果b[$5]未设置（该值第一次出现），则设置a[$5]为行和b[$5]字段 3
- 否则（b[$5]已设置），将带有逗号分隔符的字段 3 添加到b[$5]，将此行中的字段 3 替换为此，然后替换a[$5]为此更改的行
- ENDc对于数组的所有索引值a打印数组元素（即所需的行）

我真的不知道如何awk对输出进行排序，但这是我的结果：

Domain Name;ID;Machine;Environment;ENV URL;Start Date;End Date;Disk Size;Used
Upgrade.uk.localhost.com;IN022345;Machine-apache-pf01,Machine-apache-pf02;per;per.Upgrade.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-dev1,Machine-apache-dev2;dev;dev.matrix.localhost.com;16 April 2013 06:32:28 GMT+01:00;16 April 2018 07:02:28 GMT+01:00;1024;External
matrix.localhost.com;XY6124;Machine-apache-uat1,Machine-apache-uat2;uat;uat.matrix.localhost.com;22 March 2013 06:16:10 GMT;22 March 2018 06:46:10 GMT;1024;External
orion.uk.localhost.com;XY01123;Machine-apache-ua01,Machine-apache-ua02;uat;uat.orion.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-bcp1,Machine-apache-prd1;test;test.matrix.localhost.com;2 April 2013 08:12:10 GMT+01:00;2 April 2018 08:42:10 GMT+01:00;1024;External

Answer

也许还有更优雅的方式awk，但这里是一个可能的脚本。

BEGIN { FS=";" ; OFS=";" }
NR==1 { print $0 }
NR>1 {
    if ( b[$5]=="" ) {
        a[$5]=$0
        b[$5]=$3
    }
    else {
        b[$5]=b[$5]","$3
        $3=b[$5]
        a[$5]=$0
    }
}
END {
    for (c in a) {
        print a[c]
    }
}

解释：

BEGIN设置分号作为输入和输出字段分隔符
NR==1只需打印第一行（标题），无需执行任何操作
NR>1对于其他线路：
- b[$5]是一个由字段 5 值索引的数组，包含字段 3 条目的（不断增长的）逗号分隔列表
- a[$5]是一个由字段 5 值索引的数组，包含修改的行（即包含以逗号分隔的字段 3 值）
- 如果b[$5]未设置（该值第一次出现），则设置a[$5]为行和b[$5]字段 3
- 否则（b[$5]已设置），将带有逗号分隔符的字段 3 添加到b[$5]，将此行中的字段 3 替换为此，然后替换a[$5]为此更改的行
- ENDc对于数组的所有索引值a打印数组元素（即所需的行）

我真的不知道如何awk对输出进行排序，但这是我的结果：

Domain Name;ID;Machine;Environment;ENV URL;Start Date;End Date;Disk Size;Used
Upgrade.uk.localhost.com;IN022345;Machine-apache-pf01,Machine-apache-pf02;per;per.Upgrade.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-dev1,Machine-apache-dev2;dev;dev.matrix.localhost.com;16 April 2013 06:32:28 GMT+01:00;16 April 2018 07:02:28 GMT+01:00;1024;External
matrix.localhost.com;XY6124;Machine-apache-uat1,Machine-apache-uat2;uat;uat.matrix.localhost.com;22 March 2013 06:16:10 GMT;22 March 2018 06:46:10 GMT;1024;External
orion.uk.localhost.com;XY01123;Machine-apache-ua01,Machine-apache-ua02;uat;uat.orion.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-bcp1,Machine-apache-prd1;test;test.matrix.localhost.com;2 April 2013 08:12:10 GMT+01:00;2 April 2018 08:42:10 GMT+01:00;1024;External

Question 2

你sqlite那里有吗？我对如何加入线路的了解正确吗？

sqlite> .separator ;
sqlite> .import file.txt alldata
sqlite> select "ENV URL", group_concat("Machine") from alldata group by "ENV URL";
dev.matrix.localhost.com;Machine-apache-dev1,Machine-apache-dev2
per.Upgrade.uk.localhost.com;Machine-apache-pf01,Machine-apache-pf02
test.matrix.localhost.com;Machine-apache-bcp1,Machine-apache-prd1
uat.matrix.localhost.com;Machine-apache-uat1,Machine-apache-uat2
uat.orion.uk.localhost.com;Machine-apache-ua01,Machine-apache-ua02

或者非交互式：

echo 'select "ENV URL", group_concat("Machine") from alldata group by "ENV URL";' \
  | sqlite3 -separator ";" -cmd ".import file.txt alldata" -batch

Answer

你sqlite那里有吗？我对如何加入线路的了解正确吗？

sqlite> .separator ;
sqlite> .import file.txt alldata
sqlite> select "ENV URL", group_concat("Machine") from alldata group by "ENV URL";
dev.matrix.localhost.com;Machine-apache-dev1,Machine-apache-dev2
per.Upgrade.uk.localhost.com;Machine-apache-pf01,Machine-apache-pf02
test.matrix.localhost.com;Machine-apache-bcp1,Machine-apache-prd1
uat.matrix.localhost.com;Machine-apache-uat1,Machine-apache-uat2
uat.orion.uk.localhost.com;Machine-apache-ua01,Machine-apache-ua02

或者非交互式：

echo 'select "ENV URL", group_concat("Machine") from alldata group by "ENV URL";' \
  | sqlite3 -separator ";" -cmd ".import file.txt alldata" -batch

Question 3

在 perl 中使用数组散列（在每次合并后使用拼接来删除并重新插入聚合字段）：

$ perl -F\; -alne '

  if($.==1) {
    print;
    next;
  }

  if(!exists $HoA{$F[4]}) {
    $HoA{$F[4]} = [ @F ];
  }
  else {
    splice @{ $HoA{$F[4]} }, 2, 0, join ",", (splice @{ $HoA{$F[4]} }, 2, 1), $F[2];
  }

  END {
    for $k (keys %HoA) {
      print join ";", @{ $HoA{$k} };
    }
  }
  ' data
Domain Name;ID;Machine;Environment;ENV URL;Start Date;End Date;Disk Size;Used
matrix.localhost.com;XY6124;Machine-apache-bcp1,Machine-apache-prd1;test;test.matrix.localhost.com;2 April 2013 08:12:10 GMT+01:00;2 April 2018 08:42:10 GMT+01:00;1024;External
orion.uk.localhost.com;XY01123;Machine-apache-ua01,Machine-apache-ua02;uat;uat.orion.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-dev1,Machine-apache-dev2;dev;dev.matrix.localhost.com;16 April 2013 06:32:28 GMT+01:00;16 April 2018 07:02:28 GMT+01:00;1024;External
matrix.localhost.com;XY6124;Machine-apache-uat1,Machine-apache-uat2;uat;uat.matrix.localhost.com;16 April 2013 07:06:33 GMT+01:00;16 April 2018 07:36:33 GMT+01:00;1024;External
Upgrade.uk.localhost.com;IN022345;Machine-apache-pf01,Machine-apache-pf02;per;per.Upgrade.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal

或者，使用 GNU datamash（使用 acut删除多余的groupby字段）：

$ datamash -Hst\; groupby 5 unique 1-2 collapse 3 unique 4-9 < data | cut -d\; -f2-
unique(Domain Name);unique(ID);collapse(Machine);unique(Environment);unique(ENV URL);unique(Start Date);unique(End Date);unique(Disk Size);unique(Used)
matrix.localhost.com;XY6124;Machine-apache-dev1,Machine-apache-dev2;dev;dev.matrix.localhost.com;16 April 2013 06:32:28 GMT+01:00;16 April 2018 07:02:28 GMT+01:00;1024;External
Upgrade.uk.localhost.com;IN022345;Machine-apache-pf01,Machine-apache-pf02;per;per.Upgrade.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-bcp1,Machine-apache-prd1;test;test.matrix.localhost.com;2 April 2013 08:12:10 GMT+01:00;2 April 2018 08:42:10 GMT+01:00;1024;External
matrix.localhost.com;XY6124;Machine-apache-uat1,Machine-apache-uat2;uat;uat.matrix.localhost.com;16 April 2013 07:06:33 GMT+01:00,22 March 2013 06:16:10 GMT;16 April 2018 07:36:33 GMT+01:00,22 March 2018 06:46:10 GMT;1024;External
orion.uk.localhost.com;XY01123;Machine-apache-ua01,Machine-apache-ua02;uat;uat.orion.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal

Answer

在 perl 中使用数组散列（在每次合并后使用拼接来删除并重新插入聚合字段）：

$ perl -F\; -alne '

  if($.==1) {
    print;
    next;
  }

  if(!exists $HoA{$F[4]}) {
    $HoA{$F[4]} = [ @F ];
  }
  else {
    splice @{ $HoA{$F[4]} }, 2, 0, join ",", (splice @{ $HoA{$F[4]} }, 2, 1), $F[2];
  }

  END {
    for $k (keys %HoA) {
      print join ";", @{ $HoA{$k} };
    }
  }
  ' data
Domain Name;ID;Machine;Environment;ENV URL;Start Date;End Date;Disk Size;Used
matrix.localhost.com;XY6124;Machine-apache-bcp1,Machine-apache-prd1;test;test.matrix.localhost.com;2 April 2013 08:12:10 GMT+01:00;2 April 2018 08:42:10 GMT+01:00;1024;External
orion.uk.localhost.com;XY01123;Machine-apache-ua01,Machine-apache-ua02;uat;uat.orion.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-dev1,Machine-apache-dev2;dev;dev.matrix.localhost.com;16 April 2013 06:32:28 GMT+01:00;16 April 2018 07:02:28 GMT+01:00;1024;External
matrix.localhost.com;XY6124;Machine-apache-uat1,Machine-apache-uat2;uat;uat.matrix.localhost.com;16 April 2013 07:06:33 GMT+01:00;16 April 2018 07:36:33 GMT+01:00;1024;External
Upgrade.uk.localhost.com;IN022345;Machine-apache-pf01,Machine-apache-pf02;per;per.Upgrade.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal

或者，使用 GNU datamash（使用 acut删除多余的groupby字段）：

$ datamash -Hst\; groupby 5 unique 1-2 collapse 3 unique 4-9 < data | cut -d\; -f2-
unique(Domain Name);unique(ID);collapse(Machine);unique(Environment);unique(ENV URL);unique(Start Date);unique(End Date);unique(Disk Size);unique(Used)
matrix.localhost.com;XY6124;Machine-apache-dev1,Machine-apache-dev2;dev;dev.matrix.localhost.com;16 April 2013 06:32:28 GMT+01:00;16 April 2018 07:02:28 GMT+01:00;1024;External
Upgrade.uk.localhost.com;IN022345;Machine-apache-pf01,Machine-apache-pf02;per;per.Upgrade.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal
matrix.localhost.com;XY6124;Machine-apache-bcp1,Machine-apache-prd1;test;test.matrix.localhost.com;2 April 2013 08:12:10 GMT+01:00;2 April 2018 08:42:10 GMT+01:00;1024;External
matrix.localhost.com;XY6124;Machine-apache-uat1,Machine-apache-uat2;uat;uat.matrix.localhost.com;16 April 2013 07:06:33 GMT+01:00,22 March 2013 06:16:10 GMT;16 April 2018 07:36:33 GMT+01:00,22 March 2018 06:46:10 GMT;1024;External
orion.uk.localhost.com;XY01123;Machine-apache-ua01,Machine-apache-ua02;uat;uat.orion.uk.localhost.com;5 August 2015 16:54:08 GMT+01:00;2 August 2025 16:54:08 GMT+01:00;2048;Internal

合并具有一个公共字段的多行数据

答案1

答案2

答案3

相关内容