Processing data from two input files

I have two huge input files:

file1 contains 3 tab-separated fields:

  • field1: UID
  • field2: username
  • field3: real name

Sample file1

644     u11496   Real name1
640     u1309    Real name2
641     u3007    Real name3
642     u3030    Real name4
643     u3112    Real name5
54      u0365    Real name6
55      u0613    Real name7
56      u1065    Real name8
57      u1550    Real name9

file2 has many lines, each with 3 tab-separated fields:

  • field1: can be ignored for this case
  • field2: group name
  • field3: a comma-separated list of user UIDs

Sample file2

2       group1   14,730,748,733,746,761,757,766,735,760,747,738,752,737,758,755,734,754,764,334,335,336,337,41,338,339,39,340
6       group2   14
15      group3   14,667,683,713,730,707,748,733,746,761,680,694,757,766,717,735,760,747,704,738,752,737,715,688,681,700,692,758,755,714,734

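For each user line in file1, I need to add a fourth (tab-separated) column containing the comma-separated names of the groups that user belongs to.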

Answer 1

The following shell script (I am using ksh) creates a third file, as you requested.

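TAB=$(printf '\t')
while read -r U REST
do
  S="$TAB"   # separator before the first group name: a tab
  G=""
  # ksh runs the last stage of a pipeline in the current shell,
  # so G keeps its value after this inner loop (bash would lose it)
  grep -E "[$TAB,]$U(,|\$)" file2.txt | while read -r X GROUP USRLIST
  do
    G="$G$S$GROUP"
    S=","
  done
  printf '%s\t%s%s\n' "$U" "$REST" "$G"   # tab between $U and $REST
done < file1.txt > file3.txt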
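The script relies on a ksh behaviour: the last command of a pipeline (here the inner `while` loop) runs in the current shell, so `G` still holds the group list after `done`. In bash (by default) and in some POSIX shells that loop runs in a subshell, and `G` would come back empty. One possible portable variant of the inner step collects the group names with `cut` and `paste` inside a command substitution instead. This is a sketch assuming single-tab separators; the function name `groups_of` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: groups_of UID FILE2
# Prints the comma-separated names (field2) of the groups in FILE2 whose
# UID list (field3) contains UID; the result is empty when there are none.
# No while loop in a pipeline, so it does not depend on ksh semantics.
groups_of() {
  tab=$(printf '\t')
  # Same match as the answer: UID preceded by a tab or comma,
  # followed by a comma or the end of the line.
  grep -E "[$tab,]$1(,|\$)" "$2" | cut -f2 | paste -s -d, -
}
```

Inside the outer loop this could replace the inner `while`, for example `G=$(groups_of "$U" file2.txt)`, prepending a tab to `G` only when it is non-empty before printing the line.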

Given file1.txt as

644 u11496  Real name1
640 u1309   Real name2
641 u3007   Real name3
642 u3030   Real name4
643 u3112   Real name5
54  u0365   Real name6
55  u0613   Real name7
56  u1065   Real name8
57  u1550   Real name9
14  u14     Jules Ceasar

and file2.txt as

2   group1  14,730,748,733,746,761,757,766,735,760,747,738,752,737,758,755,734,754,764,334,335,336,337,41,338,339,39,340
6   group2  14
14  group6  667,683,641
15  group3  14,667,683,713,730,707,748,733,746,761,680,694,757,766,717,735,760,747,704,738,752,737,715,688,681,700,692,758,755,714,734

you get the following file3.txt:

644 u11496  Real name1
640 u1309   Real name2
641 u3007   Real name3      group6
642 u3030   Real name4
643 u3112   Real name5
54  u0365   Real name6
55  u0613   Real name7
56  u1065   Real name8
57  u1550   Real name9
14  u14     Jules Ceasar    group1,group2,group3
