I have two huge input files.
file1 contains 3 tab-separated fields:
field1: UID
field2: username
field3: real name
Example file1:
644 u11496 Real name1
640 u1309 Real name2
641 u3007 Real name3
642 u3030 Real name4
643 u3112 Real name5
54 u0365 Real name6
55 u0613 Real name7
56 u1065 Real name8
57 u1550 Real name9
file2 has many lines, each with 3 tab-separated fields:
field1: can be ignored for this purpose
field2: group name
field3: a comma-separated list of user UIDs
Example file2:
2 group1 14,730,748,733,746,761,757,766,735,760,747,738,752,737,758,755,734,754,764,334,335,336,337,41,338,339,39,340
6 group2 14
15 group3 14,667,683,713,730,707,748,733,746,761,680,694,757,766,717,735,760,747,704,738,752,737,715,688,681,700,692,758,755,714,734
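For each user line in file1, I need to add a fourth (tab-separated) column containing the comma-separated names of the groups that user belongs to.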
Answer 1
The following shell script (I am using ksh) creates a third file with the result you asked for.
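TAB=$(printf '\t')    # a literal tab character
while read -r U REST
do
    S="$TAB"    # separator: a tab before the first group, a comma afterwards
    G=""
    # keep the file2 lines whose UID list contains $U as a whole entry
    grep -E "[$TAB,]$U(,|\$)" file2.txt | while read -r X GROUP USRLIST
    do
        G="$G$S$GROUP"
        S=","
    done    # ksh runs this loop in the current shell, so G is still set below
    printf '%s\t%s%s\n' "$U" "$REST" "$G"
done < file1.txt > file3.txt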
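The script above relies on ksh running the last stage of a pipeline in the current shell. In shells that run every pipeline stage in a subshell (bash does so by default), the value assigned to G inside the inner loop would be lost. The following is only a sketch of a variant that avoids this by collecting the group names with command substitution. The function names groups_of and append_groups are hypothetical helpers introduced for illustration, and both input files are assumed to be tab-separated as described in the question.

```shell
# groups_of UID GROUPS_FILE   (hypothetical helper)
# Print the comma-separated names of the groups whose UID list
# contains UID as a whole entry; print nothing useful if there are none.
groups_of() {
    tab=$(printf '\t')
    # The UID must follow a tab or a comma and be followed by a comma or end of line.
    grep -E "[$tab,]$1(,|\$)" "$2" | cut -f2 | paste -s -d, -
}

# append_groups USERS_FILE GROUPS_FILE   (hypothetical helper)
# Copy USERS_FILE to stdout, adding the group list as a fourth tab-separated field.
append_groups() {
    while read -r uid rest
    do
        g=$(groups_of "$uid" "$2")
        if [ -n "$g" ]; then
            printf '%s\t%s\t%s\n' "$uid" "$rest" "$g"
        else
            printf '%s\t%s\n' "$uid" "$rest"
        fi
    done < "$1"
}

# Usage: append_groups file1.txt file2.txt > file3.txt
```

Like the original script, this still runs grep once per user, so it scans file2.txt as many times as there are lines in file1.txt.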
With file1.txt as:
644 u11496 Real name1
640 u1309 Real name2
641 u3007 Real name3
642 u3030 Real name4
643 u3112 Real name5
54 u0365 Real name6
55 u0613 Real name7
56 u1065 Real name8
57 u1550 Real name9
14 u14 Jules Ceasar
and file2.txt as:
2 group1 14,730,748,733,746,761,757,766,735,760,747,738,752,737,758,755,734,754,764,334,335,336,337,41,338,339,39,340
6 group2 14
14 group6 667,683,641
15 group3 14,667,683,713,730,707,748,733,746,761,680,694,757,766,717,735,760,747,704,738,752,737,715,688,681,700,692,758,755,714,734
you get file3.txt as:
644 u11496 Real name1
640 u1309 Real name2
641 u3007 Real name3 group6
642 u3030 Real name4
643 u3112 Real name5
54 u0365 Real name6
55 u0613 Real name7
56 u1065 Real name8
57 u1550 Real name9
14 u14 Jules Ceasar group1,group2,group3