Search a column of one file against a different column of another file

2024-6-1

I have two files with different columns.

File1:

pears   are fruits
apple   is  fruit
carrot  is  veg
celery  is  vegetable
oranges are fruits

File2:

fruits apple   mycode is #q123c# for apple
fruits pears   my code is #q432c# for juicy
veg    celery  my code value is #q989c# for vegetables
veg    spinach code is #q783c# and is a type of vegetable
fruits papaya  i have code #q346c#
vegie  lettuce code #q445c# is vege

Desired output file:

Q432C pears fruits
Q123C apple fruit
Q---C carrot veg
Q989C celery vegetable
Q---C oranges fruits

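I need to compare column 1 of File1 against column 2 of File2. If they match, print the q-to-c code found between the two # characters in File2; otherwise print the empty code q---c. The q-to-c code should also be converted to uppercase.

I want the output to have the same number of lines as File1.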

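Ideally, the output file should contain the q-to-c code from File2 followed by the columns from File1. So far, though, I have only worked out how to strip the q-to-c code out of the matching lines in File2 and convert it to uppercase: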

awk 'NR==FNR { a[$1]=1; next } ($2 in a) {print $0} ' File1 File2 | sed -e 's/.*#\(.*\)#.*/\1/' | tr 'a-z' 'A-Z' > outputFile

...can anyone help? I am new to awk scripting.

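After getting the result above, I intended to do a join, but then I risk not joining the correct q-to-c code to the correct line, because the output file I generate does not have the same number of lines as File1.
I am open to solutions other than awk.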

If anyone can help, I would be very grateful. :)
Thanks in advance.

Answer 1

With a single awk command:

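awk 'NR == FNR {
         # First file (File2): store the code found between the # marks, keyed by column 2
         if (match($0, /#q[0-9]{3}c#/))
             fruits[$2] = substr($0, RSTART + 1, RLENGTH - 2)
         next
     }
     # Second file (File1): print the uppercased code, or the placeholder when there is no match
     { print ($1 in fruits ? toupper(fruits[$1]) : "Q---C"), $1, $3 }' File2 File1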

Output:

Q432C pears fruits
Q123C apple fruit
Q---C carrot veg
Q989C celery vegetable
Q---C oranges fruits
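
For anyone who wants to try this without creating the files by hand, here is a minimal sketch that rebuilds the sample data from the question and runs a slightly more defensive variant of the command above. The scratch directory, the `code` array name, and the `result` variable are my own choices for illustration and are not part of the original answer. The digit pattern is spelled out as `[0-9][0-9][0-9]` on the assumption that some awk builds may not accept the `{3}` interval syntax.

```shell
#!/bin/sh
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)

cat > "$tmpdir/File1" <<'EOF'
pears   are fruits
apple   is  fruit
carrot  is  veg
celery  is  vegetable
oranges are fruits
EOF

cat > "$tmpdir/File2" <<'EOF'
fruits apple   mycode is #q123c# for apple
fruits pears   my code is #q432c# for juicy
veg    celery  my code value is #q989c# for vegetables
veg    spinach code is #q783c# and is a type of vegetable
fruits papaya  i have code #q346c#
vegie  lettuce code #q445c# is vege
EOF

# File2 is read first to build the lookup table, then File1 drives the output.
result=$(awk '
    NR == FNR {
        # match() sets RSTART (1-based start) and RLENGTH (length) of the hit,
        # which here is the whole "#q123c#" token; drop one "#" from each end.
        if (match($0, /#q[0-9][0-9][0-9]c#/))
            code[$2] = substr($0, RSTART + 1, RLENGTH - 2)
        next
    }
    {
        # One output line per File1 line: found code in uppercase, else placeholder.
        c = ($1 in code) ? toupper(code[$1]) : "Q---C"
        print c, $1, $3
    }
' "$tmpdir/File2" "$tmpdir/File1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the lookup table is built from File2 first and File1 is then read line by line, the output has exactly one line per line of File1, in File1's order. That avoids the join problem described in the question.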
