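I have a CSV file of about 4,000 lines, each containing 2 to 30 comma-separated names. The names include titles (for example, Mr. X Adams or Ms. Y Sanders). Some names appear more than once on the same line, and I want to remove those repeats within each line. The data is in the file "input.csv", and the final result should be written to another file, "output.csv".
For example, I have: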
mr. 1,mr. 2,mr. 3,mr. 1,mr. 4
prof. x,prof. y,prof. x
mr. 1,prof y
This should become:
mr. 1,mr. 2,mr. 3,mr. 4 (mr. 1 was already mentioned so it should be removed)
prof. x,prof. y (prof. x was already mentioned so it should be removed)
mr. 1,prof y (even though both were already mentioned in the same file, they were not mentioned within this line so they may remain)
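Answer 1
You can try the following, which removes duplicates within each line but also sorts the names on that line alphabetically: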
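#!/bin/bash
# Read input.csv line by line and write the deduplicated lines to output.csv.
while IFS= read -r line; do
    # Split on commas, drop duplicates (sort -u also reorders), rejoin with commas.
    printf '%s\n' "$line" | tr ',' '\n' | sort -u | paste -sd, -
done < input.csv > output.csv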
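If the original order of names on each line matters, as it does in the expected output from the question, an order-preserving variant may be a better fit. The following is a minimal sketch using awk rather than sort; the function name dedupe_names is hypothetical, and it assumes names are compared as exact strings (so "mr. 1" and " mr. 1" with a leading space count as different) and that no name contains a quoted comma.

```shell
#!/bin/bash
# dedupe_names (hypothetical helper): for each input line, keep only the
# first occurrence of every comma-separated field, preserving order.
# Reads the files given as arguments, or standard input if there are none.
dedupe_names() {
    awk -F',' '{
        split("", seen)   # reset the per-line lookup table
        out = ""
        n = 0             # number of fields kept so far on this line
        for (i = 1; i <= NF; i++) {
            if (!($i in seen)) {
                seen[$i] = 1
                # Prepend a comma before every kept field except the first.
                out = (n++ ? out "," : "") $i
            }
        }
        print out
    }' "$@"
}
```

It could then be run as: dedupe_names input.csv > output.csv. Because the lookup table is reset for every line, a name that already appeared on an earlier line is still kept, which matches the third line of the example.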