我有一个文件想要根据第 1 列和第 6 列过滤重复值
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
最终的输出应该是这样的
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
到目前为止,这是我尝试过的
awk '!a[$1 $6]++ { print ;}' input.csv > output.csv
我最终得到
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
任何建议都会有帮助。谢谢