I have a main file, A.txt (field separator = \t):
Sample ID	Internal Control	Result	Consensus Sequence	Lane	Index Set	Index ID
2154686427	Pass	Detected	Not Available	1,2,3,4	1	UDP0001
2154666275	Pass	Detected	Not Available	1,2,3,4	1	UDP0002
Each sample has its own file containing the same metrics, for example 2154686427.mapping_metrics.csv and 2154666275.mapping_metrics.csv here (field separator = ,).
2154686427.mapping_metrics.csv:
MAPPING/ALIGNING SUMMARY,,Total input reads,5654101,100.00
MAPPING/ALIGNING SUMMARY,,Number of duplicate marked reads,5577937,98.65
and 2154666275.mapping_metrics.csv:
MAPPING/ALIGNING SUMMARY,,Total input reads,5651111,100.00
MAPPING/ALIGNING SUMMARY,,Number of duplicate marked reads,5511111,97.2
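For each sample file, I want to add the metric names ($3) as new header columns and the corresponding values ($4) to the matching row of A.txt, like this:
Sample ID	Internal Control	Result	Consensus Sequence	Lane	Index Set	Index ID	Total input reads	Number of duplicate marked reads
2154686427	Pass	Detected	Not Available	1,2,3,4	1	UDP0001	5654101	5577937
2154666275	Pass	Detected	Not Available	1,2,3,4	1	UDP0002	5651111	5511111
Do you have any idea how to do this?
I searched for similar questions about matching on file-name similarity but did not find any. Thanks.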
Answer 1
awk -v OFS="\t" -F, '
    # The metrics files are read first, while FS is still ","
    FS == "," {
        hdr[FNR] = $3                          # save header in array
        sub(/.*\//, "", FILENAME)              # remove parent path from FILENAME
        sub(/\..*/, "", FILENAME)              # remove `.mapping_metrics.csv` from FILENAME
        val[FILENAME] = val[FILENAME] OFS $4   # append value to array using tab as separator
        next
    }
    # A.txt is read last, after FS has been switched to tab
    FNR == 1 {
        print $0 OFS hdr[1] OFS hdr[2]         # print header and new header fields
        next
    }
    { print $0 val[$1] }                       # print record with new values
' *.mapping_metrics.csv FS="\t" A.txt
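To try this approach on the sample data from the question, a script along the following lines could be used. It is only a sketch: the scratch directory, the `result` variable, and the local `id` variable (used here instead of modifying FILENAME) are assumptions made for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Hypothetical scratch directory for the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A.txt is tab-separated; printf turns \t into real tab characters.
printf 'Sample ID\tInternal Control\tResult\tConsensus Sequence\tLane\tIndex Set\tIndex ID\n' > A.txt
printf '2154686427\tPass\tDetected\tNot Available\t1,2,3,4\t1\tUDP0001\n' >> A.txt
printf '2154666275\tPass\tDetected\tNot Available\t1,2,3,4\t1\tUDP0002\n' >> A.txt

# One comma-separated metrics file per sample.
printf '%s\n' \
    'MAPPING/ALIGNING SUMMARY,,Total input reads,5654101,100.00' \
    'MAPPING/ALIGNING SUMMARY,,Number of duplicate marked reads,5577937,98.65' \
    > 2154686427.mapping_metrics.csv
printf '%s\n' \
    'MAPPING/ALIGNING SUMMARY,,Total input reads,5651111,100.00' \
    'MAPPING/ALIGNING SUMMARY,,Number of duplicate marked reads,5511111,97.2' \
    > 2154666275.mapping_metrics.csv

result=$(awk -v OFS='\t' -F, '
    # Metrics files come first on the command line, while FS is still ","
    FS == "," {
        id = FILENAME
        sub(/.*\//, "", id)          # strip any leading directory
        sub(/\..*/, "", id)          # strip everything from the first dot
        hdr[FNR] = $3                # metric name, indexed by line number
        val[id] = val[id] OFS $4     # tab-joined values for this sample
        next
    }
    # A.txt is read after the FS assignment on the command line
    FNR == 1 { print $0 OFS hdr[1] OFS hdr[2]; next }
    { print $0 val[$1] }
' *.mapping_metrics.csv FS='\t' A.txt)

printf '%s\n' "$result"
```

The join works only because the sample ID in the first column of A.txt matches the part of each metrics file name before the first dot.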
Answer 2
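awk -F '\t' '
    BEGIN { OFS = FS; ORS = "" }       # empty ORS: each output row is built piece by piece
    NR == 1 {
        h1 = "Total input reads"
        h2 = "Number of duplicate marked reads"
        print $0, h1, h2 RS
        next
    }
    {
        print                          # the A.txt record, without a trailing newline
        FS = ","                       # getline below splits the metrics lines on commas
        f = $1 ".mapping_metrics.csv"
        while ((getline < f) > 0)
            if ((h1 == $3) || (h2 == $3))
                print "", $4           # leading OFS, then the value
        print RS                       # end the output row
        close(f)
        FS = OFS                       # restore tab splitting for the next A.txt record
    }
' ./A.txt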
This assumes that the last two header fields can be hard-coded.
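If hard-coding is not acceptable, one possible variation is to take the header names from the metrics files themselves. The sketch below is not from the original answer: the `load` function, the `names` and `values` arrays, and the `out` variable are hypothetical, and it assumes every metrics file lists the same metrics in the same order. It reads each line into a variable with `getline line < file`, so `$0` and `FS` are left untouched.

```shell
#!/bin/sh
# Hypothetical scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Sample ID\tInternal Control\tResult\tConsensus Sequence\tLane\tIndex Set\tIndex ID\n' > A.txt
printf '2154686427\tPass\tDetected\tNot Available\t1,2,3,4\t1\tUDP0001\n' >> A.txt
printf '2154666275\tPass\tDetected\tNot Available\t1,2,3,4\t1\tUDP0002\n' >> A.txt
printf '%s\n' \
    'MAPPING/ALIGNING SUMMARY,,Total input reads,5654101,100.00' \
    'MAPPING/ALIGNING SUMMARY,,Number of duplicate marked reads,5577937,98.65' \
    > 2154686427.mapping_metrics.csv
printf '%s\n' \
    'MAPPING/ALIGNING SUMMARY,,Total input reads,5651111,100.00' \
    'MAPPING/ALIGNING SUMMARY,,Number of duplicate marked reads,5511111,97.2' \
    > 2154666275.mapping_metrics.csv

out=$(awk -F '\t' '
    BEGIN { OFS = FS }
    # Fill names[] and values[] from columns 3 and 4 of one metrics file;
    # returns the number of lines read.
    function load(file,    line, parts, n) {
        n = 0
        while ((getline line < file) > 0) {
            split(line, parts, ",")
            names[++n] = parts[3]
            values[n] = parts[4]
        }
        close(file)
        return n
    }
    NR == 1 { header = $0; next }      # hold the header until the names are known
    {
        n = load($1 ".mapping_metrics.csv")
        if (NR == 2) {                 # first sample: complete and print the header
            for (i = 1; i <= n; i++) header = header OFS names[i]
            print header
        }
        row = $0
        for (i = 1; i <= n; i++) row = row OFS values[i]
        print row
    }
    END { if (NR == 1) print header }  # A.txt had a header but no samples
' A.txt)

printf '%s\n' "$out"
```

Unlike the hard-coded version, this appends every metric found in the file, so a metrics file with more lines produces more columns.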