I have a file A.txt like this (field separator = ,):
Kit Batch Export
Software Version = NO_v1
Date And Time of Export =
Experiment Name =
Instrument Software Version =
Instrument Type = Cji
Instrument Serial Number =
Run Start Date =
Run End Date =
Run Operator =
Batch Status = VALID
Method = Nov
Date And Time of Export,Batch ID,Sample Name,Well,Sample Type,Status,Interpretive Result,Action*,Curve analysis,EC,CH
,novaprime-ct044032-TB_2034,2061571293,A01,Unkn-01,VALID,,,
,novaprime-ct044032-TB_2034,2061584371,A02,Unkn-09,VALID,,,
and a file B.txt (field separator = \t; the first column is empty):
Well Fluor Target Content Sample Cq SQ
A01 Cy5 EC Unkn-01 2060563935 26 NaN
A02 Cy5 CH Unkn-09 2060565055 37 NaN
A01 Cy5 CH Unkn-01 2060565888 54 NaN
A02 Cy5 EC Unkn-09 2060565465 NaN NaN
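For each Well/Target pair in B.txt (in this example: A01/EC; A01/CH; A02/EC; A02/CH), I would like to add the value from the Cq column to the corresponding row and column of A.txt, like this: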
Kit Batch Export
Software Version = NO_v1
Date And Time of Export =
Experiment Name =
Instrument Software Version =
Instrument Type = Cji
Instrument Serial Number =
Run Start Date =
Run End Date =
Run Operator =
Batch Status = VALID
Method = Nov
Date And Time of Export,Batch ID,Sample Name,Well,Sample Type,Status,Interpretive Result,Action*,Curve analysis,EC,CH
,novaprime-ct044032-TB_2034,2061571293,A01,Unkn-01,VALID,,,,26,54
,novaprime-ct044032-TB_2034,2061584371,A02,Unkn-09,VALID,,,,NaN,37
To do this, I tried:
awk -F"\t" 'FNR==NR{if (a[$2]) {a[$2]=a[$2] "," $7} else {a[$2]=$7}} NR>FNR{split($0,f,","); if (a[f[4]]) $0=$0 "," a[f[4]]; print}' B.txt A.txt > C.txt
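It sort of works, but it appends the values in the order they are encountered rather than according to whether each one belongs to EC or CH. Is there a different approach that does this correctly? Thanks.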
Answer 1
As long as the "header" lines (the metadata above the column titles) cannot contain a comma, the following will work:
awk -F'\t' 'FNR==NR{if ($4=="EC") ec[$2]=$7; else if ($4=="CH") ch[$2]=$7; next}
NR>FNR&&NF>1 {if (!f) f=1; else {$10=ec[$4]; $11=ch[$4];}}1' B.txt FS=',' OFS=',' A.txt
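This first parses B.txt and builds two maps keyed by well: one holding the Cq value for EC and one holding the Cq value for CH. Both maps are then used while parsing A.txt. For A.txt we set the field separator to , and make sure to process only lines that have more than one field (i.e. contain at least one ,), except for the first such line, which holds the column headers.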
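To make the behaviour easy to check, here is a minimal reproduction sketch. It assumes a scratch directory named awk_merge_demo (a name chosen here for illustration only), a shortened A.txt preamble, and that B.txt really is tab-separated with an empty first column, so that Well is $2, Target is $4 and Cq is $7.

```shell
#!/bin/sh
# Hypothetical scratch directory, used only for this demo.
mkdir -p awk_merge_demo

# B.txt: tab-separated with an empty first column.
{
  printf '\tWell\tFluor\tTarget\tContent\tSample\tCq\tSQ\n'
  printf '\tA01\tCy5\tEC\tUnkn-01\t2060563935\t26\tNaN\n'
  printf '\tA02\tCy5\tCH\tUnkn-09\t2060565055\t37\tNaN\n'
  printf '\tA01\tCy5\tCH\tUnkn-01\t2060565888\t54\tNaN\n'
  printf '\tA02\tCy5\tEC\tUnkn-09\t2060565465\tNaN\tNaN\n'
} > awk_merge_demo/B.txt

# A.txt: comma-separated, with a shortened preamble of comma-free lines.
cat > awk_merge_demo/A.txt <<'EOF'
Kit Batch Export
Batch Status = VALID
Date And Time of Export,Batch ID,Sample Name,Well,Sample Type,Status,Interpretive Result,Action*,Curve analysis,EC,CH
,novaprime-ct044032-TB_2034,2061571293,A01,Unkn-01,VALID,,,
,novaprime-ct044032-TB_2034,2061584371,A02,Unkn-09,VALID,,,
EOF

# Same command as in the answer: read B.txt first, then switch FS/OFS for A.txt.
awk -F'\t' 'FNR==NR{if ($4=="EC") ec[$2]=$7; else if ($4=="CH") ch[$2]=$7; next}
NR>FNR&&NF>1 {if (!f) f=1; else {$10=ec[$4]; $11=ch[$4];}}1' \
  awk_merge_demo/B.txt FS=',' OFS=',' awk_merge_demo/A.txt > awk_merge_demo/C.txt

cat awk_merge_demo/C.txt
```

The preamble and the column-title line should come through unchanged, and each data line should gain its EC and CH values in fields 10 and 11.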
Update
Since you noted in a comment that B.txt may sometimes contain empty fields, and that you want those replaced with NaN, we need an additional check:
awk -F'\t' 'FNR==NR{if ($4=="EC") ec[$2]=$7; else if ($4=="CH") ch[$2]=$7; next}
NR>FNR&&NF>1 {if (!f) f=1; else {$10=ec[$4]?ec[$4]:"NaN"; $11=ch[$4]?ch[$4]:"NaN";}}1' B.txt FS=',' OFS=',' A.txt
This is rather "golfed", but basically
$10=ec[$4] ? ec[$4] : "NaN"
means
if (ec[$4]) $10=ec[$4]; else $10="NaN"
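One caveat with a plain truth test: awk treats a numeric-looking 0 as false, so a Cq of exactly 0 would probably also be turned into NaN. The sketch below is a variation, not part of the command above. It compares against the empty string instead, and it assumes a scratch directory named awk_nan_demo, a seen_header flag name, and made-up Cq values (an empty one, a 0, and a well with no CH row), all chosen here purely for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory, used only for this demo.
mkdir -p awk_nan_demo

# B.txt: A01/EC has an empty Cq, A01/CH has a Cq of 0, A02 has no CH row at all.
{
  printf '\tWell\tFluor\tTarget\tContent\tSample\tCq\tSQ\n'
  printf '\tA01\tCy5\tEC\tUnkn-01\t2060563935\t\tNaN\n'
  printf '\tA01\tCy5\tCH\tUnkn-01\t2060565888\t0\tNaN\n'
  printf '\tA02\tCy5\tEC\tUnkn-09\t2060565465\t30\tNaN\n'
} > awk_nan_demo/B.txt

cat > awk_nan_demo/A.txt <<'EOF'
Batch Status = VALID
Date And Time of Export,Batch ID,Sample Name,Well,Sample Type,Status,Interpretive Result,Action*,Curve analysis,EC,CH
,novaprime-ct044032-TB_2034,2061571293,A01,Unkn-01,VALID,,,
,novaprime-ct044032-TB_2034,2061584371,A02,Unkn-09,VALID,,,
EOF

awk -F'\t' '
  # First file (B.txt): remember the Cq value per well, separately for EC and CH.
  FNR == NR { if ($4 == "EC") ec[$2] = $7; else if ($4 == "CH") ch[$2] = $7; next }
  # Second file (A.txt): only comma-containing lines; skip the column-title line.
  NF > 1 {
    if (!seen_header) { seen_header = 1 } else {
      # Comparing with "" keeps a literal 0 and maps empty or missing values to NaN.
      $10 = (ec[$4] != "") ? ec[$4] : "NaN"
      $11 = (ch[$4] != "") ? ch[$4] : "NaN"
    }
  }
  1
' awk_nan_demo/B.txt FS=',' OFS=',' awk_nan_demo/A.txt > awk_nan_demo/C.txt

cat awk_nan_demo/C.txt
```

Whether a Cq of 0 can actually occur in your data is a separate question; if it cannot, the shorter ternary form is sufficient.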