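I have 20 files with names like ERR260136.genefamilies.csv, ERR276187.genefamilies.csv, and so on. Each file must be multiplied by a constant. The corresponding constant has to be taken from a .csv file: read_count.csv. The read_count.csv file looks like this: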
SampleID Read_counts
ERR260136 25636740
ERR260140 19166076
ERR260145 28011856
ERR260147 27916650
ERR260148 21871928
ERR260150 30130062
ERR260152 17949808
So ERR260136.genefamilies.csv must be multiplied by 25636740, ERR260140.genefamilies.csv must be multiplied by 19166076, and so on.
The 20 files to be multiplied have the following format:
# Gene Family ERR260136_Abundance-RPKs
UNMAPPED 0.445035
UniRef90_A0A015P9C8 0.00080211
UniRef90_A0A015P9C8|g__Bacteroides.s__Bacteroides_fragilis 0.00080211
UniRef90_A5ZYU5 0.000787149
UniRef90_A5ZYU5|g__Blautia.s__Blautia_obeum 0.000787149
UniRef90_A0A0E1X896 0.000573095
UniRef90_A0A0E1X896|g__Blautia.s__Blautia_obeum 0.000573095
How can I do this? Can anyone help?
Answer 1
You could do something like this:
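while read -r sampleID read_count; do
    # skip header line
    [ "$sampleID" = "SampleID" ] && continue
    # multiply column 2 on every line except the file's own header (FNR>1);
    # CONVFMT keeps full precision, since awk's default "%.6g" would print
    # large non-integer products as e.g. 1.14092e+07
    awk -v read_count="${read_count}" -v CONVFMT='%.15g' 'FNR>1 {$2 *= read_count} 1' "${sampleID}.genefamilies.csv"
done < read_count.csv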
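Obviously, this is very basic: you will probably want to add some error handling for files that are not found, non-numeric values, and so on.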
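As a rough sketch of what that error handling might look like, here is the same loop wrapped in a function. The function name scale_all and the ".genefamilies.scaled.csv" output suffix are made up for illustration, and the sketch assumes the columns are separated by whitespace, as in the samples shown in the question. If the real files are tab-separated, you would probably want to add -F'\t' -v OFS='\t' to the awk call. It also sets CONVFMT, because awk's default "%.6g" would turn a product such as 0.445035 * 25636740 into 1.14092e+07 when it rebuilds the line.

```shell
#!/bin/sh
# Sketch only: scale_all and the ".scaled.csv" suffix are hypothetical names.
# Usage: scale_all [counts_file]   (counts_file defaults to read_count.csv)
scale_all() {
    counts_file=${1:-read_count.csv}
    status=0
    while read -r sampleID read_count; do
        # skip blank lines and the header line
        [ -z "$sampleID" ] && continue
        [ "$sampleID" = "SampleID" ] && continue

        infile="${sampleID}.genefamilies.csv"
        outfile="${sampleID}.genefamilies.scaled.csv"

        # the read count must be a non-empty string of digits
        case $read_count in
            ''|*[!0-9]*)
                echo "skipping $sampleID: bad read count '$read_count'" >&2
                status=1
                continue ;;
        esac

        # the input file must exist and be readable
        if [ ! -r "$infile" ]; then
            echo "skipping $sampleID: cannot read $infile" >&2
            status=1
            continue
        fi

        # multiply column 2 on every line after the file's header;
        # CONVFMT controls how the product is converted back to text
        awk -v read_count="$read_count" -v CONVFMT='%.15g' '
            FNR > 1 { $2 *= read_count }
            1
        ' "$infile" > "$outfile" || status=1
    done < "$counts_file"
    return "$status"
}
```

After sourcing the file, running scale_all in the directory that holds read_count.csv and the 20 input files should write one scaled file per sample and leave the originals untouched. It returns a non-zero status if any sample was skipped, with the reason printed to stderr.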