I have a reference file:
reference file
Dpse\GA30012 FBgn0000447 chr2 26607738 26607962 -1
Dpse\GA19764 FBgn0085819 chrX 28571020 28571736 -1
Dpse\ttk FBgn0000100 chr2 16553824 16561652 -1
Dpse\GA30195 FBgn0085742 chr3 22629640 22630440 -1
and an input file:
file
FBgn0000447 1 11 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 1 11 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 1 11 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 1 11 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0037963 47752 47802 HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372 255 -
FBgn0001257 11527 11577 HWI-ST1083:68:C0YYUACXX:8:1311:2957:12154 255 -
FBgn0034315 158 208 HWI-ST1083:68:C0YYUACXX:8:2113:4139:83177 255 -
FBgn0000559 3316 3365 HWI-ST484:183:C167BACXX:7:1101:1926:2031 255 +
FBgn0262975 39033 39082 HWI-ST484:183:C167BACXX:7:1101:1726:2030 255 +
FBgn0032505 1 50 HWI-ST484:183:C167BACXX:7:1101:5095:2042 255 +
FBgn0005593 403 452 HWI-ST484:183:C167BACXX:7:1101:3906:2209 255 +
FBgn0013686 692 741 HWI-ST484:183:C167BACXX:7:1101:3218:2247 255 -
FBgn0000556 3793 3842 HWI-ST484:183:C167BACXX:7:1101:5288:2041 255 +
FBgn0015521 438 487 HWI-ST484:183:C167BACXX:7:1101:5731:2170 255 -
FBgn0033912 1121 1170 HWI-ST484:183:C167BACXX:7:1101:8602:2063 255 -
I created an empty column between the first and second columns of file, which then becomes output2:
output2
FBgn0000447 435 485 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 704 754 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 154 204 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 389 439 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0037963 47752 47802 HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372 255 -
FBgn0001257 11527 11577 HWI-ST1083:68:C0YYUACXX:8:1311:2957:12154 255 -
FBgn0034315 158 208 HWI-ST1083:68:C0YYUACXX:8:2113:4139:83177 255 -
FBgn0000559 3316 3365 HWI-ST484:183:C167BACXX:7:1101:1926:2031 255 +
FBgn0262975 39033 39082 HWI-ST484:183:C167BACXX:7:1101:1726:2030 255 +
FBgn0032505 1 50 HWI-ST484:183:C167BACXX:7:1101:5095:2042 255 +
FBgn0005593 403 452 HWI-ST484:183:C167BACXX:7:1101:3906:2209 255 +
FBgn0013686 692 741 HWI-ST484:183:C167BACXX:7:1101:3218:2247 255 -
FBgn0000556 3793 3842 HWI-ST484:183:C167BACXX:7:1101:5288:2041 255 +
FBgn0015521 438 487 HWI-ST484:183:C167BACXX:7:1101:5731:2170 255 -
FBgn0033912 1121 1170 HWI-ST484:183:C167BACXX:7:1101:8602:2063 255 -
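This is the desired output:
For each id in column 1 of output2 that has a matching id in column 2 of the reference file, fill column 2 of output2 with the value from column 3 of the reference file. Column 3 of output2 should then equal (column 3 + reference column 4 - 1), and column 4 should equal (column 4 + reference column 4 - 1).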
This is my current code, which does not produce the desired output file:
current code
awk -v OFS="\t" '
NR==FNR {a[$2]=$3; b[$2]=$4; next};
{if ($1 in a) $2=a[$1]; print};
{if ($1 in b) $3=b[$1]+$3-1; $4=b[$1]+$4-1; print}
' $ref $output2 > $output3
The desired output should look like this (for the first 4 rows):
output (desired)
FBgn0000447 chr2 26607738 26607748 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 chr2 28571020 28571030 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 chrX 16553824 16553834 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 chr3 22629640 22629650 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
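I am not sure whether this is caused by some limit on numeric values in awk arrays or by some other error. Thank you very much for your help!
PS: I remember one issue with the reference file: not every id in column 2 has corresponding values in columns 3/4. That may be why I cannot get the values into output2. How should I solve this, and what is best to fill the blanks with? Thanks again.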
Answer 1
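Note that I used your "original" input, without the whitespace modification. You did not specify that your input field separator is a tab, so awk uses its default whitespace splitting, and it does not matter whether you inserted extra tabs or anything else.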
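To see why the extra empty column makes no difference, here is a minimal sketch of awk's default splitting. The record is hypothetical (one id from the question with two consecutive tabs after it), and the variable names are only for illustration.

```shell
# A hypothetical record with an empty column (two consecutive tabs) after the id.
line=$(printf 'FBgn0000447\t\t1\t11')

# Default splitting: any run of blanks and tabs counts as a single separator.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# Explicit tab separator: the empty column becomes an empty field of its own.
tab_nf=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

echo "default: $default_nf fields, tab-separated: $tab_nf fields"
```

With default splitting the empty column simply disappears, so the field numbers in the program refer to the columns of the original file.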
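awk -v OFS="\t" '
    # First file (reference): map each id to its chromosome and start position.
    NR == FNR {ref3[$2] = $3; ref4[$2] = $4; next}
    # Second file (input): rewrite only rows whose id appears in the reference.
    $1 in ref3 {
        $3 = $3 + ref4[$1] - 1                    # shift the end position
        $2 = ref3[$1] OFS ($2 + ref4[$1] - 1)     # chromosome, then shifted start
        print
    }
' reference input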
FBgn0000447 chr2 26607738 26607748 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0000100 chr2 16553824 16553834 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803 255 -
FBgn0085819 chrX 28571020 28571030 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
FBgn0085742 chr3 22629640 22629650 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108 255 -
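The command above prints only rows whose id is found in the reference. Regarding the PS, one possible variant keeps every row and marks ids that have no usable reference coordinates. This is a sketch under assumptions: the placeholder "NA", the temporary sample files, and the last reference row (an id with no coordinates) are all hypothetical, and unmatched rows keep their original relative positions.

```shell
# Hypothetical sample files built from a few rows of the question data.
# The last reference row is made up: an id without chromosome or coordinates.
tmp=$(mktemp -d)
cat > "$tmp/reference" <<'EOF'
Dpse\GA30012 FBgn0000447 chr2 26607738 26607962 -1
Dpse\ttk FBgn0000100 chr2 16553824 16561652 -1
Dpse\hypothetical FBgn0037963
EOF
cat > "$tmp/input" <<'EOF'
FBgn0000447 1 11 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957 255 -
FBgn0037963 47752 47802 HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372 255 -
EOF

result=$(awk -v OFS="\t" '
    # Reference file: store an id only when its start column looks numeric.
    NR == FNR {
        if (NF >= 4 && $4 ~ /^[0-9]+$/) { chr[$2] = $3; start[$2] = $4 }
        next
    }
    # Matched id: shift both positions and insert the chromosome.
    $1 in chr {
        $3 = $3 + start[$1] - 1
        $2 = chr[$1] OFS ($2 + start[$1] - 1)
        print
        next
    }
    # Unmatched id: keep the row and insert the assumed NA placeholder.
    { $2 = "NA" OFS $2; print }
' "$tmp/reference" "$tmp/input")

printf '%s\n' "$result"
rm -r "$tmp"
```

Every output row then has the same number of columns, and the unmatched rows can be filtered out later by testing column 2 for NA. If the real reference file is tab-delimited with truly empty cells, the numeric check would need to be adapted to that layout.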