awk array with numbers as values is not working


I have a reference file:

reference file

Dpse\GA30012    FBgn0000447 chr2    26607738    26607962    -1
Dpse\GA19764    FBgn0085819 chrX    28571020    28571736    -1
Dpse\ttk    FBgn0000100 chr2    16553824    16561652    -1
Dpse\GA30195    FBgn0085742 chr3    22629640    22630440    -1

and an input file:

file

FBgn0000447 1   11  HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957  255 -
FBgn0000100 1   11  HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803   255 -
FBgn0085819 1   11  HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
FBgn0085742 1   11  HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
FBgn0037963 47752   47802   HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372  255 -
FBgn0001257 11527   11577   HWI-ST1083:68:C0YYUACXX:8:1311:2957:12154   255 -
FBgn0034315 158 208 HWI-ST1083:68:C0YYUACXX:8:2113:4139:83177   255 -
FBgn0000559 3316    3365    HWI-ST484:183:C167BACXX:7:1101:1926:2031    255 +
FBgn0262975 39033   39082   HWI-ST484:183:C167BACXX:7:1101:1726:2030    255 +
FBgn0032505 1   50  HWI-ST484:183:C167BACXX:7:1101:5095:2042    255 +
FBgn0005593 403 452 HWI-ST484:183:C167BACXX:7:1101:3906:2209    255 +
FBgn0013686 692 741 HWI-ST484:183:C167BACXX:7:1101:3218:2247    255 -
FBgn0000556 3793    3842    HWI-ST484:183:C167BACXX:7:1101:5288:2041    255 +
FBgn0015521 438 487 HWI-ST484:183:C167BACXX:7:1101:5731:2170    255 -
FBgn0033912 1121    1170    HWI-ST484:183:C167BACXX:7:1101:8602:2063    255 -

I created an empty column between the first and second columns, so file became this output2:

output2

FBgn0000447     435 485 HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957  255 -
FBgn0000100     704 754 HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803   255 -
FBgn0085819     154 204 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
FBgn0085742     389 439 HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
FBgn0037963     47752   47802   HWI-ST1083:68:C0YYUACXX:8:1215:21263:59372  255 -
FBgn0001257     11527   11577   HWI-ST1083:68:C0YYUACXX:8:1311:2957:12154   255 -
FBgn0034315     158 208 HWI-ST1083:68:C0YYUACXX:8:2113:4139:83177   255 -
FBgn0000559     3316    3365    HWI-ST484:183:C167BACXX:7:1101:1926:2031    255 +
FBgn0262975     39033   39082   HWI-ST484:183:C167BACXX:7:1101:1726:2030    255 +
FBgn0032505     1   50  HWI-ST484:183:C167BACXX:7:1101:5095:2042    255 +
FBgn0005593     403 452 HWI-ST484:183:C167BACXX:7:1101:3906:2209    255 +
FBgn0013686     692 741 HWI-ST484:183:C167BACXX:7:1101:3218:2247    255 -
FBgn0000556     3793    3842    HWI-ST484:183:C167BACXX:7:1101:5288:2041    255 +
FBgn0015521     438 487 HWI-ST484:183:C167BACXX:7:1101:5731:2170    255 -
FBgn0033912     1121    1170    HWI-ST484:183:C167BACXX:7:1101:8602:2063    255 -

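This is the desired output:
For each id in column 1 of output2, find the matching id in column 2 of the reference file and fill column 2 of output2 with the value from column 3 of the reference file. For the same matching id, column 3 of output2 should become (column 3 + reference column 4 - 1), and column 4 of output2 should become (column 4 + reference column 4 - 1).

This is my current code, which does not give me the desired output file:

current code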

awk -v OFS="\t" '
    NR==FNR {a[$2]=$3; b[$2]=$4; next}; 
    {if ($1 in a) $2=a[$1]; print}; 
    {if ($1 in b) $3=b[$1]+$3-1; $4=b[$1]+$4-1; print}
' $ref $output2 > $output3

The desired output should look like this (for the first 4 lines):

output (desired)

FBgn0000447 chr2 26607738   26607748    HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957  255 -
FBgn0000100 chr2 28571020   28571030    HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803   255 -
FBgn0085819 chrX 16553824   16553834    HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
FBgn0085742 chr3 22629640   22629650    HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -

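I am not sure whether this is caused by some restriction on numeric values in awk arrays or by some other mistake. Thank you very much for your help!

PS: I remember one problem: not every id in column 2 of the reference file has corresponding values in columns 3/4. That is why I cannot get values for some lines of output2. How should I handle this? What is the best thing to fill the blanks with? Thanks again.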

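Answer 1

Note that I used your "original" input, without the whitespace modification. You did not actually specify that your input field separator is a tab, so awk uses whitespace as the field separator, and it makes no difference whether you insert some extra tabs or anything else.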
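The point about the field separator can be checked with a small experiment. This is only an illustrative sketch: the sample line `a<TAB><TAB>b` and the variable names are made up for the demonstration and do not come from the question.

```shell
# One line with a doubled tab, i.e. an "empty column" between a and b.
line=$(printf 'a\t\tb')

# Default FS: a run of blanks counts as one separator, so the empty column vanishes.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')

# FS set to a single tab: every tab separates, so the empty field is kept.
tab_nf=$(printf '%s\n' "$line" | awk -F'\t' '{print NF}')

echo "default FS: $default_nf fields; tab FS: $tab_nf fields"
```

With the default separator awk sees two fields, with `-F'\t'` it sees three. In the question's script no `-F` is given, so the inserted empty column never becomes a field of its own and `$2` is still the first coordinate.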

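awk -v OFS="\t" '
    # First file (reference): store chromosome ($3) and start ($4) under the id ($2)
    NR == FNR  {ref3[$2] = $3; ref4[$2] = $4; next}
    # Second file (input): handle only lines whose id exists in the reference
    $1 in ref3 {
        $3 = $3 + ref4[$1] - 1                   # shift the end coordinate
        $2 = ref3[$1] OFS ($2 + ref4[$1] - 1)    # chromosome, tab, shifted start
        print
    }
' reference input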
FBgn0000447 chr2    26607738    26607748    HWI-ST1083:68:C0YYUACXX:8:1111:20915:34957  255 -
FBgn0000100 chr2    16553824    16553834    HWI-ST1083:68:C0YYUACXX:8:1113:9591:98803   255 -
FBgn0085819 chrX    28571020    28571030    HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
FBgn0085742 chr3    22629640    22629650    HWI-ST1083:68:C0YYUACXX:8:1204:9035:56108   255 -
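As for the PS: awk arrays have no restriction on numeric values. The code in the question fails for other reasons: it contains two `print` statements, so every line is emitted twice, and the `if` in the last block has no braces, so only the `$3=...` assignment is conditional while `$4=...` always runs. The command above also silently drops input lines whose id is missing from the reference. If those lines should be kept, one possible variant is sketched below. The placeholder `NA`, the temporary file names, the shortened read names `readA`/`readB`, and the array names `ref_chr`/`ref_start` are all assumptions chosen for illustration.

```shell
# Hypothetical sample files in a temporary directory (names are assumptions).
tmp=$(mktemp -d)
printf 'Dpse\\GA30012 FBgn0000447 chr2 26607738 26607962 -1\n' > "$tmp/reference"
printf 'FBgn0000447 1 11 readA 255 -\nFBgn0037963 47752 47802 readB 255 -\n' > "$tmp/input"

result=$(awk -v OFS="\t" '
    # Reference file: store chromosome and start position under the id.
    NR == FNR { ref_chr[$2] = $3; ref_start[$2] = $4; next }
    {
        if ($1 in ref_chr) {
            # Shift both coordinates by the reference start.
            $3 = $3 + ref_start[$1] - 1
            $2 = ref_chr[$1] OFS ($2 + ref_start[$1] - 1)
        } else {
            # No reference entry: keep the coordinates, mark the chromosome as NA.
            $2 = "NA" OFS $2
        }
        print    # exactly one print per input line
    }
' "$tmp/reference" "$tmp/input")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Here the matched line gets `chr2` and shifted coordinates, while the unmatched id `FBgn0037963` is printed with `NA` in the chromosome column and its original coordinates. Whether `NA`, an empty string, or dropping the line is the better choice depends on what the downstream tools expect.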
