How to subtract adjacent rows in the second column while keeping the first column

I am creating a file with the following command:

awk '{print $2 " "$7" "$8}' REACTOME_EXTENSION_OF_TELOMERES.xls |  awk '$8!="No"  {print $1 " " $2}' | awk 'NR>1' | awk 'BEGIN { OFS=", "; print "Name" " " "0" };{ print $0 " " "" }'

The output looks like this:

    Name 0
WRAP53 0.08495288 
NHP2 0.17606254 
POLA1 0.25320756 
POLD3 0.32372433 
PRIM1 0.38140765 
RFC5 0.44302294 
POLD1 0.497649 
...

I need a command that subtracts the previous row's value from each row in the second column and gives the following result:

WRAP53 0.0849529 
NHP2 0.0911097 
POLA1 0.077145 
POLD3 0.0705168 
PRIM1 0.0576833 
RFC5 0.0616153 
POLD1 0.0546261 
...

I know how to do this when I keep only the second column; it looks like this:

awk '{print $2 " "$7" "$8}' REACTOME_EXTENSION_OF_TELOMERES.xls |  awk '$8!="No"  {print $1 " " $2}' | awk 'NR>1' | awk 'BEGIN { OFS=", "; print "Name" " " "0" };{ print $0 " " "" }' | awk '{print $NF}' | awk 'NR-1{print $0-p}{p=$0}'

But how can I keep the first column, as shown above?

The file REACTOME_EXTENSION_OF_TELOMERES.xls looks like this:

NAME    PROBE   GENE SYMBOL     GENE_TITLE      RANK IN GENE LIST       RANK METRIC SCORE       RUNNING ES      CORE ENRICHMENT
row_0   WRAP53  null    null    163     1.5818238258361816      0.08495288      Yes
row_1   NHP2    null    null    201     1.5055444240570068      0.17606254      Yes
row_2   POLA1   null    null    283     1.3435969352722168      0.25320756      Yes
row_3   POLD3   null    null    367     1.240567684173584       0.32372433      Yes
row_4   PRIM1   null    null    501     1.1049883365631104      0.38140765      Yes
row_5   RFC5    null    null    557     1.0596935749053955      0.44302294      Yes
row_6   POLD1   null    null    653     1.0035457611083984      0.497649        Yes

It would be great if the output of the whole command could be written to REACTOME_EXTENSION_OF_TELOMERES.y.

Answer 1

Your entire awk pipeline can be replaced with

awk 'NR > 1 && $8 != "No" {print $2, $7 - prev} {prev = $7}' REACTOME_EXTENSION_OF_TELOMERES.xls

which outputs

WRAP53 0.0849529
NHP2 0.0911097
POLA1 0.077145
POLD3 0.0705168
PRIM1 0.0576833
RFC5 0.0616153
POLD1 0.0546261
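
To try the one-liner without the original file, the sketch below feeds it a few tab-separated rows copied from the sample in the question. The variable names `sample` and `result`, and the use of `printf` to build the input, are assumptions made for illustration only; the awk program itself is the one shown above.

```shell
# Build a small tab-separated sample: the header plus three rows from the question.
# printf reuses its format for every group of eight arguments, giving four lines.
sample=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  NAME PROBE 'GENE SYMBOL' GENE_TITLE 'RANK IN GENE LIST' 'RANK METRIC SCORE' 'RUNNING ES' 'CORE ENRICHMENT' \
  row_0 WRAP53 null null 163 1.5818238258361816 0.08495288 Yes \
  row_1 NHP2 null null 201 1.5055444240570068 0.17606254 Yes \
  row_2 POLA1 null null 283 1.3435969352722168 0.25320756 Yes)

# Same program as above: for every non-header row whose 8th field is not "No",
# print the name and the difference to the previous row's 7th field.
result=$(printf '%s\n' "$sample" |
  awk 'NR > 1 && $8 != "No" {print $2, $7 - prev} {prev = $7}')

printf '%s\n' "$result"
```

This should reproduce the first three lines of the output listed above. To store the result for the real file, as asked in the question, the same command can be redirected, for example with `> REACTOME_EXTENSION_OF_TELOMERES.y` appended after the input file name.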

Answer 2

$ awk -F '\t' 'BEGIN { OFS=FS } $8 == "No" { next } { tmp = $7 } NR > 2 { $7 -= prev } { prev = tmp; print }' inputfile
NAME    PROBE   GENE SYMBOL     GENE_TITLE      RANK IN GENE LIST       RANK METRIC SCORE       RUNNING ES      CORE ENRICHMENT
row_0   WRAP53  null    null    163     1.5818238258361816      0.08495288      Yes
row_1   NHP2    null    null    201     1.5055444240570068      0.0911097       Yes
row_2   POLA1   null    null    283     1.3435969352722168      0.077145        Yes
row_3   POLD3   null    null    367     1.240567684173584       0.0705168       Yes
row_4   PRIM1   null    null    501     1.1049883365631104      0.0576833       Yes
row_5   RFC5    null    null    557     1.0596935749053955      0.0616153       Yes
row_6   POLD1   null    null    653     1.0035457611083984      0.0546261       Yes

The awk program with comments:

# Set output delimiter to input delimiter (tab, set with -F)
BEGIN { OFS = FS }

# Skip lines whose 8th column is "No"
$8 == "No" { next }  # or { exit } if "No"-lines are sorted at the end.

# Save the original value in column 7.
{ tmp = $7 }

# For any row past both the header and the first data line,
# decrease column 7 by the previous row's column 7 value.
NR > 2 { $7 -= prev }

# Remember the current row's original column 7 value
# in prev and print the (possibly) modified row.
{
    prev = tmp
    print
}
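
One way to use the commented program is to save it to a file and pass it to awk with `-f`. The sketch below is only an illustration: the file names `running_diff.awk`, `sample.tsv`, and `sample.out`, and the temporary directory, are hypothetical, and it assumes the input is tab-separated, as the first comment of the program indicates.

```shell
# Work in a throwaway directory so nothing existing gets overwritten.
workdir=$(mktemp -d)

# Hypothetical file name; the body is the program from above without its comments.
cat > "$workdir/running_diff.awk" <<'EOF'
BEGIN { OFS = FS }
$8 == "No" { next }
{ tmp = $7 }
NR > 2 { $7 -= prev }
{ prev = tmp; print }
EOF

# Small tab-separated sample: the header plus two data rows from the question.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  NAME PROBE 'GENE SYMBOL' GENE_TITLE 'RANK IN GENE LIST' 'RANK METRIC SCORE' 'RUNNING ES' 'CORE ENRICHMENT' \
  row_0 WRAP53 null null 163 1.5818238258361816 0.08495288 Yes \
  row_1 NHP2 null null 201 1.5055444240570068 0.17606254 Yes \
  > "$workdir/sample.tsv"

# -F '\t' sets the input separator; the BEGIN block then copies it to OFS,
# so rows that get rebuilt after $7 is modified stay tab-separated.
awk -F '\t' -f "$workdir/running_diff.awk" "$workdir/sample.tsv" > "$workdir/sample.out"

cat "$workdir/sample.out"
```

Keeping the program in its own file avoids shell quoting problems and lets the comments stay next to the rules they describe.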

Redirect the output to a new file name to store it:

awk '...as above...' inputfile >outputfile