I want exactly the same output as in the earlier post (the "extract data in linux/unix" question above), but I have a new input file, shown below:
ABCB11 4 ACE 11
ABCB11 4 CHRM1 114
ABCB11 4 CHRM2 115
ABCB11 4 DRD2 158
ABCB11 4 EGF 164
ABCC8 5 ACE 11
ABCC8 5 ADRA1A 21
ABCC8 5 ADRA1B 22
ABCC8 5 ADRA1D 23
ABCC8 5 CHRM1 114
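I need to collect all the unique genes and generate the output.
After taking the distinct values from both gene columns (columns 1 and 3), I want every line to carry its fromid. This is the exact output I want: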
ABCB11 = fromid=4,from=ABCB11
ABCC8 = fromid=5,from=ABCC8
ACE = fromid=11,from=ACE
CHRM1 = fromid=114,from=CHRM1
CHRM2 = fromid=115,from=CHRM2
DRD2 = fromid=158,from=DRD2
EGF = fromid=164,from=EGF
ADRA1A = fromid=21,from=ADRA1A
ADRA1B = fromid=22,from=ADRA1B
ADRA1D = fromid=23,from=ADRA1D
Answer 1
A two-pass solution, assuming the file has 4 columns:
awk 'NR == FNR {
         # first pass: print each gene from column 1 once
         if (s[$1]++ == 0)
             printf "%s = fromid=%s,from=%s\n", $1, $2, $1
         next
     }
     # second pass: print each gene from column 3 not seen so far
     !s[$3]++ { printf "%s = fromid=%s,from=%s\n", $3, $4, $3 }' file file
ABCB11 = fromid=4,from=ABCB11
ABCC8 = fromid=5,from=ABCC8
ACE = fromid=11,from=ACE
CHRM1 = fromid=114,from=CHRM1
CHRM2 = fromid=115,from=CHRM2
DRD2 = fromid=158,from=DRD2
EGF = fromid=164,from=EGF
ADRA1A = fromid=21,from=ADRA1A
ADRA1B = fromid=22,from=ADRA1B
ADRA1D = fromid=23,from=ADRA1D
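The two-pass approach reads the file twice, so it needs a regular file rather than a pipe. As a rough sketch of an alternative (not part of the original answer), a single pass can print column-1 genes immediately and buffer column-3 genes until END, which should give the same ordering. The file name genes.txt and the array names seen, queued, name and id are placeholders chosen for illustration.

```shell
# Recreate the sample input from the question (genes.txt is a hypothetical name).
cat > genes.txt <<'EOF'
ABCB11 4 ACE 11
ABCB11 4 CHRM1 114
ABCB11 4 CHRM2 115
ABCB11 4 DRD2 158
ABCB11 4 EGF 164
ABCC8 5 ACE 11
ABCC8 5 ADRA1A 21
ABCC8 5 ADRA1B 22
ABCC8 5 ADRA1D 23
ABCC8 5 CHRM1 114
EOF

result=$(awk '
    # Column 1: print each gene the first time it appears.
    !($1 in seen) {
        seen[$1] = 1
        printf "%s = fromid=%s,from=%s\n", $1, $2, $1
    }
    # Column 3: remember each new gene in arrival order, print later.
    !($3 in seen) && !($3 in queued) {
        queued[$3] = 1
        n++
        name[n] = $3
        id[n] = $4
    }
    END {
        for (i = 1; i <= n; i++)
            # Skip genes that showed up in column 1 after being queued.
            if (!(name[i] in seen))
                printf "%s = fromid=%s,from=%s\n", name[i], id[i], name[i]
    }
' genes.txt)

printf '%s\n' "$result"
```

Because nothing is re-read, the same awk program should also work with input arriving on standard input, at the cost of holding the column-3 genes in memory until the end.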
Answer 2
If you don't mind the output order (awk does not guarantee the traversal order of for (i in a)):
$ awk '!($1 in a) {a[$1] = $2}
       !($3 in a) {a[$3] = $4}
       END {
           for (i in a) {
               print i " = fromid=" a[i] ",from=" i
           }
       }' file
EGF = fromid=164,from=EGF
CHRM1 = fromid=114,from=CHRM1
CHRM2 = fromid=115,from=CHRM2
ACE = fromid=11,from=ACE
ADRA1A = fromid=21,from=ADRA1A
DRD2 = fromid=158,from=DRD2
ABCB11 = fromid=4,from=ABCB11
ADRA1B = fromid=22,from=ADRA1B
ABCC8 = fromid=5,from=ABCC8
ADRA1D = fromid=23,from=ADRA1D
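If a reproducible order is wanted from this approach, one option is to pipe the result through sort. The sketch below is an addition rather than part of the original answer; it assumes plain byte-wise alphabetical order is acceptable, which differs from the order requested in the question. The file name genes.txt is a placeholder.

```shell
# Recreate the sample input from the question (genes.txt is a hypothetical name).
cat > genes.txt <<'EOF'
ABCB11 4 ACE 11
ABCB11 4 CHRM1 114
ABCB11 4 CHRM2 115
ABCB11 4 DRD2 158
ABCB11 4 EGF 164
ABCC8 5 ACE 11
ABCC8 5 ADRA1A 21
ABCC8 5 ADRA1B 22
ABCC8 5 ADRA1D 23
ABCC8 5 CHRM1 114
EOF

# Same array-based de-duplication as above, then a byte-wise sort
# (LC_ALL=C) so the order does not depend on the awk implementation
# or on the current locale.
sorted=$(awk '
    !($1 in a) { a[$1] = $2 }
    !($3 in a) { a[$3] = $4 }
    END { for (i in a) print i " = fromid=" a[i] ",from=" i }
' genes.txt | LC_ALL=C sort)

printf '%s\n' "$sorted"
```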