Extracting data in Linux/Unix

Question 1

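You can use an awk associative array indexed by the field whose uniqueness you are asserting, e.g. for the unique values of the to= field (field $6 when the line is split on commas):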

$ awk -F, '{split($6,s,"="); arr[s[2]]=s[2]" = "$7","$6;} END{for (id in arr) print arr[id]}' data.txt
EGF = toid=164,to=EGF
ADRA1A = toid=21,to=ADRA1A
ACE = toid=11,to=ACE
ADRA1B = toid=22,to=ADRA1B
ADRA1D = toid=23,to=ADRA1D
DRD2 = toid=158,to=DRD2
CHRM1 = toid=114,to=CHRM1
CHRM2 = toid=115,to=CHRM2
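If the one-liner is hard to read, here is the same idea spelled out with comments. This is only a sketch: the sample file is made up for illustration, and only fields 2, 3, 6 and 7 follow the layout described above (f1, f4 and f5 are placeholders, not the real contents of data.txt). The function name unique_to is likewise just a label for this example.

```shell
# Hypothetical sample input; the real data.txt may look different.
sample=$(mktemp)
cat > "$sample" <<'EOF'
f1,from=ABCC8,fromid=5,f4,f5,to=EGF,toid=164
f1,from=ABCC8,fromid=5,f4,f5,to=ACE,toid=11
f1,from=ABCB11,fromid=4,f4,f5,to=EGF,toid=164
EOF

unique_to() {
  awk -F, '{
    split($6, s, "=")                    # s[1] = "to", s[2] = the name
    arr[s[2]] = s[2] " = " $7 "," $6     # a repeated name overwrites its own entry
  }
  END { for (id in arr) print arr[id] }  # one line per distinct name
  ' "$1"
}

# awk does not define the order of "for (id in arr)", so sort for stable output.
unique_to "$sample" | LC_ALL=C sort
```

Because the array is keyed by the name, the two EGF lines collapse into one entry. If the same name could ever appear with different ids, only the last one seen would survive, so this relies on the uniqueness you asserted.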

The expression for the unique fromid entries is the same, but with fields $6 and $7 replaced by $2 and $3:

$ awk -F, '{split($2,s,"="); arr[s[2]]=s[2]" = "$3","$2;} END{for (id in arr) print arr[id]}' data.txt
ABCC8 = fromid=5,from=ABCC8
ABCB11 = fromid=4,from=ABCB11

If you want the output to contain both the toid and fromid data, you can combine the two expressions, i.e.

awk -F, '{
split($2,s,"="); arr[s[2]]=s[2]" = "$3","$2;
split($6,s,"="); arr[s[2]]=s[2]" = "$7","$6;
} END{for (id in arr) print arr[id]}' data.txt

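To change the labels (i.e. label every field in one table as toid, even when it comes from a fromid row), probably the most natural approach is to pipe the output through sed, e.g.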

$ awk -F, '{
split($2,s,"="); arr[s[2]]=s[2]" = "$3","$2;
split($6,s,"="); arr[s[2]]=s[2]" = "$7","$6;
} END{for (id in arr) print arr[id]}' data.txt | sed 's/from/to/g'
ABCC8 = toid=5,to=ABCC8
EGF = toid=164,to=EGF
ADRA1A = toid=21,to=ADRA1A
ACE = toid=11,to=ACE
ABCB11 = toid=4,to=ABCB11
ADRA1B = toid=22,to=ADRA1B
ADRA1D = toid=23,to=ADRA1D
DRD2 = toid=158,to=DRD2
CHRM1 = toid=114,to=CHRM1
CHRM2 = toid=115,to=CHRM2

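You could do the fromid <--> toid replacement inside awk, but I think this approach makes the intent clearer. To create the other table, just reverse the final sed expression: sed 's/to/from/g'.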
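For comparison, a rough sketch of what the relabelling might look like inside awk. The sample file is hypothetical (f1, f4 and f5 are placeholder fields), and the function name all_as_to is just a label for this example. Anchoring the substitutions at the start of the field means only the label changes, whereas a global sed 's/from/to/g' would also rewrite a name that happened to contain "from".

```shell
# Hypothetical sample input; the real data.txt may look different.
sample=$(mktemp)
cat > "$sample" <<'EOF'
f1,from=ABCC8,fromid=5,f4,f5,to=EGF,toid=164
f1,from=ABCB11,fromid=4,f4,f5,to=EGF,toid=164
EOF

all_as_to() {
  awk -F, '{
    split($2, s, "="); name = $2; id = $3
    sub(/^from=/, "to=", name)        # relabel only the leading label
    sub(/^fromid=/, "toid=", id)
    arr[s[2]] = s[2] " = " id "," name

    split($6, s, "=")                 # the to= side is already labelled
    arr[s[2]] = s[2] " = " $7 "," $6
  }
  END { for (k in arr) print arr[k] }' "$1"
}

# Sorted only to make the output order predictable.
all_as_to "$sample" | LC_ALL=C sort
```

The sed pipeline is shorter; this version is mainly useful when the names themselves might contain the strings being replaced.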

Answer

Question 2

Assuming the names are in a file called "filename.txt", you can try the following for the first table:

cat filename.txt | awk -F "," '{ print $2 " = " $7 "," $6}' | sed -r 's/^.{5}//'

For the second table:

cat filename.txt | awk -F "," '{ print $2 " = " $3 "," $6}' | sed -r 's/^.{5}//'

Good luck!

Edit: for the second table:

cat filename.txt | awk -F "," '{ print $2 " = " $7 "," $6}' | sed -r 's/^.{5}//' | sed 's/toid/fromid/'

Edit 2:

cat filename.txt | awk -F "," '{ print $2 " = " $7 "," $6}' | sed 's/^.....//' | sed 's/toid/fromid/'

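That is 5 dots: each dot matches one character, so the pattern strips the first five characters of the line (the leading "from=").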
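To see that the two sed forms do the same thing, here is a small check on a single line of the shape the awk step prints. The sample line is made up for illustration. The -E flag is used instead of -r on the assumption that your sed accepts it (both GNU and BSD sed do); the last command is a shell-only alternative that removes the prefix only when it really is "from=".

```shell
# Hypothetical line, shaped like the output of: print $2 " = " $7 "," $6
line='from=ABCC8 = toid=164,to=EGF'

# Extended regex: .{5} means "any five characters"
printf '%s\n' "$line" | sed -E 's/^.{5}//'

# Basic regex: five dots, each matching exactly one character
printf '%s\n' "$line" | sed 's/^.....//'

# POSIX parameter expansion: strip a leading "from=" if it is there
printf '%s\n' "${line#from=}"
```

All three print the line without its leading "from=". The dot-counting versions remove five characters whatever they are, so they would silently damage a line that starts with something else.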

Answer