Finding the position of a specific pattern in a column

Answer 1

While terdon's suggestion is a good one, the following AWK script does the job:

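# TYR starts a candidate match: remember it and its position
$1 == "TYR" { seq = $1; start = $2; next }
# LYS after TYR, or SER after LYS, extends the match
($1 == "LYS" && seq == "TYR") || ($1 == "SER" && seq == "LYS") { seq = $1; next }
# ALA after SER completes it: print where the match started
$1 == "ALA" && seq == "SER" { print start }
# Any other line (and a completed match) resets the state
{ seq = "" }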

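This looks for TYR and remembers its position as the start; it then matches LYS, SER and ALA in the correct order, recording the previous item of the sequence in seq at each stage. Any line that does not continue the sequence clears seq.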
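To see the state machine at work, here is a small sketch that runs the same rules on made-up input. The sample data (residue name in column 1, position in column 2) and the variable names `sample` and `result` are assumptions for illustration; the question's real input is not shown here.

```shell
# Hypothetical sample input: name in column 1, position in column 2.
sample=$(mktemp)
cat > "$sample" <<'EOF'
GLY 1
TYR 2
LYS 3
SER 4
ALA 5
TYR 6
LYS 7
GLY 8
TYR 9
LYS 10
SER 11
ALA 12
EOF

# Same four rules as in the answer, passed inline instead of from a file.
result=$(awk '
  $1 == "TYR" { seq = $1; start = $2; next }
  ($1 == "LYS" && seq == "TYR") || ($1 == "SER" && seq == "LYS") { seq = $1; next }
  $1 == "ALA" && seq == "SER" { print start }
  { seq = "" }
' "$sample")

printf '%s\n' "$result"   # prints 2 and 9, one per line
rm -f "$sample"
```

The run starting at TYR 6 is dropped when GLY 8 interrupts it, so only the two complete TYR, LYS, SER, ALA runs are reported.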

Answer 2

A sliding window with sed:

parse.sed

# Establish the sliding window
1N
2N

# Append the next input line to the window
N

# Match the desired pattern to the current window
/^TYR \(.*\)\nLYS .*\nSER .*\nALA .*$/ { 
  h;                           # Save the window in hold space
  s//\1/p;                     # Extract desired output
  x;                           # Re-establish window
}

# Drop the oldest line from the window and restart the cycle
D

Run it like this:

sed -nf parse.sed infile
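As a rough check of the window logic, the sketch below writes the script and some made-up input to a temporary directory and runs it. The sample data and the `workdir` and `result` names are assumptions; the input is assumed to have no leading whitespace and exactly one space between the columns, because the regular expression matches `TYR ` literally. The trailing comments are left out of the script only to keep it short.

```shell
workdir=$(mktemp -d)

# Hypothetical sample input: name, one space, position.
cat > "$workdir/infile" <<'EOF'
GLY 1
TYR 2
LYS 3
SER 4
ALA 5
TYR 6
LYS 7
GLY 8
TYR 9
LYS 10
SER 11
ALA 12
EOF

# The sliding-window script: fill a 4-line window, test it, shift by one line.
cat > "$workdir/parse.sed" <<'EOF'
1N
2N
N
/^TYR \(.*\)\nLYS .*\nSER .*\nALA .*$/ {
  h
  s//\1/p
  x
}
D
EOF

result=$(sed -nf "$workdir/parse.sed" "$workdir/infile")
printf '%s\n' "$result"   # prints 2 and 9, one per line
rm -rf "$workdir"
```

Because the window always holds exactly four lines, the three `\n` in the expression account for every newline in it, so `\1` can only capture the rest of the TYR line.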

Answer 3

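The same approach as in Stephen Kitt's answer, but without the additional seq variable. Instead, the consecutive "numbers" in the second column are used to determine whether the current line belongs to the set we are looking for.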

awk '{
  if ($1=="TYR") {
    i=$2 # remember index
  }
  else if (i!=0) { 
    if ($2==i+1 && $1=="LYS" || $2==i+2 && $1=="SER" || $2==i+3 && $1=="ALA") {
      if ($2==i+3) { # are we there yet?
        print i; exit
      }
    }
    else {
      i=0 # nope, reset index
    }
  }
}' file

(Unneeded curly braces and indentation kept for readability.)
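Two consequences of this design are worth seeing side by side: the program stops at the first match, and it only matches when the numbers in the second column are consecutive. The sketch below wraps the same program in a hypothetical helper, `first_match`, and runs it on two made-up inputs; the helper name, file names and data are all assumptions for illustration.

```shell
# Hypothetical helper: run the index-based awk program on the file given as $1.
first_match() {
  awk '{
    if ($1 == "TYR") {
      i = $2                    # remember where the candidate starts
    }
    else if (i != 0) {
      if ($2 == i+1 && $1 == "LYS" || $2 == i+2 && $1 == "SER" || $2 == i+3 && $1 == "ALA") {
        if ($2 == i+3) { print i; exit }
      }
      else {
        i = 0                   # sequence broken, reset
      }
    }
  }' "$1"
}

workdir=$(mktemp -d)
# Consecutive numbering: TYR 2 starts a full match.
printf '%s\n' 'GLY 1' 'TYR 2' 'LYS 3' 'SER 4' 'ALA 5' > "$workdir/consecutive"
# Same names, but the numbering jumps from 3 to 5, so nothing matches.
printf '%s\n' 'GLY 1' 'TYR 2' 'LYS 3' 'SER 5' 'ALA 6' > "$workdir/gap"

result_consecutive=$(first_match "$workdir/consecutive")
result_gap=$(first_match "$workdir/gap")
printf 'consecutive: [%s]\ngap: [%s]\n' "$result_consecutive" "$result_gap"
rm -rf "$workdir"
```

This prints `consecutive: [2]` and `gap: []`. Note also that a start index of 0 would be indistinguishable from "no candidate", since `i` being 0 is what marks the reset state.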


Answer 4

You can do this with a sliding window:

parse.awk

# Split the pattern into the `p` array and remember how many there are in `n`
BEGIN { n = split(pat, p, "\n") }

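# Collect the first n lines into the `A` array; only skip the test
# while the window is still incomplete, so lines 1..n are tested too
NR <= n { A[NR] = $0; if (NR < n) next }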

# Maintain the sliding window after n lines
NR  > n {
  for(i=2; i<=n; i++)
    A[i-1] = A[i]
  A[n] = $0
}

# Test if the current window contains the pattern
{
  hit = 1
  for(i=1; i<=n; i++) {
    split(A[i], x)
    if(x[1] != p[i]) {
      hit = 0
      break
    }
  }

  # If the window matches print the second column
  if(hit) {
    split(A[1], x)
    print x[2]
  }
}

Run it like this:

awk -v pat="$(< patternfile)" -f parse.awk infile
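For experimenting, here is a condensed, self-contained sketch of the same sliding-window idea. It is a variant, not the script above: it reads the pattern file as the first input file (the `NR == FNR` idiom) rather than passing it through `-v`, since `$(< file)` is not available in every shell and newlines inside a `-v` value may not be accepted by every awk implementation. It tests every full window, including the first one. The file names, sample data, and the `workdir` and `result` names are assumptions.

```shell
workdir=$(mktemp -d)

# Hypothetical pattern file: one name per line.
printf '%s\n' TYR LYS SER ALA > "$workdir/patternfile"

# Hypothetical input: the first match starts on the very first line.
cat > "$workdir/infile" <<'EOF'
TYR 1
LYS 2
SER 3
ALA 4
GLY 5
TYR 6
LYS 7
SER 8
ALA 9
EOF

result=$(awk '
  # First file: load the pattern into p[1..n]
  NR == FNR { p[++n] = $1; next }
  {
    # Second file: keep the last n lines in A[1..n]
    if (FNR > n) { for (i = 2; i <= n; i++) A[i-1] = A[i]; A[n] = $0 }
    else A[FNR] = $0
    if (FNR < n) next            # window not full yet

    # Compare column 1 of each window line with the pattern
    hit = 1
    for (i = 1; i <= n; i++) {
      split(A[i], x)
      if (x[1] != p[i]) { hit = 0; break }
    }
    # On a match, print column 2 of the first window line
    if (hit) { split(A[1], x); print x[2] }
  }
' "$workdir/patternfile" "$workdir/infile")

printf '%s\n' "$result"   # prints 1 and 6, one per line
rm -rf "$workdir"
```

Unlike the hard-coded scripts, the pattern here can be any length; changing `patternfile` is enough to search for a different sequence. The sketch assumes a non-empty pattern file, because an empty one would make `NR == FNR` true for the data file as well.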

