Extracting a specific block from a text file

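I am trying to extract a block of data from a huge text file that contains 1193373557 lines.

I am excluding the first 25 lines and the last 4 lines. The harder part is that the remaining block contains data under two headers, and I want to split that data into separate files according to the header.

Example (test.txt, a file that contains the data for both header1 and header2):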

header1
------
----
----
----
header2
-----
----
----
---

Desired output:

  • header1.txt: should contain every line up to the point where header2 starts
  • header2.txt: should contain all the lines that follow the header1 block

Answer 1

For header1.txt:

sed -n '/^header1$/,/^header2$/{/^header2$/d;p}' file >header1.txt
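  • /pattern1/,/pattern2/ is sed's range address syntax; it matches every line from the one matching pattern1 through the one matching pattern2, inclusive of both.
  • /^header2$/d deletes the header2 line, since it is not wanted in this file.
  • p prints whatever remains.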
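
As a quick check of the range address described above, the following sketch runs the same command on a tiny sample file. The file name sample.txt and its contents are hypothetical, chosen only for illustration. The command is written as in the answer, which works with GNU sed; some other implementations (BSD sed, for example) expect a `;` before the closing `}`.

```shell
# Build a tiny sample file (hypothetical contents, for illustration only).
printf '%s\n' header1 a b header2 c d > sample.txt

# Select the range header1..header2, delete the header2 line, print the rest.
sed -n '/^header1$/,/^header2$/{/^header2$/d;p}' sample.txt > header1.txt

# header1.txt now holds: header1, a, b
cat header1.txt
```

Because `d` ends processing of the current line immediately, the `p` that follows it never runs for the header2 line.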

For header2.txt:

sed -n '/^header2$/,$p' file >header2.txt
  • As with the first command, this one matches a range, here from header2 through the last line ($).
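
Note that a range ending at `$` also includes the trailing lines that the question wants to exclude. One possible way to drop them, sketched here under the assumption that GNU coreutils `head` is available (it accepts a negative line count), is to pipe the result through `head -n -4`. The file sample.txt and the lines t1 to t4 are hypothetical stand-ins for the real data and its 4 trailing lines.

```shell
# Hypothetical sample: two sections followed by 4 trailing lines to discard.
printf '%s\n' header1 a b header2 c d t1 t2 t3 t4 > sample.txt

# Print from header2 to the end, then drop the last 4 lines.
# Assumption: GNU head; a negative count means "all but the last N lines".
sed -n '/^header2$/,$p' sample.txt | head -n -4 > header2.txt

# header2.txt now holds: header2, c, d
cat header2.txt
```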

Answer 2

Using AWK:

awk -v nlines=$(wc -l test.txt | cut -d ' ' -f 1) '$0=="Reading input from PoolA_Rnase", $0=="Reading input from PoolB_Rnase" {if($0 != "Reading input from PoolB_Rnase") {print >"header1.txt"}} $0=="Reading input from PoolB_Rnase", NR==nlines-4 {print >"header2.txt"}' test.txt

The AWK script, expanded and commented:

  • nlines holds the number of lines in the file, computed by $(wc -l test.txt | cut -d ' ' -f 1)

$0=="Reading input from PoolA_Rnase", $0=="Reading input from PoolB_Rnase" { # if the current record is between a record matching "Reading input from PoolA_Rnase" and a record matching "Reading input from PoolB_Rnase" inclusive
    if($0 != "Reading input from PoolB_Rnase") { # if the current record doesn't match "Reading input from PoolB_Rnase"
        print >"header1.txt" # prints the record to header1.txt
    }
}
$0=="Reading input from PoolB_Rnase", NR==nlines-4 { # if the current record is between a record matching "Reading input from PoolB_Rnase" and the record number `nlines-4` inclusive
    print >"header2.txt" # prints the record to header2.txt
}
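
The script above relies on AWK range patterns of the form `start, end`. A minimal sketch of how such a range behaves on its own is shown below; the file names and the START/END markers are hypothetical and simply stand in for the two header lines.

```shell
# Hypothetical six-line input; START and END stand in for the header lines.
printf '%s\n' x START a b END y > range_demo.txt

# A range pattern with no action prints every record from the first record
# matching the start condition through the first one matching the end
# condition, inclusive of both.
awk '$0=="START", $0=="END"' range_demo.txt > range_out.txt

# range_out.txt now holds: START, a, b, END
cat range_out.txt
```

This inclusiveness is why the first rule in the script needs the extra `if` to keep the second header out of header1.txt.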

Sample run:

% cat test.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
line 12
line 13
line 14
line 15
line 16
line 17
line 18
line 19
line 20
line 21
line 22
line 23
line 24
line 25
Reading input from PoolA_Rnase
foo
foo
foo
Reading input from PoolB_Rnase
bar
bar
bar
line 1
line 2
line 3
line 4
% awk -v nlines=$(wc -l test.txt | cut -d ' ' -f 1) '$0=="Reading input from PoolA_Rnase", $0=="Reading input from PoolB_Rnase" {if($0 != "Reading input from PoolB_Rnase") {print >"header1.txt"}} $0=="Reading input from PoolB_Rnase", NR==nlines-4 {print >"header2.txt"}' test.txt
% cat header1.txt 
Reading input from PoolA_Rnase
foo
foo
foo
% cat header2.txt 
Reading input from PoolB_Rnase
bar
bar
bar
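
Since the real file has over a billion lines, counting it with `wc -l` first means reading it twice. The sketch below is one possible single-pass alternative, not the answer's method: it holds each line back in a small buffer until 4 newer lines have arrived, so the last 4 lines are never written. It also skips the leading lines by waiting for the first header rather than by counting 25 lines. The variable names (`skip`, `out`, `buf`, `dest`) are made up for this illustration.

```shell
# Recreate the small test file: 25 leading lines, two sections, 4 trailing lines.
{
    i=1
    while [ "$i" -le 25 ]; do echo "line $i"; i=$((i + 1)); done
    echo 'Reading input from PoolA_Rnase'
    printf '%s\n' foo foo foo
    echo 'Reading input from PoolB_Rnase'
    printf '%s\n' bar bar bar
    printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4'
} > test.txt

awk -v skip=4 '
    # Switch the current output file whenever a header line is seen.
    $0 == "Reading input from PoolA_Rnase" { out = "header1.txt" }
    $0 == "Reading input from PoolB_Rnase" { out = "header2.txt" }
    out == "" { next }    # still before the first header: ignore the line
    {
        # Delay each line until `skip` newer lines have been read, so the
        # final `skip` lines of the file are never printed.
        slot = n % skip
        if (n >= skip) print buf[slot] > (dest[slot])
        buf[slot] = $0
        dest[slot] = out
        n++
    }
' test.txt

cat header1.txt
cat header2.txt
```

On the sample file this yields the same two output files as the run shown above, while keeping only 4 lines in memory at any time.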
