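I have a file containing the details of 250,000 jobs. In this source file, every job has different parameters, so the number of lines per job can vary. The only pattern is that each job definition starts with an insert: line and ends with a blank line.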
insert: PPC_SA1 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA2 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

insert: PPC_SA3 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA4 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA5 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

insert: PPC_SA6 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
Target jobs:
PPC_SA1
PPC_SA5
PPC_SA3
I need to extract the entries for the jobs in the list above into another file:
insert: PPC_SA1 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA5 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

insert: PPC_SA3 job_type: CMD
box: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
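Answer 1
Since the jobs all end with a blank line, you can use perl in "paragraph mode" (which means it treats \n\n, a blank line, as the record separator, effectively treating "paragraphs" as "lines"):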
$ perl -00lne 'print if /insert:\s+PPC_SA[153]\s/' file
insert: PPC_SA1 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
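The -00 enables paragraph mode, and the script then prints every record that matches insert:\s+PPC_SA followed by one of 1, 5, or 3 and then another whitespace character.
Of course, this isn't very practical if you have many target IDs, so you can also generalize it to: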
cat file |
perl -00lne 'BEGIN{ $k{$_}++ for @ARGV; @ARGV=()} print if /insert:\s+(\S+)/ && $k{$1}' PPC_SA1 PPC_SA5 PPC_SA3
Alternatively, you can use awk. Save the target IDs in a file (called target_ids in this example), one per line, and then run:
$ awk '(NR==FNR){a[$1]++; next}
{
if(/insert:/ && $2 in a){want=1}
if(want){print}
if(/^[[:space:]]*$/){want=0}
}' target_ids file
insert: PPC_SA1 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
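Since the question asks for the entries to end up in another file, the awk approach above could be wrapped roughly as follows. This is only a sketch: the function name extract_jobs and the "not found" report are assumptions added for illustration, not part of the original command. It keeps the same line-by-line logic and additionally lists any target ID that never appeared in the jobs file, which may be handy with 250,000 jobs.

```shell
#!/bin/sh
# extract_jobs TARGET_IDS JOBS_FILE OUT_FILE   (hypothetical helper name)
# Writes the matching job blocks to OUT_FILE and reports target IDs that
# were never seen on stderr.
extract_jobs() {
    awk '
        # First file: remember every non-empty target ID.
        NR == FNR { if (NF) want[$1] = 1; next }

        # An insert: line starts a job; decide whether to keep it.
        /^insert:/ { keep = ($2 in want); if (keep) seen[$2] = 1 }

        # Print every line of a wanted job, including its closing blank line.
        keep

        # A blank line ends the current job.
        /^[[:space:]]*$/ { keep = 0 }

        END {
            for (id in want)
                if (!(id in seen))
                    print "not found: " id | "cat 1>&2"
        }
    ' "$1" "$2" > "$3"
}

# Example call (extracted_jobs is an assumed output name):
# extract_jobs target_ids file extracted_jobs
```

The jobs are still written in the order they occur in the source file, not in the order of the target list.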
Answer 2
How about this?
awk 'NR==FNR{ trgtJbs[$0]; next } ($2 in trgtJbs)' targetJobs RS='' allJobs
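First we read the whole targetJobs file into the trgtJbs array. Then, since you mentioned that the jobs in the allJobs file are separated by blank lines, we set the Record Separator (RS) to the empty string for the second file and check whether the second field of each job block exists in the trgtJbs array; the blocks that do are printed.
In case you want to keep that blank line in the output, do the following: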
awk 'NR==FNR{ trgtJbs[$0]; next }
($2 in trgtJbs){ print sep $0; sep=ORS }' targetJobs RS='' allJobs
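To see why the second field is the job name once RS is empty, here is a small self-contained sketch. The two-job sample, the scratch directory, and the helper names show_fields and extract_jobs are made up for the demonstration; only the awk program in extract_jobs is the one from this answer.

```shell
#!/bin/sh
# Scratch files for the demonstration (assumed names mirroring the answer).
tmp=$(mktemp -d)

# Two-job sample input: a blank line ends each job definition.
cat > "$tmp/allJobs" <<'EOF'
insert: PPC_SA1 job_type: CMD
name: PPC

insert: PPC_SA2 job_type: CMD
name: PPC
command: sa

EOF

printf '%s\n' PPC_SA2 > "$tmp/targetJobs"

# With RS empty, each blank-line-separated block is one record, and a
# newline also separates fields, so $2 is the word after "insert:".
show_fields() {
    awk -v RS= '{ print NR ": $2=" $2 ", NF=" NF }' "$1"
}

# The command from the answer: RS='' is applied only to the second file.
extract_jobs() {
    awk 'NR==FNR { trgtJbs[$0]; next } ($2 in trgtJbs)' "$1" RS='' "$2"
}

show_fields "$tmp/allJobs"
extract_jobs "$tmp/targetJobs" "$tmp/allJobs"
```

Note that the NR==FNR idiom assumes targetJobs is not empty; with an empty list, the jobs file itself would be treated as the first file.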
Answer 3
In awk's paragraph mode, you can select the records you want:
awk -v RS= -v FS='\n' -v ORS='\n\n' '$1~/PPC_SA(1|3|5)[[:space:]]+/' file
insert: PPC_SA1 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
Answer 4
sed -n '/^insert.* PPC_SA[153]/,/^$/p' filename
Output:
insert: PPC_SA1 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5 job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
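One caveat with the sed pattern: [153] matches a single character, so with nothing required after it, a longer job name such as PPC_SA10 (a made-up ID, not in the sample) would also be selected. A possible variant, sketched here under the assumption that the job name is always followed by whitespace as in the sample, requires that whitespace explicitly. The function name extract_with_sed is only for illustration.

```shell
#!/bin/sh
# extract_with_sed JOBS_FILE   (hypothetical helper name)
# Same address range as above, but the job name must be followed by
# whitespace, so an ID like PPC_SA10 is not selected by the "1".
extract_with_sed() {
    sed -n '/^insert:[[:space:]]*PPC_SA[153][[:space:]]/,/^$/p' "$1"
}

# Example call:
# extract_with_sed filename
```

Each range still runs from the matching insert: line through the next empty line, so the closing blank line of each job is kept in the output.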