Retrieve the job definitions for a list of jobs given in a second file

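I have a file containing the details of 250,000 jobs. The jobs in this source file all have different parameters, so the number of lines per job varies. The only pattern is that each job definition starts with a line beginning with insert: and ends with a blank line.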

insert: PPC_SA1   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA2   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA4   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA5   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

insert: PPC_SA6   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

Target jobs:

PPC_SA1
PPC_SA5
PPC_SA3

I need to extract the entries for the jobs in the list above into another file:

insert: PPC_SA1   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA5   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

Answer 1

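Since every job ends with a blank line, you can use perl in "paragraph mode" (it treats \n\n, i.e. a blank line, as the record separator, effectively handling each "paragraph" as a "line"):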

$ perl -00lne 'print if /insert:\s+PPC_SA[153]\s/' file
insert: PPC_SA1   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0

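-00 enables paragraph mode, and -l strips the trailing newlines from each record and adds a separator back when printing. The script then prints every record that matches insert:\s+PPC_SA followed by one of 1, 5 or 3 and then another whitespace character. Matching records are printed in the order they appear in the file.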

Of course, this is not very practical if you have many target IDs, so you can also generalize it:

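perl -00lne 'BEGIN{ $k{$_}++ for @ARGV; @ARGV=() } print if /insert:\s+(\S+)/ && $k{$1}' PPC_SA1 PPC_SA5 PPC_SA3 < file

The BEGIN block stores the IDs given as arguments in %k and empties @ARGV, so perl then reads the jobs from standard input. Testing the match and the lookup in one expression means $1 is only used for records that actually contain an insert: line.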

Alternatively, you can use awk. Save the target IDs in a file (called target_ids in this example), one per line, and run:

$ awk '(NR==FNR){a[$1]++; next}
       {
         if(/insert:/ && ($2 in a)){want=1}
         if(want){print}
         if(NF==0){want=0}
       }' target_ids file
insert: PPC_SA1   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
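The commands above print the jobs in the order they appear in the job file (PPC_SA1, PPC_SA3, PPC_SA5), while the question lists them as PPC_SA1, PPC_SA5, PPC_SA3. If the output should follow the order of the ID file, one option is to buffer only the wanted records and print them at the end. This is a sketch with hypothetical file names; memory use grows with the number of selected jobs, not with the size of the job file.

```shell
#!/bin/sh
# Hypothetical inputs; note the ID file lists PPC_SA3 before PPC_SA1.
cat > jobs_demo.txt <<'EOF'
insert: PPC_SA1   job_type: CMD
name: PPC

insert: PPC_SA2   job_type: CMD
name: PPC

insert: PPC_SA3   job_type: CMD
name: PPC
EOF
printf '%s\n' PPC_SA3 PPC_SA1 > targets_demo.txt

# RS= (a command-line assignment) switches to paragraph mode only for the
# job file. Wanted records are buffered and printed in ID-file order at END.
awk 'NR == FNR {
         if (NF) { order[++n] = $1; want[$1] = 1 }
         next
     }
     $1 == "insert:" && ($2 in want) { job[$2] = $0 }
     END {
         for (i = 1; i <= n; i++)
             if (order[i] in job) printf "%s\n\n", job[order[i]]
     }' targets_demo.txt RS= jobs_demo.txt > out_demo.txt
```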

Answer 2

How about this?

awk 'NR==FNR{ trgtJbs[$0]; next } ($2 in trgtJbs)' targetJobs RS='' allJobs

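First we read all the job names from the targetJobs file into the trgtJbs array. Since, as you mentioned, the jobs in the allJobs file are separated by blank lines, we set RS to the empty string for the second file, which puts awk into paragraph mode. In that mode a newline is always a field separator, so the second field of each job block is the job name; if it exists in the trgtJbs array, the whole block is printed.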

If you want to keep the blank line between jobs in the output, use this instead:

awk 'NR==FNR{ trgtJbs[$0]; next }
    ($2 in trgtJbs){ print sep $0; sep=ORS }' targetJobs RS='' allJobs
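A quick way to try this answer is on a small hypothetical sample. The sketch below keys the array on $1 and skips empty lines in targetJobs, a slight deviation from the answer that tolerates trailing blanks and stray blank lines in the ID file. Note that the output follows the order of allJobs, not the order of targetJobs.

```shell
#!/bin/sh
# Hypothetical sample files, using the names from this answer.
cat > allJobs <<'EOF'
insert: PPC_SA1   job_type: CMD
name: PPC

insert: PPC_SA2   job_type: CMD
name: PPC

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
EOF
printf '%s\n' PPC_SA3 PPC_SA1 > targetJobs

# targetJobs is read line by line; RS= then puts awk in paragraph mode for
# allJobs, where newlines also split fields, so $2 is the job name.
awk 'NR == FNR { if (NF) trgtJbs[$1]; next }
     ($2 in trgtJbs) { print sep $0; sep = ORS }' targetJobs RS= allJobs > selected
cat selected
```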

Answer 3

With awk in paragraph mode, you can select the records you want:

awk -v RS= -v FS='\n' -v ORS='\n\n' '$1~/PPC_SA(1|3|5)[[:space:]]+/' file
insert: PPC_SA1   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
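The alternation PPC_SA(1|3|5) has to be typed by hand. A sketch of building it from an ID file instead, assuming hypothetical files jobs_demo.txt and targets_demo.txt and assuming the IDs contain no regex metacharacters. The pattern is anchored to the insert: line and requires a blank or the end of the line after the name, so PPC_SA1 does not also select PPC_SA10.

```shell
#!/bin/sh
# Hypothetical sample data; PPC_SA10 shows why the name must be followed by a blank.
cat > jobs_demo.txt <<'EOF'
insert: PPC_SA1   job_type: CMD
name: PPC

insert: PPC_SA10   job_type: CMD
name: PPC

insert: PPC_SA3   job_type: CMD
name: PPC
EOF
printf '%s\n' PPC_SA1 PPC_SA3 > targets_demo.txt

# Join the IDs into one alternation: "PPC_SA1|PPC_SA3".
# Assumes the IDs contain no regex metacharacters (true for names like PPC_SA1).
ids=$(paste -s -d '|' targets_demo.txt)

# FS='\n' makes $1 the first line of each paragraph-mode record.
awk -v RS= -v FS='\n' -v ORS='\n\n' \
    -v re="^insert:[ \t]+($ids)([ \t]|\$)" \
    '$1 ~ re' jobs_demo.txt > out_demo.txt
cat out_demo.txt
```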

Answer 4

sed -n '/^insert.* PPC_SA[153]/,/^$/p' filename

Output:

insert: PPC_SA1   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"
std_err_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.err"
alarm_if_fail: 1
group: P
resources:

insert: PPC_SA3   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
description: "Run program"
std_out_file: "/home/PROD/autosys/logs/${AUTO_JOB_NAME}_`date +%y%m%d`.log"

insert: PPC_SA5   job_type: CMD
name: PPC
command: sa
machine: P
owner: cat
permission:
date_conditions: 0
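One caveat with this sed range: PPC_SA[153] with nothing after it also matches longer names such as PPC_SA10 or PPC_SA15, and the closing /^$/ only matches truly empty lines, not lines containing spaces. A sketch with a stricter start pattern, run on a hypothetical sample:

```shell
#!/bin/sh
# Hypothetical sample data including PPC_SA10, which the pattern PPC_SA[153]
# on its own would also match.
cat > jobs_demo.txt <<'EOF'
insert: PPC_SA1   job_type: CMD
name: PPC

insert: PPC_SA10   job_type: CMD
name: PPC

insert: PPC_SA3   job_type: CMD
name: PPC
EOF

# Require whitespace after the job number so PPC_SA1 does not also pick up
# PPC_SA10, PPC_SA15, PPC_SA30, and so on. Each range runs to the next
# empty line, or to the end of the file for the last job.
sed -n '/^insert:[[:space:]]*PPC_SA[153][[:space:]]/,/^$/p' jobs_demo.txt > out_demo.txt
cat out_demo.txt
```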
