Selectively extracting data

Question 1

A sed approach...

sed -ne'/^Folder: *\[/!{'                                     \
         -e'/^Workflow\( run status\)\{0,1\}: *\[/!{'         \
              -e'/^Schedule time: *\[/!{'                     \
                   -e'/^Integration Service: *\[/!d'          \
    -e\} -e\} -e\} -e"s//'/"      -e"s/\].*/'/"  -e'H;x'      \
                   -e'/.* .*\n.*/h' -e's///'     -e'x'        \
                   -e's//Insert into <tablename> values(&)/'  \
                   -e's/\n//'     -e's//, /gp'

Insert into <tablename> values('ALS_DIM', 'wf_ld_als_dim', 'Scheduled', 'Wed Dec 30 19:00:00 2015', 'TEST_Integration_Service')
Insert into <tablename> values('ALS_FACT', 'wf_s_m_ld_interchanges_detail_log', 'Scheduled', 'Mon Jan 04 16:30:00 2016', 'TEST_Integration_Service')
Insert into <tablename> values('ALS_PRD', 'wf_maint_service_fields', 'Scheduled', 'Thu Dec 31 07:10:00 2015', 'TEST_Integration_Service')

The first few lines negate the acceptable matches, like this:

if ! match ^Folder: *\[
then  if ! match ^Workflow: *\[ or ^Workflow run status: *\[
      then if !  match ^Schedule time: *\[
           then  if !  match ^Integration Service: *\[
                 then  delete
                 fi
           fi
      fi
fi

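However, as soon as a line matches any link in the chain, the chain is broken and is not traversed to its end. This means the last regex tested against any given matching line is the one that describes that line's head, up to and including the opening square bracket. In a sed script the empty regex // refers back to the regex most recently used, so s//'/ substitutes that head away and puts the leading ' quote of your desired output in its place.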

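What remains is the information you want plus whatever trailing context follows the first ] on each line. So s/\].*/'/ replaces each line's unwanted tail with your trailing ' quote as well.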
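The effect of the empty regex can be seen on a cut-down chain. This is only a sketch: the three input lines are hypothetical, only two of the four field names are tested, and a p flag is added to the second substitution so each stripped field is printed straight away.

```shell
# Hypothetical input: the middle line matches no link of the chain and is deleted.
# For each surviving line, s//'/ reuses whichever address matched it last.
out=$(printf '%s\n' \
        'Folder: [ALS_DIM]' \
        'Workflow run id [0].' \
        'Schedule time: [Wed Dec 30 19:00:00 2015]' |
      sed -n -e '/^Folder: *\[/!{' \
             -e '/^Schedule time: *\[/!d' \
             -e '}' \
             -e "s//'/" \
             -e "s/\].*/'/p")
printf '%s\n' "$out"
```

This should print 'ALS_DIM' and 'Wed Dec 30 19:00:00 2015' on separate lines, each wrapped in single quotes, even though the script never repeats either field name in a substitution.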

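At this point every line has been stripped down to just the part you want, but the parts are not yet joined. To join them I use the hold space, which persists across line cycles. Each line is appended to hold space with H, the hold and pattern buffers are exchanged with x, and the address in front of h then tests the accumulated text for a <space> that is followed later by a <newline>. That can only happen once the line after the date line has been held, because the date is the only field containing spaces.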

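If that pattern is found, h overwrites hold space with the assembled record and s/// substitutes away the whole of pattern space, leaving it empty for the next record (this being the last line of the current one). The empty regex in s/// refers to that same <space> + <newline> pattern, so neither command can succeed on any line but the last line of each record.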

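Either way, I then exchange the hold and pattern buffers with x one last time. Pattern space now contains either all of this record's matches, each preceded by a <newline>, or only the most recent match and no <newline> at all. If this was the last matching line of the record, hold space is now empty; otherwise it holds every matching line so far, each prefixed with a leading <newline>.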
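The buffer juggling can be watched in isolation. The following is a minimal sketch, assuming GNU sed and made-up one-letter fields in which only the third field of each record contains a space. The guard is written with a leading .* so that a match spans the whole buffer, and newlines are shown as | to make the leading delimiter visible.

```shell
# H;x    append the line to hold space, then look at the accumulated text
# /../h  a space followed later by a newline means the record is complete
# s///   on a complete record, empty the buffer that is swapped back to hold
# x      swap back: pattern space is the whole record or just the current line
out=$(printf '%s\n' A B 'x y' Z C 'p q' W |
      sed -n -e 'H;x' -e '/.* .*\n.*/h' -e 's///' -e 'x' -e 's/\n/|/gp')
printf '%s\n' "$out"
```

Only the line after the one containing a space triggers any output, so this should print |A|B|x y|Z and then |C|p q|W, one record per line.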

Next I refer to the same regex once more and try to replace the <space> + <newline> pattern with itself, wrapped in Insert into <tablename> values( and a trailing ).

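Finally, s/\n// deletes the leading <newline> in pattern space if there is one, which is only the case on the last matching line of a record, and s//, /g replaces every remaining <newline> with a comma and a <space>. If that last substitution succeeds, its p flag prints the result to standard output.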
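Put together, the whole pipeline can be tried on a single record. This is a sketch under stated assumptions: the input below is hypothetical, reconstructed from the first output row, with one extra line that should be discarded, and the guard in front of h is written as /.* .*\n.*/ so that it matches the entire accumulated buffer.

```shell
# Hypothetical sample record; the "Workflow run id" line matches no field name.
input='Folder: [ALS_DIM]
Workflow: [wf_ld_als_dim] version [1].
Workflow run status: [Scheduled]
Workflow run id [0].
Schedule time: [Wed Dec 30 19:00:00 2015]
Integration Service: [TEST_Integration_Service]'

# Same command sequence as the answer, laid out one stage per line:
# filter chain, strip head and tail, accumulate in hold space, wrap and join.
out=$(printf '%s\n' "$input" |
      sed -n -e '/^Folder: *\[/!{' \
             -e '/^Workflow\( run status\)\{0,1\}: *\[/!{' \
             -e '/^Schedule time: *\[/!{' \
             -e '/^Integration Service: *\[/!d' \
             -e '}' -e '}' -e '}' \
             -e "s//'/" -e "s/\].*/'/" \
             -e 'H;x' -e '/.* .*\n.*/h' -e 's///' -e 'x' \
             -e 's//Insert into <tablename> values(&)/' \
             -e 's/\n//' -e 's//, /gp')
printf '%s\n' "$out"
```

For this input the result should be the single ALS_DIM statement shown in the output above. Note that the script relies on the date being the only field that contains a space and on the Integration Service line arriving after it.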
