Extract the string that follows a specific word/symbol


My input file input.txt has two lines, shown below. I need to extract claimStartDate from the first line and claimEndDate from the second line.

<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">

<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">

rm input.txt
awk '/<ProfessionalClaim/' test.xml | head -1 > input.txt
awk '/<ProfessionalClaim/' test.xml | tail -1 >> input.txt
awk '{match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
     {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt

Answer 1

$ awk '/F_LINE/ {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]}
       /L_LINE/ {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
2018-04-02
2018-04-17

Edit, based on your new information:

$ awk 'NR==1 {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]}
       NR==2 {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
2018-04-02
2018-04-17

You can also do all of this in one go:

$ grep "<ProfessionalClaim" test.xml \
  | sed -n '1p;$p' \
  | awk 'NR==1 {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]}
         NR==2 {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}'
  • grep finds all lines in test.xml that contain <ProfessionalClaim
  • sed keeps only the first and the last of those lines
  • awk prints claimStartDate from the first line and claimEndDate from the second line
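
The commands above rely on the three-argument form of match(), which is a GNU awk extension and is not available in every awk. As a minimal sketch for other awk implementations, the same extraction can be written with index() and substr() only. The helper name attr and the temporary directory are illustrative assumptions, not part of the original answer, and the sketch assumes each attribute is preceded by a space and quoted with double quotes.

```shell
# Work in a throwaway directory; the two lines are the sample from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.txt" <<'EOF'
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">
EOF

# attr(line, name) is a hypothetical helper: it returns the text between the
# double quotes of ' name="..."' in line, or an empty string if name is absent.
result=$(awk '
function attr(line, name,    key, pos, rest) {
    key = " " name "=\""                     # leading space avoids matching a longer attribute name
    pos = index(line, key)
    if (pos == 0) return ""
    rest = substr(line, pos + length(key))   # text right after the opening quote
    return substr(rest, 1, index(rest, "\"") - 1)
}
NR == 1 { print attr($0, "claimStartDate") }  # start date from the first line
NR == 2 { print attr($0, "claimEndDate") }    # end date from the second line
' "$tmpdir/input.txt")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

With the sample lines from the question, this prints 2018-04-02 followed by 2018-04-17, the same as the gawk commands.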

Answer 2

Assuming some XML input document that looks like the following:

<?xml version="1.0"?>
<root>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-18" claimStartDate="2018-04-18" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-19" claimStartDate="2018-04-19" sourceSystemId="abcd" claimActionCode="00"/>
</root>

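...we can use xmlstarlet to extract the value of the claimStartDate attribute from each ProfessionalClaim node that has another ProfessionalClaim node after it, together with the value of the claimEndDate attribute of the next ProfessionalClaim node: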

xmlstarlet select --template \
    --match '//ProfessionalClaim[following-sibling::ProfessionalClaim/@claimEndDate]' \
    --value-of 'concat(@claimStartDate, " ", following-sibling::ProfessionalClaim/@claimEndDate)' \
    -nl input.txt

This first matches every ProfessionalClaim node that is followed by another ProfessionalClaim node.

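For each such node, the value of its claimStartDate attribute is concatenated with the value of the claimEndDate attribute of the following ProfessionalClaim node, using a single space character as the delimiter.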

Given my example document above, this would produce

2018-04-02 2018-04-17
2018-04-17 2018-04-18
2018-04-18 2018-04-19
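
To make the pairing logic concrete, here is a rough sketch that reproduces the same start/next-end pairs with plain awk. It is only a cross-check under a strong assumption (each ProfessionalClaim element sits on its own line, as in the example document) and is not a substitute for a real XML parser such as xmlstarlet. The helper name attr and the file name sample.xml are illustrative.

```shell
# Recreate the example document in a throwaway directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<?xml version="1.0"?>
<root>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-18" claimStartDate="2018-04-18" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-19" claimStartDate="2018-04-19" sourceSystemId="abcd" claimActionCode="00"/>
</root>
EOF

pairs=$(awk '
# Hypothetical helper: value of name="..." in line, or "" if name is absent.
function attr(line, name,    key, pos, rest) {
    key = " " name "=\""
    pos = index(line, key)
    if (pos == 0) return ""
    rest = substr(line, pos + length(key))
    return substr(rest, 1, index(rest, "\"") - 1)
}
/<ProfessionalClaim / {
    # Pair the previous claim start date with the end date of this claim.
    if (seen) print prev_start, attr($0, "claimEndDate")
    prev_start = attr($0, "claimStartDate")
    seen = 1
}
' "$tmpdir/sample.xml")

printf '%s\n' "$pairs"
rm -r "$tmpdir"
```

The last ProfessionalClaim contributes only its end date, which mirrors the XPath predicate that skips nodes without a following sibling.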
