linux + how to capture values from an XML file

I want to capture all the values in this XML file and write them to the file out1.txt.

Note: a "value" here means the text enclosed in double quotes.

  more input.txt

 <app name="UAT/ECC/Global/MES/1206/MRP-S23"   ear="UAT/ECC/Global/MES/1206/MRP-S23.ear" xml="UAT/ECC/Glal/ME/120/MRP-  S23.xml"/>
 <app name="OQ/ediedbn/adSFSF/adSFSF-CL" ear="OQ/ebn/aSF/adSF- CL.ear"  xml="OQ/ediedbn/adSFSF/adSSF-CL.xml"/>
 <app name="OQ/ediedbn/adaEBS/adOrBS-HR-CL"  ear="OQ/ediedbn/adOraS/araEBS- HR-CL.ear" xml="OQ/eddbn/aOraEBS/adOEBS-   HR-CL.xml"/>
 <app name="UAT/CZ/LIMS/T068_01/LIMS-QA-S03" ear="UAT/CZ/LIS/T068_01/LIS-QA-     .ear" xml="UAT/CZ/LIMS/T068_01/LIMS-QA-S03.xml"/>

more out1.txt

UAT/ECC/Global/MES/1206/MRP-S23
UAT/ECC/Glal/ME/120/MRP-S23.xml
OQ/ediedbn/adSFSF/adSFSF-CL
OQ/ebn/aSF/adSF- CL.ear
.
.
.

Please suggest how to capture the values into out1.txt using an awk or perl one-liner, or bash.

Answer 1

You can use awk to slice up the input file, like this:

gv@debian:$ cat a.txt
<app name="UAT/ECC/Global/MES/1206/MRP-S23"   ear="UAT/ECC/Global/MES/1206/MRP-S23.ear" xml="UAT/ECC/Glal/ME/120/MRP-  S23.xml"/>
<app name="OQ/ediedbn/adSFSF/adSFSF-CL" ear="OQ/ebn/aSF/adSF- CL.ear"  xml="OQ/ediedbn/adSFSF/adSSF-CL.xml"/>
<app name="OQ/ediedbn/adaEBS/adOrBS-HR-CL"  ear="OQ/ediedbn/adOraS/araEBS- HR-CL.ear" xml="OQ/eddbn/aOraEBS/adOEBS-   HR-CL.xml"/>
<app name="UAT/CZ/LIMS/T068_01/LIMS-QA-S03" ear="UAT/CZ/LIS/T068_01/LIS-QA-     .ear" xml="UAT/CZ/LIMS/T068_01/LIMS-QA-S03.xml"/>

gv@debian:$ cat b.txt

gv@debian:$ awk -F"name=|ear=|xml=|/>" '{print $2} {print $4}' a.txt >b.txt

gv@debian:$ cat b.txt
"UAT/ECC/Global/MES/1206/MRP-S23"   
"UAT/ECC/Glal/ME/120/MRP-  S23.xml"
"OQ/ediedbn/adSFSF/adSFSF-CL" 
"OQ/ediedbn/adSFSF/adSSF-CL.xml"
"OQ/ediedbn/adaEBS/adOrBS-HR-CL"  
"OQ/eddbn/aOraEBS/adOEBS-   HR-CL.xml"
"UAT/CZ/LIMS/T068_01/LIMS-QA-S03" 
"UAT/CZ/LIMS/T068_01/LIMS-QA-S03.xml"

If you don't want to keep the double quotes, you can strip them with sed, like this:

gv@debian:$ sed -i 's/\"//g' b.txt
gv@debian:$ cat b.txt
UAT/ECC/Global/MES/1206/MRP-S23   
UAT/ECC/Glal/ME/120/MRP-  S23.xml
OQ/ediedbn/adSFSF/adSFSF-CL 
OQ/ediedbn/adSFSF/adSSF-CL.xml
OQ/ediedbn/adaEBS/adOrBS-HR-CL  
OQ/eddbn/aOraEBS/adOEBS-   HR-CL.xml
UAT/CZ/LIMS/T068_01/LIMS-QA-S03 
UAT/CZ/LIMS/T068_01/LIMS-QA-S03.xml

Or as a one-liner, piping awk into sed:

gv@debian:$ awk -F"name=|ear=|xml=|/>" '{print $2} {print $4}' a.txt |sed 's/\"//g' >b.txt

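Tip: to write the fields from each input line onto a single output line, use {print $2 $4} (both fields inside the same braces). Note that this concatenates the fields with nothing added between them; {print $2, $4} separates them with a space instead.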
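A minimal sketch of the difference between the two print forms. The one-line sample and the variable names are made up for illustration, and OFS is set to ";" only so the inserted separator is easy to see:

```shell
# Hypothetical sample record, shortened from a.txt for illustration.
line='<app name="A/B" ear="A/B.ear" xml="A/B.xml"/>'

# print $2 $4: the fields are concatenated as they are. The only gap is
# the trailing space that $2 already carries from the input line.
joined=$(printf '%s\n' "$line" | awk -F'name=|ear=|xml=|/>' '{print $2 $4}')

# print $2, $4: awk puts the output field separator (OFS) between them.
separated=$(printf '%s\n' "$line" | awk -F'name=|ear=|xml=|/>' -v OFS=';' '{print $2, $4}')

printf '%s\n' "$joined" "$separated"
```

With the default OFS the second form would insert a single space rather than ";".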

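This awk approach works because the -F option, which sets the field separator, accepts a regular expression: the separator can be several characters long, and | (meaning "or") lets you list several alternative separators.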
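Since -F also accepts a single character, one possible variation (not part of the original answer) is to split on the double quote itself. The quoted values then land in the even-numbered fields, with no quotes or stray spaces left to clean up. A sketch, using one record copied from a.txt and an illustrative variable name:

```shell
# One record copied from a.txt above.
line='<app name="OQ/ediedbn/adSFSF/adSFSF-CL" ear="OQ/ebn/aSF/adSF- CL.ear"  xml="OQ/ediedbn/adSFSF/adSSF-CL.xml"/>'

# With '"' as the separator, $2, $4 and $6 hold the name, ear and xml
# values; the loop prints every even-numbered field on its own line.
values=$(printf '%s\n' "$line" | awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }')

printf '%s\n' "$values"
```

To process the whole file, the same awk program could be run directly on a.txt with the output redirected to out1.txt. This assumes no value contains an embedded double quote.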

If you need the ear value instead, replace {print $4} with {print $3}.

To see how awk slices each line, print all the fields it produces:

$ awk -F"name=|ear=|xml=|/>" '{print "Field1="$1} {print "Field2="$2} {print "Field3="$3} {print "Field4="$4}' a.txt
Field1=<app 
Field2="UAT/ECC/Global/MES/1206/MRP-S23"   
Field3="UAT/ECC/Global/MES/1206/MRP-S23.ear" 
Field4="UAT/ECC/Glal/ME/120/MRP-  S23.xml"
Field1=<app 
Field2="OQ/ediedbn/adSFSF/adSFSF-CL" 
Field3="OQ/ebn/aSF/adSF- CL.ear"  
Field4="OQ/ediedbn/adSFSF/adSSF-CL.xml"
Field1=<app 
Field2="OQ/ediedbn/adaEBS/adOrBS-HR-CL"  
Field3="OQ/ediedbn/adOraS/araEBS- HR-CL.ear" 
Field4="OQ/eddbn/aOraEBS/adOEBS-   HR-CL.xml"
Field1=<app 
Field2="UAT/CZ/LIMS/T068_01/LIMS-QA-S03" 
Field3="UAT/CZ/LIS/T068_01/LIS-QA-     .ear" 
Field4="UAT/CZ/LIMS/T068_01/LIMS-QA-S03.xml"
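The listing shows that the fields keep their double quotes and that some carry trailing spaces. As a rough sketch, the cleanup could be done inside awk rather than in a separate sed step. The helper name clean and the variable names are hypothetical, chosen only for this example:

```shell
# One record copied from a.txt above.
line='<app name="OQ/ediedbn/adSFSF/adSFSF-CL" ear="OQ/ebn/aSF/adSF- CL.ear"  xml="OQ/ediedbn/adSFSF/adSSF-CL.xml"/>'

# clean() is a hypothetical helper: it strips blanks and double quotes
# from both ends of a field and leaves the inside of the value untouched.
cleaned=$(printf '%s\n' "$line" | awk -F'name=|ear=|xml=|/>' '
    function clean(s) { gsub(/^[ \t"]+|[ \t"]+$/, "", s); return s }
    { print clean($2); print clean($4) }')

printf '%s\n' "$cleaned"
```

Unlike removing every quote with sed, this only touches the ends of each field, so spaces inside a value are preserved.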

Answer 2

I tried something like this to get what you want:

sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/\1\n/g' input.txt > out.txt

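It matches each substring enclosed in double quotes, keeps the text without the quotes themselves, and does this for every such substring on each line of input.txt. Each value is followed by a newline ("\n") as the separator. Interpreting \n in the replacement text as a newline is a GNU sed behaviour and may not work in other sed implementations.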
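With GNU sed, the command above should also leave an empty line after each record, because the last value on a line is followed by both the inserted \n and the line's own newline. A possible alternative, assuming a grep that supports the -o option (GNU and BSD grep do, but it is not required by POSIX), is sketched below with one record from input.txt and an illustrative variable name:

```shell
# One record copied from input.txt above.
line='<app name="OQ/ediedbn/adSFSF/adSFSF-CL" ear="OQ/ebn/aSF/adSF- CL.ear"  xml="OQ/ediedbn/adSFSF/adSSF-CL.xml"/>'

# grep -o prints every match of "..." on its own line;
# tr -d then deletes the surrounding double quotes.
quoted_values=$(printf '%s\n' "$line" | grep -o '"[^"]*"' | tr -d '"')

printf '%s\n' "$quoted_values"
```

On the real file the same pipeline would read input.txt directly and redirect its output to out1.txt.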
