I have a text file with the following content:
duration: 17100
series: 2016
episode: 58
modesizes: original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB
runtime: 285
duration: 13740
series: 2016
episode: 59
modesizes: original: hd1=9024MB,hd2=7203MB,sd1=5104MB,high1=2950MB,low1=570MB
runtime: 229
I want to extract duration, episode, and modesizes. The output should look like this:
13740,59,9024MB,7203MB,5104MB,2950MB,570MB
Answer 1
With awk:
awk '/duration|episode/{printf "%s,", $2} /modesizes/{gsub(/[^=,]+=/,"",$3); print $3}' file
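Explanation:
/duration|episode/ : if the line matches duration or episode,
printf "%s,", $2 : print the second field (the value) followed by a comma and no newline.
/modesizes/ : if the line matches modesizes,
gsub(/[^=,]+=/,"",$3) : remove each identifier together with its equals sign (hd1=, hd2=, ...) from the third field,
print $3 : and print the modified field, which also ends the output line.
With your sample input, it prints: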
17100,58,9120MB,7543MB,4872MB,2833MB,634MB
13740,59,9024MB,7203MB,5104MB,2950MB,570MB
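A slightly more defensive variant of the same idea, as a sketch only: the patterns are anchored to the start of the line so that a stray "duration" or "episode" elsewhere in a line cannot match, and $NF (the last field) is used so the size list is found even if the number of words before it changes. The function name extract_fields and the file name sample.txt are assumptions made for illustration; they are not part of the question.

```shell
# Sample input copied from the question; "sample.txt" is an assumed file name.
cat > sample.txt <<'EOF'
duration: 17100
series: 2016
episode: 58
modesizes: original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB
runtime: 285
duration: 13740
series: 2016
episode: 59
modesizes: original: hd1=9024MB,hd2=7203MB,sd1=5104MB,high1=2950MB,low1=570MB
runtime: 229
EOF

# extract_fields is a hypothetical helper: it reads the named files, or stdin if none are given.
extract_fields() {
  awk '
    /^(duration|episode):/ { printf "%s,", $2 }  # value plus a comma, no newline yet
    /^modesizes:/ {
      gsub(/[^=,]+=/, "", $NF)                    # strip every "name=" prefix from the size list
      print $NF                                   # print the sizes and end the output line
    }
  ' "$@"
}

extract_fields sample.txt
# prints:
# 17100,58,9120MB,7543MB,4872MB,2833MB,634MB
# 13740,59,9024MB,7203MB,5104MB,2950MB,570MB
```

Like the original one-liner, this assumes that every record lists duration, episode, and modesizes in that order.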
Answer 2
If your grep supports PCRE (the -P option):
$ grep -oP '(duration|episode):\s*\K\d+|\d+MB' ip.txt | pr -ats, -7
17100,58,9120MB,7543MB,4872MB,2833MB,634MB
13740,59,9024MB,7203MB,5104MB,2950MB,570MB
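(duration|episode):\s*\K : matches duration or episode followed by : and zero or more whitespace characters. \K discards what has been matched so far, so this prefix works like a positive lookbehind and is not part of the output.
\d+ : one or more digits.
|\d+MB : alternative pattern, one or more digits followed by MB.
pr -ats, -7 : then formats the resulting lines into rows of at most 7 columns, using , as the separator.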
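If pr is unavailable or behaves differently on your system, the reshaping step can also be done with paste, which joins one line from each of its operands; giving it seven - operands makes it read seven consecutive lines of standard input per output row. This is a sketch of an alternative, not part of the original answer; the helper name rows_of_seven is hypothetical, and the grep call still assumes a grep built with -P support.

```shell
# rows_of_seven is a hypothetical helper: it joins every 7 input lines into one comma-separated row.
rows_of_seven() {
  paste -d, - - - - - - -
}

# Sample input copied from the question, saved under the file name used in the answer.
cat > ip.txt <<'EOF'
duration: 17100
series: 2016
episode: 58
modesizes: original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB
runtime: 285
duration: 13740
series: 2016
episode: 59
modesizes: original: hd1=9024MB,hd2=7203MB,sd1=5104MB,high1=2950MB,low1=570MB
runtime: 229
EOF

# Same grep as above; only the formatting stage is swapped.
grep -oP '(duration|episode):\s*\K\d+|\d+MB' ip.txt | rows_of_seven
```

If the number of matches is not a multiple of seven, paste pads the last row with empty fields, so a malformed record shows up as trailing commas.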
Answer 3
A sed solution (GNU sed, because of \s):
sed -E -e \
'/duration:/{
N;N;N;N
s/duration:\s*([0-9]*).*episode:\s*([0-9]*).*hd1=([0-9]*MB),hd2=([0-9]*MB),sd1=([0-9]*MB),high1=([0-9]*MB),low1=([0-9]*MB).*/\1,\2,\3,\4,\5,\6,\7/
}' < input_file
It outputs:
17100,58,9120MB,7543MB,4872MB,2833MB,634MB
13740,59,9024MB,7203MB,5104MB,2950MB,570MB
Blank lines in the input are kept in the output.
If you don't want them:
sed -E -n -e \
'/duration:/{
N;N;N;N
s/duration:\s*([0-9]*).*episode:\s*([0-9]*).*hd1=([0-9]*MB),hd2=([0-9]*MB),sd1=([0-9]*MB),high1=([0-9]*MB),low1=([0-9]*MB).*/\1,\2,\3,\4,\5,\6,\7/
p
d
}' < input_file
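The -n variant can be written a little more compactly with the p flag of the s command, which prints the pattern space only when the substitution succeeds. The sketch below does that; it also uses [[:space:]] instead of \s, requires at least one digit with +, and emits all seven captured groups. The function name extract_records is hypothetical, and the sketch assumes each record is exactly five consecutive lines starting with duration:.

```shell
# extract_records is a hypothetical helper: it reads the named files, or stdin if none are given.
extract_records() {
  sed -E -n '/^duration:/{
N;N;N;N
s/^duration:[[:space:]]*([0-9]+).*episode:[[:space:]]*([0-9]+).*hd1=([0-9]+MB),hd2=([0-9]+MB),sd1=([0-9]+MB),high1=([0-9]+MB),low1=([0-9]+MB).*/\1,\2,\3,\4,\5,\6,\7/p
}' "$@"
}

# Two records separated by a blank line; the blank line is not printed because of -n.
printf '%s\n' \
  'duration: 17100' 'series: 2016' 'episode: 58' \
  'modesizes: original: hd1=9120MB,hd2=7543MB,sd1=4872MB,high1=2833MB,low1=634MB' \
  'runtime: 285' \
  '' \
  'duration: 13740' 'series: 2016' 'episode: 59' \
  'modesizes: original: hd1=9024MB,hd2=7203MB,sd1=5104MB,high1=2950MB,low1=570MB' \
  'runtime: 229' | extract_records
# prints:
# 17100,58,9120MB,7543MB,4872MB,2833MB,634MB
# 13740,59,9024MB,7203MB,5104MB,2950MB,570MB
```

Because the substitution carries the p flag, a five-line block that does not match the expected layout is silently dropped rather than printed unchanged.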