I have many lines like these:
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>
I only want to select the title from each line, e.g. Books To Read,
and the number from each line, e.g. 30.
I can do this with a regex find/replace:
Find: <i class="fa fa-angle-right font-14 color-blue mr-1"></i>(.*? \w+ )<span>(\d+)</span>
Replace with: \1\2
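My question: I only want to select these strings, not replace them. The search should highlight each string without deleting anything through Replace. Is that possible? I want to use the regex in Python code.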
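In Python's re module, matching and replacing are separate calls, so the question's own Find pattern can be used for selection alone. The following is a minimal sketch, assuming all lines are held in one string (text is a hypothetical variable name). re.findall only reads the input; with two capture groups it returns one (title, number) tuple per match. Group 1 includes the spaces around the title, so they are stripped.

```python
import re

# Sample input copied from the question
text = '''<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>'''

# The question's Find pattern, unchanged: group 1 is the title, group 2 the number
pattern = r'<i class="fa fa-angle-right font-14 color-blue mr-1"></i>(.*? \w+ )<span>(\d+)</span>'

# findall does not modify text; with two groups it returns (group1, group2) tuples
matches = re.findall(pattern, text)

# Group 1 carries a leading and trailing space, so strip it; convert the number to int
pairs = [(title.strip(), int(number)) for title, number in matches]
print(pairs)
```

For comparison, re.sub(pattern, r'\1\2', text) would be the Python counterpart of the Find/Replace step, and it returns a new string with the markup removed.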
Answer 1
Using Python:
import re
lines = '''
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>
'''
# Define the regex patterns
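# Title: the text between </i> and <span> (the <i></i> element itself is empty,
# so capturing inside it would only return empty strings)
title_pattern = r'</i>\s*(.*?)\s*<span>'
# Number: the digits inside <span>...</span>
number_pattern = r'<span>(\d+)</span>'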
# Extract the title and number using regex
titles = re.findall(title_pattern, lines)
numbers = re.findall(number_pattern, lines)
# Print the extracted titles and numbers
for title, number in zip(titles, numbers):
    print("Title:", title.strip())
    print("Number:", number)
    print()
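The answer runs two independent searches and pairs the results with zip, so if the two lists ever differ in length or order (for example, a <span>5</span> somewhere else in the HTML that has no title before it), zip silently pairs the wrong items. A variant that captures both parts in a single match keeps each title with its own number. This is a sketch, assuming each entry sits on one line in the format shown in the question; the group names title and number are illustrative choices. Since the question asks about highlighting, it also records m.span('title'), the character offsets an editor would select.

```python
import re

lines = '''
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>
'''

# Title = text between </i> and <span>; number = digits inside <span>...</span>.
# One match captures both, so a title can never be paired with another line's number.
pattern = re.compile(r'</i>\s*(?P<title>.*?)\s*<span>(?P<number>\d+)</span>')

entries = []
for m in pattern.finditer(lines):
    # m.span('title') is the (start, end) offset of the title inside lines
    entries.append((m.group('title'), int(m.group('number')), m.span('title')))

for title, number, (start, end) in entries:
    print(f"Title: {title}  Number: {number}  at offsets {start}-{end}")
```

The input string is never modified; the offsets can be used to slice or highlight the original text.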