Regex: select strings without search/replace

I have many lines like these:

<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>

From each line I only want to select the title (e.g. Books To Read) and the number (e.g. 30).

I can write a regex for search and replace:

Find: <i class="fa fa-angle-right font-14 color-blue mr-1"></i>(.*? \w+ )<span>(\d+)</span>

Replace with: \1\2

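My question: I only want to select these strings, not replace them. The search should highlight each string without using Replace to delete anything. Is that possible? I would like to use the regex in Python code.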

Answer 1

Using Python:

import re

lines = '''
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>
<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>
I want
'''

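# Define the regex patterns
# The title sits between the closing </i> tag and the opening <span> tag
# (the <i> element itself is empty), so capture the text between them.
title_pattern = r'</i>\s*(.*?)\s*<span>'
number_pattern = r'<span>(\d+)</span>'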

# Extract the title and number using regex
titles = re.findall(title_pattern, lines)
numbers = re.findall(number_pattern, lines)

# Print the extracted titles and numbers
for title, number in zip(titles, numbers):
    print("Title:", title.strip())
    print("Number:", number)
    print()
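
As a possible variation on the code above, a single pattern with two capture groups keeps each title paired with its number, and `re.finditer` reports where each group sits in the text, which is the closest Python equivalent to "selecting" without replacing. This is only a sketch: `ROW_PATTERN`, `SELECT_ONLY`, and `extract_rows` are illustrative names, and both patterns assume the markup looks exactly like the sample lines in the question. The lookaround variant additionally assumes a single space on each side of the title.

```python
import re

# One pattern, two capture groups: group 1 is the title, group 2 the number.
# Assumes the markup matches the sample lines from the question.
ROW_PATTERN = re.compile(r'</i>\s*(.*?)\s*<span>(\d+)</span>')

# Lookaround variant: the match itself is only the title or only the number,
# so nothing else would be highlighted. Assumes exactly one space around the title.
SELECT_ONLY = re.compile(r'(?<=</i> ).*?(?= <span>)|(?<=<span>)\d+(?=</span>)')


def extract_rows(text):
    """Return (title, number, title_span, number_span) for each matching line."""
    rows = []
    for match in ROW_PATTERN.finditer(text):
        rows.append((match.group(1), int(match.group(2)),
                     match.span(1), match.span(2)))
    return rows


sample = (
    '<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Books To Read <span>30</span>\n'
    '<i class="fa fa-angle-right font-14 color-blue mr-1"></i> Angels Heaven <span>11</span>\n'
)

for title, number, title_span, number_span in extract_rows(sample):
    # The spans are (start, end) offsets into `sample`; nothing is modified.
    print(title, number, title_span, number_span)

# Full matches only, in document order: title, number, title, number, ...
print(SELECT_ONLY.findall(sample))
```

The lookaround pattern may also work in the Find field of an editor whose regex engine supports lookbehind, in which case only the titles and numbers are highlighted.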
