Python: regex does not work with re.search (AttributeError: 'NoneType' object has no attribute 'group')

In my HTML file I have the following lines:

<div class="color-black mt-lg-0" id="hidden">, in</div>
<a href="https://neculaifantanaru.com/en/leadership-pro.html" title="View all articles from Leadership Pro" class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>

I use this regex ^\s*<a href="(.*?)" title="View to find this link: https://neculaifantanaru.com/en/leadership-pro.html

In Notepad++ the regex search works fine!

The problem is in Python.

Search (line 18):

b_content = re.search('^\s*<a href="(.*?)" title="View', new_file_content).group(1)

Replace:

old_file_content = re.sub(', in <a href="(.*?)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)

The re.search line raises this error:

Traceback (most recent call last):
  File "<module2>", line 18, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I also tried changing that line to:

b_content = re.match(r'^\s*<a href="(.*?)" title="View', new_file_content).group(1)

But I get the same error.

Answer 1

Replace those two lines with:

import re

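# re.match only tries the start of the string and returns None when nothing matches there
b_content = re.match(r'^\s*<a href="(.*?)" title="View', new_file_content)
if b_content is not None:
    b_content = b_content.group(1)
else:
    # This guards against the AttributeError; it does not make the pattern match
    b_content = "No match found"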

Or another way:

import re

match = re.search(r'^\s*<a href="(.*?)" title="View', new_file_content)
if match is not None:
    b_content = match.group(1)
    old_file_content = re.sub(', in <a href="(.*?)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)
else:
    print("No match found")

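Alternatively, you can use re.MULTILINE. Without this flag, ^ matches only at the very start of the string, so a link on a later line is never found and re.search returns None. With the flag, ^ also matches at the start of every line, which is presumably why the same pattern works in Notepad++.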

import re

match = re.search(r'^\s*<a href="(.*?)" title="View', new_file_content, re.MULTILINE)
if match is not None:
    b_content = match.group(1)
    old_file_content = re.sub(', in <a href="([^"]*)" title="Vezi', f', in <a href="{b_content}" title="Vezi', old_file_content)
else:
    print("No match found")
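
To see the effect of the flag in isolation, here is a minimal sketch. It assumes new_file_content holds the two lines from the question (shortened slightly), with the link on the second line; the variable names no_flag and with_flag are only for illustration.

```python
import re

# Sample input mirroring the question: the <a> tag is on the second line.
new_file_content = (
    '<div class="color-black mt-lg-0" id="hidden">, in</div>\n'
    '<a href="https://neculaifantanaru.com/en/leadership-pro.html" '
    'title="View all articles from Leadership Pro">Leadership Pro</a>\n'
)

pattern = r'^\s*<a href="(.*?)" title="View'

# Without a flag, ^ matches only at the very start of the string,
# where the text begins with <div, so the search returns None.
no_flag = re.search(pattern, new_file_content)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.search(pattern, new_file_content, re.MULTILINE)

print(no_flag)
print(with_flag.group(1))
```

The first print shows None, which is exactly the object that .group(1) was being called on in the question.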

Or, best of all:

import re

# Read the contents of new-file.html
with open('c:/Folder7/new-file.html', 'r') as file:
    first_code = file.read()

# Read the contents of old-file.html
with open('c:/Folder7/old-file.html', 'r') as file:
    second_code = file.read()

# Extract the URL from first_code (no ^ anchor, so no re.MULTILINE is needed)
match = re.search('<a href="(.*?)" title="View all articles', first_code)
if match is not None:
    url = match.group(1)
    # Replace the URL in second_code
    second_code = re.sub(', in <a href=".*?" title="Vezi toate', f', in <a href="{url}" title="Vezi toate', second_code)

    # Write the modified content back to old-file.html
    with open('c:/Folder7/old-file.html', 'w') as file:
        file.write(second_code)
else:
    print("No match found")
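
One caveat with the re.sub calls above: when the replacement is a string, re.sub interprets backslash sequences in it, so a URL that happened to contain something like \1 would be altered. A possible variation, sketched below with made-up sample values (the url and second_code strings are hypothetical, and the pattern is rewritten with two capture groups), passes a function as the replacement so the URL is inserted literally.

```python
import re

# Hypothetical inputs for illustration only.
url = r"https://example.com/a\1b"  # contains a backslash sequence on purpose
second_code = ', in <a href="old.html" title="Vezi toate articolele">'

# Capture the text before and after the old URL so it can be reused unchanged.
pattern = r'(, in <a href=")[^"]*(" title="Vezi toate)'

# A function replacement is used as-is: no backslash processing happens on url.
updated = re.sub(pattern, lambda m: m.group(1) + url + m.group(2), second_code)
print(updated)
```

For ordinary URLs without backslashes the f-string version behaves the same, so this only matters if the extracted text is not under your control.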

Or use BeautifulSoup:

from bs4 import BeautifulSoup

html = '''\
<div class="color-black mt-lg-0" id="hidden">, in</div>
<a href="https://neculaifantanaru.com/en/leadership-pro.html" title="View all articles from Leadership Pro" class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>
'''

# Parse the HTML with the parser bundled with Python
soup = BeautifulSoup(html, 'html.parser')
link = soup.find('a').get('href')
print(link)
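
Note that soup.find('a') returns the first link in the document, whichever it is. If installing BeautifulSoup is not an option, a similar result can be sketched with the standard library's html.parser instead (a different tool from the one above, not a BeautifulSoup API). The LinkFinder class and its title_prefix parameter are hypothetical names introduced here; the sketch selects links by the start of their title attribute rather than by position.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collects href values of <a> tags whose title starts with a given prefix."""

    def __init__(self, title_prefix):
        super().__init__()
        self.title_prefix = title_prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        title = attributes.get("title") or ""
        href = attributes.get("href")
        if href and title.startswith(self.title_prefix):
            self.links.append(href)


page = '''\
<div class="color-black mt-lg-0" id="hidden">, in</div>
<a href="https://neculaifantanaru.com/en/leadership-pro.html" title="View all articles from Leadership Pro" class="color-green font-weight-600 mx-1" id="hidden">Leadership Pro</a>
'''

finder = LinkFinder("View all articles")
finder.feed(page)
print(finder.links)
```

Because the parser works on tags rather than on lines, it does not depend on where the link sits in the file or on how the attributes are spaced.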
