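I'm trying to read specific tags from a web page with Python 3, but it fails on Unicode characters with an error like this:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 145: ordinal not in range(256)
What is the correct way to extract the tags without hitting this error?
Here is the MWE I have tried so far: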
import requests
from bs4 import BeautifulSoup

# Download the page (Genesis 35, NIV)
page = requests.get("https://www.biblegateway.com/passage/?search=Genesis+35&version=NIV")
soup = BeautifulSoup(page.content, 'html.parser')
story = soup.find_all('p')  # find every <p> tag on the page
periods = [pt.get_text() for pt in story]  # keep only the text of each <p> tag
print(periods)
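
For what it's worth, the same error message seems reproducible without any network request. The sketch below is only a guess at the cause: it assumes the problem is an output stream whose encoding is latin-1, which I have not confirmed for my terminal. The names `latin1_out` and `utf8_out` are made up for the illustration.

```python
import io

text = "\u201cquoted\u201d"  # contains the same curly quote as the error message

# Simulate an output stream encoded as latin-1 (an assumption about the environment)
latin1_out = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
try:
    print(text, file=latin1_out)
    latin1_out.flush()
    failed = False
    error_message = ""
except UnicodeEncodeError as exc:
    failed = True
    error_message = str(exc)

# The same text written to a UTF-8 stream encodes without error
utf8_buffer = io.BytesIO()
utf8_out = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
print(text, file=utf8_out)
utf8_out.flush()
utf8_bytes = utf8_buffer.getvalue()

print(failed)
print(error_message.encode("ascii", "backslashreplace").decode("ascii"))
```

If that guess is right, the parsing itself may be fine and only the output encoding needs changing, for example with `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7+ or by setting the `PYTHONIOENCODING=utf-8` environment variable before running the script.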