How can I replace URLs in a file with URLs from an HTML page using grep, sed, or any common tool?

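I am trying to replace the URLs in my conf file with the ones from an HTML page, because the URLs are sometimes updated or changed. I need a simple script that fetches the HTML and updates/replaces the URLs in my conf file. I am new to Ubuntu/Linux.

Edit: The HTML file is changed by the server admin, and I have no access to it. I can only visit the latest version of the site at the HTML location below and update my conf file by hand.

HTML location: https://10.10.10.1

Part of the HTML file: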

<li><a href="#" class="dropdown-toggle hvr-bounce-to-bottom" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Movies<span class="caret"></span></a>
                                    <ul class="dropdown-menu">
                                        
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.7/MY-FTP-2/English%20Movies/">English Movies</a></li>
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.8/MY-FTP-1/English%20Movies%20%281080p%29/">English Movies -1080p </a></li>
                                        
                                        <li><a class="hvr-bounce-to-bottom" href="http://10.10.10.9/MY-FTP-1/Hindi%20Movies/">Hindi Movies</a></li>
                                        <li><a class="hvr-bounce-to-bottom" href="http://10.10.10.8/MY-FTP-1/SOUTH%20INDIAN%20MOVIES/Hindi%20Dubbed/">South-Movie Hindi Dubbed</a></li>
                                        <li><a class="hvr-bounce-to-bottom" href="http://10.10.10.10/MY-FTP-3/Animation%20Movies/">Animation Movies</a></li>
                                        <li><a class="hvr-bounce-to-bottom" href="http://10.10.10.10/MY-FTP-3/Animation%20Movies%20%281080p%29/">Animation Movies -1080p</a></li>
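
One way to turn a fragment like this into a name-to-URL mapping, sketched here with only the standard library's `html.parser`. `LinkCollector` and `sample_html` are hypothetical names for illustration, and the sketch assumes that the link text matches the rclone section name exactly and that menu toggles with `href="#"` should be skipped.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect {link text: href} for every <a> tag with an http(s) href."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # Skip menu toggles such as href="#"
            if name and self._href.startswith(("http://", "https://")):
                self.links[name] = self._href
            self._href = None


# Three lines taken from the HTML fragment above
sample_html = '''
<li><a href="#" class="dropdown-toggle hvr-bounce-to-bottom" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Movies<span class="caret"></span></a>
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.8/MY-FTP-1/English%20Movies%20%281080p%29/">English Movies -1080p </a></li>
<li><a class="hvr-bounce-to-bottom" href="http://10.10.10.9/MY-FTP-1/Hindi%20Movies/">Hindi Movies</a></li>
'''

collector = LinkCollector()
collector.feed(sample_html)
links = collector.links
print(links)
```

The trailing space in `English Movies -1080p ` is stripped, so the key lines up with the `[English Movies -1080p]` section name.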

After fetching the HTML, the script should replace/update the links in my rclone.conf file.

Preview of the rclone.conf file:

[Hindi Movies]
type = http
url = http://10.10.10.9/MY-FTP-1/Hindi%20Movies/

[English Movies]
type = http
url = http://10.10.10.7/MY-FTP-2/English%20Movies/

[English Movies -1080p]
type = http
url = http://10.10.10.9/MY-FTP-1/English%20Movies%20%281080p%29/

[South-Movie Hindi Dubbed]
type = http
url = http://10.10.10.9/MY-FTP-1/SOUTH%20INDIAN%20MOVIES/Hindi%20Dubbed/

[Animation Movies]
type = http
url = http://10.10.10.10/MY-FTP-3/Animation%20Movies/

[Animation Movies -1080p]
type = http
url = http://10.10.10.10/MY-FTP-3/Animation%20Movies%20%281080p%29/
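
A minimal sketch of the update step, assuming a dict that maps section names to fresh URLs is already available. The `links` dict and `update_conf_urls` are hypothetical names, not part of rclone. The function walks the file line by line, remembers the current `[section]`, and rewrites only that section's `url =` line, leaving everything else untouched. The sample reuses the `English Movies -1080p` entry, whose URL in the preview above points at 10.10.10.9 while the HTML lists 10.10.10.8.

```python
def update_conf_urls(conf_text, links):
    """Return conf_text with each known section's url line replaced."""
    out = []
    section = None
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header, e.g. "[Hindi Movies]"
            section = stripped[1:-1]
        elif ("=" in stripped
              and stripped.split("=", 1)[0].strip() == "url"
              and section in links):
            # Preserve the original line ending
            newline = "\n" if line.endswith("\n") else ""
            line = f"url = {links[section]}{newline}"
        out.append(line)
    return "".join(out)


conf_text = """\
[Hindi Movies]
type = http
url = http://10.10.10.9/MY-FTP-1/Hindi%20Movies/

[English Movies -1080p]
type = http
url = http://10.10.10.9/MY-FTP-1/English%20Movies%20%281080p%29/
"""

# Assumed mapping, as it might be extracted from the HTML page
links = {
    "English Movies -1080p":
        "http://10.10.10.8/MY-FTP-1/English%20Movies%20%281080p%29/",
}

updated = update_conf_urls(conf_text, links)
print(updated)
```

Sections that have no entry in `links` keep their current URL, so a link that disappears from the page does not wipe a working remote.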

So I wrote a beginner's script as a starting point, but it gives me an error!

import re
import requests
from bs4 import BeautifulSoup

# Fetch the HTML from the website
html = requests.get("http://10.10.10.1/")

# Parse the HTML
soup = BeautifulSoup(html.text, 'html.parser')

# The location of the rclone.conf file
rclone_conf_file = '/home/user/tmp/rclone.conf'

# Open the rclone.conf file
with open(rclone_conf_file, 'r') as f:
    # Read the file into a list of lines
    lines = f.readlines()

# Iterate over the <a> tags in the HTML
for a in soup.find_all('a'):
    # Get the text of the <a> tag (e.g. "Hindi Movies")
    section_name = a.text.strip().lower()

    # Check if the section name exists in the rclone.conf file
    if any(section_name in line.lower() for line in lines):
        # Get the URL of the <a> tag
        new_url = a['href']

        # Use a regular expression to match the URL in the rclone.conf file
        regex = r'^(\[%s\]\n.*\n.*http.*)' % re.escape(section_name)

        # Update the URL in the rclone.conf file
        for i, line in enumerate(lines):
            if section_name in line.lower():
                print(lines[i])  # <-- Add this line
                lines[i] = re.sub(regex, r'\1', line, flags=re.IGNORECASE)
                lines[i] = lines[i].replace(lines[i].split()[2], new_url)

# Open the rclone.conf file for writing
with open(rclone_conf_file, 'w') as f:
    # Write the updated lines to the file
    for line in lines:
        f.write(line)

The error shown:

File "/home/plex/tmp/script.py", line 37, in <module>
    lines[i] = lines[i].replace(lines[i].split()[2], new_url)
IndexError: list index out of range
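
The traceback can be reproduced in isolation, which suggests where it comes from. `f.readlines()` yields one line per list element, so `lines[i]` is only a header such as `[Hindi Movies]`; splitting it on whitespace gives two items, and index 2 does not exist. The multi-line regex cannot match a single line either, so `re.sub` returns the line unchanged. A small sketch, using a sample line from the conf preview (variable names are illustrative):

```python
import re

line = "[Hindi Movies]\n"  # one element of f.readlines()

# The regex expects three lines at once, so it never matches a single line
regex = r'^(\[%s\]\n.*\n.*http.*)' % re.escape("hindi movies")
unchanged = re.sub(regex, r'\1', line, flags=re.IGNORECASE)
print(unchanged == line)  # True

# Splitting the header on whitespace yields only two items
parts = line.split()
print(parts)  # ['[Hindi', 'Movies]']

try:
    parts[2]
except IndexError as exc:
    error = str(exc)
print(error)  # list index out of range
```

Note that the `url = ...` line is never touched by the loop, because it does not contain the section name; only the `[section]` header lines are.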

Have a nice day!!