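I'm trying to write a script that monitors a web page and sends a Telegram notification whenever the page changes. I'm using diff to detect the changes.
The script seems to work well, but some web pages embed a kind of random ID in the page content, and that ID changes every time the page is downloaded. I need to work around this for the diff to be meaningful.
I need some way to delete or edit this randomly generated ID. In short, I need to edit the ID string, stripping out nearly all of the letters, spaces, hyphens, digits and so on, and keep only the data without the ID.
For example, I only need to modify the information enclosed in the quotation marks "":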
<path d="M0 0h7v7h-7zM9 0h1v2h-1zM12 0h1v4h-2v-1h-1v-1h1v-1h1zM16 0h1v3h-1v-1h-1v-1h1zM18 0h4v1h-1v1h1v1h-2v-2h-1v1h-1zM23 0h1v1h-1zM26 0h7v7h-7zM1 1v5h5v-5zM22 1h1v1h-1zM27 1v5h5v-5zM2 2h3v3h-3zM8 2h1v1h1v1h1v1h1v1h-1v1h-1v-1h-1v1h1v1h-2zM14 2h1v1h-1zM23 2h1v2h1v3h-1v-2h-4v-1h3zM28 2h3v3h-3zM15 3h1v1h2v2h-1v-1h-2v2h-1v-1h-1v-2h2zM18 3h1v1h-1zM19 5h1v1h-1zM12 6h1v2h-2v-1h1zM16 6h1v2h1v-2h1v1h1v-1h1v1h1v1h-2v1h1v1h1v1h1v1h-3v1h-1v-1h-2v1h-1v-2h-2v1h-1v-4h2v1h-1v1h2zM22 6h1v1h-1zM23 7h1v1h-1zM0 8h1v1h1v-1h5v1h-3v1h3v1h-1v1h-1v-1h-2v-1h-1v1h-1v1h-1zM22 8h1v1h-1zM24 8h1v1h-1zM26 8h5v2h1v2h-2v1h3v1h-1v1h-1v1h-1v-2h-1v-1h-1v-3h1v1h1v-1h-1v-1h-1v1h-2zM9 9h1v1h-1zM23 9h1v1h-1zM32 9h1v1h-1zM8 10h1v1h-1zM18 10v1h2v-1zM10 11h1v1h-1zM25 11h1v1h-1zM3 12h2v1h-1v2h-2v-1h1zM6 12h3v1h-1v1h1v1h-1v1h-1v-1h-1v1h1v1h-1v1h-2v1h-1v-1h-3v-5h2v1h-1v2h1v1h1v-1h2v-2h2v-1h-1zM11 12h1v2h3v-1h1v1h1v1h-1v2h-1v-2h-1v1h-3zM14 12h1v1h-1zM17 13h2v1h-2zM22 13h6v1h-1v2h-1v1h-1v-1h-1v-2h-2zM20 14h2v1h1v1h-2v-1h-1zM9 15h1v1h-1zM28 15h1v2h-1v1h1v1h1v-1h-1v-1h2v1h1v1h1v3h-1v-1h-1v-1h-1v3h-1v-2h-1v-1h-2v-1h1v-1h-1v-1h1v-1h1zM10 16h1v1h-1zM17 16h1v1h-1zM32 16h1v2h-1zM8 17h2v1h-1v1h-1v1h2v3h-1v-1h-1v1h-2v1h2v1h-3v-1h-1v1h-1v-1h-1v-2h1v1h2v-1h2v-1h-1v-1h1v-1h-1v-1h2zM11 17h3v2h1v-1h1v1h1v1h-1v1h1v1h-2v-2h-3v-1h1v-1h-1v1h-1v1h-1v-2h1zM16 17h1v1h-1zM19 17h1v1h-1zM21 17h1v1h-1zM23 17h1v1h-1zM18 18h1v1h-1zM20 18h1v1h1v1h-1v1h-1v1h-1v-1h-1v-1h2zM22 18h1v1h-1zM24 18h2v1h-1v1h-1zM1 19h2v1h2v1h-3v-1h-1zM5 19h1v1h-1zM11 20h1v1h1v1h-1v1h-1zM23 20h1v1h4v2h-2v1h4v1h-1v2h1v-2h1v1h1v1h-1v1h-1v3h1v1h-1v1h-1v-1h-1v-1h1v-1h-1v-1h-4v-1h-1v-2h1v-4h-1v1h-1v-2h1zM0 21h2v1h-1v3h-1zM31 22h1v1h1v1h-3v-1h1zM10 23h1v1h-1zM13 23h1v1h-1zM16 23h1v1h-1zM21 23h1v1h-1zM9 24h1v1h1v-1h2v2h-1v1h-1v1h-1v-1h-1v-1h-1v-1h1zM14 24h1v2h-1zM17 24h1v3h2v-1h-1v-2h1v1h2v1h-1v1h1v1h-1v1h-1v1h-1v1h-3v2h-2v-1h1v-1h-4v-2h5v1h2v-1h1v-1h-2v1h-1v-2h-1v-1h1v-1h1zM22 24h1v1h-1zM25 25v3h3v-3zM32 25h1v1h-1zM0 26h7v7h-7zM26 26h1v1h-1zM1 27v5h5v-5zM8 
27h1v1h1v3h1v2h-1v-1h-1v-1h-1zM12 27h1v1h-1zM2 28h3v3h-3zM31 28h2v2h-2zM21 29h2v1h-2zM20 30h1v1h-1zM23 30h1v2h-1v1h-1v-1h-1v-1h2zM26 30h2v1h-2zM8 32h1v1h-1zM17 32h3v1h-3zM24 32h1v1h-1zM26 32h2v1h-2zM31 32h1v1h-1z"/>
The result I need:
<path d = ""/>
or something like one of the following:
<path d="0"/>
<path d="CLEAN"/>
<path d=""/>
<path d=/>
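I believe sed could probably solve this, but because of the complexity of the string (lots of characters, spaces, hyphens, digits and so on) I'm having trouble finding the right command.
An example of the script I'm using: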
#! /bin/bash
page_mofication="$(cat /opt/pagename/listing/latest_modifications/latest_modifications.log)"
fileold=/opt/pagename/latest_modifications/latest_modifications_old
filenew=/opt/pagename/latest_modifications/latest_modifications_new
log=/opt/pagename/listing/latest_modifications/latest_modifications.log
logold=/opt/pagename/oldfiles/latest_modifications/latest_modifications.log
mv $log $logold-`date +%d-%m-%Y_%H:%M:%S`
wget https://www.pagename.com -O $filenew
diff $fileold $filenew >> $log
message=$'\n'"$page_mofication"
/etc/scripts/telegram-send.sh "$message"
cp $filenew $fileold
exit 0
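Any ideas on how to solve this?
Answer 1
Assuming you are using GNU sed, try blanking out the path d data in both the fileold and filenew files before comparing them. You could do something along these lines: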
sed -i '
/<path d=/c\
<path d=/>
' -- "$fileold" "$filenew";
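To see what the c command does here, a small sketch on a throwaway file may help. The sample content and the temporary file are made up for illustration. Note that c replaces the whole matching line, so anything else on the same line as <path d= is discarded as well.

```shell
#!/bin/bash
# Hypothetical sample page; only the <path d=...> line should change.
sample=$(mktemp)
printf '%s\n' '<svg>' '<path d="M0 0h7v7h-7zM9 0h1v2h-1z"/>' '</svg>' > "$sample"

# Replace every line containing '<path d=' with a fixed placeholder (GNU sed).
sed -i '
/<path d=/c\
<path d=/>
' -- "$sample"

# Show the edited file, then clean up.
result=$(cat "$sample")
printf '%s\n' "$result"
rm -f -- "$sample"
```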
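Alternatively, if you need to make sure that the characters between the quotes are only alphanumerics, hyphens and horizontal whitespace: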
sed -Ei '
s|(<path d)="[\t a-zA-Z0-9-]+"/>|\1=/>|
' -- "$fileold" "$filenew";
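As a rough end-to-end sketch, the same substitution can be tried on two made-up downloads that differ only in their path data. The file contents, the normalize helper and the .clean copies are assumptions for illustration. Unlike the commands above, this variant writes normalized copies instead of editing in place, so the raw downloads are kept. In the script from the question, a step like this would sit between the wget and diff lines.

```shell
#!/bin/bash
# Hypothetical stand-ins for two downloads of the same page.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' '<p>Price: 10</p>' '<path d="M0 0h7v7h-7z"/>' > "$old"
printf '%s\n' '<p>Price: 10</p>' '<path d="M9 0h1v2h-1zM12 0h1v4h-2"/>' > "$new"

# Blank out the d attribute when it holds only letters, digits,
# hyphens, spaces or tabs (GNU sed, extended regex syntax).
normalize() {
    sed -E 's|(<path d)="[\t a-zA-Z0-9-]+"/>|\1=/>|' -- "$1"
}

# Compare normalized copies; the original files stay untouched.
normalize "$old" > "$old.clean"
normalize "$new" > "$new.clean"
if diff -- "$old.clean" "$new.clean" > /dev/null; then
    status="unchanged"
else
    status="changed"
fi
echo "$status"
rm -f -- "$old" "$new" "$old.clean" "$new.clean"
```

Here the two pages differ only inside the d attribute, so after normalization diff reports no difference and no notification would be needed.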