So, my current code for getting a txt file with the web links is:
@echo off
wget -m -p -E -k -K -np https://vk.com/XYZ/
rem edit next line to include your filename
set "zzfilename=.\vk.com\XYZ\index.html"
rem get the target line
type "%zzfilename%"|find /i "https://m.vk.com/doc">"zztarget.txt"
for /f "usebackq delims=" %%f in (`type "zztarget.txt"`) do set zzaaa=%%f
rem change double-quotes to single-quotes
set "zzaaa1=%zzaaa:"='%"
rem remove unneeded text from the beginning of the line
set "zzaaa2=%zzaaa1:*https://m.vk.com/doc=gotit%"
rem remove the "<" and ">" characters
set "zzaaa3=%zzaaa2:<='%"
set "zzaaa4=%zzaaa3:>='%"
rem from what remains, take only the desired URL
for /f "usebackq tokens=2 delims='" %%f in (`echo %zzaaa4%`) do set "zzgotit=%%f"
rem show the work and cleanup
set zz
set "zzaaa="
set "zzaaa1="
set "zzaaa2="
set "zzaaa3="
set "zzaaa4="
del "zztarget.txt">nul 2>&1
pause
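The problem now is grabbing only the links, with no other text around them. I don't know how to do that; I searched the forums without success. What should I do?
The link always starts with: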
<a class="mr_label medias_link" href="
and always ends with:
" rel="noopener" target="_blank">
What I want is a new .txt file:
Before:
... class="medias_link_icon"><i class="i_icon i_doc"></i></span><span class="medias_link_texts"><span class="medias_link_label">Plik</span><span class="medias_link_labeled medias_link_title"> </span><span class="medias_link_desc"> </span></span></a></div><div class="medias_row attachment_type_doc"><a class="mr_label medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&dl=52261df6ba84d700f9" rel="noopener" target="_blank"> <span class="medias_link_icon"> class="medias_link_icon"><i class="i_icon i_doc"></i></span><span class="medias_link_texts"><span class="medias_link_label">Plik</span><span class="medias_link_labeled medias_link_title"> 2020-04-18_New_Scientist_UserUpload.Net.pdf</span><span class="medias_link_desc"> </span></span></a></div><div class="medias_row attachment_type_doc"><a class="mr_label medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&dl=52261df6ba84d700f9" rel="noopener" target="_blank"> <span class="medias_link_icon"> ...
After:
https://m.vk.com/doc116929061_546451452?hash=a33fc7d435c432a453&dl=52261df6ba84d700f9
https://m.vk.com/doc116929061_546451485?hash=872bfbdaf4e0a2f015&dl=52da751b3ad2c6b994
Answer 1
(Get-Content ".\vk.com\XYZ\index.html").split('"')|Select-String -SimpleMatch ";"
- Output:
https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&dl=52261df6ba84d700f9
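You may also find some of the answers offered in this question useful:
In your specific case:
1) Split everything on the double quote (")
in your string: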
...medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&dl=52261df6ba84d700f9" rel="noopener"....
2) Select the strings you need by keeping only those that contain the character ";"
Obs.: 1) Edit to use your full or relative path:
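The two steps above can be sketched on a single line of HTML. This is only an illustration: the sample string is adapted from the question, and it assumes the saved page stores the `&` in the URL as the entity `&amp;`, which is where the ";" would come from. If the file holds a bare `&`, the ";" filter matches nothing and a filter on the URL prefix would be needed instead. The variable names are made up for this sketch.

```powershell
# Sample fragment adapted from the question.
# ASSUMPTION: the ampersand is stored as the entity &amp; in the saved HTML.
$line = '<a class="mr_label medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9" rel="noopener" target="_blank">'

# Step 1: split the line on every double quote.
$parts = $line -split '"'

# Step 2: keep only the pieces that contain a semicolon,
# then turn the &amp; entity back into a plain ampersand.
$links = $parts |
    Where-Object { $_.Contains(';') } |
    ForEach-Object { $_ -replace '&amp;', '&' }

$links
```

Any other quoted attribute value that happens to contain ";" (an inline style, for example) would also pass this filter, so it is worth checking the output on the real page.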
Get-Content "D:\downloads\vk.com\XYZ\index.html"
Get-Content "..\vk.com\XYZ\index.html"
Get-Content ".\vk.com\XYZ\index.html"
Obs.: 2) You can also use the aliases gc | sls:
(gc ".\vk.com\XYZ\index.html").split('"')|sls -SimpleMatch ";"
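As an alternative to filtering on ";", the start and end markers given in the question can be used directly with a regular expression. The following is a minimal sketch, not the method from the answer above: `Get-MediaLink` is a hypothetical helper name, the sample HTML is made up for the demonstration, and the decoding of `&amp;` assumes the page stores ampersands as entities.

```powershell
# Hypothetical helper: returns the href of every
# <a class="mr_label medias_link" href="..."> tag found in the given HTML text.
function Get-MediaLink {
    param([string]$Html)

    # Capture everything between the opening marker and the next double quote.
    $pattern = '<a class="mr_label medias_link" href="([^"]+)"'

    [regex]::Matches($Html, $pattern) |
        ForEach-Object { $_.Groups[1].Value -replace '&amp;', '&' } |  # decode &amp; if present
        Select-Object -Unique                                          # drop repeated links
}

# Made-up sample with one link repeated twice and a second, different link.
$sample = '<div><a class="mr_label medias_link" href="https://m.vk.com/doc1_2?hash=aa&amp;dl=bb" rel="noopener" target="_blank"> </a>' +
          '<a class="mr_label medias_link" href="https://m.vk.com/doc1_2?hash=aa&amp;dl=bb" rel="noopener" target="_blank"> </a>' +
          '<a class="mr_label medias_link" href="https://m.vk.com/doc3_4?hash=cc&amp;dl=dd" rel="noopener" target="_blank"> </a></div>'

$found = Get-MediaLink $sample
$found
```

To write the result to a text file, something like `Get-MediaLink (Get-Content ".\vk.com\XYZ\index.html" -Raw) | Set-Content ".\links.txt"` should work, where `links.txt` is only an example name. `-Raw` reads the whole file as one string, so a link is still found even if the tag is split across lines in an unexpected way.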
Further reading:
[√] Split
[√] Select-String | sls