Cutting the text before and after a specific string out of a txt file

My current code for getting a txt file with the links from a web page is:

@echo off

wget -m -p -E -k -K -np https://vk.com/XYZ/

rem    edit next line to include your filename    
set "zzfilename=.\vk.com\XYZ\index.html"

rem    get the target line
type "%zzfilename%"|find /i "https://m.vk.com/doc">"zztarget.txt"
for /f "usebackq delims=" %%f in (`type "zztarget.txt"`) do set zzaaa=%%f

rem    change double-quotes to single-quotes
set "zzaaa1=%zzaaa:"='%"

rem    remove unneeded text from the beginning of the line
set "zzaaa2=%zzaaa1:*https://m.vk.com/doc=gotit%"

rem    remove the "<" and ">" characters
set "zzaaa3=%zzaaa2:<='%"
set "zzaaa4=%zzaaa3:>='%"

rem    from what remains, take only the desired URL
for /f "usebackq tokens=2 delims='" %%f in (`echo %zzaaa4%`) do set "zzgotit=%%f"

rem    show the work and cleanup
set zz
set "zzaaa="
set "zzaaa1="
set "zzaaa2="
set "zzaaa3="
set "zzaaa4="
del "zztarget.txt">nul 2>&1

pause

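The problem now is grabbing only the links, without any of the surrounding text. I don't know how to do that; I searched the forums without success. This is what it should do: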

The link always starts right after:

<a class="mr_label medias_link" href="

The link always ends right before:

 " rel="noopener" target="_blank"> 

What I want is to produce a new .txt file. Before:

 ... class="medias_link_icon"><i class="i_icon i_doc"></i></span><span class="medias_link_texts"><span class="medias_link_label">Plik</span><span class="medias_link_labeled medias_link_title"> </span><span class="medias_link_desc"> </span></span></a></div><div class="medias_row attachment_type_doc"><a class="mr_label medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9" rel="noopener" target="_blank"> <span class="medias_link_icon"> class="medias_link_icon"><i class="i_icon i_doc"></i></span><span class="medias_link_texts"><span class="medias_link_label">Plik</span><span class="medias_link_labeled medias_link_title"> 2020-04-18_New_Scientist_UserUpload.Net.pdf</span><span class="medias_link_desc"> </span></span></a></div><div class="medias_row attachment_type_doc"><a class="mr_label medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9" rel="noopener" target="_blank"> <span class="medias_link_icon"> ...

After:

https://m.vk.com/doc116929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9
https://m.vk.com/doc116929061_546451485?hash=872bfbdaf4e0a2f015&amp;dl=52da751b3ad2c6b994

Answer 1

(Get-Content ".\vk.com\XYZ\index.html").split('"')|Select-String -SimpleMatch ";"

  • Output:
https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9

Some of the options offered in the answers to this question may also be useful to you.

In your specific case:

1) Split your string at every double quote ("):

...medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9" rel="noopener"....

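2) Select the pieces you need by keeping only those that contain the character ; (the link has one because of the &amp; in its query string).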
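The two steps can be tried on a single line before running them against the whole file. This is only a minimal sketch: `$line` is a hypothetical sample shortened from the HTML in the question, and `Where-Object` is used in place of `Select-String` so that the result is a plain string rather than a match object.

```powershell
# Hypothetical sample line, shortened from the HTML shown in the question.
$line = '<a class="mr_label medias_link" href="https://m.vk.com/doc16929061_546451452?hash=a33fc7d435c432a453&amp;dl=52261df6ba84d700f9" rel="noopener" target="_blank">'

# Step 1: split at every double quote, so each attribute value becomes a piece of its own.
$pieces = $line.Split('"')

# Step 2: keep only the pieces that contain a literal semicolon.
$links = $pieces | Where-Object { $_ -like '*;*' }

$links
```

Note that the filter keeps any quoted value containing a semicolon, so an attribute such as an inline style would also pass through if the page had one.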

Obs. 1) Edit the command to use your own full or relative path:

Get-Content "D:\downloads\vk.com\XYZ\index.html"
Get-Content "..\vk.com\XYZ\index.html"
Get-Content ".\vk.com\XYZ\index.html"

Obs. 2) You can also use the aliases gc and sls:

(gc ".\vk.com\XYZ\index.html").split('"')|sls -SimpleMatch ";"
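If filtering on ; turns out to be too loose for your page, a stricter variant could anchor on the opening marker quoted in the question and capture everything up to the closing quote. The following is a sketch under assumptions, not a tested drop-in: the function name `Get-DocLink` and the output file name `links.txt` are made up for illustration, and `-Raw` needs PowerShell 3.0 or later.

```powershell
# Hypothetical helper: returns every href that directly follows the marker from the question.
function Get-DocLink {
    param([string]$Html)

    # Literal opening marker, then capture everything up to the next double quote.
    $pattern = '<a class="mr_label medias_link" href="([^"]+)"'

    [regex]::Matches($Html, $pattern) |
        ForEach-Object { $_.Groups[1].Value } |
        Select-Object -Unique   # the same link can appear more than once in the page
}

# Usage sketch: the input path is the one from the question, links.txt is an assumed name.
$path = '.\vk.com\XYZ\index.html'
if (Test-Path $path) {
    # -Raw reads the whole file as one string, so the regex sees the full HTML.
    Get-DocLink (Get-Content $path -Raw) | Set-Content '.\links.txt'
}
```

The links are written one per line and keep the &amp; exactly as it appears in the HTML, which matches the desired output shown in the question.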
