Firefox Page Info

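I would like to download the URLs of the images listed under the Media tab of Firefox's "View Page Info" dialog, preferably with PowerShell. I am not sure whether this is possible, or whether there is a better way.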

Answer 1

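Some sites and assets, or parts of them, do not allow automation or actively block it, and there is nothing you can do about that.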

As an aside, you do not need a browser to download site data. This is known as web scraping, and in PowerShell it is done with the web cmdlets, specifically...

# Get specifics for a module, cmdlet, or function
(Get-Command -Name Invoke-WebRequest).Parameters
(Get-Command -Name Invoke-WebRequest).Parameters.Keys
<#
# Results

UseBasicParsing
Uri
WebSession
SessionVariable
Credential
UseDefaultCredentials
CertificateThumbprint
Certificate
UserAgent
DisableKeepAlive
TimeoutSec
Headers
MaximumRedirection
Method
Proxy
ProxyCredential
ProxyUseDefaultCredentials
Body
ContentType
TransferEncoding
InFile
OutFile
PassThru
Verbose
Debug
ErrorAction
WarningAction
InformationAction
ErrorVariable
WarningVariable
InformationVariable
OutVariable
OutBuffer
PipelineVariable
#>
# Show the built-in usage examples
Get-Help -Name Invoke-WebRequest -Examples
<#
# Results

$R = Invoke-WebRequest -URI 
$R.AllElements | where {$_.innerhtml -like "*=*"} | Sort { 
values. Sorting by the shortest HTML value often helps you find the     
$R=Invoke-WebRequest http://www.facebook.com/login.php
$FB
$Form = $R.Forms[0]
$Form | Format-List
$Form.fields
$Form.Fields["email"]="[email protected]"
$R=Invoke-WebRequest -Uri ("https://www.facebook.com" +
# Sends a sign-in request by running the Invoke-WebRequest 
$R.StatusDescription
(Invoke-WebRequest -Uri "http://msdn.microsoft.com/en-us/library 
#>
# Show the full help, or open the online documentation
Get-Help -Name Invoke-WebRequest -Full
Get-Help -Name Invoke-WebRequest -Online

So, for the URL you said you want to access, note that you get the following results...

# Download website main page
($InstacartHomeData = Invoke-WebRequest -Uri 'https://www.instantcart.com')

<#
# Results

StatusCode        : 200
StatusDescription : OK
Content           : <!DOCTYPE html><html lang="en" class="no-js"><head><link rel="alternate"
                    href="https://www.instantcart.com/" hreflang="en-gb" /><link rel="alternate"
                    href="https://www.instantcart.com/" hreflang="en" ...
RawContent        : HTTP/1.1 200 OK
                    Pragma: no-cache
                    Vary: Accept-Encoding
                    Connection: close
                    Transfer-Encoding: chunked
                    Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
                    Content-Type: text/...
Forms             : {}
Headers           : {[Pragma, no-cache], [Vary, Accept-Encoding], [Connection, close], [Transfer-Encoding, chunked],
                    [Cache-Control, private, no-cache, no-store, proxy-revalidate, no-transform], [Content-Type,
                    text/html], [Date, Thu, 28 May 2020 04:23:28 GMT], [Expires, Thu, 19 Nov 1981 08:52:00 GMT],
                    [Set-Cookie, sid=b806f71e100b9f2d4d1037561b53ff65; path=/; domain=www.instantcart.com], [Server,
                    Apache], [X-Powered-By, PHP/5.5.38]}
Images            : {@{innerHTML=; innerText=; outerHTML=<img width="160" class="img-responsive"
...
#>

# Get only images data
$InstacartHomeData.Images | Select-Object alt, src

<#
# Results

alt                                       src
---                                       ---
                                          /pics/logo.png
Abode Home Products                       /images/home/clients/abode-home-products.png
Avanta UK                                 /images/home/clients/avanta-uk.png
Q-Park                                    /images/home/clients/qpark.png
...
#> 

Now try the same thing against your target page.

# Download website specific main page
($InstacartProductPageData = Invoke-WebRequest -Uri 'https://www.instacart.com/products/98954-poland-spring-natural-spring-water-2-5-gal')
<#
# Results

# Cookies are used to get this

StatusCode        : 200
StatusDescription : OK
Content           : <!DOCTYPE html>
                    <html lang='en'>
                    <head>
                    <title>
                    Poland Spring Natural Spring Water (2.5 gal) - Instacart
                    </title>
                    <meta content='Buy Poland Spring Natural Spring Water (2.5 gal) online and have it de...
RawContent        : HTTP/1.1 200 OK
                    Transfer-Encoding: chunked
                    Connection: keep-alive
                    X-Frame-Options: SAMEORIGIN
                    X-XSS-Protection: 1; mode=block
                    X-Content-Type-Options: nosniff
                    X-Download-Options: noopen
                    X-Permit...
Forms             : {}
Headers           : {[Transfer-Encoding, chunked], [Connection, keep-alive], [X-Frame-Options, SAMEORIGIN],
                    [X-XSS-Protection, 1; mode=block]...}
Images            : {@{innerHTML=; innerText=; outerHTML=<img class="rmq-569a8dd6" style="background: rgb(255, 255,

...
                    Poland Spring 100% Natural Spring Water

                    2.5 gal; outerHTML=<a style="text-decoration: none;"
                    href="/products/16965376-poland-spring-100-natural-spring-water-2-5-gal" data-radium="true"><div
                    class="rmq-cd8b1370 rmq-5e34cd3" style="padding: 0px 16px; width: 208px; height: 100%; text-align:
                    left; line-height: 1.29; font-size: 14px; display: flex; position: relative; opacity: 1;
                    flex-direction: column;" data-radium="true"><div class="rmq-24058c4e" style="width: 176px; height:
                    176px;" data-radium="true"><img style="width: 100%; display: block;" alt="" src="https://d2d8wwwkmh
                    fcva.cloudfront.net/352x/d1s8987jlndkbs.cloudfront.net/assets/missing-item-4bbe82b8555e4d1c12626fd4
                    82cb2409713e8e30835645ff3650ef66a725d03c.png" data-radium="true"></div><div style="padding-bottom:
                    8px; margin-top: auto;" data-radium="true"><div class="rmq-50e196af" style="color: rgb(66, 66,
                    66); overflow: hidden; margin-top: 20px; -ms-text-overflow: ellipsis; max-height: 55px;"
                    data-radium="true">Poland Spring 100% Natural Spring Water</div><div style="color: rgb(117, 117,
                    117);" data-radium="true"><span>2.5 gal</span></div></div></div></a>; outerText=
...    

#>

# Get only images data
$InstacartProductPageData.Images | Select-Object alt, src
<#
# Results

alt                                        src
---                                        ---
Instacart logo                             https://d2guulkeunn7d8.cloudfront.net/assets/beetstrap/brand/carrotlogo-p...
Poland Spring Natural Spring Water         https://d2lnr5mha7bycj.cloudfront.net/product-image/file/large_f44f2f09-b...
Gala Fresh logo                            https://d2lnr5mha7bycj.cloudfront.net/warehouse/logo/162/0f5c96be-4126-45...
...
#>
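
The two result sets above differ in one way that matters if you want a plain list of image URLs: the first page returns relative `src` values such as /pics/logo.png, while the second returns absolute ones. Below is a minimal sketch of how both could be normalized to absolute URLs. `Resolve-ImageUrl` is a hypothetical helper name, not a built-in cmdlet, and it only relies on the .NET `System.Uri` class.

```powershell
# Hypothetical helper (not a built-in cmdlet): turns image src values into absolute URLs.
function Resolve-ImageUrl {
    param(
        [string]   $BaseUrl,  # the page the src values were taken from
        [string[]] $Src       # src values, relative or absolute
    )
    $base = [System.Uri] $BaseUrl
    foreach ($s in $Src) {
        # Skip images that have no src attribute at all
        if ([string]::IsNullOrWhiteSpace($s)) { continue }
        # Uri(baseUri, string) resolves relative values and leaves absolute ones unchanged
        ([System.Uri]::new($base, $s)).AbsoluteUri
    }
}

# Offline example using a relative src value from the first result set above
Resolve-ImageUrl -BaseUrl 'https://www.instantcart.com' -Src '/pics/logo.png'

# With network access, the same helper could be fed from a live response
# (the output path is only an example):
# $page     = 'https://www.instantcart.com'
# $response = Invoke-WebRequest -Uri $page
# Resolve-ImageUrl -BaseUrl $page -Src $response.Images.src | Set-Content -Path 'C:\test\images.txt'
```

Keep in mind that this only sees images present in the HTML the server returns. Images that scripts add after the page loads may show up in Firefox's Media tab but not here.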

Answer 2

The script below uses Internet Explorer to render the page; the image locations are then read from the rendered document's properties.

Adjust the output directory and website as needed.

I have not tested whether this returns the same list that Firefox shows, but it most likely will.

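$OutputFile = "c:\test\images.txt" # change this to the output directory and txt file name, ensure it ends with .txt
$Webpage = "https://www.somewebsite.com" # change this to the webpage you want

# Render the page in a hidden Internet Explorer instance (requires Internet Explorer to be installed)
$ieObject = New-Object -ComObject 'InternetExplorer.Application'
$ieObject.Visible = $false
$ieObject.Navigate($Webpage)
# Wait until the page has finished loading (ReadyState 4 = complete)
while ($ieObject.Busy -or $ieObject.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
# Collect the src URL of every image in the rendered document
$images = $ieObject.Document.images | ForEach-Object { $_.src }
$images | Out-File $OutputFile
$ieObject.Quit()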
