Scraping HTML data with bash?

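I have a web page full of search results (obtained from a custom cURL request), and I would like to use bash to scrape the results page and extract the text between the HTML tags.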

Here is an example of a single result on the page:

</button></div></div></div><!----></div><!----></div></div><!----></div></nav> <main id="main-container-id" role="main" class="main-div-main"><div class="container" data-v-25425518><div class="search-bar row" data-v-25425518><div id="searchbar_container" nav-item-class="nav-item-res" class="container" data-v-6561e626 data-v-25425518><div class="d-md-none" data-v-6561e626><div class="search-bar-lookalike form-combo res-border-mobile" data-v-6561e626><input id="proxy-search-bar" type="text" placeholder="" aria-label="" class="search-text form-combo-first" data-v-6561e626> <div class="icon-search-solid right-button form-combo-second search-icon res-background" data-v-6561e626></div></div></div> <div id="searchFields" class="bring-upwards-mobile d-none d-md-block" data-v-6561e626><ul role="tablist" class="nav animatable-block-3" data-v-6561e626><li role="tab" class="nav-item-res" data-v-6561e626><a href="/business" class="nav-link searchTab_business" data-v-6561e626><span data-v-6561e626>Business &amp; Gov</span></a></li> <li role="tab" class="nav-item-res" data-v-6561e626><a href="/residential" class="nav-link searchTab_residential active" data-v-6561e626><span data-v-6561e626>Residential</span></a></li></ul> <div class="mt-3" data-v-6561e626><div id="residentialTab" aria-labelledby="residentialTab" data-v-6561e626><div class="form-group row no-gutters" data-v-6561e626><div class="col-12 col-md-12 row res-border-desktop no-gutters" data-v-6561e626><div class="col-12 col-md-6 form-combo res-border-mobile add-mb-desktop animatable-block-2" data-v-6561e626><div class="search-auto-suggest searchForm_residentialNameField form-combo-first add-border-right residential" data-v-6561e626><div id="autosuggest" spellcheck="false"><input type="text" autocomplete="off" role="combobox" aria-autocomplete="list" aria-owns="autosuggest__results" aria-activedescendant="" aria-haspopup="false" aria-expanded="false" id="residentialQueryField" onInputChange="function () { [native code] }" 
placeholder="Surname" initialValue="Oliver" aria-label="residentialQueryField" onBlur="true" name="q" value="Oliver" class="form-control"> <div class="autosuggest__results-container"><!----></div></div></div> <div class="form-combo-second force-width-100 form-first-line add-border-right-desktop-only form-middle-initial-mobile" data-v-6561e626><input id="residentialMiddleInitialField" placeholder="Initial" spellCheck="false" aria-label="residentialMiddleInitialField" value="R" class="form-control" data-v-6561e626></div></div> <div class="col-12 col-md-6 form-combo res-border-mobile animatable-block-1" data-v-6561e626><div class="search-auto-suggest searchForm_locationField form-combo-first residential" data-v-6561e626><div id="autosuggest" spellcheck="false"><input type="text" autocomplete="off" role="combobox" aria-autocomplete="list" aria-owns="autosuggest__results" aria-activedescendant="" aria-haspopup="false" aria-expanded="false" id="residentialLocationQueryField" onInputChange="function () { [native code] }" placeholder="Suburb, state or postcode" initialValue="Yarraville, VIC 3013" aria-label="residentialLocationQueryField" onBlur="true" name="q" value="Yarraville, VIC 3013" class="form-control"> <div class="autosuggest__results-container"><!----></div></div></div> <button aria-label="search-button" class="btn search-button-residential form-combo-second" data-v-6561e626><span class="icon-search-solid search-icon" data-v-6561e626></span> <img src="/_nuxt/img/b323415.png" alt="load-icon" class="search-progress-icon d-none" data-v-6561e626></button> <div class="col-12 col-md-6 res-border-mobile search-progress-container d-none" data-v-6561e626><img src="/_nuxt/img/b323415.png" alt="load-icon" class="search-progress-icon" data-v-6561e626></div></div></div></div></div></div></div></div></div> <div class="row" data-v-25425518><div class="col-12 col-md-6 col-lg-7 col-xl-7" data-v-25425518><div class="search-result-list residential" data-v-6e576c74 
data-v-25425518><div class="promoted-listing residential" data-v-775caae4 data-v-6e576c74><div class="title-wrapper" data-v-775caae4><span class="title" data-v-775caae4>Promoted Listing</span></div> <!----> <div class="promoted-listing-carousel-indicators" data-v-775caae4><ol data-v-775caae4><li class="active" data-v-775caae4></li><li data-v-775caae4></li><li data-v-775caae4></li></ol></div></div> <div data-v-6e576c74><!----> <div data-v-6e576c74><div class="section-spacer" data-v-6e576c74><!----> <div data-v-6e576c74><span class="results-summary" data-v-6e576c74>
        2
        result<span data-v-6e576c74>s</span>
        found
        <span data-v-6e576c74>in </span>
        Yarraville, VIC 3013.
        <!----></span></div> <!----> <!----></div> <!----></div> <!----> <div data-v-6e576c74><!----> <div class="search-result-item-container" data-v-3733715c data-v-6e576c74><div class="search-result-item apply-border-top apply-border-bottom residential" data-v-3733715c><a href="/oliver-j-r-720679284R" class="page-flex flex-2-1 search-link" data-v-3733715c><div class="column logo-wrapper no-logo" data-v-3733715c><!----></div> <div class="column flex-2-1" data-v-3733715c><!----> <div data-v-3733715c><span class="display-name" data-v-3733715c>
        J R Oliver
      </span> <!----> <!----> <!----></div> <!----> <!----> <div class="padding-right-8" data-v-3733715c><span class="presence-location" data-v-3733715c>
        17 Curzon St West Melbourne VIC 3003
      </span> <!----></div> <!----></div> <!----> <div class="column flex-0-1-auto" data-v-3733715c><div class="item-go-to-profile-icon" data-v-3733715c><i class="icon-chevron-thick2" data-v-3733715c></i></div></div></a> <div class="column mobile-visible apply-border-left flex-0-0-auto" data-v-3733715c><a href="tel:0417316820" class="item-call-to-action-icon circle-background" data-v-3733715c><i class="icon-phone-solid" data-v-3733715c></i></a></div></div> <div class="social-links-tl-container" data-v-3733715c><!----></div></div> <!----></div><div data-v-6e576c74><!----> <div class="search-result-item-container" data-v-3733715c data-v-6e576c74><div class="search-result-item apply-border-bottom residential" data-v-3733715c><a href="/oliver-b-726032229R" class="page-flex flex-2-1 search-link" data-v-3733715c><div class="column logo-wrapper no-logo" data-v-3733715c><!----></div> <div class="column flex-2-1" data-v-3733715c><!----> <div data-v-3733715c><span class="display-name" data-v-3733715c>
        

How can I extract the text data?

Answer 1

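If your question can be rephrased as "How can I strip every HTML tag?", then

sed -E 's/<[^>]+>//g'

should do the job. The `-E` option enables extended regular expressions, so that `+` means "one or more", and the `g` flag removes every tag on a line rather than only the first.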
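As a minimal sketch of how this might be applied to the markup in the question: the snippet below strips the tags, trims the surrounding whitespace, and drops the lines that end up empty. The names `strip_tags` and `sample` are hypothetical and introduced only for illustration, and `sample` is an abbreviated copy of one result from the page. It assumes every tag opens and closes on the same line; because `sed` works line by line, a tag that wraps across a line break would leave fragments behind.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the original answer): reads HTML on stdin,
# removes anything that looks like <...>, trims leading and trailing
# whitespace, and deletes lines that are empty afterwards.
strip_tags() {
  sed -E \
    -e 's/<[^>]+>//g' \
    -e 's/^[[:space:]]+//' \
    -e 's/[[:space:]]+$//' \
    -e '/^$/d'
}

# Abbreviated copy of one search result from the question (an assumption:
# the real page contains more wrapper elements around these spans).
sample='<span class="display-name" data-v-3733715c>
        J R Oliver
      </span> <!----> <div class="padding-right-8" data-v-3733715c><span class="presence-location" data-v-3733715c>
        17 Curzon St West Melbourne VIC 3003
      </span> <!----></div>'

printf '%s\n' "$sample" | strip_tags
# Prints:
# J R Oliver
# 17 Curzon St West Melbourne VIC 3003
```

In practice the input would come from the cURL request rather than a variable, for example by piping the output of `curl` into `strip_tags`. For anything beyond simple text extraction, a real HTML parser is likely to be more robust than a regular expression.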
