Why doesn't this sed one-liner work the way I expect?

Running

curl -s 'https://www.idealista.com/inmueble/94238881/' \
  -H 'authority: www.idealista.com' \
  -H 'cache-control: max-age=0' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'sec-gpc: 1' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-user: ?1' \
  -H 'sec-fetch-dest: document' \
  -H 'referer: https://www.idealista.com/usuario/favoritos/' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H $'cookie: didomi_token=eyJ1c2VyX2lkIjoiMTc5MTljZjItMzUxNi02NmRjLTk0YjYtNTM3ODFiMjY1NGU5IiwiY3JlYXRlZCI6IjIwMjEtMDQtMjhUMTg6NDg6MzIuMDk5WiIsInVwZGF0ZWQiOiIyMDIxLTA0LTI4VDE4OjQ4OjMyLjA5OVoiLCJ2ZW5kb3JzIjp7ImRpc2FibGVkIjpbInR3aXR0ZXIiLCJnb29nbGUiLCJmYWNlYm9vayIsImM6bWl4cGFuZWwiLCJjOmlkZWFsaXN0YS1mZVJFamUyYyIsImM6aWRlYWxpc3RhLUx6dEJlcUUzIiwiYzphYnRhc3R5LUxMa0VDQ2o4IiwiYzpob3RqYXIiLCJjOnlhbmRleG1ldHJpY3MiLCJjOmJlYW1lci1IN3RyN0hpeCIsImM6dGVhbGl1bWNvLURWRENkOFpQIiwiYzpjaGFyYmVhdC1aNFFrOENhaCJdfSwicHVycG9zZXMiOnsiZGlzYWJsZWQiOlsiZ2VvbG9jYXRpb25fZGF0YSIsImFuYWxpdGljYXMtZHlGVkdSZTgiXX0sInZlcnNpb24iOjIsImFjIjoiQUFBQS5BQUFBIn0=; euconsent-v2=CPFYMwBPFYMwBAHABBENBXCgAAAAAAAAAAAAAAAAAAEBoFAAVgAuACGAGQAMsAagA2QB2AD8AIAAQUAjABSwCngFXgLQAtIBrADeAHVAPkAhsBDoCKgEXgJEATYAnYBSIC5AGBAMJAYeAxgBk4DOQGeAM-AckA5QB1hKB6AAgABYAFAAMgAcABFADAAMQAeABEACYAFUALgAXwAxABmADaAIQAQ0AiACJAEcAKMAUoAtwBhADKAGqANkAd4A_ACMAEcAKeAVeAtAC0gF1AMUAbgA4gB1AD5AIdARUAi8BIgCbAFigLYAXaAvMBh4DIgGTgMsAZyAzwBnwDSAGsAOAAdYA7UpBQAAXABQAFQAMgAcgA-AEAAIoAYABjADQANQAeQBDAEUAJgATwApABVACwAFwAL4AYgAzABzAEIAIaARABEgCjAFKALEAW4AwgBlADRAGqANkAd8A-wD9AIsARgAjgBKQCggFDAKuAVsAuYBeQDFAG0ANwAegBDoCLwEiAJsATsAocBTQCtgFigLYAXAAuQBdoC8wGGgMPAYwAyIBkgDJwGXAM5AZ4Az6BpAGkwNYA1kBscDkwOUAcuA6wB2oDxyEEIABYAFAAMgAiABcADEAIYATAAqgBcAC-AGIAMwAbwA9ACOAFiAMIAZQA1ABvgDvgH2AfgA_wCMAEcAJSAUEAoYBTwCrwFoAWkAuYBfgDFAG0AOoAegBIICRAEqAJsAU0AsUBaMC2ALaAXAAuQBdoDDwGJAMiAZOAzkBngDPgGiANJAaWA1UBwADkgHRgOsAdqA8cdBxAAXABQAFQAMgAcgA-AEAAIgAXQAwADGAGgAagA8AB9AEMARQAmABPgCqAKwAWIAuAC6AF8AMQAZgA3gBzAD0AIQAQ0AiACJAEdAJYAmABNACjAFKALEAW8AwgDDAGQAMoAaIA1ABsgDfAHeAPaAfYB-gD_AIHARYBGACOQEpASoAoIBTwCrgFigLQAtIBcwC6gF5AL8AYoA2gBuADiQHTAdQA9ACGwEOgIiARUAi8BIICRAEqAJsATsAocBTQCrAFigLQgWwBbIC4AFyALtAXeAvMBgwDCQGGgMPAYkAxgBjwDJAGTgMqAZYAy4BnIDPgGiQNIA0kBpYDTgGqgNYAbGA4uByQHKgOXAdGA6wB44D0gHqhILYACAAFwAUABUADIAHIAPABAACIAGEANAA1AB5AEMARQAmABPgCqAKwAWAAuABvADmAHoAQgAhoBEAESAI6ASwBLgCaAFKALcAYYAyABlwDUANUAbIA7wB7AD4gH2AfoBAACBwEXARgAjQBHACUgFBAKWAU8Aq4BcwC_AGKANYAbQA3ABvADiA
HoAPkAhsBDoCLwEiAJiATKAmwBOwChwFIgLFAWgAtgBcgC7wF5gMCAYMAwkBhoDDwGRAMkAZOAy4BnIDPgGkANOgawBrMDkQOVAcuA6MB1gDxxkBwACgAQwAmABcAEcAMsAagA7IB9gH4ARgAjgBSwCrgFbAN4AmIBNgC0QFsALzAYEAw8BkQDOQGeAM-AckA5QVAfAAoAEMAJgAXABHADLAGoAOwAfgBGACOAFLAKvAWgBaQDeAJBATEAmwBTYC2AFyALzAYEAw8BkQDOQGeAM-AbkA5IBygAA.YAAAAAAAAAAA; smc="{}"; userUUID=29668197-3ec8-4380-83a0-af74497d4652; askToSaveAlertPopUp=true; cookieSearch-1="/areas/venta-terrenos/con-precio-hasta_30000,precio-desde_5000,metros-cuadrados-mas-de_500,metros-cuadrados-menos-de_5000,terrenos-urbanos/?shape=%28%28uyxcF%7Cqxl%40%7BxEyrlCput%40ghjBfnw%40%7Dnr%40rlxA%7Dj_AjdWq%60xAr%60r%40_pUvdy%40rwg%40r%7CLr%7BZmfy%40f%7DsB_%7EwBb_aDyj%7DA%7CsoBupaA%7CaO%29%29:1631778650981"; uc="jNLcI107+7z1wRs0x4TdO3u5jcaE+Wrl/o5Drt4SE9qHCxSMOrDfJCS1OVr9tkKQ1xbkhwtCXOZOKw1BLkvMTAMsML+Z10HjHWdUIRhRkRXsEcNnPFEt9rqCk0DCCd7EBSE6A/jp5vs="; nl="wrtrmuF9QzNOYYO2P8SN3OqyjHQXevAY7aYvx0cKdUNwML7qYn47dSs63/pFStgOTH50K6V1y0hMkNG4T70na63g0fJdDSpgDegfruZFCA9GnVx058kgR638a8Q81Gz9r1nzfAqJdfs="; SESSION=2e11a953aead5c1f~d77af03d-2746-4657-89a6-ab25cb02e889; contactd77af03d-2746-4657-89a6-ab25cb02e889="{\'email\':\'EqU9vfX+DxXdzl+iReclTDED4UB/DY6iSfTeSLvLlsE=\',\'phone\':\'634821160\',\'phonePrefix\':null,\'friendEmails\':null,\'name\':\'RL9PoIsz5dW5rvLNF/0gvA==\',\'message\':null,\'message2Friends\':null,\'maxNumberContactsAllow\':10,\'defaultMessage\':true}"; sendd77af03d-2746-4657-89a6-ab25cb02e889="{\'friendsEmail\':null,\'email\':\'EqU9vfX+DxXdzl+iReclTDED4UB/DY6iSfTeSLvLlsE=\',\'message\':null}"; datadome=Gegr2x~z62hH2U.ay-n4sMd3xgi-RaB1X5XMr4i5qV2q6GYsINszxSNDS732-spxaAaUp.m7aGMcOgN-DcAxFY9KCQsldTsDl-RVS5ocEm; cc=eyJhbGciOiJIUzI1NiJ9.eyJjdCI6OTk2MjMyNiwiZXhwIjoxNjMxOTUxOTc4fQ.0YJZGR34jgb1SWTAzL9DFPhAeeP4k0hE9igRE8-SQp0' \
--compressed | grep -m 1 "<meta name=\"description\" content="

gives me:

<meta name="description" content="terreno de 936 m², Terreno en venta en paseo Blasco Ibáñez s/n, Costa Esuri, Ayamonte, Costa Esuri">

To grab only the text, without any HTML, I tried applying sed and ended up with this code, which works as expected:

curl -s 'https://www.idealista.com/inmueble/94238881/' \
  -H 'authority: www.idealista.com' \
  -H 'cache-control: max-age=0' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'sec-gpc: 1' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-user: ?1' \
  -H 'sec-fetch-dest: document' \
  -H 'referer: https://www.idealista.com/usuario/favoritos/' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H $'cookie: didomi_token=eyJ1c2VyX2lkIjoiMTc5MTljZjItMzUxNi02NmRjLTk0YjYtNTM3ODFiMjY1NGU5IiwiY3JlYXRlZCI6IjIwMjEtMDQtMjhUMTg6NDg6MzIuMDk5WiIsInVwZGF0ZWQiOiIyMDIxLTA0LTI4VDE4OjQ4OjMyLjA5OVoiLCJ2ZW5kb3JzIjp7ImRpc2FibGVkIjpbInR3aXR0ZXIiLCJnb29nbGUiLCJmYWNlYm9vayIsImM6bWl4cGFuZWwiLCJjOmlkZWFsaXN0YS1mZVJFamUyYyIsImM6aWRlYWxpc3RhLUx6dEJlcUUzIiwiYzphYnRhc3R5LUxMa0VDQ2o4IiwiYzpob3RqYXIiLCJjOnlhbmRleG1ldHJpY3MiLCJjOmJlYW1lci1IN3RyN0hpeCIsImM6dGVhbGl1bWNvLURWRENkOFpQIiwiYzpjaGFyYmVhdC1aNFFrOENhaCJdfSwicHVycG9zZXMiOnsiZGlzYWJsZWQiOlsiZ2VvbG9jYXRpb25fZGF0YSIsImFuYWxpdGljYXMtZHlGVkdSZTgiXX0sInZlcnNpb24iOjIsImFjIjoiQUFBQS5BQUFBIn0=; euconsent-v2=CPFYMwBPFYMwBAHABBENBXCgAAAAAAAAAAAAAAAAAAEBoFAAVgAuACGAGQAMsAagA2QB2AD8AIAAQUAjABSwCngFXgLQAtIBrADeAHVAPkAhsBDoCKgEXgJEATYAnYBSIC5AGBAMJAYeAxgBk4DOQGeAM-AckA5QB1hKB6AAgABYAFAAMgAcABFADAAMQAeABEACYAFUALgAXwAxABmADaAIQAQ0AiACJAEcAKMAUoAtwBhADKAGqANkAd4A_ACMAEcAKeAVeAtAC0gF1AMUAbgA4gB1AD5AIdARUAi8BIgCbAFigLYAXaAvMBh4DIgGTgMsAZyAzwBnwDSAGsAOAAdYA7UpBQAAXABQAFQAMgAcgA-AEAAIoAYABjADQANQAeQBDAEUAJgATwApABVACwAFwAL4AYgAzABzAEIAIaARABEgCjAFKALEAW4AwgBlADRAGqANkAd8A-wD9AIsARgAjgBKQCggFDAKuAVsAuYBeQDFAG0ANwAegBDoCLwEiAJsATsAocBTQCtgFigLYAXAAuQBdoC8wGGgMPAYwAyIBkgDJwGXAM5AZ4Az6BpAGkwNYA1kBscDkwOUAcuA6wB2oDxyEEIABYAFAAMgAiABcADEAIYATAAqgBcAC-AGIAMwAbwA9ACOAFiAMIAZQA1ABvgDvgH2AfgA_wCMAEcAJSAUEAoYBTwCrwFoAWkAuYBfgDFAG0AOoAegBIICRAEqAJsAU0AsUBaMC2ALaAXAAuQBdoDDwGJAMiAZOAzkBngDPgGiANJAaWA1UBwADkgHRgOsAdqA8cdBxAAXABQAFQAMgAcgA-AEAAIgAXQAwADGAGgAagA8AB9AEMARQAmABPgCqAKwAWIAuAC6AF8AMQAZgA3gBzAD0AIQAQ0AiACJAEdAJYAmABNACjAFKALEAW8AwgDDAGQAMoAaIA1ABsgDfAHeAPaAfYB-gD_AIHARYBGACOQEpASoAoIBTwCrgFigLQAtIBcwC6gF5AL8AYoA2gBuADiQHTAdQA9ACGwEOgIiARUAi8BIICRAEqAJsATsAocBTQCrAFigLQgWwBbIC4AFyALtAXeAvMBgwDCQGGgMPAYkAxgBjwDJAGTgMqAZYAy4BnIDPgGiQNIA0kBpYDTgGqgNYAbGA4uByQHKgOXAdGA6wB44D0gHqhILYACAAFwAUABUADIAHIAPABAACIAGEANAA1AB5AEMARQAmABPgCqAKwAWAAuABvADmAHoAQgAhoBEAESAI6ASwBLgCaAFKALcAYYAyABlwDUANUAbIA7wB7AD4gH2AfoBAACBwEXARgAjQBHACUgFBAKWAU8Aq4BcwC_AGKANYAbQA3ABvADiA
HoAPkAhsBDoCLwEiAJiATKAmwBOwChwFIgLFAWgAtgBcgC7wF5gMCAYMAwkBhoDDwGRAMkAZOAy4BnIDPgGkANOgawBrMDkQOVAcuA6MB1gDxxkBwACgAQwAmABcAEcAMsAagA7IB9gH4ARgAjgBSwCrgFbAN4AmIBNgC0QFsALzAYEAw8BkQDOQGeAM-AckA5QVAfAAoAEMAJgAXABHADLAGoAOwAfgBGACOAFLAKvAWgBaQDeAJBATEAmwBTYC2AFyALzAYEAw8BkQDOQGeAM-AbkA5IBygAA.YAAAAAAAAAAA; smc="{}"; userUUID=29668197-3ec8-4380-83a0-af74497d4652; askToSaveAlertPopUp=true; cookieSearch-1="/areas/venta-terrenos/con-precio-hasta_30000,precio-desde_5000,metros-cuadrados-mas-de_500,metros-cuadrados-menos-de_5000,terrenos-urbanos/?shape=%28%28uyxcF%7Cqxl%40%7BxEyrlCput%40ghjBfnw%40%7Dnr%40rlxA%7Dj_AjdWq%60xAr%60r%40_pUvdy%40rwg%40r%7CLr%7BZmfy%40f%7DsB_%7EwBb_aDyj%7DA%7CsoBupaA%7CaO%29%29:1631778650981"; uc="jNLcI107+7z1wRs0x4TdO3u5jcaE+Wrl/o5Drt4SE9qHCxSMOrDfJCS1OVr9tkKQ1xbkhwtCXOZOKw1BLkvMTAMsML+Z10HjHWdUIRhRkRXsEcNnPFEt9rqCk0DCCd7EBSE6A/jp5vs="; nl="wrtrmuF9QzNOYYO2P8SN3OqyjHQXevAY7aYvx0cKdUNwML7qYn47dSs63/pFStgOTH50K6V1y0hMkNG4T70na63g0fJdDSpgDegfruZFCA9GnVx058kgR638a8Q81Gz9r1nzfAqJdfs="; SESSION=2e11a953aead5c1f~d77af03d-2746-4657-89a6-ab25cb02e889; contactd77af03d-2746-4657-89a6-ab25cb02e889="{\'email\':\'EqU9vfX+DxXdzl+iReclTDED4UB/DY6iSfTeSLvLlsE=\',\'phone\':\'634821160\',\'phonePrefix\':null,\'friendEmails\':null,\'name\':\'RL9PoIsz5dW5rvLNF/0gvA==\',\'message\':null,\'message2Friends\':null,\'maxNumberContactsAllow\':10,\'defaultMessage\':true}"; sendd77af03d-2746-4657-89a6-ab25cb02e889="{\'friendsEmail\':null,\'email\':\'EqU9vfX+DxXdzl+iReclTDED4UB/DY6iSfTeSLvLlsE=\',\'message\':null}"; datadome=Gegr2x~z62hH2U.ay-n4sMd3xgi-RaB1X5XMr4i5qV2q6GYsINszxSNDS732-spxaAaUp.m7aGMcOgN-DcAxFY9KCQsldTsDl-RVS5ocEm; cc=eyJhbGciOiJIUzI1NiJ9.eyJjdCI6OTk2MjMyNiwiZXhwIjoxNjMxOTUxOTc4fQ.0YJZGR34jgb1SWTAzL9DFPhAeeP4k0hE9igRE8-SQp0' \
--compressed | grep -m 1 "<meta name=\"description\" content=" | sed -E 's;^.*(content=\");;;s;\">$;;' 

My question is why sed -E 's;^.*(content=\")\">$;;' does not work. It is meant to give me this result:

terreno de 936 m², Terreno en venta en paseo Blasco Ibáñez s/n, Costa Esuri, Ayamonte, Costa Esuri 

Instead, it gives me the untrimmed line:

<meta name="description" content="terreno de 936 m², Terreno en venta en paseo Blasco Ibáñez s/n, Costa Esuri, Ayamonte, Costa Esuri">

To play with this example, you may need to go to https://www.idealista.com/inmueble/94238881/, open the developer tools in your browser, and use "Copy as cURL" on the request.

Answer 1

Your regular expression has two problems, which probably stem from a misunderstanding of how regular expressions work.

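  1. You use "extended" regular expression syntax, which makes ( and ) special characters that delimit a capture group. However, they do not affect the matching mechanism itself. Since you never use the capture group, your regular expression is equivalent to
    ^.*content=\"\">$
    
    which expects the pattern content="", that is, an empty quoted string, immediately followed by the closing >. This never occurs in your input, so sed does nothing because no match is found. (By the way, you don't need to escape the " characters: they are not special in regular expressions, and your program is enclosed in single quotes, so the shell won't misinterpret them either.)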
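  2. Even if you corrected that, as in
    ^.*content="[^"]*".*>$
    
    you would replace the matched part of the line, which because of the anchors is the entire line, with the empty string, so in your case nothing would be left.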
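
Both failure modes can be reproduced offline, without the curl request. The following is a minimal sketch that assumes GNU sed and uses the line printed by grep in the question as sample input; the variable names line, unchanged and emptied are only illustrative.

```shell
# Sample input: the line that grep prints in the question.
line='<meta name="description" content="terreno de 936 m², Terreno en venta en paseo Blasco Ibáñez s/n, Costa Esuri, Ayamonte, Costa Esuri">'

# Problem 1: the pattern requires content="" directly before the final >,
# so nothing matches and sed prints the line unchanged.
unchanged=$(printf '%s\n' "$line" | sed -E 's;^.*(content=\")\">$;;')

# Problem 2: the pattern now matches, but it matches the whole line,
# so replacing the match with the empty string leaves nothing.
emptied=$(printf '%s\n' "$line" | sed -E 's;^.*content="[^"]*".*>$;;')

printf 'problem 1: %s\n' "$unchanged"
printf 'problem 2: [%s]\n' "$emptied"
```

The first printf shows the original line untouched; the second shows empty brackets.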

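To fix this, you need to reuse your original idea of a capture group, but use it to enclose the relevant part of the line, and then replace the whole line with the content of the capture group, as follows:

sed -E 's;^.*content="([^"]*)".*$;\1;'

This defines everything between the first " after the attribute name content= and the next " as the capture group, but otherwise matches the entire line. It then replaces the entire line with only the content of the capture group, via the expression \1.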

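The corrected command can be checked against the sample line from the question without contacting the server. This is a sketch under the assumption of GNU sed; the variable names (line, text, page, first) and the extra "second" meta line are made up for illustration. The second command is an optional variant, not part of the answer above, that folds the grep -m 1 step into sed by printing only the first matching line and then quitting.

```shell
# Sample input: the line that grep prints in the question.
line='<meta name="description" content="terreno de 936 m², Terreno en venta en paseo Blasco Ibáñez s/n, Costa Esuri, Ayamonte, Costa Esuri">'

# Replace the whole line with the capture group (the attribute value).
text=$(printf '%s\n' "$line" | sed -E 's;^.*content="([^"]*)".*$;\1;')
printf '%s\n' "$text"

# Optional variant: let sed do the filtering that grep -m 1 did.
# -n suppresses automatic printing, p prints the substituted line,
# and q stops after the first matching line.
page=$(printf '%s\n' '<head>' "$line" '<meta name="description" content="second">' '</head>')
first=$(printf '%s\n' "$page" |
  sed -nE '/<meta name="description" content=/{s;^.*content="([^"]*)".*$;\1;p;q;}')
printf '%s\n' "$first"
```

Both commands print only the description text. Note that the leading .* is greedy, so on a line containing several content= attributes the last one would be captured.
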