How to use sed to swap entire HTML blocks in multiple files

Answer 1

If you want to put the aaa block after the bbb one (GNU sed):

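sed -i '/<div class="aaa">/{
        :1
        # keep appending lines until the "end aaa" marker is in the pattern space
        /<\/div> *<!-- end aaa -->/!{N;b 1;}
        # pull in one more line, copy the block to the hold space, delete it here
        N;h;d
}
# print the "end bbb" line, read the next one, then append the saved aaa block
/<\/div> *<!-- end bbb -->/{n;G}' *html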
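To see what the program does before letting -i loose on real files, it can be run without -i on a small sample. The function name aaa_after_bbb and the sample input are made up for illustration, GNU sed is assumed, and the regexes use " *" so any number of spaces may separate </div> from the end comment.

```shell
# aaa_after_bbb [FILE...]   (hypothetical name; GNU sed assumed)
# The same logic as the in-place command, but without -i: results go to stdout.
aaa_after_bbb() {
    sed '/<div class="aaa">/{
        :1
        /<\/div> *<!-- end aaa -->/!{N;b 1;}
        N;h;d
}
/<\/div> *<!-- end bbb -->/{n;G}' "$@"
}

# Sample input in which a blank line follows each block.
printf '%s\n' '<div class="aaa">' 'first' '</div> <!-- end aaa -->' '' \
    '<div class="bbb">' 'second' '</div> <!-- end bbb -->' '' 'tail' |
    aaa_after_bbb
# prints:
# <div class="bbb">
# second
# </div> <!-- end bbb -->
#
# <div class="aaa">
# first
# </div> <!-- end aaa -->
#
# tail
```

Note that the line right after the aaa end marker (here the blank line) travels with the block, and the block lands after the line that follows the bbb end marker rather than directly after the marker itself.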

Answer 2

Here is another sed:

sed '/.*<div class="...">.*/{ h;s///;x;:n
     /<.div>/!N;/<!-- end/x;/<.div>/x;//!bn
    s/\(.*\).\(<div class=.*>\).*/\2\1/;x
     /<.div>[^>]*$/s/.//;H;x
}'

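Starting from a class="..." line (any three-character class name), and through as many blocks as you have, this swaps the positions of each pair of blocks. Here are some examples: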

If sed encounters a line matching:

<div class=".\{3\}">

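...while reading its input, it makes sure the hold space is completely cleared and then starts pulling in lines until it meets a line matching: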

<.div>

...and...

<!-- end

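...or just the former. If both match, sed saves that block in the hold space and pulls in the second block before swapping their positions in the output.

If only the former matches, the block's position is left alone. This way, mismatched pairs are preserved.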

Given as input...

<div class="wrapper">
<div class="aaa"> first </div> <!-- end aaa -->
between
<div class="bbb"> swap two </div> <!-- end bbb -->
blocks
<div class="ccc"> mismatched </div> <!-- end ccc --> 
the end         
</div>

it prints...

<div class="wrapper">
<div class="bbb"> swap two </div> <!-- end bbb -->
between
<div class="aaa"> first </div> <!-- end aaa -->
blocks
<div class="ccc"> mismatched </div> <!-- end ccc -->
the end
</div>

...and if given:

<div class="wrapper">
<div class="aaa"> first </div> <!-- end aaa -->
between
<div class="bbb"> swap two </div> <!-- end bbb -->
blocks
<div class="ccc"> matched </div> <!-- end ccc --> 
the end
<div class="ddd"> now matched </div> <!-- end ddd -->
</div>

it prints...

<div class="wrapper">
<div class="bbb"> swap two </div> <!-- end bbb -->
between
<div class="aaa"> first </div> <!-- end aaa -->
blocks
<div class="ddd"> now matched </div> <!-- end ddd -->
the end
<div class="ccc"> matched </div> <!-- end ccc -->
</div>

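And although these examples are all squeezed onto single lines to save space, it does not really care whether the opening <div class= and the closing <.div> <!-- end parts are on the same line: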

<div class="wrapper">
<div class="aaa">

the first
block is here

</div> <!-- end aaa -->

these lines were
between aaa and bbb

<div class="bbb">

this is the second block
it should be swapped with the first

</div> <!-- end bbb -->

more
blocks
follow

<div class="ccc"> this is matched </div> <!-- end ccc -->
not the end
<div class="ddd">

this last block
is matched with the ccc line
</div> <!-- end ddd -->

this is the end
</div>

gives...

<div class="wrapper">
<div class="bbb"> 

this is the second block
it should be swapped with the first

</div> <!-- end bbb -->

these lines were
between aaa and bbb

<div class="aaa"> 

the first
block is here

</div> <!-- end aaa -->

more
blocks
follow

<div class="ddd"> 

this last block
is matched with the ccc line
</div> <!-- end ddd -->
not the end
<div class="ccc"> this is matched </div> <!-- end ccc -->

this is the end
</div>
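One way to reproduce the first example is to wrap the script in a shell function and pipe the sample through it. The function name pair_swap is only illustrative, and GNU sed is assumed (the s command relies on "." matching the newlines that N puts into the pattern space).

```shell
# pair_swap [FILE...]   (hypothetical wrapper around the script above; GNU sed)
pair_swap() {
    sed '/.*<div class="...">.*/{ h;s///;x;:n
     /<.div>/!N;/<!-- end/x;/<.div>/x;//!bn
    s/\(.*\).\(<div class=.*>\).*/\2\1/;x
     /<.div>[^>]*$/s/.//;H;x
}' "$@"
}

# The first example input: aaa and bbb get swapped, the lone ccc stays put.
printf '%s\n' '<div class="wrapper">' \
    '<div class="aaa"> first </div> <!-- end aaa -->' 'between' \
    '<div class="bbb"> swap two </div> <!-- end bbb -->' 'blocks' \
    '<div class="ccc"> mismatched </div> <!-- end ccc -->' 'the end' '</div>' |
    pair_swap
```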

Answer 3

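This is not a job for sed, unless you are a glutton for punishment, at least in the more general case where the start of a block spans more than one line (or a tag is split across several lines, which is quite likely in XML/HTML).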

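If you really must do this with anything other than an XML parser (and yes, fixing the input or cutting out the broken parts is usually a better idea), then at least use something like awk, which is better suited to such tasks (*). The general idea is: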

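1. Print the input lines up to the start of the first block;
2. Accumulate the lines of the first block to be swapped;
3. Accumulate the lines between the blocks;
4. Print the lines of the second block;
5. Print the in-between lines accumulated in step 3;
6. Print the lines of the first block accumulated in step 2;
7. Print the rest.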
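The plan above can be sketched in awk roughly as follows. This is a sketch under stated assumptions, not the answerer's code: the function name swap_blocks is hypothetical, each block is assumed to start on the line containing <div class="NAME"> and end on the line containing <!-- end NAME -->, and the first block must come before the second. Unlike step 4, it also buffers the second block, so input where the second block never appears (or never closes) is printed back unchanged.

```shell
# swap_blocks FIRST SECOND [FILE]   (hypothetical helper)
# Assumes a block starts on the line containing <div class="NAME"> and ends
# on the line containing <!-- end NAME -->, and FIRST comes before SECOND.
swap_blocks() {
    first_name=$1
    second_name=$2
    shift 2
    awk -v a="$first_name" -v b="$second_name" '
    BEGIN { state = 0 }   # 0 before, 1 first block, 2 between, 3 second block, 4 after
    state == 0 && index($0, "<div class=\"" a "\">") { state = 1 }
    state == 1 {                                    # step 2: hold the first block
        first = first $0 "\n"
        if (index($0, "<!-- end " a " -->")) state = 2
        next
    }
    state == 2 && index($0, "<div class=\"" b "\">") { state = 3 }
    state == 2 { between = between $0 "\n"; next }  # step 3: hold the middle part
    state == 3 {                                    # step 4: the second block
        second = second $0 "\n"
        if (index($0, "<!-- end " b " -->")) {
            printf "%s%s%s", second, between, first # steps 4 to 6, swapped order
            state = 4
        }
        next
    }
    { print }                                       # steps 1 and 7: pass through
    END {
        # no complete second block: give back the held lines in original order
        if (state >= 1 && state <= 3) printf "%s%s%s", first, between, second
    }' "$@"
}

# awk reads all its arguments as one stream, so process one file at a time, e.g.:
# for f in *.html; do swap_blocks aaa bbb "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done
```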

Also remember to check the canonical SO Q&A.

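(*) Why I say this: sed is line-oriented and meant for simple (YMMV) text transformations. While that is also true of AWK (and, to some extent, of Perl), the latter make writing more complex scripts much simpler (easier access to multiple variables, automatic splitting into fields, etc.). So unless you only ever need to swap two very well-delimited blocks and will never have to extend the script to cope with differently formatted input, a more capable language is probably the better tool. That said, Perl has an XML parser readily available as a module.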

Answer 4

Parsing HTML with regular expressions is, of course, discouraged.

Instead, you can use XPath and xmlstarlet if your source files are valid XHTML:

xmlstarlet edit -L -u "//div[@class='a']" -v 'some inner HTML' file.xhtml

If it is not valid XHTML, try adapting the following Perl code:

use strict;
use warnings;
use 5.008;

use File::Slurp 'read_file';
use HTML::TreeBuilder;

sub replace_keyword
{
  my $elt = shift;

  return if $elt->is_empty;

  $elt->normalize_content;      # Make sure text is contiguous

  my $content = $elt->content_array_ref;

  for (my $i = 0; $i < @$content; ++$i) {
    if (ref $content->[$i]) {
      # It's a child element, process it recursively:
      replace_keyword($content->[$i])
          unless $content->[$i]->tag eq 'a'; # Don't descend into <a>
    } else {
      # It's text:
      if ($content->[$i] =~ /here/) { # your keyword or regexp here
        $elt->splice_content(
          $i, 1, # Replace this text element with...
          substr($content->[$i], 0, $-[0]), # the pre-match text
          # A hyperlink with the keyword itself:
          [ a => { href => 'http://example.com' },
            substr($content->[$i], $-[0], $+[0] - $-[0]) ],
          substr($content->[$i], $+[0])   # the post-match text
        );
      } # end if text contains keyword
    } # end else text
  } # end for $i in content index
} # end replace_keyword


my $content = read_file('foo.shtml');

# Wrap the SHTML fragment so the comments don't move:
my $html = HTML::TreeBuilder->new;
$html->store_comments(1);
$html->parse("<html><body>$content</body></html>");

my $body = $html->look_down(qw(_tag body));
replace_keyword($body);

# Now strip the wrapper to get the SHTML fragment back:
$content = $body->as_HTML;
$content =~ s!^<body>\n?!!;
$content =~ s!</body>\s*\z!!;
$html->delete;    # free the parse tree

print $content;   # or write it back to the file

Borrowed from https://stackoverflow.com/questions/3900870/how-can-i-modify-html-files-in-perl
