Why does Dreamweaver's regex replace with the wrong characters? (-- How do I use regex to replace content specific to (inside) an HTML tag?)

Situation

1. This is the regex and the replacement I use for a regex replace in Dreamweaver:

(( and )|( that )|( include )|( includes )|( including ))
$1<br/>\n

Image: regex replace settings in Dreamweaver

2. Replace All in the current document. I find a few errors.

For example:

create new Flux and Mono instances

is replaced with (wrong)

create new Flux that <br/>
Mono instances

Image: actual result

instead of (correct)

create new Flux and <br/>
Mono instances

(regex101 visualization => https://regex101.com/r/hJTqFg/1)

Questions

Why is this happening?

How can I avoid it? (Is there a safer way to do the replacement?)

Has this happened to anyone before?

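Notes:

  • Many replacements happened in this document, roughly 300+; the document is big.
  • While the replace is in progress, I can see Dreamweaver scrolling rapidly through the source code,

    rather than applying all the replacements at once (which is different from what other text editors do).

    Is this normal?

    This makes me suspect the problem lies in the replacement speed: the lag somehow makes Dreamweaver replace the text with a previously replaced text...? (But I don't know; I have no idea how Dreamweaver is implemented internally.)

  • (Although the errors are few, this still should not happen.)

  • (The replacement is specific to the HTML tag <p>; I don't think that is the problem.)

  • (To be more precise,

    the original text is create new <code class="literal">Flux</code> and <code class="literal">Mono</code> instances

    rather than just create new Flux and Mono instances

    (I simplified it for readability, but that should not matter.))

  • Using Dreamweaver version 2021

  • Replacing in an .xhtml file

  • It is not only that text gets replaced with another alternative from the | in the regex pattern.

    There are also errors where some characters simply disappear (though this is rarer).

    For example: blockhound becomes lockhound, and 1.0.1.RELEASE becomes .0.1.RELEASE

    (I did not apply just one regex replace pattern; other patterns were applied to this document as well.

    However, the two examples above certainly should not match any regex pattern I used on this document.)

  • I ran another regex replace test using the option (Documents in) Folder ..., instead of Current Document,

    so the scrolling effect does not appear (and the process seems faster),

    but even then the errors are still there.

    So the errors do not seem to be related to the scrolling during regex replace in Dreamweaver.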



Below is a simplified example of the above (in case the above contains redundant information).

In short:

0. You have some text

<p>AA and BB</p>

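1. If you use a regex that contains the or syntax, |, e.g.:

(( and )|( that ))

2. and your replacement contains a capture group $1, e.g.:

$1foobar

3. and you then run a regex Replace All in Dreamweaver that is specific to a tag, say <p> (I call it the specific-targeting-replacement tag)

4.

  • and there is a tag (not of type <p>) above the <p>, say <li> (I call it the non-specific-targeting-replacement tag)

  • and the <li> contains the word that (the word adjacent to and in the regex pattern with the or syntax),

    • (I call the word that the adjacent-replacement word)

    • (I call the word and the to-be-replaced word)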

<ul>
  <li>xxxx that xxxx</li>
</ul>

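5. then the text in a <p> tag below that <li> tag will be replaced with the wrong string (the adjacent-replacement word). For example:

It should be replaced with (correct)

<p>AA and foobarBB</p>

but it may be replaced with (wrong)

<p>AA that foobarBB</p>

img: the file and procedure that cause this bug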

Test file:

<!DOCTYPE html>
<html>
<body>
<p>AA and BB</p>
<p>AA and BB</p>
<ul>
  <li>xxxx that xxxx</li>
</ul>
<p>AA and BB</p>
<p>AA and BB</p>
</body>
</html>
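
For comparison, the expected result can be checked outside Dreamweaver. The following is a minimal sketch, assuming Python's standard `re` module as a reference engine (Python is not part of the question, and the variable names are only illustrative). It applies the simplified pattern and replacement to one `<li>` line and one `<p>` line separately, to show that group 1 should expand to whatever matched in the string being processed.

```python
import re

# Simplified pattern and replacement from the steps above.
# Python writes the group reference as \1 where Dreamweaver uses $1.
pattern = re.compile(r"(( and )|( that ))")
replacement = r"\1foobar"

list_item = "<li>xxxx that xxxx</li>"
paragraph = "<p>AA and BB</p>"

# Group 1 is the text matched in *this* string; an earlier match of
# " that " in another element must not leak into a later replacement.
replaced_item = pattern.sub(replacement, list_item)
replaced_paragraph = pattern.sub(replacement, paragraph)

print(replaced_item)
print(replaced_paragraph)
```

Here the paragraph comes out as `<p>AA and foobarBB</p>`, which is the "correct" result described above. Note that this sketch does not restrict the replacement to `<p>` tags; it only illustrates how the capture group is expected to behave.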

Answer 1

Here is a solution using Notepad++:

  • Ctrl+H
  • Find what: (?:<p>|\G)(?:(?!</p>).)*?\b(?:and|that|includes?|including)\b\K(?=.*?</p>)
  • Replace with: foobar
  • CHECK Wrap around
  • CHECK Regular expression
  • CHECK . matches newline
  • Replace all

Explanation:

(?:         # non capture group
    <p>         # opening tag
  |           # OR
    \G          # restart from last match position
)           # end group
        # Tempered Greedy Token
(?:         # non capture group
    (?!</p>)    # negative lookahead, make sure we haven't </p> just after
    .           # any character
)*?         # end group, may appear 0 or more times, not greedy
\b          # word boundary
(?:         # non capture group
    and         # literally
  |           # OR
    that        # literally
  |           # OR
    includes?   # literally include OR includes
  |           # OR
    including   # literally
)           # end group
\b          # word boundary
\K          # reset operator, forget all we have seen until this position
(?=         # positive lookahead, make sure we have after:
    .*?         # 0 or more any character, not greedy
    </p>        # closing tag
)           # end lookahead

[Screenshot: before]

[Screenshot: after]

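Answer 2

This is a bug. There is currently no good solution within Dreamweaver, but there are workarounds.

Solution 1 (workaround, very limited)

When using Dreamweaver, do not use the | (or) syntax or the grouping syntax in the regex pattern.

You need to replace the words one by one. (This is a huge limitation on what regex can do.)

  • (There does seem to be a way to batch the replacement queries, though => link)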
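
To make the "one by one" idea concrete, here is a minimal sketch in Python rather than in Dreamweaver. The `QUERIES` list and the `replace_one_by_one` helper are hypothetical names introduced for illustration; the words and the `<br/>` replacement come from the question. Each query is a plain literal with no `|` and no capture group, and each one is applied as its own pass.

```python
import re

# Hypothetical list of single-word queries. Each pattern is a plain literal
# (no "|" and no group), and the replacement repeats the word explicitly
# instead of relying on $1.
QUERIES = [
    (" and ", " and <br/>\n"),
    (" that ", " that <br/>\n"),
    (" include ", " include <br/>\n"),
    (" includes ", " includes <br/>\n"),
    (" including ", " including <br/>\n"),
]


def replace_one_by_one(text):
    """Apply each literal query in turn, like separate Replace All passes."""
    for literal, substitution in QUERIES:
        # re.escape keeps the pattern a pure literal; the lambda returns the
        # substitution verbatim, so no escape sequences are interpreted.
        text = re.sub(re.escape(literal), lambda _m, s=substitution: s, text)
    return text


print(replace_one_by_one("create new Flux and Mono instances"))
```

This sketch runs over the whole input and does not limit the replacement to `<p>` tags; it only shows how one alternation can be split into independent literal queries.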

Solution 2 (recommended)

Use Calibre with a Python function to do the replacement.

For example:

import re

# // finds and replaces the content inside tag <p> -- by `re.sub`
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    # // use the following Regex for finding in Calibre
    # (?P<tag_opening><p(|\s+[^>]*)>)(?P<content_inside_tag>.*?)(?P<tag_closing><\/p\s*>)
    m_p1 = match.group("tag_opening")
    m_main = match.group("content_inside_tag")
    m_p2 = match.group("tag_closing")

    m_main_re = m_main
    
    m_main_re = re.sub("(( and )|( that ))", "\\g<0><br>\n", m_main_re)

    # print(m_main_re)
    # return match.group(0)
    return m_p1 + m_main_re + m_p2
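
The same two-level logic can be tried outside Calibre. This is a minimal sketch, assuming plain Python with the standard `re` module. The outer pattern is the one quoted in the comment above; `P_TAG`, `replace_in_paragraph`, the `re.DOTALL` flag, and the sample `html` string (a shortened version of the test file from the question) are assumptions made for illustration.

```python
import re

# Outer pattern from the comment in the Calibre function: it captures the
# opening <p ...> tag, the content inside it, and the closing </p> tag.
P_TAG = re.compile(
    r"(?P<tag_opening><p(|\s+[^>]*)>)"
    r"(?P<content_inside_tag>.*?)"
    r"(?P<tag_closing><\/p\s*>)",
    re.DOTALL,  # assumption: lets a paragraph span several lines
)


def replace_in_paragraph(match):
    """Rewrite only the content captured between <p> and </p>."""
    content = match.group("content_inside_tag")
    # \g<0> is the whole inner match, so the matched word itself is kept.
    content = re.sub(r"(( and )|( that ))", "\\g<0><br>\n", content)
    return match.group("tag_opening") + content + match.group("tag_closing")


html = (
    "<p>AA and BB</p>\n"
    "<ul>\n"
    "  <li>xxxx that xxxx</li>\n"
    "</ul>\n"
    "<p>AA and BB</p>\n"
)

result = P_TAG.sub(replace_in_paragraph, html)
print(result)
```

Because the inner `re.sub` only ever sees the text of one `<p>` element, the `<li>` line is left untouched and each paragraph keeps its own matched word.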

Solution 3 (recommended)

Same as Solution 2, but in Java code.

package com.ex.main;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
@to_use: 
1. put the files that need to be regex replaced in `input folder` 
2. 
change the regex pattern in 
regexReplaceContent_SpecificToATag()
regexReplaceContent_SpecificToATag_innerMatch()
3. run 
4. get the output files from `output folder`
*/
public class RegexReplaceFile {

  //################################################################################################

  // loop & read the files in input folder & invoke regex replace for each file & output the replaced file to output folder
  // (files in input folder will not be modified)
  public static void regexReplaceAndReadWriteFilesInFolder(String path_InputFolder, String path_OutputFolder) {
    File dir_InputFolder = new File(path_InputFolder);
    File[] arr_file_InInputFolder = dir_InputFolder.listFiles();
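    // listFiles() returns null when the path is not a readable directory
    if (arr_file_InInputFolder == null || arr_file_InInputFolder.length == 0) {
      System.out.println("Folder is empty or cannot be read.");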
      return;
    }
    for (File file_curr : arr_file_InInputFolder) {
      if (file_curr.isDirectory()) {
        System.out.println("Directory: " + file_curr.getAbsolutePath());
        // showFiles(file_curr.listFiles()); // recursion (if need to loop inside sub folders)
      } else {
        String path_InputFile = file_curr.getAbsolutePath();
        String path_OutputFile = path_OutputFolder + "/" + file_curr.getName();
        System.out.println("Input  File: " + path_InputFile);
        System.out.println("Output File: " + path_OutputFile);
        // do the regex replace for each file
        regexReplaceAndReadWriteFile(path_InputFile, path_OutputFile);
      }
    }
  }

  // ^^ do the regex replace for each file
  public static void regexReplaceAndReadWriteFile(String path_InputFile, String path_OutputFile) {
    // >> read content from input file
    StringBuilder contentBuilder = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(new FileReader(path_InputFile))) {
      String str;
      while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
        contentBuilder.append("\n");
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    String content = contentBuilder.toString();
    //    System.out.println(content);

    // >> regex replace
    String content_re = regexReplaceContent_SpecificToATag(content);

    // >> write content to output file
    byte[] byte_Content = content_re.getBytes(StandardCharsets.UTF_8);
    Path file = Paths.get(path_OutputFile);
    try {
      Files.write(file, byte_Content);
    } catch (IOException e) {
      e.printStackTrace();
    }

  }

  //##########################################

  // ## loop replace with `StringBuilder.append & inner replace` 
  public static String regexReplaceContent_SpecificToATag(String content, boolean det_OmitSyso) {
    // >> find content specific to tag <p> 
    final String regexNamedGroup_tagOpening = "tagOpening";
    final String regexNamedGroup_contentInsideTag = "contentInsideTag";
    final String regexNamedGroup_tagClosing = "tagClosing";

    String content_SearchOn = content;
    String str_RegexPattern = "(?s)(?<tagOpening><p(|\\s+[^>]*)>)(?<contentInsideTag>.*?)(?<tagClosing></p\\s*>)"; // @to_use-param;
    Pattern pattern = Pattern.compile(str_RegexPattern);
    Matcher matcher = pattern.matcher(content_SearchOn);

    // >> for the content in each (found) paragraph <p>, do a replace
    StringBuilder sb_ContentSearchOn = new StringBuilder(content_SearchOn);
    StringBuilder content_Replaced = new StringBuilder();
    int ind_MatchGroupEnd_prev = 0;
    int ind_MatchGroupEnd_curr;
    int ind_MatchGroupStart_curr;
    while (matcher.find()) {
      // 
      ind_MatchGroupStart_curr = matcher.start(regexNamedGroup_contentInsideTag);
      ind_MatchGroupEnd_curr = matcher.end(regexNamedGroup_contentInsideTag);

      String content_BeforeMatchGroup = sb_ContentSearchOn.substring(ind_MatchGroupEnd_prev, ind_MatchGroupStart_curr); // prev end to curr start, not start to end

      content_Replaced.append(content_BeforeMatchGroup);

      // ^^ for the content in each (found) paragraph <p>, do a replace -- [inner match]
      String content_SearchOn_innerMatch__TheMatchGroup = matcher.group(regexNamedGroup_contentInsideTag);
      String content_Replaced_innerMatch = regexReplaceContent_SpecificToATag_innerMatch(content_SearchOn_innerMatch__TheMatchGroup);

      content_Replaced.append(content_Replaced_innerMatch);

      // 
      ind_MatchGroupEnd_prev = ind_MatchGroupEnd_curr;
    }

    // append the content after the last match group
    String content_AfterLastMatchGroup = sb_ContentSearchOn.substring(ind_MatchGroupEnd_prev, sb_ContentSearchOn.length());
    content_Replaced.append(content_AfterLastMatchGroup);

    // >>
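    if (det_OmitSyso) {
      // guard against content shorter than 300 characters
      int ind_PreviewEnd = Math.min(300, content_Replaced.length());
      System.out.println(content_Replaced.substring(0, ind_PreviewEnd) + "\n[... omitted]\n");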
    } else {
      System.out.println(content_Replaced);
    }
    return content_Replaced.toString();
  }

  public static String regexReplaceContent_SpecificToATag(String content) {
    boolean det_OmitSyso = true;
    return regexReplaceContent_SpecificToATag(content, det_OmitSyso); // 
  }

  // ^^ for the content in each (found) paragraph <p>, do a replace -- [inner match]
  private static String regexReplaceContent_SpecificToATag_innerMatch(String content_SearchOn_innerMatch__TheMatchGroup) {
    String str_RegexPattern_innerMatch;
    String str_Substitution_innerMatch;
    String content_Replaced_innerMatch = content_SearchOn_innerMatch__TheMatchGroup;

    // 
    //    str_RegexPattern_innerMatch = "( and then )";
    //    str_Substitution_innerMatch = " and then <br>\n";
    //    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
    //
    str_RegexPattern_innerMatch = "(( and )|( that ))";
    str_Substitution_innerMatch = "$0<br>\n";
    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
    //
    //    str_RegexPattern_innerMatch = "((\\. )|(, ))";
    //    str_Substitution_innerMatch = "$1<br>\n";
    //    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);
    //
    //    str_RegexPattern_innerMatch = "( by )";
    //    str_Substitution_innerMatch = " <br>\n++ by ";
    //    content_Replaced_innerMatch = content_Replaced_innerMatch.replaceAll(str_RegexPattern_innerMatch, str_Substitution_innerMatch);

    return content_Replaced_innerMatch;
  }

  //################################################################################################

  static final String content_TESTING = "    <p>These days if you are an android developer, you might hear some hype about RxJava. \n"
                                        + "RxJava is library which can help you get rid of all you complex write-only code that deals with asynchronous events. Once you start using it in your project – you will use it everywhere.</p>\n"
                                        + "\n"
                                        + "<p>The main pitfall here is steep learning curve. If you have never used RxJava before, it will be hard or confusing to take full advantage of it for the first time. The whole way you think about writing code is a little different.\n"
                                        + "Such learning curve creates problems for massive RxJava adoption in most projects.</p>\n"
                                        + "\n"
                                        + "<p>Of course there are a lot of tutorials and code examples around that explain how to use RxJava. \n"
                                        + "Developer interested in learning and using RxJava can first visit  the official <a href=\"https://github.com/ReactiveX/RxJava/wiki\">Wiki</a> that contains great explanation of what Observable is, how it’s related to Iterable and Future. Another useful resource is <a href=\"https://github.com/ReactiveX/RxJava/wiki/How-To-Use-RxJava\">How To Use RxJava</a> page which shows code examples of how to emit items and <code class=\"highlighter-rouge\">println</code> them.</p>\n"
                                        + "\n"
                                        + "<p>But what one really wants to know, is what problem RxJava will solve and how it will help organize async code without actually learning what Observable is.</p>\n"
                                        + "\n"
                                        + "<p>My goal here is to show some “prequel” to read before the official documentation in order to better understand the problems that RxJava tries to solve.\n"
                                        + "This article is positioned as a small walk-through on how to reorganize messy Async code by ourselfs to see how we can implement basic principles of RxJava without actually using RxJava.</p>\n"
                                        + "\n"
                                        + "<p>So If you are still curious let’s get started!</p>\n"
                                        + "\n"
                                        + "<h2 id=\"cat-app\">Cat App</h2>\n"
                                        + "<p>So let’s create a <em>real world</em> example. So we know that cats are the engine of technology progress, so let’s build \n"
                                        + "a typical app for downloading cat pictures.</p>\n"
                                        + "\n"
                                        + "<h4 id=\"so-here-is-the-task\">So here is the task:</h4>\n"
                                        + "<blockquote>\n"
                                        + "  <p>We have a webservice that provides api to search the whole internet for\n"
                                        + "images of cats by given query. Every image will contain cuteness\n"
                                        + "parameter - integer value that describes how cute is that picture. Our\n"
                                        + "task will be download a list of cats, choose the most cutest, and save\n"
                                        + "it to local storage.</p>\n"
                                        + "</blockquote>\n"
                                        + "\n"
                                        + "<p>We will focus only on downloading, processing and saving cats data.</p>\n"
                                        + "\n"
                                        + "<p>So let’s start:</p>\n"
                                        + "";

  public static void main(String[] args) throws Exception {
    // >> @to_use @M2 directly input text
    boolean det_OmitSyso = false;
    regexReplaceContent_SpecificToATag(content_TESTING, det_OmitSyso);

    //    // >> @to_use @M1 input text from folder
    //    regexReplaceAndReadWriteFilesInFolder("G:/wsp/eclipse/RegexReplaceFile_AT_tool_AT_NT/src/main/resources/input__RegexReplace",
    //                                          "G:/wsp/eclipse/RegexReplaceFile_AT_tool_AT_NT/src/main/resources/output_RegexReplace");

  }

}
