Replace text in a file with successive lines from another file every time a certain string occurs, using PowerShell, batch or Notepad++

2024-12-1


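I have a file of about 50,000 lines of different text, and its lines have to be put into another file (of about 500,000 lines), one line each time this string occurs: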

[2]
 1 string data = "

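This string is followed by an English text that ends with a closing ", and that English text must be replaced with the Italian text taken from the 50,000-line file.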

Example of the 500,000-line file (code.txt):

other lines of code
[1]
other lines of code
    [0]
     1 string data = ""
    [1]
     1 string data = "first spanish text"
    [2]
     1 string data = "first english text"
other lines of code that have japanese and chinese characters.
[2]
other lines of code        
    [0]
     1 string data = ""
    [1]
     1 string data = "another text in spanish"
    [2]
     1 string data = "here is another text in English"
other lines of code that have japanese and chinese characters.

The 50,000-line file (ita.txt):

first text in Italian
another text in Italian
one again
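Because the n-th occurrence is paired with the n-th line of ita.txt, a quick sanity check before merging is to list the occurrences and compare their count with the number of Italian lines. The sketch below is Python, not PowerShell, and is only an illustration: it assumes, as in the sample, that the 1 string data = "..." line comes directly after its [2] line; TARGET, find_targets, check_alignment and sample are hypothetical names.

```python
import re

# Hypothetical rule: a "[2]" line directly followed by its '1 string data = "..."'
# line (any indentation). Group 1 is the current (English) value, taken up to the
# LAST double quote on that line because .* is greedy.
TARGET = re.compile(r'^[ \t]*\[2\][ \t]*\r?\n[ \t]*1 string data = "(.*)"', re.MULTILINE)


def find_targets(code_text: str) -> list[str]:
    """Return the current value of every [2] entry, in file order."""
    return TARGET.findall(code_text)


def check_alignment(code_text: str, italian: list[str]) -> None:
    """Print how many entries and Italian lines there are, and how they pair up."""
    targets = find_targets(code_text)
    print(f"{len(targets)} [2] entries, {len(italian)} Italian lines")
    for english, italian_line in zip(targets, italian):
        print(f"{english!r} -> {italian_line!r}")


# A trimmed copy of the sample; the top-level "[2]" header is not followed by
# a string data line, so it is not counted.
sample = (
    '[2]\nother lines of code\n'
    '    [2]\n     1 string data = "first english text"\n'
    '    [2]\n     1 string data = "here is another text in English"\n'
)
check_alignment(sample, ["first text in Italian", "another text in Italian", "one again"])
```

If the two counts differ on the real files, every line after the first mismatch will land on the wrong entry, which is worth knowing before the 500,000-line file is rewritten.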

Merged output (codeita.txt):

other lines of code
[1]
other lines of code
    [0]
     1 string data = ""
    [1]
     1 string data = "first spanish text"
    [2]
     1 string data = "first text in Italian"
other lines of code that have japanese and chinese characters.
[2]
other lines of code        
    [0]
     1 string data = ""
    [1]
     1 string data = "another text in spanish"
    [2]
     1 string data = "another text in Italian"
other lines of code that have japanese and chinese characters.

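I tried the code below, but it does not paste all the strings correctly. Some strings in the file have double quotes at both the start and the end (1 string data = ""some text" and other"), and some contain the character '. When that happens, the script skips them, leaves the English text in place, and pushes the text that belonged there onto the next entry. The character encoding is also wrong, so accented letters, Japanese, Korean and so on come out unreadable.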

$reader = [IO.StreamReader] (Convert-Path ita.txt)

# Read code.txt as a whole
# and perform regex-based replacements of the parts
# of interest, iteratively using ita.txt's lines as replacement text.
[regex]::Replace(
  (Get-Content -Raw code.txt),
  '(?<=\[2\][^=]+?= ")[^"]+',
  { $reader.ReadLine() }
) | 
  Set-Content -NoNewLine codeita.txt  # Save to the output file.

$reader.Dispose() # Close and dispose of the reader.
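To see why the Italian text ends up on the wrong entry, here is a small Python emulation of the question's pattern. Python's re module has no variable-width lookbehind, so the (?<=\[2\][^=]+?= ") prefix is written as a capture group instead; OLD, merge_old and demo are hypothetical names. When [^"]+ cannot match an entry, for example because its value starts with a second ", no line is read for that entry, so the following entries each receive the line meant for the one before.

```python
import re

# Emulation of the question's pattern: prefix captured in group 1, value in group 2.
# [^"]+ needs at least one character and stops at the first inner quote.
OLD = re.compile(r'(\[2\][^=]+?= ")([^"]+)')


def merge_old(text: str, italian: list[str]) -> str:
    lines = iter(italian)
    # A line is consumed only when the regex matches, like $reader.ReadLine()
    # inside the script block. Returning "" when the list runs out is an
    # assumption meant to mirror ReadLine() returning $null at end of file.
    return OLD.sub(lambda m: m.group(1) + next(lines, ""), text)


demo = (
    '[2]\n 1 string data = "one"\n'
    '[2]\n 1 string data = ""two" quoted"\n'
    '[2]\n 1 string data = "three"\n'
)
print(merge_old(demo, ["uno", "due", "tre"]))
```

The second entry keeps its English text and the third receives due instead of tre: one skipped entry shifts every later Italian line by one.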

Answer 1

Solved it with this:

$reader = [IO.StreamReader] (Convert-Path ita.txt)

[regex]::Replace(
  (Get-Content -Raw -Encoding utf8 code.txt),
  '(?<=\[2\][^=]+?         1 string data = ").*(?=")',
  { $reader.ReadLine() }
) | 
  Set-Content -NoNewLine -Encoding utf8 codeita1.txt  # Save to the output file.

$reader.Dispose() # Close and dispose of the reader.
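For comparison, the same approach can be sketched in Python; this is an illustration, not the answerer's code. It reads both files as UTF-8, replaces each [2] value with the next line of ita.txt through a callback, and writes codeita.txt. Two assumptions differ from the PowerShell version: the string data line must directly follow its [2] line, matched with any indentation ([ \t]*) rather than a fixed run of spaces (whereas [^=]+? can in principle run from a top-level [2] header across any lines that contain no =), and running out of Italian lines raises an error instead of inserting nothing. TARGET, merge and merge_files are hypothetical names.

```python
import re

# A "[2]" line directly followed by '1 string data = "...', any indentation.
# Group 1 is kept; group 2 (greedy .*, so it runs to the LAST quote on the line)
# is the English text being replaced.
TARGET = re.compile(r'^([ \t]*\[2\][ \t]*\r?\n[ \t]*1 string data = ")(.*)"', re.MULTILINE)


def merge(code_text: str, italian: list[str]) -> str:
    """Replace the value of each [2] entry with the next Italian line, in file order."""
    lines = iter(italian)

    def swap(m: re.Match) -> str:
        replacement = next(lines, None)
        if replacement is None:
            raise ValueError("ita.txt has fewer lines than there are [2] entries")
        # A callback's return value is inserted literally (no backslash escapes).
        return m.group(1) + replacement + '"'

    return TARGET.sub(swap, code_text)


def merge_files(code_path: str, ita_path: str, out_path: str) -> None:
    """Read code.txt and ita.txt as UTF-8 and write the merged text as UTF-8."""
    # "utf-8-sig" also accepts input files that start with a BOM and drops it;
    # newline="" leaves CRLF/LF line endings exactly as they are.
    with open(code_path, encoding="utf-8-sig", newline="") as f:
        code_text = f.read()
    with open(ita_path, encoding="utf-8-sig") as f:
        italian = f.read().splitlines()
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        f.write(merge(code_text, italian))


# Usage: merge_files("code.txt", "ita.txt", "codeita.txt")
```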

Using -Encoding utf8 fixes the accented letters and the Asian characters.
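The encoding problem can be reproduced in miniature. Windows PowerShell 5.1 typically reads a file without a BOM using the system's ANSI code page when Get-Content is called without -Encoding, so UTF-8 bytes are decoded with the wrong table. The Python sketch below decodes the same UTF-8 bytes twice; cp1252 is assumed here purely as an example of an ANSI code page, and read_back is a hypothetical helper.

```python
# The same bytes, two different decoders.
text = "perché 日本"                 # accented Latin letters plus Japanese
raw = text.encode("utf-8")           # what a UTF-8 file actually stores


def read_back(data: bytes, encoding: str) -> str:
    """Decode bytes the way a reader configured for `encoding` would."""
    return data.decode(encoding)


ok = read_back(raw, "utf-8")         # matching codec: the text round-trips
garbled = read_back(raw, "cp1252")   # mismatched single-byte codec: mojibake

print(ok == text)
print(garbled)
```

Each multi-byte UTF-8 character turns into two or three unrelated Western European characters, which is the kind of unreadable output described in the question.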
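With .*(?=") the match takes everything up to, but not including, the last " on the line, so inner quotes no longer cut it short. Unlike [^"]+, it also matches an empty value ("").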
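The effect of the answer's change can be checked on single lines. This Python sketch applies only the value part of each pattern (the [2] lookbehind is left out) and shows what each one would replace; the lazy .*? variant is included only to show why the greedy .* is what reaches the last quote. OLD, NEW, LAZY and value are hypothetical names.

```python
import re

OLD = re.compile(r'= "([^"]+)')      # question: needs >= 1 char, stops at the first inner quote
NEW = re.compile(r'= "(.*)(?=")')    # answer: greedy, backs off to the last quote on the line
LAZY = re.compile(r'= "(.*?)(?=")')  # lazy variant for contrast: stops at the first quote


def value(pattern: re.Pattern, line: str):
    """Return the text the pattern would replace on this line, or None if it misses."""
    m = pattern.search(line)
    return m.group(1) if m else None


for line in ['1 string data = "plain text"',
             '1 string data = ""quoted" and other"',
             '1 string data = ""']:
    print(repr(value(OLD, line)), repr(value(NEW, line)), repr(value(LAZY, line)))
```

On the second line the question's pattern finds nothing, which is the kind of skip that shifts the Italian lines. On the empty value the answer's pattern matches an empty string, so such an entry now also consumes a line from ita.txt.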
