Replace each occurrence of a string in one file with successive lines from another file, using PowerShell, batch or Notepad++

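I have a file of about 50,000 lines, each with a different text. Each of these lines has to be put into another file (about 500,000 lines), one line for every occurrence of this string: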

[2]
 1 string data = "

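This string is followed by a piece of English text that ends with a closing ". That English text must be replaced with the Italian text taken from the 50,000-line file.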

Example of the 500,000-line file (code.txt):

other lines of code
[1]
other lines of code
    [0]
     1 string data = ""
    [1]
     1 string data = "first spanish text"
    [2]
     1 string data = "first english text"
other lines of code that have japanese and chinese characters.
[2]
other lines of code        
    [0]
     1 string data = ""
    [1]
     1 string data = "another text in spanish"
    [2]
     1 string data = "here is another text in English"
other lines of code that have japanese and chinese characters.

The 50,000-line file (ita.txt):

first text in Italian
another text in Italian
one again

Merged output (codeita.txt):

other lines of code
[1]
other lines of code
    [0]
     1 string data = ""
    [1]
     1 string data = "first spanish text"
    [2]
     1 string data = "first text in Italian"
other lines of code that have japanese and chinese characters.
[2]
other lines of code        
    [0]
     1 string data = ""
    [1]
     1 string data = "another text in spanish"
    [2]
     1 string data = "another text in Italian"
other lines of code that have japanese and chinese characters.

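I have tried this code, but it does not paste all the strings correctly. Some strings in the file have double quotes at the beginning and at the end of the text (1 string data = ""some text" and other"). Where such characters occur, the match fails: that entry is skipped and keeps its English text, and the Italian text that belonged there is shifted to the next entry. In addition, the character encoding is wrong, so accented letters and Japanese, Korean, etc. characters come out garbled.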

$reader = [IO.StreamReader] (Convert-Path ita.txt)

# Read code.txt as a whole
# and perform regex-based replacements of the parts
# of interest, iteratively using ita.txt's lines as replacement text.
[regex]::Replace(
  (Get-Content -Raw code.txt),
  '(?<=\[2\][^=]+?= ")[^"]+',
  { $reader.ReadLine() }
) | 
  Set-Content -NoNewLine codeita.txt  # Save to the output file.

$reader.Dispose() # Close and dispose of the reader.
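A minimal check of the quote problem described above, using a hypothetical sample entry whose English text itself starts with a double quote. The question's pattern ([^"]+) cannot start on a quote, so this entry produces no match and its Italian line is consumed by the next match; a greedy .* followed by (?=") instead runs to the last quote on the line.

```powershell
# Hypothetical "[2]" entry whose English text starts with a double quote.
$entry = "    [2]`n     1 string data = `"`"quoted`" and other`""

# Pattern from the question: [^"]+ cannot begin on a quote, so there is
# no match at all and the Italian line goes to the following entry.
$old = [regex]::Match($entry, '(?<=\[2\][^=]+?= ")[^"]+')
$old.Success    # False

# Greedy .* with the lookahead (?=") stops before the LAST quote on the line.
$new = [regex]::Match($entry, '(?<=\[2\][^=]+?= ").*(?=")')
$new.Value      # "quoted" and other
```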

Answer 1

Solved it with this:

$reader = [IO.StreamReader] (Convert-Path ita.txt)

[regex]::Replace(
  (Get-Content -Raw -Encoding utf8 code.txt),
  '(?<=\[2\][^=]+?         1 string data = ").*(?=")',
  { $reader.ReadLine() }
) | 
  Set-Content -NoNewLine -Encoding utf8 codeita1.txt  # Save to the output file.

$reader.Dispose() # Close and dispose of the reader.
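  • Using -Encoding utf8 fixes the problem with accented letters and Asian characters.
  • With .*(?=") the match covers everything up to the last " on the line; that final quote itself is not part of the match, so text containing inner quotes is replaced as a whole.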
