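I have a file with about 50k lines of text, each line different. Each of those lines has to go into another file (500k lines), one line per occurrence of this string: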
[2]
1 string data = "
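This string is followed by some English text that ends with a ", and that English text must be replaced with the Italian text taken from the 50k-line file.
Sample of the 500k-line file (code.txt):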
other lines of code
[1]
other lines of code
[0]
1 string data = ""
[1]
1 string data = "first spanish text"
[2]
1 string data = "first english text"
other lines of code that have japanese and chinese characters.
[2]
other lines of code
[0]
1 string data = ""
[1]
1 string data = "another text in spanish"
[2]
1 string data = "here is another text in English"
other lines of code that have japanese and chinese characters.
The 50k-line file (ita.txt):
first text in Italian
another text in Italian
one again
Merged output (codeita.txt):
other lines of code
[1]
other lines of code
[0]
1 string data = ""
[1]
1 string data = "first spanish text"
[2]
1 string data = "first text in Italian"
other lines of code that have japanese and chinese characters.
[2]
other lines of code
[0]
1 string data = ""
[1]
1 string data = "another text in spanish"
[2]
1 string data = "another text in Italian"
other lines of code that have japanese and chinese characters.
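I have already tried the code below, but it does not paste all the strings correctly. Some strings in the file have a doubled " at the start and end (1 string data = ""some text" and other") and some contain the ' character. When that happens, those entries are skipped, the English text is kept, and the Italian text that belonged there ends up in the next entry. On top of that, the character encoding is wrong, so accented letters, Japanese, Korean and so on become unreadable.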
$reader = [IO.StreamReader] (Convert-Path ita.txt)
# Read code.txt as a whole
# and perform regex-based replacements of the parts
# of interest, iteratively using ita.txt's lines as replacement text.
[regex]::Replace(
(Get-Content -Raw code.txt),
'(?<=\[2\][^=]+?= ")[^"]+',
{ $reader.ReadLine() }
) |
Set-Content -NoNewLine codeita.txt # Save to the output file.
$reader.Dispose() # Close and dispose of the reader.
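The skipping described above can be reproduced without the real files. The following is a minimal sketch, assuming two made-up [2] entries and two made-up Italian lines held in memory (the sample text and the $ita queue are hypothetical stand-ins, not part of the original script). It runs the same pattern as the attempt above.

```powershell
# Hypothetical in-memory stand-in for code.txt: the first [2] entry
# starts with a doubled quote, the second one is a plain string.
$code = @'
[2]
1 string data = ""quoted" English text"
[2]
1 string data = "plain English text"
'@

# Hypothetical stand-in for ita.txt, consumed one line per match.
$ita = [System.Collections.Generic.Queue[string]]::new()
$ita.Enqueue('primo testo')
$ita.Enqueue('secondo testo')

# Same pattern as in the attempt above.
$result = [regex]::Replace(
  $code,
  '(?<=\[2\][^=]+?= ")[^"]+',
  { $ita.Dequeue() }
)

$result
"Unused replacement lines: $($ita.Count)"
```

In the first entry the character right after = " is another ", so [^"]+ has nothing to match there and the entry keeps its English text. The first Italian line then lands in the second entry, and one Italian line is left over, which is the shift described in the question.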
Answer 1
Solved it with this:
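# Open ita.txt for line-by-line reading (StreamReader defaults to UTF-8).
$reader = [IO.StreamReader] (Convert-Path ita.txt)
# Read code.txt as a whole and replace the text of each [2] entry
# with the next line from ita.txt.
[regex]::Replace(
  (Get-Content -Raw -Encoding utf8 code.txt),
  '(?<=\[2\][^=]+? 1 string data = ").*(?=")',
  { $reader.ReadLine() }
) |
  Set-Content -NoNewLine -Encoding utf8 codeita1.txt # Save to the output file.
$reader.Dispose() # Close and dispose of the reader.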
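- Using -Encoding utf8 fixes the problem with accented and Asian characters.
- With .*(?=") the match covers everything up to the last " on the line, and that final " itself is left out of the match.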
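To see the effect of the greedy .*(?=") on a string with embedded quotes, here is a minimal sketch that uses in-memory data instead of the real files. The sample text, $itaLines and $queue are hypothetical. The leading space before "1 string data" is an assumption about how the real file is indented, since the pattern requires a space there. The match-count check is an addition of this sketch, not part of the answer: because [^=]+? can span several lines when none of them contains =, comparing the number of matches with the number of replacement lines is one way to notice a missed or unintended match before texts shift.

```powershell
# Hypothetical in-memory stand-in for code.txt. The leading space before
# "1 string data" is an assumption about the real file's indentation.
$code = @'
[2]
 1 string data = ""quoted" English text"
[2]
 1 string data = "plain English text"
'@

# Hypothetical stand-in for the lines of ita.txt.
$itaLines = @('primo testo', 'secondo testo')

# Same pattern as in the answer.
$pattern = '(?<=\[2\][^=]+? 1 string data = ").*(?=")'

# Sanity check (an addition of this sketch): one replacement line per match.
$matchCount = [regex]::Matches($code, $pattern).Count
if ($matchCount -ne $itaLines.Count) {
    Write-Warning "Found $matchCount matches but $($itaLines.Count) replacement lines."
}

# Feed the replacement lines to the match evaluator in order.
$queue = [System.Collections.Generic.Queue[string]]::new()
foreach ($line in $itaLines) { $queue.Enqueue($line) }

$fixed = [regex]::Replace($code, $pattern, { $queue.Dequeue() })
$fixed
```

Both entries are replaced here, including the one that starts with a doubled quote, because .* runs to the end of the line and then backs off only as far as the last ".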