PowerShell: $idValue writes "TRUE" instead of the extracted ID. Why?

2024-12-5

windows powershell regex


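I'm writing a Windows PowerShell script that splits a text file at a delimiter and builds each output filename from an incrementing number and a captured identifier string.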

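I checked for syntax errors, and the script runs without errors. The splitting, the creation of the output files, and the filename numbering all work, but the script does not put the captured identifier into the filename as expected. Instead of the extracted string, it uses "True".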

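I've been using $idValue to extract the string I want, but all I get from it is "True", as if it sees that the regex I use matches and agrees that the logical condition is true (there is a match).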

Before going further, here is a simplified version of the input file so you can see its structure:

\id EUCAL
\v Text
\v more text
\z Endsect
\id LEUCO
\v Text
\v more text
\z Endsect

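The string I want in the filename is in the "\id" field, so here it's EUCAL and LEUCO. All of these "id" strings are exactly 5 characters long, and I rely on that in the regex I use to extract them. The regex definitely works. But instead of the output filenames 01-EUCAL.txt and 02-LEUCO.txt (and so on), I get 01-True.txt and 02-True.txt, which is puzzling.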

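I've verified that the regex works correctly. I suspect the problem is in how I access the captured group value in $idValue. Any help solving this would be greatly appreciated.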

Here is the PowerShell script:

$Text = Get-Content -Path "D:\Test_input.txt" -raw  # Read the file content

$SplitText = $Text -split "z Endsect\r?\n?"  # Split with empty lines included
$SplitText = $SplitText -notmatch '^$'    # Filter out empty lines

$i = 1
foreach ($File in $SplitText) {

  # Append "z Endsect" to every entry
  $File += "z Endsect"   

  # Extract the identifier (assuming 5 characters after \id ) using regex
  $idValue = $File -match "\\id\s(.{5})" -replace '(?<=\\id\s)', ''  # Capture 5 characters after \id and space/tab

  # Pad the incrementing number with a leading zero if necessary
  $paddedNumber = "{0:00}" -f $i

  # Construct the new filename with separator and padded number
  $NewFilename = "D:\$paddedNumber-$idValue.txt"

  # Write the content to the new files
  $File | Out-File -FilePath $NewFilename
  $i++

}

Answer 1

The TRUE value comes from -match.
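A minimal sketch that reproduces the effect on a small sample block, assuming the same regex as in the question. -match and -replace have the same precedence and are evaluated left to right, so -replace receives the Boolean returned by -match, which PowerShell converts to the string "True" before replacing (nothing).

```powershell
# Sample section in the same shape as the input file (illustrative only)
$section = "\id EUCAL`n\v Text`n\z Endsect"

# Evaluated left to right: ($section -match ...) yields $true,
# then $true -replace ... converts the Boolean to the string 'True'
$idValue = $section -match "\\id\s(.{5})" -replace '(?<=\\id\s)', ''

$idValue.GetType().Name   # String
$idValue                  # True
```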

You'd better separate the two operations, -match and -replace:

if ($File -match "\\id\s(.{5})") {
    # -match only returns $true/$false; the matched text itself is in $Matches[0]
    $idValue = $Matches[0] -replace '\\id\s', ''
}
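When -match succeeds on a single string, PowerShell also fills the automatic $Matches hashtable, so the capture group can be read directly without any -replace. A minimal sketch, assuming the same sample block as the question:

```powershell
$section = "\id EUCAL`n\v Text`n\z Endsect"

# -match on a single string returns $true/$false and fills $Matches:
# $Matches[0] is the whole match, $Matches[1] the first capture group
if ($section -match "\\id\s(.{5})") {
    $idValue = $Matches[1]
}

$idValue   # EUCAL
```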

Answer 2

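There are a few problems with your script:

The -match operator, as the other answer mentions, returns a Boolean. You then use -replace on that Boolean, which of course can never work: PowerShell converts $true to the string "True", the pattern finds nothing to replace, and "True" is what ends up in $idValue. When types don't behave as expected, it's always a good idea to check the documentation.
The $idValue assignment assumes that $File is a single line, but it isn't; it's a whole block of text. -replace only removes the \id part and leaves the rest of the block as it is.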
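A short sketch of the second point, assuming a three-line sample block: applying -replace to the whole block strips only the "\id " prefix, so the resulting "ID" still contains every other line of the section.

```powershell
$section = "\id EUCAL`n\v Text`n\z Endsect"

# -replace runs over the whole multi-line string: only the "\id " prefix is
# removed, every other line survives, so the result is still the full block
$withoutTag = $section -replace '\\id\s', ''

$withoutTag.Split("`n").Count   # 3 lines, not just the ID
$withoutTag.Split("`n")[0]      # EUCAL
```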

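All you actually need is to extract the ID. There's no need for a double match/replace, just one operation. There are many ways to do this, for example with Select-String:

$idValue = $File | Select-String -Pattern '(?m)^\\id\s(\w{5})\r?$' | % { $_.Matches.Groups[1].Value } | Select -First 1

Explanation:

(?m) turns on multiline mode, so ^ and $ match at the start and end of each line inside the block rather than only at the start and end of the whole block (Select-String treats each piped string as one "line", and $File is a multi-line block). The regex then matches the start of a line (^), \id followed by a whitespace character (\s), captures 5 word characters in a group ((\w{5})), allows an optional carriage return (\r?, because Get-Content -Raw keeps Windows CRLF line endings and $ only matches before \n), and finally makes sure the line ends ($).
% { } is shorthand for ForEach-Object, in case a block ends up with more than one match object. You say that isn't the case, but it's better to cover the potential edge case anyway. If you're 100% sure each block contains exactly one match, you can skip this step and the last one.
$_.Matches.Groups[1].Value takes the value of the first capture group, which holds the ID, and emits it into a new array.
Select -First 1 makes sure we only get the first ID.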

This can also be optimized a bit:

$idValue = ($File | Select-String -Pattern '(?m)^\\id\s(\w{5})\r?$' | Select -First 1).Matches.Groups[1].Value
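Select-String is one option; another, sketched here as an alternative rather than as the answer's own method, is the .NET [regex]::Match static method, which returns a single Match object whose Groups can be read directly. The inline (?m) flag makes ^ and $ apply per line, and \r? tolerates the CRLF line endings that Get-Content -Raw keeps. Note that [regex]::Match is case-sensitive by default, while Select-String is not.

```powershell
$section = "\id EUCAL`r`n\v Text`r`n\v more text`r`n\z Endsect"

# (?m) = multiline mode: ^ and $ match at each line boundary inside the block.
# \r? absorbs the carriage return of CRLF line endings, since $ matches before \n only.
$match = [regex]::Match($section, '(?m)^\\id\s(\w{5})\r?$')

# Groups[1] is the captured ID; Success is $false if no \id line was found
$idValue = if ($match.Success) { $match.Groups[1].Value } else { $null }
$idValue   # EUCAL
```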

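I'd also recommend naming variables more clearly. $File, for example, is confusing, and I'm not surprised it made you forget that it holds a whole block.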

Answer 3

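To make this post complete, I'm copying the solved script, which now works correctly. It relies on Destroy666's suggestions. Thanks to Destroy666 and Toto for their attention and time.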

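This PowerShell script is complete and should be useful to anyone who wants to split a large text file at a keyword delimiter. The sample input file is the same as in the original question above. Note that the script below includes both of the snippets Destroy666 suggested for extracting the ID string. Only one is actually needed; pick one and delete the other.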

$Fulltext = Get-Content -Path "D:\Test_input.txt" -Raw  # Read file content

$SplitText = $Fulltext -split "Endsect\r?\n?"  # Split at keyword
$SplitText = $SplitText -notmatch '^$'         # Filter out empty lines

$i = 1
foreach ($Section in $SplitText) {
  # Append "Endsect" to every entry, to replace the keyword that split omits
  $Section += "Endsect" 

  # Create number for new filename, pad the incrementing number with a leading zero if necessary
  $paddedNumber = "{0:00}" -f $i

  # Extract ID string for new filename (coding alternative version 1)
  $idValue = $Section | Select-String -Pattern "\\id\s(\w{5})" | % { $_.Matches.Groups[1].Value } | Select -First 1

  # Extract ID string for new filename (coding alternative version 2)
  $idValue = ($Section | Select-String -Pattern "\\id\s(\w{5})" | Select -First 1).Matches.Groups[1].Value

  # Write new filenames with padded number and ID string
  $Section | Out-File "D:\Output\$paddedNumber-$idValue.txt"  # Write the content to new files

  $i++
}
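To check the naming logic without writing any files, the same split-and-extract steps can be run on an in-memory copy of the sample input. This is a sketch that only builds the filenames; the inline sample text and the $names variable are illustrative and not part of the script above.

```powershell
# Sample input in the same shape as D:\Test_input.txt (CRLF line endings)
$Fulltext = "\id EUCAL`r`n\v Text`r`n\v more text`r`n\z Endsect`r`n" +
            "\id LEUCO`r`n\v Text`r`n\v more text`r`n\z Endsect`r`n"

# Same split and empty-entry filter as the script above
$SplitText = $Fulltext -split "Endsect\r?\n?"
$SplitText = $SplitText -notmatch '^$'

$names = @()
$i = 1
foreach ($Section in $SplitText) {
  $paddedNumber = "{0:00}" -f $i
  # Extraction as in alternative version 2 of the script above
  $idValue = ($Section | Select-String -Pattern "\\id\s(\w{5})" | Select-Object -First 1).Matches.Groups[1].Value
  $names += "$paddedNumber-$idValue.txt"
  $i++
}

$names   # 01-EUCAL.txt and 02-LEUCO.txt, one per line
```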
