PowerShell regex puzzle

Question 1

I'm not sure how well this works with PowerShell, but it's worth a try: (*CRLF)CONCLUSION:\sImpression\s\s
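
As a hedged aside: (*CRLF) is a PCRE newline directive, and as far as I know the .NET regex engine that PowerShell uses does not accept it, so the pattern above would need adapting. Below is a minimal sketch of a .NET-friendly variant, assuming the goal is a CONCLUSION: line followed directly by an Impression line with no data after it. The sample text and variable names are made up for illustration.

```powershell
# Hypothetical sample text with Windows (CRLF) line endings.
$text = "CONCLUSION:`r`nImpression`r`n`r`nCONCLUSION:`r`nImpression SomeData`r`n"

# \r?\n matches either CRLF or LF; [ \t]* allows only trailing spaces or tabs.
# The lookahead requires a line break or the end of the text right after that.
$pattern = 'CONCLUSION:\r?\nImpression[ \t]*(?=\r?\n|\z)'

$found = [regex]::Matches($text, $pattern)
$found.Count
```

Only the first CONCLUSION:/Impression pair in the sample satisfies the pattern, because the second Impression line carries data.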

Question 2

Here is my clumsy brute-force attempt:

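# $thisFile must already hold the path to the input file.
$myFile = Get-Content $thisFile

$blankRows  = @()   # line numbers of rows that are exactly "Impression"
$targetRows = @()   # line numbers of rows containing "CONCLUSION"
$foundRows  = @()   # "Impression" rows directly preceded by a CONCLUSION row
$rowNum = 0

foreach ($thisRow in $myFile) {
    $rowNum++
    if ($thisRow -eq 'Impression') { $blankRows += $rowNum }
    if ($thisRow -match 'CONCLUSION') { $targetRows += $rowNum }
}

foreach ($blrw in $blankRows) {
    if ($targetRows -contains ($blrw - 1)) { $foundRows += $blrw }
}

$foundRows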

Question 3

Why not do it this way?

# Create some sample data file
@'
Impression CONCLUSION:
Impression

Impression CONCLUSION:
Impression SomeData

Impression CONCLUSION:
Impression SomeOtherData

Impression CONCLUSION:
Impression
'@ | 
Out-File -FilePath 'D:\Temp\Imprestion.txt' -Force
Get-Content -Path 'D:\Temp\Imprestion.txt'
# Results
<#
Impression CONCLUSION:
Impression

Impression CONCLUSION:
Impression SomeData

Impression CONCLUSION:
Impression SomeOtherData

Impression CONCLUSION:
Impression
#>

# Import the sample data file as a CSV, use the space as a Delimiter
Import-Csv -Path 'D:\Temp\Imprestion.txt' -Delimiter ' ' -Header Property, Value
# Results
<#
Property   Value        
--------   -----        
Impression CONCLUSION:  
Impression              
Impression CONCLUSION:  
Impression SomeData     
Impression CONCLUSION:  
Impression SomeOtherData
Impression CONCLUSION:  
Impression              
#>

# Filter by the Value property
Import-Csv -Path 'D:\Temp\Imprestion.txt' -Delimiter ' ' -Header Property, Value | 
Where-Object -Property Value -EQ $null
# Results
<#
Property   Value
--------   -----
Impression      
Impression  
#>

# Using Select-String and a RegEx 'Not Match'
Select-String -Path 'D:\Temp\Imprestion.txt' -Pattern '^((?!Impression [a-zA-Z]).)*$'
# Results
<#
D:\Temp\Imprestion.txt:2:Impression
D:\Temp\Imprestion.txt:3:
D:\Temp\Imprestion.txt:6:
D:\Temp\Imprestion.txt:9:
D:\Temp\Imprestion.txt:11:Impression
#>

# Using Select-String and RegEx match
Select-String -Path 'D:\Temp\Imprestion.txt' -Pattern '^Impression\s*$'
# Results
<#
D:\Temp\Imprestion.txt:2:Impression
D:\Temp\Imprestion.txt:11:Impression
#>

Explanation:

^ is the start-of-string anchor.
$ is the end-of-string anchor.
\s is the whitespace character class.
* is the zero-or-more quantifier.
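
To make the anchors concrete, here is a small sketch that tests the '^Impression\s*$' pattern against a few in-memory strings. The sample strings are illustrative and not read from the original file.

```powershell
$pattern = '^Impression\s*$'

# Each sample is tested on its own with the -match operator.
$samples = 'Impression', 'Impression   ', 'Impression SomeData', 'Impression CONCLUSION:', ''

$results = foreach ($s in $samples) {
    [pscustomobject]@{
        Line    = "'$s'"
        IsMatch = ($s -match $pattern)
    }
}
$results
```

Only the bare 'Impression' and the one with trailing spaces match; anything with text after the word, and the empty string, do not.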

Question 4

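I'm still not sure exactly what you want to capture: completely blank lines, or lines that contain Impression with no data after it.

But according to the documentation for Get-Content:

For files, the content is read one line at a time and returns a collection of objects, each of which represents a line of content.

So the <newline> character is "used up" as the element separator of the string array.
Try: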

Get-Content -Path "C:\temp\Log\Conclusion.txt" | Get-Member
(Get-Content -Path "C:\temp\Log\Conclusion.txt").Count

So you will never match a <newline> with the code as written.
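
A quick way to see this, sketched with a throwaway temporary file (the file and its two lines are made up for the demo):

```powershell
# Write a two-line sample to a temporary file (made-up content).
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value 'CONCLUSION:', 'Impression'

# @() keeps the result an array even for a one-line file.
$lines = @(Get-Content -Path $tmp)

$lineCount   = $lines.Count
$withNewline = @($lines | Where-Object { $_ -match '[\r\n]' }).Count

"Lines read: $lineCount; lines containing a newline character: $withNewline"

Remove-Item -Path $tmp
```

With this sample it reports 2 lines read and 0 lines containing a newline character, which is why a pattern that expects a line break inside a single element cannot succeed.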

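You can use the -Raw parameter to treat the file as one long string with embedded <newline> characters, or rewrite the logic so that it searches the array.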
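
A minimal sketch of the -Raw route, assuming the intent is to find a CONCLUSION: line immediately followed by a bare Impression line. The temporary file, its content, and the variable names are hypothetical.

```powershell
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value 'Impression CONCLUSION:', 'Impression', '', 'Impression CONCLUSION:', 'Impression SomeData'

# -Raw returns the whole file as a single string, line breaks included.
$raw = Get-Content -Path $tmp -Raw

# (?m) lets $ match at the end of each line; \r? tolerates CRLF line endings.
$pattern = '(?m)CONCLUSION:[ \t]*\r?\nImpression[ \t]*\r?$'
$hits = [regex]::Matches($raw, $pattern)
$hits.Count

Remove-Item -Path $tmp
```

Because the line break is now part of the string, the pattern can span two lines; in this sample only the first CONCLUSION:/Impression pair qualifies.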

Also, the Select-String documentation indicates that it can take a file directly and process it line by line, without needing Get-Content.
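
As a hedged illustration, Select-String can also hand back the line before each match through its -Context parameter, which may cover the "previous line" requirement without a manual index. The temporary file and its content are assumptions for the demo.

```powershell
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value 'Impression CONCLUSION:', 'Impression', '', 'Impression CONCLUSION:', 'Impression SomeData'

# Select-String reads the file itself; -Context 1, 0 keeps one line before each match.
$found = @(Select-String -Path $tmp -Pattern '^Impression\s*$' -Context 1, 0)

foreach ($m in $found) {
    [pscustomobject]@{
        LineNumber   = $m.LineNumber
        Line         = $m.Line
        PreviousLine = $m.Context.PreContext[0]   # $null when the match is on line 1
    }
}

Remove-Item -Path $tmp
```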

Update

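I'm still not sure which line you want to output, but here is a skeleton sample of what I think you are asking for. My assumptions are: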

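You are looking for lines where no text follows "Impression" (please clarify whether that is literal text or data).
When a match is found, the previous line is captured for the output file.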

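# @() guarantees an array even when the file has only one line.
$File = @(Get-Content -Path "C:\temp\Log\Conclusion.txt")
$OutLines   = @()
$EmptyMatch = '^Impression\s*$'   # "Impression" followed by nothing but whitespace
$i = 0
ForEach ( $line in $File ) {
    # Skip index 0: it has no previous line, and $File[-1] would return the last line.
    If ( $i -gt 0 -and $line -match $EmptyMatch ) {
        $OutLines += $File[ $i - 1 ]
    }
    $i++
}
$OutLines | Set-Content 'c:\MyStuff\OutFile.txt'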