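How can I use a batch file to find duplicate data? The goal is to remove the duplicate "Changelist: XXXXX" entries from data.txt. I'm a bit lost; can anyone help?
See output.txt for the desired output.
data.txt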
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
output.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
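Answer 1
I'm assuming you don't care about whitespace, because your Changelist: 808273
records are actually different (select the text to see the difference):
First:
CodeReview: Result:
one space after the colon
Second:
CodeReview: Result:
two spaces after the colon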
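To illustrate why the two records count as different, here is a minimal sketch. The two strings are illustrative stand-ins rather than values read from the file, and placing the extra space at the end of the line is an assumption.

```powershell
# Illustrative values: the same text with one and two trailing spaces
# (where exactly the extra space sits in the real file is an assumption)
$First  = 'CodeReview: Result: '
$Second = 'CodeReview: Result:  '

# The raw strings differ, so a whole-record comparison treats them as distinct
$RawEqual = $First -eq $Second

# After trimming surrounding whitespace the two strings compare as equal
$TrimmedEqual = $First.Trim() -eq $Second.Trim()

"Raw equal: $RawEqual"           # Raw equal: False
"Trimmed equal: $TrimmedEqual"   # Trimmed equal: True
```

This is why trimming each line before comparing records matters when whitespace differences should be ignored.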
Here is a PowerShell script that removes the duplicates from the data:
# Setup input and output files
$InFile = '.\Data.txt'
$OutFile = '.\Output.txt'
# Separator to split records
$Separator = '^=+$'
# Read file to array and trim strings
# https://mjolinor.wordpress.com/2014/01/18/another-take-on-using-the-operator/
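# Resolve to a full path first: .NET resolves relative paths against the process
# working directory, which can differ from the current PowerShell location
$InPath = (Resolve-Path -LiteralPath $InFile -ErrorAction Stop).ProviderPath
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $InPath -ErrorAction Stop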
$Data = while(($line = $Reader.ReadLine()) -ne $null){$line.Trim()}
$Reader.Close()
$Reader.Dispose()
# Find start and end indexes of each record
$RecordBounds = 0..($Data.Length-1) | Where-Object {$Data[$_] -match $Separator}
# Split records into multidimensional array
$Records = @()
for ($i=0 ; $i -lt ($RecordBounds.Length-1) ; $i++)
{
    $Records += ,($Data[($RecordBounds[$i]+1)..($RecordBounds[$i+1]-1)])
}
# Get actual separator string to use it in new file
$LiteralSeparator = $Data | Where-Object {$_ -match $Separator} | Select-Object -First 1
# Get only unique records, combine with separators
$Result = ,$LiteralSeparator + ($Records | Select-Object -Unique | ForEach-Object {$_ ; $LiteralSeparator})
# Write result to file
$Result | Out-File -LiteralPath $OutFile -Encoding Default -Force
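The script above compares whole records. Since the question asks specifically about duplicate "Changelist: XXXXX" entries, a variant that keys on the changelist number alone may also be useful. The following is only a sketch under stated assumptions: `Remove-DuplicateChangelist` is a hypothetical helper name, every record is assumed to end with a separator line as in data.txt, and PowerShell 3.0 or later is assumed for the generic collections.

```powershell
# Hypothetical helper (not part of the script above): keeps the first record
# seen for each Changelist number and drops later records with the same number.
function Remove-DuplicateChangelist {
    param(
        [string[]]$Lines,
        [string]$SeparatorPattern = '^=+$'
    )

    $seen   = New-Object 'System.Collections.Generic.HashSet[string]'
    $record = New-Object 'System.Collections.Generic.List[string]'
    $key    = $null
    $first  = $true

    foreach ($line in $Lines) {
        $text = $line.Trim()
        if ($text -match $SeparatorPattern) {
            if ($record.Count -gt 0) {
                # A separator closes the current record; keep it only if its
                # Changelist number is new (records without one are always kept)
                if ($null -eq $key -or $seen.Add($key)) {
                    $record.ToArray()
                    $text
                }
            }
            elseif ($first) {
                # Opening separator at the top of the input
                $text
            }
            $record.Clear()
            $key   = $null
            $first = $false
        }
        else {
            # Remember the first Changelist number found in this record
            if ($null -eq $key -and $text -match '^Changelist:\s*(\S+)') {
                $key = $Matches[1]
            }
            $record.Add($text)
        }
    }
}

# Example usage (assumes the same file names as the script above):
# Remove-DuplicateChangelist -Lines (Get-Content -LiteralPath '.\Data.txt') |
#     Out-File -LiteralPath '.\Output.txt' -Encoding Default -Force
```

Keying on the changelist number means two records with the same number are treated as duplicates even if their other fields differ, which may or may not be what you want. A trailing record with no closing separator is not emitted by this sketch.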
Sample result:
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================