Remove duplicate entries using bash, awk, or sed

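How can I search for duplicate data with a batch script? The goal is to remove the duplicate "Changelist: XXXXX" entries from the data.txt file. I'm a bit lost; can anyone help?

See output.txt for the desired output.

data.txt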

====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result: 
 ====================================
 Changelist: 808271
 Date: 2015/03/19
 Developer: C
 ShortDescr: HI

 CodeReview: 
 ====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result:  
 ====================================
  Changelist: 808277
 Date: 2015/03/19
 Developer: D
 ShortDescr: HEY

 CodeReview: 
 ====================================

output.txt

====================================
 Changelist: 808298
 Date: 2015/03/19
 Developer: A
 ShortDescr: Checking in the following graphics:

 CodeReview: 
 CodeReview: Result: @result___
 ====================================
 Changelist: 808273
 Date: 2015/03/19
 Developer: B
 ShortDescr: Hello

 CodeReview: Result: 
 ====================================
 Changelist: 808271
 Date: 2015/03/19
 Developer: C
 ShortDescr: HI

 CodeReview: 
 ====================================
  Changelist: 808277
 Date: 2015/03/19
 Developer: D
 ShortDescr: HEY

 CodeReview: 
 ====================================

Answer 1

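I assume you don't care about whitespace, because your two Changelist: 808273 records are actually different (select the text to see the difference):

  • First:

    CodeReview: Result: 

    one space after the colon

  • Second:

    CodeReview: Result:  

    two spaces after the colon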
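
If selecting the text is awkward, the trailing spaces can also be made visible on the command line. The following is only a minimal sketch: the function name `show_trailing_whitespace` is made up for illustration, and it assumes a POSIX `grep` and `sed`. The `l` command of `sed` prints each line in an unambiguous form and marks the end of the line with `$`, so one versus two trailing spaces becomes obvious.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# list every "CodeReview: Result:" line with its line number and
# an explicit "$" end-of-line marker, so trailing spaces are visible.
show_trailing_whitespace() {
    grep -n "CodeReview: Result:" "$@" | sed -n l
}
```

Calling `show_trailing_whitespace data.txt` prints each matching line with its line number. A line that ends in `Result: $` has one trailing space, and a line that ends in `Result:  $` has two.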

Here is a PowerShell script that removes the duplicates from the data:

# Setup input and output files
$InFile = '.\Data.txt'
$OutFile = '.\Output.txt'

# Separator to split records
$Separator = '^=+$'

# Read file to array and trim strings
# https://mjolinor.wordpress.com/2014/01/18/another-take-on-using-the-operator/
$Reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $InFile -ErrorAction Stop
$Data = while(($line = $Reader.ReadLine()) -ne $null){$line.Trim()}
$Reader.Close()
$Reader.Dispose()

# Find start and end indexes of each record
$RecordBounds = 0..($Data.Length-1) | Where-Object {$Data[$_] -match $Separator}

# Split records into multidimensional array
$Records = @()
for ($i=0 ; $i -lt ($RecordBounds.Length-1) ; $i++)
{
    $Records += ,($Data[($RecordBounds[$i]+1)..($RecordBounds[$i+1]-1)])
}

# Get actual separator string to use it in new file
$LiteralSeparator = $Data | Where-Object {$_ -match $Separator} | Select-Object -First 1

# Get only unique records, combine with separators
$Result = ,$LiteralSeparator + ($Records | Select-Object -Unique | ForEach-Object {$_ ; $LiteralSeparator})

# Write result to file
$Result | Out-File -LiteralPath $OutFile -Encoding Default -Force

Sample result:

====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:

CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello

CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI

CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY

CodeReview:
====================================
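
Since the question title asks for bash, awk, or sed, the same idea can also be sketched in POSIX awk. This is an illustration under stated assumptions, not a drop-in replacement for the script above: the function name `dedupe_changelists` is hypothetical, records are assumed to be delimited by lines made up only of `=` characters, and a record counts as a duplicate when its trimmed `Changelist:` line has been seen before (the PowerShell script compares whole records instead). Leading and trailing whitespace is stripped from every line, as in the sample result.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first record for each Changelist line.
dedupe_changelists() {
    awk '
    # Print the buffered record if its Changelist line is new, then reset.
    function emit() {
        if (id != "" && !(id in seen)) {
            seen[id] = 1
            print sep
            printf "%s", rec
        }
        rec = ""
        id = ""
    }
    # Trim leading and trailing whitespace (and a stray CR) on every line.
    {
        line = $0
        sub(/^[ \t]+/, "", line)
        sub(/[ \t\r]+$/, "", line)
    }
    # A separator line closes the current record.
    line ~ /^=+$/ { emit(); sep = line; next }
    # Assumption: the trimmed Changelist line is the deduplication key.
    line ~ /^Changelist:/ { id = line }
    # Buffer every non-separator line of the current record.
    { rec = rec line "\n" }
    # Handle a final record without a closing separator, then close the output.
    END { emit(); if (sep != "") print sep }
    ' "$@"
}
```

A possible invocation is `dedupe_changelists data.txt > output.txt`. Keying on the `Changelist:` line means that two records with the same number but different bodies collapse into the first one, which may or may not be what you want.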
