Removing every second (blank) line from a very large CSV file in Windows 10

Answer 1

Use

gc file.csv | ? {$_.trim() -ne "" } | set-content file_trimmed.csv
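
Spelled out with full cmdlet names, the one-liner can be tried on a small throwaway file first. This is only a minimal sketch: the temporary file names and the sample rows are made up for illustration and are not part of the original question.

```powershell
# Hypothetical sample: a tiny CSV in the temp folder with empty and whitespace-only lines.
$source  = Join-Path ([System.IO.Path]::GetTempPath()) 'blank_demo.csv'
$trimmed = Join-Path ([System.IO.Path]::GetTempPath()) 'blank_demo_trimmed.csv'
Set-Content -Path $source -Value 'id,name', '', '1,alpha', '   ', '2,beta'

# gc = Get-Content, ? = Where-Object.
# Lines pass through the pipeline one at a time; only lines that are
# non-empty after trimming reach the output file.
Get-Content -Path $source |
    Where-Object { $_.Trim() -ne '' } |
    Set-Content -Path $trimmed

# Read the result back to check it.
$result = @(Get-Content -Path $trimmed)
$result
```

Because the output goes to a different file than the input, nothing has to be held in memory beyond the line currently in flight.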

What is wrong with the original command (as explained in "Using PowerShell to remove all blank lines from a text file" on Tim Curwick's PowerShell blog):

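The parentheses around the Get-Content statement force it to finish loading the whole contents into an object before sending them down the pipeline. (If we were writing to a different file than we are reading from, we could speed up the command by eliminating the parentheses, thus allowing us to read from the one file and write to the other simultaneously.)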
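
The difference in evaluation order can be made visible on a tiny file. The following is a rough sketch, assuming Windows PowerShell 5 or later; the file name and the two logging lists are hypothetical helpers introduced only for this demonstration.

```powershell
# Hypothetical two-line sample file in the temp folder.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'stream_demo.txt'
Set-Content -Path $demo -Value 'one', 'two'

# Streaming: each line is handed downstream as soon as it has been read.
$streamed = [System.Collections.Generic.List[string]]::new()
Get-Content $demo |
    ForEach-Object { $streamed.Add("read $_"); $_ } |
    ForEach-Object { $streamed.Add("use $_") }

# Parenthesized: the upstream expression runs to completion and its output
# is collected before the first item is passed on.
$collected = [System.Collections.Generic.List[string]]::new()
(Get-Content $demo | ForEach-Object { $collected.Add("read $_"); $_ }) |
    ForEach-Object { $collected.Add("use $_") }

$streamed  -join ', '
$collected -join ', '
```

In the streaming run, the "read" and "use" entries alternate; in the parenthesized run, all "read" entries come first. With a file of a couple of hundred megabytes, that collected intermediate result is the whole file.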

The test script 1264263.ps1 measures only reading a large file and omits writing the output:

param (
        [Parameter()][string]$file = 'green_tripdata_2014-03.csv'
)

Push-Location 'D:\test'

#$file = 'green_tripdata_2014-03.csv'
# Report the file size in KiB.
"$file`: {0:N3} KiB" -f $((Get-Item $file).Length /1024 )

# Case 1: read the file only, no filtering.
' GC $file                          :' + ' {0:N7} sec' -f (Measure-Command {
    $y = Get-Content $file
}).TotalSeconds

Start-Sleep -Seconds 1
# Case 2: streaming pipeline; a line is kept if its trimmed text is non-empty (truthy).
' GC $file  | ? {$_.trim()}         :' + ' {0:N7} sec' -f (Measure-Command {
    $y = (Get-Content $file |
        Where-Object {$_.trim()}) #| Set-Content "$file2"
}).TotalSeconds

Start-Sleep -Seconds 1
# Case 3: streaming pipeline with an explicit comparison against the empty string.
' GC $file  | ? {$_.trim() -ne ""}  :' + ' {0:N7} sec' -f (Measure-Command {
    $y = (Get-Content $file |
        Where-Object {$_.trim() -ne "" }) #| Set-Content "$file2"
}).TotalSeconds

Start-Sleep -Seconds 1
# Case 4: the original form; the parentheses make Get-Content load the whole file first.
'(GC $file) | ? {$_.trim() -ne ""}  :' + ' {0:N7} sec' -f (Measure-Command {
    $y = (Get-Content $file) |
        Where-Object {$_.trim() -ne ""} #| Set-Content "$file2"
}).TotalSeconds

Pop-Location

The output shows that the improved command (case 3) is about 10 times faster than the original command (case 4):

PS D:\PShell> D:\PShell\SU\1264263.ps1
green_tripdata_2014-03.csv: 197,355.560 KiB
 GC $file                          : 27.4584778 sec
 GC $file  | ? {$_.trim()}         : 59.2003851 sec
 GC $file  | ? {$_.trim() -ne ""}  : 61.0429012 sec
(GC $file) | ? {$_.trim() -ne ""}  : 615.8580773 sec
PS D:\PShell>
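
The factor of 10 follows directly from the timings listed above. As a small worked check (the variable names are illustrative; the numbers are copied from the output):

```powershell
# Timings in seconds and file size in KiB, copied from the test output.
$readOnly  = 27.4584778     # case 1: Get-Content alone
$streamed  = 61.0429012     # case 3: GC $file | ? {$_.trim() -ne ""}
$collected = 615.8580773    # case 4: (GC $file) | ? {$_.trim() -ne ""}
$sizeKiB   = 197355.560

# Speed-up of the streaming form over the parenthesized form.
$ratio = $collected / $streamed
[math]::Round($ratio, 1)

# Effective throughput of the streaming form in MiB per second.
$throughput = ($sizeKiB / 1024) / $streamed
[math]::Round($throughput, 1)

# Share of the case 3 time that is spent just reading the file.
$readShare = $readOnly / $streamed
[math]::Round($readShare, 2)
```

The ratio comes out at roughly 10.1, and a little under half of the time in case 3 is already spent in Get-Content itself, so the filter is not the only cost.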
