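I need to add a header row to a large CSV file (900+ MB), but PowerShell stalls for more than 3 hours on my system and sometimes crashes.
Question: Is there a way to stream the data through instead of loading it entirely into memory via import/export?
If possible, I need a fast way to add a header to an existing large CSV file. Below are the code variants I have tried, but each one has problems I don't currently understand. Any help or guidance is greatly appreciated.
Slow code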
$File = 'C:\Install.tmp\target.csv'
$filedata = import-csv $file -Header "Column 0","Column 1","Column 2","Column 3","Column 4","Column 5","Column 6","Column 7","Column 8","Column 9","Column 10","Column 11","Column 12","Column 13","Column 14","Column 15","Column 16"
$filedata | export-csv $file -NoTypeInformation
Slow code (tries to add the required column names, but they end up as new rows instead of new columns, so it fails)
$File = 'C:\Install.tmp\target.csv'
$Data = Get-Content -Path $File
$Header = "Column 0","Column 1","Column 2","Column 3","Column 4","Column 5","Column 6","Column 7","Column 8","Column 9","Column 10","Column 11","Column 12","Column 13","Column 14","Column 15","Column 16"
Set-Content $File -Value $Header
Add-Content -Path $File -Value $Data
Answer 1
Import-Csv can take a long time. Get-Content may be faster:
Generate a header-less text CSV file for testing (about 1 GB):
$path_csv = "c:\temp\random1_csv.txt"
$path_header = "c:\temp\random1_header.txt"
Measure-Command {
"1234567890,1234567890,1234567890`r`n" * 30737418 | Set-Content -Path $path_csv
} | select TotalSeconds # 7
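Adding the header takes 13 seconds on a laptop with a slow NVMe drive. Note that -Raw still reads the whole file into memory as a single string, and the result ends up in $path_header rather than in the original file: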
Measure-Command {
"a,b,c" | Set-Content $path_header
Get-Content -Path $path_csv -Raw | Add-Content -Path $path_header
} | select TotalSeconds # 13
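Since the question asks for streaming rather than loading everything into memory, here is a minimal sketch that writes the header and then copies the original file through .NET streams in fixed-size buffers. The function name Add-CsvHeader, its parameters, and the output path are hypothetical, and the sketch assumes the file is ASCII or UTF-8 without a BOM and uses CRLF line endings.

```powershell
# Hypothetical helper: writes a header line to a new file, then streams the
# original file after it without ever holding the whole file in memory.
function Add-CsvHeader {
    param(
        [Parameter(Mandatory)] [string] $Source,      # existing header-less CSV (absolute path)
        [Parameter(Mandatory)] [string] $Target,      # new file to create (absolute path)
        [Parameter(Mandatory)] [string] $HeaderLine   # header text without a line ending
    )

    # Assumption: ASCII/UTF-8 content and CRLF line endings; GetBytes emits no BOM
    $headerBytes = [System.Text.Encoding]::UTF8.GetBytes($HeaderLine + "`r`n")

    $inStream = $null
    $outStream = $null
    try {
        $inStream = [System.IO.File]::OpenRead($Source)
        $outStream = [System.IO.File]::Create($Target)

        # Header first, then the original bytes copied buffer by buffer
        $outStream.Write($headerBytes, 0, $headerBytes.Length)
        $inStream.CopyTo($outStream)
    }
    finally {
        if ($null -ne $outStream) { $outStream.Dispose() }
        if ($null -ne $inStream) { $inStream.Dispose() }
    }
}

# Example call (paths are hypothetical; .NET resolves relative paths against the
# process working directory, so absolute paths are safer):
# $headerLine = (0..16 | ForEach-Object { '"Column {0}"' -f $_ }) -join ','
# Add-CsvHeader -Source 'C:\Install.tmp\target.csv' -Target 'C:\Install.tmp\target_with_header.csv' -HeaderLine $headerLine
```

Because nothing is parsed into objects, the run time should be close to that of a plain file copy. Once the new file has been checked, it can replace the original.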
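Answer 2
You can pipe the input straight to the output without going through an intermediate variable. For large files this should run somewhat faster. Try the following: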
# -Header expects an array of column names, not one comma-separated string
$Header = 0..16 | ForEach-Object { "Column $_" }
$File = 'C:\Install.tmp\target.csv'
# Write to a separate file (hypothetical name): Export-Csv cannot safely overwrite the file Import-Csv is still reading
$OutFile = 'C:\Install.tmp\target_with_header.csv'
Import-Csv $File -Header $Header | Export-Csv $OutFile -NoTypeInformation
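To show what the -Header parameter expects, here is a small sketch on a throwaway sample file. The file names in the temp directory are hypothetical and exist only for illustration. The names are passed as an array, one per column, and the result goes to a different file than the one being read.

```powershell
# Hypothetical sample files in the temp directory, used only for illustration
$tempDir = [System.IO.Path]::GetTempPath()
$sample = Join-Path $tempDir 'header_demo_in.csv'
$result = Join-Path $tempDir 'header_demo_out.csv'

# Two header-less rows with three fields each
Set-Content -Path $sample -Value '1,2,3', '4,5,6'

# -Header takes an array of names, one per column
$names = 0..2 | ForEach-Object { "Column $_" }

# Rows pass through the pipeline one at a time; the output is a separate file
Import-Csv -Path $sample -Header $names | Export-Csv -Path $result -NoTypeInformation

# Show the resulting file, header row first
Get-Content -Path $result
```

Passing the names as one quoted, comma-separated string would instead be treated as a single column name, so only the first field of each row would survive.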