I have this script:
for /f "delims=" %%a in (data_A.txt) DO (
set "flag="
for /f "delims=" %%b in (data_B.txt) do (
if "%%a"=="%%b" CALL :flagit
)
if not defined flag >>notmatch.txt echo %%a
)
:flagit
set flag=1
goto :eof
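I am comparing every line between two text files to identify entries that do not match. The files contain checksum hashes and relative path information.
It works, but it is slow. A file with 100,000 entries takes two and a half hours, and I have to run it twice, because I need to compare in both directions to catch entries in A that may not be in B as well as entries in B that may not be in A. The data is also not necessarily in any particular sort order.
I want to catch:
- Files in set A that are not in set B
- Files in set B that are not in set A
- Files whose checksums do not match
The first two I can handle easily (files missing from A or B), because once I have the "non-matching" entries there should be few of them, and checking whether the relative path exists on both sides should be simple (I hope). The last one is what really takes the time.
Sample data: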
dd2da0dcb5a54989dd4d2312013ddb12345c0593ed59a6d307461d925d57226d89d24c2e5a95c0d4082b14118cb8766d89ae69e40c4dac1ab5bd718bd7c58d9a \Personal\Pictures\Camera 2019\2019-02-17 15.02.34.jpg
509ebfd1e2c180ccd6bd679204b7c255f3c7abcdefg7660e219fa9eb58658d96a3ef8cec179221acb78be81f8dd78bd3a8b1a3cdaef0cd691725d3402a495b0b \Personal\Pictures\Camera 2019\2019-02-17 15.03.59.jpg
a3180dce7675aeb161f8fe25fcbd39ff2678faf2326d3e2a39fchfasff90a714134bdd22f91103026c494e6ffcfd62d5cb3d46992de9dfff71b49f9a734c0ab9 \Personal\Pictures\Camera 2019\2019-02-17 17.11.41.jpg
b5262c6ce5c4425a4ed737a7a8fdbc040c68003785d67177a25c86d9fb531ce42f74648783aed4bbb3aff7304b00d44b14eaa2a6c728b8802cafd22059570212 \Personal\Pictures\Camera 2019\2019-02-18 18.06.14.jpg
da7e1eb7ec147628a59e702c55159bc32d66f3c540dfb4be436f136137af913a7139640701eba84f34796da4f35c9fasdffae35542f56b1dccf009d1cec30d20 \Personal\Pictures\Camera 2019\2019-02-22 06.18.15.jpg
72c99a6f4394b4f65d4b66b00071de1d40cb717f525863875c36b2bc79dd0a8491ee8854b8b4437bfcfe4aa8379861aa43a7850dfac144d5db5b2c6b75dcf292 \Personal\Pictures\Camera 2019\2019-02-22 06.18.23.jpg
4a8a39e68379b2c671d83935b13dc82dd60d5e8b36a32a8677698a9306876zcvaffaaa4af292d53a8f52df4ee1c7bc701068064f4d28009566e8825abf2ab077 \Personal\Pictures\Camera 2019\2019-02-22 06.20.10.jpg
074103664be0c91664bd4e2e51d0e051c9cf8f27c26511d3a691d0asdfadfa134234808a16bf0679a8500910b09cf24d9e9c88788b4a749a81ec2d15f78cacfd \Personal\Pictures\Camera 2019\2019-02-22 06.27.14.jpg
28dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701dd7 \Personal\Pictures\Camera 2019\2019-02-23 11.54.34.jpg
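The way the for loops work is that, for each line, the script scans the whole other file looking for a match; if none is found, it writes the line to the log file and moves on to the next one. It seems there should be a way to eliminate an entry from the search once it has been matched or not matched.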
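The nested loops compare every line of one file against every line of the other, which is where the hours go. A minimal sketch of the "eliminate it from the search" idea, written in Python rather than batch purely for illustration: load one file's lines into a set once, then test each line of the other file against it. The function names here are hypothetical and not part of the original script.

```python
def read_lines(path):
    """Read a text file into a list of lines without their line endings."""
    # Assumption: the files are UTF-8 text; adjust the encoding if yours differ.
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle]


def lines_missing_from(source_lines, other_lines):
    """Return the lines of source_lines that never occur in other_lines.

    The set is built once, so each lookup is constant time on average
    instead of a full rescan of the other file. Input order is kept.
    """
    seen = set(other_lines)
    return [line for line in source_lines if line not in seen]
```

Calling `lines_missing_from(read_lines("data_A.txt"), read_lines("data_B.txt"))` and then the same call with the arguments swapped covers both directions; neither file needs to be sorted.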
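Edit - OK, I am using PowerShell now. Compare-Object works well, but I still have to figure out how to show only the files that exist on both A and B but differ, and how to log separately the files that are only on A and the files that are only on B...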
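Answer 1
- In PowerShell /Update:
To do the same in PowerShell, you can use the method @JoseZ mentioned in this answer; with minimal edits you get the strings that differ in each file, saved to the NoMatch.txt file:
- Get what differs on both sides, i.e. the lines of data_A.txt that are not in data_B.txt and the lines of data_B.txt that are not in data_A.txt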
Set-Location -Path "D:\Your\Folder"; Clear-Content -path ".\NoMatch.txt"; $filebefore=".\data_A.txt"; $file_after=".\data_B.txt"
### Compare-Object way
$array = Compare-Object $(Get-Content $filebefore) $(Get-Content $file_after)
# No SideIndicator filter here: "<=" marks lines only in data_A.txt, "=>" marks lines only in data_B.txt
$array | Format-Table -Property SideIndicator, InputObject -AutoSize -HideTableHeaders
### -NotIn operator way
$(Get-Content $filebefore) | Where-Object {$_ -notIn $(Get-Content $file_after)} | Out-File ".\NoMatch.txt" -Append
$(Get-Content $file_after) | Where-Object {$_ -notIn $(Get-Content $filebefore)} | Out-File ".\NoMatch.txt" -Append
- The output is the last line of each file, saved in NoMatch.txt
>Get-Content ".\NoMatch.txt"
18dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
28dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
- Get only the lines of data_B.txt that are not in data_A.txt
Set-Location -Path "D:\Your\Folder"; Clear-Content -path ".\NoMatch.txt"; $filebefore=".\data_A.txt"; $file_after=".\data_B.txt"
### Compare-Object way
$array = Compare-Object $(Get-Content $filebefore) $(Get-Content $file_after)
# "=>" selects lines found only in the difference object, here data_B.txt
$array | where {$_.SideIndicator -eq "=>"} | Format-Table -Property InputObject -AutoSize -HideTableHeaders
### -NotIn operator way
$(Get-Content $file_after) | Where-Object {$_ -notIn $(Get-Content $filebefore)} | Out-File ".\NoMatch.txt" -Append
- The output is the last line of data_B.txt, saved in NoMatch.txt
>Get-Content ".\NoMatch.txt"
18dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
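The whole-line comparisons above report a file whose checksum changed as two unrelated lines, one from each side. A rough sketch of separating the three cases from the question (only in A, only in B, checksum mismatch) is to key each entry on its relative path. This is Python for illustration only, and it assumes every line has the shape `<checksum> <relative path>` as in the sample data; `parse_manifest` and `classify` are made-up names.

```python
def parse_manifest(lines):
    """Map relative path -> checksum for lines shaped like '<checksum> <path>'."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, because paths contain spaces.
        # A line with no whitespace at all raises ValueError here.
        checksum, path = line.split(None, 1)
        entries[path] = checksum
    return entries


def classify(lines_a, lines_b):
    """Return (only_in_a, only_in_b, checksum_mismatch) as sorted lists of paths."""
    a = parse_manifest(lines_a)
    b = parse_manifest(lines_b)
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    # Paths present on both sides whose checksums differ.
    mismatch = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_in_a, only_in_b, mismatch
```

Paths are compared case-sensitively in this sketch; if the two listings may differ in letter case, lower-casing the path before using it as the key would be one way to handle that.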
- In a bat/cmd file:
For shorter strings, you can try Findstr /vixg:data_A.txt data_B.txt
/I Specifies that the search is not to be case-sensitive.
/X Prints lines that match exactly.
/V Prints only lines that do not contain a match.
/G:file Gets search strings from the specified file(/ stands for console).
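To make the combined effect of the four switches concrete, here is a small Python approximation of `findstr /vixg:data_A.txt data_B.txt`. It is a sketch only: it treats every search string as literal text (what findstr's /L switch asks for), which is an assumption and not something the command line above specifies, and `findstr_vixg` is a hypothetical name.

```python
def findstr_vixg(pattern_lines, target_lines):
    """Approximate `findstr /v /i /x /g:patterns target` with literal patterns.

    /G: one search string per line of the pattern file
    /I: comparison ignores case
    /X: a line matches only if the whole line equals a search string
    /V: print the lines that did NOT match
    """
    patterns = {p.lower() for p in pattern_lines}
    return [t for t in target_lines if t.lower() not in patterns]
```

With the lines of data_A.txt as `pattern_lines` and the lines of data_B.txt as `target_lines`, the result is the set of data_B.txt lines that have no exact, case-insensitive counterpart in data_A.txt.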
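Observation: for Findstr to make this comparison and find the identical/different strings, there is a maximum limit of 250 characters in total length:
(String_A).Length + (String_B).Length <= 250 characters
- That is why I trimmed your strings to at most 125 characters in the examples below:
Only the last line of data_A.txt differs from data_B.txt, and only in its first character; that line is what gets saved to NoMatch.txt
- Get only the lines of data_B.txt that are not in data_A.txt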
@echo off
cd /d "%~dp0" && if not exist data_B.txt call :^)
cd. >nul >.\NoMatch.txt && for /f tokens^=* %%i in (
'findstr /vixg:data_A.txt data_B.txt')do >>.\NoMatch.txt echo=%%~i
goto :EOF
:^)
>data_A.txt ^
(
echo=dd2da0dcb5a54989dd4d2312013ddb12345c0593ed59a6d307461d925d57226d89d24c2e5a95c0d4082b14118cb8766d89ae69e40c4dac1ab5bd718bd7c58
echo=509ebfd1e2c180ccd6bd679204b7c255f3c7abcdefg7660e219fa9eb58658d96a3ef8cec179221acb78be81f8dd78bd3a8b1a3cdaef0cd691725d3402a495
echo=a3180dce7675aeb161f8fe25fcbd39ff2678faf2326d3e2a39fchfasff90a714134bdd22f91103026c494e6ffcfd62d5cb3d46992de9dfff71b49f9a734c0
echo=b5262c6ce5c4425a4ed737a7a8fdbc040c68003785d67177a25c86d9fb531ce42f74648783aed4bbb3aff7304b00d44b14eaa2a6c728b8802cafd22059570
echo=da7e1eb7ec147628a59e702c55159bc32d66f3c540dfb4be436f136137af913a7139640701eba84f34796da4f35c9fasdffae35542f56b1dccf009d1cec30
echo=72c99a6f4394b4f65d4b66b00071de1d40cb717f525863875c36b2bc79dd0a8491ee8854b8b4437bfcfe4aa8379861aa43a7850dfac144d5db5b2c6b75dcf
echo=4a8a39e68379b2c671d83935b13dc82dd60d5e8b36a32a8677698a9306876zcvaffaaa4af292d53a8f52df4ee1c7bc701068064f4d28009566e8825abf2ab
echo=074103664be0c91664bd4e2e51d0e051c9cf8f27c26511d3a691d0asdfadfa134234808a16bf0679a8500910b09cf24d9e9c88788b4a749a81ec2d15f78ca
echo=28dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
) && (
>data_B.txt ^
(
echo=dd2da0dcb5a54989dd4d2312013ddb12345c0593ed59a6d307461d925d57226d89d24c2e5a95c0d4082b14118cb8766d89ae69e40c4dac1ab5bd718bd7c58
echo=509ebfd1e2c180ccd6bd679204b7c255f3c7abcdefg7660e219fa9eb58658d96a3ef8cec179221acb78be81f8dd78bd3a8b1a3cdaef0cd691725d3402a495
echo=a3180dce7675aeb161f8fe25fcbd39ff2678faf2326d3e2a39fchfasff90a714134bdd22f91103026c494e6ffcfd62d5cb3d46992de9dfff71b49f9a734c0
echo=b5262c6ce5c4425a4ed737a7a8fdbc040c68003785d67177a25c86d9fb531ce42f74648783aed4bbb3aff7304b00d44b14eaa2a6c728b8802cafd22059570
echo=da7e1eb7ec147628a59e702c55159bc32d66f3c540dfb4be436f136137af913a7139640701eba84f34796da4f35c9fasdffae35542f56b1dccf009d1cec30
echo=72c99a6f4394b4f65d4b66b00071de1d40cb717f525863875c36b2bc79dd0a8491ee8854b8b4437bfcfe4aa8379861aa43a7850dfac144d5db5b2c6b75dcf
echo=4a8a39e68379b2c671d83935b13dc82dd60d5e8b36a32a8677698a9306876zcvaffaaa4af292d53a8f52df4ee1c7bc701068064f4d28009566e8825abf2ab
echo=074103664be0c91664bd4e2e51d0e051c9cf8f27c26511d3a691d0asdfadfa134234808a16bf0679a8500910b09cf24d9e9c88788b4a749a81ec2d15f78ca
echo=18dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
) ) && exit /b
- The output is the last line of data_B.txt, saved in .\NoMatch.txt
18dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
- Get what differs on both sides, i.e. the lines of data_A.txt that are not in data_B.txt and the lines of data_B.txt that are not in data_A.txt
@echo off
cd /d "%~dp0" && if not exist data_B.txt call :^)
cd. >nul >notmatch.txt
for /f tokens^=* %%i in ('findstr /vixg:data_A.txt data_B.txt')do >>notmatch.txt echo=%%~i
for /f tokens^=* %%i in ('findstr /vixg:data_B.txt data_A.txt')do >>notmatch.txt echo=%%~i
goto :EOF
:^)
>data_A.txt ^
(
echo=dd2da0dcb5a54989dd4d2312013ddb12345c0593ed59a6d307461d925d57226d89d24c2e5a95c0d4082b14118cb8766d89ae69e40c4dac1ab5bd718bd7c58
echo=509ebfd1e2c180ccd6bd679204b7c255f3c7abcdefg7660e219fa9eb58658d96a3ef8cec179221acb78be81f8dd78bd3a8b1a3cdaef0cd691725d3402a495
echo=a3180dce7675aeb161f8fe25fcbd39ff2678faf2326d3e2a39fchfasff90a714134bdd22f91103026c494e6ffcfd62d5cb3d46992de9dfff71b49f9a734c0
echo=b5262c6ce5c4425a4ed737a7a8fdbc040c68003785d67177a25c86d9fb531ce42f74648783aed4bbb3aff7304b00d44b14eaa2a6c728b8802cafd22059570
echo=da7e1eb7ec147628a59e702c55159bc32d66f3c540dfb4be436f136137af913a7139640701eba84f34796da4f35c9fasdffae35542f56b1dccf009d1cec30
echo=72c99a6f4394b4f65d4b66b00071de1d40cb717f525863875c36b2bc79dd0a8491ee8854b8b4437bfcfe4aa8379861aa43a7850dfac144d5db5b2c6b75dcf
echo=4a8a39e68379b2c671d83935b13dc82dd60d5e8b36a32a8677698a9306876zcvaffaaa4af292d53a8f52df4ee1c7bc701068064f4d28009566e8825abf2ab
echo=074103664be0c91664bd4e2e51d0e051c9cf8f27c26511d3a691d0asdfadfa134234808a16bf0679a8500910b09cf24d9e9c88788b4a749a81ec2d15f78ca
echo=28dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
) && (
>data_B.txt ^
(
echo=dd2da0dcb5a54989dd4d2312013ddb12345c0593ed59a6d307461d925d57226d89d24c2e5a95c0d4082b14118cb8766d89ae69e40c4dac1ab5bd718bd7c58
echo=509ebfd1e2c180ccd6bd679204b7c255f3c7abcdefg7660e219fa9eb58658d96a3ef8cec179221acb78be81f8dd78bd3a8b1a3cdaef0cd691725d3402a495
echo=a3180dce7675aeb161f8fe25fcbd39ff2678faf2326d3e2a39fchfasff90a714134bdd22f91103026c494e6ffcfd62d5cb3d46992de9dfff71b49f9a734c0
echo=b5262c6ce5c4425a4ed737a7a8fdbc040c68003785d67177a25c86d9fb531ce42f74648783aed4bbb3aff7304b00d44b14eaa2a6c728b8802cafd22059570
echo=da7e1eb7ec147628a59e702c55159bc32d66f3c540dfb4be436f136137af913a7139640701eba84f34796da4f35c9fasdffae35542f56b1dccf009d1cec30
echo=72c99a6f4394b4f65d4b66b00071de1d40cb717f525863875c36b2bc79dd0a8491ee8854b8b4437bfcfe4aa8379861aa43a7850dfac144d5db5b2c6b75dcf
echo=4a8a39e68379b2c671d83935b13dc82dd60d5e8b36a32a8677698a9306876zcvaffaaa4af292d53a8f52df4ee1c7bc701068064f4d28009566e8825abf2ab
echo=074103664be0c91664bd4e2e51d0e051c9cf8f27c26511d3a691d0asdfadfa134234808a16bf0679a8500910b09cf24d9e9c88788b4a749a81ec2d15f78ca
echo=18dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
) ) && exit /b
- The output is the last line of each file, saved in NoMatch.txt
18dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701
28dc03a7722b0781caa4dfasdf664w666777068c79456941a159ffefa1d9c34fed83b98858394c1aa471396a0b1a448d8dd89e361c564e6b27e451b2dd701