Using join on two files fails for larger file sizes

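I'm having trouble with a script that uses join to combine two files. The sample input files contain lines like the ones shown below.

Here are the input files and the output of the join command: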

D:\work\BuildScripts\3C>cat D:\temp\aaa.txt
hzapplications\adn\adn4\adn4density\adn4_idd_module.cpp,83
hzapplications\adn\adn4\adn4density\adn4dencalmodule.cpp,73
hzapplications\adn\adn4\adn4density\adn4denimagemodulerm.cpp,111
hzapplications\adn\adn4\adn4density\adn4denimagemodulert.cpp,202
hzapplications\adn\adn4\adn4density\adn4densityanqmodules.cpp,445
hzapplications\adn\adn4\adn4density\adn4densityappl.cpp,378
hzapplications\adn\adn4\adn4density\adn4densityappl.h,50
hzapplications\adn\adn4\adn4density\adn4densityevrmodules.cpp,272
hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,490
hzapplications\adn\adn4\adn4density\adn4densitykernel.h,65
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.cpp,209
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.h,70
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.cpp,218
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.h,70
hzapplications\adn\adn4\adn4density\adn4dphimodules.cpp,610
hzapplications\adn\adn4\adn4density\adn4dphimodulesrt.cpp,115
hzapplications\adn\adn4\adn4density\adn4rhomodulesrt.cpp,102

D:\work\BuildScripts\3C>cat D:\temp\bbb.txt
hzapplications\activect\ptc\ictsx01\ictsx01_bootuptask.cpp,1
hzapplications\activeps\iola\acquisition\iola_acqmodule.cpp,4
hzapplications\activeps\iola\simulation\iola_simmodule.cpp,3
hzapplications\activeps\iolr\simulation\iolr_simmodule.cpp,1
hzapplications\activeps\iolr\task\iolr_poweron200vhitask.cpp,1
hzapplications\activeps\iolr\task\iolr_poweron200vlowtask.cpp,1
hzapplications\activeps\iolr\task\iolr_poweronnrlvtask.cpp,1
hzapplications\activeps\iolr\task\iolrtaskcommon.cpp,2
hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,1
hzapplications\adn\adn4\adn4equipment\adn4adseelem.cpp,1
hzapplications\adn\adn4\adn4equipment\adn4collar.cpp,1
hzapplications\adn\adn4\adn4equipment\adn4tool.cpp,2
hzapplications\adn\adn6c\adn6cequipment\adn6ccollar.cpp,1
hzapplications\adn\adn8\adn8equipment\adn8tool.cpp,1
hzapplications\adn\adn8\adn8neutron\adn8neutronkernel.cpp,1
hzapplications\adn\adn8d\adn8ddensity\adn8ddensitykernel.cpp,1
hzapplications\adn\adn8d\adn8dequipment\adn8dtool.cpp,1

D:\work\BuildScripts\3C>join --ignore-case -1 1 -2 1 -t"," -o "1.1,1.2,2.2" -e "0" -a 1 D:\temp\aaa.txt D:\temp\bbb.txt
hzapplications\adn\adn4\adn4density\adn4_idd_module.cpp,83,0
hzapplications\adn\adn4\adn4density\adn4dencalmodule.cpp,73,0
hzapplications\adn\adn4\adn4density\adn4denimagemodulerm.cpp,111,0
hzapplications\adn\adn4\adn4density\adn4denimagemodulert.cpp,202,0
hzapplications\adn\adn4\adn4density\adn4densityanqmodules.cpp,445,0
hzapplications\adn\adn4\adn4density\adn4densityappl.cpp,378,0
hzapplications\adn\adn4\adn4density\adn4densityappl.h,50,0
hzapplications\adn\adn4\adn4density\adn4densityevrmodules.cpp,272,0
hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,490,0
hzapplications\adn\adn4\adn4density\adn4densitykernel.h,65,0
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.cpp,209,0
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.h,70,0
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.cpp,218,0
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.h,70,0
hzapplications\adn\adn4\adn4density\adn4dphimodules.cpp,610,0
hzapplications\adn\adn4\adn4density\adn4dphimodulesrt.cpp,115,0
hzapplications\adn\adn4\adn4density\adn4rhomodulesrt.cpp,102,0

D:\work\BuildScripts\3C>

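The expected output for that particular line is the joined form: hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,490,1

Any suggestions are very welcome. I'm using the unxutils package on Windows; this is the exact version: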

D:\work\BuildScripts\3C>join --version
join (GNU textutils) 2.0
Written by Mike Haertel.

Copyright (C) 1999 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

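Answer 1

It turns out that --ignore-case is the problem. It has an effect even when there are no uppercase letters, because it treats all lowercase letters as uppercase, which moves them to the other side of the characters that sit between the uppercase and lowercase letters in ASCII order: [\]^_

In the normal sort order, iolrt comes after iolr_, but with --ignore-case the order is reversed. join expects both inputs to be sorted in the same order it uses for its comparisons, so from its point of view the files are out of order and the matching line gets skipped.

The sort command needs the -f option to produce the correct order (in addition to -t, -k1,1).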
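The flip in ordering can be reproduced with sort alone. The following is a minimal sketch, assuming a POSIX shell with GNU sort available; the two keys are shortened, hypothetical stand-ins for the iolr_... and iolrtask... paths in bbb.txt, and LC_ALL=C is set so that the comparison is byte-wise.

```shell
# Force byte-wise (ASCII) collation so the result does not depend on the locale.
export LC_ALL=C

# Hypothetical shortened keys modeled on the paths in bbb.txt.
# Plain sort: '_' (0x5F) is less than 't' (0x74), so iolr_power comes first.
plain_order=$(printf '%s\n' 'iolrtask' 'iolr_power' | sort)

# Folded sort: 't' is compared as 'T' (0x54), which is less than '_' (0x5F),
# so iolrtask now comes first.
folded_order=$(printf '%s\n' 'iolrtask' 'iolr_power' | sort -f)

printf 'plain sort:\n%s\n' "$plain_order"
printf 'sort -f:\n%s\n' "$folded_order"
```

A file sorted the first way looks unsorted to a tool that compares the second way, which is the situation join --ignore-case ends up in here.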
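As a rough illustration of the fix, the sketch below sorts both inputs with -f before joining them. It assumes a POSIX shell with GNU sort and join; the temporary directory, the file names inside it, and the shortened keys are hypothetical stand-ins for aaa.txt and bbb.txt, not the original data.

```shell
# Byte-wise collation, so case folding is the only difference from a plain sort.
export LC_ALL=C
workdir=$(mktemp -d)

# Hypothetical miniature versions of aaa.txt and bbb.txt.
printf '%s\n' 'adn4_idd_module.cpp,83' 'adn4densitykernel.cpp,490' > "$workdir/aaa.txt"
printf '%s\n' 'adn4densitykernel.cpp,1' 'iolr_poweron.cpp,1' 'iolrtask.cpp,2' > "$workdir/bbb.txt"

# Sort each file on the first comma-separated field, folding case (-f),
# so the order matches what join --ignore-case compares against.
sort -f -t, -k1,1 "$workdir/aaa.txt" > "$workdir/aaa.sorted"
sort -f -t, -k1,1 "$workdir/bbb.txt" > "$workdir/bbb.sorted"

# Same join options as in the question, applied to the re-sorted files.
joined=$(join --ignore-case -1 1 -2 1 -t, -o 1.1,1.2,2.2 -e 0 -a 1 \
    "$workdir/aaa.sorted" "$workdir/bbb.sorted")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

With these inputs the adn4densitykernel.cpp line should come out joined as 490,1, while the unmatched line gets 0 in the last field. Note that the output follows the folded order, so lines containing _ appear after the lines they preceded in the original files.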
