Remove carriage return added when using join

I am joining two pipe-delimited files, but after running my join command:

join -a 1 -i -t"|" -o 1.3 1.1 2.2 1.4 1.5 2.3 2.4 2.5 2.6 2.7 2.8 2.9  <(sort -d -t"|" -z  alt.csv) <(sort -d -t"|" -z  ../original/alt.csv) > ../out/alt.csv

the output file has a carriage return at the point where the join occurred, for example:

IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword
|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes
|Photographic negatives ||&lt;p&gt;The albums comprise of negatives of Gypsies and Gypsy life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. &lt;&#x0002F;p&gt;||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes
|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes
|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes
|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||

To be processed correctly, however, the carriage return needs to come after the last column:

IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes|Photographic negatives ||&lt;p&gt;The albums comprise of negatives of  life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. &lt;&#x0002F;p&gt;||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||

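Is there a way to get the result I want using sed or awk? Do I first need to append another pipe to the end of the last column and then substitute based on the number of occurrences?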
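
For reference, the kind of post-processing asked about here might look like the sketch below. It assumes every broken record shows up as two physical lines, with the second half starting with the pipe delimiter, as in the sample output above. The file names `broken.csv` and `fixed.csv` and the shortened sample rows are made up for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical sample mimicking the broken join output: each record is split
# in two, and the second half begins with the pipe delimiter.
printf '%s\n' \
  'IRN|EADUnitID' \
  '|EADUnitTitle' \
  '51900|ga.1.1.1' \
  '|Ehepaar Weltzel. |' > "$tmp/broken.csv"

# Drop any carriage returns, then glue each line that begins with "|" onto
# the record that precedes it.
awk '
  { gsub(/\r/, "") }               # remove stray CR characters
  /^\|/ { buf = buf $0; next }     # continuation line: append to buffered record
  NR > 1 { print buf }             # a new record starts: flush the previous one
  { buf = $0 }                     # start buffering the new record
  END { if (NR) print buf }        # flush the final record
' "$tmp/broken.csv" > "$tmp/fixed.csv"

cat "$tmp/fixed.csv"
```

This only repairs the symptom after the fact, and it would wrongly merge any genuine record whose first field is empty.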

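Answer 1

I have found a solution, though it is not particularly elegant. I decided to add an extra pipe to the second file before the join, because that lets me do some additional processing to get the correct format.

The steps I now need to take are: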

    # add pipe to the end of the line for ORIGINAL files only
    sed -i 's/$/|/' ../original/alt.csv

    # --- do the join here and write the joined file to ../out/alt.csv ---

    # match on last pipe and add a carriage return
    sed -i 's/\(.*\)\|/\0\r/' ../out/alt.csv

    # remove carriage return where join occurred (the use of pipe is simply to locate carriage return) and replace with pipe
    sed -i 's/\r|/|/' ../out/alt.csv

    # remove all blank lines 
    sed -i '/^\s*$/d' ../out/alt.csv

    # remove pipe at the end of the line of output file and add a carriage return
    sed -i 's/[^\r\n].$/\r/' ../out/alt.csv 

If there is a simpler way to achieve this, I would be glad to hear it.
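
One possibly simpler route, assuming the stray break comes from Windows-style (CRLF) line endings in the first input file (the post does not confirm this; `od -c alt.csv | head` would show whether `\r` is present), is to delete the carriage returns before joining, so that the last field of the first file no longer carries a hidden `\r`. The sketch below uses hypothetical three-column sample files (`left.csv`, `right.csv`) and a shortened `-o` list rather than the real data.

```shell
tmp=$(mktemp -d)

# Hypothetical inputs: left.csv has CRLF line endings, right.csv has plain LF.
printf '51900|ga.1.1.1|Yes\r\n51901|ga.1.1.2|Yes\r\n' > "$tmp/left.csv"
printf '51900|Ehepaar Weltzel. |\n51901|Ehepaar Weltzel. |\n' > "$tmp/right.csv"

# Delete every carriage return, then sort both files on the join field.
tr -d '\r' < "$tmp/left.csv"  | sort -t '|' -k 1,1 > "$tmp/left.sorted"
tr -d '\r' < "$tmp/right.csv" | sort -t '|' -k 1,1 > "$tmp/right.sorted"

# Join on the first field; each output record now stays on a single line.
join -a 1 -t '|' -o 1.1,1.2,1.3,2.2,2.3 \
  "$tmp/left.sorted" "$tmp/right.sorted" > "$tmp/joined.csv"

cat "$tmp/joined.csv"
```

If the downstream consumer really does need CRLF endings, they could be added back once at the very end, for example with `sed 's/$/\r/'` in GNU sed, instead of the multi-step pipe workaround.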
