How to merge multiple multi-column text files of different lengths by column

I have 60 text files of different lengths that all share the same column names.

For example:

cat Sample_145_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
  19258 circRNA
    612 ciRNA

cat Sample_146_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
  17791 circRNA
    729 ciRNA

cat Sample_147_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
  22838 circRNA
    686 ciRNA

cat Sample_148_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
  19404 circRNA
    475 ciRNA

I want to generate a "master" table containing the `readnumber` of every identified circRNA, with each sample as a column and each `flankintron` value as a row name:

(Screenshot of the file)

Answer 1

If the columns are in the same order in every file, you can simply concatenate the files using `>>`:

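for x in {1..60}; do
    # tail -n +2 prints from line 2 onward, which drops the header line.
    # The braces in ${x} stop the shell from reading "x_blah" as the variable name.
    tail -n +2 "Sample_${x}_blah.txt" >> Sample_master.txt
    # The double angle bracket (>>) appends to the output file instead of overwriting it.
done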

If they are not, you can write a translation step with awk along the same lines, i.e.

$ cat Sample_1.txt 
col1,col2,col3,col4 #etc 
$ cat Sample_2.txt 
col4,col3,col2,col1 
$ cat Sample_1.txt > Sample_Master.txt # no translation needed
$ awk -F',' 'NR > 1 {print $4","$3","$2","$1}' Sample_2.txt >> Sample_Master.txt # -F',' splits on commas; NR > 1 skips the header

But with 60 files, doing this by hand would be far more work than writing a Python script with Python's csv library...
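For what it's worth, here is a minimal sketch of what such a Python script could look like. It assumes every file starts with a header line and that all files use the same set of column names, possibly in a different order. The function name `merge_by_header` and the default comma delimiter are my own choices for illustration, so pass `delimiter="\t"` if your files are tab-separated.

```python
import csv


def merge_by_header(paths, out_path, delimiter=","):
    """Concatenate the rows of all files in `paths` into `out_path`.

    Columns are matched by header name, not by position, so files whose
    columns appear in a different order are rearranged automatically.
    The column order of the first non-empty file is used for the output.
    """
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.DictReader(src, delimiter=delimiter)
                if reader.fieldnames is None:
                    # Completely empty file: nothing to copy.
                    continue
                if writer is None:
                    # The first file with a header fixes the output column order.
                    writer = csv.DictWriter(
                        out, fieldnames=reader.fieldnames, delimiter=delimiter
                    )
                    writer.writeheader()
                for row in reader:
                    # DictWriter raises ValueError if a row contains a column
                    # name that the first file did not have.
                    writer.writerow(row)
```

You would call it with something like `merge_by_header(["Sample_1.txt", "Sample_2.txt"], "master.txt")`. Keep the output file name out of whatever pattern you use to collect the inputs, otherwise a second run would try to merge the master file into itself.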
