I have 60 text files of different lengths that all share the same column names.
For example:
cat Sample_145_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
19258 circRNA
612 ciRNA
cat Sample_146_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
17791 circRNA
729 ciRNA
cat Sample_147_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
22838 circRNA
686 ciRNA
cat Sample_148_Chimeric.out.junction.new.back_spliced_junction.bed.Circexplorer2.txt | gawk '{print $14}' | sort | uniq -c
19404 circRNA
475 ciRNA
I want to generate a "master" table containing all the identified circRNAs, with each sample as a column holding its readnumber, and each flankintron as a row name:
Answer 1
If all the columns in all the files are in the same order, you can simply concatenate the files with >>:
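for x in {1..60}; do
    # tail -n +2 prints from line 2 onward, which cuts off the top line (your headers)
    tail -n +2 "Sample_${x}_blah.txt" >> Sample_master.txt
    # the braces in ${x} stop the shell from reading the variable name as x_blah
    # and >> appends to the output file instead of overwriting it
done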
If not, you can write a translation with awk along the lines of the above, i.e.
$ cat Sample_1.txt
col1,col2,col3,col4 #etc
$ cat Sample_2.txt
col4,col3,col2,col1
$ cat Sample_1.txt > Sample_Master.txt # no translation needed
$ # -F, splits fields on commas; NR > 1 skips the header line of Sample_2.txt
$ awk -F, 'NR > 1 {print $4","$3","$2","$1}' Sample_2.txt >> Sample_Master.txt
But with 60 files, that is going to be much more work than writing a Python script using Python's csv library...
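A minimal sketch of what such a Python script could look like, assuming every file has a header line with the same column names (possibly in a different order). The function name merge_by_header, the delimiter default, and the file names in the usage comment are illustrative assumptions, not something taken from the question; for tab-separated files you would pass delimiter="\t".

```python
import csv


def merge_by_header(paths, out_path, delimiter=","):
    """Concatenate delimited files into one, matching columns by header name.

    Hypothetical helper: the first non-empty file fixes the column order of
    the output; rows of later files are rearranged to match it.
    """
    with open(out_path, "w", newline="") as out:
        writer = None
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.DictReader(src, delimiter=delimiter)
                if reader.fieldnames is None:
                    continue  # empty file, nothing to copy
                if writer is None:
                    # Write the header once, in the order of the first file.
                    writer = csv.DictWriter(
                        out,
                        fieldnames=reader.fieldnames,
                        delimiter=delimiter,
                        lineterminator="\n",
                    )
                    writer.writeheader()
                for row in reader:
                    # DictWriter places each value under its column name, so
                    # the column order of the input file does not matter.
                    # It raises ValueError if a file has an unknown column.
                    writer.writerow(row)


# Possible usage, with placeholder file names as in the shell loop above:
# paths = [f"Sample_{i}_blah.txt" for i in range(1, 61)]
# merge_by_header(paths, "Sample_master.txt")
```

Because columns are matched by name, no per-file awk translation is needed. This only concatenates rows; it does not reshape them into one column per sample.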