Read the contents of zipped files in parallel without extracting them

I have the following zip archive structure:

$ unzip -l Undetermined_S0_L004_R1_001_fastqc.zip 
Archive:  Undetermined_S0_L004_R1_001_fastqc.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/
        0  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/
        0  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/
     1197  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/fastqc_icon.png
     1450  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/warning.png
     1561  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/error.png
     1715  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Icons/tick.png
      782  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/summary.txt
     9095  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_base_quality.png
    14381  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_tile_quality.png
    23205  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_sequence_quality.png
    30978  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_base_sequence_content.png
    31152  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_sequence_gc_content.png
     7861  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/per_base_n_content.png
    18356  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/sequence_length_distribution.png
    23040  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/duplication_levels.png
     9096  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/adapter_content.png
    58683  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/Images/kmer_profiles.png
   355919  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/fastqc_report.html
   301092  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/fastqc_data.txt
    10117  10-10-14 14:44   Undetermined_S0_L004_R1_001_fastqc/fastqc.fo
 --------                   -------
   899680                   21 files

How can I process fastqc_data.txt with crimson in parallel? At the moment I get the following error:

find `pwd`/*_fastqc.zip -type f | parallel -j 3 unzip -c {} {}/fastqc_data.txt | crimson fastqc {} | less

Usage: crimson fastqc [OPTIONS] INPUT [OUTPUT]

Error: Invalid value for "input": Path "{}" does not exist.

Answer 1

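You have a pipeline made up of four commands:

  • find, which lists the zip files.
  • parallel, which calls unzip to extract one file from each zip file. Since {} is replaced by the path to the zip file, you are asking unzip to extract /home/user977828/stuff/Undetermined_S0_L004_R1_001_fastqc.zip/fastqc_data.txt from the archive (assuming the current directory is /home/user977828/stuff), and no such member exists.
  • crimson, which receives all the extracted files concatenated on its standard input and is invoked with the arguments fastqc and the literal string {}.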
  • less
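
The mismatch described in the second bullet can be made concrete with plain shell string handling. The sketch below assumes the example path used above and compares the member name that unzip is asked for with the name actually stored in the archive, as shown by the unzip -l listing in the question. The variable names are illustrative only.

```shell
# Path as produced by: find `pwd`/*_fastqc.zip  (example path, assumed)
zip_path=/home/user977828/stuff/Undetermined_S0_L004_R1_001_fastqc.zip

# What "{}/fastqc_data.txt" expands to: the full zip path plus the file name.
requested="$zip_path/fastqc_data.txt"

# What the archive stores: <zip name without directories and .zip>/fastqc_data.txt
name=${zip_path##*/}   # strip the leading directories
name=${name%.zip}      # strip the .zip suffix
stored="$name/fastqc_data.txt"

printf 'requested: %s\nstored:    %s\n' "$requested" "$stored"
```

The two strings differ in both the leading directories and the .zip suffix, so the placeholder has to be turned into the bare archive name before it can be used as a member path.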

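parallel substitutes {} only in its own arguments. It has no influence over the other parts of the pipeline. If you want to invoke crimson separately on each fastqc_data.txt file, you need to pass the whole pipeline from unzip to crimson as an argument to parallel.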

find *_fastqc.zip -type f | sed 's/\.zip$//' |
parallel -j 3 'unzip -c {}.zip {}/fastqc_data.txt | crimson fastqc /dev/stdin' |
less
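
To see why the quoting matters, here is a minimal sketch of the same effect. It uses xargs -I rather than parallel, only because xargs is available on practically every system; that swap is an assumption made for illustration, and the input lines a and b are made up. The placeholder is substituted only inside the arguments of the command that performs the substitution.

```shell
# Placeholder outside the substituting command: the shell starts sed as a
# separate pipeline stage, so sed receives the literal string "{}".
outside=$(printf 'a\nb\n' | xargs -I{} echo {} | sed 's/^/[{}] /')

# Whole per-item pipeline passed as ONE quoted argument: the placeholder is
# now replaced everywhere inside it, once per input line.
inside=$(printf 'a\nb\n' | xargs -I{} sh -c 'echo {} | sed "s/^/[{}] /"')

printf '%s\n' "outside:" "$outside" "inside:" "$inside"
```

The same rule holds for GNU parallel, which is why the command above wraps the entire unzip-to-crimson pipeline in a single quoted argument.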
