Multiple operations on a single text file using a shell or bash script

My source file: Test.txt

Note: the file is tab-separated, and a few columns have no column names:

Chr  Start  End   Alt   Value
Exo  0      10    .     1.50    .   20:-2     30:0.9    50:50   50
Exo  1      20    .     1.50    .   20:-1     30:-1     50:50   50
Exo  2      30    .     1.50    .   20:0.02   30:0.9    50:50   50
Exo  3      40    .     1.50    .   20:-1     30:-2     50:50   50
Nem  3      40    .     1.50    .   20:-1     30:-2     50:50   50

I am trying to perform the following operations on the file above:

1) Columns 7 and 8 need to be split on ':', and the resulting columns need names, e.g. "mod1", "mod2", "mod3", "mod4".

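2) After that, move the split columns next to the "Value" column, and add one more "comment" column after "mod4" (this comment column should be left empty).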

3) Filter on column "mod2": remove every row whose mod2 value is greater than 0.01.
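Steps 1 to 3 can be sketched together for the data rows with awk's `split()` function, which splits a string on a separator into an array. This is an illustration under the assumption that the file is tab-separated as described; the header line and the dot-to-(-1) replacement are deliberately left out, and the shell variable names `rows` and `result` exist only for this example.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3 for data rows only (the header line is not handled here).
# Sample rows are the Exo rows from Test.txt, with spaces turned into tabs.
rows=$(printf '%s\n' \
    "Exo 0 10 . 1.50 . 20:-2 30:0.9 50:50 50" \
    "Exo 1 20 . 1.50 . 20:-1 30:-1 50:50 50" \
    "Exo 2 30 . 1.50 . 20:0.02 30:0.9 50:50 50" \
    "Exo 3 40 . 1.50 . 20:-1 30:-2 50:50 50" | tr ' ' '\t')

result=$(printf '%s\n' "$rows" | awk -F'\t' -v OFS='\t' '{
    # Step 1: split columns 7 and 8 on ":" into mod1/mod2 and mod3/mod4
    split($7, a, ":")
    split($8, b, ":")
    # Step 3: skip the row when mod2 is greater than 0.01
    # (a[2] + 0 forces a numeric rather than a string comparison)
    if (a[2] + 0 > 0.01) next
    # Step 2: Chr..Value, then mod1..mod4, then an empty comment column,
    # then the remaining original columns 6, 9 and 10
    print $1, $2, $3, $4, $5, a[1], a[2], b[1], b[2], "", $6, $9, $10
}')
printf '%s\n' "$result"
```

The row with `20:0.02` is dropped because 0.02 is greater than 0.01; the other three rows are kept with the empty comment field between mod4 and the old column 6.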

The final result needs to be stored in an output folder, for example:

Chr  Start  End   Alt  Value  mod1  mod2  mod3  mod4  comment 
Exo  0      10    -1   1.50   20    -2    30    0.9           -1  50:50  50
Exo  1      20    -1   1.50   20    -1    30    -1            -1  50:50  50
Exo  3      40    -1   1.50   20    -1    30    -2            -1  50:50  50

I tried the following and got some of the operations working, but some are still missing:

#!/bin/bash

cd /home/uxm/Desktop/Shell/

# Replace fields that consist only of a dot (.) with -1

awk -F'\t' '{for(i=1;i<=NF;i++){sub(/^\.$/,"-1",$i)}} 1' OFS="\t" Test.txt | tail >> Test1.txt

# Split column 7 on the ":" delimiter

awk '{ split($7, a, ":"); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"a[1]"\t"a[2]"\t"$8"\t"$9"\t"$10"\t"$11 >> "testfile1.tmp"; }' Test1.txt;
mv testfile1.tmp Test2.txt;

# Split the original column 8 (now column 9) on the ":" delimiter

awk '{ split($9, a, ":"); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"a[1]"\t"a[2]"\t"$10"\t"$11 >> "testfile2.tmp"; }' Test2.txt;
mv testfile2.tmp Test3.txt;

# Give names to the split columns

awk -F'\t' -v OFS="\t" 'NR==1{$11="nCol\tMod1\tMod2\tMod3\tMod4"}1' Test3.txt >> Test4.txt

# Keep the header and the rows whose first column is "Exo"

awk -F'\t' 'NR==1;{ if($1 == "Exo") { print }}' Test4.txt | tail >> Test5.txt
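In the attempt above, `tail` without options prints only the last 10 lines of its input, so on a file longer than 10 lines the header and the first rows would be silently dropped; `>>` also appends to whatever a previous run left in the output files. A minimal sketch of the dot-replacement step without `tail`, comparing each field to "." directly instead of calling `sub()` with a regex (the sample rows and variable names are for illustration only):

```shell
#!/usr/bin/env bash
# Sketch: replace fields that are exactly "." with -1, without piping through tail.
rows=$(printf '%s\n' \
    "Chr Start End Alt Value" \
    "Exo 0 10 . 1.50 . 20:-2 30:0.9 50:50 50" | tr ' ' '\t')

replaced=$(printf '%s\n' "$rows" | awk -F'\t' -v OFS='\t' '{
    for (i = 1; i <= NF; i++)
        if ($i == ".") $i = -1   # string comparison, so "1.50" is left alone
    print
}')
printf '%s\n' "$replaced"
```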

Answer 1

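Here is an awk script (saved as script.awk) that performs the steps you listed. The benefit of doing everything in one script is that you don't have to run awk several times and store intermediate results in files or variables.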

BEGIN { OFS = FS = "\t" }
NR == 1 {
    # Add new column headers

    # First four "mod" headers
    for (i = 1; i <= 4; ++i)
        $(NF + 1) = "mod" i

    # Then a "comment" header
    $(NF + 1) = "comment"

    # Output and continue with next input line
    print
    next
}

# Ignore lines that don't have "Exo" in the first column
$1 != "Exo" { next }

{
    # Working our way "backwards" from column 13 down to 1

    # Shift the last two columns right by three steps
    $13 = $10
    $12 = $9

    # Set column 11 to column 6, or to -1 if it's a dot
    if ($6 == ".")
        $11 = -1
    else
        $11 = $6 

    # Empty the comment column
    $10 = ""

    # Move column 8 into column 9
    $9 = $8

    # Split column 9 into columns 8 and 9
    split($9, a, ":")
    $9 = a[2]
    $8 = a[1]

    # Split column 7 into columns 6 and 7
    split($7, a, ":")
    $7 = a[2]
    $6 = a[1]

    # Column 5 remains unmodified

    # Put -1 in column 4 if it's a dot
    if ($4 == ".") $4 = -1

    # Columns 1, 2, 3 remain unmodified
}

# Output if we want this line
$7 <= 0.01 { print }

Run it:

$ awk -f script.awk Test.txt
Chr     Start   End     Alt     Value   mod1    mod2    mod3    mod4    comment
Exo     0       10      -1      1.50    20      -2      30      0.9             -1      50:50   50
Exo     1       20      -1      1.50    20      -1      30      -1              -1      50:50   50
Exo     3       40      -1      1.50    20      -1      30      -2              -1      50:50   50
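The script prints to standard output, while the question asks for the result to be saved in an output folder. The sketch below is self-contained: it recreates the sample Test.txt in a temporary directory and writes the result to output/Test_out.txt (the folder and output file names are hypothetical). Instead of shifting columns in place as the script above does, it builds each output row with a single `print`; this is a different technique that, for this input, produces the same lines as shown above.

```shell
#!/usr/bin/env bash
# Sketch: transform the sample table and save it in an "output" folder.
# "output" and "Test_out.txt" are hypothetical names; adjust as needed.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Recreate the sample Test.txt from the question (tab-separated).
printf '%s\n' \
    "Chr Start End Alt Value" \
    "Exo 0 10 . 1.50 . 20:-2 30:0.9 50:50 50" \
    "Exo 1 20 . 1.50 . 20:-1 30:-1 50:50 50" \
    "Exo 2 30 . 1.50 . 20:0.02 30:0.9 50:50 50" \
    "Exo 3 40 . 1.50 . 20:-1 30:-2 50:50 50" \
    "Nem 3 40 . 1.50 . 20:-1 30:-2 50:50 50" | tr ' ' '\t' > Test.txt

mkdir -p output
awk -F'\t' -v OFS='\t' '
NR == 1 { print $0, "mod1", "mod2", "mod3", "mod4", "comment"; next }
$1 != "Exo" { next }
{
    split($7, a, ":")                  # step 1: mod1, mod2
    split($8, b, ":")                  #         mod3, mod4
    if (a[2] + 0 > 0.01) next          # step 3: drop rows with mod2 > 0.01
    if ($4 == ".") $4 = -1             # a dot in Alt becomes -1
    if ($6 == ".") $6 = -1             # a dot in unnamed column 6 becomes -1
    # step 2: mod1..mod4 after Value, then an empty comment column
    print $1, $2, $3, $4, $5, a[1], a[2], b[1], b[2], "", $6, $9, $10
}' Test.txt > output/Test_out.txt

cat output/Test_out.txt
```

With the real files in place, the same idea reduces to `mkdir -p output` followed by `awk -f script.awk Test.txt > output/Test_out.txt`.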

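Judging from your own code, I assumed you are only interested in the Exo lines, so the script only looks at those. Also from your code, I assumed that any dot in the Alt column (and in the original first unnamed column) should be changed to -1.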
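If rows with other values in the first column are needed later, the hard-coded test `$1 != "Exo" { next }` could take its value from the command line through awk's `-v` option instead. A minimal sketch of that idea, where `want` is a hypothetical variable name and `NR == 1` keeps the header:

```shell
#!/usr/bin/env bash
# Sketch: choose the first-column value at run time with awk -v
# instead of hard-coding "Exo". The variable name "want" is hypothetical.
rows=$(printf '%s\n' \
    "Chr Start End" \
    "Exo 0 10" \
    "Nem 3 40" | tr ' ' '\t')

selected=$(printf '%s\n' "$rows" | awk -F'\t' -v want='Nem' 'NR == 1 || $1 == want')
printf '%s\n' "$selected"
```

In the script above this would mean changing the rule to `$1 != want { next }` and running `awk -v want=Exo -f script.awk Test.txt`.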
