Abbreviate multiple column names in Linux, keeping the last field

Question 1

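I would write a short program that copies the original file to a new file with the short names. Keeping the original file gives you a backup if something goes wrong. What you write depends on which language you are comfortable with. That could be your shell, such as Bash, or any other language, such as Java, C, Perl, Python, etc.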

Here is some pseudocode, where old is the original file and new is the new file:

create new
begin a loop to read each line in old
   read line from old
   delete all characters from line up to and including the last "/"
   delete all characters from line after the first 7
   // This is what you want to save unless it conflicts with a previously saved line
   determine if you have a conflict
   if there is a conflict
      add a number to the end of line to make it unique
   save line to new
end of loop
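
One possible way to turn the pseudocode above into something runnable is a small shell function that hands the per-line work to awk. This is only a sketch: the function name shorten_names is made up for illustration, it assumes old contains one path per line, and it resolves conflicts by appending a counter, so a conflicting name can end up longer than 7 characters.

```shell
# shorten_names OLD NEW (hypothetical helper, not from the original answer)
# Reads one path per line from OLD and writes the shortened names to NEW.
# OLD is never modified, so it remains available as a backup.
shorten_names() {
    awk '
    {
        name = $0
        sub(/.*\//, "", name)        # delete everything up to and including the last "/"
        name = substr(name, 1, 7)    # keep only the first 7 characters
        candidate = name
        n = 0
        while (candidate in seen) {  # conflict: append a counter until the name is unique
            n++
            candidate = name n
        }
        seen[candidate] = 1
        print candidate
    }' "$1" > "$2"
}
```

Calling shorten_names old new leaves old untouched and writes one shortened name per line to new.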

Question 2

Suppose I have a file with 4 columns and two rows:

host:~ # cat file2
/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample3.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample4.so.rg.mk.bam
abc def ghi jkl

This command worked for me (not very convenient, but still):

host:~ # sed -i -e 's/^\///g' -e 's/[[:alnum:]]\+\///g' -e 's/\.[[:alnum:]]\+//g' -e 's/\///g' file2
host:~ # cat file2
sample1 sample2 sample3 sample4
abc def ghi jkl

I'm sure there is a more efficient way, but you could give this a try.
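
Note that the command above applies its substitutions to every line, so data rows containing "/" or "." would be changed as well, and -i overwrites the file right away. A more cautious sketch, assuming the paths contain no whitespace, restricts the edits to line 1 and prints the result instead of editing in place. The function name strip_header is hypothetical, and the expressions are a simplified variant, not the exact ones from the command above.

```shell
# strip_header FILE (hypothetical helper)
# Prints FILE with directory parts and extensions removed from the
# first line only. The file itself is left untouched (no -i).
strip_header() {
    # 1st expression: in each whitespace-separated token, drop everything through the last "/".
    # 2nd expression: in each token, drop everything from the first "." onward.
    sed -e '1s|[^[:space:]]*/||g' -e '1s|\.[^[:space:]]*||g' "$1"
}
```

Once the printed output looks right, it can be redirected to a new file, which keeps the original as a backup.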

Question 3

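Assuming the unwanted suffix is always ".so.rg.mk.bam", GNU sed's e (evaluate) flag can be used to run basename on the filenames in the first line only and replace that line with the desired output. Because basename -a prints one name per line, the output is piped through xargs to join the names back onto a single line:

sed -i '1s/.*/basename -as .so.rg.mk.bam -a & | xargs/e' filename

For non-GNU seds, head can be used instead:

sed -i '1s/.*/'"$(basename -as .so.rg.mk.bam -a $(head -1 filename) | xargs)"'/' filename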

--

Note: to see the result without changing the file, first try the command without -i.
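
If basename on your system lacks -a and -s, the same idea can be sketched with POSIX parameter expansion in place of basename. The function name header_basenames and its optional suffix argument are made up for illustration. It assumes whitespace-separated paths on line 1 and prints the result rather than modifying the file.

```shell
# header_basenames FILE [SUFFIX] (hypothetical helper)
# Prints FILE with each path on line 1 reduced to its basename minus
# SUFFIX; all other lines pass through unchanged.
# The body runs in a subshell so that set -f and the variables stay local.
header_basenames() (
    file=$1
    suffix=${2:-.so.rg.mk.bam}
    out=
    set -f                              # no globbing while the header is split
    for path in $(head -n 1 "$file"); do
        name=${path##*/}                # strip the directory part
        name=${name%"$suffix"}          # strip the suffix, if present
        out="${out:+$out }$name"        # join names with single spaces
    done
    printf '%s\n' "$out"
    tail -n +2 "$file"                  # remaining lines, unchanged
)
```

Redirecting the output to a new file gives the edited copy while the original stays in place.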

Question 4

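You can use awk to process the header. The following awk script acts only on the first line (NR==1). It loops over all of the fields in that line, one at a time. For each field, it performs the following steps:

Find the first instance of the text /sample and trim the text up to that instance (and past the /).
Find the first instance of a period in the remainder, and trim off everything from that period onward.
If the remainder is too long, trim the sample text as needed. The formula for how many characters to keep is "6 plus the position of the first digit minus the total length".
After processing the field, print it with a trailing space.
Once we have finished looping over all of the fields, print a newline.

Note that this leaves a trailing space at the end of the line.

The awk script: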

NR == 1 {
  for(i=1; i <= NF; i++) {
    tail=substr($i, 1 + match($i, "/sample"))   # delete up to the first instance of "/sample"
    tail=substr(tail, 1, index(tail, ".") - 1)  # find, then stop short of, the first period
    if (length(tail) > 7) {                     # if it's too long
        match(tail, "[0-9]")                    # find the first digit
                                                # trim the beginning down, then append the number
        tail=substr(tail, 1, 6 + RSTART - length(tail))substr(tail, RSTART)
    }
    printf "%s ", tail                          # explicit format, so a "%" in the name is safe
  }
  print ""
}

Sample input:

/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample47.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample4631.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample1234567.so.rg.mk.bam

The sample output is:

sample1 sampl47 sam4631 1234567
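
The script as written prints only the header line. To apply it to a whole file, one option is to wrap the same logic in a shell function that also passes the remaining lines through. This is a sketch: abbreviate_header is a hypothetical name, and it differs from the script above in that it joins the fields without a trailing space and leaves a field as it is when it has no period or no digit.

```shell
# abbreviate_header FILE (hypothetical wrapper around the awk logic above)
# Prints FILE with the header fields abbreviated; other lines are unchanged.
abbreviate_header() {
    awk '
    NR == 1 {
        out = ""
        for (i = 1; i <= NF; i++) {
            tail = substr($i, 1 + match($i, "/sample"))    # drop everything through the "/" before "sample"
            dot = index(tail, ".")
            if (dot > 0) tail = substr(tail, 1, dot - 1)   # cut at the first period, if there is one
            if (length(tail) > 7 && match(tail, /[0-9]/)) {
                # shorten the prefix so that prefix plus number fit in 7 characters
                tail = substr(tail, 1, 6 + RSTART - length(tail)) substr(tail, RSTART)
            }
            out = out (i > 1 ? " " : "") tail              # join with single spaces, no trailing space
        }
        print out
        next
    }
    { print }   # every other line passes through unchanged
    ' "$1"
}
```

Running abbreviate_header file > file.new keeps the original file as a backup, in the same spirit as the copy-to-a-new-file approach from the first answer.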

Answer