Remove certain fields from a line

I have the following lines in a file:

Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml
Modified folders: html/port5/.DS_Store
Modified folders: html/trap/dev8/.DS_Store
Modified folders: html/bla3/test/appl/.DS_Store
Modified folders: html/bla4/pro1/app/bla/Api2.php
Modified folders: html/bla10/dev/appl/language/.DS_Store
Modified folders: html/bla11/dev/appl/language/abc.txt

This is basically the output of rsync. I want to list every line of the file truncated to at most 3 directory levels, for example:

Modified folders: html/project1/old
Modified folders: html/port5
Modified folders: html/trap/dev8
Modified folders: html/bla3/test
Modified folders: html/bla4/pro1
Modified folders: html/bla10/dev
Modified folders: html/bla11/dev

Can anyone give me a command or shell script that does this?

Answer 1

Maybe something like this:

$ sed -r 's|/[^/]*$||' file | sed -r 's|([^/]*/?[^/]*/?[^/]*).*|\1|'
Modified folders: html/project1/old
Modified folders: html/port5
Modified folders: html/trap/dev8
Modified folders: html/bla3/test
Modified folders: html/bla4/pro1
Modified folders: html/bla10/dev
Modified folders: html/bla11/dev

Or you can do the second part with cut:

sed -r 's|/[^/]*$||' file | cut -d '/' -f 1,2,3

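Notes

  • -r: use ERE (extended regular expressions)
  • s|old|new|: replace old with new
  • [^/]*: any number of characters other than /
  • $: end of line
  • /?: zero or one /
  • (pattern): save pattern for later reference as \1
  • .*: any number of any characters
  • | (unquoted): shell pipe, which passes the output of the command on its left to the command on its right
  • cut -d '/': use / as the delimiter
  • -f 1,2,3: print the first three fields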
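
As a small sketch of the same idea, the two expressions from this answer can also be passed to a single sed call with -e, which makes them easy to try on sample input. The function name trim_to_three and the printf input are assumptions made for illustration, and -E is used in place of -r on the assumption that the local sed accepts it (GNU and BSD sed both do).

```shell
#!/bin/sh
# trim_to_three is a hypothetical helper name, not part of the original answer.
# First expression: drop the last path component (the file name).
# Second expression: keep at most three /-separated fields.
trim_to_three() {
    sed -E -e 's|/[^/]*$||' -e 's|([^/]*/?[^/]*/?[^/]*).*|\1|'
}

# Two sample lines taken from the question.
printf '%s\n' \
    'Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml' \
    'Modified folders: html/port5/.DS_Store' | trim_to_three
# Prints:
# Modified folders: html/project1/old
# Modified folders: html/port5
```

Because "Modified folders: html" contains no slash, it counts as the first field, so three fields correspond to html plus two further directories.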

Answer 2

The following script does (almost) what you asked for.

#!/usr/bin/env perl

use strict;
use warnings;

while(<DATA>) {
    s!^(Modified\s+folders:\s+)((?:[^/]+/){1,3}).*?$!$1$2!;
    print;
}

__DATA__
Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml
Modified folders: html/port5/.DS_Store
Modified folders: html/trap/dev8/.DS_Store
Modified folders: html/bla3/test/appl/.DS_Store
Modified folders: html/bla4/pro1/app/bla/Api2.php
Modified folders: html/bla10/dev/appl/language/.DS_Store
Modified folders: html/bla11/dev/appl/language/abc.txt

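It reads each line of input, picks out some values (my regex approach), replaces the line with the picked values, and finally prints the modified line (to STDOUT).

Output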

Modified folders: html/project1/old/
Modified folders: html/port5/
Modified folders: html/trap/dev8/
Modified folders: html/bla3/test/
Modified folders: html/bla4/pro1/
Modified folders: html/bla10/dev/
Modified folders: html/bla11/dev/

If we write the regex on a single line:

s!^(Modified\s+folders:\s+)((?:[^/]+/){1,3}).*?$!$1$2!;

it looks a bit scary, but it is actually quite simple. The basic operator is Perl's substitution operator, s///.

s/foo/bar/;

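replaces the first occurrence of foo with bar (add the g modifier to replace every occurrence). s lets us change the delimiter from / to something else. I used ! here, so we can also write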

s!foo!bar!;

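Here the ! does not mean "not"; it is just an arbitrary character. A letter works too, e.g. s LfooLbarL; (with a space after the s, so that Perl does not read the whole thing as one identifier). We do this because with the standard / we would have to escape every / in the arguments (this is known as leaning toothpick syndrome). Suppose we want to replace the path /old/path with /new/path. Now compare: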

s/\/old\/path/\/new\/path/; # escaping of / needed
s!/old/path!/new/path!;     # no escaping of / needed (but of ! if we had one in the text)

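We can also apply the x modifier to s///. It allows arbitrary whitespace (even newlines and comments) in the pattern (the left-hand side) to improve readability. The loop can now be written as: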

while(<DATA>) {
    s!^                         # match beginning of line
      (Modified\s+folders:\s+)  # the word "Modified", followed by 1 or more
                                # whitespace \s+,
                                # the literal "folders:", also followed by 1 or 
                                # more whitespace.
                                # We capture that match in $1 (that's why we have 
                                # parens around it).
      (                         # begin of 2nd capture group (in $2)
        (?:                     #   begin a group that is NOT captured (because of the "?:")
         [^/]+/                 #   one or more characters that are not a slash followed by a slash
        )                       #   end of group
        {1,3}                   #   this group should appear one to three times
      )                         # close capture group $2, i.e. remember the 1-3x slash thing
      .*?$                      # followed by arbitrary characters up to the end of line
     !$1$2!x;                   # Replace the line with the two found captures $1 and $2, i.e.
                                # with the text "Modified folders:" and the 1-3x slash thing.
    print;
}

The complete "script" can also be written as a one-liner:

perl -pe 's!^(Modified\s+folders:\s+)((?:[^/]+/){1,3}).*?$!$1$2!x;' file

Update

I just realized that the string Modified folders: can itself be treated as a path component. The pattern can therefore be simplified to

perl -pe 's!^((?:[^/]+/){1,3}).*?$!$1!;' file

Answer 3

grep -oP '^.*?(/.*?){0,2}(?=/)'

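A brief explanation of the rather cryptic regex used:

  • ^: start of line
  • .*?: a run of characters (only as many as necessary) to match the text before the first /
  • (/.*?){0,2}: 0, 1 or 2 directories
  • (?=/): a lookahead, meaning the match must be followed by a /, which is not itself included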
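
The command above names no file, so it reads standard input. A quick way to try it is sketched below, assuming a grep built with PCRE support (-P is a GNU grep option; -o prints only the matched part of each line). The function name first_three and the printf input are illustrative assumptions.

```shell
#!/bin/sh
# first_three is a hypothetical helper name wrapping the command from this answer.
first_three() {
    grep -oP '^.*?(/.*?){0,2}(?=/)'
}

# Two sample lines taken from the question.
printf '%s\n' \
    'Modified folders: html/project1/old/dev/vendor/symfony/yaml/Tests/bla.yml' \
    'Modified folders: html/port5/.DS_Store' | first_three
# Prints:
# Modified folders: html/project1/old
# Modified folders: html/port5
```

Because of the lookahead, the final / is required but never printed, so the output has no trailing slash.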
