Match the first part of a pathname and the number at the end of the line

I have the following kind of output from a find and grep pipeline:

./Columbia/815425_0001104659-11-049107.txt:         CENTRAL INDEX KEY:              0000815425
./Columbia/815425_0001104659-12-060231.txt:         CENTRAL INDEX KEY:              0000815425
./Columbia/815425_0001104659-13-066298.txt:         CENTRAL INDEX KEY:              0000815425
./Dimensional Advisors/355437_0001137439-04-000108.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-05-000205.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-06-000306.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-08-000364.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-09-000076.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-12-000295.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001140361-10-035592.txt:         CENTRAL INDEX KEY:              0000355437

I would like to get:

Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437

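I was thinking of sed or grep, but I am confused about how to put everything together. Matching the first part (how do I match only what comes before the /?):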

erik Funds$ cat myoutput | egrep -o "[A-Z].*/"
Columbia/
Columbia/
Columbia/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/

And the final 10 digits:

erik Funds$ cat myoutput | egrep -o "[0-9]{10}$"
0000815425
0000815425
0000815425
0000355437
0000355437
0000355437
0000355437
0000355437
0000355437
0000355437

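Answer 1

Use awk with / as the field separator, then print fields 2 and 3 (with the necessary zero padding). Field 3 is everything after the second /, and %d converts only its leading digits (for example 815425), so this relies on each filename starting with the same number as the index key: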

... | awk -F/ '{ printf("%s %010d\n", $2, $3) }'

Example:

$ cat file.txt 
./Columbia/815425_0001104659-11-049107.txt:         CENTRAL INDEX KEY:              0000815425
./Columbia/815425_0001104659-12-060231.txt:         CENTRAL INDEX KEY:              0000815425
./Columbia/815425_0001104659-13-066298.txt:         CENTRAL INDEX KEY:              0000815425
./Dimensional Advisors/355437_0001137439-04-000108.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-05-000205.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-06-000306.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-08-000364.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-09-000076.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001137439-12-000295.txt:         CENTRAL INDEX KEY:              0000355437
./Dimensional Advisors/355437_0001140361-10-035592.txt:         CENTRAL INDEX KEY:              0000355437

$ awk -F/ '{ printf("%s %010d\n", $2, $3) }' file.txt
Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
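
The printf form above prints the digits at the start of field 3 (the filename), padded to ten places. As a hedged variant, the sketch below reads the number from the end of the line instead. It assumes the key is always the last whitespace-separated token; the variable names line, result and parts are only illustrative.

```shell
# Sample line in the same shape as the find/grep output above.
line='./Dimensional Advisors/355437_0001137439-04-000108.txt:         CENTRAL INDEX KEY:              0000355437'

# Split on "/" to get the directory name, then split the last field on
# whitespace and keep its final token (the number at the end of the line).
result=$(printf '%s\n' "$line" |
  awk -F/ '{ n = split($NF, parts, " "); print $2, parts[n] }')

printf '%s\n' "$result"
```

Because the number is copied as text, the leading zeros survive without any padding format.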

Answer 2

$ sed -E 's|^\./([^/]+)/.*\s([0-9]+)$|\1 \2|' myoutput
Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
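  • -E enables extended regular expressions
  • ^\./ matches ./ at the start of the line
  • ([^/]+) captures a run of non-/ characters
  • /.*\s matches / and any number of characters, followed by a whitespace character
  • ([0-9]+)$ captures the digits at the end of the line
  • \1 \2 pastes the captured groups with a space between them

sed allows any character other than backslash or newline as the delimiter, so | is used here to avoid having to escape /.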
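
One portability note: \s is a GNU extension rather than part of POSIX regular expressions. The sketch below is the same substitution with the POSIX class [[:space:]] swapped in, assuming a sed that accepts -E. The variables line and result are only illustrative.

```shell
# Sample line in the same shape as the find/grep output above.
line='./Columbia/815425_0001104659-11-049107.txt:         CENTRAL INDEX KEY:              0000815425'

# Same substitution as above, with [[:space:]] in place of \s.
result=$(printf '%s\n' "$line" |
  sed -E 's|^\./([^/]+)/.*[[:space:]]([0-9]+)$|\1 \2|')

printf '%s\n' "$result"
```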

Answer 3

awk - splitting with FS

If you set awk to split the input on slashes and then re-split it on spaces, you can pick out the right fields:

<infile awk '{ n=$2; FS=" +"; $0=$0; print n, $NF; FS="/" }' FS=/
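
The one-liner above is dense, so here is the same idea spread over several lines with comments and run on one sample line. It is a sketch that assumes an awk in which assigning to $0 re-splits the record using the current FS (gawk behaves this way). It uses -F/ rather than a trailing FS=/ argument, and the variables line, result and name are only illustrative.

```shell
# Sample line in the same shape as the find/grep output above.
line='./Dimensional Advisors/355437_0001137439-04-000108.txt:         CENTRAL INDEX KEY:              0000355437'

result=$(printf '%s\n' "$line" | awk -F/ '
{
  name = $2      # fields were split on "/", so $2 is the directory name
  FS = " +"      # switch to splitting on runs of spaces
  $0 = $0        # reassigning $0 re-splits the record with the new FS
  print name, $NF
  FS = "/"       # restore "/" before the next record is read
}')

printf '%s\n' "$result"
```

The directory name has to be saved before the re-split, because a name such as "Dimensional Advisors" would otherwise be broken apart at its space.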

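GNU awk - describing the fields with FPAT

If you describe the fields correctly with FPAT, you can get the desired result. Here is an example that works for your input: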

<infile awk '{ print $1, $NF }' FPAT='[[:alnum:][:space:]]+'

Output (from the FPAT version; the spaces preceding the number are part of the last field):

Columbia               0000815425
Columbia               0000815425
Columbia               0000815425
Dimensional Advisors               0000355437
Dimensional Advisors               0000355437
Dimensional Advisors               0000355437
Dimensional Advisors               0000355437
Dimensional Advisors               0000355437
Dimensional Advisors               0000355437
Dimensional Advisors               0000355437

Answer 4

Here is another solution that pipes the data through tr and cut:

…|tr ':' '/'|cut -d'/' -f2,5|tr -d '/'

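The first tr replaces : with /, so that cut can use / as the delimiter to isolate the last column as well as the second. Because you extract several columns, the output of cut will contain a / as the separator unless you use --output-delimiter, but deleting that character with tr -d is shorter.

If you do not want the multiple spaces, you can add a pipe through sed: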
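
For comparison, here is a sketch of the --output-delimiter form mentioned above. It assumes GNU cut, since the option is not available in every cut implementation; the variables line and result are only illustrative.

```shell
# Sample line in the same shape as the find/grep output above.
line='./Columbia/815425_0001104659-11-049107.txt:         CENTRAL INDEX KEY:              0000815425'

# GNU cut only: --output-delimiter replaces the "/" that cut would
# otherwise print between the selected fields.
result=$(printf '%s\n' "$line" |
  tr ':' '/' |
  cut -d'/' -f2,5 --output-delimiter=' ')

printf '%s\n' "$result"
```

The padding spaces in front of the number are still part of field 5, so they remain in the output.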

…|sed 's/  */ /g'
