I have the following kind of output from a find and grep pipeline:
./Columbia/815425_0001104659-11-049107.txt: CENTRAL INDEX KEY: 0000815425
./Columbia/815425_0001104659-12-060231.txt: CENTRAL INDEX KEY: 0000815425
./Columbia/815425_0001104659-13-066298.txt: CENTRAL INDEX KEY: 0000815425
./Dimensional Advisors/355437_0001137439-04-000108.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-05-000205.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-06-000306.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-08-000364.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-09-000076.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-12-000295.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001140361-10-035592.txt: CENTRAL INDEX KEY: 0000355437
I would like to get:
Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
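I was thinking of sed and grep, but I am confused about how to put everything together. Matching the first part (how do I match only what comes before the /?):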
erik Funds$ cat myoutput | egrep -o "[A-Z].*/"
Columbia/
Columbia/
Columbia/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
Dimensional Advisors/
and the last 10 digits:
erik Funds$ cat myoutput | egrep -o "[0-9]{10}$"
0000815425
0000815425
0000815425
0000355437
0000355437
0000355437
0000355437
0000355437
0000355437
0000355437
Answer 1
Use awk with / as the field separator, then print field 2 and field 3 (with the necessary zero padding):
... | awk -F/ '{ printf("%s %010d\n", $2, $3) }'
Example:
$ cat file.txt
./Columbia/815425_0001104659-11-049107.txt: CENTRAL INDEX KEY: 0000815425
./Columbia/815425_0001104659-12-060231.txt: CENTRAL INDEX KEY: 0000815425
./Columbia/815425_0001104659-13-066298.txt: CENTRAL INDEX KEY: 0000815425
./Dimensional Advisors/355437_0001137439-04-000108.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-05-000205.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-06-000306.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-08-000364.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-09-000076.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001137439-12-000295.txt: CENTRAL INDEX KEY: 0000355437
./Dimensional Advisors/355437_0001140361-10-035592.txt: CENTRAL INDEX KEY: 0000355437
$ awk -F/ '{ printf("%s %010d\n", $2, $3) }' file.txt
Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
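The command works because field 3 begins with the key's digits: awk converts a string to a number by reading its leading numeric prefix, and %010d then pads that number with zeros. A minimal sketch of that behavior on one sample line from the question (the variable names line and result are only for illustration):

```shell
# One sample line from the question.
line='./Columbia/815425_0001104659-11-049107.txt: CENTRAL INDEX KEY: 0000815425'

# Splitting on "/" gives: $1 = ".", $2 = "Columbia", $3 = file name plus the rest.
# %010d converts $3 to a number (its leading digits, 815425) and zero-pads to 10 digits.
result=$(printf '%s\n' "$line" | awk -F/ '{ printf("%s %010d\n", $2, $3) }')
printf '%s\n' "$result"
```

This relies on the file name starting with the same number as the key. If that does not hold for your files, the number printed would come from the file name, not from the end of the line.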
Answer 2
$ sed -E 's|^\./([^/]+)/.*\s([0-9]+)$|\1 \2|' myoutput
Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
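-E           extended regular expressions
^\./         matches ./ at the start of the line
([^/]+)      captures the non-/ characters
/.*\s        matches / and any number of characters, followed by a whitespace character
([0-9]+)$    captures the digits at the end of the line
\1 \2        pastes the captured groups with a space in between
sed allows any delimiter other than \ and newline, so | is used here to avoid escaping /.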
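Note that \s is not part of POSIX regular expressions (GNU sed supports it as an extension). For a sed without it, a variant using the POSIX class [[:space:]] should behave the same on this input. This is a sketch under that assumption, with two sample lines fed in through a here-document instead of the myoutput file:

```shell
# Same substitution as above, with [[:space:]] in place of the GNU-only \s.
result=$(sed -E 's|^\./([^/]+)/.*[[:space:]]([0-9]+)$|\1 \2|' <<'EOF'
./Columbia/815425_0001104659-11-049107.txt: CENTRAL INDEX KEY: 0000815425
./Dimensional Advisors/355437_0001137439-04-000108.txt: CENTRAL INDEX KEY: 0000355437
EOF
)
printf '%s\n' "$result"
```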
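Answer 3
awk - splitting with FS
If you set awk to split the input on the slash delimiter and then re-split it on a space delimiter, you can pick out the right fields: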
<infile awk '{ n=$2; FS=" +"; $0=$0; print n, $NF; FS="/" }' FS=/
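GNU awk - describing the fields with FPAT
If you describe the fields correctly with FPAT, you can get the desired result. Here is an example that works for your input (note that the last field keeps the whitespace that follows the final colon, so the number is preceded by extra whitespace):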
<infile awk '{ print $1, $NF }' FPAT='[[:alnum:][:space:]]+'
Output
Columbia 0000815425
Columbia 0000815425
Columbia 0000815425
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
Dimensional Advisors 0000355437
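A related sketch, not taken from the answer itself: keep awk's default whitespace splitting so that $NF is the key, and use split() to cut the record on / for the directory name. The array name parts is arbitrary, and the sample lines are fed in through a here-document.

```shell
result=$(awk '{
    # Split the whole record on "/" into parts[]; parts[2] is the directory name.
    split($0, parts, "/")
    # With the default FS, the last whitespace-separated field is the key.
    print parts[2], $NF
}' <<'EOF'
./Columbia/815425_0001104659-11-049107.txt: CENTRAL INDEX KEY: 0000815425
./Dimensional Advisors/355437_0001137439-04-000108.txt: CENTRAL INDEX KEY: 0000355437
EOF
)
printf '%s\n' "$result"
```

This avoids switching FS inside the action and does not need the GNU-only FPAT variable.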
Answer 4
Here is another solution, piping the data through tr and cut:
…|tr ':' '/'|cut -d'/' -f2,5|tr -d '/'
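The first tr replaces : with /, so that cut can use / as the delimiter to isolate the last column as well as the second. Because you extract several columns, the output of cut will contain a / as the separator unless you use --output-delimiter, but removing that character with tr -d is shorter.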
If you do not want multiple spaces, you can add a pipe through sed:
…|sed 's/  */ /g'
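To see what each stage contributes, the pipeline can be run one step at a time on a sample line. This is only an illustrative sketch: the variable names are made up, and tr -s ' ' is shown as an alternative way to squeeze repeated spaces, not as part of the original answer.

```shell
line='./Dimensional Advisors/355437_0001137439-04-000108.txt: CENTRAL INDEX KEY: 0000355437'

# Stage 1: turn ":" into "/" so every column is "/"-delimited.
step1=$(printf '%s\n' "$line" | tr ':' '/')
# Stage 2: keep columns 2 and 5; cut joins them with the "/" delimiter.
step2=$(printf '%s\n' "$step1" | cut -d'/' -f2,5)
# Stage 3: delete the remaining "/" and squeeze any runs of spaces.
step3=$(printf '%s\n' "$step2" | tr -d '/' | tr -s ' ')

printf '%s\n' "$step1" "$step2" "$step3"
```

The space in the final line comes from the input itself: it is the space that followed the last colon.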