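I have a text file made from the output of the repository management tool aptly. It lists my published repositories, and I need to extract information from it.
The file format is as follows: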
Published repositories:
* test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
* test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
* test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...
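The last line of the output ends with a newline.
The "Published repositories:" line is not needed.
For every line beginning with " *", I need to strip the extraneous information and keep only the snapshot names. There is no way to do this within aptly itself. The desired output for the first of these lines is: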
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
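The square brackets are not needed either, so solutions that keep them and solutions that remove them are both fine. I would prefer a sed or awk solution, but any working solution would be greatly appreciated.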
Answer 1
A Perl approach:
$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*\[(.+?)\]/g); print "$n @k"' file
test_repo_one/xenial xenial-main_20190311 xenial-multiverse_20190311 xenial-restricted_20190311 xenial-universe_20190311
test_repo_one/xenial-security xenial-security-main_20190311 xenial-security-multiverse_20190311 xenial-security-restricted_20190311 xenial-security-universe_20190311
test_repo_two/trusty trusty-main_20190312 trusty-multiverse_20190312 trusty-restricted_20190312 trusty-universe_20190312
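Explanation
- perl -lne: read the input file line by line (-n), strip the trailing newline (-l), and run the script given by -e on each line. The -l also adds a newline to each print call.
- next unless /^\s*\*\s*(\S+)/;: this finds the repository name. It matches lines that start with 0 or more whitespace characters (^\s*), then a * (\*), then 0 or more whitespace characters. The longest run of non-whitespace characters after that (\S+) is the name we want. If the line does not match this regex, next moves us on to the next line.
- $n=$1: save whatever the match above captured (the parenthesized (\S+), available as $1) as $n.
- @k=(/\{.+?:\s*\[(.+?)\]/g): find every case of a {, any other characters, then a :, whitespace and a [, and capture whatever sits between the [ and the ]. All the captured strings are saved in the array @k.
- print "$n @k": finally, print the repo name from above, $n, and the array @k.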
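For readers without Perl at hand, the two regex steps described above can be approximated in plain bash. This is only a sketch: extract_snapshots is a hypothetical helper name, and the bracket expressions stand in for Perl's \s, \S and lazy .+? quantifiers, which bash's regex engine does not support.

```shell
#!/bin/bash
# extract_snapshots is a hypothetical helper name, not part of the answer above.
extract_snapshots() {
    local line=$1 rest out
    # Keep the regexes in variables so bash does not re-interpret the special characters.
    local name_re='^[[:space:]]*\*[[:space:]]*([^[:space:]]+)'
    local snap_re='[{][^:]+:[[:space:]]*\[([^]]+)\]'

    # Step 1: like "next unless ...", give up on lines that are not "* name ..."
    [[ $line =~ $name_re ]] || return 1
    out=${BASH_REMATCH[1]}

    # Step 2: like the /g match, consume one "{component: [snapshot]" at a time
    rest=$line
    while [[ $rest =~ $snap_re ]]; do
        out+=" ${BASH_REMATCH[1]}"
        rest=${rest#*"${BASH_REMATCH[0]}"}   # drop everything up to and including this match
    done
    printf '%s\n' "$out"
}
```

Calling it as extract_snapshots "$line" inside a while read loop would print one output line per repository and return non-zero for lines such as "Published repositories:".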
If you would like to keep the square brackets, you can use:
$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*(\[.+?\])/g); print "$n @k"' file
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
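Answer 2
Two answers in one
I'm posting two answers here:
- A bash script that is hopefully easier to understand
- A one-liner using the common Linux utilities grep, sed and cut
The bash script in action
I have turned off gnome-terminal line wrapping to make the input and output files easier to read.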
───────────────────────────────────────────────────────────────────────────────────────────
rick@alien:~/askubuntu$ tput rmam # Turn off line wrap
───────────────────────────────────────────────────────────────────────────────────────────
rick@alien:~/askubuntu$ cat aptfilein
Published repositories:
* test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirr}
* test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: }
* test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirr}
...
───────────────────────────────────────────────────────────────────────────────────────────
rick@alien:~/askubuntu$ time aptfileparse.sh
5 lines read from aptfilein
3 lines written to aptfileout
real 0m0.025s
user 0m0.016s
sys 0m0.004s
───────────────────────────────────────────────────────────────────────────────────────────
rick@alien:~/askubuntu$ cat aptfileout
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_201]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_201]
───────────────────────────────────────────────────────────────────────────────────────────
rick@alien:~/askubuntu$
The actual bash script
Remember to make the script executable with chmod a+x script.sh
#!/bin/bash
# NAME: aptfileparse.sh
# PATH: ~/askubuntu
# DESC: Parse Apt File giving new lines.
# DATE: July 1, 2019.
# NOTE: For: https://askubuntu.com/questions/1127821/text-processing-aptly-output-file
# Program would be ~10 lines shorter (but harder to read) with arrays.
: <<'END'
/* -----------------------------------------------------------------------------
INPUT FILE LAYOUT
=================
Published repositories:
* test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
* test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
* test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...
OUTPUT FILE LAYOUT
==================
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
Five fields to extract: name, main, multiverse, restricted, universe
----------------------------------------------------------------------------- */
END
INPUT="aptfilein"
OUTPUT="aptfileout"
> "$OUTPUT" # Erase previous output file
# Read all input lines
while IFS= read -r line ; do
    let CountIn++
    [[ "$line" == " *"* ]] || continue  # skip lines not starting with " *"
    # Get name
    line="${line#" * "}"                # remove leading " * "
    lout="${line%%" "*}"                # name is up to next " "
    line="${line#*" "}"                 # remove name from line
    # Get main
    line="${line#*"{main: "}"           # remove everything through "{main: "
    lout="$lout ${line%%":"*}"          # main is up to next ":"
    line="${line#*":"}"                 # remove main from line
    # Get multiverse
    line="${line#*"{multiverse: "}"     # remove everything through "{multiverse: "
    lout="$lout ${line%%":"*}"          # multiverse is up to next ":"
    line="${line#*":"}"                 # remove multiverse from line
    # Get restricted
    line="${line#*"{restricted: "}"     # remove everything through "{restricted: "
    lout="$lout ${line%%":"*}"          # restricted is up to next ":"
    line="${line#*":"}"                 # remove restricted from line
    # Get universe
    line="${line#*"{universe: "}"       # remove everything through "{universe: "
    lout="$lout ${line%%":"*}"          # universe is up to next ":"
    line="${line#*":"}"                 # remove universe from line
    # Append line to output file with leading space
    echo " $lout" >> "$OUTPUT"
    let CountOut++
done < "$INPUT"
echo "$CountIn lines read from $INPUT"
echo "$CountOut lines written to $OUTPUT"
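The script's header notes that it could be shorter at the cost of readability. One way that might look is a loop over the four component names instead of four copies of the same steps. The following is only a sketch under that assumption: parse_line is a hypothetical function name, and it expects a line that starts with " * " and has the four components in the order shown.

```shell
#!/bin/bash
# parse_line is a hypothetical helper; it applies the same parameter
# expansions as the script above, but in a loop over the component names.
parse_line() {
    local line=$1 lout comp
    line="${line#" * "}"                # remove leading " * "
    lout="${line%%" "*}"                # name is up to next " "
    for comp in main multiverse restricted universe; do
        line="${line#*"{$comp: "}"      # remove everything through "{<component>: "
        lout="$lout ${line%%":"*}"      # snapshot is up to next ":"
    done
    printf '%s\n' "$lout"
}
```

Inside the while loop, the four "Get ..." sections could then be replaced by something like lout=$(parse_line "$line"), although the command substitution starts a subshell per line and would probably cost some speed.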
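A one-liner with common utilities
One-liners are very popular in the Linux community, and some excellent answers using awk and perl have been posted in this Q&A. Here is an example using common utilities that most experienced command-line users are familiar with: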
$ time grep ^" \*" aptfilein | sed 's/ \* //;s/ /: /;s/^/ /' | cut -d':' -f1,3,6,9,12 --output-delimiter=''
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
real 0m0.011s
user 0m0.003s
sys 0m0.008s
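- grep ^" \*" aptfilein - The grep command selects lines containing the search string. The caret (^) means the string must start at the beginning of the line. The backslash (\) means the asterisk/splat (*) is taken literally rather than as a wildcard that selects everything. In short, this grep command selects all lines in the file aptfilein that start with " *".
- sed is a "stream editor": it edits incoming lines, changes them, and passes them on. There are three changes here: 's/ \* //;s/ /: /;s/^/ /'. The changes sit between the quotes (') and are separated by semicolons (;). They are broken down in the next three points.
- s/ \* // - Search for the first occurrence of " * " and change it to nothing. This removes the " * " at the start of each line.
- s/ /: / - Search for the first space and change it to a colon (:) followed by a space. This is necessary to turn the first field into a key field. For example, test_repo_one/xenial becomes test_repo_one/xenial:.
- s/^/ / - Tells sed to insert a space at the beginning of each line.
- cut -d':' -f1,3,6,9,12 --output-delimiter='' - Use the cut command to select key fields # 1, 3, 6, 9 and 12. The key fields are delimited by colons, as the -d':' parameter specifies. Normally the output fields are delimited the same way, but the --output-delimiter='' parameter overrides that to nothing.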
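To see why fields 1, 3, 6, 9 and 12 are the ones to keep, it can help to print every colon-separated field with its number. The sketch below does that for a shortened, made-up sample line with only two components. The helper name number_fields and the sample text are assumptions for illustration, not part of the answer.

```shell
#!/bin/bash
# number_fields is a hypothetical helper: it applies the same sed edits as
# the one-liner, then prints each colon-separated field with its index.
number_fields() {
    sed 's/ \* //;s/ /: /;s/^/ /' |
        awk -F: '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

# Shortened, made-up sample line (two components instead of four)
sample=' * repo/xenial [i386,amd64] publishes {main: [snap-main]: Snapshot from mirror [m]: http//example/ xenial [src]}, {multiverse: [snap-multi]: Snapshot from mirror [mm]: http//example/ xenial [src]}'

printf '%s\n' "$sample" | number_fields
```

In the listing, field 1 holds the repository name and fields 3 and 6 hold the bracketed snapshot names. Each component adds three colons, which is why the snapshot fields are three apart. The count relies on the URLs containing no colon of their own, as in the sample data (http//...).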
Note: The one-liner is faster than the bash script because bash is slow at string processing.
Answer 3
My awk approach:
$ cat 1.txt
Published repositories:
* test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
* test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
* test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
$ awk '$1=="*"{split ($0, a, /:/); print $2 a[2] a[5] a[8] a[11]}' 1.txt
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
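The array indexes 2, 5, 8 and 11 depend on each line containing exactly three colons per component, which holds for the sample because its URLs are written http//... without a colon. If the real output contained http:// URLs (an assumption, since the sample does not show any), the indexes would shift. A sketch of an awk variant that searches for each {component: [snapshot] pair instead of counting fields follows. The function name snapshots is hypothetical.

```shell
#!/bin/bash
# snapshots is a hypothetical wrapper; it reads the files given as arguments,
# or standard input when called without arguments.
snapshots() {
    awk '$1 == "*" {
        out = $2                      # repository name
        rest = $0
        # Find each "{component: [snapshot]" and keep only the bracketed part
        while (match(rest, /[{][^:]+: *\[[^]]+\]/)) {
            hit = substr(rest, RSTART, RLENGTH)
            out = out " " substr(hit, index(hit, "["))
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out
    }' "$@"
}
```

Run as snapshots 1.txt, this should print the same three lines as the command above for the sample file, and it does not care how many components a line has.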