I have a file that contains a single line, and I am struggling to find a sed command that manipulates it.
apple orange.5678 dog cat 009 you
I want to keep "orange.5678" and "you" and ignore everything else. The result should look like this:
orange.5678 you
I don't know where to start or how to exclude everything except "orange.5678" and "you". Any help would be great!
Answer 1
$ sed -r 's/.* ([^ ]+\.[^ ]+).* ([^ ]+)$/\1 \2/' orange
orange.5678 you
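Explanation
-r : use extended regular expressions
s/old/new/ : replace old with new
.* : any number of any characters
(some characters) : save some characters so the replacement can refer to them later
[^ ]+ : one or more characters that are not spaces
\. : a literal dot
$ : end of line
\1 : back-reference to the first saved pattern (\2 refers to the second)
So s/.* ([^ ]+\.[^ ]+).* ([^ ]+)$/\1 \2/ means: match everything on the line up to the space before a run of non-space characters that contains a literal dot, and save that run (the characters on either side of the dot, together with the dot) as \1; then match any further characters and save the last run of non-space characters on the line as \2; finally, replace the entire match with the two saved patterns separated by a space.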
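As a quick way to try the substitution without creating a file, the sketch below feeds the sample line through a pipe. The function name extract_dotted_and_last is made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (both GNU and BSD sed do).

```shell
# Hypothetical helper: keep the word containing a dot and the last word of each line.
extract_dotted_and_last() {
    # -E enables extended regular expressions (same meaning as -r in GNU sed).
    sed -E 's/.* ([^ ]+\.[^ ]+).* ([^ ]+)$/\1 \2/'
}

printf '%s\n' 'apple orange.5678 dog cat 009 you' | extract_dotted_and_last
```

Lines that do not match the pattern are printed unchanged. If you would rather drop them, sed -nE with a trailing p flag on the s command prints only the lines where a substitution happened.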
Answer 2
The simplest way:
awk '{print $2, $6}' file.txt
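If your actual use case is more complex than your question suggests and you need additional logic (for example, if the fields you need are not always the second and the sixth), edit your question to clarify.
Answer 3
People should look at @Zanna's other answer. It is very elegant and shows the power of regular expressions.
Try this expression with gawk; plain awk does not work with grouping. Note that (?:...) is PCRE-style non-capturing group syntax, so the expression as written needs a PCRE-compatible engine; in gawk's own regex dialect, ordinary parentheses would have to be used instead.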
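If the position of the dotted word varies, one possible extension (a sketch only, based on the assumption that you want the first field containing a dot plus the last field) is to search the fields instead of hard-coding $2 and $6. The function name dotted_and_last is made up for illustration.

```shell
# Hypothetical helper: print the first field that contains a dot, then the last field.
dotted_and_last() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (index($i, ".") > 0) {   # index() looks for a literal dot, not a regex
                print $i, $NF           # $NF is the last field on the line
                break
            }
        }
    }'
}

printf '%s\n' 'apple orange.5678 dog cat 009 you' \
              'apple apple grape.9991 pig cat piegon owl' | dotted_and_last
```

Lines without any dotted field produce no output, which may or may not be what you want.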
^(?:\w+\s){0,}(\w+\.\w+)(?:\s\w+){0,}\s(\w+)$
It works for the following variations:
apple orange.5678 dog cat 009 you
apple apple grape.9991 pig cat piegon owl
grape.9991 pig cat piegon owl
Here is a description of the expression.
/^(?:\w+\s){0,}(\w+\.\w+)(?:\s\w+){0,}\s(\w+)$/g
^ asserts position at start of the string
Non-capturing group (?:\w+\s){0,}
  {0,} Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  \w+ matches any word character (equal to [a-zA-Z0-9_])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \s matches any whitespace character (equal to [\r\n\t\f\v ])
1st Capturing Group (\w+\.\w+)
  \w+ matches any word character (equal to [a-zA-Z0-9_])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \. matches the character . literally (case sensitive)
  \w+ matches any word character (equal to [a-zA-Z0-9_])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:\s\w+){0,}
  {0,} Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  \s matches any whitespace character (equal to [\r\n\t\f\v ])
  \w+ matches any word character (equal to [a-zA-Z0-9_])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
2nd Capturing Group (\w+)
  \w+ matches any word character (equal to [a-zA-Z0-9_])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
Answer 4
If you must use sed with a regular expression, the answers above have you covered. If you are open to alternatives:
gv@debian: $ read -r a b c d e f<<<"apple orange.5678 dog cat 009 you" && echo "$b $f"
orange.5678 you
If this is a line in a file, replace <<<"...." with <file.
This method relies on IFS containing a space, as it does by default. If in doubt, set IFS=" " at the start.
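Putting the file variant and the IFS advice together might look like the sketch below. The function name second_and_sixth and the temporary file are made up for illustration. Unlike the one-liner above, it adds a seventh variable, so that any words after the sixth do not end up in f.

```shell
# Hypothetical helper: print the 2nd and 6th space-separated words
# from the first line of the file given as $1.
# The ( ... ) body runs in a subshell, so no variables leak into the caller.
second_and_sixth() (
    # IFS=' ' applies only to this read; rest absorbs any words after the sixth.
    IFS=' ' read -r a b c d e f rest < "$1" && printf '%s %s\n' "$b" "$f"
)

tmp=$(mktemp)
printf '%s\n' 'apple orange.5678 dog cat 009 you' > "$tmp"
second_and_sixth "$tmp"
rm -f "$tmp"
```

Redirecting from a file instead of using <<< also keeps the snippet usable in shells that lack here-strings.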