Excluding the characters before a certain character in sed

I have a file that contains a single line. I have been struggling to manipulate that line with various sed commands.

apple orange.5678 dog cat 009 you

I want to grab "orange.5678", include "you", and ignore everything else. I would like the result to look like this:

orange.5678 you

I don't know where to start or how to exclude everything except "orange.5678" and "you". Any help would be great!

Answer 1

$ sed -r 's/.* ([^ ]+\.[^ ]+).* ([^ ]+)$/\1 \2/' orange
orange.5678 you

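Explanation

  • -r use extended regular expressions
  • s/old/new/ replace old with new
  • .* any number of any characters
  • (some characters) save some characters so the replacement can refer to them later
  • [^ ]+ one or more characters that are not a space
  • \. a literal dot
  • $ end of line
  • \1 backreference to a saved pattern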

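So s/.* ([^ ]+\.[^ ]+).* ([^ ]+)$/\1 \2/ means: match anything on the line up to a space that is followed by some non-space characters, a ., and some more non-space characters (saving the . together with the characters on both sides of it); then match any characters and save the last group of non-space characters on the line; finally, replace the whole match with the two saved patterns separated by a space.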
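As a quick check of the substitution above, the following sketch feeds the sample line to sed through a pipe instead of reading a file. It uses -E, the spelling of -r that both GNU and BSD sed accept. The variable names and the second expression are illustrative assumptions, not part of the answer: the second expression adds (^|.* ) so that the dotted word may also be the first word on the line, a case the original pattern does not match because it requires a space before that word.

```shell
line='apple orange.5678 dog cat 009 you'

# The substitution from the answer, applied to the sample line.
result=$(printf '%s\n' "$line" |
  sed -E 's/.* ([^ ]+\.[^ ]+).* ([^ ]+)$/\1 \2/')
printf '%s\n' "$result"    # orange.5678 you

# Assumed variant: (^|.* ) also allows the dotted word to start the line.
# The extra group shifts the backreferences to \2 and \3.
first=$(printf '%s\n' 'grape.9991 pig cat piegon owl' |
  sed -E 's/(^|.* )([^ ]+\.[^ ]+).* ([^ ]+)$/\2 \3/')
printf '%s\n' "$first"     # grape.9991 owl
```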

Answer 2

The simplest way:

awk '{print $2, $6}' file.txt

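If your actual use case is more complex than your question suggests and you need additional logic (for example, if it is not always the second and sixth fields that you need), edit your question to clarify.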
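If the dotted word is not always the second field, one possible extension is to search the fields for the first one that contains a dot and print it together with the last field, $NF. This is only a guess at what the "more complex" case might look like, not something the question states, and the variable names are illustrative:

```shell
line='apple apple grape.9991 pig cat piegon owl'

# Scan the fields for the first one containing a literal dot,
# then print it with the last field ($NF) and stop.
result=$(printf '%s\n' "$line" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\./) { print $i, $NF; break } }')
printf '%s\n' "$result"    # grape.9991 owl
```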

Answer 3

People should look at the other answer, by @Zanna. It is very elegant and shows the power of regular expressions.

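Try this expression with gawk; plain awk does not handle capture groups. (The expression is written in PCRE syntax. gawk itself has no non-capturing (?:...) groups, so there they would have to be written as plain (...) groups.)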

^(?:\w+\s){0,}(\w+\.\w+)(?:\s\w+){0,}\s(\w+)$

It works for the following variations:

apple orange.5678 dog cat 009 you
apple apple grape.9991 pig cat piegon owl
grape.9991 pig cat piegon owl

Here is a description of the expression.

/^(?:\w+\s){0,}(\w+\.\w+)(?:\s\w+){0,}\s(\w+)$/g

^ asserts position at start of the string

Non-capturing group (?:\w+\s){0,}
{0,} Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])

1st Capturing Group (\w+\.\w+)
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . literally (case sensitive)
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

Non-capturing group (?:\s\w+){0,}
{0,} Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])

2nd Capturing Group (\w+)
\w+ matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
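
To try the same idea with sed, as the question asks, the expression can be approximated in POSIX extended syntax. This is a sketch of a translation, not the answerer's gawk command: \w is written as [[:alnum:]_], \s as [[:space:]], {0,} as *, and the non-capturing groups become ordinary groups, so the two wanted pieces end up in \2 and \4. The function name extract is made up for this example.

```shell
# -n together with the p flag prints only lines where the substitution matched.
extract() {
  sed -nE 's/^([[:alnum:]_]+[[:space:]])*([[:alnum:]_]+\.[[:alnum:]_]+)([[:space:]][[:alnum:]_]+)*[[:space:]]([[:alnum:]_]+)$/\2 \4/p'
}

# Run the three sample lines from the answer through the expression.
out=$(printf '%s\n' \
  'apple orange.5678 dog cat 009 you' \
  'apple apple grape.9991 pig cat piegon owl' \
  'grape.9991 pig cat piegon owl' | extract)
printf '%s\n' "$out"
# orange.5678 you
# grape.9991 owl
# grape.9991 owl
```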

Answer 4

If you must use sed with a regular expression, the answers above have you covered. If you are open to alternatives:

gv@debian: $ read -r a b c d e f<<<"apple orange.5678 dog cat 009 you" && echo "$b $f" 
orange.5678 you

If this is a line in a file, replace <<<"...." with <file

This method relies on the default IFS, which splits on spaces. If in doubt, set IFS=" " at the start.
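
A sketch of the file-based form, with IFS set only for the read itself so that the rest of the shell session keeps its own IFS. The temporary file and the variable names are assumptions for illustration. Note that if the line has more than six words, the last variable receives everything that remains.

```shell
# Hypothetical sample file holding the single line.
tmpfile=$(mktemp)
printf '%s\n' 'apple orange.5678 dog cat 009 you' > "$tmpfile"

# IFS=' ' applies only to this read; fields 2 and 6 are the ones we want.
IFS=' ' read -r w1 second w3 w4 w5 last < "$tmpfile"
result="$second $last"
printf '%s\n' "$result"    # orange.5678 you

rm -f "$tmpfile"
```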
