Removing duplicate words between embedded brackets

Our input looks like this:

2012-04-17  [GBPGBP]
2012-04-13  [GBP GBP]
2012-04-13  [GBP]
2012-04-11  [GBPGBP]
2012-04-11  [GBP GBP]
2012-04-10  [GBPGBP]
2012-04-06  [GBP GBP GBP]
2012-04-17  [GBPGBP]
2012-04-13  [GBP CDN]
2012-04-13  [GBP]
2012-04-11  [GBPCDN]
2012-04-11  [GBP DL DL]
2012-04-10  [PSGBP]
2012-04-06  [PS PS]

We would like to get output like this:

2012-04-17  [GBP]
2012-04-13  [GBP]
2012-04-13  [GBP]
2012-04-11  [GBP]
2012-04-11  [GBP]
2012-04-10  [GBP]
2012-04-06  [GBP]
2012-04-17  [GBP]
2012-04-13  [GBP CDN]
2012-04-13  [GBP]
2012-04-11  [GBPCDN]
2012-04-11  [GBP DL]
2012-04-10  [PSGBP]
2012-04-06  [PS]

Basically, remove any duplicated string inside the brackets. Any suggestions?

Answer 1

sed -e ': a' -e 's/\(\[[^][]*\)\([A-Z][A-Z][A-Z]*\)\([^][]*\)\2/\1\2\3/' -e 't a'
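  • : a sets a label named a at the start of the script.
  • s/\(wibble\)\(foo\)\(bar\)\2/\1\2\3/ replaces wibblefoobarfoo with wibblefoobar: \2 is a back-reference to whatever the second group matched, and the replacement leaves out that second copy.
  • [A-Z][A-Z][A-Z]* matches two or more uppercase letters.
  • t a loops back to the label a if the preceding s command made a substitution, so the substitution repeats until no duplicate is left.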
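
Because the sed expression does not tie \2 to a word boundary, it can be useful to cross-check its output against a second approach. The following is a minimal sketch that uses awk instead of sed, not the method from the answer above. It assumes that words inside the brackets are separated by spaces, that a word made of a whole-number repeat of a shorter unit of two or more characters (such as GBPGBP) should collapse to that unit, and that only the first [...] group on each line matters. The function name dedupe_brackets is hypothetical.

```sh
#!/bin/sh
# dedupe_brackets: hypothetical helper, not part of the original answer.
# Reads lines on stdin and removes repeated words inside the first [...] group.
dedupe_brackets() {
    awk '
    # Return the shortest unit (2+ characters) that tok is a whole-number
    # repeat of, or tok itself when no such unit exists.
    function unit(tok,    n, len, i, ok) {
        n = length(tok)
        for (len = 2; len * 2 <= n; len++) {
            if (n % len) continue
            ok = 1
            for (i = len + 1; i <= n; i += len)
                if (substr(tok, i, len) != substr(tok, 1, len)) { ok = 0; break }
            if (ok) return substr(tok, 1, len)
        }
        return tok
    }
    {
        lb = index($0, "[")
        rb = index($0, "]")
        # Lines without a well-formed bracket pair pass through unchanged.
        if (lb == 0 || rb < lb) { print; next }
        inner = substr($0, lb + 1, rb - lb - 1)
        n = split(inner, words, " ")
        split("", seen)              # portable way to empty the array
        out = ""
        for (i = 1; i <= n; i++) {
            w = unit(words[i])       # GBPGBP becomes GBP, GBPCDN stays
            if (w in seen) continue  # skip words already emitted
            seen[w] = 1
            out = (out == "" ? w : out " " w)
        }
        print substr($0, 1, lb) out substr($0, rb)
    }'
}

# Small demonstration on three of the sample lines from the question.
printf '%s\n' '2012-04-17  [GBPGBP]' '2012-04-11  [GBP DL DL]' '2012-04-10  [PSGBP]' |
    dedupe_brackets
```

Under those assumptions the demonstration prints [GBP], [GBP DL] and [PSGBP] for the three lines, with the dates untouched. Splitting on spaces first keeps the repeat check confined to a single word, which is why GBPCDN and PSGBP are left alone.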
