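I have a dbpedia log full of URLs. Some of the URLs are left unencoded, but others are URL-encoded (each encoded space becomes a +), which produces endless runs of plus signs. For example: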
529e0532100c7d6f2b6ba4c093ff9581 - - [03/Jan/2014 00:00:00 +0100] "GET /sparql/?callback=a&default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+++++PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+++++++++++++++SELECT+%3Fpic%2C+%3Fabstract+WHERE+%7B+++++++++++++++++++++++++++%7B++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++%3Fs+rdfs%3Alabel+%22%D0%A0%D0%B2%D0%BE%D1%82%D0%B0%22%40ru+.++++++++++++++++++++++++++%7B++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++%3Fs+dbo%3Athumbnail+%3Fpic++%3B++++++++++++++++++++++++++++++++++++dbo%3Aabstract++%3Fabstract+++++++++++++++++++++++++++%7D++++++++++++++++++++++++++++++++++++++++++++++++++++++++UNION++++++++++++++++++++++++++++++++++++++++++++++++++++%7B++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++%3Fs+dbo%3AwikiPageDisambiguates+%3FactualResource+.+++++++++++%3FactualResource+rdfs%3Alabel++++%3FredirectsTo+%3B+++++++++++++++++++++++++++++dbo%3Athumbnail+%3Fpic+++++++++%3B+++++++++++++++++++++++++++++dbo%3Aabstract++%3Fabstract++++++++++++++++++FILTER(lang(%3FredirectsTo)+%3D+%22ru%22)++++++++++++++++%7D++++++++++++++++++++++++++++++++++++++++++++++++++++++++UNION++++++++++++++++++++++++++++++++++++++++++++++++++++%7B++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++%3Fs+dbo%3AwikiPageRedirects+%3FactualResource+.+++++++++++++++%3FactualResource+rdfs%3Alabel++++%3FredirectsTo+%3B+++++++++++++++++++++++++++++dbo%3Athumbnail+%3Fpic+++++++++%3B+++++++++++++++++++++++++++++dbo%3Aabstract++%3Fabstract++++++++++++++++++FILTER(lang(%3FredirectsTo)+%3D+%22ru%22)++++++++++++++++%7D++++++++++++++++++++++++++++++++++++++++++++++++++++%7D++++++++++++++++++++++++++++++++++++++++++++++++++++++++FILTER+(lang(%3Fabstract)+%3D+%22ru%22)++++++++++++++++++++++%7D+LIMIT+1+++++++++++++++++++++++++++++++++++++++++++++++&format=application%2Fjson&timeout=30000&debug=on&_=1388699454908 HTTP/1.0" 200 6845 
"http://www.slovohvat.ru/g/g8FbJ" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1150.1 Iron/20.0.1150.1 Safari/536.11";
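I would like to collapse multiple plus signs into one. I'm fairly comfortable with sed and wildcards, but in this case I need the "one or more" + wildcard applied to the + character itself. How can I do this?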
Answer 1
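In basic regular expression (BRE) syntax, a + is treated literally; to get "one or more" of it, the quantifier has to be escaped as \+ (a GNU sed extension, not part of POSIX BRE):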
sed 's/+\+/+/g'
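A minimal check of the BRE command, piping a made-up fragment shaped like the encoded query above through it. It assumes GNU sed, since \+ in a BRE is a GNU extension. For comparison, the second command is a portable POSIX BRE spelling of the same idea: a literal + followed by +*, i.e. one plus sign and then zero or more further ones.

```shell
# Made-up fragment shaped like the encoded SPARQL query in the log
sample='PREFIX+rdfs%3A+++++PREFIX+dbo%3A+++++++SELECT'

# GNU sed only: the first "+" is literal, "\+" (GNU BRE extension) means "one or more"
gnu_bre=$(printf '%s\n' "$sample" | sed 's/+\+/+/g')

# Portable POSIX BRE: a literal "+" followed by "+*" (zero or more further plus signs)
posix_bre=$(printf '%s\n' "$sample" | sed 's/++*/+/g')

printf '%s\n' "$gnu_bre"    # PREFIX+rdfs%3A+PREFIX+dbo%3A+SELECT
printf '%s\n' "$posix_bre"  # PREFIX+rdfs%3A+PREFIX+dbo%3A+SELECT
```

On a sed without the GNU extension (e.g. BSD/macOS sed), the ++* form is the safer choice for BRE.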
Conversely, in extended regular expression (ERE) syntax, + is a quantifier by default, and \+ restores its literal meaning:
sed -E 's/\++/+/g'
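The ERE form can be checked the same way. The sample string is again made up for illustration; -E is accepted by both GNU sed and BSD/macOS sed (GNU sed also accepts -r).

```shell
# Made-up fragment with a run of encoded spaces between two "{" (%7B)
sample='SELECT+%3Fpic%2C+%3Fabstract+WHERE+%7B+++++++++%7B'

# -E switches sed to ERE: "\+" is a literal plus, the trailing "+" is the quantifier
ere=$(printf '%s\n' "$sample" | sed -E 's/\++/+/g')

printf '%s\n' "$ere"  # SELECT+%3Fpic%2C+%3Fabstract+WHERE+%7B+%7B
```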
To avoid some of that confusion, you can use the POSIX interval quantifier \{1,\} (BRE) or {1,} (ERE) instead:
sed 's/+\{1,\}/+/g'
sed -E 's/\+{1,}/+/g'
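A sketch of running the interval forms over a whole file rather than a single string. The temporary file and its two request lines are hypothetical stand-ins for the real dbpedia log; the script only checks that the BRE and ERE interval variants produce the same cleaned output.

```shell
# Hypothetical stand-in for the dbpedia log: a temporary file with two sample request lines
log=$(mktemp)
printf '%s\n' \
  'GET /sparql/?query=SELECT+%3Fpic+WHERE+%7B++++++%7D+LIMIT+1+++++&format=json' \
  'GET /sparql/?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D' > "$log"

# POSIX BRE interval: "+" is literal, "\{1,\}" means one or more of it
bre_out=$(sed 's/+\{1,\}/+/g' "$log")

# POSIX ERE interval: "\+" is the literal plus, "{1,}" is the interval
ere_out=$(sed -E 's/\+{1,}/+/g' "$log")

# Remove the stand-in file and show the cleaned lines
rm -f "$log"
printf '%s\n' "$bre_out"
```

With a real log, the same sed command takes the log path as its argument, and its output can be redirected to a new file instead of being captured in a variable.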