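I have some headings that start with # (because they are markdown), and I have the following rules:
- Titles (#) should have exactly two newlines above and one below.
- Subtitles (##, ###, and so on) should have exactly one blank line above and one below.
- Titles take precedence over subtitles. (If two rules conflict, use the title formatting and ignore the subtitle rule.)
Note: I am trying to find all headings that do not conform to these three restrictions.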
Here are some examples of good and bad headings:
some text
# Title | BAD
## Subtitle | Good (Has two spaces below, is needed for next main title)
# Title | Good
## Subtitle | Bad
text
# Title | Bad
text
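The rules above can also be checked line by line instead of with one large regex. The following is a minimal Python sketch, not the approach used in this thread. It assumes that "two newlines above" means two blank lines, that a heading is any line starting with one or more # characters followed by a space, and that a heading at the very start or end of the file is exempt from the missing side. The names find_bad_headings and count_blank are hypothetical.

```python
import re

# Assumption: a heading is a line starting with one or more "#" and a space.
HEADING = re.compile(r"^(#+) ")


def count_blank(lines, start, step):
    """Count consecutive blank lines from index `start`, moving by `step`.

    Returns the count and the index of the first non-blank line reached
    (which may fall outside the list at either end of the file).
    """
    count, i = 0, start
    while 0 <= i < len(lines) and lines[i].strip() == "":
        count += 1
        i += step
    return count, i


def find_bad_headings(text):
    """Return (line_number, line) for every heading that breaks the rules."""
    lines = text.splitlines()
    bad = []
    for idx, line in enumerate(lines):
        match = HEADING.match(line)
        if match is None:
            continue
        is_title = len(match.group(1)) == 1
        above, prev_idx = count_blank(lines, idx - 1, -1)
        below, next_idx = count_blank(lines, idx + 1, 1)
        at_start = prev_idx < 0
        at_end = next_idx >= len(lines)
        # Titles take precedence: anything directly before a title
        # must leave two blank lines below itself.
        next_is_title = not at_end and lines[next_idx].startswith("# ")
        want_above = 2 if is_title else 1
        want_below = 2 if next_is_title else 1
        ok_above = at_start or above == want_above
        ok_below = at_end or below == want_below
        if not (ok_above and ok_below):
            bad.append((idx + 1, line))
    return bad
```

This sketch does not skip fenced code blocks, so a shell comment inside a code sample would be reported as a heading.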
After fiddling with regular expressions, I came up with these:
Main titles: regex
((?<=\n{4})|(?<=.\n{2})|(?<=.\n))(# .*)|(# .*)(?=(\n.|\n{3}(?!# )|\n{4}))
Subtitles: regex
'((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)'
However, to my great confusion, they do not work with pcregrep. Here is the pcregrep command I tried to run (just for completeness):
$ pcregrep -rniM --include='.*\.md' \
'((?<=\n{3})|(?<=.\n))(##+.*)|(##+.*)(?=\n.|\n{3}(?!# )|\n{4}.)' \
~/Programming/oppgaver/src/web
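It does not work when I search just a single file either, and I have several other expressions that work fine.
Is there something wrong with my regex, or is this a faulty implementation?
Answer 1
This solution fixes all incorrect headings.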
sed -r '
:loop; N; $!b loop
s/\n+(#[^\n]+)/\n\n\1/g
s/(#[^\n]+)\n+/\1\n\n/g
s/\n+(#[^\n#]+)/\n\n\n\1/g
' input.txt;
With comments:
sed -r '
### Read the whole file into the pattern space,
# in other words, join all lines into one buffer.
:loop; N; $!b loop;
### First pass over the pattern space:
# find every line starting with "#" (titles, subtitles, and so on),
# take all the newlines above it
# and collapse them into one empty line.
s/\n+(#[^\n]+)/\n\n\1/g;
### Second pass over the pattern space:
# again find the text from a "#" to the end of its line, take all the newlines below it
# and collapse them into one empty line.
s/(#[^\n]+)\n+/\1\n\n/g;
### Third pass over the pattern space:
# find lines starting with a single "#" (titles only),
# take the newlines above them (only two at this point,
# because of the previous substitutions)
# and turn them into three newlines.
s/\n+(#[^\n#]+)/\n\n\n\1/g
' input.txt
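For readers without GNU sed, the same three passes can be sketched in Python with re.sub. This is a literal port under stated assumptions, not part of the original answer: the function name fix_headings is hypothetical, and the final newline is set aside before the substitutions because sed's pattern space does not include it.

```python
import re


def fix_headings(text):
    """Apply the three substitutions from the sed script to a whole document."""
    # sed strips the last newline before the script runs; mimic that here.
    has_final_newline = text.endswith("\n")
    body = text[:-1] if has_final_newline else text

    # Pass 1: every heading gets exactly one empty line above it.
    body = re.sub(r"\n+(#[^\n]+)", r"\n\n\1", body)
    # Pass 2: every heading gets exactly one empty line below it.
    body = re.sub(r"(#[^\n]+)\n+", r"\1\n\n", body)
    # Pass 3: single-# titles get two empty lines above (three newlines).
    body = re.sub(r"\n+(#[^\n#]+)", r"\n\n\n\1", body)

    return body + "\n" if has_final_newline else body
```

Like the sed version, the second pass matches a # anywhere in a line, so ordinary text containing # also gets an empty line after it. Anchoring that pattern to the start of a line would be stricter.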
Input
text
# Title
## SubTitle
### SubSubTitle
# Title
## SubTitle
text
### SubSubTitle
# Title
# Title
# Title
## SubTitle
### SubSubTitle
Output
text


# Title

## SubTitle

### SubSubTitle


# Title

## SubTitle

text

### SubSubTitle


# Title


# Title


# Title

## SubTitle

### SubSubTitle