I have a file test.tex with content similar to this:
\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author{Author 1, Author 2, Author 3}
\begin{document}
\end{document}
I want to extract every author written inside \author{ ... }. So I did the following:
authors=$(cat test.tex | grep '\author' | tr -d '\author' | tr -d '{' | tr -d '}' )
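This code only works for this exact case. My problems are:
- there may be [] instead of {}
- the \author entry can span multiple lines, as in the following example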
\author{Author 1,
Author 2,
Author 3}
Does anyone know how to solve these two problems?
Answer 1
grep -zPo '\\author{\K[^}]*' ex1.tex | tr '\0\n' '\n '
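Some quick explanations:
- -z: input and output records ("lines") are separated by NUL (\0), so the complete TeX file becomes a single record.
- -P: use the Perl (PCRE) regular expression variant.
- -o: print only the part of the record that matches the regexp.
- \\author{\K: \K ends the left context; the text before it must match, but it is not printed.
- tr '\0\n' '\n ': changes the output record separator (\0 to \n) and removes the newlines inside the names (\n to a space).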
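To show what each flag contributes, here is a rough Python equivalent of the same pipeline. It is a sketch, not part of the answer; the function name `authors_like_grep_z` and the sample text are made up for illustration. Like the grep pattern, it only handles the `{` form, not `[`.

```python
import re

def authors_like_grep_z(tex):
    # The whole file is one string, like a single NUL-delimited record under -z.
    # The capture group plays the role of \K: only the text after "\author{" is kept.
    # [^}]* also matches newlines, so a multi-line \author{...} is captured whole.
    matches = re.findall(r"\\author\{([^}]*)", tex)
    # Turn the newlines inside each match into spaces, as tr '\n' ' ' does.
    return [m.replace("\n", " ") for m in matches]

sample = "\\title{Test}\n\\author{Author 1,\nAuthor 2,\nAuthor 3}\n\\begin{document}\n"
print(authors_like_grep_z(sample))
```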
Answer 2
#!/bin/bash
sed -nr '
/\\author/ {
:ending
/[]}]$/! {
N
b ending
}
s/\\author(\{|\[)(.*)(}|])/\2/p
}
' test.tex
Explanation (same code, with comments added):
#!/bin/bash
sed -nr '
# work only on lines that contain the \author string
/\\author/ {
##### this part is needed for multiline pattern processing
# put a label here. We will return to this point
# until we reach a line that ends with } or ].
:ending
# if the line does not end with } or ],
# the \author entry continues on the next line.
/[]}]$/! {
# take the next line and append it to the previous one
# (N joins them with a newline).
N
# go back to the ":ending" label
b ending
}
##### end of multiline pattern processing
# remove the \author word and the brackets, then print the result
s/\\author(\{|\[)(.*)(}|])/\2/p
}
' test.tex
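The same join-until-closing-bracket loop can be sketched in Python for comparison. This is an illustrative port, not the answer's code; `authors_like_sed` and the sample input are hypothetical. As with the sed version, newlines inside a multi-line entry are kept.

```python
import re

def authors_like_sed(lines):
    # Rough port of the sed script: join lines until the entry ends with } or ].
    results = []
    it = iter(lines)
    for line in it:
        if "\\author" not in line:
            continue
        entry = line.rstrip("\n")
        # Equivalent of the ":ending" / N / "b ending" loop: append the next
        # line, separated by a newline, until the entry ends with } or ].
        while not entry.endswith(("}", "]")):
            nxt = next(it, None)
            if nxt is None:  # no more input: stop joining
                break
            entry += "\n" + nxt.rstrip("\n")
        # Same idea as s/\\author(\{|\[)(.*)(}|])/\2/p; re.DOTALL lets .*
        # cross the embedded newlines.
        m = re.search(r"\\author[{\[](.*)[}\]]", entry, re.DOTALL)
        if m:
            results.append(m.group(1))
    return results

sample = (
    "\\title{Test}\n"
    "\\author{Author 1, Author 2, Author 3}\n"
    "\\author[Author 1,\n"
    "Author 2,\n"
    "Author 3]\n"
    "\\begin{document}\n"
)
for author in authors_like_sed(sample.splitlines(keepends=True)):
    print(author)
```

An open file object works as well, since iterating over it yields lines, e.g. `authors_like_sed(open("test.tex"))`.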
Test file
\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author{Author 1, Author 2, Author 3}
\author[Author 1, Author 2, Author 3]
\author{Author 1,
Author 2,
Author 3}
\author[Author 1,
Author 2,
Author 3]
\begin{document}
\end{document}
Output
Author 1, Author 2, Author 3
Author 1, Author 2, Author 3
Author 1,
Author 2,
Author 3
Author 1,
Author 2,
Author 3
Answer 3
This seems to do the job: egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Examples:
1)
echo "\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author[Author 1,
Author 2
Author 3 ] " | egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Author
Author
Author
2)
echo "\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author[Author 1, Author 2, Author 3]
\begin{document}
\end{document}" | egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Author
Author
Author
3)
echo "\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author{Author 1, Author 2, Author 3}
\begin{document}
\end{document}" | egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Author
Author
Author
You could probably do this with grep alone, using lookbehinds or something similar. Personally, I have never had any problems piping into sed after grep.
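As a sketch of the lookbehind idea, here it is with Python's `re` module in place of grep (a swapped-in tool, not something the answer shows). Because `\author` plus one bracket is always 8 characters, the lookbehind has a fixed width and is allowed. Unlike the egrep pipeline above, which only prints the word `Author`, this returns the full contents of each entry. `AUTHOR_BODY` and `author_bodies` are hypothetical names.

```python
import re

# "\author" followed by { or [ is always 8 characters wide, so Python accepts
# it as a lookbehind; the lookbehind text itself is not part of the match.
AUTHOR_BODY = re.compile(r"(?<=\\author[{\[])[^}\]]*")

def author_bodies(tex):
    # [^}\]]* stops at the first } or ]. A negated class also matches
    # newlines, so entries spanning several lines come back whole.
    return AUTHOR_BODY.findall(tex)

tex = "\\author{Author 1, Author 2, Author 3}\n\\author[Author 1,\nAuthor 2,\nAuthor 3]\n"
for body in author_bodies(tex):
    print(body.replace("\n", " "))
```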
Answer 4
Python
With the input file given in the question, it can be done as a one-liner like this:
$ python -c 'import sys,re;f=open(sys.argv[1],"r");a=tuple(l for l in f.readlines() if l.startswith("\\author") );print("\n".join(re.split(", |,|{|}",a[0].strip())[1:]))' input.tex
Author 1
Author 2
Author 3
And as a script:
#!/usr/bin/env python
import sys, re
# read the doc, find the desired line
line = ""
with open(sys.argv[1]) as f:
    for l in f:
        if l.startswith("\\author"):
            line = l.strip()
            break
# split at multiple separators, keep the list from the 2nd item on,
# and drop the empty string left over after the closing "}"
author_list = [a for a in re.split(", |,|{|}", line)[1:] if a]
# print 1 author per line
print("\n".join(author_list))
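There are two key steps: read the file and find the line that starts with the string \author, then split that line at several separators into a list of tokens and build a newline-separated string from that list. I also took the liberty of allowing for the possibility that you may have to split at , or at ,<space>.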
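The two key steps can be extended to cover both problems from the question. This is a sketch under assumptions, not the answer's script: it searches the whole file text instead of a single line, accepts `{` or `[`, and splits at a comma followed by any whitespace (space or newline). `find_author_body` and `split_authors` are hypothetical names, and only the first `\author` entry is returned.

```python
import re

def find_author_body(tex):
    # Step 1: locate the \author entry. Searching the whole text (not line by
    # line) and accepting { or [ covers both problems raised in the question.
    m = re.search(r"\\author[{\[]([^}\]]*)[}\]]", tex)
    return m.group(1) if m else ""

def split_authors(body):
    # Step 2: split at "," followed by any whitespace (space or newline),
    # dropping empty pieces, e.g. from a trailing comma.
    return [name.strip() for name in re.split(r",\s*", body) if name.strip()]

tex = "\\title{Test}\n\\author[Author 1,\nAuthor 2,\nAuthor 3]\n\\begin{document}\n"
print("\n".join(split_authors(find_author_body(tex))))
```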