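I have a large pdflatex document in which each chapter lives in its own text file and is included with \include{chapter3.tex}
...
- How can I extract the page count of each chapter and write it to a text file?
(Can this be done using the included text files, or is it better to define a label at the start of each chapter and then read out the page numbers of those labels?)
I would like to know how many pages each chapter has (at a given moment) and get a list such as
Chapter 1: 5 pages
Chapter 2: 10 pages
Chapter 3: 4 pages
Document: 19 pages
- In addition, I would like to count the occurrences of certain commands, e.g. \N{note} or \NK{note} (I defined these commands with the fixme package to create notes in the document), for each chapter, and write the counts to a text file, e.g.:
Chapter     \N   \NK
Chapter 1:  20   3
Chapter 2:  3    5
Document:   23   8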
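Answer 1
I assume you don't want to modify the source files (otherwise it would be easier: just add some smart command after each \chapter).
You can do something like this: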
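\let\origchapter=\chapter
% the label is named after the chapter that is about to start (current counter + 1)
\def\chapter{\label{chap:\the\numexpr\value{chapter}+1\relax}\origchapter}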
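The drawback is that the label ends up on the previous page, because it is set before \chapter starts a new page. To do better, you have to handle the syntax of \chapter (optional star, optional argument), which takes a little time to implement but is entirely doable (and not that difficult).
Then just use some perl/python/lua/whatever script to parse the .aux file.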
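As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes labels named chap:<number> that sit on the first page of each chapter, arabic page numbers, and a last page number supplied by the caller; the function names are made up for this example. Note that with \include each chapter writes its labels to its own .aux file (e.g. chapter3.aux), so the text of all .aux files would have to be concatenated before it is passed in.

```python
import re

# Matches lines such as \newlabel{chap:3}{{3}{16}} and, with hyperref,
# \newlabel{chap:3}{{3}{16}{Title}{chapter.3}{}}.
NEWLABEL = re.compile(r'\\newlabel\{chap:([^}]*)\}\{\{[^{}]*\}\{(\d+)\}')


def chapter_start_pages(aux_text):
    """Return (label suffix, page) pairs for every chap:* label, sorted by page."""
    pairs = [(name, int(page)) for name, page in NEWLABEL.findall(aux_text)]
    return sorted(pairs, key=lambda pair: pair[1])


def pages_per_chapter(aux_text, last_page):
    """Estimate chapter lengths as the distance between consecutive labels."""
    starts = chapter_start_pages(aux_text)
    counts = {}
    for i, (name, page) in enumerate(starts):
        # The last chapter runs up to the (assumed known) last page of the document.
        next_page = starts[i + 1][1] if i + 1 < len(starts) else last_page + 1
        counts[name] = next_page - page
    return counts


# Hypothetical .aux content, chosen to mirror the list in the question.
sample = (
    '\\newlabel{chap:1}{{1}{1}}\n'
    '\\newlabel{chap:2}{{2}{6}}\n'
    '\\newlabel{chap:3}{{3}{16}}\n'
)
for name, n in pages_per_chapter(sample, last_page=19).items():
    print('Chapter %s: %d pages' % (name, n))
```

With the sample data this prints 5, 10 and 4 pages for chapters 1 to 3. If the labels sit on the page before each chapter, as described above, every boundary shifts by one page and the result needs adjusting accordingly.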
Alternatively, you could use something like
\write\somefilehandler{Chapter: \thechapter, page: \thepage}
instead of \label; you then have to open \somefilehandler at the beginning of the document and close it in \AtEndDocument.
As for your second question, a simple idea is:
\newcounter{foocounter}
\let\origfoo=\foo
\def\foo{\stepcounter{foocounter}\origfoo}
\AtEndDocument{\message{\string\foo: \thefoocounter}}
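This gives you the total for the whole document, both in the log file and on the terminal. If you want per-chapter totals, you can issue the \message (or \write) from within \chapter, redefining it in the spirit of the above.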
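If a count outside of TeX is acceptable, the per-chapter table can also be approximated by scanning the chapter sources. This is a different technique from the counter approach above: it only sees commands written literally in the source, and it also counts the definitions themselves if they live in the scanned text. The following Python sketch assumes one source string per chapter; the function names and the sample text are made up for this example, and the comment stripping is deliberately simple.

```python
import re


def count_command(tex_source, name):
    """Count uses of \\name, without counting longer names such as \\NK for \\N."""
    # Drop comments: an unescaped % up to the end of the line (simplified).
    uncommented = re.sub(r'(?<!\\)%.*', '', tex_source)
    # The lookahead rejects a following letter, so \N does not match inside \NK.
    pattern = re.compile(r'\\' + re.escape(name) + r'(?![A-Za-z])')
    return len(pattern.findall(uncommented))


def count_table(chapters, names):
    """Return {title: [count per name]} plus a 'Document' row with column sums."""
    table = {title: [count_command(src, n) for n in names]
             for title, src in chapters.items()}
    table['Document'] = [sum(col) for col in zip(*table.values())]
    return table


# Hypothetical chapter sources; in practice each string would be read
# from a chapter file, e.g. with open('chapter1.tex').read().
sample = {
    'Chapter 1': r'Text \N{check this} more \NK{cite} and \N{again} % \N{ignored}',
    'Chapter 2': r'\NK{a}\NK{b} 50\% done \N{c}',
}
for title, counts in count_table(sample, ['N', 'NK']).items():
    print(title + ':', *counts)
```

For the sample strings this reports 2 and 1 for Chapter 1, 1 and 2 for Chapter 2, and 3 and 3 for the document.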
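Answer 2
As I mentioned in the comment above, here is the Python script I use to split the generated PDF file by section. You will probably need to adapt it to your needs; I hope it is useful to you. It calls pdftk to do the actual splitting.
Perhaps there is a more "standard" solution; I hope someone can add a comment.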
#!/usr/bin/env python3
import re
import subprocess


class BreakPt:
    """Start page, cleaned-up title and section number (-1 if not a numbered section)."""
    def __init__(self, pg, title, num):
        self.pg, self.title, self.num = pg, title, num


pts = []
with open('master.toc') as toc:
    for l in toc:
        # the optional 4th group is the hyperref anchor; some LaTeX versions end the line with %
        m = re.match(r'^\\contentsline\W*\{(section|part|chapter)\}\{(.*)\}\{([0-9]+)\}(\{[^}]*\})?%?$', l)
        if not m:
            continue
        kind, raw, pg = m.group(1, 2, 3)
        num = -1  # parts, chapters and unnumbered sections only mark where a section ends
        if kind == 'section':
            m = re.match(r'^\\numberline\W*\{([0-9]+)\}(.*)$', raw)
            if m:
                num, raw = int(m.group(1)), m.group(2)
        raw = re.sub(r'\\FN@sf@gobble@opt .*$', '', raw)  # strip footnote
        raw = re.sub(r'\\IeC\W*\{.*?([a-zA-Z]) ?\}', r'\1', raw)  # remove accents
        raw = re.sub(r'\\emph\W*\{(.*?)\}', r'\1', raw)  # remove \emph
        raw = re.sub(r'(:|\W*\\&|\W*\().*$', '', raw)  # take just the "first part" as name
        raw = re.sub(r' a ', r' ', raw)  # remove 'a' as conjunction
        raw = re.sub(r'[^a-zA-Z]+', '_', raw)  # replace runs of non-letters (commas etc.) with _
        pts.append(BreakPt(int(pg), raw.lower(), num))

for i, pt in enumerate(pts):
    # a section ends on the page before the next break point, or at the end of the PDF
    bgPg, endPg = pt.pg, (pts[i + 1].pg - 1 if i + 1 < len(pts) else -1)
    if pt.num < 0:
        continue
    pgSpec = '%02d-%02d' % (bgPg, endPg) if endPg > 0 else '%02d-end' % bgPg
    out = '%02d-%s.pdf' % (pt.num, pt.title)
    print(pgSpec, out)
    subprocess.call(['pdftk', 'master.pdf', 'cat', pgSpec, 'output', out])
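The same .toc parsing can also produce the page counts asked for in the question instead of split PDFs. The following is a minimal Python sketch, assuming arabic page numbers, chapter entries in the usual \contentsline form, and a last page number supplied by the caller (for instance read off the "Output written on" line of the log); the function name and the sample lines are made up for this example.

```python
import re

# Matches chapter entries such as
#   \contentsline {chapter}{\numberline {2}Theory}{6}{chapter.2}
# with or without the trailing hyperref anchor and a trailing %.
CHAPTER_LINE = re.compile(
    r'^\\contentsline\s*\{chapter\}\{(.*)\}\{(\d+)\}(\{[^}]*\})?%?\s*$')


def chapter_lengths(toc_lines, last_page):
    """Return (title, page count) for each chapter entry of a .toc file."""
    starts = []
    for line in toc_lines:
        m = CHAPTER_LINE.match(line)
        if m:
            # Turn "\numberline {2}Theory" into "2 Theory".
            title = re.sub(r'\\numberline\s*\{([^}]*)\}', r'\1 ', m.group(1))
            starts.append((title, int(m.group(2))))
    lengths = []
    for i, (title, page) in enumerate(starts):
        # A chapter ends where the next one starts; the last one ends at last_page.
        end = starts[i + 1][1] if i + 1 < len(starts) else last_page + 1
        lengths.append((title, end - page))
    return lengths


# Hypothetical .toc lines, chosen to mirror the list in the question.
toc = [
    r'\contentsline {chapter}{\numberline {1}Introduction}{1}{chapter.1}',
    r'\contentsline {section}{\numberline {1.1}Scope}{2}{section.1.1}',
    r'\contentsline {chapter}{\numberline {2}Theory}{6}{chapter.2}%',
    r'\contentsline {chapter}{\numberline {3}Experimental}{16}',
]
for title, n in chapter_lengths(toc, last_page=19):
    print('%s: %d pages' % (title, n))
```

Chapters that do not appear in the table of contents are invisible to this approach, and their pages are attributed to the preceding chapter.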
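Answer 3
I ended up using a solution based on a bash script.
Since others might be interested, I am sharing it here. I have to say, though, that it is experimental and may contain some very ugly hacks, because I am not very experienced with bash scripting.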
#!/bin/bash
# script overwrites file Docstat.txt with recent statistics of latex writing project:
# Block 1: date, number of pages and file size of PDF
# Block 2: All chapters with title and number of pages
# Block 3: All chapters and sections with title and number of pages
# Block 4: word count statistics using texcount for all chapters
# (during script runtime, a temporary file Docstat.tmp is created for collecting the output)
# the script scans the aux files to extract the page numbers, where sections begin
date > Docstat.txt
grep "Output written on" Diss.log >> Docstat.txt
grep "contentsline {chapter}" Diss.toc | sed 's/\\contentsline //g' | sed 's/\\numberline //g' >> Docstat.tmp
# Pages of one chapter = page of label end:Kap - page of label anf:Kap + 1
# (anf:Kap marks the start of the chapter, end:Kap its end).
# Both labels are read from the chapter's own .aux file, passed as $1.
chapter_pages() {
  grep "newlabel{anf:Kap}\|newlabel{end:Kap}" "$1" | awk 'BEGIN {
    FS="[{}]+"
  } {
    if ($2=="anf:Kap")
      KapAnf=$4
    if ($2=="end:Kap")
      KapEnd=$4
  } END {
    print KapEnd-KapAnf+1
  }'
}
NEin=$(chapter_pages 1_Introduction.aux)
NGru=$(chapter_pages 2_Theory.aux)
NExp=$(chapter_pages 3_Experimental.aux)
NEuD=$(chapter_pages 5_ResultsAndDiscussion.aux)
NZus=$(chapter_pages 7_Conclusion.aux)
NLit=$(awk 'BEGIN {
  FS="[{}]+"
} {
  if ($3=="References") # manual adjustment, because the references chapter has no number
    KapA=$4
  if ($4=="Publications")
    KapB=$5 # correct by hand if needed, e.g. KapB=$5-5 for a manual correction by 5 pages
} END {
  print KapB-KapA+1
}' Docstat.tmp) #&& echo $DIFFERENZ
NGes=$(awk 'BEGIN {
  FS="[{}]+"
} {
  if ($4=="Introduction")
    KapA=$5
  if ($3=="References") # manual adjustment, because the references chapter has no number
    KapB=$4
} END {
  print KapB-KapA+1
}' Docstat.tmp) #&& echo $DIFFERENZ
NGesLit=$(($NGes + $NLit))
echo " " >> Docstat.txt
echo "==== page numbers" >>Docstat.txt
echo "total_(withoutRefs): $NGes S." >> Docstat.txt
echo "total: $NGesLit S." >> Docstat.txt
echo " " >>Docstat.txt
echo "Introduction: $NEin S." >> Docstat.txt
echo "Theory: $NGru S." >> Docstat.txt
echo "Experimental: $NExp S." >> Docstat.txt
echo "Results: $NEuD S." >> Docstat.txt
echo "Conclusions: $NZus S." >> Docstat.txt
echo "References: $NLit S." >> Docstat.txt
head -14 Docstat.txt
echo '==== number of lines'
grep Zeilenzahl Diss.log # must be
grep linenumber *.aux | sed 's/\\setcounter{linenumber}/ /g'
growlnotify -t "Diss Statistik: $NGes / $NGesLit S." -m "Ein $NEin, Gru $NGru, Erg $NEuD"
echo " " >> Docstat.txt
echo '==== Chapters' >> Docstat.txt
cat Docstat.tmp >> Docstat.txt
echo "" >> Docstat.txt
echo '==== Details' >> Docstat.txt
grep "contentsline \({chapter}\|{section}\)" Diss.toc | sed 's/\\contentsline //g' | sed 's/\\numberline //g' >> Docstat.txt
echo '==== Word count' >> Docstat.txt
texcount 1_Introduction.tex >> Docstat.txt
texcount 2_Theory.tex >> Docstat.txt
texcount 3_Experimental.tex >> Docstat.txt
texcount 5_ResultsAndDiscussion.tex >> Docstat.txt
texcount 7_Conclusion.tex >> Docstat.txt
echo '==== Number of lines' >> Docstat.txt
grep Zeilenzahl Diss.log >> Docstat.txt
echo 'by chapter' >> Docstat.txt
grep linenumber *.aux | sed 's/\\setcounter{linenumber}/ /g' >> Docstat.txt
rm Docstat.tmp