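I recently edited an existing document to create a new one from it (that is, I copied the whole folder to a new location and started from there). The earlier document had a lot of figures, but not all of them are used in the new version.
Now I have a lot of unused files (jpg, pdf, png) in /fig that I would like to delete, since they are not called by any \includegraphics command.
Is there a way to list the used or unused files? (I don't mean the auxiliary files; I'm fine with those.)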
Answer 1
I came up with this small script (run it from the project's root folder):
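#!/bin/bash
# Compile the document first so that the .log files are up to date.
for image_path in fig/*
do
    [ -f "$image_path" ] || continue   # skip sub-folders
    image_file=${image_path##*/}
    # -q: no output, only the exit status; -F: fixed string, not a regex
    # (the original "-c > 1" wrote the count to a file named "1")
    if grep -qF -- "$image_file" *.log
    then
        echo "File $image_file is in use."
    else
        echo "File $image_file is not in use."
        mv "fig/$image_file" "fig/moved.$image_file" # or any other action
    fi
done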
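One caveat with searching the log: by default TeX typically wraps log lines at 79 characters, so a long file path can be split over two lines and missed by grep. Here is a minimal Python sketch of the same check that joins the wrapped lines first. The function name `split_by_log_usage` and the sample log text are made up for illustration, and the approach assumes the log comes from a recent compile.

```python
def split_by_log_usage(file_names, log_text):
    """Split file names into (used, unused) by searching a LaTeX log.

    Assumption: removing every newline undoes the wrapping of long log
    lines. It can also glue unrelated lines together, so treat the result
    as a hint and review it before deleting anything.
    """
    joined = log_text.replace("\r", "").replace("\n", "")
    used = [name for name in file_names if name in joined]
    unused = [name for name in file_names if name not in joined]
    return used, unused


# Hypothetical log excerpt: the second file name is broken by line wrapping.
sample_log = (
    "<fig/plot.png, id=1, 100.0pt x 50.0pt>\n"
    "<use fig/a_rather_long_file_na\n"
    "me.pdf>\n"
)
used, unused = split_by_log_usage(
    ["plot.png", "a_rather_long_file_name.pdf", "old_draft.jpg"], sample_log
)
print("used:", used)
print("unused:", unused)
```

In a real project you would read the log with `open("main.log", errors="replace").read()` (the log name is a placeholder) and pass `os.listdir("fig")` as the list of names.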
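Answer 2
In case anyone is still looking, I wrote a Python 3 script to handle this. I use it to generate a new, clean LaTeX folder in which all the used files sit directly at the root of the folder instead of being scattered over several subdirectories. This is a requirement of preprint servers such as arXiv and HAL.
(If you just want to get rid of the unused files, simply use the contents of the newly created clean folder.)
The inputs of the script are:
- the list of TeX files to parse (in case you have split your document into several files, located in the same folder)
- the list of file extensions of the potentially unused files we want to look for
- a few other self-explanatory options
The script searches the specified TeX files for all occurrences of the specified extensions and builds a list of all used files with those extensions. All of these files are copied to the new specified folder. For convenience, the other files located at the root of the TeX folder are copied as well (except TeX compilation files and the previously unused files). The provided TeX files are copied too, but all their references to these files are changed so that they point directly to the new files at the root of the new folder.
This way, you directly get a compilable LaTeX folder containing all the files you need.
Here is the code: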
import os
import re
import shutil
import ntpath  # its basename() accepts both '/' and '\' as separators

############ INPUTS ###############

# list of TeX files to parse
# (they should all be within the same folder, as the image paths
# are computed relative to the first TeX file)
texPathList = ["/home/my/tex/folder/my_first_file.tex",
               "/home/my/tex/folder/my_second_file.tex"]

# extensions to search
extensions = [".png", ".jpg", ".jpeg", ".pdf", ".eps"]
bExcludeComments = True  # if True, files appearing in comments will not be kept

# path where all used images and the modified TeX files should be copied
# (you can then copy over missing files, e.g. other types of images, Bib files...)
# location of the new folder (should not exist already)
exportFolder = '/home/my/new/folder/clean_article/'

# should all other files in the root folder (not in subfolders) be copied?
# (temporary TeX compilation files are not copied)
bCopyOtherRootFiles = True

############## CREATE CLEAN FOLDER #################

# 1 - load TeX files
text = ''
for path in texPathList:
    with open(path, 'r') as f:
        text = text + f.read()

# 2 - find all occurrences of the extension
global_matches = []
for extension in extensions:
    # escape the extension so that the dot is matched literally
    pattern = r'\{[^}]+' + re.escape(extension) + r'\}'
    if not bExcludeComments:  # simply find all occurrences
        matches = re.findall(pattern, text)  # does not give the position
    else:  # more involved search
        # 2.1 - find all matches together with their position
        # 2.2 - drop the matches which appear in a commented line
        matches = []
        for m in re.finditer(pattern, text):
            # start of the line holding the match
            # (rfind returns -1 on the first line, so lineStart becomes 0
            # and the backwards search can no longer run past the start)
            lineStart = text.rfind('\n', 0, m.start()) + 1
            if '%' in text[lineStart:m.start()]:
                # the line is commented
                print('file "{}" is commented (discarded)'.format(m.group()))
            else:
                # the line is not commented --> we keep it
                matches.append(m.group())
    global_matches = global_matches + matches

# 3 - make sure there are no duplicates
fileList = set(global_matches)
if len(global_matches) != len(fileList):
    print('WARNING: it seems you have duplicate images in your TeX')
# 3.1 - remove curly braces
fileList = [m[1:-1] for m in fileList]

# 4 - copy the used images to the designated new location
try:
    os.makedirs(exportFolder)
except FileExistsError:
    raise Exception('The new folder already exists, please delete it first')
texRoot = os.path.dirname(texPathList[0])
for m in fileList:
    absolutePath = os.path.join(texRoot, m)
    shutil.copy(absolutePath, exportFolder)

# 5 - copy the TeX files also, and modify the image paths they refer to
for path in texPathList:
    with open(path, 'r') as f:
        text = f.read()
    for m in fileList:
        text = text.replace(m, ntpath.basename(m))
    newPath = os.path.join(exportFolder, ntpath.basename(path))
    with open(newPath, 'w') as f:
        f.write(text)

# 6 - if chosen, copy over all the other files (except TeX temp files)
# which are directly at the root of the original TeX folder
if bCopyOtherRootFiles:
    excludedExtensions = ['.aux', '.bak', '.blg', '.bbl', '.spl', '.gz', '.out', '.log']
    for filename in os.listdir(texRoot):
        fullPath = os.path.join(texRoot, filename)
        if os.path.isfile(fullPath):
            ext = os.path.splitext(filename)[1]
            # do not copy already modified TeX files
            if not (filename in [ntpath.basename(tex) for tex in texPathList]):
                # do not copy temporary files
                if not (ext.lower() in excludedExtensions):
                    # do not copy files we have already taken care of
                    if not (ext.lower() in extensions):
                        shutil.copy(fullPath, exportFolder)

# The export folder now contains the modified TeX files and all the required files!
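If you only want the list of used and unused files rather than a cleaned copy, the same regex idea can be reduced to a short sketch. The helper names `referenced_files` and `unused_files` and the sample TeX snippet are illustrative assumptions, not part of the script above. Like the script, it only sees paths that are written with their extension.

```python
import os
import re
import tempfile


def referenced_files(tex_text, extensions):
    """Return the set of brace-enclosed paths ending in one of the extensions.

    Comments are dropped first with a simple rule: everything after an
    unescaped '%' on a line is ignored.
    """
    uncommented = re.sub(r"(?<!\\)%.*", "", tex_text)
    found = set()
    for ext in extensions:
        pattern = r"\{([^{}]+" + re.escape(ext) + r")\}"
        found.update(re.findall(pattern, uncommented))
    return found


def unused_files(folder, used_names, extensions):
    """List files in `folder` with a watched extension whose name is not used."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in extensions
        and name not in used_names
    )


# Made-up TeX source: one figure is referenced only in a comment.
sample_tex = r"""
\includegraphics[width=\linewidth]{fig/plot.png}
% \includegraphics{fig/old.jpg}
\includegraphics{fig/scheme.pdf}
"""
extensions = [".png", ".jpg", ".pdf"]
used = referenced_files(sample_tex, extensions)
used_names = {os.path.basename(path) for path in used}

# Throwaway folder standing in for the real figure directory.
with tempfile.TemporaryDirectory() as fig_dir:
    for name in ("plot.png", "old.jpg", "scheme.pdf", "notes.txt"):
        open(os.path.join(fig_dir, name), "w").close()
    leftovers = unused_files(fig_dir, used_names, extensions)

print(sorted(used))
print(leftovers)
```

Comparing by base name keeps the sketch short, but it cannot tell apart two files with the same name in different subfolders.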
Answer 3
I wrote this article here: medium.com/@weslley.spereira/remove-unused-files-from-your-latex-project. In short, I generalized Alessandro Cuttin's script a bit to cover more directory levels. I hope it is still helpful.
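# projectFolder and mainfilename (the base name of the .log file)
# are assumed to be set before this point
nonUsed="./nonUsedFiles"
mkdir -p "$nonUsed"
# Directory Level 1
for imgFolder in "$projectFolder"/*/; do
    [ -d "$imgFolder" ] || continue
    echo "$imgFolder"
    for imagePath in "$imgFolder"*; do
        [ -f "$imagePath" ] || continue   # skip sub-folders
        imageFile=${imagePath##*/}
        # echo "$imageFile"
        # -q: exit status only; -F: fixed string ("-c > 1" wrote to a file named "1")
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imagePath" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done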
# Directory Level 2
for imgFolder in "$projectFolder"/*/*/; do
    [ -d "$imgFolder" ] || continue
    echo "$imgFolder"
    for imagePath in "$imgFolder"*; do
        [ -f "$imagePath" ] || continue   # skip sub-folders
        imageFile=${imagePath##*/}
        # echo "$imageFile"
        # -q: exit status only; -F: fixed string ("-c > 1" wrote to a file named "1")
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imagePath" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done
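The two loops above handle exactly two directory levels. As a rough alternative, a directory walk covers any depth in one pass. The following Python sketch assumes the same log-based test as the script. The function `move_unused`, its parameters, and the demo file names are hypothetical.

```python
import os
import shutil
import tempfile


def move_unused(project_dir, log_text, dest_dir, extensions=(".png", ".jpg", ".pdf")):
    """Move image files whose name is absent from log_text into dest_dir.

    The sub-folder layout is mirrored under dest_dir, so nothing is
    overwritten and files can be moved back by hand. Returns the moved
    paths relative to project_dir.
    """
    moved = []
    dest_abs = os.path.abspath(dest_dir)
    for root, dirs, files in os.walk(project_dir):
        # Do not descend into the destination folder itself.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != dest_abs]
        for name in files:
            if not name.lower().endswith(extensions) or name in log_text:
                continue
            rel = os.path.relpath(os.path.join(root, name), project_dir)
            target = os.path.join(dest_dir, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.move(os.path.join(root, name), target)
            moved.append(rel)
    return sorted(moved)


# Demo on a throwaway project tree with made-up file names.
with tempfile.TemporaryDirectory() as project:
    for rel in ("fig/used.png", "fig/deep/er/unused.jpg", "fig/deep/old.pdf"):
        path = os.path.join(project, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    log = "<fig/used.png, id=1, 10pt x 10pt>"
    moved = move_unused(project, log, os.path.join(project, "nonUsedFiles"))
    still_there = os.path.exists(os.path.join(project, "fig", "used.png"))
    print(moved, still_there)
```

Mirroring the relative path under the destination folder is one possible design. It avoids name clashes between files with the same name in different subfolders, at the cost of a deeper folder tree.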
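Answer 4
A bit late, but here is another Python approach.
Workflow of this CLI application:
- call python3 move_unused_figures_from_latex_project.py from the LaTeX project root
- indicate in which folder the figures are located
- specify a folder name in which to put the unused figures
- confirm, for each unused figure, whether you really want to move it
Note: to check whether a figure is used in the LaTeX project, the function string_found_in_tex_files is called, which checks whether the file path (without its extension) occurs in any .tex file of the project. As a result, figures that appear only in commented-out lines are not moved, and neither is any file whose path appears in the .tex files for a reason unrelated to figures.
move_unused_figures_from_latex_project.py: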
import shutil
import os

# NOTE: main function is down below


def ask_existing_folder_name(default_folder_name=None) -> str:
    """ ask user for a folder name. """
    if default_folder_name is None:
        folder_name = input('Please enter an existing directory name:')
    else:
        folder_name = input(
            f'Please enter an existing directory (default = {default_folder_name}):')
        if folder_name in ['', 'y', 'Y']:
            folder_name = default_folder_name
    if os.path.isdir(folder_name):
        return folder_name
    print('That folder does not exist!')
    return ask_existing_folder_name(default_folder_name=default_folder_name)


def ask_new_folder_name(default_folder_name=None) -> str:
    """ ask user to input a new folder name. """
    if default_folder_name is None:
        folder_name = input('Please enter a new directory name:')
    else:
        folder_name = input(f'Please enter a new directory name (default = {default_folder_name}):')
        if folder_name in ['', 'y', 'Y']:
            folder_name = default_folder_name
    if not os.path.isdir(folder_name):
        return folder_name
    print('That folder already exists!')
    return ask_new_folder_name(default_folder_name=default_folder_name)


def string_found_in_tex_files(string_to_search: str) -> bool:
    """
    return True if there exists a .tex file in the current
    directory or any subdirectory that contains string_to_search.
    """
    print(f"search string {string_to_search}")
    for root, _, files in os.walk("."):
        for filename in files:
            filepath = os.path.join(root, filename)
            if filepath.endswith('.tex') and os.path.isfile(filepath):
                with open(filepath) as file:
                    if string_to_search in file.read():
                        return True
    return False


def main():
    """ interactive CLI that moves unused figures from latex project. """
    print("welcome, we're going to remove all unused figures from this latex project")
    print('NOTE: make sure to run this function from latex project root\n')
    figures_folder_name = ask_existing_folder_name(default_folder_name='figures/')
    print('unused figures are moved to a new directory')
    unused_figures_folder_name = ask_new_folder_name(default_folder_name='unused_figures/')
    os.mkdir(unused_figures_folder_name)
    figure_file_paths = []
    extensions = (".pdf", ".jpg", ".png", ".eps")
    # collect all relative paths to figures
    for root, _, files in os.walk(figures_folder_name):
        for filename in files:
            if filename.endswith(extensions):
                file_path = os.path.join(root, filename)
                figure_file_paths.append(file_path)
    only_used_figures_detected = True
    for file_path in figure_file_paths:
        # take away the extension
        (file_path_without_extension, _) = os.path.splitext(file_path)
        if not string_found_in_tex_files(file_path_without_extension):
            only_used_figures_detected = False
            answer = input(f'{file_path} is unused, '
                           f'do you want to move it to {unused_figures_folder_name} (Y/n)?')
            if answer in ['n', 'N', 'no']:
                continue
            # move the file
            shutil.move(file_path, unused_figures_folder_name)
            print(f'{file_path} moved to {unused_figures_folder_name}')
    if only_used_figures_detected:
        print('all figures are used :)')


if __name__ == '__main__':
    main()
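Because the check above is a plain substring search, a figure referenced only in a commented-out line still counts as used. If such figures should be treated as unused instead, one option is to strip the comments before searching. The sketch below is a variant under stated assumptions, not part of the script: `strip_tex_comments` and `is_referenced` are hypothetical helpers, and the regex does not handle every corner case (for example a `%` directly after a `\\` line break, or a `%` inside verbatim text).

```python
import re

# Matches an unescaped '%' and the rest of the line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def strip_tex_comments(tex_source: str) -> str:
    """Remove LaTeX comments, keeping escaped percent signs."""
    return COMMENT_RE.sub("", tex_source)


def is_referenced(path_without_extension: str, tex_source: str) -> bool:
    """Return True if the path occurs outside of comments."""
    return path_without_extension in strip_tex_comments(tex_source)


# Made-up TeX source with an escaped percent sign and two kinds of comments.
sample = "\n".join([
    r"Growth was 5\% \includegraphics{figures/growth}",
    r"% \includegraphics{figures/old_plot}",
    r"\includegraphics{figures/setup} % final version",
])
for figure in ("figures/growth", "figures/old_plot", "figures/setup"):
    print(figure, is_referenced(figure, sample))
```

To use this in the script, `string_found_in_tex_files` could pass the contents of each .tex file through `strip_tex_comments` before the `in` test.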