How do I clean up unused files?

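I recently edited an existing document to create a new one (that is, I copied the whole folder to a new location and started from there). The earlier document had many figures, but not all of them are used in the new version.

Now I have many unused files (jpg, pdf, png) in /fig that I would like to delete, because no \includegraphics command refers to them.

Is there a way to list the used or unused files? (I don't mean the auxiliary files; I'm fine with those.)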

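Answer 1

I came up with this small script (run it from the project's root folder, after compiling the document so that the .log file exists):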

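#!/bin/bash

# Check each file in fig/ against the LaTeX log files.
for image_path in fig/*
do
    image_file=$(basename "$image_path")
    # grep -q exits with status 0 on a match; -F treats the name as a literal string.
    # (The earlier form "grep ... -c > 1" only redirected the count into a file named "1".)
    if grep -qF -- "$image_file" *.log
    then
        echo "File $image_file is in use."
    else
        echo "File $image_file is not in use."
        mv "fig/$image_file" "fig/moved.$image_file" # or any other action
    fi
done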
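
For readers who prefer to list the candidates before touching anything, the same log-based check can be sketched in Python. This is only a sketch under assumptions: the function names find_unused and read_logs are made up for illustration, the figures sit directly in fig/, and the log files are in the project root. Because TeX may wrap long lines in the log, the sketch removes newlines before searching; treat "not in use" as a hint rather than a verdict.

```python
from pathlib import Path


def read_logs(project_root="."):
    """Concatenate every *.log file found directly in project_root."""
    logs = sorted(Path(project_root).glob("*.log"))
    return "".join(p.read_text(errors="replace") for p in logs)


def find_unused(fig_dir, log_text):
    """Return the names of files in fig_dir that never appear in log_text."""
    # TeX may break long log lines, so join them before searching.
    flat_log = log_text.replace("\n", "")
    names = sorted(p.name for p in Path(fig_dir).iterdir() if p.is_file())
    return [name for name in names if name not in flat_log]


if __name__ == "__main__":
    # Only list the candidates; nothing is moved or deleted here.
    if Path("fig").is_dir():
        for name in find_unused("fig", read_logs(".")):
            print(f"File {name} is not in use.")
```

Nothing is moved or deleted, so the output can be reviewed first and then fed to whatever action you prefer.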

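Answer 2

In case anyone is still looking, I wrote a Python 3 script to handle this. I use it to generate a new, clean LaTeX folder in which all used files sit directly in the root of the folder rather than scattered across several subdirectories. This is a requirement of preprint servers such as arXiv and HAL.

(If you only want to get rid of the unused files, simply use the contents of the newly created clean folder.)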

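The inputs of the script are:

  • The list of TeX files to parse (in case you split your document into several files, all located in the same folder)
  • The list of file extensions of the potentially unused files to look for
  • A few other self-explanatory options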

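The script looks for all occurrences of the specified extensions in the specified TeX files and builds the list of all used files with these extensions. All of these files are copied to the new, specified folder. For convenience, the other files located in the root of the TeX folder are copied as well (except TeX compilation files and the files found to be unused). The provided TeX files are also copied, but all their references to these files are changed so that they point directly to the new files in the root of the new folder.

This way, you directly get a compilable LaTeX folder containing all the files you need.

Here is the code: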

import os, sys, shutil
import re
import ntpath

############ INPUTS ###############
# list of Tex files to parse
# (they should all be within the same folder, as the image paths
# are computed relative to the first TeX file)
texPathList = ["/home/my/tex/folder/my_first_file.tex",
               "/home/my/tex/folder/my_second_file.tex"]

# extensions to search
extensions=[".png", ".jpg", ".jpeg", ".pdf", ".eps"]

bExcludeComments = True # if True, files appearing in comments will not be kept
# path where all used images and the modified TeX files should be copied
# (you can then copy over missing files, e.g. other types of images, Bib files...)

# location of the new folder (should not exist already)
exportFolder = '/home/my/new/folder/clean_article/'

#  should all other files in the root folder (not in subfolders) be copied ?
# (temporary TeX compilation files are not copied)
bCopyOtherRootFiles = True

############## CREATE CLEAN FOLDER #################
# 1 - load TeX files
text=''
for path in texPathList:
  with open(path,'r') as f:
    text = text + f.read()
    
# 2 - find all occurrences of the extension
global_matches = []
for extension in extensions:
  escaped_extension = '\\'+extension # so that the point is correctly accounted for
  pattern=r'\{[^}]+'+escaped_extension+'}'
  if not bExcludeComments: # simply find all occurrences
    matches = re.findall(pattern=pattern, string=text) # does not give the position
  else: # more involved search
    # 2.1 - find all matches
    positions, matches = [], []
    regex = re.compile(pattern)
    for m in regex.finditer(text):
        print(m.start(), m.group())
        positions.append( m.start() )
        matches.append( m.group())
    # 2.2 - remove matches which appear in a commented line
    # parse list in reverse order and remove if necessary
    for i in range(len(matches)-1,-1,-1):
      # look backwards in text for the first occurrence of '\n' or '%'
      startPosition = positions[i]
      while startPosition >= 0: # stop at the start of the text (first line has no preceding '\n')
        if text[startPosition]=='%':
          # the line is commented
          print('file "{}" is commented (discarded)'.format(matches[i]))
          positions.pop(i)
          matches.pop(i)
          break
        if text[startPosition]=='\n':
          # the line is not commented --> we keep it
          break
        startPosition -= 1
  global_matches = global_matches + matches
  
# 3 - make sure there are no duplicates
fileList = set(global_matches)
if len(global_matches) != len(fileList):
  print('WARNING: it seems you have duplicate images in your TeX')
# 3.1 - remove curly braces
fileList = [m[1:-1] for m in fileList]

# 4 - copy the used images to the designated new location
try:
  os.makedirs(exportFolder)
except FileExistsError:
  raise Exception('The new folder already exists, please delete it first')

texRoot = os.path.dirname(texPathList[0])
for m in fileList:
  absolutePath = os.path.join(texRoot, m)
  shutil.copy(absolutePath, exportFolder)

# 5 - copy the TeX files also, and modify the image paths they refer to
for path in texPathList:
  with open(path,'r') as f:
    text = f.read()
  for m in fileList:
    text = text.replace(m, ntpath.basename(m) )
  newPath = os.path.join(exportFolder, ntpath.basename(path))
  with open(newPath, 'w') as f:
    f.write(text)

# 6 - if chosen, copy over all the other files (except TeX temp files)
# which are directly at the root of the original TeX folder
if bCopyOtherRootFiles:
  excludedExtensions = ['.aux', '.bak', '.blg', '.bbl', '.spl', '.gz', '.out', '.log']
  for filename in os.listdir(texRoot):
    fullPath = os.path.join(texRoot, filename)
    if os.path.isfile(fullPath):
      ext = os.path.splitext(filename)[1]
      # do not copy already modified TeX files
      if not ( filename in [ntpath.basename(tex) for tex in texPathList]):
        # do not copy temporary files
        if not ( ext.lower() in excludedExtensions ):
          # do not copy files we have already taken care of
          if not ( ext.lower() in extensions ):
            shutil.copy( fullPath, exportFolder)

# The export folder now contains the modified TeX files and all the required files !
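
To see what step 2 of the script is after, here is a small self-contained sketch of the same idea on an in-memory string. It is not the script above: the helper names strip_comments and referenced_files are hypothetical, comments are removed up front rather than by scanning backwards from each match, and the pattern uses [^{}]+ instead of [^}]+ so that a match cannot start at an earlier opening brace. A % preceded by a backslash is treated as a literal percent sign (the rare \\% case is not handled).

```python
import re


def strip_comments(tex):
    """Drop everything from an unescaped % to the end of each line."""
    return "\n".join(re.sub(r"(?<!\\)%.*", "", line) for line in tex.split("\n"))


def referenced_files(tex, extensions):
    """Return the sorted, de-duplicated brace-delimited paths ending in one of the extensions."""
    found = set()
    for ext in extensions:
        # re.escape makes the dot in ".png" literal
        pattern = r"\{([^{}]+" + re.escape(ext) + r")\}"
        found.update(re.findall(pattern, tex))
    return sorted(found)


sample = "\n".join([
    r"\includegraphics[width=\linewidth]{fig/plot.png}",
    r"% \includegraphics{fig/old.png}",
    r"Growth was 5\% \includegraphics{fig/sub/photo.jpg}",
])

print(referenced_files(strip_comments(sample), [".png", ".jpg"]))
# prints ['fig/plot.png', 'fig/sub/photo.jpg']
```

The commented-out fig/old.png is dropped, while the figure on the line containing \% is kept.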

Answer 3

I wrote about this here: medium.com/@weslley.spereira/remove-unused-files-from-your-latex-project. In short, I generalized Alessandro Cuttin's script a bit to cover more directory levels. I hope it is still helpful.

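# Note: projectFolder and mainfilename must be set before this point
# (they are not defined in this excerpt).
nonUsed="./nonUsedFiles"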
mkdir -p "$nonUsed"

# Directory Level 1
for imgFolder in $(ls -d "$projectFolder"/*/); do
    echo "$imgFolder"
    for imageFile in $(ls "$imgFolder"); do
#        echo "$imageFile"
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imgFolder/$imageFile" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done

# Directory Level 2
for imgFolder in $(ls -d "$projectFolder"/*/*/); do
    echo "$imgFolder"
    for imageFile in $(ls "$imgFolder"); do
#        echo "$imageFile"
        if grep -qF -- "$imageFile" "$projectFolder/$mainfilename.log"; then
            echo "+ File $imageFile is in use."
        else
            echo "- File $imageFile is not in use."
            mkdir -p "$nonUsed/$imgFolder"
            mv "$imgFolder/$imageFile" "$nonUsed/$imgFolder$imageFile"
        fi
    done
done
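
The two loops above differ only in how deep they look. As a rough sketch of how the same idea could cover any number of directory levels, the Python below walks the whole project tree and mirrors the subfolder layout under the "unused" folder. Everything here is an assumption for illustration: the function name move_unused, the IMAGE_EXTENSIONS filter (the shell script moves any file type), and the dry_run default. As in the shell script, files sitting directly in the project root are left alone.

```python
import shutil
from pathlib import Path

# Assumption: only these file types are candidates for moving.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".pdf", ".eps")


def move_unused(project_root, log_text, unused_root, dry_run=True):
    """Move image files whose names are absent from log_text into unused_root.

    Returns the list of (source, target) pairs; with dry_run=True nothing is moved.
    """
    project_root = Path(project_root).resolve()
    unused_root = Path(unused_root).resolve()
    planned = []
    # sorted() materializes the listing first, so moving files does not disturb the walk
    for path in sorted(project_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        if path.parent == project_root or unused_root in path.parents:
            continue  # skip root-level files and files that were already moved
        if path.name in log_text:
            continue  # the log mentions this file name, so treat it as used
        target = unused_root / path.relative_to(project_root)
        planned.append((path, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
    return planned
```

A call such as move_unused(".", Path("main.log").read_text(errors="replace"), "nonUsedFiles") would only report what it intends to move (main.log is a placeholder name); pass dry_run=False once the list looks right.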

Answer 4

A bit late, but here is another Python approach.

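Workflow of this CLI application:

  • Call python3 move_unused_figures_from_latex_project.py from the LaTeX project root
  • Indicate which folder the figures are located in
  • Specify a name for the folder in which the unused figures will be placed
  • For each unused figure, confirm whether you really want to move it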

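Note: to check whether a figure is used in the LaTeX project, the function string_found_in_tex_files is called, which checks whether the file path (without its extension) occurs in any .tex file of the project. Consequently, figures that only appear in commented-out lines are not moved, and neither is any file whose path shows up in a .tex file for a reason unrelated to figures.

move_unused_figures_from_latex_project.py: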

import shutil
import os

# NOTE: main function is down below

def ask_existing_folder_name(default_folder_name=None) -> str:
    """ ask user for a folder name. """

    if default_folder_name is None:
        folder_name = input('Please enter an existing directory name:')
    else:
        folder_name = input(
                f'Please enter an existing directory (default = {default_folder_name}):')

        if folder_name in ['', 'y', 'Y']:
            folder_name = default_folder_name

    if os.path.isdir(folder_name):
        return folder_name

    print('That folder does not exist!')
    return ask_existing_folder_name(default_folder_name=default_folder_name)


def ask_new_folder_name(default_folder_name=None) -> str:
    """ ask user to input a new folder name. """
    if default_folder_name is None:
        folder_name = input('Please enter a new directory name:')
    else:
        folder_name = input(f'Please enter a new directory name (default = {default_folder_name}):')

        if folder_name in ['', 'y', 'Y']:
            folder_name = default_folder_name

    if not os.path.isdir(folder_name):
        return folder_name

    print('That folder does already exist!')
    return ask_new_folder_name(default_folder_name=default_folder_name)

def string_found_in_tex_files(string_to_search: str) -> bool:
    """
    return True if there exist a .tex file in the current
    directory or any subdirectory that contains string_to_search.
    """
    print(f"search string {string_to_search}")
    for root, _, files in os.walk("."):
        for filename in files:
            filepath = os.path.join(root, filename)
            if filepath.endswith('.tex') and os.path.isfile(filepath):
                with open(filepath) as file:
                    if string_to_search in file.read():
                        return True
    return False

def main():
    """ interactive CLI that moves unused figures from latex project. """

    print("welcome, we're going to remove all unused figures from this latex project")
    print('NOTE: make sure to run this function from latex project root\n')

    figures_folder_name = ask_existing_folder_name(default_folder_name='figures/')

    print('unused figures are moved to a new directory')
    unused_figures_folder_name = ask_new_folder_name(default_folder_name='unused_figures/')
    os.mkdir(unused_figures_folder_name)

    figure_file_paths = []

    extensions = (".pdf", ".jpg", ".png", ".eps")

    # collect all relative paths to figures
    for root, _, files in os.walk(figures_folder_name):
        for filename in files:
            if filename.endswith(extensions):
                file_path = os.path.join(root, filename)
                figure_file_paths.append(file_path)

    only_used_figures_detected = True
    for file_path in figure_file_paths:

        # take away the extension
        (file_path_without_extension, _) = os.path.splitext(file_path)

        if not string_found_in_tex_files(file_path_without_extension):

            only_used_figures_detected = False

            answer = input(f'{file_path} is unused, '\
                    f'do you want to move it to {unused_figures_folder_name} (Y/n)? ')
            if answer in ['n', 'N', 'no']:
                continue

            # move the file
            shutil.move(file_path, unused_figures_folder_name)
            print(f'{file_path} moved to {unused_figures_folder_name}')


    if only_used_figures_detected:
        print('all figures are used :)')

if __name__ == '__main__':
    main()
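
One consequence of the plain substring test is that a figure also counts as used when its path is merely a prefix of another figure's path. That errs on the safe side, but it can leave stale files behind. A stricter check could look roughly like the sketch below; is_referenced is a hypothetical helper, not part of the script above, and it assumes references are written as a path, optionally followed by an extension, right before a closing brace. It does not anchor the start of the path and does not account for \graphicspath.

```python
import re


def is_referenced(tex, path_without_extension):
    """True if tex contains the path, optionally with an extension, right before a closing brace."""
    pattern = re.escape(path_without_extension) + r"(\.\w+)?\s*\}"
    return re.search(pattern, tex) is not None


tex = r"\includegraphics{figures/result_final.png}"

print("figures/result" in tex)               # substring test as in the script: True
print(is_referenced(tex, "figures/result"))  # stricter test: False
```

Here figures/result.png would be reported as used by the substring test only because figures/result_final.png is referenced; the stricter test does not make that mistake.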
