Need to rename many very long file names that contain variable parts

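Every day my company receives a list of files that we need to process, and the file names are nearly impossible for our system to handle. Is there a way to rename these files? I'm new to any kind of scripting, so I don't know where to start. I'm on Windows. I tried Bulk Rename Utility, but I can't figure out how to remove AB_C_D_, and it sometimes errors out for reasons I haven't worked out yet. Is there a way to rename these files with PowerShell?

The file names currently look like this: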

Sample1_Sample2_1_05-11-2015_0_Sample3-AB_C_D_045_4_Sample4_123456.pdf

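This is what I want to do:

  • Remove Sample1 (always the same)
  • Keep Sample2, so the file name starts with Sample2 (always the same)
  • Remove _1
  • Keep the date (a future date that will change)
  • Remove 0_Sample3 (always the same)
  • Keep the page count (045, different for every file) and put it after the date
  • Remove _4_Sample4_
  • Keep 123456 (an identification number, different for every file)

The main problem is that I want to remove AB_C_D_, and those letters vary. There may be more or fewer of them (for example, A_C_D_), and I don't know how to remove that part.

So the final file name would be Sample2_05-11-2015_045_123456.pdf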

If anyone can help me or point me in the right direction, I'd really appreciate it!

Thanks in advance, HH-GeekyGal

Answer 1

This PowerShell script will rename the files the way you need. Save it as RenameFiles.ps1 and run it from a PowerShell console.

The script accepts the following parameters:

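  • Path: required. An existing folder on disk where the files are stored. You can supply multiple paths.
  • Recurse: optional switch that controls recursion. If specified, the script renames files in all subfolders.
  • WhatIf: optional switch. If specified, the script only reports the old and new file names. No renaming is done.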

Examples (run from a PowerShell console):

  • Rename all files in the folder c:\path\to\files:

    .\RenameFiles.ps1 -Path 'c:\path\to\files'
    
  • Rename all pdf files in the folder c:\path\to\files:

    .\RenameFiles.ps1 -Path 'c:\path\to\files\*.pdf'
    
  • Rename all pdf files in the folder c:\path\to\files, recursively:

    .\RenameFiles.ps1 -Path 'c:\path\to\files\*.pdf' -Recurse
    
  • Scan multiple folders for files, recursively, report only (no renaming):

    .\RenameFiles.ps1 -Path 'c:\path\A\*.pdf', 'c:\path\B\*.psd' -Recurse -WhatIf
    

The RenameFiles.ps1 script itself:

# Arguments accepted by script
Param
(
    # One or multiple paths, as array of strings
    [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
    [string[]]$Path,

    # Recurse switch
    [switch]$Recurse,

    # Whatif switch
    [switch]$WhatIf
)

# This function transforms a long file name (without extension) into a short one via regex
function Split-FileName
{
    [CmdletBinding()]
    Param
    (
        # Original file name
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$FileName
    )

    Begin
    {
        # You can change this block to adapt new rules for file renaming,
        # without modifying other parts of script.

        # Regex to match, capture groups are used to build new file name
        $Regex = '(Sample2).*(\d{2}-\d{2}-\d{4}).*(?<=[a-z]_)(\d+)(?=_\d+).*(?<=_)(\d+)$'

        # Scriptblock that builds the new file name. $Matches is a hashtable, but the format (-f) operator needs an array.
        # So this code: @(0..$Matches.Count | ForEach-Object {$Matches[$_]}) transforms it into an array.

        # Basically, we create a new array of integers from 0 to the count of $Matches keys, e.g. @(0,1,2,3,4,5),
        # and pass it down the pipeline. Then, in the foreach loop, we output the values of the $Matches keys whose
        # name matches the current pipeline object, e.g. $Matches[1], $Matches[2], etc. (the keys are integers).
        # $Matches[0] holds the whole matched string, other keys hold the capture groups.
        # The regex has four capture groups, so $Matches[5] is $null and {5} expands to an empty string.

        # This would also work:
        # $NewFileName = {'{0}_{1}_{2}_{3}' -f $Matches[1], $Matches[2], $Matches[3], $Matches[4]}

        $NewFileName = {'{1}_{2}_{3}_{4}{5}' -f @(0..$Matches.Count | ForEach-Object {$Matches[$_]})}

    }

    Process
    {
        # If original file name matches regex
        if($FileName -match $Regex)
        {
            # Call scriptblock to generate new file name
            . $NewFileName
        }
    }
}

# For each path, get all file objects
Get-ChildItem -Path $Path -Recurse:$Recurse |
    # That are not directory
    Where-Object {!$_.PsIsContainer} |
        # For each file
        ForEach-Object {
            # Try to create new file name
            $NewBaseName = $_.BaseName | Split-FileName

            if($NewBaseName)
            {
                # If file name matched regex and we've got a new file name...

                # Build full path for the file with new name
                $NewFullName = Join-Path -Path $_.DirectoryName -ChildPath ($NewBaseName + $_.Extension)

                if(Test-Path -Path $NewFullName -PathType Leaf)
                {
                    # If such a file already exists, show a message and skip the rename
                    Write-Host "File already exists: $NewFullName"
                }
                else
                {
                    # If not, rename it or just show report, depending on WhatIf switch
                    Rename-Item -Path $_.FullName -NewName $NewFullName -WhatIf:$WhatIf -Force
                }
            }
    }

The regex used in this script: https://regex101.com/r/hT2uN9/2 (note that PowerShell's regex matching is case-insensitive by default). A copy of the regex explanation follows:

Regex

(Sample2).*(\d{2}-\d{2}-\d{4}).*(?<=[a-z]_)(\d+)(?=_\d+).*(?<=_)(\d+)$

Sample2 string:

1st Capturing group (Sample2)

Sample2 matches the characters Sample2 literally (case insensitive)

Any characters (not captured and not present in the $Matches variable):

.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible,
giving back as needed [greedy]

Date

2nd Capturing group (\d{2}-\d{2}-\d{4})

\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
- matches the character - literally

\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
- matches the character - literally

\d{4} match a digit [0-9]
Quantifier: {4} Exactly 4 times

Any characters (not captured and not present in the $Matches variable):

.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible,
giving back as needed [greedy]

Page count

(?<=[a-z]_) Positive Lookbehind - Assert that the regex below can be matched

[a-z] match a single character present in the list below
a-z a single character in the range between a and z (case insensitive)
_ matches the character _ literally

3rd Capturing group (\d+)

\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible,
giving back as needed [greedy]

(?=_\d+) Positive Lookahead - Assert that the regex below can be matched
_ matches the character _ literally

\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible,
giving back as needed [greedy]

Any characters (not captured and not present in the $Matches variable):

.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible,
giving back as needed [greedy]

ID number

(?<=_) Positive Lookbehind - Assert that the regex below can be matched
_ matches the character _ literally

4th Capturing group (\d+)

\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible,
giving back as needed [greedy]

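Answer 2

As Karan linked, regular expressions are the way to do this. I'm on Linux, so I'm not sure whether PowerShell has a suitable built-in, but if it doesn't, download sed for Windows from SourceForge. It's simply awesome.

My sed-fu is terrible, but this will reformat the original string into the new one: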

sed -r 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/\1\2\3\4/'

I'm sure there are simpler ways to achieve the same thing.

If you can read bash, here is an example of how to use it for renaming:

for i in *; do mv -- "$i" "$(printf '%s\n' "$i" | sed -r 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/\1\2\3\4/')"; done

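No doubt writing a similar script in PowerShell is fairly simple, but that is left as an exercise for the reader :P

EDIT: fixed errors

EDIT2: Looking at what I wrote, it may be hard to follow, so I'll try to show what I'm trying to do:

Overall, the regex reads the line, and the parts we want to keep are wrapped in parentheses. These are called patterns (capture groups). Once the line has been read, everything except the selected patterns is discarded.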

sed -r   //the -r switch is here only to allow the use of parens without escaping them. It's confusing enough without backslashes.
's/      //s is the command; it stands for substitute. Syntax: s/[search pattern]/[replace pattern]/. The string matching SP is replaced with RP.
         //Here I use the command to match the whole line and save the parts I want.

Sample1_(Sample2_)  //set "Sample2_" as the first pattern
[0-9]*_(..-..-....) //read onwards and skip zero or more numerals ([0-9]*) followed by an underscore. Read xx-xx-xxxx as the second pattern, where x is any character
_.*-[A-Z_]*(_[0-9][0-9]*_) //after the underscore, skip any number of characters (.*) up to the last dash. After that, skip any number of capital letters and underscores until you run into an underscore followed by one or more numerals and another underscore (_[0-9][0-9]*_). Save that as pattern 3
._Sample4_(.*) //skip one character and _Sample4_, then grab everything after Sample4_ as pattern 4
/\1\2\3\4/'   //The first slash ends the search pattern of the s command and begins the replace pattern. After that, \1, \2, \3 and \4 insert the patterns we saved in the search part, discarding the rest. The final slash ends the s command.
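
To check the expression without touching any files, it can be fed the sample name from the question directly. This is only a minimal sketch: the function name shorten is made up for illustration, and -E is used instead of -r on the assumption that your sed accepts it (both GNU and BSD sed do). The second call uses the shorter letter block (A_C_D_) that the question mentions.

```shell
#!/usr/bin/env bash
# shorten: hypothetical helper that applies the sed expression from this answer to one name.
shorten() {
    printf '%s\n' "$1" |
        sed -E 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/\1\2\3\4/'
}

# The sample name from the question, then a variant with a shorter letter block.
# Both calls print Sample2_05-11-2015_045_123456.pdf
shorten 'Sample1_Sample2_1_05-11-2015_0_Sample3-AB_C_D_045_4_Sample4_123456.pdf'
shorten 'Sample1_Sample2_1_05-11-2015_0_Sample3-A_C_D_045_4_Sample4_123456.pdf'
```

A name that does not match the search pattern is printed unchanged, which makes it easy to spot files the expression would miss.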

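Regular expressions are hard to read but easy to write. That also means they are easy to get wrong and hard to debug, but you can't have everything.

Here is the gist of the shell script in basic/python/pseudocode-ish form.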

for OLDNAME in DIRECTORY
     let NEWNAME = output of sed command with OLDNAME piped as input.
     rename OLDNAME NEWNAME
next
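
The pseudocode loop above could look roughly like this in bash. Treat it as a sketch under a few assumptions rather than a finished tool: the function name rename_all and the --dry-run flag are invented for this example, it only looks at files directly inside the given directory, and it leaves alone any name the sed expression does not change.

```shell
#!/usr/bin/env bash
# rename_all DIR [--dry-run]: hypothetical helper that renames matching files inside DIR.
rename_all() {
    local dir=$1 mode=${2:-} old base new
    for old in "$dir"/*; do
        [ -f "$old" ] || continue              # skip directories and an unmatched glob
        base=$(basename -- "$old")
        new=$(printf '%s\n' "$base" |
            sed -E 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/\1\2\3\4/')
        [ "$new" != "$base" ] || continue      # name did not match the pattern: leave it alone
        if [ -e "$dir/$new" ]; then
            echo "skipped, target exists: $dir/$new" >&2
        elif [ "$mode" = "--dry-run" ]; then
            echo "$old -> $dir/$new"           # report only, do not rename
        else
            mv -- "$old" "$dir/$new"
        fi
    done
}
```

Running rename_all /path/to/files --dry-run first shows what would change; running it again without the flag does the actual renaming.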
