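My company receives a list of files every day that we need to process, and the file names are nearly impossible for our system to handle. Is there a way to rename these files? I'm new to any kind of scripting, so I don't know where to start. I'm using Windows. I've tried Bulk Rename Utility, but I can't figure out how to remove AB_C_D_, and it sometimes fails for reasons I haven't worked out yet. Is there a way to rename these files with PowerShell?
Right now the file names look like this:
Sample1_Sample2_1_05-11-2015_0_Sample3-AB_C_D_045_4_Sample4_123456.pdf
This is what I want to do: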
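- Remove Sample1 (always the same)
- Keep Sample2 (always the same); the file name should start with Sample2
- Remove _1
- Keep the date (a future date; it will change)
- Remove 0_Sample3 (always the same)
- Keep the page number (045, different for every file) and put it after the date
- Remove _4_Sample4_
- Keep 123456 (this is an ID number, different for every file)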
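The main problem is that I want to remove AB_C_D_, and these letters change. There may be more or fewer of them (for example A_C_D_), and I don't know how to remove that part.
So the final file name would be Sample2_05-11-2015_045_123456.pdf
If anyone can help me or point me in the right direction, I'd really appreciate it!
Thanks in advance, HH-GeekyGal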
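Answer 1
This PowerShell script will rename the files the way you need. Save it as RenameFiles.ps1 and run it from the PowerShell console.
The script accepts the following parameters:
- Path: required, an existing folder on disk where the files are stored. You can provide multiple paths.
- Recurse: optional switch, controls recursion. If specified, the script will also rename files in all subfolders.
- WhatIf: optional switch. If specified, the script will only report the old and new file names. Nothing is renamed.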
Examples (run from the PowerShell console):
Rename all files in the folder c:\path\to\files:
.\RenameFiles.ps1 -Path 'c:\path\to\files'
Rename all pdf files in the folder c:\path\to\files:
.\RenameFiles.ps1 -Path 'c:\path\to\files\*.pdf'
Rename all pdf files in the folder c:\path\to\files, recursively:
.\RenameFiles.ps1 -Path 'c:\path\to\files\*.pdf' -Recurse
Scan files in multiple folders, recursively, report only (no renaming):
.\RenameFiles.ps1 -Path 'c:\path\A\*.pdf', 'c:\path\B\*.psd' -Recurse -WhatIf
The RenameFiles.ps1 script itself:
# Arguments accepted by the script
Param
(
    # One or multiple paths, as an array of strings
    [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
    [string[]]$Path,

    # Recurse switch
    [switch]$Recurse,

    # WhatIf switch
    [switch]$WhatIf
)

# This function transforms a long file name (without extension) into a short one via regex
function Split-FileName
{
    [CmdletBinding()]
    Param
    (
        # Original file name
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$FileName
    )

    Begin
    {
        # You can change this block to adapt new rules for file renaming,
        # without modifying other parts of the script.

        # Regex to match; capture groups are used to build the new file name
        $Regex = '(Sample2).*(\d{2}-\d{2}-\d{4}).*(?<=[a-z]_)(\d+)(?=_\d+).*(?<=_)(\d+)$'

        # Scriptblock that builds the new file name. $Matches is a hashtable, but we need an array
        # for the format (-f) operator.
        # So this code: @(0..$Matches.Count | ForEach-Object {$Matches[$_]}) transforms it to an array.
        # Basically, we create a new array of integers from 0 to the count of $Matches keys,
        # e.g. @(0,1,2,3,4,5), and pass it down the pipeline. Then, in the foreach loop, we output
        # the values of the $Matches keys that match the current pipeline object,
        # e.g. $Matches[1], $Matches[2], etc.
        # $Matches[0] holds the whole matched string, other keys hold capture groups.
        # $Matches[5] does not exist, so {5} expands to an empty string.
        # This would also work:
        # $NewFileName = {'{0}_{1}_{2}_{3}' -f $Matches[1], $Matches[2], $Matches[3], $Matches[4]}
        $NewFileName = {'{1}_{2}_{3}_{4}{5}' -f @(0..$Matches.Count | ForEach-Object {$Matches[$_]})}
    }

    Process
    {
        # If the original file name matches the regex
        if($FileName -match $Regex)
        {
            # Call the scriptblock to generate the new file name
            . $NewFileName
        }
    }
}

# For each path, get all file objects
Get-ChildItem -Path $Path -Recurse:$Recurse |
    # That are not directories
    Where-Object {!$_.PsIsContainer} |
        # For each file
        ForEach-Object {
            # Try to create a new file name
            $NewBaseName = $_.BaseName | Split-FileName

            if($NewBaseName)
            {
                # If the file name matched the regex and we've got a new file name...

                # Build the full path for the file with the new name
                $NewFullName = Join-Path -Path $_.DirectoryName -ChildPath ($NewBaseName + $_.Extension)

                if(Test-Path -Path $NewFullName -PathType Leaf)
                {
                    # If such a file already exists, show an error message
                    Write-Host "File already exists: $NewFullName"
                }
                else
                {
                    # If not, rename it or just show a report, depending on the WhatIf switch
                    Rename-Item -Path $_.FullName -NewName $NewFullName -WhatIf:$WhatIf -Force
                }
            }
        }
The regex used in this script: https://regex101.com/r/hT2uN9/2 (note that PowerShell's regex matching is case-insensitive by default). A copy of the regex explanation follows:
Regex:
(Sample2).*(\d{2}-\d{2}-\d{4}).*(?<=[a-z]_)(\d+)(?=_\d+).*(?<=_)(\d+)$
Sample2 string:
1st Capturing group (Sample2)
Sample2 matches the characters Sample2 literally (case insensitive)
Any characters (not captured and not present in the $Matches variable):
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible,
giving back as needed [greedy]
Date:
2nd Capturing group (\d{2}-\d{2}-\d{4})
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
- matches the character - literally
\d{2} match a digit [0-9]
Quantifier: {2} Exactly 2 times
- matches the character - literally
\d{4} match a digit [0-9]
Quantifier: {4} Exactly 4 times
Any characters (not captured and not present in the $Matches variable):
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible,
giving back as needed [greedy]
Page number:
(?<=[a-z]_) Positive Lookbehind - Assert that the regex below can be matched
[a-z] match a single character present in the list below
a-z a single character in the range between a and z (case insensitive)
_ matches the character _ literally
3rd Capturing group (\d+)
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible,
giving back as needed [greedy]
(?=_\d+) Positive Lookahead - Assert that the regex below can be matched
_ matches the character _ literally
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible,
giving back as needed [greedy]
Any characters (not captured and not present in the $Matches variable):
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible,
giving back as needed [greedy]
ID number:
(?<=_) Positive Lookbehind - Assert that the regex below can be matched
_ matches the character _ literally
4th Capturing group (\d+)
\d+ match a digit [0-9]
Quantifier: + Between one and unlimited times, as many times as possible,
giving back as needed [greedy]
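Answer 2
As Karan linked, regex is the way to do this. I'm on Linux, so I'm not sure whether PowerShell has a suitable builtin, but if it doesn't, download sed for Windows from SourceForge. It's awesome.
My sed-fu is terrible, but this will reformat the original string into the new one:
sed -r 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/\1\2\3\4/'
I'm sure there are simpler ways to achieve the same thing.
If you can read bash, here is an example of how to use it for renaming:
for i in *; do mv -- "$i" "$(echo "$i" | sed -r 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/\1\2\3\4/')"; done
No doubt it's fairly simple to write a similar script in PowerShell, but that's left as an exercise for the reader :P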
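EDIT: typos
EDIT2: Looking at what I wrote, it may be hard to understand, so I'll try to show what I'm trying to do:
In general, the regex reads the line, and the parts we want to keep are enclosed in parentheses. I call them patterns (in regex terms, capture groups). Once the line has been read, everything except the selected patterns is discarded.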
sed -r //the -r switch is here only to allow the use of parens without escaping them. It's confusing enough without backslashes.
's/ //s is the command, it stands for substitute. Syntax: s/[search pattern]/[replace pattern]/. The string matching the search pattern is replaced with the replace pattern.
//Here I use the command to match the whole line and save the parts I want.
Sample1_(Sample2_) //save "Sample2_" as the first pattern
[0-9]*_(..-..-....) //read onwards and skip zero or more numerals ([0-9]*) followed by an underscore. Read xx-xx-xxxx as the second pattern, where x is any character
_.*-[A-Z_]*(_[0-9][0-9]*_) //after the underscore, skip any number of characters (.*) until you run across a dash. After that, skip any number of capital letters and underscores until you run into an underscore followed by one or more numerals and another underscore (_[0-9][0-9]*_). Save that as pattern 3
._Sample4_(.*) //skip one character and _Sample4_, then grab everything after Sample4_ as pattern 4
/\1\2\3\4/' //The first slash ends the search pattern of the s command and begins the replace pattern. After that, \1, \2, \3 and \4 insert the patterns we saved in the search part, discarding the rest. The final slash ends the s command.
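To see what each saved pattern actually holds, here is a small sketch that prints the four groups in brackets instead of gluing them together. The function name show_groups is made up for illustration, and I use -E here, which GNU sed treats the same as -r:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the four saved patterns for one file name,
# each wrapped in brackets so the boundaries are visible.
show_groups() {
    printf '%s\n' "$1" |
        sed -E 's/Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)/[\1] [\2] [\3] [\4]/'
}

show_groups 'Sample1_Sample2_1_05-11-2015_0_Sample3-AB_C_D_045_4_Sample4_123456.pdf'
# prints: [Sample2_] [05-11-2015] [_045_] [123456.pdf]
```

Note that the underscores around the page number travel with pattern 3, which is why \1\2\3\4 needs no extra separators.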
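Regex is hard to read, but easy to write. That also means it's easy to get wrong and hard to debug, but you can't have everything.
Here is the gist of the shell script in basic/python/pseudocode-ish form: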
for OLDNAME in DIRECTORY
let NEWNAME = output of sed command with OLDNAME piped as input.
rename OLDNAME NEWNAME
next
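A minimal bash sketch of that loop, assuming GNU or POSIX sed is available. The function names new_name and rename_all and the "dry-run" argument are my own inventions for illustration, not part of any tool. It skips files whose names do not match and refuses to overwrite an existing target:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the new name for one file name.
# If the name does not match, sed prints it unchanged.
new_name() {
    printf '%s\n' "$1" |
        sed -E 's/^Sample1_(Sample2_)[0-9]*_(..-..-....)_.*-[A-Z_]*(_[0-9][0-9]*_)._Sample4_(.*)$/\1\2\3\4/'
}

# Hypothetical helper: rename every matching file in a directory.
# Pass "dry-run" as the second argument to only print what would happen.
rename_all() {
    local dir="$1" mode="${2:-}" path old new
    for path in "$dir"/*; do
        [ -f "$path" ] || continue           # skip directories and an unmatched glob
        old=${path##*/}                      # strip the directory part
        new=$(new_name "$old")
        [ "$new" != "$old" ] || continue     # name did not match the pattern
        if [ -e "$dir/$new" ]; then
            echo "skip (target exists): $new" >&2
        elif [ "$mode" = "dry-run" ]; then
            echo "$old -> $new"
        else
            mv -- "$path" "$dir/$new"
        fi
    done
}
```

Running rename_all /path/to/files dry-run first shows the planned renames without touching anything; dropping the second argument performs them.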