Bash script to replace attachment paths in a file with full paths

Answer 1

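Here are three versions of a Perl script that does this. All three require the first argument to be the directory to search (e.g. `./app` or `./`). The remaining arguments are the names of the Markdown files to modify (e.g. `./app/file1.md` or `./app/*.md`).

All of them are written to search only for `.png` files, but that is easy to change by adjusting the regular expressions and globs used.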
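As a rough sketch of that change (the extension list and the names `$ext` and `$tag` are assumptions made up for illustration, not part of the scripts in this answer), the pattern could be built from a list of extensions:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed list of extensions; adjust to taste.
my $ext = qr/\.(?:png|jpe?g|gif)/i;

# Same shape as the pattern used in the scripts, with the extension factored out.
my $tag = qr/!\[\[([^]]*$ext)\]\]/;

for my $line ('![[img1.png]]', '![[photos/cat.JPG]]', '![[notes.txt]]') {
  if ($line =~ $tag) {
    print "match: $1\n";
  } else {
    print "no match: $line\n";
  }
}

# The glob-based file cache would need the same extensions, e.g.:
#   @img = glob("$dir/*.{png,jpg,jpeg,gif}");
```

The first two sample lines match and the third does not. Keeping the extension list in one variable means the tag pattern and any file-matching pattern can share it.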

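NOTE: For all three scripts, delete the first `#!` line if you want the script to modify the Markdown files rather than just print to standard output. The first `#!` line is for testing, to verify that the script does what you want. The second actually modifies the Markdown files (and copies the originals to `.bak`; if you don't want backup copies, just change `-i.bak` to `-i`). See `man perlrun` and search for `-i` for details on how this option works.

Also note that the `File::Basename` and `File::Find` modules used here are Perl core library modules, included with Perl. `File::Basename` does essentially what the `basename` command does, and `File::Find` searches directories recursively like the `find` command.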
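For readers who haven't used `File::Basename`, here is a minimal sketch of what `fileparse` returns. The sample path is taken from the test directory tree in this answer; the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

my $path = 'attachments/more/img5.png';

# In list context, fileparse returns the filename, the directory part,
# and any matched suffix (empty here, since no suffix patterns are given).
my ($name, $dirs, $suffix) = fileparse($path);
print "name: $name\n";
print "dirs: $dirs\n";

# In scalar context it returns just the filename, which is how the
# scripts in this answer use it.
my $bn = fileparse($path);
print "bn:   $bn\n";
```

This prints `img5.png` as the name, `attachments/more/` as the directory part, and `img5.png` again for the scalar-context call.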

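Why Perl rather than sh or bash? Because shell is a terrible language for text or data processing. See "Why is using a shell loop to process text considered bad practice?" for some of the reasons. The shell's job is to co-ordinate other programs that do the data processing, not to do the data processing itself. Using shell for data processing is like using a shovel when you need a screwdriver, or a fork when you need a ladle.

All three versions were tested with the following file and directory structure: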

app/attachments/img1.png
app/attachments/img2.png
app/attachments/more/img5.png
app/file1.md
app/file2.md
app/file3.md
app/img4.png
app/other/img3.png

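First version

The first version is useful if attachment files can only be found in `./app` or `./app/attachments`.

If the attachment is found at the location specified by the `![[filename]]` tag, the link is left as is. If it isn't found, the script looks first in the top-level directory, then in the `attachments/` sub-directory.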

$ cat fix-paths1.pl 
#!/usr/bin/perl -p
#!/usr/bin/perl -p -i.bak

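# First argument is the directory to search; the rest are Markdown files.
BEGIN { $dir = shift };

use File::Basename;

if (/!\[\[([^]]*\.png)\]\]/i) {
  $file = $1;
  # Leave the link alone if the file exists where the tag says it does.
  next if -f "$dir/$file";
  $bn = fileparse($file);

  # \Q...\E quotes regex metacharacters (such as ".") in the filename.
  if (-f "$dir/$bn") {
    s/\Q$file\E/$bn/
  } elsif (-f "$dir/attachments/$bn") {
    s/\Q$file\E/attachments\/$bn/
  } else {
    print STDERR "WARNING: Attachment '$file' does not exist. $ARGV:$.\n"
  };
}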

Sample run (file1.md is the same as in your question):

$ ./fix-paths1.pl ./app/ app/file1.md 
Here is an image:
![[attachments/img1.png]]

Here is another image:
![[attachments/img2.png]]

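Second version

The second version is useful if the files can be found in any immediate sub-directory of `./app/`, i.e. in `app/attachments/` but not in `app/attachments/more/`.

It uses Perl's `glob` function to build an array of the `.png` files in the specified directory (`./app/`) and all of its immediate sub-directories. This array serves as a cache of all matching files, because searching a directory is a fairly "expensive" operation, and definitely something you don't want to do repeatedly in a loop.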

$ cat fix-paths2.pl
#!/usr/bin/perl -p
#!/usr/bin/perl -p -i.bak

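use File::Basename;

BEGIN {
  $dir = shift;
  $dir =~ s:/+$::;    # strip trailing slashes

  # Cache the .png files in $dir and its immediate sub-directories,
  # stored as paths relative to $dir.
  @png = glob("$dir/*.png");
  push @png, glob("$dir/*/*.png");
  s:^\Q$dir\E/:: for @png;
};

if (/!\[\[([^]]*\.png)\]\]/i) {
  $file = $1;
  # Leave the link alone if the file exists where the tag says it does.
  next if -f "$dir/$file";

  $bn = fileparse($file);

  # Take the first cached path whose filename is $bn.
  ($found) = grep { m:(^|/)\Q$bn\E$: } @png;

  if ($found) {
    s/\Q$file\E/$found/;
  } else {
    print STDERR "WARNING: Attachment '$file' does not exist. $ARGV:$.\n"
  };
}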

Sample run:

$ cat app/file2.md 
Here is an image:
![[img1.png]]

Here is another image:
![[attachments/img2.png]]

and another:
![[img3.png]]



$ ./fix-paths2.pl ./app/ ./app/file2.md 
Here is an image:
![[attachments/img1.png]]

Here is another image:
![[attachments/img2.png]]

and another:
![[other/img3.png]]

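This version found and corrected the paths for both img1.png and img3.png.

Third version

The third version is useful if the attachment files can be found in any sub-directory of `./app/`, no matter how deep they are in the directory tree. The only difference between this version and the second is how it populates the `@png` array. The second version uses the `glob()` function, while the third uses `File::Find`.

The `@png` array used to cache the search results really shows its worth here: a recursive directory search is a far more expensive operation than a "simple" glob.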

$ cat fix-paths3.pl
#!/usr/bin/perl -p
#!/usr/bin/perl -p -i.bak

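use File::Basename;
use File::Find;

BEGIN {
  $dir = shift;
  $dir =~ s:/+$::;    # strip trailing slashes

  # Called by find() for every file under $dir; caches each .png
  # as a path relative to $dir.
  sub wanted {
    if (m/\.png$/) {
      ($f = $File::Find::name) =~ s:^\Q$dir\E/::;
      push @png, $f;
    };
  };

  find(\&wanted, $dir);
};

if (/!\[\[([^]]*\.png)\]\]/i) {
  $file = $1;
  # Leave the link alone if the file exists where the tag says it does.
  next if -f "$dir/$file";

  $bn = fileparse($file);

  # Take the first cached path whose filename is $bn.
  ($found) = grep { m:(^|/)\Q$bn\E$: } @png;

  if ($found) {
    s/\Q$file\E/$found/;
  } else {
    print STDERR "WARNING: Attachment '$file' does not exist. $ARGV:$.\n"
  };
}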

Sample run:

$ cat app/file3.md 
Here is an image:
![[img1.png]]

Here is another image:
![[attachments/img2.png]]

and another:
![[img3.png]]

and another:
![[attachments/img4.png]]

and another:
![[other/img5.png]]



$ ./fix-paths3.pl ./app/ ./app/file3.md 
Here is an image:
![[attachments/img1.png]]

Here is another image:
![[attachments/img2.png]]

and another:
![[other/img3.png]]

and another:
![[img4.png]]

and another:
![[attachments/more/img5.png]]

This version also found that img5.png is in attachments/more/, even though file3.md said it was in other/.

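Bugs

It may be worthwhile to strip leading and trailing whitespace from `$file`, depending on whether the attachment filenames contain extra whitespace and how strictly your Markdown interpreter treats it. Add the following line after `$file = $1;`: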
```
$file =~ s/^\s*|\s*$//g;
```
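If the .png file isn't found where the Markdown file says it is, the second and third versions return the first matching file, even if there are multiple files with the same name (yes, this is more of a design decision than an actual bug; I chose to write it that way). Sometimes this may not be the file you expected, which is a natural consequence of the GIGO rule.

This could be "fixed" by counting the number of matches and printing an error if there is more than one (hint: Perl's built-in `grep` function returns a list, and the scripts above discard all but the first result; the `$found` variable could be replaced with an array variable, `@found`). Alternatively, some kind of heuristic could prefer attachment files in some directories over others (or newer files over older, or older over newer, or ...). The real fix is to edit the input Markdown files so that they aren't ambiguous.

See `perldoc -f grep` for details about Perl's `grep` function.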
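One way the "count the matches" idea might look is sketched below. The helper name `unique_match` and the sample paths are hypothetical, and this is only one of several reasonable policies: it refuses to pick a file when the name is ambiguous.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the single cached path whose filename is $bn,
# or undef when there is no match or more than one (with a warning).
sub unique_match {
  my ($bn, @paths) = @_;
  my @found = grep { m:(^|/)\Q$bn\E$: } @paths;
  if (@found > 1) {
    warn "WARNING: '$bn' is ambiguous: @found\n";
    return undef;
  }
  return $found[0];    # undef if @found is empty
}

# Made-up cache contents with a duplicate filename.
my @png = ('attachments/img1.png', 'other/img1.png', 'other/img3.png');

for my $bn ('img3.png', 'img1.png') {
  my $found = unique_match($bn, @png);
  print defined $found ? "$bn -> $found\n" : "$bn -> skipped\n";
}
```

In the scripts themselves, the same check would replace the `($found) = grep ...` line, leaving the link untouched whenever the helper returns undef.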

Answer 2

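Assuming you have only one file link per line and you are processing the file line by line, `sed` should be fine. So we have something like this:

# $line is assumed to hold one line of the Markdown file
file=$(printf '%s\n' "$line" | sed -n 's/!\[\[\(.*\.png\)]]/\1/gp')
name=$(basename "$file")
fullFile=$(find . -name "$name" -print -quit)

Here I use `-quit` so that the search stops as soon as the first match is found. I hope this points you in the right direction.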
