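How can I extract the text between keywords? The text is stored in a txt or json file. The input looks like this: "Adapting to environment and project challenges\n ability to manage issues, communications and influencing skills,Passion for great technology and user experience\nExceptional organizational ability,"
The keywords are "ability", "skills" and "experience". The output should be the text between these keywords. In this example, the output should be:
to manage issues, communications and influencing Passion for great technology and user\nExceptional organizational
The regex has to be able to handle 4 or 5 keywords. Is that possible?
I used the code below, but it only works when the text is in the program itself rather than in a txt file, and it only handles 2 keywords. I need several.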
$file = 'C:\Users\Acer Nitro\Desktop\perl\sim.txt';
open(SESAME, $file);
while(<SESAME>)
{
$text .= $_;
}
close(SESAME);
print $text;
($re=$text)=~s/((\bskill\b)|(\bability\b)|.)/${[')','']}[!$3]\Q$1\E${['(','']}[!$2]/gs;
@$ = (eval{/$re/},$@);
print join"\n",@$ unless $$[-1]=~/unmatched/;
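Can you help me?
Answer 1
I think you have to change your regex. "\ability" and "\skill" are probably not what you want: "\a" is the bell character, and "\s" matches a whitespace character.
The parts of the text you want to capture can be matched by putting the corresponding parts of the regex in parentheses. When the whole regex matches, the captured parts are available as $1, $2, and so on. Perhaps something like '(\w+)\s+(ability|skill)\s+(\w+)'.
Answer 2
Your script has a lot of errors. I have rewritten and simplified it: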
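To illustrate how the parenthesized parts end up in separate captures, here is a minimal sketch using the pattern suggested above. The sample sentence and the variable names are made up for illustration; they do not come from the question.

```perl
use strict;
use warnings;

# Hypothetical sample sentence, not taken from the asker's file.
my $text = 'strong ability to manage issues';

# Three capture groups: the word before the keyword, the keyword itself,
# and the word after it. In list context the match returns the captures
# (the same values that $1, $2 and $3 hold after a successful match).
my ($before, $keyword, $after) = $text =~ /(\w+)\s+(ability|skill)\s+(\w+)/;

print "before:  $before\n";
print "keyword: $keyword\n";
print "after:   $after\n";
```

Note that this only captures a single word on each side of the keyword, so it would need to be widened to pick up whole phrases.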
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
# file to search
my $file = 'C:\Users\Acer Nitro\Desktop\perl\sim.txt';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
# read whole file in a single string
undef $/;
my $full = <$fh>;
# search text between keywords
my @found = $full =~ /\b(?:ability|skills|experience)\b\R?\K(.+?)(?=\b(?:ability|skills|experience)\b)/gsi;
# dump the result
print Dumper\@found;
Output for the given example:
$VAR1 = [
' to manage issues, communications and influencing ',
',Passion for great technology and user ',
'Exceptional organizational '
];
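The fragments in the dump still carry the whitespace and the comma that sat next to the keywords. If cleaner strings are wanted, one possible post-processing step is to trim both ends of each fragment. This is only a sketch: the cleanup rule (strip whitespace and commas) and the name @clean are assumptions, not part of the script above.

```perl
use strict;
use warnings;

# The three fragments from the dump above, copied verbatim.
my @found = (
    ' to manage issues, communications and influencing ',
    ',Passion for great technology and user ',
    'Exceptional organizational ',
);

# Assumed cleanup rule: remove whitespace and commas at both ends.
# Each element is copied first so that @found itself is left unchanged.
my @clean = map { my $s = $_; $s =~ s/^[\s,]+|[\s,]+$//g; $s } @found;

print "$_\n" for @clean;
```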
Regex explanation:
/                   # regex delimiter
  \b                # word boundary
  (?:               # non capture group
    ability         # literally
    |               # OR
    skills          # literally
    |               # OR
    experience      # literally
  )                 # end group
  \b                # word boundary
  \R?               # optional linebreak
  \K                # forget all we have seen until this position
  (.+?)             # group 1, the text we want
  (?=               # positive lookahead
    \b              # word boundary
    (?:             # non capture group
      ability       # literally
      |             # OR
      skills        # literally
      |             # OR
      experience    # literally
    )               # end group
    \b              # word boundary
  )                 # end lookahead
/gsi                # delimiter, global; dot matches newline; case insensitive
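The question also asks for 4 or 5 keywords. One way to handle that, sketched here under assumptions, is to keep the keywords in an array and build the alternation from it, so the pattern does not have to be edited by hand. The fourth keyword ("knowledge"), the variable names and the inline sample text are hypothetical; in practice $full would be read from the file as in the script above.

```perl
use strict;
use warnings;

# Keyword list; add or remove entries as needed.
# "knowledge" is a hypothetical fourth keyword added for illustration.
my @keywords = qw(ability skills experience knowledge);

# Escape each keyword and join them into a single alternation.
my $alt = join '|', map { quotemeta($_) } @keywords;

# Same structure as the regex explained above, with the alternation
# interpolated; the s and i flags are stored in the compiled pattern.
my $re = qr/\b(?:$alt)\b\R?\K(.+?)(?=\b(?:$alt)\b)/si;

# Sample text standing in for the file contents (assumed from the example).
my $full = "ability to manage issues, communications and influencing skills,"
         . "Passion for great technology and user experience\n"
         . "Exceptional organizational ability,";

# /g in list context returns every captured fragment.
my @found = $full =~ /$re/g;

print "[$_]\n" for @found;
```

With this sample text the three fragments are the same as in the dump above, because the extra keyword does not occur in it.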