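I'm trying to find romanized Korean names in a text file of names, count how many lines fall into each match type, and then print the counts. To do this I wrote an AWK script, but running it gives syntax errors at the regex variable and at the '{' and '}' around the second block. The regex variable holds my regular expression pattern.
Here is my code: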
BEGIN {
correctMatch = 0;
falsePositive = 0;
falseNegative = 0;
correctNonMatch = 0;
regex = (g|kk|n|d|tt|r|m|b|pp|s|ss|j|jj|ch|tch|k|t|p|h)?(a|ae|ya|yae|eo|e|yeo|ye|o|wa|wae|oe|yo|u|wo|we|wi|yu|eu|ui|i|oo|ah)(k|n|t|l|m|p|ng)? (g|kk|n|d|tt|r|m|b|pp|s|ss|j|jj|ch|tch|k|t|p|h)?(a|ae|ya|yae|eo|e|yeo|ye|o|wa|wae|oe|yo|u|wo|we|wi|yu|eu|ui|i|oo|ah)(k|n|t|l|m|p|ng)?-?(g|kk|n|d|tt|r|m|b|pp|s|ss|j|jj|ch|tch|k|t|p|h)?(a|ae|ya|yae|eo|e|yeo|ye|o|wa|wae|oe|yo|u|wo|we|wi|yu|eu|ui|i|oo|ah)(k|n|t|l|m|p|ng)?;
}
{
$NF=="Korean" && tolower($0)~regex {correctMatch = correctMatch + 1}
$NF!="Korean" && tolower($0)~regex {falsePositive = falsePositive + 1}
$NF=="Korean" && tolower($0)!~regex {falseNegative = falseNegative + 1}
$NF!="Korean" && tolower($0)!~regex {correctNonMatch = correctNonMatch + 1}
}
END {
print "Correct Match:" correctMatch;
print "False Positive:" falsePositive;
print "False Negative:" falseNegative;
print "Non Correct-Match:" correctNonMatch;
}
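Answer 1
As already mentioned in the comments, you can't just put a condition in front of an action inside the action part of an awk script (i.e. between {...}), any more than you could in a C program. To fix that, and to address other inefficiencies and unnecessary code duplication, change it to: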
BEGIN {
regex = "(g|kk|n|d|tt|r|m|b|pp|s|ss|j|jj|ch|tch|k|t|p|h)?(a|ae|ya|yae|eo|e|yeo|ye|o|wa|wae|oe|yo|u|wo|we|wi|yu|eu|ui|i|oo|ah)(k|n|t|l|m|p|ng)? (g|kk|n|d|tt|r|m|b|pp|s|ss|j|jj|ch|tch|k|t|p|h)?(a|ae|ya|yae|eo|e|yeo|ye|o|wa|wae|oe|yo|u|wo|we|wi|yu|eu|ui|i|oo|ah)(k|n|t|l|m|p|ng)?-?(g|kk|n|d|tt|r|m|b|pp|s|ss|j|jj|ch|tch|k|t|p|h)?(a|ae|ya|yae|eo|e|yeo|ye|o|wa|wae|oe|yo|u|wo|we|wi|yu|eu|ui|i|oo|ah)(k|n|t|l|m|p|ng)?"
}
{
hitNf = ( $NF == "Korean" )
hitRe = ( tolower($0) ~ regex )
correctMatch += ( hitNf && hitRe )
falsePositive += ( !hitNf && hitRe )
falseNegative += ( hitNf && !hitRe )
correctNonMatch += ( !hitNf && !hitRe )
}
END {
print "Correct Match:" correctMatch+0
print "False Positive:" falsePositive+0
print "False Negative:" falseNegative+0
print "Non Correct-Match:" correctNonMatch+0
}
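For comparison, here is a minimal sketch of the two places where awk does accept a condition: as a pattern outside the braces, or as an if statement inside them. The sample input and the shortened regex are hypothetical stand-ins chosen only to keep the example small; they are not the real data or the real pattern.

```shell
# Hypothetical sample data: a name followed by a language label in the last field.
input='Kim Min-jun Korean
John Smith English
Park Ji-woo Korean
Lee Ann English'

result=$(printf '%s\n' "$input" | awk '
BEGIN { regex = "^(kim|lee) " }   # simplified stand-in pattern, not the real one

# Form 1: the condition is a pattern, written outside the braces.
$NF == "Korean" && tolower($0) ~ regex { correctMatch++ }

# Form 2: the condition is an if statement inside the braces.
{
    if ($NF != "Korean" && tolower($0) ~ regex)
        falsePositive++
}

END {
    print "Correct Match:" correctMatch+0
    print "False Positive:" falsePositive+0
}')
printf '%s\n' "$result"
```

Either form is valid awk. What the original script did was put pattern-style rules inside an enclosing { ... } block, which is what triggers the syntax error.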
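With the structure of the rewritten script above, you could obviously just test the regexp directly instead of storing it in a variable first. By the way, in your code, regex = foo does not store a regexp in the variable named regex. I fixed that for you above by using regex = "foo" (a dynamic regexp), but newer versions of GNU awk also support regex = @/foo/. See https://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps and https://www.gnu.org/software/gawk/manual/gawk.html#Regexp-Constants.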
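As a rough illustration of the difference, the sketch below matches the same lines once against a string held in a variable (a dynamic regexp) and once against a regexp constant written directly in the comparison. The input lines and the short pattern are assumptions made up for the example. The gawk-only @/.../ form is mentioned only in a comment, so the script should run under any POSIX awk.

```shell
# Hypothetical sample data, same layout as the question: name, then language label.
input='Kim Min-jun Korean
John Smith English'

result=$(printf '%s\n' "$input" | awk '
BEGIN { regex = "^kim " }   # dynamic regexp: a plain string later used as a pattern
# In newer GNU awk only, the same could be written as: regex = @/^kim /
{
    viaString  = (tolower($0) ~ regex)     # match against the string in the variable
    viaLiteral = (tolower($0) ~ /^kim /)   # match against a regexp constant directly
    print $1, viaString, viaLiteral
}')
printf '%s\n' "$result"
```

Both comparisons give the same answer for each line. Storing the pattern in a variable mainly pays off when it is long or reused, as it is in the script above.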