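How to verify that every line has the same number of words/strings
If every line has the same number of words, the syntax should return true and the number of words counted.
If the lines have different counts, the syntax should return false and count=NA.
For example, for the following input we get true and count=5: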
sdb sde sdc sdf sdd
sdc sdb sde sdd sdf
sdb sdc sde sdf sdd
sde sdb sdd sdc sdf
sdc sde sdd sdb sdf
For the following example we get false and count=NA:
sdb sde sdc sdf sdd
sdc sdb sde sdd sdf
sdb sdc sde sdf
sde sdb sdd sdc sdf
sde sdd sdb sdf
Another example for which we get false and count=NA:
sdb sde sdc sdf sdd
sdc sdb sde sdd sdf
sdb sdc sde sdf
sde sdb sdd sdc sdf
sde sdd sdb sdf sde
Answer 1
Using awk:
awk 'BEGIN { r = "true" } NR == 1 { n = NF; next } NF != n { r = "false"; n = "N/A"; exit } END { printf("status=%s count=%s\n", r, n) }' somefilename
Or as an awk script:
#!/usr/bin/awk -f
BEGIN { r = "true" }
NR == 1 { n = NF; next }
NF != n { r = "false"; n = "N/A"; exit }
END { printf("status=%s count=%s\n", r, n) }
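The script starts by setting r (as in "result") to true (we assume the result will be true rather than false). It then initializes n (as in "number") to the number of fields on the first line.
If any other line in the input has a different number of fields, r is set to false, n is set to N/A, and the script exits (via the END block).
Finally, the current values of r and n are printed.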
The output of the script will be something like
status=true count=5
or
status=false count=N/A
This can be used with export, or bash's declare, or with eval:
declare $( awk '...' somefilename )
This creates the shell variables count and status, and both will be available in the calling shell.
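As a rough illustration of the eval variant mentioned above, here is a minimal sketch. The function name check_counts and the sample input are made up for this example; the awk program is the one-liner from this answer, reading standard input instead of a file. Using eval this way is only reasonable because the awk output has a fixed, known format.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the original answer): runs the awk
# one-liner from this answer on standard input.
check_counts() {
    awk 'BEGIN { r = "true" } NR == 1 { n = NF; next } NF != n { r = "false"; n = "N/A"; exit } END { printf("status=%s count=%s\n", r, n) }'
}

# eval turns the "status=... count=..." line into two variable assignments
# in the current shell.
eval "$(printf '%s\n' 'sdb sde sdc' 'sdc sdb sde' | check_counts)"

if [ "$status" = "true" ]; then
    echo "all lines have $count words"
else
    echo "line lengths differ (count=$count)"
fi
```

With the two three-word sample lines, the variables end up as status=true and count=3.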
Answer 2
You can use an associative array to record how many lines have each word count:
#!/bin/bash
declare -A seen
while read -a line ; do
(( seen[${#line[@]}]++ ))
done
if [[ ${#seen[@]} == 1 ]] ; then
# seen has exactly one key here, and that key is the common word count
echo count=${!seen[@]}
exit
else
echo count=NA
exit 1
fi
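Alternatively, you can use external tools to do the job. For example, the following script uses Perl to count the words (via its -a autosplit option), sort -u to get the unique counts, and wc -l to check whether there is one count or more than one.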
#!/bin/bash
out=$(perl -lane 'print scalar @F' | sort -u)
if ((1 == $(wc -l <<<"$out") )) ; then
echo count=$out
exit
else
echo count=NA
exit 1
fi
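To see what the "unique counts" step produces, here is a small sketch of the same idea. It is a variation, not the script above: awk's NF is substituted for perl -lane, and the function name unique_counts and the sample lines are invented for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct per-line word count once.
# awk '{ print NF }' stands in for perl -lane 'print scalar @F'.
unique_counts() {
    awk '{ print NF }' | sort -u
}

# Every line has three words, so a single count comes out.
same=$(printf '%s\n' 'a b c' 'd e f' | unique_counts)

# One line is shorter, so two distinct counts come out, one per line.
mixed=$(printf '%s\n' 'a b c' 'a b' 'a b c' | unique_counts)

echo "same input:"
echo "$same"
echo "mixed input:"
echo "$mixed"
```

When the result is a single line, that line is the common word count; more than one line means the input lines disagree.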
Answer 3
if
count=$(
awk 'NR == 1 {print count = NF}
NF != count {exit 1}' < file
)
then
if [ -z "$count" ]; then
echo "OK? Not OK? file is empty"
else
echo "OK all lines have $count words"
fi
else
echo >&2 "Not all lines have the same number of words or the file can't be read"
fi
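Note that in the last branch you could distinguish between differing counts and a file that cannot be opened, again with [ -z "$count" ]: if the file could not be opened, awk never ran and count is empty, whereas with differing counts it holds the word count of the first line.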
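The three outcomes can be sketched as follows. This is an illustration only: the function name classify and the sample inputs are assumptions, and the awk program is the one from this answer, reading standard input rather than a file.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk test from this answer.
classify() {
    if count=$(awk 'NR == 1 { print count = NF }
                    NF != count { exit 1 }'); then
        if [ -z "$count" ]; then
            echo "empty"            # awk succeeded but printed nothing
        else
            echo "same $count"      # every line matched the first line
        fi
    else
        echo "different"            # awk exited with status 1
    fi
}

printf '%s\n' 'a b' 'c d' | classify
printf '%s\n' 'a b' 'c'   | classify
classify </dev/null
```

The exit status of the command substitution carries the same/different answer, while the captured output carries the count, which is why the two checks can be nested.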
Answer 4
#!/usr/bin/perl
use strict; # get perl to warn us if we try to use an undeclared variable.
# get all words on first line, and store them in a hash
#
# note: it doesn't matter which line we get the word list from because
# we only want to know if all lines have the same number of identical
# words.
my %words = map { $_ => 1 } split (/\s+/,<>);
while(<>) {
# now do the same for each subsequent line
my %thisline = map { $_ => 1 } split ;
# and compare them. exit with a non-zero exit code if they differ.
# compare the number of distinct words (keys) in each hash
if (scalar(keys %words) != scalar(keys %thisline)) {
# optionally print a warning message to STDERR here.
exit 1;
}
};
# print the number of words we saw on the first line
print scalar keys %words, "\n";
exit 0
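(The exit 0 on the last line isn't required; it is the default anyway. It is "useful" only to document that the return code is an important part of this program's output.)
Note: this does not count repeated words within a line. For example, sda sdb sdc sdc sdc counts as 3 words, not 5, because the last three words are the same. If that matters, the hash should also count how many times each word occurs. Something like this: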
#!/usr/bin/perl
use strict; # get perl to warn us if we try to use an undeclared variable.
# get all words on first line, and store them in a hash
#
# note: it doesn't matter which line we get the word list from because
# we only want to know if all lines have the same number of identical
# words.
my %words=();
$words{$_}++ for split (' ',<>);
# add up the number of times each word was seen on the first line
my $count=0;
$count += $_ for values %words;
while(<>) {
# now do the same for each subsequent line
my %thisline=();
$thisline{$_}++ for split;
my $linecount=0;
$linecount += $_ for values %thisline;
# and compare the totals. exit with a non-zero exit code if they differ.
# (comparing the hashes themselves would only compare the number of
# distinct words, so repeated words would go unnoticed)
if ($count != $linecount) {
# optionally print a warning message to STDERR here
exit 1;
}
};
# print the total
print "$count\n";
exit 0;
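The significant difference is in how the hashes are populated. The first version just sets the value of each key ("word") to 1. The second version counts the number of times each key occurs.
The second version also has to add up the values of all the keys; it can't just print the number of keys seen.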
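The difference between the two ways of counting can be seen on the sample line from the note above. This is a sketch in shell and awk rather than Perl; the function name count_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print the total number of words
# and the number of distinct words.
count_words() {
    awk '{
        split("", seen)                   # clear the per-line table
        distinct = 0
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++) distinct++   # first time this word is seen
        print "total=" NF, "distinct=" distinct
    }'
}

result=$(echo 'sda sdb sdc sdc sdc' | count_words)
echo "$result"
```

For this line the sketch prints total=5 distinct=3, matching the 5 versus 3 distinction described in the note.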