bash + how to verify that every line has the same number of words

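How can I verify that every line contains the same number of words/strings?

If all lines have the same word count, the check should return true together with the number of words.

If the lines have different counts, the check should return false and count=NA.

For example, for the following input we get true and count=5: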

sdb sde sdc sdf sdd
sdc sdb sde sdd sdf
sdb sdc sde sdf sdd
sde sdb sdd sdc sdf
sdc sde sdd sdb sdf

For the following example we get false and count=NA:

sdb sde sdc sdf sdd
sdc sdb sde sdd sdf
sdb sdc sde sdf 
sde sdb sdd sdc sdf
sde sdd sdb sdf

Another example, for which we also get false and count=NA:

sdb sde sdc sdf sdd
sdc sdb sde sdd sdf
sdb sdc sde sdf 
sde sdb sdd sdc sdf
sde sdd sdb sdf sde 

Answer 1

Using awk:

awk 'BEGIN { r = "true" } NR == 1 { n = NF; next } NF != n { r = "false"; n = "N/A"; exit } END { printf("status=%s count=%s\n", r, n) }' somefilename

Or as an awk script:

#!/usr/bin/awk -f

BEGIN { r = "true" }

NR == 1 { n = NF; next }
NF != n { r = "false"; n = "N/A"; exit }

END { printf("status=%s count=%s\n", r, n) }

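The script starts by setting r (as in "result") to true (we assume the result will be true rather than false). It then initializes n (as in "number") to the number of fields on the first line.

If any other line in the input has a different number of fields, r is set to false, n is set to N/A, and the script exits (via the END block).

Finally, it prints the current values of r and n.
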
The output of the script will be something like

status=true count=5

or

status=false count=N/A

This can be used with export or, in bash, with declare, or with eval:

declare $( awk '...' somefilename )

This creates the shell variables count and status, which are then available in the calling shell.
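As a rough sketch of how the declare approach might look end to end, the snippet below wraps the awk program in a function and loads its output into shell variables. The function name check_counts and the temporary sample file are assumptions made for illustration and are not part of the original answer.

```shell
#!/bin/bash
# check_counts is a hypothetical helper name used only for this sketch.
check_counts() {
    awk 'BEGIN { r = "true" }
         NR == 1 { n = NF; next }
         NF != n { r = "false"; n = "N/A"; exit }
         END { printf("status=%s count=%s\n", r, n) }' "$1"
}

# Sample input with five words on every line, written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 'sdb sde sdc sdf sdd' 'sdc sdb sde sdd sdf' > "$tmpfile"

# The unquoted substitution splits into two name=value words,
# which declare turns into the variables status and count.
declare $(check_counts "$tmpfile")
rm -f "$tmpfile"

echo "status=$status count=$count"
```

The substitution is deliberately left unquoted so that declare receives two separate arguments; this relies on neither value containing whitespace, which holds for the output format used here.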

Answer 2

You can use an associative array to record how many lines have each word count:

#!/bin/bash
declare -A seen
while read -a line ; do
    (( seen[${#line[@]}]++ ))
done

if [[ ${#seen[@]} == 1 ]] ; then
    echo "count=${!seen[@]}"   # the single key is the shared word count
    exit
else
    echo count=NA
    exit 1
fi

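Alternatively, you can use external tools to do the job. For example, the following script uses Perl to count the words (via its -a autosplit option), sort -u to get the unique counts, and wc -l to check whether there is one count or more.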

#!/bin/bash
out=$(perl -lane 'print scalar @F' | sort -u)
if ((1 == $(wc -l <<<"$out") )) ; then
    echo count=$out
    exit
else
    echo count=NA
    exit 1
fi
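The same pipeline idea can also be sketched with awk in place of Perl, assuming awk's default whitespace field splitting is what is wanted. The function name same_count is hypothetical, and the extra check for an empty result (empty input) is an addition that the script above does not make.

```shell
#!/bin/bash
# same_count is a hypothetical function name; it reads lines from stdin.
same_count() {
    local out
    # One field count per input line, reduced to the distinct values.
    out=$(awk '{ print NF }' | sort -u)
    # Succeed only if there is exactly one distinct, non-empty value.
    if [[ -n $out && $(wc -l <<<"$out") -eq 1 ]]; then
        echo "count=$out"
    else
        echo "count=NA"
        return 1
    fi
}
```

It could be called as same_count < somefilename; the exit status mirrors the script above (0 when all lines agree, 1 otherwise).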

Answer 3

if
  count=$(
    awk 'NR == 1 {print count = NF}
         NF != count {exit 1}' < file
  )
then
  if [ -z "$count" ]; then
    echo "OK? Not OK? file is empty"
  else
    echo "OK all lines have $count words"
  fi
else
  echo >&2 "Not all lines have the same number of words or the file can't be read"
fi

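Note that in the last branch you can distinguish between differing counts and a file that can't be opened, again with [ -z "$count" ]: if the file could not be opened, awk never ran and $count is empty, whereas with differing counts the first line's count has already been printed.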
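A minimal sketch of that distinction follows. The function name classify and the four result strings are assumptions chosen for illustration; only the [ -z "$count" ] test comes from the answer.

```shell
#!/bin/sh
# classify is a hypothetical name; the output strings are illustrative.
classify() {
    if count=$(awk 'NR == 1 { count = NF; print count }
                    NF != count { exit 1 }' < "$1")
    then
        # awk succeeded: either there were no lines at all, or all agree.
        if [ -z "$count" ]; then
            echo "empty"
        else
            echo "same count=$count"
        fi
    elif [ -z "$count" ]; then
        # Nothing was printed, so awk never saw a first line:
        # the redirection failed and the file could not be opened.
        echo "unreadable"
    else
        # The first line's count was printed before a later line differed.
        echo "different"
    fi
}
```

When the file cannot be opened, the shell still writes its own error message to stderr; the sketch leaves that message in place rather than suppressing it.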

Answer 4

#!/usr/bin/perl

use strict; # get perl to warn us if we try to use an undeclared variable.

# get all words on first line, and store them in a hash
#
# note: it doesn't matter which line we get the word list from because
# we only want to know if all lines have the same number of identical
# words.
my %words = map { $_ => 1 } split (/\s+/,<>);

while(<>) {
  # now do the same for each subsequent line
  my %thisline = map { $_ => 1 } split ;

  # and compare them.  exit with a non-zero exit code if they differ.
  if (%words != %thisline) {
    # optionally print a warning message to STDERR here.
    exit 1;
  }
};

# print the number of words we saw on the first line
print scalar keys %words, "\n";
exit 0

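The exit 0 on the last line isn't required; it is the default anyway. It is "useful" only to document that the return code is an important part of this program's output.

Note: this does not count duplicate words on a line. For example, sda sdb sdc sdc sdc will count as 3 words, not 5, because the last three words are the same. If that matters, the hash should also count the number of occurrences of each word. Something like this: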

#!/usr/bin/perl

use strict;   # get perl to warn us if we try to use an undeclared variable.

# get all words on first line, and store them in a hash
#
# note: it doesn't matter which line we get the word list from because
# we only want to know if all lines have the same number of identical
# words.
my %words=();
$words{$_}++ for split (/\s+/,<>);

while(<>) {
  # now do the same for each subsequent line
  my %thisline=();
  $thisline{$_}++ for split;

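  # and compare the total word counts (duplicates included).  exit with a
  # non-zero exit code if they differ.  comparing the hashes themselves
  # would only compare the number of distinct words.
  my ($first, $this) = (0, 0);
  $first += $_ for values %words;
  $this  += $_ for values %thisline;
  if ($first != $this) {
    # optionally print a warning message to STDERR here
    exit 1;
  }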
};

# add up the number of times each word was seen  on the first line  
my $count=0;
foreach (keys %words) {
  $count += $words{$_};
};

# print the total
print "$count\n";
exit 0;

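The significant difference is in how the hash is populated. In the first version, it just sets the value of each key ("word") to 1. In the second version, it counts the number of times each key occurs.

The second version also has to add up the values of all keys; it can't just print the number of keys seen.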
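To make the difference between distinct words and total words concrete, here is a small awk sketch (not part of the original answer) that reports both numbers for each input line. The function name word_stats and the output format are assumptions for illustration.

```shell
#!/bin/sh
# word_stats is a hypothetical name; it reads lines from stdin.
word_stats() {
    awk '{
        split("", seen)               # clear the per-line array
        distinct = 0
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++)          # first time this word appears
                distinct++
        printf("total=%d distinct=%d\n", NF, distinct)
    }'
}

printf '%s\n' 'sda sdb sdc sdc sdc' | word_stats
```

For the sample line this prints total=5 distinct=3, matching the 3-versus-5 example in the note above.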