bash: How to verify that every line has the same number of words

Answer 1

Using awk:

awk 'BEGIN { r = "true" } NR == 1 { n = NF; next } NF != n { r = "false"; n = "N/A"; exit } END { printf("status=%s count=%s\n", r, n) }' somefilename

Or as an awk script:

#!/usr/bin/awk -f

BEGIN { r = "true" }

NR == 1 { n = NF; next }
NF != n { r = "false"; n = "N/A"; exit }

END { printf("status=%s count=%s\n", r, n) }

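The script starts by setting r (as in "result") to true (we assume the answer will be true rather than false). It then initializes n (as in "number") to the number of fields on the first line.

If any other line of the input has a different number of fields, r is set to false, n is set to N/A, and the script exits (via the END block).

Finally, the current values of r and n are printed.

The output of the script will be something like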

status=true count=5

or

status=false count=N/A

This can be used with export, with bash's declare, or with eval:

declare $( awk '...' somefilename )

This creates the shell variables count and status, which are then available in the calling shell.
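
As a rough end-to-end sketch of the above, the awk program can be wrapped in a function and its output fed to declare. The function name check_counts and the temporary sample file are assumptions made for this illustration only; they are not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: prints "status=... count=..." for the files given as
# arguments, or for standard input when no file is given.
check_counts() {
    awk 'BEGIN { r = "true" }
         NR == 1 { n = NF; next }
         NF != n { r = "false"; n = "N/A"; exit }
         END { printf("status=%s count=%s\n", r, n) }' "$@"
}

# Sample input (an assumption): two lines with three words each.
tmp=$(mktemp)
printf 'a b c\nd e f\n' > "$tmp"

# The unquoted expansion is split into "status=true" and "count=3",
# which declare turns into two shell variables.
declare $(check_counts "$tmp")
echo "status=$status count=$count"

rm -f "$tmp"
```

Note that declare creates local variables when it is called inside a function, so this form is best used at the top level of a script.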

Answer 2

You can use an associative array to keep a tally of how often each count occurs:

#!/bin/bash
declare -A seen
while read -r -a line ; do
    (( seen[${#line[@]}]++ ))
done

if [[ ${#seen[@]} == 1 ]] ; then
    echo count=${!seen[@]}
    exit
else
    echo count=NA
    exit 1
fi
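
To make the role of the array clearer, here is a small sketch that prints both the number of distinct counts and the counts themselves. The keys of the array are the word counts that were seen, so ${#seen[@]} says how many different counts exist and ${!seen[@]} lists them. The function name summarize_counts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads lines from standard input and prints
# "<number of distinct counts>: <the counts themselves>".
summarize_counts() {
    local -A seen=()
    local -a line
    while read -r -a line; do
        # The key is the number of words on this line.
        seen[${#line[@]}]=1
    done
    echo "${#seen[@]}: ${!seen[*]}"
}

printf 'a b c\nd e f\n' | summarize_counts
```

When the lines disagree, several keys are present; the order in which bash lists them is unspecified.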

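Alternatively, you can use external tools to do the work. For example, the following script uses Perl to count the words (via its -a autosplit option), sort -u to get the unique counts, and wc -l to check whether there is one count or several.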

#!/bin/bash
out=$(perl -lane 'print scalar @F' | sort -u)
if ((1 == $(wc -l <<<"$out") )) ; then
    echo count=$out
    exit
else
    echo count=NA
    exit 1
fi
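
The first two stages of that pipeline can be looked at in isolation. The sketch below swaps Perl for awk (printing NF for each line), purely to keep the example in one tool family; the function name distinct_counts is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: prints each distinct per-line word count found on
# standard input, one per line.
distinct_counts() {
    awk '{ print NF }' | sort -u
}

# Two lines with three words and one line with two words,
# so two distinct counts are printed.
printf 'a b c\nd e f\ng h\n' | distinct_counts
```

If exactly one line comes out, every input line had that many words.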

Answer 3

if
  count=$(
    awk 'NR == 1 {print count = NF}
         NF != count {exit 1}' < file
  )
then
  if [ -z "$count" ]; then
    echo "OK? Not OK? file is empty"
  else
    echo "OK all lines have $count words"
  fi
else
  echo >&2 "Not all lines have the same number of words or the file can't be read"
fi

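Note that in the last branch, you can distinguish between differing counts and a file that can't be opened by testing [ -z "$count" ] again: if the redirection fails, awk never runs, so count stays empty.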
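
A possible way to spell out all four outcomes is sketched below. The function name classify and its output strings are made up for this illustration; the awk program is the same check as above, written with a separate assignment and print.

```shell
#!/bin/bash
# Hypothetical helper: classifies the file named in $1 as
# "same N", "empty", "different" or "unreadable".
classify() {
    local count
    if count=$(awk 'NR == 1 { count = NF; print count }
                    NF != count { exit 1 }' < "$1")
    then
        if [ -z "$count" ]; then
            echo "empty"
        else
            echo "same $count"
        fi
    elif [ -z "$count" ]; then
        # Nothing was printed: the redirection failed and awk never ran.
        echo "unreadable"
    else
        # The first line's count was printed before a later line differed.
        echo "different"
    fi
}
```

For example, classify somefilename prints "same 5" when every line of somefilename has five words.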

Answer 4

#!/usr/bin/perl

use strict; # get perl to warn us if we try to use an undeclared variable.

# get all words on first line, and store them in a hash
#
# note: it doesn't matter which line we get the word list from because
# we only want to know if all lines have the same number of identical
# words.
my %words = map { $_ => 1 } split(' ', <>);  # ' ' skips leading whitespace, like the bare split below

while(<>) {
  # now do the same for each subsequent line
  my %thisline = map { $_ => 1 } split ;

  # and compare the number of distinct words.  exit with a non-zero
  # exit code if they differ.
  if (scalar(keys %words) != scalar(keys %thisline)) {
    # optionally print a warning message to STDERR here.
    exit 1;
  }
};

# print the number of words we saw on the first line
print scalar keys %words, "\n";
exit 0

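(The exit 0 on the last line isn't necessary; it's the default anyway. It's "useful" only to document that the return code is an important part of this program's output.)

Note: this doesn't count duplicate words on a line. For example, sda sdb sdc sdc sdc is counted as 3 words, not 5, because the last three words are the same. If that matters, the hash should also count the number of occurrences of each word. Something like this: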

#!/usr/bin/perl

use strict;   # get perl to warn us if we try to use an undeclared variable.

# get all words on first line, and store them in a hash
#
# note: it doesn't matter which line we get the word list from because
# we only want to know if all lines have the same number of identical
# words.
my %words=();
$words{$_}++ for split(' ', <>);

while(<>) {
  # now do the same for each subsequent line
  my %thisline=();
  $thisline{$_}++ for split;

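  # and compare the total word counts (the sum of the per-word tallies).
  # exit with a non-zero exit code if they differ.
  my ($n, $m) = (0, 0);
  $n += $_ for values %words;
  $m += $_ for values %thisline;
  if ($n != $m) {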
    # optionally print a warning message to STDERR here
    exit 1;
  }
};

# add up the number of times each word was seen  on the first line  
my $count=0;
foreach (keys %words) {
  $count += $words{$_};
};

# print the total
print "$count\n";
exit 0;

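The notable difference is in how the hash is populated. In the first version, it simply sets the value of each key ("word") to 1. In the second version, it counts the number of times each key is seen.

The second version also has to add up the values of each key; it can't just print the number of keys seen.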
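
The same distinction between the number of keys and the sum of the values can be sketched with a bash associative array. This is only an analogue of the Perl hashes above, and the function name tally is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints "<distinct words> <total words>" for the
# words passed as arguments.
tally() {
    local -A seen=()
    local word total=0
    # Count how often each word occurs.
    for word in "$@"; do
        seen[$word]=$(( ${seen[$word]:-0} + 1 ))
    done
    # Add up the per-word tallies to get the total number of words.
    for word in "${!seen[@]}"; do
        total=$(( total + ${seen[$word]} ))
    done
    echo "${#seen[@]} $total"
}

tally sda sdb sdc sdc sdc   # prints "3 5"
```

The number of keys gives 3 here, while the summed values give 5, which is the difference between the two Perl versions.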
