Finding missing values in a text file

2024-5-22

shell-script text-processing scripting

I have a text file containing the following data.

 Name             Feature
 Marry            Lecturer
 Marry            Student
 Marry            Leader
 Bob              Lecturer
 Bob              Student
 Som              Student

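There are only 3 possible features for each person: Lecturer, Student and Leader.

The table above is just a sample; my real data has many more people with these features.

Now I want to write a Unix script that checks which of the 3 features each person is missing.

I know this can be done by building a key-value relationship, but I can't work out how to do it properly.

I am running the bash shell on SunOS 5.10 i386.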

Answer 1

If you have the table of names in list.txt, you can do:

for i in Student Leader Lecturer; do grep -F "$i" list.txt | cut -d ' ' -f 1 | sort > "$i.out" ; done

This puts the names into 3 separate sorted files, one per feature, which you can then compare with diffuse (or xxdiff, or diff3):

diffuse *.out

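If you only want files listing the people who are missing each feature, first generate a file with all the names, then append it to the names that do have the feature and use uniq -u. Anyone who has the feature shows up twice and is dropped; anyone who lacks it shows up only once (truly unique) and is kept: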

sed -n '1!p' list.txt  | cut -d ' ' -f 1 | sort -u > names.all
for i in Student Leader Lecturer; do fgrep $i list.txt | cut -d ' ' -f 1  | cat - names.all | sort | uniq -u > $i.missing ; done
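
The same uniq -u idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name missing_feature is made up for illustration, the table has a single header line, and awk is used instead of cut so that leading or repeated spaces do not matter. It also assumes an awk that accepts -v (on SunOS 5.10 that may mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# print the names in table "$2" that lack feature "$1".
missing_feature() {
    feature=$1
    table=$2
    {
        # Every name once (header line skipped, duplicates removed).
        awk 'NR > 1 { print $1 }' "$table" | sort -u
        # Names that do have the feature, once each.
        awk -v f="$feature" 'NR > 1 && $2 == f { print $1 }' "$table" | sort -u
    } | sort | uniq -u   # names seen twice have the feature and are dropped
}
```

With the sample table from the question saved as list.txt, `missing_feature Leader list.txt` should print Bob and Som, one per line. Matching the second column exactly (rather than grepping the whole line) also avoids false hits when a person's name happens to contain a feature word.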

If you want to do this from a script, with the features listed in a file named feature:

Leader 
Student
Lecturer

and the source table in example.txt, you can use:

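#!/bin/bash

# Start clean: remove results left over from any earlier run
rm -f *.missing names.all
feature=feature
# All names, once each (the sed skips the header line)
sed -n '1!p' example.txt | cut -d ' ' -f 1 | sort -u > names.all
for i in $(cat "$feature")
do
    # Names that have feature $i appear twice after cat, so uniq -u drops them
    fgrep "$i" example.txt | cut -d ' ' -f 1 | cat - names.all | sort | uniq -u > "$i.missing"
done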

Answer 2

You can do this in pure bash using arrays (these are associative arrays, so bash 4 or later is needed):

#!/usr/bin/env bash

## Declare the various arrays we will be using
declare -A hasfeat;
declare -A names;
declare -A features;
## The input file
file="/path/to/file"

## The awk is used to skip the first line, the header
awk 'NR>1' "$file" |
    {
        while read name feat;
        do
            ## Save the names
            names[$name]=1;
            ## Save the features
            features[$feat]=1;
            ## Save this name/feature combination
            hasfeat[$name,$feat]=1;
        done
        ## For each name in the file
        for name in ${!names[@]}
        do
            ## For each feature in the file
            for feat in ${!features[@]}
            do
                ## Print the name if it doesn't have this feature
                [ -z "${hasfeat[$name,$feat]}" ] && echo "$name lacks $feat"
            done
        done;
    }

Or, more concisely, in Perl:

$ perl -lane 'if($.>1){$l{$F[1]}++;$k{$F[0]}{$F[1]}++}
  END{foreach $f (keys(%l)){ 
    map{print "$_ lacks $f" unless $k{$_}{$f}}keys(%k)
    }}' file
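
Both versions above keep a table of name/feature pairs and then report the pairs that never occurred. Where neither bash 4 nor Perl is available, the same bookkeeping can be sketched in awk. This is an illustration only: the function name report_missing is hypothetical, a single header line is assumed, and the awk must support the `(i, j) in array` test (a POSIX feature; on SunOS 5.10 that may mean nawk or /usr/xpg4/bin/awk).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# list every name/feature pair that is absent from table "$1".
report_missing() {
    awk '
        NR > 1 && NF >= 2 {
            names[$1] = 1        # every name seen
            features[$2] = 1     # every feature seen
            has[$1, $2] = 1      # this name/feature combination exists
        }
        END {
            for (n in names)
                for (f in features)
                    if (!((n, f) in has))
                        print n " lacks " f
        }
    ' "$1" | sort                # awk iteration order is unspecified, so sort
}
```

As in the bash and Perl versions, the set of features is taken from the file itself, so a feature that nobody has would not be reported at all. For the sample table in the question, `report_missing file` should list Bob as lacking Leader and Som as lacking both Leader and Lecturer.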
