How to append missing rows by comparing only 2 columns in 2 different files

Answer 1

Maybe something like this?

cat file2 | awk '!(1 in f) {if ((getline l < "-") == 1) split(l, f)} $3!=f[3] {print;next} {print l; delete f}' file1 | column -t

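Note that the script expects file1 as an argument to awk, while it expects file2 on its standard input. I used a "useless use of cat" to make this more explicit, but naturally you can supply it as a < file2 redirection instead. In fact, you could even embed the filename in the script itself, writing "file2" in place of the "-" in the getline call, but this way is a bit more flexible.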
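A minimal sketch of the two invocation styles, assuming two small made-up sample files (the directory, file contents and variable names are hypothetical, for illustration only). It runs the one-liner's awk program unchanged, once with "cat file2 |" and once with a "< file2" redirection, and both should give the same merged output:

```shell
#!/bin/sh
# Hypothetical sample data (made up for illustration):
# file1 is the complete "reference" list, file2 the "real" file that
# lacks the rows whose field3 is 2 and 4.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'a x 1' 'b x 2' 'c x 3' 'd x 4' > "$dir/file1"
printf '%s\n' 'A y 1' 'C y 3' > "$dir/file2"

# The one-liner's awk program, unchanged
prog='!(1 in f) {if ((getline l < "-") == 1) split(l, f)} $3!=f[3] {print;next} {print l; delete f}'

# "cat file2 |" form versus "< file2" redirection
# (column -t left out: it only aligns the columns)
piped=$(cat "$dir/file2" | awk "$prog" "$dir/file1")
redirected=$(awk "$prog" "$dir/file1" < "$dir/file2")
printf '%s\n' "$redirected"
# prints:
# A y 1
# b x 2
# C y 3
# d x 4
```

Rows present in file2 come from file2, and the missing ones (field3 = 2 and 4) are filled in from file1.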

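Also note that the two files are expected to start "in sync" with respect to their field3 values, or possibly with file2 "ahead" of file1 if that makes sense for your use case (i.e. file2's first line matches a later line of file1, the earlier file1 lines being the ones missing from file2).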
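To illustrate the "in sync" requirement, here is a small sketch with made-up data (file names and contents are hypothetical). When file2 is "ahead", the leading file1 rows are simply filled in; when file2 starts with a key that never appears in file1, the pending file2 line never matches, so every file1 row is printed and the remaining file2 rows are silently lost:

```shell
#!/bin/sh
# Made-up sample data illustrating the "in sync" requirement.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
prog='!(1 in f) {if ((getline l < "-") == 1) split(l, f)} $3!=f[3] {print;next} {print l; delete f}'

printf '%s\n' 'a x 1' 'b x 2' 'c x 3' > "$dir/file1"

# Case 1: file2 "ahead" (its first row matches the third row of file1)
printf '%s\n' 'C y 3' > "$dir/ahead"
ahead=$(awk "$prog" "$dir/file1" < "$dir/ahead")
printf '%s\n' "$ahead"       # a x 1 / b x 2 / C y 3

# Case 2: file2 starts with a key absent from file1; "Z y 9" is never
# matched, so the pending line never advances and "B y 2" is dropped
printf '%s\n' 'Z y 9' 'B y 2' > "$dir/bad"
bad=$(awk "$prog" "$dir/file1" < "$dir/bad")
printf '%s\n' "$bad"         # a x 1 / b x 2 / c x 3
```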

Here is the same script broken out on separate lines for readability, with detailed explanatory comments:

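# Check whether our `real_fields` array is empty, i.e. no "real"
# line is currently pending (true on the first record and after
# every `delete real_fields` below).
# NOTE: we use the `<index> in <array>` construct
# in order to force awk to treat the `real_fields` name as an
# array (instead of as a scalar as it would by default)
# and build it in an empty state; unlike a plain `real_fields[1]`
# reference, `in` does not create the element as a side effect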
!(1 in real_fields) {
    # get the next line (if any) from the "real" file
    if ((getline real_line < "-") == 1)
        # split that line in separate fields populating
        # our `real_fields` array
        split(real_line, real_fields)
        # awk split function creates an array with numeric
        # indexes for each field found as per FS separator
}
# if field3 of the current line of the "reference"
# file does not match field3 of the current "real" line...
$3!=real_fields[3] {
    # print current line of "reference" file
    print
# go read the next line of the "reference" file, thus
# skipping the final awk pattern
    next
}
# final awk pattern, we get here only if the pattern
# above did not match, i.e. if field3 values from both
# files match
{
    # print current line of "real" file
    print real_line
    # delete our real_fields array, thus triggering
    # the fetching of the next line of "real" file as
    # performed by the first awk pattern
    delete real_fields
}
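One way to use the commented version is to save it to a file and run it with awk -f. The sketch below assumes a hypothetical file name merge.awk (comments trimmed) and made-up sample data in which file2 lacks the last two rows of file1; once file2 is exhausted, getline keeps returning 0 and the remaining file1 rows are printed as-is:

```shell
#!/bin/sh
# Hypothetical names: merge.awk holds the commented program (comments
# trimmed); file1 and file2 are made-up sample data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/merge.awk" <<'EOF'
!(1 in real_fields) {
    if ((getline real_line < "-") == 1)
        split(real_line, real_fields)
}
$3 != real_fields[3] {
    print
    next
}
{
    print real_line
    delete real_fields
}
EOF

# file2 is missing the last two rows of file1
printf '%s\n' 'r1 a 10' 'r2 b 20' 'r3 c 30' > "$dir/file1"
printf '%s\n' 'R1 a 10' > "$dir/file2"

# file1 is the main input; file2 arrives on stdin (read via getline < "-")
out=$(awk -f "$dir/merge.awk" "$dir/file1" < "$dir/file2")
printf '%s\n' "$out"
# prints:
# R1 a 10
# r2 b 20
# r3 c 30
```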

Answer 2

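You need to set the array traversal order; otherwise awk will reorder your lines, because a "for (c in a)" loop visits array elements in an unspecified order. Note that PROCINFO["sorted_in"] is a GNU awk (gawk) extension, so this script needs gawk.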

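#!/usr/bin/gawk -f
# Needs GNU awk: PROCINFO["sorted_in"] is a gawk extension that other
# awks (e.g. mawk) silently ignore. Adjust the gawk path if needed.

BEGIN {
    # make every "for (c in a)" loop visit indices in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
}
# First file (file1): store each line under "counter SUBSEP field3".
# The counter is zero-padded so that the string order matches the line
# order even past 10 lines (unpadded, "10" would sort before "2").
NR==FNR {
    a[sprintf("%09d", i++), $3] = $0
    next
}
# Second file (file2): walk file1's lines in their original order
{
    for (c in a) {
        split(c, s, SUBSEP)    # s[1] = counter, s[2] = field3
        if (s[2] == $3) {
            print $0           # row present in file2: print file2's version
            getline            # and move on to the next line of file2
        } else {
            print a[c]         # row missing from file2: print file1's line
        }
    }
}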

./script.awk file1 file2
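The script above depends on gawk's PROCINFO["sorted_in"]. If gawk is not available, a different technique (swapped in here, not the answer's method) avoids relying on "for (k in a)" order altogether: keep both files in numerically indexed arrays and walk file1 with a counting loop, advancing through file2 whenever field3 matches. A sketch under the assumption that both files fit in memory, with made-up 12-row sample data so the order past line 10 is exercised:

```shell
#!/bin/sh
# Hypothetical sample data: 12 reference rows in file1; file2 keeps only
# the rows with an even field3, so the odd ones must be filled in.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
i=1
while [ "$i" -le 12 ]; do
    echo "ref$i x $i" >> "$dir/file1"
    if [ $((i % 2)) -eq 0 ]; then
        echo "real$i y $i" >> "$dir/file2"
    fi
    i=$((i + 1))
done

# Portable variant: a counting loop fixes the output order in any awk
out=$(awk '
NR == FNR { line[++n] = $0; key[n] = $3; next }   # file1, in input order
          { real[++m] = $0; rkey[m] = $3 }         # file2, in input order
END {
    j = 1                                 # next not-yet-used line of file2
    for (i = 1; i <= n; i++)
        if (j <= m && key[i] == rkey[j])
            print real[j++]               # row present in file2
        else
            print line[i]                 # row missing: take it from file1
}' "$dir/file1" "$dir/file2")
printf '%s\n' "$out"
```

Like the gawk script, it uses the NR==FNR idiom, so it assumes file1 is not empty.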
