Looping over one or more matches in an awk column

Answer 1

Using GNU awk for FPAT (and then, since we already need gawk, also using gensub() and the \s shorthand for [[:space:]]):

$ cat tst.awk
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS=","
}
{
    name = gensub(/^"|"$/,"","g",$1)
    n = split(gensub(/^"|"$/,"","g",$2),emails,/\s*[;,|:]\s*/)
    for (i=1; i<=n; i++) {
        print name, emails[i]
    }
}
$
$ awk -f tst.awk file
agrippa,[email protected]
elvirka,[email protected]
Inofs,[email protected]
Inofs,[email protected]
bekbz,[email protected]
bekbz,[email protected]
njkzif,[email protected]
njkzif,[email protected]
njycz,[email protected]
njycz,[email protected]
DanielEdict,[email protected]
JosEmbesy,[email protected]
JosEmbesy,[email protected]
Walterdon,[email protected]
Walterdon,[email protected]
Kennethlob,[email protected]
Ninosh,[email protected]
Patrickbam,[email protected]

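FWIW, I generally use the *sub(/^"|"$/,"",...) approach to strip possible leading/trailing double quotes from CSV fields, because it has one advantage over the substr() approach: it does not corrupt the field when no double quotes are present.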
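
To make that difference concrete, here is a minimal sketch that applies both approaches to a quoted and an unquoted field. The function names strip_sub and strip_substr are hypothetical, chosen only for this illustration, and gsub() stands in for gensub() so that the sketch needs only POSIX awk:

```shell
# Hypothetical helper: strip a leading and/or trailing double quote with gsub().
strip_sub() {
    awk '{ gsub(/^"|"$/, ""); print }'
}

# Hypothetical helper: blindly drop the first and last characters with substr().
strip_substr() {
    awk '{ print substr($0, 2, length($0) - 2) }'
}

printf '%s\n' '"abc"' 'abc' | strip_sub      # both lines come out as abc
printf '%s\n' '"abc"' 'abc' | strip_substr   # the unquoted line is reduced to b
```

The substr() version assumes the quotes are always there, so an unquoted field silently loses its first and last characters.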

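You may also want to add some error detection in case of broken email addresses or cases you forgot to handle (for example, a separator that is missing from [;,|:]):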

$ cat tst.awk
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")"
    OFS=","
}
{
    name = gensub(/^"|"$/,"","g",$1)
    n = split(gensub(/^"|"$/,"","g",$2),emails,/\s*[;,|:]\s*/)
    for (i=1; i<=n; i++) {
        email = emails[i]
        if ( gsub(/@/,"&",email) != 1 ) {
            printf "ERROR: too few or too many email addresses in \"%s\"\n", email | "cat>&2"
            exit 1
        }
        print name, email
    }
}
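
The check in that script relies on gsub() returning the number of substitutions it made. Replacing each @ with & (the matched text itself) leaves the string unchanged, so the call works as a counter. A minimal sketch of just that idiom follows; count_at is a hypothetical helper name and the sample addresses are made up:

```shell
# Hypothetical helper: print how many "@" characters each input line contains.
count_at() {
    awk '{ s = $0; print gsub(/@/, "&", s) }'
}

# Prints 1, 2 and 0, one number per line.
printf '%s\n' 'a@example.com' 'a@example.com b@example.com' 'no-at-sign' | count_at
```

Any count other than 1 means the piece is either not an address at all or is several addresses that the split() separator list failed to pull apart.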

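If you really want to validate the email addresses then, FWIW, for the past 5 years or so I have been using this modified version of the regexp from http://www.regular-expressions.info/email.html without any problems that I know of (I specifically use [a-zA-Z] rather than [:alpha:] because I only want to accept the letters that are considered as such in my locale; you decide what makes sense for your application):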

    (email ~ /^[0-9a-zA-Z._%+-]+@[0-9a-zA-Z.-]+\.[a-zA-Z]{2,}$/)
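
As a sketch of how that test might be wired into a filter, the function below labels each input line OK or BAD. The name check_email and the sample addresses are hypothetical. Interval expressions such as {2,} are not supported by every awk, so this sketch spells the last part as [a-zA-Z][a-zA-Z]+, which matches the same strings:

```shell
# Hypothetical helper: label each input line as a plausible email address or not.
check_email() {
    awk '{
        # Same pattern as above, with {2,} written out for awks without intervals.
        ok = ($0 ~ /^[0-9a-zA-Z._%+-]+@[0-9a-zA-Z.-]+\.[a-zA-Z][a-zA-Z]+$/)
        print (ok ? "OK" : "BAD"), $0
    }'
}

printf '%s\n' 'user@example.com' 'user@example' 'two@@example.com' | check_email
```

Only the first sample passes: the second has no dot-separated suffix after the @, and the third contains two @ characters.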

Answer 2

Not sure I understand your parenthetical comment about 15+ and 7 columns, but for the sample given, try

awk -F, '
        {gsub (/[" ]/,_)                        # remove double quotes and space all over
         D1 = $1                                # save field 1 and
         sub ($1 FS, _)                         # remove it from line
         n  = split ($0, T, /[,;:\|]/)          # split the residual line into array T
         for (i=1; i<=n; i++) print D1, T[i]    # print former $1, and each T element
        }
' OFS=, file
agrippa,[email protected]
elvirka,[email protected]
Inofs,[email protected]
Inofs,[email protected]
.
.
.
Patrickbam,[email protected]
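
One caveat with the script above: sub($1 FS, _) uses the contents of $1 as a regular expression, so a name containing characters such as + or ( may fail to match itself. A possible alternative, sketched here under that assumption, cuts the first field off by position instead. The function name split_rest and the sample line are hypothetical:

```shell
# Hypothetical helper: pair field 1 with every address in the rest of the line.
split_rest() {
    awk -F, -v OFS=, '{
        gsub(/[" ]/, "")                      # remove double quotes and spaces
        name = $1                             # save field 1
        rest = substr($0, length($1) + 2)     # everything after the first comma
        n = split(rest, T, /[,;:|]/)          # split the remainder on any separator
        for (i = 1; i <= n; i++) print name, T[i]
    }'
}

printf '%s\n' '"a+b", "x@example.com; y@example.com"' | split_rest
```

Unlike the original, this sketch prints nothing for a line that has no comma at all, which may or may not be what you want.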
