How do I generalize an awk command into a script? (Extracting/rearranging columns from a file)

Answer 1

You can use bash's getopts (in the linked documentation you have to scroll down a bit) to do some command-line parsing:

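#!/bin/bash
# Defaults: colon-delimited input, print fields 1 and 2
delimiter=:
first=1
second=2
# -d DELIM sets the delimiter, -f and -s pick the two field numbers
while getopts d:f:s: FLAG; do
  case $FLAG in
    d) delimiter=$OPTARG;;
    f) first=$OPTARG;;
    s) second=$OPTARG;;
    *) echo "usage: $0 [-d delim] [-f field] [-s field] [file...]" >&2; exit 2;;
  esac
done
# Drop the parsed options; the remaining operands are the input files
shift $((OPTIND-1))
awk -F"$delimiter" -v "OFS=$delimiter" -v first="$first" -v second="$second" '{ print $first OFS $second }' "$@"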
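
To see how the options map onto the awk call, here is a minimal sketch of the same logic wrapped in a shell function so it can be tried on inline data. The function name pick_two is hypothetical and not part of the answer; the sketch assumes a POSIX-like shell and leaves its variables global for brevity.

```shell
#!/bin/sh
# pick_two: hypothetical function form of the script above (illustration only).
pick_two() {
    delimiter=: first=1 second=2
    OPTIND=1   # reset so the function can be called more than once
    while getopts d:f:s: flag; do
        case $flag in
            d) delimiter=$OPTARG ;;
            f) first=$OPTARG ;;
            s) second=$OPTARG ;;
            *) echo 'usage: pick_two [-d delim] [-f n] [-s n] [file...]' >&2
               return 2 ;;
        esac
    done
    shift "$((OPTIND - 1))"
    # The comma in print inserts OFS, which is set to the input delimiter
    awk -F"$delimiter" -v OFS="$delimiter" \
        -v first="$first" -v second="$second" \
        '{ print $first, $second }' "$@"
}

# Swap the third and first columns of colon-delimited input
printf '1:2:3\na:b:c\n' | pick_two -f 3 -s 1
# prints:
# 3:1
# c:a
```

Passing the field numbers with -v keeps them as data, so the awk program text itself never changes.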

Answer 2

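The following shell script takes an optional -d option to set the delimiter (tab is the default) and a mandatory -c option with a column specification.

The column specification is similar to that of cut, but also allows rearranging and duplicating output columns, as well as specifying ranges backwards. Open-ended ranges are also supported.

The file to parse is given on the command line as the last operand, or passed on standard input.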
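
The difference from cut can be seen in a short comparison. This is only an illustration with made-up input: cut writes the selected fields in input order and at most once each, whereas awk prints them in whatever order they are named.

```shell
#!/bin/sh
# cut ignores the order and repetition in the field list
printf 'a:b:c\n' | cut -d : -f 3,1,1                          # prints a:c

# awk prints fields in the order given, duplicates included
printf 'a:b:c\n' | awk -F : -v OFS=: '{ print $3, $1, $1 }'   # prints c:a:a
```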

#!/bin/sh

delim='\t'   # tab is default delimiter

# parse command line option
while getopts 'd:c:' opt; do
    case $opt in
        d)
            delim=$OPTARG
            ;;
        c)
            cols=$OPTARG
            ;;
        *)
            echo 'Error in command line parsing' >&2
            exit 1
    esac
done
shift "$(( OPTIND - 1 ))"

if [ -z "$cols" ]; then
    echo 'Missing column specification (the -c option)' >&2
    exit 1
fi

# ${1:--} will expand to the filename or to "-" if $1 is empty or unset
cat "${1:--}" |
awk -F "$delim" -v cols="$cols" '
    BEGIN {
        # output delim will be same as input delim
        OFS = FS

        # get array of column specs
        ncolspec = split(cols, colspec, ",")
    }

    {
        # get fields of current line
        # (need this as we are rewriting $0 below)
        split($0, fields, FS)

        nf = NF     # save NF in case we have an open-ended range
        $0 = "";    # empty $0

        # go through given column specification and
        # create a record from it
        for (i = 1; i <= ncolspec; ++i)
            if (split(colspec[i], r, "-") == 1)
                # single column spec
                $(NF+1) = fields[colspec[i]]
            else {
                # column range spec

                if (r[1] == "") r[1] = 1    # open start range
                if (r[2] == "") r[2] = nf   # open end range

                if (r[1] < r[2])
                    # forward range
                    for (j = r[1]; j <= r[2]; ++j)
                        $(NF + 1) = fields[j]
                else
                    # backward range
                    for (j = r[1]; j >= r[2]; --j)
                        $(NF + 1) = fields[j]
            }

        print
    }'

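Since the code re-parses the column specification for every new line, it is slightly inefficient. If open-ended ranges do not need to be supported, or if all lines can be assumed to have exactly the same number of columns, a single pass over the specification could be done once in the BEGIN block (or in a separate NR==1 block) to build an array of the fields that should be output.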
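
A minimal sketch of that single-pass idea follows, assuming every line has the same number of columns as the first one. The function name project and its positional arguments (delimiter, column specification, optional file) are hypothetical; the specification is expanded once, on the first line, into an array out of field numbers.

```shell
#!/bin/sh
# project DELIM COLSPEC [FILE]: hypothetical helper, reads stdin if FILE is omitted
project() {
    awk -F "$1" -v cols="$2" '
    BEGIN { OFS = FS }

    NR == 1 {
        # Expand the specification once; NF of the first line closes open ranges
        n = 0
        ncolspec = split(cols, colspec, ",")
        for (i = 1; i <= ncolspec; ++i) {
            if (split(colspec[i], r, "-") == 1) {
                out[++n] = colspec[i] + 0       # single column
                continue
            }
            lo = (r[1] == "") ? 1  : r[1] + 0   # open start
            hi = (r[2] == "") ? NF : r[2] + 0   # open end
            step = (lo <= hi) ? 1 : -1          # forward or backward range
            for (j = lo; j != hi + step; j += step)
                out[++n] = j
        }
    }

    {
        # Build each output line from the precomputed field list
        line = ""
        for (k = 1; k <= n; ++k)
            line = line (k > 1 ? OFS : "") $(out[k])
        print line
    }' "${3:--}"
}

printf '1:2:3\na:b:c\n' | project : 3-1,1,1-
# prints:
# 3:2:1:1:1:2:3
# c:b:a:a:a:b:c
```

The trade-off is that a later line with more columns than the first would have its extra columns ignored by an open-ended range.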

Missing: sanity checks on the column specification. A malformed specification string may well produce strange results.
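
One possible shape for such a check is sketched below. It is an assumption about what counts as well-formed, not part of the script: a comma-separated list whose items are N, N-M, N- or -M with positive integers, so a bare "-", an empty item, or field 0 is rejected. The function name valid_colspec is hypothetical.

```shell
#!/bin/sh
# valid_colspec SPEC: succeeds if SPEC matches the assumed grammar
valid_colspec() {
    # One item: N, N-, N-M, or -M (positive integers without leading zeros)
    item='([1-9][0-9]*(-([1-9][0-9]*)?)?|-[1-9][0-9]*)'
    printf '%s\n' "$1" | grep -Eq "^${item}(,${item})*\$"
}

# The first three specifications pass, the last three are rejected
for spec in '1,3' '3-1,1,1-3' '1-,3' '1,,3' 'a-b' '0'; do
    if valid_colspec "$spec"; then
        echo "ok:  $spec"
    else
        echo "bad: $spec"
    fi
done
```

In the script, a call like this could sit next to the existing test for an empty -c value, before awk is started.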

Testing:

$ cat file
1:2:3
a:b:c
@:(:)

$ sh script.sh -d : -c 1,3 <file
1:3
a:c
@:)

$ sh script.sh -d : -c 3,1 <file
3:1
c:a
):@

$ sh script.sh -d : -c 3-1,1,1-3 <file
3:2:1:1:1:2:3
c:b:a:a:a:b:c
):(:@:@:@:(:)

$ sh script.sh -d : -c 1-,3 <file
1:2:3:3
a:b:c:c
@:(:):)

Answer 3

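Thanks for the replies. Here is my script. I created it by trial and error, which often does not lead to a working solution, and I have no systematic way of arriving at the script I was aiming for. Please provide some code review if you can. Thanks.

The script works on the following examples (I am not sure whether it works in general):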

$ projection -d ":" /etc/passwd 4 3 6 7

$ projection -d "/" /etc/passwd 4 3 6 7

The script projection is:

#! /bin/bash

# default arg value
delim="," # CSV by default
# Parse flagged arguments:
while getopts "td:" flag
do
  case $flag in
    d) delim=$OPTARG;;
    t) delim="\t";;
    *) exit 1;;   # invalid option: exit with a non-zero status
  esac
done
# Delete the flagged arguments:
shift $((OPTIND - 1))

inputfile="$1"
shift 1

fs=("$@")
# prepend "$" to each field number
fields=()
for f in "${fs[@]}"; do
    fields+=(\$"$f")
done

awk -F"$delim" "{ print $(join_by.sh " \"$delim\" " "${fields[@]}") }" "$inputfile"

where join_by.sh is:

#! /bin/bash

# https://stackoverflow.com/questions/1527049/join-elements-of-an-array
# https://stackoverflow.com/a/2317171/

# get the separator:
d="$1";
shift;

# interpolate other parameters by the separator
# by treating the first parameter specially
echo -n "$1";
shift;
printf "%s" "${@/#/$d}";
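
As one possible direction for the requested review, the sketch below avoids generating awk source text in the shell (and so needs no join_by.sh): the field numbers are handed to awk as data with -v and looped over there. It is written as a function only so it can be tried on inline data, keeps the same options and argument order as projection, and relies on awk treating a file operand of "-" as standard input. It is an alternative under these assumptions, not the author's script.

```shell
#!/bin/sh
# projection [-t | -d DELIM] FILE FIELD...  (function form, illustration only)
projection() {
    delim=,            # CSV by default
    OPTIND=1
    while getopts td: flag; do
        case $flag in
            d) delim=$OPTARG ;;
            t) delim='\t' ;;
            *) return 2 ;;
        esac
    done
    shift "$((OPTIND - 1))"

    if [ "$#" -lt 2 ]; then
        echo 'usage: projection [-t | -d delim] file field...' >&2
        return 2
    fi
    inputfile=$1
    shift

    # "$*" joins the remaining field numbers with spaces (default IFS)
    awk -F"$delim" -v fields="$*" '
        BEGIN { OFS = FS; n = split(fields, f, " ") }
        {
            line = $(f[1])
            for (i = 2; i <= n; ++i)
                line = line OFS $(f[i])
            print line
        }' "$inputfile"
}

printf 'alice:x:1000:100:Alice:/home/alice:/bin/sh\n' | projection -d : - 4 3 6 7
# prints 100:1000:/home/alice:/bin/sh
```

Because the delimiter and field list never become part of the awk program, a delimiter such as a double quote cannot break the program's syntax.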
