Pull one element from each line of a file until k

Answer 1

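In a higher-level language you could use an array of arrays, but bash doesn't have those. Problems like this one, which involve multi-level data structures, tend to be very tedious to solve in the shell.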
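For illustration only, one common workaround is to emulate a two-dimensional table with a single associative array whose keys are "row,column" strings. The sketch below assumes bash 4 or later (for declare -A); the names cell, width, add_element and get_element are hypothetical and are not used by the solution that follows.

```shell
#!/bin/bash
# Emulate a two-dimensional table with one associative array (bash 4+).
# Keys are "row,column" strings; all names here are illustrative only.
declare -A cell    # cell["row,col"] holds one element
declare -A width   # width[row] holds how many elements the row has

# Append a value to the end of the given row.
add_element() {
    local row=$1 value=$2
    local col=${width[$row]:-0}
    cell[$row,$col]=$value
    width[$row]=$(( col + 1 ))
}

# Print the element stored at (row, col); columns start at 0.
get_element() {
    local row=$1 col=$2
    echo "${cell[$row,$col]}"
}

add_element A 1
add_element A 5
add_element B 2
get_element A 1    # prints 5
```

This works, but every access has to build the key by hand, which is the kind of tedium described above.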

But since your goal is to learn Unix text processing, not Python, let's solve it in the shell.

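In this solution, we read through the file once to get the row headers, then read through it again as many times as needed to collect the required number of elements. We keep two arrays: outrow is an array of output rows, each of which is appended to as we go; cursor is an array of integers that stores our position in each row.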

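Note that this script will loop forever if there are not enough elements to satisfy the request. Solving that is left as an exercise for the reader.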

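#!/bin/bash
k=$1                # number of elements to collect
input=input.txt
declare -a outrow   # output rows, one per input line
declare -a cursor   # index of the next unread element on each line
K=0                 # number of elements collected so far
n=0

# First pass: start each output row with its header (the first field).
while read -r line
do
    outrow[$n]=${line%% *}
    cursor[$n]=1
    (( n++ ))
done < "$input"

# Further passes: take one more element from each line in turn
# until k elements have been collected.
while [[ $K -lt $k ]]
do
    n=0
    while read -r line
    do
        declare -a col=( $line )
        if [[ ${#col[@]} -gt ${cursor[$n]} ]]
        then
            outrow[$n]+=" ${col[ ${cursor[$n]} ]}"
            (( cursor[$n]++ ))
            (( K++ ))
            [[ $K -lt $k ]] || break
        fi
        (( n++ ))
    done < "$input"
done

for row in "${outrow[@]}"
do
    echo "$row"
done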
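
As a hedged sketch of one way to approach the exercise mentioned above, the request can be capped at the number of elements the file actually holds before the main loop starts. This assumes the same input layout the script expects (a header followed by whitespace-separated elements on each line); count_available and clamp_k are hypothetical helper names, not part of the original answer.

```shell
#!/bin/bash
# Count the elements a file can supply: every field after the
# first one (the row header) on each line.
count_available() {
    local file=$1 total=0
    local -a col
    while read -r -a col
    do
        # Lines with only a header (or nothing at all) contribute nothing.
        (( ${#col[@]} > 1 )) && (( total += ${#col[@]} - 1 ))
    done < "$file"
    echo "$total"
}

# Print k, reduced to the number of available elements if necessary.
clamp_k() {
    local k=$1 file=$2 available
    available=$(count_available "$file")
    (( k > available )) && k=$available
    echo "$k"
}
```

In the script above, assigning k=$(clamp_k "$1" "$input") once input has been set, instead of k=$1, should let the outer loop terminate even when the request is larger than the file.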

Answer 2

Note: you can adjust the number of elements by changing the num variable.

gawk -v num=5 '
BEGIN {
    PROCINFO["sorted_in"] = "@ind_str_asc"
}
{
    ###
    # Traverse input.txt from the first line to the last
    # and store all elements in the two-dimensional array "table".
    # Along the way, maintain an array of counters for each letter.
    ###

    # The array of counters for each unique element of the first column.
    # In our case the array indexes are capital letters (A, B, C, D)
    # and the values are the number of occurrences of each letter.
    cnt_arr[$1]++

    # Two-dimensional array "table".
    # It looks like a chess board: rows are named by letters (A, B, C, D)
    # and columns are named by numbers (1, 2, 3, 4, 5, etc.).
    # Its cells contain the numbers from the second column.
    # For example, if the letter A occurs 5 times in input.txt,
    # then the table will have a row A with 5 columns.
    table[$1][cnt_arr[$1]] = $2
}
# At this point, all lines from input.txt have been processed
# and stored in the table.
END {
    # Walk the columns in order. Columns are numbered from 1,
    # and at most num columns are needed to collect num elements.
    for(i = 1; i <= num; i++) {

        # On each iteration run the inner loop,
        # which iterates through all rows in the table.
        for(row_name in table) {

            # If this row has a cell in column i,
            # append its value to result_arr[row_name], separated by OFS.
            # OFS is the output field separator, a space by default.
            if(i in table[row_name]) {
                result_arr[row_name] = result_arr[row_name] OFS table[row_name][i]
                # and count the number of collected elements
                cnt++
            }

            # If the count of collected elements equals the num variable
            # or reaches NR (the number of records, i.e. lines),
            # print result_arr and exit.
            if(cnt == num || cnt >= NR) {
                for(name in result_arr) {
                    print name result_arr[name]
                }
                exit
            }
        }
    }
}' input.txt

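The PROCINFO["sorted_in"] = "@ind_str_asc" line tells gawk to traverse arrays in for (... in ...) loops in ascending order of their indices, compared as strings. The setting is described in the gawk manual.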
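
To see what this setting does in isolation, here is a small sketch. It assumes gawk 4.0 or later is installed; sorted_keys is a hypothetical helper name used only for this example.

```shell
#!/bin/bash
# Print the distinct first fields of standard input in ascending
# string order, relying on the PROCINFO["sorted_in"] setting of gawk.
sorted_keys() {
    gawk '
    BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
    { seen[$1] }                          # referencing an element creates it
    END { for (key in seen) print key }   # visited in sorted index order
    '
}

printf 'C 9\nA 1\nB 2\nA 5\n' | sorted_keys
# prints A, B and C, one per line
```

Without the BEGIN line the order of a for (... in ...) loop is unspecified, so the rows of the result could come out in any order.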

Input

A 1
B 2
C 9
D 1
A 5
B 3
C 9
A 6
C 7
A 5
C 1

Output

A 1 5
B 2
C 9
D 1
