Wrapping long cells in a TSV so they stay in the same column

Answer 1

I'm not sure I understand what you want to achieve, but maybe this awk program will be useful to you:

Contents of script.awk:

{
    ## Number of blocks printed to output for this line.
    block = 0

    ## Count the columns by splitting the line on tabs. Subtract one
    ## because each line ends with a tab, so split() counts the empty
    ## string after it as an extra column.
    col_nums = split( $0, cells, /\t+/ )
    --col_nums

    ## A line without any tab is invalid. Skip it.
    if ( col_nums < 1 ) {
        next
    }

    ## Number of characters in each output block.
    ## 'max_cell_length' is an argument provided by the user: the number
    ## of characters to print per output line (not counting blanks).
    ## int() keeps substr() from rounding fractional offsets, and the
    ## check avoids an endless loop when the result is zero.
    chars = int( max_cell_length / col_nums )
    if ( chars < 1 ) {
        next
    }

    ## For each column...
    for ( i = 1; i <= col_nums; i++ ) {

        ## Index where the next substring starts. substr() is 1-based,
        ## so the first character is at index 1.
        begin_idx = 1

        ## Cut the column into blocks of 'chars' characters until the
        ## end of the column is reached.
        while ( begin_idx <= length( cells[i] ) ) {
            column = substr( cells[i], begin_idx, chars )

            ## Advance the index to where the last block ended.
            begin_idx += chars

            ## Print block to output.
            printf "%s ", column

            ## Once 'col_nums' blocks have been printed, start a new
            ## output line.
            if ( ++block % col_nums == 0 ) {
                printf "\n"
            }
        }
    }
}

{
    ## For each line, print an extra newline for a pretty output.
    printf "\n"
}

Run the script:

python3 -c 'for i in (1,2,3): print(((str(i)*50)+"\t")*3)' | awk -v max_cell_length=30 -f script.awk -

Result:

1111111111 1111111111 1111111111 
1111111111 1111111111 1111111111 
1111111111 1111111111 1111111111 
1111111111 1111111111 1111111111 
1111111111 1111111111 1111111111 

2222222222 2222222222 2222222222 
2222222222 2222222222 2222222222 
2222222222 2222222222 2222222222 
2222222222 2222222222 2222222222 
2222222222 2222222222 2222222222 

3333333333 3333333333 3333333333 
3333333333 3333333333 3333333333 
3333333333 3333333333 3333333333 
3333333333 3333333333 3333333333 
3333333333 3333333333 3333333333

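You can use the variable max_cell_length to set the number of characters printed per output line (not counting blanks). I suppose it has to be a factor of the number of characters in the original data; otherwise the output format will be wrong. I tested it with 30, as you can see in this post, and with 50. Both seem correct, but that is not the case for many other odd values.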
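
script.awk prints all blocks of the first cell, then all blocks of the second, and so on, a fixed number per line, so a block does not necessarily end up under the column it came from. The following is a minimal Python sketch of one alternative, not part of the original answer: output row r holds the r-th chunk of every cell, and cells that have run out of text are padded with spaces. The function name `wrap_row` and the decision to treat `max_cell_length` as the total row width shared equally by the columns are assumptions made for illustration.

```python
from itertools import zip_longest


def wrap_row(line, max_cell_length):
    """Wrap one TSV line so each cell's chunks stay in that cell's column.

    Assumption (mirrors script.awk): max_cell_length is the number of
    characters per output row, shared equally between the columns, and
    the line may end with a trailing tab.
    """
    cells = line.rstrip("\n").rstrip("\t").split("\t")
    width = max_cell_length // len(cells)
    if width < 1:
        raise ValueError("max_cell_length must be at least the column count")

    # Cut every cell into chunks of at most `width` characters.
    chunked = [
        [cell[i:i + width] for i in range(0, len(cell), width)]
        for cell in cells
    ]

    # Row r holds chunk r of every column. Columns with no text left get
    # an empty chunk; every chunk is padded to `width` to keep alignment.
    rows = []
    for chunks in zip_longest(*chunked, fillvalue=""):
        row = " ".join(chunk.ljust(width) for chunk in chunks)
        rows.append(row.rstrip())  # drop padding at the end of the row
    return rows


# Hypothetical demo input: three cells of different lengths.
demo = "a" * 25 + "\t" + "b" * 8 + "\t" + "c" * 12 + "\t"
for row in wrap_row(demo, 30):
    print(row)
```

With the test input from this post, `wrap_row(("1" * 50 + "\t") * 3, 30)` gives five rows of three 10-character blocks, the same as the result shown above. Unlike the awk script, the layout does not depend on `max_cell_length` dividing the cell lengths evenly, because short final chunks are padded.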
