Arranging text according to line content

Question

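Using GNU awk to read the text as a set of fixed-width records, where each record is split into fields of width 6 (left label), 42 (left line of text), 6 (right label), and 42 (right line of text):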

BEGIN {
        FIELDWIDTHS = "6 42 6 42"
}

# New label seen on the left hand side.
# If this is a completely new label, then
# add it to the end of the "labels" array.
$1 != "      " {
        llabel = $1
        if (!seenlabels[llabel]++)
                labels[++n] = llabel
}

# Same as above, but for the right hand side.
$3 != "      " {
        rlabel = $3
        if (!seenlabels[rlabel]++)
                labels[++n] = rlabel
}

# Add text to the labelled paragraphs, left and right,
# as strings delimited by ORS (newline).
{
        ltext[llabel] = (ltext[llabel] == "" ? $2 : ltext[llabel] ORS $2)
        rtext[rlabel] = (rtext[rlabel] == "" ? $4 : rtext[rlabel] ORS $4)
}

# At end, output.
END {
        # Iterate over all paragraphs (there are "n" of them).
        for (i = 1; i <= n; ++i) {
                delete llines
                delete rlines

                # Split the text for the left and right paragraph,
                # into arrays, "llines" and "rlines".
                a = split(ltext[labels[i]], llines, ORS)
                b = split(rtext[labels[i]], rlines, ORS)

                # The arrays may be of different lengths, but
                # "c" will be the length of the longest, i.e.
                # the number of lines of the paragraph to the
                # left or right, whichever is longest.
                c = (a > b ? a : b)

                # Print the first line of the left and right
                # of this paragraph (includes the label at the left).
                printf("%-6s%-42s%-6s%-42s\n", labels[i], llines[1], "", rlines[1])

                # Then print the other lines (no label).
                for (j = 2; j <= c; ++j)
                        printf("%-6s%-42s%-6s%-42s\n", "", llines[j], "", rlines[j])
        }
}
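
The label bookkeeping in the script relies on the `!seen[x]++` idiom: the expression is true only the first time a given key is encountered, so each label is appended to the ordered list once. A minimal sketch of just that idiom, separate from the script above (the array names `seen` and `order` and the repeated sample labels are illustrative assumptions); it should run with any POSIX awk:

```shell
#!/bin/sh
# Feed a few labels, some repeated, and record each one only the
# first time it appears, preserving first-seen order.
ordered=$(printf '%s\n' 1:1 1:1 1:2 1:2 1:3 |
    awk '
        # seen[$0]++ evaluates to 0 (false) on first sight only,
        # so the negation is true exactly once per distinct label.
        !seen[$0]++ { order[++n] = $0 }
        END {
            for (i = 1; i <= n; ++i)
                printf "%s%s", order[i], (i < n ? " " : "\n")
        }
    ')
echo "$ordered"
```

This is why a label that first shows up on the right-hand side (such as `1:3` in the test below) still gets its own output paragraph, in the order it was first seen.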

Testing:

$ cat file
1:1   Lorem ipsum dolor sit amet consectetur    1:1   This is sample text of varying length.
      adipiscing elit.                          1:2   This is another paragraph in this file.
1:2   Vivamus integer non suscipit taciti mus         Yet another sentence in this paragraph.
      etiam at primis tempor sagittis.          1:3   Another paragraph can be found here!

$ gawk -f script file
1:1   Lorem ipsum dolor sit amet consectetur          This is sample text of varying length.
      adipiscing elit.
1:2   Vivamus integer non suscipit taciti mus         This is another paragraph in this file.
      etiam at primis tempor sagittis.                Yet another sentence in this paragraph.
1:3                                                   Another paragraph can be found here!

Since this uses a GNU-specific extension to the POSIX specification of awk (the FIELDWIDTHS variable), it is not a strictly POSIX answer.

For a POSIX-compliant answer, simply replace the BEGIN block with:

{
    rec = $0
    $0 = ""
    $1 = substr(rec,1,6)
    $2 = substr(rec,7,42)
    $3 = substr(rec,49,6)
    $4 = substr(rec,55)
}
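
To see what those `substr()` calls produce, here is a small sketch that slices one line of the sample input and prints each piece in brackets so the padding is visible. The shell variables `line` and `fields` and the bracketed output format are illustrative choices, not part of the answer. The line is rebuilt with `printf` only to make the column widths explicit.

```shell
#!/bin/sh
# Rebuild one sample line: 6-char label, 42-char text,
# 6-char (blank) right label, then the right-hand text.
line=$(printf '%-6s%-42s%-6s%s' \
    '1:2' 'Vivamus integer non suscipit taciti mus' \
    '' 'Yet another sentence in this paragraph.')

fields=$(printf '%s\n' "$line" | awk '{
    rec = $0
    # Same offsets as the POSIX replacement block: 1, 7, 49, 55.
    printf "[%s][%s][%s][%s]\n",
        substr(rec, 1, 6), substr(rec, 7, 42),
        substr(rec, 49, 6), substr(rec, 55)
}')
echo "$fields"
```

The third slice comes out as six spaces, which is exactly the value the `$3 != "      "` pattern checks for, so this line continues the current right-hand paragraph rather than starting a new one.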

