Join a block of lines with the following block of lines


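I'm trying to process some text exported from lecture slides with pdf2text. On some slides the bullet points come out like this, with all the `-` markers of a list first and the item texts after them: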

title for the list
-
-
-
a bullet point text
another bullet point text
yet another bullet point text
- nested bullet point
- another nested bullet point
- yet another nested bullet point
title for the next list

I'd like to join them into a proper (Markdown) list, like this:

title for the first list

-   a bullet point text
-   another bullet point text
-   yet another bullet point text
    -   nested bullet point
    -   another nested bullet point
    -   yet another nested bullet point

title for the next list
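To try the scripts in the answers on this sample, the input can be saved to a file with a here-document. The file name `slides.txt` is only an example, and quoting `'EOF'` stops the shell from expanding anything inside the text:

```shell
# Write the sample pdf2text output from the question to slides.txt
# (hypothetical file name) so the scripts below can read it.
cat > slides.txt <<'EOF'
title for the list
-
-
-
a bullet point text
another bullet point text
yet another bullet point text
- nested bullet point
- another nested bullet point
- yet another nested bullet point
title for the next list
EOF
```

Each script then takes the file name as its first argument, e.g. `./script.sh slides.txt`, where `script.sh` stands for whichever answer was saved.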

Answer 1

I just did it with a bash script:

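#!/bin/bash
c=0       # "-" markers still waiting for their text
inlist=0  # 1 once the last marker is used, while nested items may follow
[[ $# -eq 0 ]] && { echo "Error: Please Specify Input file" >&2; exit 1; }

while IFS= read -r line
do
        if [[ $line = "-" ]]; then
                (( c++ ))
                if [[ $c -eq 1 ]]; then
                    echo ""              # blank line before the list
                fi
        elif [[ $line != "" ]] && [[ $c -ne 0 ]]; then
                echo "-   ${line}"       # pair the text with a pending marker
                (( c-- ))
                if [[ $c -eq 0 ]]; then
                    inlist=1
                fi
        elif [[ $line == "- "* ]] && [[ $inlist -ne 0 ]]; then
                echo "    $line"         # nested item: indent it
        else
                if [[ $inlist -ne 0 ]]; then
                    echo ""              # blank line after the list
                    inlist=0
                fi
                echo "$line"
        fi
done < "$1"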

Tested, and it works with the sample input.

Answer 2

Thanks @Rahul, but here is a modified version:

#!/bin/bash

if [[ -z "$1" || ! -f "$1" ]]; then
    printf "Usage: %s <FILE>\n" "$(basename "$0")" >&2
    exit 1
fi

c=0     # "-" markers still waiting for their text
eoli=0  # 1 right after the last item of a list (end of list)
pad=4   # indentation for nested items

while IFS= read -r line
do
        if [[ "$line" = "-" ]]; then
                 (( c++ ))
        elif (( c > 0 )); then
                echo "- $line"
                ! (( --c )) && eoli=1
        elif ((eoli)) && [[ "$line" =~ ^-\  ]]; then
                printf "%-*s%s\n" $pad "" "$line"
        else
                eoli=0
                echo "$line"
        fi
done < "$1"
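This version does not print the blank lines that the desired output in the question has before and after the list, and some Markdown processors (Pandoc, for example) need the blank line before a list. Below is a minimal sketch of the same marker-counting approach that adds them, written as a hypothetical bash function `join_bullets` that filters standard input. Like the scripts here, it assumes every marker line is exactly `-` with no trailing spaces, and it also rewrites nested items as `    -   text` to match the question:

```shell
# Hypothetical helper (not part of the answers): join the "-" marker lines
# with the text lines that follow them, indent nested "- " items and put a
# blank line before and after each list. Reads stdin, writes stdout.
join_bullets() {
    local line c=0 eoli=0
    while IFS= read -r line; do
        if [[ $line == "-" ]]; then
            (( c == 0 )) && echo ""            # blank line before a new list
            (( ++c ))                          # one more marker waiting for text
        elif (( c > 0 )); then
            printf '%s\n' "-   $line"          # pair the text with a marker
            (( --c == 0 )) && eoli=1           # last marker used: list is done
        elif (( eoli )) && [[ $line == "- "* ]]; then
            printf '%s\n' "    -   ${line#- }" # nested item, indented 4 spaces
        else
            (( eoli )) && echo ""              # blank line after the list
            eoli=0
            printf '%s\n' "$line"
        fi
    done
}
```

Usage: `join_bullets < input.txt`. On the question's sample it reproduces the desired output, except that the first title is passed through unchanged as `title for the list`.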

Using awk:

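#!/usr/bin/awk -f

BEGIN {
    c = 0     # "-" markers still waiting for their text
    eoli = 0  # 1 right after the last item of a list (end of list)
    pad = 4   # indentation for nested items
}

{
    if (/^-$/) {
        ++c
    } else if (c > 0) {
        printf "- %s\n", $0
        eoli = (--c == 0)
    } else if (eoli && /^- /) {
        printf "%*s%s\n", pad, "", $0
    } else {
        eoli = 0
        print $0
    }
}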
