I am trying to process some text exported from lecture slides with pdf2text. The bullet points of some slides come out like this:
title for the list
-
-
-
a bullet point text
another bullet point text
yet another bullet point text
- nested bullet point
- another nested bullet point
- yet another nested bullet point
title for the next list
I want to join them into a proper (Markdown) list, like this:
title for the first list
- a bullet point text
- another bullet point text
- yet another bullet point text
    - nested bullet point
    - another nested bullet point
    - yet another nested bullet point
title for the next list
Answer 1
I just did it with a bash script:
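#!/bin/bash
c=0   # number of lone "-" markers still waiting for their text
[[ $# -eq 0 ]] && { echo "Error: Please specify an input file" >&2; exit 1; }
while read -r line
do
    if [[ $line = "-" ]]; then
        (( c++ ))
        # blank line before the first bullet of a list
        if [[ $c -eq 1 ]]; then
            echo ""
        fi
    elif [[ $line != "" ]] && [[ $c -ne 0 ]]; then
        # pair the next text line with a pending marker
        echo "- ${line}"
        (( c-- ))
        # blank line after the last bullet of the list
        if [[ $c -eq 0 ]]; then
            echo ""
        fi
    elif [[ $line =~ "- " ]] && [[ $c -ne 0 ]]; then
        # note: not reached as written, because the branch above already
        # takes every non-empty line while c is non-zero; nested bullets
        # fall through to the else branch and are printed unchanged
        echo " $line"
    else
        echo "$line"
    fi
done < "$1"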
Tested, and it works with the input example.
Answer 2
Thanks to @Rahul, but here is a modified version:
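#!/bin/bash
# require an existing input file as the first argument
if [[ -z "$1" || ! -f "$1" ]]; then
    printf "Usage: %s <FILE>\n" "$(basename "$0")" >&2
    exit 1
fi
c=0      # lone "-" markers still waiting for their text
eoli=0   # 1 right after the last list item has been joined
pad=4    # indentation width for nested bullets
while read -r line
do
    if [[ "$line" = "-" ]]; then
        (( c++ ))
    elif (( c > 0 )); then
        echo "- $line"
        ! (( --c )) && eoli=1
    elif ((eoli)) && [[ "$line" =~ ^"- " ]]; then
        # a "- text" line directly after the list is a nested bullet
        printf "%-*s%s\n" "$pad" "" "$line"
    else
        eoli=0
        echo "$line"
    fi
done < "$1"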
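To check the behaviour without creating a file, the same loop can be wrapped in a function that reads standard input and is fed the sample from the question. This is only a sketch: the function name join_bullets and the variable result are made up for illustration, and the regex is kept in a variable to avoid quoting trouble.

```shell
#!/bin/bash
# Sketch: same state machine as the script above, reading from stdin.
# join_bullets is a hypothetical name, not part of the original answer.
join_bullets() {
    local c=0 eoli=0 pad=4 line
    local nested_re='^- '          # "- text" marks a nested bullet
    while read -r line; do
        if [[ "$line" = "-" ]]; then
            c=$(( c + 1 ))         # count a pending marker
        elif (( c > 0 )); then
            echo "- $line"         # join marker and text
            (( --c )) || eoli=1    # last pending marker consumed
        elif (( eoli )) && [[ "$line" =~ $nested_re ]]; then
            printf "%-*s%s\n" "$pad" "" "$line"
        else
            eoli=0
            echo "$line"
        fi
    done
}

# Run it on the sample input from the question.
result=$(join_bullets <<'EOF'
title for the list
-
-
-
a bullet point text
another bullet point text
yet another bullet point text
- nested bullet point
- another nested bullet point
- yet another nested bullet point
title for the next list
EOF
)
printf '%s\n' "$result"
```

With that input the three detached markers are joined to the three text lines, and the following "- ..." lines are indented by pad spaces until the next ordinary line resets eoli.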
Using awk:
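#!/usr/bin/awk -f
BEGIN {
    c = 0      # lone "-" markers still waiting for their text
    eoli = 0   # 1 right after the last list item has been joined
    pad = 4    # indentation width for nested bullets
}
{
    if (/^-$/) {
        ++c
    } else if (c > 0) {
        printf "- %s\n", $0
        eoli = (--c == 0)
    } else if (eoli && /^- /) {
        printf "%*s%s\n", pad, "", $0
    } else {
        eoli = 0
        print $0
    }
}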
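For a quick one-off, the same logic can also be run inline from the shell instead of as a separate script. The sketch below rewrites the if/else chain as pattern-action rules, which is a stylistic variation rather than the answer's own form. It assumes an awk that accepts the `*` width in printf, as the script above does, and the variable name result is made up for illustration.

```shell
#!/bin/bash
# Sketch: the awk logic above as pattern-action rules, run on the
# sample input from the question.
result=$(awk -v pad=4 '
    # lone marker: remember it and print nothing yet
    /^-$/ { c++; next }
    # text line for a pending marker: join them
    c > 0 { print "- " $0; eoli = (--c == 0); next }
    # "- text" right after the list: indent as nested bullet
    eoli && /^- / { printf "%*s%s\n", pad, "", $0; next }
    # anything else ends the nested run and is printed as is
    { eoli = 0; print }
' <<'EOF'
title for the list
-
-
-
a bullet point text
another bullet point text
yet another bullet point text
- nested bullet point
- another nested bullet point
- yet another nested bullet point
title for the next list
EOF
)
printf '%s\n' "$result"
```

Passing pad with -v makes the indentation width adjustable from the command line without editing the program.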