How do I parse a TiddlyWiki into a bunch of plain-text files?

Answer 1

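Anyway, this script should do it, as long as a few assumptions hold. For instance, it will break if an attribute in the div tag contains a closing angle bracket (>), if the order of the title and creator attributes changes, or if a div tag spans multiple lines.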

#!/usr/bin/awk -f

# treat the opening tag line here
/<div title=".*" creator=".*"/ {
    indiv = 1                                            # inside div from here on
    name = gensub(/.* title="([^"]+)".*/, "\\1", 1)      # extract name
    tagsattr = gensub(/.* tags="([^"]+)".*/, "\\1", 1)   # extract tags string
    n = split(tagsattr, tags, /, /)                      # split tags into array

    print(name) > name                                   # print name into file "name"
    for (i = 1; i <= n; i++) printf("@%s ", tags[i]) >> name  # print tags in order, "@"-prefixed
    printf("\n\n") >> name                               # two newlines
    sub(/.*<div [^>]+>/, "")                             # remove the tag so the rest
                                                         # of the line can be printed
}

# treat closing line
indiv == 1 && /<\/div>/ {
    sub(/<\/div>.*/, "")                                 # remove tag so the rest
    print >> name                                        # can be printed
    close(name)                                          # done with this output file
    indiv = 0                                            # outside div from here on
}

# print all other lines inside of div
indiv == 1 {
    print >> name
}
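
The assumptions listed at the top of this answer can be checked before the script is run. What follows is only a sketch of such a check and not part of the original script: the function name check_tiddler_tags and the exact pattern are assumptions, and it presumes that attribute values never contain a double quote. It reports every line that holds a div tag with a title attribute but does not have the shape the script expects.

```shell
# Hypothetical pre-flight check: check_tiddler_tags FILE...
# Prints "file:line: text" for every suspicious opening tag.
check_tiddler_tags() {
    awk '
    # candidate opening tag: a div with a title attribute on this line
    /<div / && / title="/ {
        # expected shape: title first, creator second, then any further
        # attributes, no ">" inside a value, tag closed on the same line
        if ($0 !~ /<div title="[^">]*" creator="[^">]*"( [^ =">]+="[^">]*")*>/) {
            printf "%s:%d: %s\n", FILENAME, FNR, $0
            bad = 1
        }
    }
    END { exit bad + 0 }    # non-zero exit status if anything was reported
    ' "$@"
}
```

Because the function returns a non-zero status when it reports something, it can guard the real run, for example check_tiddler_tags wiki.html && ./split.awk wiki.html, where both file names are placeholders.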

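Save the script, make it executable with chmod +x, and call it with the input file name as its argument. As written, it creates its output files in the current directory, so be careful.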

If your input files are organized in a directory tree, you may have to use shell globbing, a loop, or the find utility to build the right command line.
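
One way to combine find with a separate output directory is sketched below. It is an illustration under assumptions rather than a tested recipe: the function name split_tree is made up, the input files are assumed to end in .html, and the script is assumed to be executable.

```shell
# Hypothetical helper: split_tree SCRIPT ROOT OUTDIR
# Runs SCRIPT once for every *.html file below ROOT. Each run happens inside
# its own subdirectory of OUTDIR, so the files it creates stay separated.
split_tree() {
    script=$(cd "$(dirname "$1")" && pwd)/$(basename "$1") || return 1
    mkdir -p "$3" || return 1
    outdir=$(cd "$3" && pwd) || return 1

    find "$2" -type f -name '*.html' -exec sh -c '
        script=$1; outdir=$2; shift 2
        for input do
            # absolute path, because the script is started from another directory
            abs=$(cd "$(dirname "$input")" && pwd)/$(basename "$input")
            dest=$outdir/$(basename "$input" .html)
            mkdir -p "$dest" && (cd "$dest" && "$script" "$abs")
        done
    ' sh "$script" "$outdir" {} +
}
```

A call could look like split_tree ./split.awk wikis out, where all three names are placeholders. Two input files with the same base name in different directories would share one output directory, and OUTDIR is best kept outside ROOT so that find does not pick up freshly written files.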

Answer 2

Note that gensub is a gawk extension to awk, so the first line should really be

#!/usr/bin/gawk -f
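
Where gawk is not available, the attribute extraction can probably be done with the POSIX functions match() and substr() instead of gensub. The following is a minimal sketch of that idea, not a drop-in replacement for the script; the function name attr_value is an assumption, and attribute values are assumed to contain no double quotes.

```shell
# Hypothetical helper: attr_value NAME
# Reads lines on stdin and prints the value of attribute NAME for every
# line that has it, using only POSIX awk features.
attr_value() {
    awk -v attr="$1" '
    {
        pat = " " attr "=\"[^\"]*\""          # for example:  title="[^"]*"
        if (match($0, pat)) {
            # skip the leading space, the name, the "=" and the opening quote
            start = RSTART + length(attr) + 3
            # drop those four characters plus the closing quote from the length
            len = RLENGTH - length(attr) - 4
            print substr($0, start, len)
        }
    }'
}
```

Unlike gensub, which returns the whole line unchanged when the pattern does not match, this prints nothing for a line that lacks the attribute.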

In some versions of TiddlyWiki the opening tags look different, so line 4 of the script has to read:

/<div title=".*" modifier=".*"/
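
If both variants may occur, a single pattern with an alternation should cover them. This is a sketch under the assumption that creator and modifier are the only two attributes that can follow title; the function name opening_tags is made up for the example.

```shell
# Hypothetical filter: print only the lines that look like an opening tag,
# whether the attribute after title is creator or modifier.
opening_tags() {
    awk '/<div title="[^"]*" (creator|modifier)="/'
}
```

The same regular expression could replace the pattern on line 4 of the script.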

I wanted to extract all tiddlers into a single HTML file, so I removed every redirection to the file "name" and added the following header and footer code:

BEGIN { print("<html>") }
END { print("</html>") }
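
Put together, the single-file variant might look roughly like the sketch below. It is a reconstruction from the description, not Peter's actual code: the function name tiddlers_to_html is an assumption, the pattern accepts either creator or modifier, the tags line is left out to keep it short, and match() with substr() stands in for gensub so that plain awk is enough.

```shell
# Hypothetical wrapper: read a TiddlyWiki file on stdin and write all
# tiddlers to stdout, wrapped in a single html element.
tiddlers_to_html() {
    awk '
    BEGIN { print "<html>" }                  # header, printed once

    # opening tag line: note that we are inside a tiddler, print its title
    /<div title="[^"]*" (creator|modifier)="/ {
        indiv = 1
        if (match($0, / title="[^"]*"/))
            print substr($0, RSTART + 8, RLENGTH - 9)
        sub(/.*<div [^>]+>/, "")              # drop the tag, keep the rest
    }

    # closing tag line: print what precedes the tag, then leave the tiddler
    indiv == 1 && /<\/div>/ {
        sub(/<\/div>.*/, "")
        print
        indiv = 0
    }

    # every other line inside a tiddler
    indiv == 1 { print }

    END { print "</html>" }                   # footer, printed once
    '
}
```

It would be called as tiddlers_to_html < wiki.html > all.html, with both file names being placeholders.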

Really useful code that shows off the power of awk! Many thanks, Peter
