How do I extract the table of contents from an fb2 book?

Question 1

Using xmlstarlet:

xmlstarlet select --template \
    --value-of '//_:section/_:title/_:p | //_:subtitle' \
    -nl file.xml

Or, using short options,

xmlstarlet sel -t \
    -v '//_:section/_:title/_:p | //_:subtitle' \
    -n file.xml

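The XPath query used here extracts the value of the p node under each section's title node, together with the values of all subtitle nodes.

The _: prefix before each node name in the expression is an anonymous placeholder for the namespace identifier that the document uses.

Given your example document, the output of either of the two commands above would be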

Part 1
Some name of Part 1
Chapter 1
Some name of Chapter 1
Episode 1
Episode 2
Part 2
Some name of Part 2
Chapter 3
Some name of Chapter 3
Episode 3
Episode 4
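
For readers without xmlstarlet, the same selection can be sketched with Python's standard library. This is not what the commands above do internally, only a rough equivalent. The `SAMPLE` document is a hypothetical reconstruction (the asker's file is not shown here), and `qname` and `toc_entries` are illustrative names. ElementTree's limited XPath has no `|` union, so the sketch walks the tree by hand to keep the entries in document order.

```python
import xml.etree.ElementTree as ET

# The FictionBook 2.0 default namespace; this is what `_:` stands in for.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"

# Hypothetical sample document, assumed for illustration only.
SAMPLE = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <title><p>Book title</p></title>
    <section>
      <title><p>Part 1</p><p>Some name of Part 1</p></title>
      <section>
        <title><p>Chapter 1</p><p>Some name of Chapter 1</p></title>
        <subtitle>Episode 1</subtitle>
        <p>Some text.</p>
        <subtitle>Episode 2</subtitle>
        <p>More text.</p>
      </section>
    </section>
  </body>
</FictionBook>"""


def qname(local):
    """Return the namespace-qualified tag name ElementTree uses internally."""
    return f"{{{FB2_NS}}}{local}"


def toc_entries(elem):
    """Yield text of section/title/p and of every subtitle, in document order."""
    for child in elem:
        if elem.tag == qname("section") and child.tag == qname("title"):
            # Corresponds to //_:section/_:title/_:p
            for p in child.findall(qname("p")):
                yield "".join(p.itertext())
        elif child.tag == qname("subtitle"):
            # Corresponds to //_:subtitle
            yield "".join(child.itertext())
        yield from toc_entries(child)


if __name__ == "__main__":
    print("\n".join(toc_entries(ET.fromstring(SAMPLE))))
```

The book title is skipped because its title element sits directly under body rather than under a section.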

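If you also want the book title, remove the restriction to _:section from the expression (this makes the p node of the book title match as well).

Another way to get the title and subtitles of each section (avoiding the book title), which arguably looks a bit cleaner because it shows that the subtitles are picked from the sections rather than from anywhere in the document, is to first restrict the match to the sections and then get the data from those sections: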

xmlstarlet select --template \
    --match '//_:section' \
    --value-of '_:title/_:p | _:subtitle' \
    -nl file.xml
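
The two-step idea (match the sections first, then select relative to each one) can also be sketched in Python's standard library. This is an approximation under stated assumptions, not a reimplementation of xmlstarlet: `SAMPLE` is a hypothetical document, `section_entries` is an illustrative name, and the `fb` prefix is an arbitrary local alias that plays the role of `_:`. Because ElementTree has no union operator, the sketch lists a section's title paragraphs before its subtitles, which matches document order only when the title comes first in the section.

```python
import xml.etree.ElementTree as ET

# Arbitrary local prefix mapped to the FictionBook 2.0 default namespace.
NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

# Hypothetical sample document, assumed for illustration only.
SAMPLE = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <title><p>Book title</p></title>
    <section>
      <title><p>Part 1</p></title>
      <section>
        <title><p>Chapter 1</p></title>
        <subtitle>Episode 1</subtitle>
        <subtitle>Episode 2</subtitle>
      </section>
    </section>
  </body>
</FictionBook>"""


def section_entries(root):
    """Collect title paragraphs and direct subtitles, section by section."""
    entries = []
    # Step 1, like --match '//_:section': every section, in document order.
    for section in root.iterfind(".//fb:section", NS):
        # Step 2, like --value-of '_:title/_:p | _:subtitle': paths are
        # relative to the current section, so the book title cannot match.
        for p in section.findall("fb:title/fb:p", NS):
            entries.append("".join(p.itertext()))
        for sub in section.findall("fb:subtitle", NS):
            entries.append("".join(sub.itertext()))
    return entries


if __name__ == "__main__":
    print("\n".join(section_entries(ET.fromstring(SAMPLE))))
```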

Question 2

With xidel, an XPath 3-aware FOSS (GPLv3) command-line tool:

XPath 2 sequence construction:

xidel -e '(//section/title/p, //subtitle)'  file.xml

XPath 1:

xidel -e '//section/title/p | //subtitle'  file.xml

Part 1
Some name of Part 1
Chapter 1
Some name of Chapter 1
Episode 1
Episode 2
Part 2
Some name of Part 2
Chapter 3
Some name of Chapter 3
Episode 3
Episode 4

xidel is a Swiss Army knife for querying XML/HTML/JSON. It is smart enough to handle the default namespace on its own.

Answer