I have a very long HTML table that I am converting to XML and splitting into sections.
Abbreviated source:
<html>
<head>
<title>Sample doc</title>
</head>
<body>
<table>
<tr>
<th>Category title</th>
<th>Parameter name</th>
<th>Level</th>
<th>Values</th>
<th>Description</th>
</tr>
<tr>
<td class="category">Category A</td>
<td class="paramname">Parameter 1</td>
<td class="lvl">1</td>
<td class="values">1-100</td>
<td class="description"><p>The quick brown fox jumped over the lazy dogs.</p></td>
</tr>
<tr>
<td class="category">Category A</td>
<td class="paramname">Parameter 2</td>
<td class="lvl">2</td>
<td class="values">2-200</td>
<td class="description"><p>Every good boy does fine.</p>
</td>
</tr>
<tr>
<td class="category">Category B</td>
<td class="paramname">Parameter 3</td>
<td class="lvl">3</td>
<td class="values">3-300</td>
<td class="description"><p>Colorless green ideas sleep furiously.</p></td>
</tr>
<tr>
<td class="category">Category B</td>
<td class="paramname">Parameter 4</td>
<td class="lvl">4</td>
<td class="values">4-400</td>
<td class="description"><p>This has been a test of the emergency broadcast system.</p></td>
</tr>
</table>
</body>
</html>
Desired output:
<xml>
<section>
<title>Category A</title>
<para><emphasis><heading>Parameter Name: Parameter 1</heading></emphasis></para>
<para>Level: 1</para>
<para>Values: 1-100</para>
<para>Description: The quick brown fox jumped over the lazy dogs.</para>
<para><emphasis><heading>Parameter Name: Parameter 2</heading></emphasis></para>
<para>Level: 2</para>
<para>Values: 2-200</para>
<para>Description: Every good boy does fine.</para>
</section>
<section>
<title>Category B</title>
<para><emphasis><heading>Parameter Name: Parameter 3</heading></emphasis></para>
<para>Level: 3</para>
<para>Values: 3-300</para>
<para>Description: Colorless green ideas sleep furiously.</para>
<para><emphasis><heading>Parameter Name: Parameter 4</heading></emphasis></para>
<para>Level: 4</para>
<para>Values: 4-400</para>
<para>Description: This has been a test of the emergency broadcast system.</para>
</section>
</xml>
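The problem is detecting when the category value changes and creating section elements in the output stream to wrap each category.
I have figured out the first part of the problem: detecting a category change, as well as the first and last tr. But XSLT does not allow emitting XML fragments that contain only an opening tag or only a closing tag, so I am not sure how to solve the second part. The following fragment was my first attempt, and it does not work: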
<xsl:for-each select="tr">
<xsl:choose>
<xsl:when test="tr[1]">
<section><title><xsl:value-of select="td[@class='category']" /></title>
<xsl:apply-templates/>
</xsl:when>
<xsl:when test="tr[last()]">
<xsl:apply-templates/></section>
</xsl:when>
<xsl:when test="preceding-sibling::tr[1]/td[@class='category'] !=td[@class='category']">
</section><section><title><xsl:value-of select="td[@class='category']" /></title>
<xsl:apply-templates/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each><!-- tr -->
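It seems I need to buffer the output until I reach a category break, at which point I would wrap the buffer contents in section tags, but I don't know how to do that.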
Answer 1
I am using XSLT 2.0 and found out about for-each-group, which greatly simplifies this.
<xsl:stylesheet
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    exclude-result-prefixes="xs">
  <xsl:template match="/">
    <xml>
      <xsl:for-each select="//table">
        <xsl:for-each-group select="tr" group-by="td[@class='category']" >
          <section><title><xsl:value-of select="td[@class='category']" /></title>
            <xsl:for-each select="current-group()">
              <!-- handling for individual tr omitted -->
              <xsl:apply-templates />
            </xsl:for-each>
          </section>
        </xsl:for-each-group>
      </xsl:for-each>
    </xml>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
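For completeness, here is a minimal sketch of what the omitted per-row handling might look like, based only on the desired output shown in the question. It assumes the input is well-formed XML with no namespace, that each data row has exactly one td per class, and that the tr elements are direct children of table (no tbody). The tr template and the use of normalize-space() are my own assumptions, not part of the stylesheet above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xml>
      <xsl:for-each select="//table">
        <!-- Group only rows that have a category cell, so the header row (th cells) is skipped. -->
        <xsl:for-each-group select="tr[td[@class='category']]"
                            group-by="normalize-space(td[@class='category'])">
          <section>
            <!-- current-grouping-key() is the category string shared by the group. -->
            <title><xsl:value-of select="current-grouping-key()"/></title>
            <!-- current-group() holds every tr in this category, in document order. -->
            <xsl:apply-templates select="current-group()"/>
          </section>
        </xsl:for-each-group>
      </xsl:for-each>
    </xml>
  </xsl:template>

  <!-- Assumed per-row handling: one para per cell, labelled as in the desired output. -->
  <xsl:template match="tr">
    <para><emphasis><heading>Parameter Name: <xsl:value-of select="normalize-space(td[@class='paramname'])"/></heading></emphasis></para>
    <para>Level: <xsl:value-of select="normalize-space(td[@class='lvl'])"/></para>
    <para>Values: <xsl:value-of select="normalize-space(td[@class='values'])"/></para>
    <para>Description: <xsl:value-of select="normalize-space(td[@class='description'])"/></para>
  </xsl:template>

</xsl:stylesheet>
```

Note that group-by collects all rows with the same category into one section even if they are not next to each other in the table. If a section should only span consecutive rows, group-adjacent with the same expression is the attribute to try instead.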