从 HTML 中提取值的 Bash 脚本

从 HTML 中提取值的 Bash 脚本

我正在尝试从 HA7NET 1wire 设备服务器中提取计数器值,但我不太习惯 sed 或 awk 或 bash 脚本,所以我遇到了问题。

该脚本为我提供了一个包含计数器 ID 的数组:

#!/bin/sh
Counters=$(curl -q "http://192.168.70.21/1Wire/Search.html?FamilyCode=1D" 2>/dev/null | sed --silent -e 's/.*<INPUT.*NAME="Address_\(.*\)".*VALUE="\(.*\)".*./\2/p')

# iterating by for to see the array.
for x in $Counters; do echo $x; done;

结果列出了设备,如下所示:

    D90000000C8A9A1D
    C00000000C8C9D1D
    2D0000000EE97D1D

现在我想使用该数组进行其他curl 请求来获取计数器的实际读数?从两个计数器获取读数的 URL(每个设备有 2 个计数器 A、B),它可以扩展为一次读取所有设备,如下所示:

curl -q "http://192.168.70.21/1Wire/ReadCounter.html?Address_Channel_Array={D90000000C8A9A1D,A},{D90000000C8A9A1D,B},{C00000000C8C9D1D,A},{C00000000C8C9D1D,B},2D0000000EE97D1D,A},{2D0000000EE97D1D,B}" 

以及生成的 html 页面:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><!-- InstanceBegin template="/Templates/1WireReply.dwt" codeOutsideHTMLIsLocked="false" -->
<head>
<!-- InstanceBeginEditable name="doctitle" -->
<title>Read Counter Reply</title><!-- InstanceEndEditable -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!-- InstanceBeginEditable name="head" --><!-- InstanceEndEditable -->
<style type="text/css">
<!--
@import url("/eds.css");
-->
</style>
<!-- InstanceParam name="pagePreprocessor" type="text" value="preProcessReadCounter" --><!-- InstanceParam name="functionname" type="text" value="Read Counter" --><!-- InstanceParam name="nextpage" type="text" value="PgReadCounterResult" --><!-- InstanceParam name="enctype" type="text" value="application/x-www-form-urlencoded" --><!-- InstanceParam name="name" type="text" value="Read Counter Result" -->
</head><body>
<table width="100%" border="0" cellspacing="0" cellpadding="0" bgcolor="#EEEEEE">
<tr>
<td class="title" colspan="2"><h1>&nbsp;</h1><h1 class="title">Embedded Data Systems</h1><a class="title" href="http://www.embeddeddatasystems.com">http://www.embeddeddatasystems.com</a></td></tr><tr class="spacer">
<td><H2 class="spacer">Read Counter Reply</h2></td><td><p class="spacer">HA7Net: 1.0.0.22</p></td></tr><tr>
<td colspan="2"><FORM METHOD="POST" ACTION="/Forms/ReadCounterResult_1" name="Read Counter Result"><table name="Exceptions" ID="Exceptions">
<tr>
<td><INPUT CLASS="HA7Value" NAME="Exception_Code_0" ID="Exception_Code_0" TYPE="hidden" VALUE="0" Size="5" disabled></td><td><INPUT CLASS="HA7Value" NAME="Exception_String_0" ID="Exception_String_0" TYPE="hidden" VALUE="None" Size="5" disabled></td></tr></table><!-- InstanceBeginEditable name="WorkArea" -->
<table name="Counter" id="Counter">
<tr><td colspan=1>Address</td><td colspan=1>Count</td><td colspan=1>Status</td></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>
</table><!-- InstanceEndEditable -->
<table name="Statistics" ID="Statistics">

 

我想将每个计数器的值提取到文件或变量中以供进一步使用。

我知道可以使用 owfs,但我希望能够灵活地这样做。

答案1

HTML 不是一种常规语言,因此首先要注意,尝试使用正则表达式解析它是一张通向疯狂的头等奖。也就是说,它看起来像是您正在尝试咀嚼的 HTML应该适合相当简单的提取。您要提取的数据似乎来自这些行:

<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>

所以,我们大概可以sed这样:

$ sed -n '/HA7Value.*Address_/{ s/VALUE="/%%%/;s/^.*%%%//; s/".*//; p; }' input.html
D90000000C8A9A1D
D90000000C8A9A1D
C00000000C8C9D1D
C00000000C8C9D1D
2D0000000EE97D1D
2D0000000EE97D1D

对此进行阐述:

/HA7Value.*Address_/ # Only run on lines that match this expression
{                    # Begin code block
  s/VALUE="/%%%/     # Replace (only) the first 'VALUE="' with a special marker
  s/^.*%%%//         # Delete everything up to that marker
  s/".*//            # Delete from the first '"' to the end of the line
  p                  # Print what's left
}                    # End code block

相关内容