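This is my code for extracting some data under the heading Item Drop%. I want to extract the 90.5% under that heading, but I can only extract the whole column, not just that one value. Any ideas?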
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
my $file = 'data.html';
unless ( -e $file ) {
    my $rc = getstore(
        'proj/Desktop/folder1/data.html',
        $file);
    die "Failed to download document\n" unless $rc == 200;
}
# headers expects an array reference of header strings
my $te = HTML::TableExtract->new( headers => [qw(Item Drop%)] );
$te->parse_file($file);
my ($table) = $te->tables;
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), ");\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}
My data.html is:
..
..
..
<table align = "center" class="" style= .......>
<tr>
<th rowspan="2">EM</th>
<th colspan="2"><a href= "proj/Desktop/folder1/data.html" class = ..../th>
<td> 90.5%</td>
</tr>
..
..
..
..
<tr>
<th rowspan="2">EM</th>
<th colspan="2"><a href= "proj/Desktop/folder1/data.html" class = ..../th>
<td> 40%</td>
</tr>
</table>
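Answer 1
My suggestion is XPath. Most of the time it is the better way to scrape HTML in any language, and it is not limited to tables. Perl's HTML::TreeBuilder::XPath is a must-have and can get your value easily; check: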
#!/usr/bin/env perl
use strict; use warnings;
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("./data.html");
# Take the first table cell whose text contains a percent sign
my @values = $tree->findvalues('//table//td[contains(text(), "%")]');
print "$values[0]\n";
Output
90.5%
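The one-liner above only returns the first matching cell. If it helps, here is a minimal sketch that lists every matching cell so you can pick one by position. The inline HTML is a hypothetical stand-in for data.html (the real file is elided in the question), and the predicate uses contains(., "%") instead of text(), which is my own substitution rather than something from the answer above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for data.html; the real markup is elided in the question.
my $html = <<'HTML';
<table>
  <tr><th>EM</th><th>Drop</th><td> 90.5%</td></tr>
  <tr><th>EM</th><th>Drop</th><td> 40%</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# Collect every <td> inside a table whose text contains a percent sign,
# force each result to a plain string, and trim surrounding whitespace.
my @values = map {
    my $v = "$_";
    $v =~ s/^\s+|\s+$//g;
    $v;
} $tree->findvalues('//table//td[contains(., "%")]');

print "First match: $values[0]\n";
print "All matches: @values\n";

$tree->delete;    # release the parse tree
```

With the sample markup this should give 90.5% as the first match. For the real file you would swap parse_content for parse_file("./data.html") and, if other tables also contain percentages, tighten the XPath to the table you care about.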