Is there a simple Bash tool for quickly rendering basic HTML?

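Sometimes I have a simple task that prints basic HTML to the console. I'd like it minimally rendered so I can read it at a glance. Is there a utility that handles basic HTML rendering in the shell (think Lynx-style rendering, but not an actual browser)?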

For example, I sometimes put a watch on Apache's mod_status page:

watch -n 1 curl http://some-server/server-status

The page's output is HTML with some minimal markup, which shows up in the shell like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html><head>
<title>Apache Status</title>
</head><body>
<h1>Apache Server Status for localhost</h1>

<dl><dt>Server Version: Apache/2.2.22 (Ubuntu) PHP/5.3.10-1ubuntu3.15 with Suhosin-Patch</dt>
<dt>Server Built: Jul 22 2014 14:35:25
</dt></dl><hr /><dl>
<dt>Current Time: Wednesday, 19-Nov-2014 15:21:40 UTC</dt>
<dt>Restart Time: Wednesday, 19-Nov-2014 15:13:02 UTC</dt>
<dt>Parent Server Generation: 1</dt>
<dt>Server uptime:  8 minutes 38 seconds</dt>
<dt>Total accesses: 549 - Total Traffic: 2.8 MB</dt>
<dt>CPU Usage: u35.77 s12.76 cu0 cs0 - 9.37% CPU load</dt>
<dt>1.06 requests/sec - 5.6 kB/second - 5.3 kB/request</dt>
<dt>1 requests currently being processed, 9 idle workers</dt>
</dl><pre>__W._______.....................................................
................................................................
................................................................
................................................................
</pre>
<p>Scoreboard Key:<br />
"<b><code>_</code></b>" Waiting for Connection,
"<b><code>S</code></b>" Starting up,
"<b><code>R</code></b>" Reading Request,<br />
"<b><code>W</code></b>" Sending Reply,
"<b><code>K</code></b>" Keepalive (read),
"<b><code>D</code></b>" DNS Lookup,<br />
"<b><code>C</code></b>" Closing connection,
"<b><code>L</code></b>" Logging,
"<b><code>G</code></b>" Gracefully finishing,<br />
"<b><code>I</code></b>" Idle cleanup of worker,
"<b><code>.</code></b>" Open slot with no current process</p>
<p />

When viewed in Lynx, the same HTML renders as:

                                                   Apache Status (p1 of 2)
                               Apache Server Status for localhost

   Server Version: Apache/2.2.22 (Ubuntu) PHP/5.3.10-1ubuntu3.15 with Suhosin-Patch
   Server Built: Jul 22 2014 14:35:25
     ________________________________________________________________________________________________________

   Current Time: Wednesday, 19-Nov-2014 15:23:50 UTC
   Restart Time: Wednesday, 19-Nov-2014 15:13:02 UTC
   Parent Server Generation: 1
   Server uptime: 10 minutes 48 seconds
   Total accesses: 606 - Total Traffic: 3.1 MB
   CPU Usage: u37.48 s13.6 cu0 cs0 - 7.88% CPU load
   .935 requests/sec - 5088 B/second - 5.3 kB/request
   2 requests currently being processed, 9 idle workers

_C_______W_.....................................................
................................................................
................................................................
................................................................

   Scoreboard Key:
   "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
   "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
   "C" Closing connection, "L" Logging, "G" Gracefully finishing,
   "I" Idle cleanup of worker, "." Open slot with no current process

Answer 1

lynx has a "dump" mode, which you can combine with watch:

$ watch lynx https://www.google.com -dump

man lynx:

   -dump  dumps  the  formatted  output  of  the default document or those
          specified on  the  command  line  to  standard  output.   Unlike
          interactive mode, all documents are processed.  This can be used
          in the following way:

          lynx -dump http://www.subir.com/lynx.html

          Files specified on the command line are  formatted  as  HTML  if
          their  names  end  with one of the standard web suffixes such as
          “.htm” or “.html”.  Use the -force_html option to  format  files
          whose names do not follow this convention.

This Ask Ubuntu question has more options.
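A minimal sketch, assuming lynx is installed, of using it at the end of a pipe instead of pointing it at a URL. The helper name html_to_text_lynx is hypothetical; -stdin, -force_html, -dump and -nolist are lynx options (the man page excerpt above covers -dump and -force_html).

```shell
#!/usr/bin/env bash
# html_to_text_lynx: hypothetical helper name, not something lynx provides.
# Reads HTML on stdin and prints lynx's formatted text on stdout.
#   -stdin       read the document from standard input
#   -force_html  interpret that document as HTML
#   -dump        print the formatted output and exit (no interactive browser)
#   -nolist      omit the numbered list of links lynx appends to dumps
html_to_text_lynx() {
    lynx -stdin -force_html -dump -nolist
}
```

For example, curl -s http://some-server/server-status | html_to_text_lynx. Note that watch runs its command through sh -c, so a Bash function is not visible inside it; for the mod_status case pass the full command instead, e.g. watch -n 1 'lynx -dump -nolist http://some-server/server-status'.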

Answer 2

w3m is another program with a -dump option.

It is also the most popular web browser backend for Emacs.
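A sketch of the same idea with w3m, assuming it is installed. When w3m reads from standard input it may treat the data as plain text, so the sketch passes -T text/html explicitly. The helper name html_to_text_w3m and the 80-column default are illustrative assumptions.

```shell
#!/usr/bin/env bash
# html_to_text_w3m: hypothetical helper name.
# Reads HTML on stdin and prints w3m's rendering on stdout.
#   -dump         print the rendered page and exit
#   -T text/html  treat stdin as HTML rather than plain text
#   -cols N       wrap lines at N columns (first argument, default 80)
html_to_text_w3m() {
    w3m -dump -T text/html -cols "${1:-80}"
}
```

For the status page, a full command that works inside watch is watch -n 1 'curl -s http://some-server/server-status | w3m -dump -T text/html'.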

Answer 3

There are at least two different programs named html2text (1) (2) that do this job.
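A sketch assuming curl and one of the html2text programs are on the PATH. As far as I know, the common versions read HTML from standard input when given no file argument, but their output styles differ (the Python html2text produces Markdown-like text), so the exact formatting depends on which one you have. The helper name show_status_h2t is hypothetical, and its default URL is the placeholder from the question.

```shell
#!/usr/bin/env bash
# show_status_h2t: hypothetical helper name.
# Fetches a page quietly (-s hides curl's progress meter) and pipes the
# HTML into html2text, which reads stdin when no file is given.
# First argument: URL to fetch (defaults to the question's placeholder).
show_status_h2t() {
    curl -s "${1:-http://some-server/server-status}" | html2text
}
```

Inside watch, spell the pipeline out: watch -n 1 'curl -s http://some-server/server-status | html2text'.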

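Answer 4

For a different approach, Pandoc can convert between many formats, including from HTML to plain text, and you can give it a URL directly to convert the HTML into another format: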

pandoc --to plain https://example.net

Or, if you want to keep some formatting, you can use Markdown output:

pandoc --to markdown https://example.net
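When pandoc reads from standard input it has no file extension to guess from and assumes Markdown input, so a pipe needs -f html. A minimal sketch, assuming pandoc is installed; the helper name html_to_text_pandoc is hypothetical.

```shell
#!/usr/bin/env bash
# html_to_text_pandoc: hypothetical helper name.
# Reads HTML on stdin and converts it with pandoc.
#   -f html    parse the input as HTML (on stdin pandoc would assume Markdown)
#   -t FORMAT  output format: first argument, default "plain";
#              pass "markdown" to keep some formatting
html_to_text_pandoc() {
    pandoc -f html -t "${1:-plain}"
}
```

For the question's use case: watch -n 1 'curl -s http://some-server/server-status | pandoc -f html -t plain'.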