Varnish stops listening on port 80 after a few hours


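I have configured Varnish to listen on port 80 and nginx to listen on port 8080. After running normally for about 24 hours, my site has now been down for 22 hours. When I checked, I found that Varnish was no longer listening on port 80.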


abc@abc:~$ sudo netstat -anp --tcp --udp | grep LISTEN
tcp        0      0    *               LISTEN      571/varnishd
tcp        0      0*               LISTEN      376/nginx
tcp        0      0  *               LISTEN      376/nginx        
tcp        0      0 publicip:6082 *               LISTEN      569/varnishd
tcp6       0      0 :::80                   :::*                    LISTEN      376/nginx         
tcp6       0      0 ::1:6082                :::*                    LISTEN      569/varnishd


abc@abc:~$ sudo netstat -anp --tcp --udp | grep LISTEN
tcp        0      0*               LISTEN      376/nginx
tcp        0      0  *               LISTEN      376/nginx
tcp        0      0 publicip:6082 *               LISTEN      745/varnishd
tcp6       0      0 :::80                   :::*                    LISTEN      376/nginx
tcp6       0      0 ::1:6082                :::*                    LISTEN      745/varnishd

Here is my /etc/default/varnish:

## Alternative 2, Configuration with VCL
# Listen on port 6081, administration on localhost:6082, and forward to
# one content server selected by the vcl file, based on the request.  Use a 1GB
# fixed-size cache file.
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,96m"

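Is there a specific reason why Varnish is not listening on port 80 in the second case? I could probably just check whether Varnish is running and restart it if it isn't, but that would still mean a few minutes of downtime.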
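The check-and-restart idea mentioned above could be sketched roughly as follows, for example run from cron every minute. This is only a stopgap and does not address why the child dies. The function name varnish_on_port_80 and the "check" argument are made up for illustration, and the sketch assumes the usual `netstat -lntp` column layout (local address in column 4, state in column 6, PID/program in column 7).

```shell
#!/bin/sh
# Hypothetical helper: reads netstat output on stdin and returns 0 only if
# a varnishd process is in LISTEN state on a local address ending in :80.
varnish_on_port_80() {
    awk '$6 == "LISTEN" && $4 ~ /:80$/ && $7 ~ /\/varnishd$/ { found = 1 }
         END { exit(found ? 0 : 1) }'
}

# Only act when called with "check", so the function can be tried on its own.
if [ "${1:-}" = "check" ]; then
    # -p needs root to show program names; run this from root's crontab.
    if ! netstat -lntp 2>/dev/null | varnish_on_port_80; then
        logger -t varnish-watchdog "varnishd not listening on :80, restarting"
        service varnish restart
    fi
fi
```

Matching on the program name matters here because nginx also shows up on :::80 in the netstat output above, so a plain "is anything on port 80" test would pass even when Varnish is down.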

My varnish.vcl file: http://pastebin.com/UH2c8KdH. I am on Ubuntu 12.04 x86.

It happened again about 2 hours later, and this is what I found in the syslog.

Feb 14 18:16:00 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:16:51 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:17:49 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:18:06 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:19:33 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:21:25 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:22:34 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:28:28 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:29:41 abc varnishd[745]: Child (749) not responding to CLI, killing it.
Feb 14 18:29:48 abc last message repeated 2 times
Feb 14 18:29:48 abc varnishd[745]: Child (749) died signal=3
Feb 14 18:29:49 abc varnishd[745]: Child cleanup complete
Feb 14 18:29:55 abc varnishd[745]: child (1380) Started
Feb 14 18:29:58 abc varnishd[745]: Pushing vcls failed: CLI communication error (hdr)
Feb 14 18:29:58 abc varnishd[745]: Stopping Child
Feb 14 18:29:58 abc varnishd[745]: Child (1380) said Child starts
Feb 14 18:29:59 abc varnishd[745]: Child (1380) said Child dies
Feb 14 18:30:02 abc varnishd[745]: Child (1380) died status=1
Feb 14 18:30:04 abc varnishd[745]: Child cleanup complete

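I'm not sure why the process IDs differ from the ones I posted earlier. Maybe I restarted it while troubleshooting. I really can't tell much from these logs. Any help is much appreciated.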




Feb 13 17:40:44 dragon75 varnishd[581]: Child (583) died signal=3
Feb 13 17:41:09 dragon75 varnishd[581]: child (2682) Started
Feb 13 17:42:31 dragon75 varnishd[581]: Child (2682) said Child starts
Feb 13 17:51:48 dragon75 varnishd[581]: Child (2682) died status=1
Feb 13 17:51:48 dragon75 varnishd[581]: Child (-1) said Child dies


Feb 14 18:29:48 dragon75 varnishd[745]: Child (749) died signal=3
Feb 14 18:29:55 dragon75 varnishd[745]: child (1380) Started
Feb 14 18:29:58 dragon75 varnishd[745]: Child (1380) said Child starts
Feb 14 18:29:59 dragon75 varnishd[745]: Child (1380) said Child dies
Feb 14 18:30:02 dragon75 varnishd[745]: Child (1380) died status=1

According to /var/log/messages, Varnish started at 16:31, followed by 5 MARK messages, and then the Varnish "child died" message at 18:29. There is nothing in between.
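To line up the child's lifecycle with the timeline described above, the relevant varnishd lines can be filtered out of the log. A minimal sketch, assuming the messages have the same wording as the excerpts in this post; the function name varnish_child_events is hypothetical:

```shell
# Hypothetical filter: reads syslog lines on stdin and keeps only varnishd
# messages about CLI timeouts, child start/death, and failed VCL pushes.
varnish_child_events() {
    grep 'varnishd\[' |
        grep -E 'not responding to CLI|[Cc]hild \([-0-9]+\) (Started|died)|Pushing vcls failed'
}
```

It would be used as `varnish_child_events < /var/log/syslog`. A long run of "not responding to CLI" lines before each "died signal=3" would suggest the parent is killing a child that is slow to answer rather than one that crashed by itself.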

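I don't think resources are the bottleneck. This is a new site that is still in testing, and I haven't really put anything on it yet. Apart from my uptime script on another server (which only checks the home page), there is no traffic. This is my first time using Varnish.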



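This setting controls how long the monitoring parent process waits for the child process to respond to health checks. If the operating system is busy paging data to or from disk, the default of 10 seconds may be too low. Increase it to 1 minute (the default as of 4.0) and see whether the problem goes away.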
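The parameter name is not spelled out above. Assuming it is cli_timeout (which fits the "not responding to CLI" log lines and the 10-second default), raising it in the /etc/default/varnish from the question might look like the sketch below. Only the final -p line is new; everything else is copied from the question.

```shell
# /etc/default/varnish (sketch)
# Assumption: the setting meant above is cli_timeout, given in seconds.
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,96m \
             -p cli_timeout=60"
```

After `sudo service varnish restart`, running `sudo varnishadm -T localhost:6082 -S /etc/varnish/secret param.show cli_timeout` should show whether the new value took effect.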

