Graphite metrics over AMQP throw an error and take a long time to reconnect

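I'm trying to pull some metrics into Graphite through a RabbitMQ exchange. My publisher is happily publishing data to an exchange named metrics, and I have configured carbon.conf with the following: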

ENABLE_AMQP = True
AMQP_HOST = hostname
AMQP_PORT = 5672
AMQP_VHOST = /vhost
AMQP_USER = user
AMQP_PASSWORD = password
AMQP_EXCHANGE = metrics
AMQP_METRIC_NAME_IN_BODY = True
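
For readers reproducing this setup, here is a minimal sketch of what a publisher might put in the message body when AMQP_METRIC_NAME_IN_BODY = True. It assumes Carbon's AMQP listener expects one whitespace-separated "metric value timestamp" triple per line; the metric names and the helper functions are hypothetical, and publishing to the exchange itself is not shown.

```python
import time


def format_metric_line(metric, value, timestamp=None):
    """Build one 'metric value timestamp' line (assumed body format)."""
    if timestamp is None:
        timestamp = int(time.time())
    # The line is split on whitespace by the consumer, so the metric
    # name itself must not contain any.
    if any(ch.isspace() for ch in metric):
        raise ValueError("metric name must not contain whitespace")
    return "%s %s %d" % (metric, value, timestamp)


def format_body(datapoints):
    """Join several (metric, value, timestamp) tuples into one message body."""
    return "\n".join(format_metric_line(*dp) for dp in datapoints)


# Hypothetical metric names, for illustration only.
body = format_body([
    ("servers.web01.load", 0.42, 1367507594),
    ("servers.web01.users", 17, 1367507594),
])
print(body)
```

The resulting string would be sent as the body of a message published to the metrics exchange.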

The RabbitMQ installation is a two-node cluster behind HAProxy.

When this works, it works well. However, Carbon frequently runs into the following problem:

02/05/2013 15:13:14 :: [console] Unhandled error in Deferred:
02/05/2013 15:13:14 :: [console] Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 421, in errback
    self._startRunCallbacks(fail)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 488, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1126, in gotResult
    _inlineCallbacks(r, g, deferred)
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1068, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/graphite/lib/carbon/amqp_listener.py", line 70, in connectionMade
    yield self.receive_loop()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1068, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/opt/graphite/lib/carbon/amqp_listener.py", line 102, in receive_loop
    msg = yield queue.get()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/txamqp/queue.py", line 32, in _raiseIfClosed
    raise Closed()
txamqp.queue.Closed:

02/05/2013 15:13:14 :: [console] <twisted.internet.tcp.Connector instance at 0x2219f80> will retry in 1976 seconds
02/05/2013 15:13:14 :: [console] Stopping factory <carbon.amqp_listener.AMQPReconnectingFactory instance at 0x2214ab8>

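Somehow the connection gets dropped. Worse, it then waits about half an hour (1976 seconds in the log above) before reconnecting!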

How can I:

  1. Find out why the connection is being dropped?
  2. Drastically reduce the time it takes to reconnect?

Software:

txAMQP==0.6.2
graphite 0.9.11
RabbitMQ 3.1.0
Haproxy 1.4.18

Answer 1

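We ran into the same problem today. I'm not sure about #1, but I think the cause of #2 is that the reconnect delay in amqp_listener.py is never reset; it should be reset in buildProtocol, before the protocol is built. I submitted a pull request here: https://github.com/graphite-project/carbon/pull/102. Hope this helps.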

Before the change (exceptions omitted):

console.log.2013_5_2:02/05/2013 17:11:14 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 17:11:16 :: will retry in 5 seconds
console.log.2013_5_2:02/05/2013 17:41:18 :: will retry in 12 seconds
console.log.2013_5_2:02/05/2013 18:11:22 :: will retry in 28 seconds
console.log.2013_5_2:02/05/2013 18:41:26 :: will retry in 77 seconds
console.log.2013_5_2:02/05/2013 19:11:32 :: will retry in 178 seconds
console.log.2013_5_2:02/05/2013 19:41:39 :: will retry in 455 seconds
console.log.2013_5_2:02/05/2013 20:11:48 :: will retry in 967 seconds
console.log.2013_5_2:02/05/2013 20:42:01 :: will retry in 1831 seconds
console.log.2013_5_2:02/05/2013 21:22:13 :: will retry in 3375 seconds

After the change (exceptions omitted):

console.log.2013_5_2:02/05/2013 21:42:21 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 21:42:24 :: will retry in 9 seconds
console.log.2013_5_2:02/05/2013 22:12:18 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 22:12:21 :: will retry in 9 seconds
console.log.2013_5_2:02/05/2013 22:42:32 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 22:42:35 :: will retry in 7 seconds
console.log.2013_5_2:02/05/2013 23:12:29 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 23:12:32 :: will retry in 5 seconds
console.log.2013_5_2:02/05/2013 23:42:38 :: will retry in 2 seconds
console.log.2013_5_2:02/05/2013 23:42:41 :: will retry in 6 seconds
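
To illustrate why resetting the delay matters, here is a small stand-alone model of an exponential reconnect backoff. It is only a sketch and not the Carbon or Twisted code: the constants are assumptions modeled on the defaults of Twisted's ReconnectingClientFactory (initial delay 1.0, growth factor e, maximum 3600 seconds), jitter is left out so the numbers are deterministic, and each drop is treated as a single retry. The class and function names are hypothetical. In Twisted itself, the reset corresponds to calling resetDelay() on the factory.

```python
import math

# Assumed defaults, modeled on Twisted's ReconnectingClientFactory.
INITIAL_DELAY = 1.0
FACTOR = math.e
MAX_DELAY = 3600.0


class BackoffModel:
    """Toy model of an exponential reconnect backoff (jitter omitted)."""

    def __init__(self, reset_on_connect):
        self.reset_on_connect = reset_on_connect
        self.delay = INITIAL_DELAY

    def connected(self):
        # Stands in for the point where the factory builds a protocol for
        # a fresh connection; the proposed fix resets the delay here.
        if self.reset_on_connect:
            self.delay = INITIAL_DELAY

    def connection_lost(self):
        # Every loss multiplies the delay, capped at MAX_DELAY.
        self.delay = min(self.delay * FACTOR, MAX_DELAY)
        return self.delay


def simulate(reset_on_connect, drops):
    """Return the rounded retry delay after each of `drops` disconnects."""
    model = BackoffModel(reset_on_connect)
    delays = []
    for _ in range(drops):
        model.connected()
        delays.append(round(model.connection_lost()))
    return delays


without_reset = simulate(reset_on_connect=False, drops=9)
with_reset = simulate(reset_on_connect=True, drops=9)
print("without reset:", without_reset)
print("with reset:   ", with_reset)
```

Without the reset, the delay keeps compounding across otherwise successful connections until it hits the one-hour cap, which matches the shape of the "before" log. With the reset, every drop starts again from the initial delay, as in the "after" log.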
