mongo client fails about 25% of the time with "message ... is too large"

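I am trying to use haproxy in tcp mode in front of a mongo server. On the haproxy machine I have a mongo client that I use for testing.

When I connect from the haproxy machine directly to the mongo server, it works 100% of the time.

When I connect from the haproxy machine to the mongo server through haproxy, about 25% of the time the client fails to negotiate a proper mongo connection. The mongo client reports "recv(): message len 1347703880 is too large. Max is 48000000".

This does not seem to be a problem with the mongo client or the server, because direct connections work 100% of the time.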
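To put a number on the "about 25%" figure, a repeated connection test can be scripted. The following is a minimal sketch, not the method used to get the figure above: the helper names `count_failures` and `mongo_attempt` are hypothetical, and it assumes the mongo shell exits with a non-zero status when the connection fails.

```python
import subprocess


def count_failures(attempt, runs):
    """Call attempt() `runs` times and return how many calls returned False."""
    return sum(1 for _ in range(runs) if not attempt())


def mongo_attempt(target="10.5.198.10:27010"):
    """Return True if the mongo shell connects to `target` and exits cleanly.

    Assumption: the shell exits non-zero when the connection fails.
    """
    result = subprocess.run(
        ["mongo", target, "--eval", "quit()"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Example (not executed here):
#   failures = count_failures(mongo_attempt, 100)
#   print("failed", failures, "of 100 attempts")
```

Running the same loop against 10.5.20.20:27010 and against 10.5.198.10:27010 would give comparable counts for the direct and proxied paths.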

Servers in the scenario:

     10.5.198.10     haproxy and mongo client for testing
     10.5.20.20       mongo server running on port 27010

Version info / haproxy machine & mongo client

    OS: Debian Jessie
    SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux

    bluebrick@ip-10-5-198-10:~$ mongo --version
    MongoDB shell version: 2.4.10

    root@ip-10-5-198-10:~/tests/pmongo# haproxy -vv
    HA-Proxy version 1.6.3 2015/12/25
    Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>
    Build options :
      TARGET  = linux2628
      CPU     = generic
      CC      = gcc
      CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
      OPTIONS = 
    Default settings :
      maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
    Encrypted password support via crypt(3): yes
    Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
    Compression algorithms supported : identity("identity")
    Built without OpenSSL support (USE_OPENSSL not set)
    Built without PCRE support (using libc's regex instead)
    Built without Lua support
    Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
    Available polling systems :
          epoll : pref=300,  test result OK
           poll : pref=200,  test result OK
         select : pref=150,  test result OK
    Total: 3 (3 usable), will use epoll.

Version info / mongo server

    Server OS: Ubuntu trusty
    14.04.1-Ubuntu SMP Tue Sep 1 09:32:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

    [email protected]:/home/bluebrick# mongod --version
    db version v3.2.1
    git version: a14d55980c2cdc565d4704a7e3ad37e4e535c1b2
    OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
    allocator: tcmalloc
    modules: none
    build environment:
        distmod: ubuntu1404
        distarch: x86_64
        target_arch: x86_64

haproxy config file

    root@ip-10-5-198-10:~/tests/pmongo# cat conf.10
    ######################################################################
    global
    ######################################################################
            maxconn 2048
            log /dev/log    local0 
            log /dev/log  local1 debug
            chroot /var/lib/haproxy
            user haproxy
            group haproxy
            debug
    ######################################################################
    defaults
    ######################################################################
            log     global
            mode    tcp
            option tcplog
            timeout connect 5000
            timeout client  50000
            timeout server  50000
    ######################################################################
    # frontend
    ######################################################################
            frontend   fe_20_20_mongo_27010_tcp
            bind 10.5.198.10:27010
            mode tcp
            option tcplog
            use_backend    be_20_20_mongo_27010_tcp
    ######################################################################
    # backend
    ######################################################################
            backend   be_20_20_mongo_27010_tcp
            mode tcp
            option tcplog
            option             tcpka
            server node1 10.5.20.20:27010 
    ##################################################
    ##################################################

When I connect to mongo bypassing haproxy, it looks like this:

    bluebrick@ip-10-5-198-10:~$ mongo 10.5.20.20:27010 -verbose
    MongoDB shell version: 2.4.10
    Sat Feb 27 13:12:46.776 versionArrayTest passed
    connecting to: 10.5.20.20:27010/test
    Sat Feb 27 13:12:46.798 creating new connection to:10.5.20.20:27010
    Sat Feb 27 13:12:46.799 BackgroundJob starting: ConnectBG
    Sat Feb 27 13:12:46.803 connected connection!
    Server has startup warnings: 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    rs0:SECONDARY> quit()
    bluebrick@ip-10-5-198-10:~$ 

When I run the haproxy server, it looks like this:

    root@ip-10-5-198-10:~/tests/pmongo# haproxy -d -f conf.10
    Available polling systems :
          epoll : pref=300,  test result OK
           poll : pref=200,  test result OK
         select : pref=150,  test result FAILED
    Total: 3 (2 usable), will use epoll.
    Using epoll() as the polling mechanism.
    00000000:fe_20_20_mongo_27010_tcp.accept(0004)=0006 from [10.5.198.10:43177]
    00000000:be_20_20_mongo_27010_tcp.srvcls[0006:0007]
    00000000:be_20_20_mongo_27010_tcp.clicls[0006:0007]
    00000000:be_20_20_mongo_27010_tcp.closed[0006:0007]
    00000001:fe_20_20_mongo_27010_tcp.accept(0004)=0006 from [10.5.198.10:43206]
    00000001:be_20_20_mongo_27010_tcp.srvcls[0006:0007]
    00000001:be_20_20_mongo_27010_tcp.clicls[0006:0007]
    00000001:be_20_20_mongo_27010_tcp.closed[0006:0007]

When I connect to mongo through haproxy and it works, it looks like this:

    bluebrick@ip-10-5-198-10:~$ mongo 10.5.198.10:27010 -verbose
    MongoDB shell version: 2.4.10
    Sat Feb 27 13:04:00.655 versionArrayTest passed
    connecting to: 10.5.198.10:27010/test
    Sat Feb 27 13:04:00.678 creating new connection to:10.5.198.10:27010
    Sat Feb 27 13:04:00.678 BackgroundJob starting: ConnectBG
    Sat Feb 27 13:04:00.678 connected connection!
    Server has startup warnings: 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
    2016-02-27T12:48:57.313-0500 I CONTROL  [initandlisten] 
    rs0:SECONDARY> quit()

When I connect to mongo through haproxy and it fails, it looks like this:

    bluebrick@ip-10-5-198-10:~$ mongo 10.5.198.10:27010 -verbose
    MongoDB shell version: 2.4.10
    Sat Feb 27 13:04:03.900 versionArrayTest passed
    connecting to: 10.5.198.10:27010/test
    Sat Feb 27 13:04:03.922 creating new connection to:10.5.198.10:27010
    Sat Feb 27 13:04:03.922 BackgroundJob starting: ConnectBG
    Sat Feb 27 13:04:03.922 connected connection!
    Sat Feb 27 13:04:03.923 recv(): message len 1347703880 is too large. Max is 48000000
    Sat Feb 27 13:04:03.923 DBClientCursor::init call() failed
    Sat Feb 27 13:04:03.923 User Assertion: 10276:DBClientBase::findN: transport error: 10.5.198.10:27010 ns: admin.$cmd query: { whatsmyuri: 1 }
    Sat Feb 27 13:04:03.923 Error: DBClientBase::findN: transport error: 10.5.198.10:27010 ns: admin.$cmd query: { whatsmyuri: 1 } at src/mongo/shell/mongo.js:147
    Sat Feb 27 13:04:03.923 User Assertion: 12513:connect failed
    Sat Feb 27 13:04:03.923 freeing 1 uncollected N5mongo20DBClientWithCommandsE objects
    exception: connect failed
    bluebrick@ip-10-5-198-10:~$ 
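One way to inspect the odd length in the recv() error, offered as a sketch rather than a diagnosis: MongoDB wire-protocol messages start with a 32-bit little-endian length field, so packing the reported number back into four bytes shows what the client actually read first. The helper name `length_as_bytes` is hypothetical.

```python
import struct


def length_as_bytes(message_len):
    """Pack a 32-bit length as it would appear on the wire (little-endian)."""
    return struct.pack("<i", message_len)


reported = 1347703880  # value taken from the recv() error message
raw = length_as_bytes(reported)
print(raw)  # b'HTTP'
```

The four bytes spell HTTP, which suggests that on the failing attempts the client received an HTTP-style response instead of a MongoDB reply. What produced that response is not visible in the output shown here.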

Looking at the mongo server logs, a good connection looks like this:

    2016-02-27T12:53:14.944-0500 D STORAGE  [WTJournalFlusher] flushed journal
    2016-02-27T12:53:14.966-0500 I NETWORK  [initandlisten] connection accepted from 10.5.198.10:36447 #30 (9 connections now open)
    2016-02-27T12:53:14.966-0500 D COMMAND  [conn30] run command admin.$cmd { whatsmyuri: 1 }
    2016-02-27T12:53:14.966-0500 I COMMAND  [conn30] command admin.$cmd command: whatsmyuri { whatsmyuri: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:66 locks:{} protocol:op_query 0ms
    2016-02-27T12:53:14.968-0500 D COMMAND  [conn30] run command admin.$cmd { getLog: "startupWarnings" }
    2016-02-27T12:53:14.968-0500 D COMMAND  [conn30] command: getLog
    2016-02-27T12:53:14.968-0500 I COMMAND  [conn30] command admin.$cmd command: getLog { getLog: "startupWarnings" } keyUpdates:0 writeConflicts:0 numYields:0 reslen:949 locks:{} protocol:op_query 0ms
    2016-02-27T12:53:14.981-0500 D COMMAND  [conn30] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
    2016-02-27T12:53:14.981-0500 D COMMAND  [conn30] command: replSetGetStatus
    2016-02-27T12:53:14.981-0500 I COMMAND  [conn30] command admin.$cmd command: replSetGetStatus { replSetGetStatus: 1.0, forShell: 1.0 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:964 locks:{} protocol:op_query 0ms

Looking at the mongo server logs, a failed connection:

    There is nothing put in the mongo server logs in this scenario.
