Root cause

Thanks for your expert help with this issue.

Two WebLogic managed server nodes running on the same machine went down one day. When we tried to restart them with startManagedWebLogic.sh, WebLogic hung at this statement:

<Nov 17, 2015 12:11:29 AM UTC> <Info> <WorkManager> <BEA-002900> <Initializing self-tuning thread pool>

The same behavior was observed on both nodes.

Running kill -3 PID produced the thread dump below. After running it 3 or 4 times, the following method appeared in the thread dump every time:

weblogic/diagnostics/flightrecorder/FlightRecorderManager.isRecordingPossible(FlightRecorderManager.java:181)
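The repeated-dump check described above can be scripted. This is a minimal sketch, assuming a bash shell on the Linux host and that the JVM prints each SIGQUIT thread dump to its standard output, which is redirected to a log file (for example the managed server's .out file; the path you pass is your own). The function names take_dumps and count_dumps_with_frame are hypothetical helpers, not part of WebLogic or JRockit.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): take several thread dumps with kill -3 and
# count how many dumps in the server's stdout log contain a given frame.
# Assumption: the JVM's stdout (where SIGQUIT dumps go) is redirected to a log.

# Send SIGQUIT to the JVM `count` times, waiting `interval` seconds after each.
take_dumps() {
  local pid=$1 count=${2:-3} interval=${3:-10}
  local i
  for ((i = 1; i <= count; i++)); do
    kill -3 "$pid" || return 1   # SIGQUIT asks the JVM for a thread dump
    sleep "$interval"
  done
}

# Print the number of dumps (between the JRockit FULL/END THREAD DUMP markers)
# in which the fixed string `frame` appears at least once.
count_dumps_with_frame() {
  local log_file=$1 frame=$2
  awk -v frame="$frame" '
    /===== FULL THREAD DUMP/   { in_dump = 1; seen = 0; next }
    /===== END OF THREAD DUMP/ { if (in_dump && seen) hits++; in_dump = 0; next }
    in_dump && index($0, frame) { seen = 1 }
    END { print hits + 0 }
  ' "$log_file"
}
```

For example, take_dumps with the server PID, 4 dumps and a 10-second interval, followed by count_dumps_with_frame on the server log with FlightRecorderManager.isRecordingPossible, shows how many dumps contain that frame; if every dump taken contains it, the main thread is most likely not making progress.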

Any pointers on resolving this issue and getting the WebLogic nodes started would be much appreciated.

The full thread dump is included below for reference.

===== FULL THREAD DUMP ===============
Mon Nov 16 19:19:12 2015
Oracle JRockit(R) R28.2.7-7-155314-1.6.0_45-20130329-0641-linux-x86_64

"Main Thread" id=1 idx=0x4 tid=26997 prio=5 alive, native_blocked
    at java/lang/System.currentTimeMillis()J(Native Method)
    at java/io/ExpiringCache.put(ExpiringCache.java:74)[inlined]
    at java/io/UnixFileSystem.canonicalize(UnixFileSystem.java:158)[optimized]
    ^-- Holding lock: java/io/ExpiringCache@0x1416ba3a8[biased lock]
    at java/io/File.getCanonicalPath(File.java:559)[inlined]
    at java/io/File.getCanonicalFile(File.java:583)[inlined]
    at oracle/jrockit/jfr/Repository$1.run(Repository.java:71)[inlined]
    at oracle/jrockit/jfr/Repository$1.run(Repository.java:68)[optimized]
    at jrockit/vm/AccessController.doPrivileged(AccessController.java:232)
    at jrockit/vm/AccessController.doPrivileged(AccessController.java:240)
    at oracle/jrockit/jfr/Repository.tryToUseAsRepository(Repository.java:68)
    at oracle/jrockit/jfr/Repository.createUniqueRepository(Repository.java:48)
    at oracle/jrockit/jfr/Repository.<init>(Repository.java:26)
    at oracle/jrockit/jfr/JFRImpl.<init>(JFRImpl.java:102)
    at oracle/jrockit/jfr/VMJFR.<init>(VMJFR.java:69)
    at oracle/jrockit/jfr/VMJFR.create(VMJFR.java:572)
    at oracle/jrockit/jfr/JFR.get(JFR.java:59)
    ^-- Holding lock: java/lang/Class@0x141e80420[biased lock]
    at com/oracle/jrockit/jfr/FlightRecorder.isNativeImplementation(FlightRecorder.java:25)
    at weblogic/diagnostics/flightrecorder/FlightRecorderManager.isRecordingPossible(FlightRecorderManager.java:181)
    at weblogic/diagnostics/instrumentation/gathering/DataGatheringManager.initialize(DataGatheringManager.java:319)
    ^-- Holding lock: java/lang/Class@0x141637e10[biased lock]
    at weblogic/diagnostics/image/ImageManager.<init>(ImageManager.java:115)
    at weblogic/diagnostics/image/ImageManager.<clinit>(ImageManager.java:57)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    at jrockit/vm/RNI.initializeClass(J)V(Native Method)
    at weblogic/work/ServerWorkManagerFactory.initializeHere(ServerWorkManagerFactory.java:121)
    at weblogic/work/ServerWorkManagerFactory.initialize(ServerWorkManagerFactory.java:59)
    ^-- Holding lock: java/lang/Class@0x1415e2140[biased lock]
    at weblogic/t3/srvr/BootService.start(BootService.java:61)
    at weblogic/t3/srvr/ServerServicesManager.startService(ServerServicesManager.java:461)
    at weblogic/t3/srvr/ServerServicesManager.startInStandbyState(ServerServicesManager.java:166)
    ^-- Holding lock: java/lang/Class@0x1415d52d8[biased lock]
    at weblogic/t3/srvr/T3Srvr.initializeStandby(T3Srvr.java:881)
    at weblogic/t3/srvr/T3Srvr.startup(T3Srvr.java:568)
    at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:469)
    at weblogic/Server.main(Server.java:71)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"(Signal Handler)" id=2 idx=0x8 tid=26998 prio=5 alive, daemon

"(OC Main Thread)" id=3 idx=0xc tid=26999 prio=5 alive, native_waiting, daemon

"(GC Worker Thread 1)" id=? idx=0x10 tid=27000 prio=5 alive, daemon

"(GC Worker Thread 2)" id=? idx=0x14 tid=27001 prio=5 alive, daemon

"(GC Worker Thread 3)" id=? idx=0x18 tid=27002 prio=5 alive, daemon

"(GC Worker Thread 4)" id=? idx=0x1c tid=27003 prio=5 alive, daemon

"(GC Worker Thread 5)" id=? idx=0x20 tid=27004 prio=5 alive, daemon

"(GC Worker Thread 6)" id=? idx=0x24 tid=27005 prio=5 alive, daemon

"(GC Worker Thread 7)" id=? idx=0x28 tid=27006 prio=5 alive, daemon

"(GC Worker Thread 8)" id=? idx=0x2c tid=27007 prio=5 alive, daemon

"(GC Worker Thread 9)" id=? idx=0x30 tid=27008 prio=5 alive, daemon

"(GC Worker Thread 10)" id=? idx=0x34 tid=27009 prio=5 alive, daemon

"(GC Worker Thread 11)" id=? idx=0x38 tid=27010 prio=5 alive, daemon

"(GC Worker Thread 12)" id=? idx=0x3c tid=27011 prio=5 alive, daemon

"(GC Worker Thread 13)" id=? idx=0x40 tid=27012 prio=5 alive, daemon

"(Code Generation Thread 1)" id=4 idx=0x44 tid=27013 prio=5 alive, native_waiting, daemon

"(Code Optimization Thread 1)" id=5 idx=0x48 tid=27014 prio=5 alive, native_waiting, daemon

"(VM Periodic Task)" id=6 idx=0x4c tid=27015 prio=10 alive, native_blocked, daemon

"Finalizer" id=7 idx=0x50 tid=27016 prio=8 alive, native_waiting, daemon
    at jrockit/memory/Finalizer.waitForFinalizees(J[Ljava/lang/Object;)I(Native Method)
    at jrockit/memory/Finalizer.access$700(Finalizer.java:12)
    at jrockit/memory/Finalizer$4.run(Finalizer.java:201)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"Reference Handler" id=8 idx=0x54 tid=27017 prio=10 alive, native_waiting, daemon
    at java/lang/ref/Reference.waitForActivatedQueue(J)Ljava/lang/ref/Reference;(Native Method)
    at java/lang/ref/Reference.access$100(Reference.java:11)
    at java/lang/ref/Reference$ReferenceHandler.run(Reference.java:82)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"(Sensor Event Thread)" id=9 idx=0x58 tid=27018 prio=5 alive, native_blocked, daemon

"VM JFR Buffer Thread" id=10 idx=0x5c tid=27019 prio=5 alive, in native, daemon

"Timer-0" id=13 idx=0x60 tid=27020 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0x1415dfa70[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0x1415dfa70[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"Timer-1" id=14 idx=0x64 tid=27021 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0x1415dfad8[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/util/TimerThread.mainLoop(Timer.java:509)
    ^-- Lock released while waiting: java/util/TaskQueue@0x1415dfad8[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" id=15 idx=0x68 tid=27022 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: weblogic/work/ExecuteThread@0x1415e0518[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at weblogic/work/ExecuteThread.waitForRequest(ExecuteThread.java:205)
    ^-- Lock released while waiting: weblogic/work/ExecuteThread@0x1415e0518[fat lock]
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:226)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

"JFR request timer" id=16 idx=0x6c tid=27023 prio=5 alive, waiting, native_blocked, daemon
    -- Waiting for notification on: java/util/TaskQueue@0x1415dfb58[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at java/util/TimerThread.mainLoop(Timer.java:483)
    ^-- Lock released while waiting: java/util/TaskQueue@0x1415dfb58[fat lock]
    at java/util/TimerThread.run(Timer.java:462)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

===== END OF THREAD DUMP ===============

Thanks and regards,
Jimmi

Answer 1

We finally found a way to start the WebLogic nodes. Here are some details that may help others as well.

Root cause

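The root cause of the issue described in the question turned out to be the permissions on the /tmp folder.

On the machine hosting these WebLogic nodes, the /tmp folder permissions had been reset to: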

drw-r--r-- 10 root root 4096 Nov 20 00:00 tmp

When the nodes were working earlier, the permissions were:

drwxrwxrwt 36 root root 20480 Nov 20 00:24 tmp

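The WebLogic processes are started as a user other than root.

Assuming the weblogic user is neither root nor in the root group, it falls into the "others" class, which under drw-r--r-- has read permission only: without write (w) it cannot create files in /tmp, and without execute (x) it cannot access anything inside it. It appears this permission change caused the nodes to go down and then prevented them from restarting. This is consistent with the thread dump, where the main thread is inside oracle/jrockit/jfr/Repository.createUniqueRepository, i.e. JRockit Flight Recorder setting up its repository.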
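The two permission strings above can be compared mechanically. This bash sketch assumes, as stated above, that the weblogic user falls into the "others" class (the last three characters of the mode), and that the mode string comes from ls -ld or GNU coreutils stat -c %A. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): interpret a directory mode string such as
# drwxrwxrwt or drw-r--r-- for a user in the "others" class.

# Creating a file in a directory needs both write (w) and execute/search (x).
# In the "others" triplet, 't' means x plus the sticky bit; 'T' means sticky
# bit without x.
others_can_create() {
  local mode=$1
  local w=${mode:8:1} x=${mode:9:1}
  [[ $w == w && ( $x == x || $x == t ) ]]
}

# The sticky bit (t or T) keeps users from deleting each other's files in a
# shared directory such as /tmp.
has_sticky_bit() {
  [[ ${1:9:1} == [tT] ]]
}

# Live check for the *current* user. root passes this even on restrictive
# modes, so run it as the weblogic user.
current_user_can_create() {
  local dir=$1
  [[ -d $dir && -w $dir && -x $dir ]]
}
```

With the modes from this answer, others_can_create fails for drw-r--r-- and succeeds for drwxrwxrwt, and only the latter has the sticky bit. On the host itself, others_can_create "$(stat -c %A /tmp)" gives the same answer for the current /tmp.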

Solution

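Since I don't have root access, I could not restore the permissions.

For now, we started the nodes by pointing the JVM's temporary directory to another folder that the weblogic user can access, setting the JVM property java.io.tmpdir in $DOMAIN_HOME/bin/setDomainEnv.sh.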

Example:

-Djava.io.tmpdir=/home/weblogic/tmp
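The workaround can be wrapped in a small bash function that setDomainEnv.sh calls. This is a minimal sketch, assuming the domain's start scripts pass JAVA_OPTIONS to the JVM, as the stock WebLogic scripts do; check your own setDomainEnv.sh before relying on it. The function name use_private_tmpdir is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): give the JVM a private temporary directory
# instead of /tmp. Assumption: JAVA_OPTIONS is passed to the JVM by the
# domain's start scripts.

use_private_tmpdir() {
  local dir=$1
  # Make sure the directory exists and is usable before the JVM starts.
  mkdir -p "$dir" || return 1
  chmod 700 "$dir" || return 1
  # Append the property, keeping any options that are already set.
  JAVA_OPTIONS="${JAVA_OPTIONS:+$JAVA_OPTIONS }-Djava.io.tmpdir=$dir"
  export JAVA_OPTIONS
}
```

For example, calling use_private_tmpdir /home/weblogic/tmp near the end of setDomainEnv.sh adds -Djava.io.tmpdir=/home/weblogic/tmp to whatever JAVA_OPTIONS already holds. Creating the directory in the same step avoids starting the JVM with a tmpdir that does not exist.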
