On one server, an ORA-00600 [kcbshlc_1] error caused PMON to fail and brought the database down. The alert log shows:
Sun Jul 8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul 8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul 8 17:20:12 2012
PMON: terminating instance due to error 472
Analyzing the trace file:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name: Linux
Node name: localhost.localdomain
Release: 2.6.9-89.ELsmp
Version: #1 SMP Mon Apr 20 10:33:05 EDT 2009
Machine: x86_64
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 16412, image: oracle@localhost.localdomain (PMON)
*** 2012-07-08 03:00:11.351
*** SERVICE NAME:(SYS$BACKGROUND) 2012-07-08 03:00:11.338
*** SESSION ID:(1105.1) 2012-07-08 03:00:11.338
wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
freeing in-flux r/w latch for process state: 1fc165d248
... in-flux r/w latch 1fc1fcc9b0 Child cache buffers chains level=1 child#=4753
Location from where latch is held: kcbgtcr: kslbegin excl:
Context saved from call: 113266196
state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
waiters [orapid (seconds since: put on list, posted, alive check)]:
139 (2, 1341687611, 2)
192 (2, 1341687611, 2)
191 (2, 1341687611, 2)
173 (2, 1341687611, 2)
185 (2, 1341687611, 2)
176 (2, 1341687611, 2)
174 (2, 1341687611, 2)
118 (2, 1341687611, 2)
190 (2, 1341687611, 2)
179 (2, 1341687611, 2)
184 (1, 1341687611, 1)
189 (1, 1341687611, 1)
177 (1, 1341687611, 1)
195 (1, 1341687611, 1)
187 (1, 1341687611, 1)
194 (1, 1341687611, 1)
147 (1, 1341687611, 1)
183 (1, 1341687611, 1)
143 (1, 1341687611, 1)
144 (1, 1341687611, 1)
186 (1, 1341687611, 1)
188 (1, 1341687611, 1)
196 (1, 1341687611, 1)
145 (1, 1341687611, 1)
193 (1, 1341687611, 1)
waiter count=25
*** 2012-07-08 03:50:06.228
wsd 0x1f8169ac20, sbuf 0xac1ffafe8, setid 10, op 3
lcuridx 1, lasz 0x3c1ffc110
*** 2012-07-08 16:30:05.294
freeing in-flux r/w latch for process state: 20406507f0
... in-flux r/w latch 1f81265f28 Child cache buffers chains level=1 child#=14180
Location from where latch is held: kcbgtcr: kslbegin excl:
Context saved from call: 71341989
state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
waiters [orapid (seconds since: put on list, posted, alive check)]:
121 (2, 1341736205, 2)
116 (2, 1341736205, 2)
125 (2, 1341736205, 2)
140 (2, 1341736205, 2)
145 (2, 1341736205, 2)
waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0
... in-flux r/w latch 1f813aec18 Child cache buffers chains level=1 child#=20914
Location from where latch is held: kcbrls: kslbegin:
Context saved from call: 96505705
state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
*** 2012-07-08 17:20:10.876
wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
*** 2012-07-08 17:20:10.876
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31 call ksedst1() 000000000 ? 000000001 ?
7FBFFFCEB0 ? 7FBFFFCF10 ?
7FBFFFCE50 ? 000000000 ?
ksedmp()+610 call ksedst() 000000000 ? 000000001 ?
7FBFFFCEB0 ? 7FBFFFCF10 ?
7FBFFFCE50 ? 000000000 ?
ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ?
7FBFFFCEB0 ? 7FBFFFCF10 ?
7FBFFFCE50 ? 000000000 ?
kgerinv()+161 call ksfdmp() 000000003 ? 000000001 ?
7FBFFFCEB0 ? 7FBFFFCF10 ?
7FBFFFCE50 ? 000000000 ?
kgeasnmierr()+163 call kgerinv() 0066876E0 ? 2A97200260 ?
7FBFFFCF10 ? 7FBFFFCE50 ?
000000000 ? 000000000 ?
kcbshlc()+239 call kgeasnmierr() 0066876E0 ? 2A97200260 ?
7FBFFFCF10 ? 7FBFFFCE50 ?
000000000 ? 000000021 ?
kslilcr()+770 call kcbshlc() 0066876E0 ? 1F801DDB28 ?
7FBFFFCF10 ? 7FBFFFCE50 ?
000000000 ? 000000021 ?
ksl_cleanup()+1567 call kslilcr() 7FBFFFCE50 ? 000000000 ?
7FBFFFDCE0 ? 1F801DDB28 ?
0066876E0 ? 000000021 ?
ksuxfl()+492 call ksl_cleanup() 000000000 ? 000000000 ?
000000000 ? 1F801DDB28 ?
0066876E0 ? 000000021 ?
ksuxda()+55 call ksuxfl() 1FC165B8E0 ? 000000000 ?
000000000 ? 1F801DDB28 ?
0066876E0 ? 000000021 ?
ksucln()+1390 call ksuxda() 1FC165B8E0 ? 000000000 ?
000000000 ? 1F801DDB28 ?
0066876E0 ? 000000021 ?
ksbrdp()+794 call ksucln() 060008100 ? 000000000 ?
FFFFFFFF9720ED9F ?
1F801DDB28 ? 0066876E0 ?
000000021 ?
opirip()+616 call ksbrdp() 060008100 ? 000000000 ?
000000001 ? 060008100 ?
0066876E0 ? 000000021 ?
opidrv()+582 call opirip() 000000032 ? 000000004 ?
7FBFFFF698 ? 060008100 ?
0066876E0 ? 000000021 ?
sou2o()+114 call opidrv() 000000032 ? 000000004 ?
7FBFFFF698 ? 060008100 ?
0066876E0 ? 000000021 ?
opimai_real()+317 call sou2o() 7FBFFFF670 ? 000000032 ?
000000004 ? 7FBFFFF698 ?
0066876E0 ? 000000021 ?
main()+116 call opimai_real() 000000003 ? 7FBFFFF700 ?
000000004 ? 7FBFFFF698 ?
0066876E0 ? 000000021 ?
__libc_start_main() call main() 000000003 ? 7FBFFFF700 ?
+219 000000004 ? 7FBFFFF698 ?
0066876E0 ? 000000021 ?
_start()+42 call __libc_start_main() 000713984 ? 000000001 ?
7FBFFFF848 ? 005288D00 ?
000000000 ? 000000003 ?
The trace header shows that the database is Oracle 10.2.0.4 running on 64-bit Linux (x86_64).
Cause of the error:
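When PMON tried to clean up process state 1fc165d248, the latch was still held by orapid = 113, so the cleanup failed.
When PMON tried to clean up process state 20406507f0, the latch was still held by orapid = 102, so the cleanup failed.
When PMON tried to clean up process state 1fc165f9d0, the latch was still held by orapid = 123, so the cleanup failed.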
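The process state / holder pairs can be read straight out of the PMON trace. The following is a minimal Python sketch, not an Oracle tool: the function name `parse_latch_cleanups` and the record field names are hypothetical, and the regular expressions assume the trace lines look exactly like the ones quoted in this post.

```python
import re

# Relevant lines copied from the PMON trace quoted in this post.
TRACE = """\
freeing in-flux r/w latch for process state: 1fc165d248
... in-flux r/w latch 1fc1fcc9b0 Child cache buffers chains level=1 child#=4753
state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
waiter count=25
freeing in-flux r/w latch for process state: 20406507f0
... in-flux r/w latch 1f81265f28 Child cache buffers chains level=1 child#=14180
state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0
... in-flux r/w latch 1f813aec18 Child cache buffers chains level=1 child#=20914
state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
"""

# Patterns are assumptions based only on the line shapes seen in this trace.
STATE_RE = re.compile(r"freeing in-flux r/w latch for process state: ([0-9a-f]+)")
LATCH_RE = re.compile(
    r"in-flux r/w latch ([0-9a-f]+) Child (.+?) level=(\d+) child#=(\d+)"
)
HOLDER_RE = re.compile(r"holder orapid = (\d+)")
WAITERS_RE = re.compile(r"waiter count=(\d+)")


def parse_latch_cleanups(text):
    """Return one record per 'freeing in-flux r/w latch' entry in the trace."""
    records = []
    current = None
    for line in text.splitlines():
        m = STATE_RE.search(line)
        if m:
            # A new cleanup entry starts here.
            current = {
                "process_state": m.group(1),
                "latch": None,
                "latch_name": None,
                "child": None,
                "holder_orapid": None,
                "waiters": 0,  # stays 0 when no waiter list is printed
            }
            records.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first cleanup entry
        m = LATCH_RE.search(line)
        if m:
            current["latch"] = m.group(1)
            current["latch_name"] = m.group(2)
            current["child"] = int(m.group(4))
            continue
        m = HOLDER_RE.search(line)
        if m:
            current["holder_orapid"] = int(m.group(1))
        m = WAITERS_RE.search(line)
        if m:
            current["waiters"] = int(m.group(1))
    return records


for rec in parse_latch_cleanups(TRACE):
    print(
        f"process state {rec['process_state']}: latch {rec['latch']} "
        f"({rec['latch_name']}, child# {rec['child']}) "
        f"held by orapid {rec['holder_orapid']}, waiters={rec['waiters']}"
    )
```

To use it on a real trace, read the file contents into a string and pass that to `parse_latch_cleanups` instead of the embedded `TRACE` excerpt. Extracting the pairs mechanically avoids transcription slips when a trace contains many such entries.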
A search of MOS note 443909.1 shows that this is unpublished Bug 4723109. The fix is to apply Patch 4723109.