ORA-00600[ktspfundo-2] caused by a forcibly cancelled TRUNCATE TABLE

A friend's Kingdee ERP system aborted unexpectedly, and he asked me to help analyze the cause.
ORA-00600[ktspfundo-2] errors in the alert log

Fri Jul 27 08:53:33 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:53:33 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:54:16 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:57:12 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 08:57:20 2012
ORACLE Instance finance (pid = 15)-Error 600 encountered while recovering transaction (8, 3) on object 34294107.
Fri Jul 27 08:57:20 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 09:07:14 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 09:07:15 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_pmon_4130.trc:
ORA-00474: SMON process terminated with error

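From this we can roughly see that the database was performing an operation which a user then cancelled (ORA-01013), raising ORA-00600[ktspfundo-2]. SMON then attempted to roll the transaction back, and because that rollback also failed, SMON terminated and the database went down.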
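The reading of the alert log above can be mechanized. The following is a minimal sketch, assuming the 10g-style alert log layout shown here (a timestamp line followed by ORA- lines); the function name `summarize_alert_log` and the label format are illustrative choices, not part of any Oracle tool.

```python
import re

# Timestamp lines in this alert log look like "Fri Jul 27 08:53:33 2012"
# (the day of month may be space-padded, e.g. "Sun Jul  8 17:20:10 2012").
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# Error lines start with an ORA-nnnnn code; ORA-00600 also carries a first argument.
ORA_RE = re.compile(r"^(ORA-\d{5})(?:.*?arguments: \[([^\]]*)\])?")


def summarize_alert_log(text):
    """Return a list of (timestamp, [error labels]) in log order."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            current = (line, [])
            events.append(current)
            continue
        m = ORA_RE.match(line)
        if m and current is not None:
            code, arg = m.group(1), m.group(2)
            current[1].append("%s[%s]" % (code, arg) if arg else code)
    # Drop timestamps that carried no ORA- error at all.
    return [e for e in events if e[1]]
```

Fed the alert log excerpt above, this kind of summary makes the sequence easy to see: the user cancel (ORA-01013) alongside ORA-00600[ktspfundo-2] first, then the SMON failures, then ORA-00474.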

Analyzing the trace file

*** 2012-07-27 08:53:33.293
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Current SQL statement for this session:
TRUNCATE TABLE VTC3B8DR2G7J926FWOBK839XOR
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0068966E0 ? 2AE87C6E1168 ?
                                                   7FFF41B0EED0 ? 7FFF41B0EE10 ?
                                                   000000000 ? 000000000 ?
ktspfundo()+3902     call     kgeasnmierr()        0068966E0 ? 2AE87C6E1168 ?
                                                   7FFF41B0EED0 ? 7FFF41B0EE10 ?
                                                   000000010 ? 00689C0C0 ?
kcoubk()+351         call     ktspfundo()          7FFF41B10810 ? 2AE80C800CFA ?
                                                   4D6EE6014 ? 000000002 ?
                                                   000000010 ? 7FFF41B11128 ?
ktundo()+1208        call     kcoubk()             7FFF41B111F8 ? 7FFF41B10810 ?
                                                   2AE87E384024 ? 000000002 ?
                                                   000000002 ? 000000000 ?
ktubko()+499         call     ktundo()             000000001 ? 010E5C341 ?
                                                   2AE87E384020 ? 000000058 ?
                                                   000008430 ? 7657E9990 ?
ktuabt()+810         call     ktubko()             7657E9990 ? 7FFF41B1188C ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ktcrab()+292         call     ktuabt()             7657E98F8 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ktccle()+516         call     ktcrab()             7657E98F8 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ksepop()+384         call     ktccle()             000000006 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
kgepop()+123         call     ksepop()             0068966E0 ? 000000006 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
kgesev()+315         call     kgepop()             0068966E0 ? 2AE87C6E1168 ?
                                                   0000003F5 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ksesec0()+186        call     kgesev()             0068966E0 ? 2AE87C6E1168 ?
                                                   0000003F5 ? 000000000 ?
                                                   7FFF41B11B30 ? 7FFF41B11870 ?
ksqcmi()+2322        call     ksesec0()            000000000 ? 000000000 ?
                                                   000001000 ? 000000000 ?
                                                   000000013 ? 000000005 ?
ksqcnv()+496         call     ksqcmi()             77E586B88 ? 000000006 ?
                                                   00000FFFF ? 00147AE14 ?
                                                   7FFF41B126A0 ? 7FFF41B128A8 ?
ksqcov()+44          call     ksqcnv()             77E586B88 ? 000000006 ?
                                                   000000000 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kcbo_reuse_obj()+14  call     ksqcov()             77E586B68 ? 000000006 ?
09                                                 000000000 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kcbrbo()+1126        call     kcbo_reuse_obj()     7FFF41B12F04 ? 7FFF41B12F0C ?
                                                   000000001 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
ktsstrn_segment()+3  call     kcbrbo()             7FFF41B12F04 ? 7FFF41B12F0C ?
941                                                000000001 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kkbtts_trunc_tbl_se  call     ktsstrn_segment()    7FFF41B13180 ? 000000000 ?
g()+1018                                           0020C6444 ? 000000000 ?
                                                   7FFF41B14C00 ? 7FFF00000001 ?
kkbtrn()+8156        call     kkbtts_trunc_tbl_se  735ACA058 ? 77BC78D18 ?
                              g()                  000000000 ? 000000002 ?
                                                   000000000 ? 7FFF41B14C00 ?
opiexe()+15805       call     kkbtrn()             735ACA058 ? 000000000 ?
                                                   718831208 ? 000000000 ?
                                                   000000002 ? 7FFF00000000 ?
opiosq0()+3316       call     opiexe()             000000004 ? 000000000 ?
                                                   7FFF41B15F48 ? 00000000B ?
                                                   000000002 ? 7FFF00000000 ?
kpooprx()+315        call     opiosq0()            000000003 ? 00000000E ?
                                                   7FFF41B160B8 ? 0000000A4 ?
                                                   000000002 ? 7FFF00000000 ?
kpoal8()+799         call     kpooprx()            7FFF41B19264 ? 7FFF41B17280 ?
                                                   000000029 ? 000000001 ?
                                                   000000000 ? 7FFF00000000 ?
opiodr()+984         call     kpoal8()             00000005E ? 000000017 ?
                                                   7FFF41B19260 ? 000000001 ?
                                                   000000001 ? 7FFF00000000 ?
ttcpip()+1012        call     opiodr()             00000005E ? 000000017 ?
                                                   7FFF41B19260 ? 000000000 ?
                                                   0059C09B0 ? 7FFF00000000 ?
opitsk()+1322        call     ttcpip()             00689E3B0 ? 7FFF41B17248 ?
                                                   7FFF41B19260 ? 000000000 ?
                                                   7FFF41B18D58 ? 7FFF41B193C8 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFF41B19260 ? 000000001 ?
                                                   000000000 ? 4E58D8C00000001 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   000000000 ? 4E58D8C00000001 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
opimai_real()+163    call     sou2o()              7FFF41B1A400 ? 00000003C ?
                                                   000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
main()+116           call     opimai_real()        000000002 ? 7FFF41B1A490 ?
                                                   000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
__libc_start_main()  call     main()               000000002 ? 7FFF41B1A490 ?
+244                                               000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
_start()+41          call     __libc_start_main()  000723088 ? 000000002 ?
                                                   7FFF41B1A5E8 ? 000000000 ?
                                                   0059C0460 ? 000000002 ?
 
--------------------- Binary Stack Dump ---------------------

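This gives more precise information: the database was running TRUNCATE TABLE when someone abnormally terminated the program, which led to ORA-00600[ktspfundo-2]. A search of MOS found no matching bug record. From this information, my preliminary judgment is that an Oracle bug is involved: the truncate was interrupted, the undo records for that object became inconsistent (a truncate does not itself need to be rolled back, but it may record some ancillary undo information), and the failed rollback of that object then brought the database down.

After the database was restarted, the pending rollback for the object completed automatically and the database was normal. A TRUNCATE TABLE being abnormally terminated is uncommon, and triggering this bug is rarer still, so if it only happens once I suggest ignoring the error. If you have the time and interest, you can of course file an SR.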

ORA-00600[kssadd_stage: null parent] caused by the client version

One customer's application stopped working and reported ORA-00600[kssadd_stage: null parent]; it worked normally again after the middleware was restarted.
Alert log

ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
Tue Jul 17 14:57:37 2012
Trace dumping is performing id=[cdmp_20120717145742]
Tue Jul 17 14:57:39 2012
Errors in file /oracle/10g/admin/fdjdb/udump/fdjdb2_ora_307720.trc:
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
Tue Jul 17 14:57:45 2012
Errors in file /oracle/10g/admin/fdjdb/udump/fdjdb2_ora_357344.trc:
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []

Trace file

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/10g/db
System name:    AIX
Node name:      ora2
Release:        1
Version:        6
Machine:        00CCFD354C00
Instance name: fdjdb2
Redo thread mounted by this instance: 2
Oracle process number: 89
Unix process pid: 111068, image: oracle@ora2

*** ACTION NAME:() 2012-07-17 15:08:42.043
*** MODULE NAME:(gsrvr.exe) 2012-07-17 15:08:42.043
*** SERVICE NAME:(fdjdb) 2012-07-17 15:08:42.043
*** SESSION ID:(991.44140) 2012-07-17 15:08:42.043
*** 2012-07-17 15:08:42.043
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
No current SQL statement being executed.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              40D1A9663F9E7ABB ?
                                                   6ED89E14D59386B5 ?
ksedmp+0290          bl       ksedst               104A2CDB0 ?
ksfdmp+0018          bl       03F2735C
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              11041D938 ? 700000220FFBFC8 ?
                                                   110000770 ? 7000004FDF0C700 ?
                                                   FFFFFFFFFFF89C0 ?
kssadd_stage+0080    bl       kgeasnmierr          110195490 ? 110450040 ?
                                                   104AC46B8 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 7000004F0FA9208 ?
kqreqa+0058          bl       kssadd_stage         105670038 ? 104CF7BA0 ?
                                                   000000000 ? 000000000 ?
kqrpre1+0850         bl       kqreqa               100203514 ? 1101A2B20 ?
kqrpre+001c          bl       kqrpre1              710000770 ? 000000009 ?
                                                   FFFFFFFFFFF9088 ?
                                                   28A4202200000000 ?
                                                   10012AEE4 ? FFFFFFFFFFF9080 ?
                                                   000000000 ? 11022A3E0 ?
opiosq0+009c         bl       kqrpre               000000000 ? 000000000 ?
                                                   000000000 ? 1101A2B20 ?
                                                   FFFFFFFFFFF9198 ? 1104B7C60 ?
                                                   FFFFFFFFFFF9458 ?
kpooprx+0168         bl       opiosq0              4A00000001 ? 000000001 ?
                                                   000000000 ? A40000000000FF ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB964 ?
                                                   FFFFFFFFFFFB680 ?
                                                   5000000050 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103A1AD8 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F971E8
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F96034
sou2o+0090           bl       opidrv               3C02D9A29C ? 4A006E298 ?
                                                   FFFFFFFFFFFF8A0 ?
opimai_real+01bc     bl       01F939B4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0070         bl       main                 000000000 ? 000000000 ?

--------------------- Binary Stack Dump ---------------------

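The customer told me that the database is accessed through middleware (OCI) using a 10g Release 1 (10.1) client for Windows. A search of MOS [ID 752149.1] turned up this stack trace:
kssadd_stage <- kqreqa <- kqrpre1 <- kqrpre <- opiosq0 <- kpooprx <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main <- start
The stack, the client version, and the access environment all resemble those of unpublished Bug 4937225.
Recommendation
Upgrade the client to 10.2.0.3 or later.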

ORA-07445[pevm_MOVC_i()+18] after upgrading from 8i to 9i

After a friend upgraded a database from 8i to 9i, it started raising ORA-07445[pevm_MOVC_i()+18].
Alert log showing ORA-07445[pevm_MOVC_i()+18]

Mon Jul 16 12:21:54 2012
Errors in file /oracle/admin/ora8/udump/ora8_ora_8938.trc:
ORA-07445: exception encountered: core dump [pevm_MOVC_i()+18] [SIGSEGV] [Address not mapped to object] [0x7] [] []

Trace file

--Version and platform information
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /oracle/product/9.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.18-194.el5PAE
Version:	#1 SMP Tue Mar 16 22:00:21 EDT 2010
Machine:	i686
Instance name: ora8
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 8938, image: oracle@localhost.localdomain (TNS V1-V3)

--Trace information
*** 2012-07-16 12:21:54.399
*** SESSION ID:(12.6) 2012-07-16 12:21:54.399
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x7, PC: [0x9bfac06, pevm_MOVC_i()+18]
Registers:
%eax: 0x00000000 %ebx: 0x00000025 %ecx: 0x00000000
%edx: 0xbf93bf50 %edi: 0x00000000 %esi: 0x002ff1d8
%esp: 0xbf93bc28 %ebp: 0xbf93bc60 %eip: 0x09bfac06
%efl: 0x00010296
  pevm_MOVC_i()+6 (0x9bfabfa) mov %edi,0xffffffcc(%ebp)
  pevm_MOVC_i()+9 (0x9bfabfd) mov %esi,0xffffffd0(%ebp)
  pevm_MOVC_i()+12 (0x9bfac00) mov %ebx,0xffffffc8(%ebp)
  pevm_MOVC_i()+15 (0x9bfac03) mov 0x14(%ebp),%eax
> pevm_MOVC_i()+18 (0x9bfac06) movb 0x7(%eax),%dl
  pevm_MOVC_i()+21 (0x9bfac09) mov $0x0,0xfffffff0(%ebp)
  pevm_MOVC_i()+28 (0x9bfac10) movb %dl,0xffffffe0(%ebp)
  pevm_MOVC_i()+31 (0x9bfac13) movb %dl,0xffffffe0(%ebp)
  pevm_MOVC_i()+34 (0x9bfac16) cmpb $0x1,%dl
*** 2012-07-16 12:21:54.407
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [pevm_MOVC_i()+18] [SIGSEGV] [Address not mapped to object] [0x7] [] []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp()+269         call     ksedst()+0           1 ? 0 ? 0 ? 1 ? 64252C31 ?
                                                   6666006C ?
ssexhd()+1108        call     ksedmp()+0           3 ? 0 ? 0 ? 0 ? 0 ? 0 ?
pevm_MOVC_i()+18     signal   ssexhd()+0           B ? BF93B8BC ? BF93B93C ?
pfrrun()+8458        call     pevm_MOVC_i()+0      2FF19C ? 16 ? BE14650 ? 0 ?
pricar()+1277        call     pfrrun()+0           2FF19C ? 1 ? BF93CCFC ?
                                                   AD638A0 ? 2DFBAC ? 0 ?
pricbr()+427         call     pricar()+0           BF93DA88 ? BF93D084 ?
                                                   9BEAE0C ? 1 ? 0 ? 98C93728 ?
prient2()+598        call     pricbr()+0           BF93DA88 ? BF93D084 ? 0 ?
prient()+1438        call     prient2()+0          BF93DA88 ? BF93D084 ? 1 ?
                                                   BF93E4E0 ? 0 ?
kkxrpc()+347         call     prient()+0           BF93DA88 ? AD638A0 ?
                                                   BF93E534 ? 38 ? 1C8C997 ? 0 ?
kporpc()+138         call     kkxrpc()+0           4C ? F ? BF93E63C ?
opiodr()+5238        call     kjushutdown()+2671   4C ? F ? BF93E63C ?
ttcpip()+2124        call     opiodr()+0           4C ? F ? BF93E63C ? 0 ?
Cannot find symbol in /lib/libc.so.6.
opitsk()+1635        call     ttcpip()+0           AD638A0 ? 4C ? BF93E63C ? 0 ?
                                                   BF93EF14 ? BF93EF10 ?
opiino()+602         call     opitsk()+0           0 ? 0 ? AD638A0 ? BE01DE0 ?
                                                   103 ? 0 ?
opiodr()+5238        call     kjushutdown()+2671   3C ? 4 ? BF9402E0 ?
opidrv()+517         call     opiodr()+0           3C ? 4 ? BF9402E0 ? 0 ?
sou2o()+25           call     opidrv()+0           3C ? 4 ? BF9402E0 ?
main()+182           call     sou2o()+0            BF9402C4 ? 3C ? 4 ?
                                                   BF9402E0 ? 0 ? 0 ?
00125E9C             call     main()+0             2 ? BF940384 ? BF940390 ?
                                                   88A810 ? 0 ? 1 ?
 
---------------------Binary Stack Dump ---------------------

--Process information
Process global information:
     process: 0x962ba0b8, call: 0x96342cd8, xact: (nil), curses: 0x962e4070, usrses: 0x962e4070
  ----------------------------------------
  SO: 0x962ba0b8, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=15, calls cur/top: 0x96342cd8/0x96342cd8, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 196 0 4
              last post received-location: kslpsr
              last process to post me: 962b7828 1 6
              last post sent: 0 0 15
              last post sent-location: ksasnd
              last process posted by me: 962b7828 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x962d9444
    O/S info: user: oracle, term: UNKNOWN, ospid: 8938
    OSD pid info: Unix process pid: 8938, image: oracle@localhost.localdomain (TNS V1-V3)
    ----------------------------------------
    SO: 0x962e4070, type: 4, owner: 0x962ba0b8, flag: INIT/-/-/0x00
    (session) trans: (nil), creator: 0x962ba0b8, flag: (8000041) USR/- BSY/-/-/-/-/-
              DID: 0001-000F-00000004, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 0, prv: 0, sql: (nil), psql: 0x98c3b858, user: 95/DDDD
    O/S info: user: mis, term: LANDERSVR3, ospid: 7904:3012, machine: XANDER\LANDERSVR3
              program: c:\orant\bin\f50run32.exe c:\forms\bas9010.fmx
    application name: c:\orant\bin\f50run32.exe c:\forms\bas9010.fmx, hash value=0
    last wait for 'db file sequential read' blocking sess=0x0 seq=1277 wait_time=11
                file#=1, block#=21b, blocks=1

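1. The application began raising this error after the upgrade from 8i to 9i.
2. The program that triggers the error is Forms 5 (f50run32.exe in the process information above).

Solution
A search of MOS [ID 273411.1] shows that the error is caused by an incompatibility between Forms 5 and 9i. Oracle gives no fix; the implication is that if Forms cannot be upgraded, the only option is to downgrade the database back to 8i.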

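Tip
Before upgrading an Oracle database, evaluate and test in advance. If other Oracle software is tightly coupled to the database, it is best to confirm compatibility with Oracle support staff before upgrading.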

Case study: ORA-00600[kcbshlc_1] brings the database down

On one server, ORA-00600[kcbshlc_1] caused PMON to fail, which brought the database down.

Sun Jul  8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
PMON: terminating instance due to error 472

Analyzing the trace file

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.9-89.ELsmp
Version:	#1 SMP Mon Apr 20 10:33:05 EDT 2009
Machine:	x86_64
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 16412, image: oracle@localhost.localdomain (PMON)

*** 2012-07-08 03:00:11.351
*** SERVICE NAME:(SYS$BACKGROUND) 2012-07-08 03:00:11.338
*** SESSION ID:(1105.1) 2012-07-08 03:00:11.338
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
freeing in-flux r/w latch for process state: 1fc165d248
... in-flux r/w latch  1fc1fcc9b0 Child cache buffers chains level=1 child#=4753 
        Location from where latch is held: kcbgtcr: kslbegin excl: 
        Context saved from call: 113266196
        state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     139 (2, 1341687611, 2)
     192 (2, 1341687611, 2)
     191 (2, 1341687611, 2)
     173 (2, 1341687611, 2)
     185 (2, 1341687611, 2)
     176 (2, 1341687611, 2)
     174 (2, 1341687611, 2)
     118 (2, 1341687611, 2)
     190 (2, 1341687611, 2)
     179 (2, 1341687611, 2)
     184 (1, 1341687611, 1)
     189 (1, 1341687611, 1)
     177 (1, 1341687611, 1)
     195 (1, 1341687611, 1)
     187 (1, 1341687611, 1)
     194 (1, 1341687611, 1)
     147 (1, 1341687611, 1)
     183 (1, 1341687611, 1)
     143 (1, 1341687611, 1)
     144 (1, 1341687611, 1)
     186 (1, 1341687611, 1)
     188 (1, 1341687611, 1)
     196 (1, 1341687611, 1)
     145 (1, 1341687611, 1)
     193 (1, 1341687611, 1)
     waiter count=25
*** 2012-07-08 03:50:06.228
 wsd 0x1f8169ac20, sbuf 0xac1ffafe8, setid 10, op 3
lcuridx 1, lasz 0x3c1ffc110
*** 2012-07-08 16:30:05.294
freeing in-flux r/w latch for process state: 20406507f0
... in-flux r/w latch  1f81265f28 Child cache buffers chains level=1 child#=14180 
        Location from where latch is held: kcbgtcr: kslbegin excl: 
        Context saved from call: 71341989
        state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     121 (2, 1341736205, 2)
     116 (2, 1341736205, 2)
     125 (2, 1341736205, 2)
     140 (2, 1341736205, 2)
     145 (2, 1341736205, 2)
     waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0
... in-flux r/w latch  1f813aec18 Child cache buffers chains level=1 child#=20914 
        Location from where latch is held: kcbrls: kslbegin: 
        Context saved from call: 96505705
        state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
*** 2012-07-08 17:20:10.876
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
*** 2012-07-08 17:20:10.876
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000000 ?
kcbshlc()+239        call     kgeasnmierr()        0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
kslilcr()+770        call     kcbshlc()            0066876E0 ? 1F801DDB28 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
ksl_cleanup()+1567   call     kslilcr()            7FBFFFCE50 ? 000000000 ?
                                                   7FBFFFDCE0 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxfl()+492         call     ksl_cleanup()        000000000 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxda()+55          call     ksuxfl()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksucln()+1390        call     ksuxda()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksbrdp()+794         call     ksucln()             060008100 ? 000000000 ?
                                                   FFFFFFFF9720ED9F ?
                                                   1F801DDB28 ? 0066876E0 ?
                                                   000000021 ?
opirip()+616         call     ksbrdp()             060008100 ? 000000000 ?
                                                   000000001 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opimai_real()+317    call     sou2o()              7FBFFFF670 ? 000000032 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF700 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
__libc_start_main()  call     main()               000000003 ? 7FBFFFF700 ?
+219                                               000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
_start()+42          call     __libc_start_main()  000713984 ? 000000001 ?
                                                   7FBFFFF848 ? 005288D00 ?
                                                   000000000 ? 000000003 ?

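The trace shows that the database is version 10.2.0.4 running on 64-bit Linux.
Cause of the error:
PMON failed to clean up process state 1fc165d248 because the latch was held by orapid = 113.
PMON failed to clean up process state 20406507f0 because the latch was held by orapid = 102.
PMON failed to clean up process state 1fc165f9d0 because the latch was held by orapid = 123.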
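To cross-check which orapid held the latch for each process state, the "freeing in-flux r/w latch" entries in the PMON trace can be paired with the "holder orapid" line that follows each of them. This is a small sketch that assumes exactly the trace wording shown above; `latch_holders` is a hypothetical helper name.

```python
import re

STATE_RE = re.compile(r"freeing in-flux r/w latch for process state: ([0-9a-fA-F]+)")
HOLDER_RE = re.compile(r"holder orapid = (\d+)")


def latch_holders(trace_text):
    """Map each process-state address to the orapid reported as latch holder."""
    holders = {}
    state = None
    for line in trace_text.splitlines():
        m = STATE_RE.search(line)
        if m:
            state = m.group(1)  # remember the state until its holder line appears
            continue
        h = HOLDER_RE.search(line)
        if h and state is not None:
            holders[state] = int(h.group(1))
            state = None
    return holders
```

Run against the PMON trace above, the mapping is 1fc165d248 to 113, 20406507f0 to 102, and 1fc165f9d0 to 123.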

A search of MOS [443909.1]
shows that this is unpublished Bug 4723109. The fix is to apply Patch 4723109.

DBCA Fails With ORA-15243

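Today a friend phoned to say they had hit ORA-12801/ORA-15243 while installing Oracle 11g R1 RAC, and asked me to help.
Details
The AIX system previously had 11g R2 RAC installed. Because the project requires 11g R1, the operating system was reinstalled and R1 was then installed. When the installation reached the DBCA step that configures ASM, ORA-12801/ORA-15243 was raised.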

ORA-12801: error signaled in parallel query server PZ99, instance wmsdb1:+ASM1(1)
ORA-15243: 11.2.0.0.0 is not a valid version number


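Logging in to the ASM1 instance with SQL*Plus showed a disk group named ORADATA containing one disk, /dev/rhdisk1. On asking, it turned out that this disk group had held the OCR and voting disk during the R2 installation, and the disk was not cleaned when the system was reinstalled.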

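Approach [find a way to clear the ASM information on the disk]
1. Tried to drop the disk group through SQL*Plus; it reported that the disk group was dismounted.
2. Tried to mount the disk group; it reported an invalid version (ORA-15243). [The running ASM software is 11.1 while the disk group metadata is 11.2, so naturally they do not match.]
3. Cleared the ASM disk header directly with dd (dd if=/dev/zero of=/dev/rhdisk1 bs=4096 count=1).
4. Re-ran DBCA and everything worked normally.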
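Because the dd in step 3 is destructive, it can be worth confirming first that the device still carries an old ASM header, and confirming afterwards that it is gone. The following is a minimal read-only sketch. It assumes that a provisioned ASM disk header contains the ASCII marker ORCLDISK within its first 4096 bytes; that marker is an assumption about the header layout and should be verified with Oracle's own tools (for example kfed) before relying on it. `has_asm_header` is a hypothetical helper name.

```python
# Assumption: provisioned ASM disks carry this ASCII marker in the header block.
ASM_MARKER = b"ORCLDISK"


def has_asm_header(path, block_size=4096):
    """Return True if the first block of `path` contains the ASM marker.

    The device is opened read-only and unbuffered; nothing is written.
    """
    with open(path, "rb", buffering=0) as dev:
        return ASM_MARKER in dev.read(block_size)
```

For example, `has_asm_header("/dev/rhdisk1")` would be run as a user allowed to read the device, once before the dd and once after it.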

The related MOS article [1460997.1] applies only to the Linux ASMLib case

Applies to:
Oracle Server - Enterprise Edition - Version 11.1.0.7 and later
Information in this document applies to any platform.

Symptoms
On : 11.1.0.7 version, STORAGE

When attempting to create database or query gv$asm_diskgroup,
the following error occurs.

ERROR
-----------------------
ORA-12801: error signaled in parallel query server PZ99, instance dchilcmsdb2.hq.navteq.com:+ASM2 (2)
ORA-15243: 11.2.0.0.0 is not a valid version number


STEPS
-----------------------
The issue can be reproduced at will with the following steps:
1. Previously had 11GR2 installed and configured. Removed this installation then installed 
   11.1.0.7  and created diskgroups using some of the same disks previously used.
2. Attempt to create database and receive the errors. Drop the newly 
   created diskgroups and query the view still get same errors.


BUSINESS IMPACT
-----------------------
The issue has the following business impact:
Due to this issue, users cannot create new database.

Changes
 Removed 11.2.0.1 installation and installed 11.1.0.7 software without cleaning up all of 
 the diskgroup information from previous installation.

Cause
All the current information shows that we are using correct binaries and 
that the diskgroups that are being used have correct compatibility settings. 
HTML shows that the disks for the old diskgroup are still being discovered. 
This in conjunction with the text of the error as follows shows that 
we are picking up 11.2.0.0.0 as version from somewhere.
ORA-15243: 11.2.0.0.0 is not a valid version number
 
Problem was caused by the disks that had been used for the 
OCR/Voting disk diskgroup in 11GR2 installation still being present and accessible.
 

Solution
As the root user execute /etc/init.d/oracleasm/deletedisk command against all the disks 
that were previously used for the OCR/Voting disk diskgroup then try the operation again.