ORA-30013导致RAC 节点down掉

今天一朋友让我帮忙分析他们的9.2.0.2 rac 节点2异常down掉原因,相关信息如下:
前提信息

OS:HP-UX B.11.31
DB:9.2.0.2.0 RAC

节点2alert日志信息

Fri Sep  7 13:13:49 2012
ARC0: Completed archiving  log 11 thread 2 sequence 11651
Fri Sep  7 13:31:56 2012
Errors in file /oracle/admin/agent/udump/agent2_ora_797.trc:
ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], []
ORA-00028: your session has been killed
Fri Sep  7 13:31:58 2012
Errors in file /oracle/admin/agent/bdump/agent2_pmon_5938.trc:
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Fri Sep  7 13:31:58 2012
PMON: terminating instance due to error 30013
Fri Sep  7 13:31:58 2012
Errors in file /oracle/admin/agent/bdump/agent2_lms7_6033.trc:
ORA-30013: undo tablespace '' is currently in use
Fri Sep  7 13:31:58 2012
…………
Errors in file /oracle/admin/agent/bdump/agent2_lms0_6027.trc:
ORA-30013: undo tablespace '' is currently in use
Fri Sep  7 13:31:58 2012
System state dump is made for local instance
Fri Sep  7 13:32:03 2012
Instance terminated by PMON, pid = 5938
Fri Sep  7 14:34:35 2012

这里可以看到因为ORA-30013的错误使得pmon进程异常,从而使得该rac的节点2 down掉.同时这里还发现了ORA-00600[kgavsd_3]错误,是否是因为该ORA-600导致了数据库异常down还是一个偶然机会,我们继续分析

查看ORA-600[kgavsd_3]相关trace文件

*********START PLSQL RUNTIME DUMP************
***Got ORA-28 while running PLSQL***
***********END PLSQL RUNTIME DUMP************
*** 2012-09-07 13:31:56.740
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], []
ORA-00028: your session has been killed
Current SQL statement for this session:
--用户补档
DECLARE
  OUT_ERR_CODE NUMBER;
  OUT_ERR_MSG  VARCHAR2(1000);
  V_COUNT NUMBER;
BEGIN   
    WHILE TRUE LOOP
      SELECT COUNT(*) INTO V_COUNT FROM amc_stat_log  where proc_name in('pRunOdsChannelWareData') 
       and run_param=201208 AND STATE='A';
      IF V_COUNT>0 THEN
         dbms_output.put_line('exit loop '|| sysdate);
         EXIT; 
      END IF;
      sys.Dbms_Lock.sleep(600);
      dbms_output.put_line('wake up '|| sysdate);
    END LOOP;
   
   PKG_AME_ODS_DATA.P_Add_TO_AgentServ(201208,OUT_ERR_CODE,OUT_ERR_MSG);
   PKG_AME_ODS_DATA.P_Update_Serv_Ware_ID(201208, OUT_ERR_CODE, OUT_ERR_MSG);
   PKG_AMS_SETTLE.P_COMMISION_51PRE_FLAG(201208,OUT_ERR_CODE, OUT_ERR_MSG); 
END;
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
c000000e84ec59d0        14  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
Cannot find symbol in .
Cannot find symbol in .
Cannot find symbol in .
ksedmp()+512         call     9fffffffffff3940     000000000 ?
                                                   C000000000000A17 ?
                                                   40000000025B6540 ?
ksfdmp()+64          call     9fffffffffff3940     000000003 ?
kgerinv()+352        call     9fffffffffff3940     60000000000466B0 ?
                                                   000000003 ?
                                                   C000000000000714 ?
                                                   4000000004EBA6A0 ?
                                                   00001821B ?
                                                   6000000000468EA8 ?
kgesinv()+48         call     9fffffffffff3940     60000000000466B0 ?
                                                   600000000059BD98 ?
                                                   600000000046B070 ?
                                                   60000000000179C0 ?
                                                   6000000000017950 ?
kgesin()+112         call     9fffffffffff3940     60000000000466B0 ?
                                                   600000000059BD98 ?
                                                   4000000000B44C10 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFF4310 ?
$cold_kgavsd_stackl  call     9fffffffffff3940     60000000000466B0 ?
et_done()+1184                                     600000000059BD98 ?
                                                   4000000000B44C10 ?
                                                   000000001 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   00001FE19 ?
pbesd_stacklet_done  call     9fffffffffff3940     60000000000466B0 ?
()+96                                              000000002 ? 000000000 ?
                                                   9FFFFFFFBEC6AE68 ?
pfrrun()+3328        call     9fffffffffff3940     9FFFFFFFBEC6AE68 ?
                                                   C000000000001D42 ?
                                                   4000000001ABBDA0 ?
                                                   9FFFFFFFBEC6B06C ?
                                                   9FFFFFFFFFFF64F0 ?
                                                   6000000000468EA8 ?
                                                   000000000 ? 000000000 ?
peicnt()+480         call     0000000000000000     9FFFFFFFBEC6AEE8 ?
                                                   C000000000000996 ?
                                                   40000000019F3680 ?
kkxexe()+816         call     0000000000000000     9FFFFFFFFFFF64F0 ?
                                                   9FFFFFFFBEC6AE68 ?
                                                   C00000000000099B ?
                                                   4000000001DD90F0 ?
                                                   00000FE4F ?
                                                   9FFFFFFFFFFF5F00 ?
                                                   60000000000467F0 ?
                                                   4000000000B603F0 ?
opiexe()+11168       call     0000000000000000     000000000 ?
                                                   C000000000002C60 ?
                                                   4000000001BEF980 ?
                                                   00000800F ?
                                                   9FFFFFFFFFFF6470 ?
                                                   9FFFFFFFBEC6AEB2 ?
                                                   6000000000040900 ?
                                                   9FFFFFFFBEC6B534 ?
opiall0()+3184       call     _etext_f()+23058430  000000004 ? 0000000C4 ?
                              09110686928          9FFFFFFFFFFF7B40 ?
                                                   C000000000002BDF ?
                                                   4000000001B26CD0 ?
                                                   000000000 ? 00000C893 ?
                                                   9FFFFFFFFFFF6690 ?
Cannot find symbol in .
kpoal8()+2064        call     9fffffffffff7ad0     000000001 ?
                                                   9FFFFFFFFFFF8304 ?
                                                   FFFFFFFFBFFFFFFF ?
                                                   9FFFFFFFFFFF83E4 ?
                                                   FFFFFFFFFFE7FBDF ?
                                                   9FFFFFFFFFFF7B88 ?
                                                   000000000 ?
                                                   6000000000474528 ?
opiodr()+3584        call     9fffffffffff81fc     6000000000040950 ?
                                                   000000000 ? 000000000 ?
                                                   C000000000002C60 ?
                                                   4000000001C09FE0 ?
                                                   00000C50B ?
                                                   9FFFFFFFFFFF81F0 ?
                                                   9FFFFFFFFFFF81D8 ?
ttcpip()+3776        call     _etext_f()+23058430  00000005E ? 000000014 ?
                              09114957288          9FFFFFFFFFFFA5F0 ?
                                                   6000000000040918 ?
                                                   C000000000001ABD ?
                                                   4000000001AB3BA0 ?
                                                   000000000 ? 00000C59B ?
opitsk()+1872        call     9fffffffffffa200     6000000000049FC0 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFFA5F0 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFFA740 ?
                                                   9FFFFFFFFFFFA564 ?
                                                   9FFFFFFFBF780058 ?
                                                   000000000 ?
opiino()+3184        call     000000000000057b     000000000 ? 000000000 ?
                                                   C00000000000132B ?
                                                   4000000001F78730 ?
                                                   000008001 ?
opiodr()+3584        call     0000000000000000     6000000000548A38 ?
                                                   4000000000B606F0 ?
                                                   6000000000548A38 ?
                                                   C000000000002C60 ?
                                                   4000000001C09FE0 ?
                                                   00000A201 ?
                                                   9FFFFFFFFFFFBC90 ?
                                                   4000000000B606F0 ?
opidrv()+976         call     _etext_f()+23058430  00000003C ? 000000004 ?
                              09114957288          9FFFFFFFFFFFEFB0 ?
                                                   6000000000040918 ?
sou2o()+80           call     _etext_f()+23058430  000000004 ? 000000004 ?
                              09114957288          9FFFFFFFFFFFEFB0 ?
main()+352           call     _etext_f()+23058430  9FFFFFFFFFFFEFD0 ?
                              09114957288          9FFFFFFFFFFFEFD4 ?
                                                   60000000004744F0 ?
                                                   9FFFFFFFFFFFEFB0 ?
main_opd_entry()+80  call     _etext_f()+23058430  000000000 ?
                              09114957288          9FFFFFFFFFFFF498 ?
                                                   C000000000000004 ?
                                                   C00000000002BE30 ?
 
--------------------- Binary Stack Dump ---------------------

Process global information:
     process: c000000d6428c0c0, call: c000000e46e772a8, xact: 0000000000000000, 
     curses: c000000d6437d020, usrses: c000000d6437d020
  ----------------------------------------
  SO: c000000d6428c0c0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
  (process) Oracle pid=282, calls cur/top: c000000e46e772a8/c000000e46e772a8, flag: (0) -
            int error: 28, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: c000000d64234f18 1 6
              last post sent: 0 0 104
              last post sent-location: kglpsl: in loop
              last process posted by me: c000000d6428e900 23 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: c000000d62234ee0
    O/S info: user: oracle, term: UNKNOWN, ospid: 797
    OSD pid info: Unix process pid: 797, image: oracle@gzagent2 (TNS V1-V3)
    ----------------------------------------
    SO: c000000d6437d020, type: 4, owner: c000000d6428c0c0, flag: INIT/-/-/0x00
    (session) trans: 0000000000000000, creator: c000000d6428c0c0, flag: (41) USR/- BSY/-/-/-/KIL/-
              DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
              txn branch: 0000000000000000
              oct: 0, prv: 0, sql: c000000e76b03f10, psql: 0000000000000000, user: 31/CUSTOM
    O/S info: user: huangqianhai_lc, term: SVCTAG-D1MLV2X, ospid: 11124:11796, machine: WORKGROUP\SVCTAG-D1MLV2X
              program: plsqldev.exe
    application name: PL/SQL Developer, hash value=1190136663
    action name: 测试窗口 - 新建, hash value=3604520210
    last wait for 'null event' blocking sess=0x0 seq=142 wait_time=567341620
                =ea60, =0, =0
    temporary object counter: 0
      ----------------------------------------

通过这里可以看出来是因为pl/sql dev进行一个plsql的操作导致该错误发生,查询MOS[ID 403575.1]发现

Applies to:

Oracle Server - Enterprise Edition - Version: 9.2.0.7 and later   [Release: 9.2 and later ]
Information in this document applies to any platform.
***Checked for relevance on 27-Oct-2010***
Symptoms

The following errors appears in the alert log file :

Probe:read_pipe: receive failed, status 3
Probe:S:debug_loop: timeout. Action 1

*********START PLSQL RUNTIME DUMP************
***Got ORA-604 while running PLSQL***
***********END PLSQL RUNTIME DUMP************

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], []
ORA-00604: error occurred at recursive SQL level 2

Current SQL statement for this session:
begin :id := sys.dbms_transaction.local_transaction_id; end;


.
Cause

The ora-600 kgavsd_3 appears when calling kgavsd_stacklet_done function which is related to PLSQL DEBUG.

From traces, dbms_debug package is being used during trace generation.
The return code of 3 further indicates that the dbms_pipe message was interrupted.

Probably user cancelled a plsql program, so the appeared while trying to dump the stack

Solution

There is no data corruption over here. 
The error appears to be due to abnormal termination of aPL/SQL Developer application while executing a PL/SQL block.
Changing the PL/SQL and/or the procedure code could help in avoiding this error message.

Hence, this error can be safely ignored.

查找trace文件确实发现有name=SYS.DBMS_DEBUG,进一步表明该错误是由于plsql dev工具使用debug模式运行上面的plsql而引起该错误的发生,但是因为mos中记录和错误不是完全的一致,所以不能十分确定是该错误导致数据库down掉

继续分析ORA-30013

Error:      ORA-30013  (ORA-30013)
Text:      undo tablespace '%s' is currently in use 
---------------------------------------------------------------------------
Cause:    the specified undo tablespace is currently used by another instance. 
Action:    Wait for the undo tablespace to become available or change to another name and reissue the statement.

这个说明是没有疑问的:因为2节点配置的当前undo是UNDOTBS2,而UNDOTBS1是1节点使用的,证明这里的undo确实发生了错误,继续查询mos发现Bug 3368552

Hdr: 3368552 9.2.0.3 RDBMS 9.2.0.3 RAC PRODID-5 PORTID-23
Abstract: RAC:  ORA-30013 WHEN INSTANCE 2 ATTEMPTS TO ACCESS UNDO TABLESPACE OF INSTANCE 1
 
*** 01/12/04 06:21 am ***
TAR:
----
3554549.995
 
PROBLEM:
--------
The RAC database has been stable, but experienced an instance termination due 
to ORA-30031 error in the alert log (instance 2):
 
...
Tue Dec 23 03:01:46 2003
ARC1: Evaluating archive  log 4 thread 2 sequence 1116
ARC1: Beginning to archive log 4 thread 2 sequence 1116
Creating archive destination LOG_ARCHIVE_DEST_1: 
'/oracle/oradata/VLDB/logs/archives/VLDBN2/VLDB_0000001116_0002.arc'
ARC1: Completed archiving  log 4 thread 2 sequence 1116
Tue Dec 23 08:14:09 2003
Errors in file /oracle/admin/VLDB/bdump/vldbn2_pmon_22860.trc:
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Tue Dec 23 08:14:09 2003
PMON: terminating instance due to error 30013
Tue Dec 23 08:14:10 2003
System state dump is made for local instance
Tue Dec 23 08:14:12 2003
Trace dumping is performing id=[cdmp_20031223081410]
Tue Dec 23 08:14:14 2003
Instance terminated by PMON, pid = 22860
<eof>
 
Instance 1 alert log shows only the reconfiguration and the cdump info:
..
Tue Dec 23 03:54:13 2003
Errors in file /oracle/admin/VLDB/udump/vldbn1_ora_13564.trc:
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 4
Tue Dec 23 08:14:10 2003
Trace dumping is performing id=[cdmp_20031223081410]
Tue Dec 23 08:14:12 2003
Reconfiguration started
List of nodes: 0,
Global Resource Directory frozen
one node partition
Communication channels reestablished
...

因为在9.2.0.3的RAC中有着该bug,那么我们可以大胆猜测在9.2.0.2中应该存在该bug,那么结合上面的ORA-00600[kgavsd_3]错误,我们大概还原该事故的全部:
1.节点1 dml操作了程序中报错的plsql中要范围的部分表对象,但是未提交(或者正在执行)
2.节点2 有用户使用pl/sql dev去执行程序中的plsql,因为是debug模式执行,需要UNDOTBS1的块来构建cr,从而使得节点2去访问UNDOTBS1,引发了Bug 3368552 从而使得数据库直接kill掉该plsql dev会话,进而出现ORA-00600[kgavsd_3]错误和pmon进程异常使得节点2 down掉

TXChecker初试

当我们对数据库进行异常恢复,很多时候要选择屏蔽回滚段,但是我们有没有办法来评估屏蔽回滚段到底会对我们的数据库的数据产生多大影响呢?其实我们可以通过TXChecker工具来评估数据库undo异常时候受到影响的数据对象有哪些,从而进一步确定是否真的需要对其undo下手处理.
模拟数据异常事务为提交情况

[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Aug 30 20:32:27 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             100665820 bytes
Database Buffers          209715200 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.
SQL> create table t_xifenfei tablespace system 
  2  as
  3  select * from dba_objects;

Table created.

SQL> create table chf.t_xifenfei tablespace users
  2  as
  3  select * from dba_objects;

Table created.

SQL> delete from t_xifenfei where rownum<10;

9 rows deleted.

SQL> delete from chf.t_xifenfei where rownum<100;

99 rows deleted.

SQL> alter system checkpoint;

System altered.

SQL> shutdow abort;
ORACLE instance shut down.

在system和users表空间分别模拟了sys和chf的t_xifenfei表含事务未提交情况

TXChecker用法说明

[oracle@xifenfei txchecker]$ ./TXChecker 


TXChecker - v1.4 by Center Of Expertise (COE), Oracle Corporation (build 10/16/07)

Usage is: TXChecker [options]
Options:
  -a                     When scanning datafiles (with -d/-f/-l/-t options) report objects using any of the undo
                         segments (not just those with errors) (OPTIONAL)
  -b                     For objects found, print the datablock addresses. See readme for further details (OPTIONAL)
  -c<controlfile_name>   Fully qualified controlfile name to read (MANDATORY)
  -d                     Scan database for active TXs (use when undo not available) (OPTIONAL)
  -f<filename>           Scan the named datafile for active TXs (OPTIONAL)
  -g                     Indicates you want to find all blocks taking part in transactions with
                         a USN > than the USN supplied in -x<XID> parameter (same constraints as -w) (OPTIONAL)
  -l<listfile>           Scan all the datafiles listed in the listfile for active TXs (OPTIONAL)
  -m<minutes>            Number of minutes used to consider a TX as active (1-120) (DEFAULTS TO 60 MINUTES)
  -p                     Show the names and last known status of the UNDO segments (OPTIONAL)
  -s                     Skip read-only or offline normal datafiles (OPTIONAL)
  -t<tablespace>         Scan all the datafiles for this tablespace (OPTIONAL)
  -u                     Report ITL entries active if marked with an upper bound ('U' flag) fast commit SCN
                         instead if active transactions (OPTIONAL)
  -w<wrap#>              Wrap# for XID in ITL entry to report blocks where wrap# > this one (OPTIONAL)
                         Must use -x with this option. See the readme for details
  -x<XID>                XID for transaction wanting to search for (OPTIONAL)
                         Use format rrrr.ssss.wwwwwwww using Hexadecimal numbers
                         See the readme for full instructions on using -x, -w and -g options

NOTE: Options -d/-f/-l/-t are exclusive, and only one should be specified.
[/sql]
<strong>TXChecker初始</strong>
1
[oracle@xifenfei txchecker]$ ./TXChecker -c/u01/oracle/oradata/XFF/control01.ctl -d -a


TXChecker - v1.4 by Center Of Expertise (COE), Oracle Corporation (build 10/16/07)
Database Name: XFF   Version: 10.2.0

*** Database last checkpointed at 08/30/2012 20:34:43 (SCN: 0xa.0x1e142)
*** Using 60 minutes to find most active transactions (-m60)

*** Undo Segments:
USN:   0  Name: SYSTEM          TBS#:  0 File:   1  Block:      9  Instance:  0  SMU: N  Status:  3 - Online 
                                SCN: 0000.00000000   XactSQN: 0x00000000   UndoSQN: 0x00000000
USN:   1  Name: _SYSSMU1$       TBS#:  1 File:   2  Block:      9  Instance:  0  SMU: Y  Status:  3 - Online 
                                SCN: 000a.000191c3   XactSQN: 0x000000d2   UndoSQN: 0x0000013c
…………省略
USN:   9  Name: _SYSSMU9$       TBS#:  1 File:   2  Block:    137  Instance:  0  SMU: Y  Status:  3 - Online 
                                SCN: 000a.000191d8   XactSQN: 0x0000011a   UndoSQN: 0x000000dc
USN:  10  Name: _SYSSMU10$      TBS#:  1 File:   2  Block:    153  Instance:  0  SMU: Y  Status:  3 - Online 
                                SCN: 000a.000191d7   XactSQN: 0x000000c1   UndoSQN: 0x000000d9
…………省略
USN:  20  Name: _SYSSMU20$      TBS#:  5 File:   5  Block:    153  Instance:  0  SMU: Y  Status:  1 - Invalid / Dropped 
                                SCN: 0000.00070ec6   XactSQN: 0x00000002   UndoSQN: 0x00000001

*** Active Transactions:
USN: 6  Name: _SYSSMU6$  File: 2  Block: 89  Instance: 0  Status: 3 - Online
* Active TX at slot 6  #undo blocks: 3  Last bk: 2.90

Obj#:  51938  Name: CHF.T_XIFENFEI                                    Type: TABLE               Undo recs: 99
              Used undo segment IDs: 6 

Obj#:  51937  Name: SYS.T_XIFENFEI                                    Type: TABLE               Undo recs: 9
              Used undo segment IDs: 6 

         File (validate_objects.sql) being created for object validattion already exists
         and will be overwritten!!!
         Do you want to continue overwriting [Y/N]?y

*** Undo segments (headers) that encountered errors preventing Active TX scan:
USN:  11  Name: _SYSSMU11$      File:   5  Block:       9  Instance:  0  Error: 27 - Undo segment was dropped
…………省略
USN:  20  Name: _SYSSMU20$      File:   5  Block:     153  Instance:  0  Error: 27 - Undo segment was dropped

WARNING: Analyzing the full database for active transactions will take some time!

         Are you sure you want to analyze the full database [Y/N]?
         Are you sure you want to analyze the full database [Y/N]?y

*** Scanning database for datablocks that may require undo (PLEASE WAIT...):
*** Asterisk ('*') denotes blocks being updated since 08/08/2012 03:30:32 (SCN: 0x0.0x79607)

Scanning datafile:  3 - /u01/oracle/oradata/XFF/sysaux01.dbf (SYSAUX)                           - Active TX blocks: 147 *
Scanning datafile:  1 - /u01/oracle/oradata/XFF/system01.dbf (SYSTEM)                           - Active TX blocks: 11597 *
--undo表空间跳过
Undo datafile (/u01/oracle/oradata/XFF/undotbs01.dbf) - SKIPPING
Scanning datafile:  4 - /u01/oracle/oradata/XFF/users01.dbf (USERS)                             - Active TX blocks: 2 *
--因为数据文件丢失控制文件中offline的跳过(其实只要数据文件丢失就会跳过)
Cannot access datafile (/u01/oracle/oradata/XFF/xifenfei01.dbf) (error 2 - No such file or directory) - SKIPPING
Cannot access datafile (/u01/oracle/oradata/XFF/xifenfei02.dbf) (error 2 - No such file or directory) - SKIPPING
Scanning datafile:  7 - /u01/oracle/oradata/XFF/xifenfei03.dbf (XIFENFEI2)                      - Active TX blocks: 0

*** Objects that may require undo data:
*** Asterisk ('*') denotes blocks being updated since 08/08/2012 03:30:32 (SCN: 0x0.0x79607)

DataObj#:  51938  Name: CHF.T_XIFENFEI                                    Type: TABLE               Datablocks: 2 *
Used undo segment IDs: 6 
DataObj#:  46434  Name: MDSYS.SYS_IL0000046432C00006$$                    Type: INDEX               Datablocks: 4 
Used undo segment IDs: 2 
DataObj#:    124  Name: SYS.I_ACCESS1                                     Type: INDEX               Datablocks: 27 
Used undo segment IDs: 1 2 3 4 5 6 7 8 9 10 
…………省略
DataObj#:   8824  Name: SYS.SYS_IL0000008822C00008$$                      Type: INDEX               Datablocks: 1 *
Used undo segment IDs: 4 
DataObj#:  51937  Name: SYS.T_XIFENFEI                                    Type: TABLE               Datablocks: 1 *
Used undo segment IDs: 6 
DataObj#:  51557  Name: SYS.UTL_RECOMP_COMPILED                           Type: TABLE               Datablocks: 1 
Used undo segment IDs: 8 
…………省略
DataObj#:  42131  Name: XDB.XDB$ELEMENT_PROPNUMBER                        Type: INDEX               Datablocks: 1 
Used undo segment IDs: 2 

*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
    if the undo required is not available.

Undo Segment Usage Summary
**************************
*** Undo segments identified in use by active transaction datablocks AFTER 08/08/2012 03:30:32 (SCN: 0x0.0x79607):
USN:   1  Name: _SYSSMU1$     
USN:   2  Name: _SYSSMU2$     
USN:   3  Name: _SYSSMU3$     
USN:   4  Name: _SYSSMU4$     
USN:   5  Name: _SYSSMU5$     
USN:   6  Name: _SYSSMU6$     
USN:   9  Name: _SYSSMU9$     

*** Undo segments identified in use by active transacation datablocks BEFORE 08/08/2012 03:30:32 (SCN: 0x0.0x79607):
USN:   1  Name: _SYSSMU1$     
USN:   2  Name: _SYSSMU2$     
USN:   3  Name: _SYSSMU3$     
USN:   4  Name: _SYSSMU4$     
USN:   5  Name: _SYSSMU5$     
USN:   7  Name: _SYSSMU7$     
USN:   8  Name: _SYSSMU8$     
USN:   9  Name: _SYSSMU9$     
USN:  10  Name: _SYSSMU10$    

NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_083012_2042_XFF.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.

验证对象脚本

[oracle@xifenfei txchecker]$ more validate_objects.sql
rem validate_objects.sql - checks strcuture of objects needing unavailable undo data
rem                        Created by findtxns program, Oracle Corporation
set echo on
ANALYZE TABLE CHF.T_XIFENFEI VALIDATE STRUCTURE CASCADE;
ANALYZE INDEX MDSYS.SYS_IL0000046432C00006$$ VALIDATE STRUCTURE;
ANALYZE INDEX SYS.I_ACCESS1 VALIDATE STRUCTURE;
…………省略
ANALYZE INDEX XDB.SYS_C003167 VALIDATE STRUCTURE;
ANALYZE INDEX XDB.XDB$ELEMENT_PROPNAME VALIDATE STRUCTURE;
ANALYZE INDEX XDB.XDB$ELEMENT_PROPNUMBER VALIDATE STRUCTURE;

dbca创建数据库报ORA-00443

今天早上一个朋友和我说他们RAC dbca创建库不成功提示ORA-00443错误,让我帮他们分析下是什么原因导致
提示错误如图

环境状况

OS:LINUX REDHAT x86_64  5.7 
kernel:2.6.18-194.el5
memory:100G
CPU:ntel(R) Xeon(R) CPU E7- 8837  @ 2.67GHz   *  64
ORACLE:10.2.0.4

查看alert日志错误

Wed Sep  5 01:32:33 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=17859
DIAG started with pid=7, OS id=17861
PSP0 started with pid=12, OS id=17863
LMON started with pid=17, OS id=17865
LMD0 started with pid=22, OS id=17867
MMAN started with pid=27, OS id=17869
DBW0 started with pid=32, OS id=17871
Wed Sep  5 01:32:33 2012
Errors in file /u01/app/oracle/admin/dtjcdb/bdump/dtjcdb1_ora_17873.trc:
ORA-00600: internal error code, arguments: [ksbmoveme4], [], [], [], [], [], [], []
ORA-27300: OS system dependent operation:run on node failed with status: 2
ORA-27301: OS failure message: No such file or directory
ORA-27302: failure occurred at: skgpmoveme:1
Wed Sep  5 01:32:34 2012
Trace dumping is performing id=[cdmp_20120905013234]
Wed Sep  5 01:32:34 2012
Process DBW1 died, see its trace file
USER: terminating instance due to error 443
Instance terminated by USER, pid = 17857

trace文件中内容

*** 2012-09-05 01:32:33.996
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ksbmoveme4], [], [], [], [], [], [], []
ORA-27300: OS system dependent operation:run on node failed with status: 2
ORA-27301: OS failure message: No such file or directory
ORA-27302: failure occurred at: skgpmoveme:1
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFC4ABA4C0 ? 7FFFC4ABA520 ?
                                                   7FFFC4ABA460 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFC4ABA4C0 ? 7FFFC4ABA520 ?
                                                   7FFFC4ABA460 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFC4ABA4C0 ? 7FFFC4ABA520 ?
                                                   7FFFC4ABA460 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFFC4ABA4C0 ? 7FFFC4ABA520 ?
                                                   7FFFC4ABA460 ? 000000000 ?
kgesinv()+33         call     kgerinv()            0068966E0 ? 000000000 ?
                                                   7FFFC4ABA520 ? 7FFFC4ABA460 ?
                                                   000000000 ? 000000000 ?
ksesin()+211         call     kgesinv()            0068966E0 ? 000000000 ?
                                                   7FFFC4ABA520 ? 7FFFC4ABA460 ?
                                                   000000000 ? 000000000 ?
ksbmoveme()+350      call     ksesin()             00533D5C8 ? 000000000 ?
                                                   006896FA3 ? 000000001 ?
                                                   000000001 ? 000000000 ?
ksosp_set_current()  call     ksbmoveme()          000000001 ? 000000000 ?
+117                                               006896FA3 ? 000000001 ?
                                                   000000001 ? 000000000 ?
kso_init()+161       call     ksosp_set_current()  151056D10 ? 000000000 ?
                                                   006896FA3 ? 000000001 ?
                                                   000000001 ? 000000000 ?
opirip()+523         call     kso_init()           151056D10 ? 000000000 ?
                                                   006896FA3 ? 000000001 ?
                                                   000000001 ? 000000000 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FFFC4ABC128 ? 000000001 ?
                                                   000000001 ? 000000000 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FFFC4ABC128 ? 000000001 ?
                                                   000000001 ? 000000000 ?
opimai_real()+317    call     sou2o()              7FFFC4ABC100 ? 000000032 ?
                                                   000000004 ? 7FFFC4ABC128 ?
                                                   000000001 ? 000000000 ?
main()+116           call     opimai_real()        000000003 ? 7FFFC4ABC190 ?
                                                   000000004 ? 7FFFC4ABC128 ?
                                                   000000001 ? 000000000 ?
__libc_start_main()  call     main()               000000003 ? 7FFFC4ABC190 ?
+244                                               000000004 ? 7FFFC4ABC128 ?
                                                   000000001 ? 000000000 ?
_start()+41          call     __libc_start_main()  000723088 ? 000000001 ?
                                                   7FFFC4ABC2E8 ? 000000000 ?
                                                   000000001 ? 000000003 ?
 
--------------------- Binary Stack Dump ---------------------

通过查询MOS发现[ID 422908.1]有类此的错误提示,但是该提示说是因为系统重新增加了过多CPU导致数据库crashed掉并且出现 ORA-27300 ORA-27301 ORA-27302 错误.在该案例中,起始就是64c,根据经验在win的10.2.0.4中如果cpu超过32c也是在dbcd创建数据库2%的地方hang住,所以怀疑该错误也是由于cpu太多导致.

处理方法

To solve the problem:

1) apply patch:6471079

- or -

2) apply the 10.2.0.5 (when available)

- or -

3) upgrade to 11g

朋友打上patch:6471079,dbca正常建库

_no_recovery_through_resetlogs参数功能探讨

_no_recovery_through_resetlogs参数默认值和描述

SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  /
Enter value for param: _no_recovery_through_resetlogs
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_no_recovery_through_resetlogs%')

NAME                             VALUE                    DESCRIPTION
-------------------------------- ------------------------ --------------------------------------------
_no_recovery_through_resetlogs   FALSE                    no recovery through this resetlogs operation

大家知道在10gr2版本及其以后版本,大家知道默认情况下,可以实现跨resetlogs恢复数据库.通过该参数的描述可以看出,该参数的用途是使得resetlogs之后不能继续进行恢复(我的理解是以前的备份不能应用resetlogs后的归档日志)
在实际中该函数的作用是否和该参数的描述相符,我们通过试验验证

rman备份数据库

[oracle@xifenfei tmp]$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Thu Aug 30 11:51:49 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: XFF (DBID=3440302261)

RMAN> backup database format '/u01/oracle/oradata/tmp/10g_db_%U';

Starting backup at 30-AUG-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=143 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/oracle/oradata/XFF/system01.dbf
input datafile fno=00003 name=/u01/oracle/oradata/XFF/sysaux01.dbf
input datafile fno=00002 name=/u01/oracle/oradata/XFF/undotbs01.dbf
input datafile fno=00004 name=/u01/oracle/oradata/XFF/users01.dbf
channel ORA_DISK_1: starting piece 1 at 30-AUG-12
channel ORA_DISK_1: finished piece 1 at 30-AUG-12
piece handle=/u01/oracle/oradata/tmp/10g_db_0anjuhve_1_1 tag=TAG20120830T115214 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:35
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: starting piece 1 at 30-AUG-12
channel ORA_DISK_1: finished piece 1 at 30-AUG-12
piece handle=/u01/oracle/oradata/tmp/10g_db_0bnjui2d_1_1 tag=TAG20120830T115214 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 30-AUG-12

resetlogs打开数据库

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             100665820 bytes
Database Buffers          209715200 bytes
Redo Buffers                7118848 bytes
Database mounted.

SQL> recover database until cancel;
Media recovery complete.
SQL> alter database open resetlogs;

Database altered.

创建测试表

SQL> create table t_xifenfei01
  2  as
  3  select * from dba_tables;

Table created.

SQL> create table t_xifenfei02
  2  as
  3  select * from dba_objects;

Table created.

SQL>  alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> create table t_xifenfei03 
  2  as
  3  select * from dba_objects;

Table created.

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

恢复数据库

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

[oracle@xifenfei XFF]$ rm *.dbf
[oracle@xifenfei XFF]$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Thu Aug 30 12:00:47 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database (not started)

RMAN> startup mount

Oracle instance started
database mounted

Total System Global Area     318767104 bytes

Fixed Size                     1267236 bytes
Variable Size                100665820 bytes
Database Buffers             209715200 bytes
Redo Buffers                   7118848 bytes

RMAN> restore database;

Starting restore at 30-AUG-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK


channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /u01/oracle/oradata/XFF/system01.dbf
restoring datafile 00002 to /u01/oracle/oradata/XFF/undotbs01.dbf
restoring datafile 00003 to /u01/oracle/oradata/XFF/sysaux01.dbf
restoring datafile 00004 to /u01/oracle/oradata/XFF/users01.dbf
channel ORA_DISK_1: reading from backup piece /u01/oracle/oradata/tmp/10g_db_0anjuhve_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/u01/oracle/oradata/tmp/10g_db_0anjuhve_1_1 tag=TAG20120830T115214
channel ORA_DISK_1: restore complete, elapsed time: 00:00:55
Finished restore at 30-AUG-12

RMAN> recover database;

Starting recover at 30-AUG-12
using channel ORA_DISK_1

starting media recovery

archive log thread 1 sequence 7 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_7_790743352.dbf
archive log thread 1 sequence 8 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_8_790743352.dbf
archive log thread 1 sequence 9 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_9_790743352.dbf
archive log thread 1 sequence 10 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_10_790743352.dbf
archive log thread 1 sequence 1 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_1_792676624.dbf
archive log thread 1 sequence 2 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_2_792676624.dbf
archive log thread 1 sequence 3 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_3_792676624.dbf
archive log thread 1 sequence 4 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_4_792676624.dbf
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_7_790743352.dbf thread=1 sequence=7
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_8_790743352.dbf thread=1 sequence=8
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_9_790743352.dbf thread=1 sequence=9
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_10_790743352.dbf thread=1 sequence=10
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_1_792676624.dbf thread=1 sequence=1
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_2_792676624.dbf thread=1 sequence=2
media recovery complete, elapsed time: 00:00:06
Finished recover at 30-AUG-12

SQL> alter database open;

Database altered.

SQL> select table_name from user_tables where table_name like 'T_XIFENFEI0_';

TABLE_NAME
------------------------------
T_XIFENFEI01
T_XIFENFEI02
T_XIFENFEI03

证明10gr2确实可以跨resetlogs recover 日志恢复数据库

测试_no_recovery_through_resetlogs参数

SQL> create table t_xifenfei04 as
  2  select * from dba_objects;

Table created.

SQL> alter system set "_no_recovery_through_resetlogs"=true scope=spfile;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  318767104 bytes
Fixed Size                  1267236 bytes
Variable Size             100665820 bytes
Database Buffers          209715200 bytes
Redo Buffers                7118848 bytes
Database mounted.
SQL> recover database until cancel;
Media recovery complete.
SQL> alter database open resetlogs;

Database altered.

SQL> create table t_xifenfei05 
  2  as
  3  select * from dba_objects;

Table created.

SQL> create table t_xifenfei06
  2  as
  3  select * from dba_objects;

Table created.

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

[oracle@xifenfei XFF]$ rm *.dbf
[oracle@xifenfei XFF]$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Thu Aug 30 12:47:40 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database (not started)

RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area     318767104 bytes

Fixed Size                     1267236 bytes
Variable Size                100665820 bytes
Database Buffers             209715200 bytes
Redo Buffers                   7118848 bytes

RMAN> restore database;

Starting restore at 30-AUG-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /u01/oracle/oradata/XFF/system01.dbf
restoring datafile 00002 to /u01/oracle/oradata/XFF/undotbs01.dbf
restoring datafile 00003 to /u01/oracle/oradata/XFF/sysaux01.dbf
restoring datafile 00004 to /u01/oracle/oradata/XFF/users01.dbf
channel ORA_DISK_1: reading from backup piece /u01/oracle/oradata/tmp/10g_db_0anjuhve_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/u01/oracle/oradata/tmp/10g_db_0anjuhve_1_1 tag=TAG20120830T115214
channel ORA_DISK_1: restore complete, elapsed time: 00:00:45
Finished restore at 30-AUG-12

RMAN> recover database;

Starting recover at 30-AUG-12
using channel ORA_DISK_1
starting media recovery

archive log thread 1 sequence 7 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_7_790743352.dbf
archive log thread 1 sequence 8 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_8_790743352.dbf
archive log thread 1 sequence 9 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_9_790743352.dbf
archive log thread 1 sequence 10 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_10_790743352.dbf
archive log thread 1 sequence 1 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_1_792676624.dbf
archive log thread 1 sequence 2 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_2_792676624.dbf
archive log thread 1 sequence 3 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_3_792676624.dbf
archive log thread 1 sequence 4 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_4_792676624.dbf
archive log thread 1 sequence 5 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_5_792676624.dbf
archive log thread 1 sequence 6 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_6_792676624.dbf
archive log thread 1 sequence 1 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_1_792679299.dbf
archive log thread 1 sequence 2 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_2_792679299.dbf
archive log thread 1 sequence 3 is already on disk as file /u01/oracle/oradata/XFF/archivelog/1_3_792679299.dbf
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_7_790743352.dbf thread=1 sequence=7
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_8_790743352.dbf thread=1 sequence=8
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_9_790743352.dbf thread=1 sequence=9
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_10_790743352.dbf thread=1 sequence=10
--第一次 resetlogs后的归档
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_1_792676624.dbf thread=1 sequence=1 
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_2_792676624.dbf thread=1 sequence=2
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_3_792676624.dbf thread=1 sequence=3
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_4_792676624.dbf thread=1 sequence=4
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_5_792676624.dbf thread=1 sequence=5
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_6_792676624.dbf thread=1 sequence=6
--第二次 resetlogs后的归档(设置了_no_recovery_through_resetlogs参数为true并resetlogs后的归档日志
archive log filename=/u01/oracle/oradata/XFF/archivelog/1_1_792679299.dbf thread=1 sequence=1 
media recovery complete, elapsed time: 00:00:13
Finished recover at 30-AUG-12

RMAN> alter database open;

database opened

RMAN> exit


Recovery Manager complete.
[oracle@xifenfei XFF]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Aug 30 12:49:46 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select table_name from user_tables where table_name like 'T_XIFENFEI0_';

TABLE_NAME
------------------------------
T_XIFENFEI01
T_XIFENFEI02
T_XIFENFEI03
T_XIFENFEI04
T_XIFENFEI05
T_XIFENFEI06

6 rows selected.

通过这里的测试证明使用_no_recovery_through_resetlogs=true后,resetlogs之后还是可以正常可以recover相关日志,证明_no_recovery_through_resetlogs参数不是限制这里的resetlogs后的归档日志应用,至于该参数的具体用途也希望知道的朋友告知下。但是这个参数在clone db和从rac db恢复到单实例db的时候,可能因为bug原因需要设置该参数为true,如:

--rac恢复到单实例RMAN Duplicate from RAC backup fails ORA-38856 [ID 334899.1]
sql>alter open database resetlogs;
ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled

控制文件异常导致ORA-00600[kccsbck_first]

今天接到一个朋友求救他们的his系统数据库不能访问,情况比较紧急,让我帮忙处理.登录数据库得到信息如下:

操作系统:windows 2003
数据库:8.1.7
容灾方案:双机+emc存储镜像
备份:数据库无任何备份
启动到mount报错类此:
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [4141358753], [], [], [], [], []

这个问题在上周的数据库恢复中遇到过一次,他们也是因为双机的案例,当时的情况见:双机mount数据库出现ORA-00600[kccsbck_first],有了上次的思维,我开始也怀疑是客户的双机的问题,但是客户说双机在半年前就关闭了,没有启动过;因为我对win的双机不太熟悉,怕他们双机软自动启动系统然后接管oracle从而导致这个问题,然后让客户检查另一台机器,确定没有启动和接管oracle 服务.
然后查询MOS发现win上面的特殊之处:是控制文件corruption导致故障(不过dbv检查不出来),而且三个控制文件有同样的问题
MOS记录如下(不过8.1.7也存在同样问题) [ID 291684.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 9.0.1.5 and later   [Release: 9.0.1 and later ]
Information in this document applies to any platform.
***Checked for relevance on 09-APR-2012***
Symptoms



Alter database mount exclusive results in

ORA-00600: internal error code, arguments: [kccsbck_first], [1], [2141358753], [], [], [], [], []

The description of the error is:

'We receive this error because we are attempting to be the first 
thread/instance to mount the database and cannot because it appears that 
at least one other thread has mounted the database already'.


However in this case the database was a standalone database on Windows.
It had only one oracle service running.

The operating system was rebooted, the oracle service was deleted and a new service created.
Even then the error persisted. 
Cause

There was some corruption present in the controlfile.


Solution

In this case the problem was resolved by:

+ Taking a backup of the old control file

+ Recreating the control file using the following document
How to Recreate a Controlfile	 [Document 735106.1]

因为数据库不能mount,所以不能使用backup controlfile to trace;
因为是win系统,没有任何的控制文件备份,只能把控制文件拷贝到linux下面通过strings命令,自己编辑创建控制文件脚本(noresetlogs).执行脚本创建控制文件,recover database,应用redo文件恢复,然后resetlogs库,恢复成功(注意:8i中不需要另外增加临时文件)