记录8.0.5数据库恢复过程

某银行需要恢复2004年磁带备份数据库 8.0.5,当时因为开发商使用begin backup命令进行热备,然后压缩到磁带上.在硬盘紧张,rman不成熟的时代,使用这样的方法备份本身没有大问题,可是因为没有备份归档日志,给现在的恢复带来了很多的问题.
尝试不完全恢复启动库

SVRMGR> startup pfile='init.ora' mount
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561556 generated at 08/21/04 22:42:48 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_55329.dbf
ORA-00280: change 613561556 for thread 1 is in sequence #55329
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-00600: internal error code, arguments: [4146], [31400], [31370], [], [], [], [], []

这个错误是因为回滚段corruption导致,客户已经没有了一致性要求,所以解决办法是通过一些手段找出回滚段并屏蔽

屏蔽回滚段重启库

SVRMGR> shutdown abort
ORACLE instance shut down.
SVRMGR> startup  pfile='init.ora' mount
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> alter database open;
alter database open
*
ORA-00600: internal error code, arguments: [3668], [1], [2], [55992], [55992], [4], [], []

根据ORA-00600[3668]错误的提示,因为重建控制文件,有一个数据文件因为在磁带恢复丢失,所以控制文件在上次重建的时候没有包含进去,根据经验在8.0中可以通过resetlogs解决

resetlogs重新打开数据库

SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-01139: RESETLOGS option only valid after an incomplete database recovery
SVRMGR> recover datatabase using backup controlfile until cancel;
ORA-00274: illegal recovery option DATATABASE
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561558 generated at 07/30/12 09:14:49 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_1.dbf
ORA-00280: change 613561558 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
alter database open resetlogs
*
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small

根据经验可能是屏蔽了undo或者undo损坏,然后有事务可能需要读undo,无法读到对应记录,可能出现此错误,因为无trace文件,尝试推进scn解决.当然这个问题可以通过open时做10046然后深入分析也许可以找到原因.

open数据库成功
根据上面的分析,重启库,推进scn,resetlogs库解决问题

SVRMGR> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SVRMGR> startup pfile='init.ora' mount;
ORACLE instance started.
Total System Global Area                        141826208 bytes
Fixed Size                                          47264 bytes
Variable Size                                   124829696 bytes
Database Buffers                                 16777216 bytes
Redo Buffers                                       172032 bytes
Database mounted.
SVRMGR> alter database open ;
alter database open 
*
ORA-00600: internal error code, arguments: [3668], [1], [2], [55994], [55994], [4], [], []
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 613561560 generated at 07/30/12 09:17:11 needed for thread 1
ORA-00289: suggestion : /oracle/product/8.0.5/dbs/arch1_1.dbf
ORA-00280: change 613561560 for thread 1 is in sequence #1
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SVRMGR> alter database open resetlogs;
Statement processed.

这次恢复比较幸运,在备份到磁带过程中被覆盖的一个同名的数据文件,因为是index,通过查询dba_extents,然后删除并重建相关index,s数据库算恢复完全.有些时候,对数据库多一份备份,可能成为最后的救命稻草,哪怕是一份不是十分完整的备份.再次重复:对dba而言,数据库备份重于一切

bbed修改undo$(回滚段)状态

指定undo$对应数据块(11g 1_225,10g 1_106)[具体时候需要查询]

BBED> set file 1 block 225
        FILE#           1
        BLOCK#          225

查询数据内容,主要关注kdbr

BBED> map
 File: /u01/oracle/oradata/ora11g/system01.dbf (1)
 Block: 225                                   Dba:0x004000e1
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 48 bytes                     @20      

 struct kdbh, 14 bytes                      @68      

 struct kdbt[1], 4 bytes                    @82      

 sb2 kdbr[31]                               @86      <===表示31条记录,从0开始 

 ub1 freespace[3644]                        @148     

 ub1 rowdata[4396]                          @3792    

 ub4 tailchk                                @8188   

查看30号回滚段内容(列举其中一个)

BBED> p *kdbr[30]
rowdata[0]
----------
ub1 rowdata[0]                              @3792     0x2c

BBED> x /rncnnnnnnn
rowdata[0]                                  @3792    
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17

col    0[2] @3795: 30 
col   1[21] @3798: _SYSSMU30_3379578723$
col    2[2] @3820: 1 
col    3[2] @3823: 3 
col    4[3] @3826: 432 
col    5[1] @3830: 0 
col    6[1] @3832: 0 
col    7[1] @3834: 0 
col    8[1] @3836: 0 
col    9[1] @3838: 0 
col   10[2] @3840: 5     <===修改前对应值undo$.status$
col   11[2] @3843: 6 
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850: 2 

BBED> x /r
rowdata[0]                                  @3792    
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17

col    0[2] @3795:  0xc1  0x1f 
col   1[21] @3798:  0x5f  0x53  0x59  0x53  0x53  0x4d  0x55  0x33  0x30  0x5f 
 0x33  0x33  0x37  0x39  0x35  0x37  0x38  0x37  0x32  0x33  0x24 
col    2[2] @3820:  0xc1  0x02 
col    3[2] @3823:  0xc1  0x04 
col    4[3] @3826:  0xc2  0x05  0x21 
col    5[1] @3830:  0x80 
col    6[1] @3832:  0x80 
col    7[1] @3834:  0x80 
col    8[1] @3836:  0x80 
col    9[1] @3838:  0x80 
col   10[2] @3840:  0xc1  0x06   <===修改前16进制值 
col   11[2] @3843:  0xc1  0x07 
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850:  0xc1  0x03 

修改回滚段状态

BBED> m /x 02 offset 3842    <===注意修改为02
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u01/oracle/oradata/ora11g/system01.dbf (1)
 Block: 225              Offsets: 3842 to 4353           Dba:0x004000e1
------------------------------------------------------------------------
 0202c107 ffffffff 02c1032c 001102c1 


 <32 bytes per line>

BBED> p *kdbr[30]
rowdata[0]
----------
ub1 rowdata[0]                              @3792     0x2c

BBED> x /r
rowdata[0]                                  @3792    
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17

col    0[2] @3795:  0xc1  0x1f 
col   1[21] @3798:  0x5f  0x53  0x59  0x53  0x53  0x4d  0x55  0x33  0x30  0x5f 
 0x33  0x33  0x37  0x39  0x35  0x37  0x38  0x37  0x32  0x33  0x24 
col    2[2] @3820:  0xc1  0x02 
col    3[2] @3823:  0xc1  0x04 
col    4[3] @3826:  0xc2  0x05  0x21 
col    5[1] @3830:  0x80 
col    6[1] @3832:  0x80 
col    7[1] @3834:  0x80 
col    8[1] @3836:  0x80 
col    9[1] @3838:  0x80 
col   10[2] @3840:  0xc1  0x02   <===修改后16进制值
col   11[2] @3843:  0xc1  0x07 
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850:  0xc1  0x03 

BBED>  x /rncnnnnnnn
rowdata[0]                                  @3792    
----------
flag@3792: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@3793: 0x01
cols@3794:   17

col    0[2] @3795: 30 
col   1[21] @3798: _SYSSMU30_3379578723$
col    2[2] @3820: 1 
col    3[2] @3823: 3 
col    4[3] @3826: 432 
col    5[1] @3830: 0 
col    6[1] @3832: 0 
col    7[1] @3834: 0 
col    8[1] @3836: 0 
col    9[1] @3838: 0 
col   10[2] @3840: 1    <===实际展示值undo$.status$
col   11[2] @3843: 6 
col   12[0] @3846: *NULL*
col   13[0] @3847: *NULL*
col   14[0] @3848: *NULL*
col   15[0] @3849: *NULL*
col   16[2] @3850: 2  

1.修改为02(表示的实际值比修改值大1[1:DELETE,2:OFFLINE, 3:ONLINE,4:UNDEFINED,5:NEEDS RECOVERY,6:PARTLY AVAILABLE,其他表示:UNDEFINED])
2.Offset需要在10列漂移量上+2(或者11列偏移量-1)

修改验证值

BBED> sum apply
Check value for File 1, Block 225:
current = 0x9708, required = 0x9708

补充说明
1)通过dbms_metadata.get_ddl分析DBA_ROLLBACK_SEGS可以知道seg$.status$表示含义

2, 'OFFLINE'
3, 'ONLINE',
4, 'UNDEFINED'
5, 'NEEDS RECOVERY',
6, 'PARTLY AVAILABLE'
other, 'UNDEFINED'

2)该方法一般出现在使用隐含参数屏蔽回滚段无效之后使用,一般不到最后,不建议使用该方法,可能引起不可预知的后果
3)使用该方法启动数据库后,需要手工删除seg$中被bbed处理过的回滚段

因非常规操作导致删除表空间提示ORA-01561解决办法

今天测试手工通过bbed修改undo$中回滚段状态(从status$=5修改为1)[NEEDS RECOVERY 修改为 DELETE],然后尝试删除表空间,发现不能删除
删除表空间提示ORA-01561

SQL> drop tablespace undotbs;
drop tablespace undotbs
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified


SQL> drop tablespace undotbs including contents; 
drop tablespace undotbs including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified

查询相关信息

SQL> select ts#,name from v$tablespace;

       TS# NAME
---------- ------------------------------
         0 SYSTEM
         1 SYSAUX
         4 USERS
         6 UNDOTBS
         3 TEMP

SQL> select name,ts#,status$ from undo$;

NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
SYSTEM                                  0          3
_SYSSMU1_3138885392$                    2          1
_SYSSMU2_4228238222$                    2          1
_SYSSMU3_2210742642$                    2          1
_SYSSMU4_1455318006$                    2          1
_SYSSMU5_3787622316$                    2          1
_SYSSMU6_2460248069$                    2          1
_SYSSMU7_1924883037$                    2          1
_SYSSMU8_1909280886$                    2          1
_SYSSMU9_3593450615$                    2          1
_SYSSMU10_2490256178$                   2          1

NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
_SYSSMU11_253524401$                    6          1
_SYSSMU12_842775869$                    6          1
_SYSSMU13_2794767139$                   6          1
_SYSSMU14_2067649841$                   6          1
_SYSSMU15_3270221471$                   6          1
_SYSSMU16_4094338609$                   6          1
_SYSSMU17_709661646$                    6          1
_SYSSMU18_699588262$                    6          1
_SYSSMU19_718640828$                    6          1
_SYSSMU20_3516920665$                   6          1
_SYSSMU21_793796797$                    6          1

NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
_SYSSMU22_3988785920$                   6          1
_SYSSMU23_1828333848$                   6          1
_SYSSMU24_1223218862$                   6          1
_SYSSMU25_2939844199$                   6          1
_SYSSMU26_1317300205$                   6          1
_SYSSMU27_1654033223$                   6          1
_SYSSMU28_3748619502$                   6          1
_SYSSMU29_1868765904$                   6          1
_SYSSMU30_3379578723$                   6          1

31 rows selected.

SQL> select segment_name,status from dba_rollback_segs;

SEGMENT_NAME                   STATUS
------------------------------ ----------------
SYSTEM                         ONLINE

通过这里可以看出,通过bbed的修改,除system回滚段外,其他均已经被标志为delete状态,对于这样的情况,很本能的怀疑是extent或者segment未被清理掉导致

查询EXTENT和SEGMENT

SQL> select SEGMENT_NAME from dba_extents where TABLESPACE_NAME='UNDOTBS';

no rows selected

SQL> select segment_name from dba_segments where  TABLESPACE_NAME='UNDOTBS';

no rows selected

SQL> select count(*) from seg$ where ts#=6;

  COUNT(*)
----------
        10

SQL> select count(*) from seg$ where ts#=2;

  COUNT(*)
----------
         0

SQL> select file#,type# from  seg$ where ts#=6;

     FILE#      TYPE#
---------- ----------
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10
         3         10

10 rows selected.

通过查询我们发现SEG$中含有10条记录,而通过dbms_metadata.get_ddl分析DBA_SEGMENTS是的,得出type为10恰好是TYPE2 UNDO信息.

解决办法
删除掉这些因为手工修改undo$信息导致遗留下来的后遗症对象

SQL> delete from seg$ where ts#=6;

10 rows deleted.

SQL> commit;

Commit complete.

SQL> drop tablespace undotbs ;

Tablespace dropped.

这样的直接修改基表的做法,在一般的情况下非常不建议使用,可能带来系统不稳定.但是在数据库异常处理过程中,可能将成为一个法宝

truncate table强制终止导致ORA-00600[ktspfundo-2]

朋友的金蝶erp系统异常abort,让我帮忙分析原因.
ORA-00600[ktspfundo-2]错误

Fri Jul 27 08:53:33 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:53:33 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:54:16 2012
Errors in file /u01/oracle/admin/finance/udump/finance_ora_7687.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Fri Jul 27 08:57:12 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 08:57:20 2012
ORACLE Instance finance (pid = 15)-Error 600 encountered while recovering transaction (8, 3) on object 34294107.
Fri Jul 27 08:57:20 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 09:07:14 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_smon_4156.trc:
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
Fri Jul 27 09:07:15 2012
Errors in file /u01/oracle/admin/finance/bdump/finance_pmon_4130.trc:
ORA-00474: SMON process terminated with error

从这里可以大概看出数据库在进行一个参数,然后用户终止该操作导致,导致ORA-00600[ktspfundo-2]错误,然后出现smon回滚,因为回滚失败从而使得数据块down掉

分析trace文件

*** 2012-07-27 08:53:33.293
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktspfundo-2], [], [], [], [], [], [], []
ORA-01013: user requested cancel of current operation
Current SQL statement for this session:
TRUNCATE TABLE VTC3B8DR2G7J926FWOBK839XOR
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFF41B0EE70 ? 7FFF41B0EED0 ?
                                                   7FFF41B0EE10 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0068966E0 ? 2AE87C6E1168 ?
                                                   7FFF41B0EED0 ? 7FFF41B0EE10 ?
                                                   000000000 ? 000000000 ?
ktspfundo()+3902     call     kgeasnmierr()        0068966E0 ? 2AE87C6E1168 ?
                                                   7FFF41B0EED0 ? 7FFF41B0EE10 ?
                                                   000000010 ? 00689C0C0 ?
kcoubk()+351         call     ktspfundo()          7FFF41B10810 ? 2AE80C800CFA ?
                                                   4D6EE6014 ? 000000002 ?
                                                   000000010 ? 7FFF41B11128 ?
ktundo()+1208        call     kcoubk()             7FFF41B111F8 ? 7FFF41B10810 ?
                                                   2AE87E384024 ? 000000002 ?
                                                   000000002 ? 000000000 ?
ktubko()+499         call     ktundo()             000000001 ? 010E5C341 ?
                                                   2AE87E384020 ? 000000058 ?
                                                   000008430 ? 7657E9990 ?
ktuabt()+810         call     ktubko()             7657E9990 ? 7FFF41B1188C ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ktcrab()+292         call     ktuabt()             7657E98F8 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ktccle()+516         call     ktcrab()             7657E98F8 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ksepop()+384         call     ktccle()             000000006 ? 000000002 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
kgepop()+123         call     ksepop()             0068966E0 ? 000000006 ?
                                                   000000002 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
kgesev()+315         call     kgepop()             0068966E0 ? 2AE87C6E1168 ?
                                                   0000003F5 ? 7FFF41B11648 ?
                                                   7FFF41B11570 ? 7FFF41B11870 ?
ksesec0()+186        call     kgesev()             0068966E0 ? 2AE87C6E1168 ?
                                                   0000003F5 ? 000000000 ?
                                                   7FFF41B11B30 ? 7FFF41B11870 ?
ksqcmi()+2322        call     ksesec0()            000000000 ? 000000000 ?
                                                   000001000 ? 000000000 ?
                                                   000000013 ? 000000005 ?
ksqcnv()+496         call     ksqcmi()             77E586B88 ? 000000006 ?
                                                   00000FFFF ? 00147AE14 ?
                                                   7FFF41B126A0 ? 7FFF41B128A8 ?
ksqcov()+44          call     ksqcnv()             77E586B88 ? 000000006 ?
                                                   000000000 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kcbo_reuse_obj()+14  call     ksqcov()             77E586B68 ? 000000006 ?
09                                                 000000000 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kcbrbo()+1126        call     kcbo_reuse_obj()     7FFF41B12F04 ? 7FFF41B12F0C ?
                                                   000000001 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
ktsstrn_segment()+3  call     kcbrbo()             7FFF41B12F04 ? 7FFF41B12F0C ?
941                                                000000001 ? 00147AE14 ?
                                                   7FFF41B128A8 ? 000000004 ?
kkbtts_trunc_tbl_se  call     ktsstrn_segment()    7FFF41B13180 ? 000000000 ?
g()+1018                                           0020C6444 ? 000000000 ?
                                                   7FFF41B14C00 ? 7FFF00000001 ?
kkbtrn()+8156        call     kkbtts_trunc_tbl_se  735ACA058 ? 77BC78D18 ?
                              g()                  000000000 ? 000000002 ?
                                                   000000000 ? 7FFF41B14C00 ?
opiexe()+15805       call     kkbtrn()             735ACA058 ? 000000000 ?
                                                   718831208 ? 000000000 ?
                                                   000000002 ? 7FFF00000000 ?
opiosq0()+3316       call     opiexe()             000000004 ? 000000000 ?
                                                   7FFF41B15F48 ? 00000000B ?
                                                   000000002 ? 7FFF00000000 ?
kpooprx()+315        call     opiosq0()            000000003 ? 00000000E ?
                                                   7FFF41B160B8 ? 0000000A4 ?
                                                   000000002 ? 7FFF00000000 ?
kpoal8()+799         call     kpooprx()            7FFF41B19264 ? 7FFF41B17280 ?
                                                   000000029 ? 000000001 ?
                                                   000000000 ? 7FFF00000000 ?
opiodr()+984         call     kpoal8()             00000005E ? 000000017 ?
                                                   7FFF41B19260 ? 000000001 ?
                                                   000000001 ? 7FFF00000000 ?
ttcpip()+1012        call     opiodr()             00000005E ? 000000017 ?
                                                   7FFF41B19260 ? 000000000 ?
                                                   0059C09B0 ? 7FFF00000000 ?
opitsk()+1322        call     ttcpip()             00689E3B0 ? 7FFF41B17248 ?
                                                   7FFF41B19260 ? 000000000 ?
                                                   7FFF41B18D58 ? 7FFF41B193C8 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFF41B19260 ? 000000001 ?
                                                   000000000 ? 4E58D8C00000001 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   000000000 ? 4E58D8C00000001 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFF41B1A428 ? 000000000 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
opimai_real()+163    call     sou2o()              7FFF41B1A400 ? 00000003C ?
                                                   000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
main()+116           call     opimai_real()        000000002 ? 7FFF41B1A490 ?
                                                   000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
__libc_start_main()  call     main()               000000002 ? 7FFF41B1A490 ?
+244                                               000000004 ? 7FFF41B1A428 ?
                                                   0059C0460 ? 4E58D8C00000001 ?
_start()+41          call     __libc_start_main()  000723088 ? 000000002 ?
                                                   7FFF41B1A5E8 ? 000000000 ?
                                                   0059C0460 ? 000000002 ?
 
--------------------- Binary Stack Dump ---------------------

这里可以得到更加准确的信息,数据库在truncate table的时候,有人异常终止程序,导致数据库出现ORA-00600[ktspfundo-2].查询mos未发现相关bug记录,从这些信息初步判断是因为oracle的bug导致在truncate表的时候异常终止,然后出现该对象上的回滚记录异常(当然truncate本身不需要回滚,但是可能记录一些附带的回滚信息),然后出现对象回滚异常是的数据库down.重启数据库,对象回滚段信息已经自动回滚完成,数据库正常.因为truncate表被异常终止的情况本身不多见,引发该bug更不常见,如果只是偶尔发生一次,建议忽略该错误.当然如果有时间和兴趣,可以提交sr

数据库报ORA-00607/ORA-00600[4194]错误

昨天晚上处理一起比较奇特的ORA-00600[4194]错误的数据库恢复案例,客户数据库刚刚上线,因为一时疏忽没有做备份.谁知天有不测风云,就这样的系统也会出问题(数据库文件总共 5g redo log sequence#=9).这个事故告诉我们:作为dba在任何时候都不要有侥幸心理,备份重于一切
数据库报ORA-00607/ORA-00600[4194]错误

Thu Jul 26 13:21:11 2012
SMON: enabling cache recovery
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Thu Jul 26 13:21:11 2012
Doing block recovery for file 1 block 18
Block recovery from logseq 3994, block 3 to scn 89979535
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery stopped at EOT rba 3994.5.16
Block recovery completed at rba 3994.5.16, scn 0.89979533
Doing block recovery for file 1 block 9
Block recovery from logseq 3994, block 3 to scn 89979532
Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/mispdata/redo01.log
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/mispdata/udump/mispdata_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过alert日志中,我们可以发现是因为ORA-00600[4194]导致数据库不能被正常open,但是这次不同的是在报ORA-00600之前有ORA-00607的错误出现,根据这个提示,应该是一个基本的数据块有问题导致.而ORA-00600[4194]是因为undo和redo不一致导致.对于本错误放在一起分析,大概的评估是因为内部对象的异常出现ora-607,导致undo和redo不一致出现ORA-00600[4194].

trace文件分析

--dump redo
DUMP OF REDO FROM FILE '/orasvr/mispdata/redo02.log'
 Opcodes *.*
 DBAs (file#, block#):
      (1, 18)
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 169870080=0xa200300
	Db ID=658120234=0x273a1e2a, Db Name='MISPDATA'
	Activation ID=658142762=0x273a762a
	Control Seq=16668=0x411c, File size=102400=0x19000
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000003992, SCN 0x0000055c5e3c-0x0000055cac62"
 thread: 1 nab: 0x5 seq: 0x00000f98 hws: 0x6 eot: 0 dis: 0
 resetlogs count: 0x2d42646a scn: 0x0000.00000001 (1)
 resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 prev resetlogs count: 0x0 scn: 0x0000.00000000
 prev resetlogs terminal rcv count: 0x0 scn: 0x0000.00000000
 Low  scn: 0x0000.055c5e3c (89939516) 07/26/2012 11:17:42
 Next scn: 0x0000.055cac62 (89959522) 07/26/2012 13:16:19
 Enabled scn: 0x0000.00000001 (1) 08/16/2011 11:50:10
 Thread closed scn: 0x0000.055cac61 (89959521) 07/26/2012 11:17:42
 Disk cksum: 0x3088 Calc cksum: 0x3088
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 End-of-redo stream : No
 Unprotected mode
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000

--ORA-00600错误提示
*** 2012-07-26 13:21:11.566
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1

--ora-607
Error 607 in redo application callback
TYP:0 CLS:16 AFN:1 DBA:0x00400012 OBJ:4294967295 SCN:0x0000.0551610e SEQ:  1 OP:5.1
ktudb redo: siz: 256 spc: 7892 flg: 0x0012 seq: 0x003d rec: 0x02
            xid:  0x0000.026.00000035  
ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0
Undo type:  Regular undo        Begin trans    Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
             0x00000000  prev ctl uba: 0x00400012.003d.01 
prev ctl max cmt scn:  0x0000.0550709b  prev tx cmt scn:  0x0000.0550709c 
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4194318  prev bcl: 0 KDO undo record:
KTB Redo 
op: 0x04  ver: 0x01  
op: L  itl: xid:  0x0000.01e.00000035 uba: 0x00400012.003d.01
                      flg: C---    lkc:  0     scn: 0x0000.05511296
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x0040006a  hdba: 0x00400069
itli: 1  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [ 9]  5f 53 59 53 53 4d 55 31 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 03
col  4: [ 2]  c1 0a
col  5: [ 5]  c4 5a 12 5a 14
col  6: [ 1]  80
col  7: [ 4]  c3 08 5f 3d
col  8: [ 4]  c3 02 38 52
col  9: [ 1]  80
col 10: [ 2]  c1 04
col 11: [ 2]  c1 02
col 16: [ 2]  c1 02
Block after image is corrupt: 
buffer tsn: 0 rdba: 0x00400012 (1/18)
scn: 0x0000.0551610e seq: 0x01 flg: 0x04 tail: 0x610e0201
frmt: 0x02 chkval: 0x65f8 type: 0x02=KTU UNDO BLOCK

这里信息比较多:
1.dump redo部分得到file 1 block 18块可能异常
2.ora-600部分可以得出数据库在执行undo$对象update的回滚操作时候报错
3.通过ora-607信息得到update undo$记录对应的数据块是file 1 block 106(dba 0x00400069),在相同数据库版本数据库中查询.也就是说undo$这个回滚段回滚的时候出现错误.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 106
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 106 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
UNDO$
TABLE              SYSTEM

4.发现dba 0x00400012发现坏块是file 1 block 18,查询坏块对象为

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 18
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 18 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SYSTEM
ROLLBACK           SYSTEM

通过这里的分析,大概可以确定这次故障的原因:
因为ROLLBACK(file 1 block 18)坏块,redo 恢复undo 出现异常出现ORA-607,使得undo和redo不一致从而出现ORA-00600[4194],导致undo$(file 1 block 106)中的一条update事务不能被正常提交或者回滚,从而使得该数据库不能被正常打开.
针对这个库因为ROLLBACK异常,使用隐含参数无法屏蔽该回滚段,因为这个数据量非常小,我们选择了挖数据文件.如果数据量比较大,可以通过bbed尝试提交undo$(file 1 block 106)数据块中事务,看人品是否能够正常启动.