某集团ebs数据库redo undo丢失导致悲剧

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上,而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到,因此出现该错误,解决方法是通过修改数据scn,让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法:
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要

ORACLE 8.1.7 数据库ORA-600 4194故障恢复

一个817数据库报ORA-600 4194 无法正常启动

Fri Jul 25 10:49:47 2014
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT
Fri Jul 25 10:49:58 2014
ALTER DATABASE RECOVER  database  
Fri Jul 25 10:49:58 2014
Media Recovery Start
Media Recovery Log 
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Media Recovery Complete
Completed: ALTER DATABASE RECOVER  database  
Fri Jul 25 10:50:09 2014
alter database open

Beginning crash recovery of 1 threads
Fri Jul 25 10:50:09 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Fri Jul 25 10:50:09 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Fri Jul 25 10:50:09 2014
Thread 1 advanced to log sequence 3321
Thread 1 opened at log sequence 3321
  Current log# 3 seq# 3321 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Fri Jul 25 10:50:09 2014
SMON: enabling cache recovery
Fri Jul 25 10:50:10 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA03216.TRC:
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []

Fri Jul 25 10:50:10 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3321 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Fri Jul 25 10:50:10 2014
SMON: disabling cache recovery
Fri Jul 25 10:50:10 2014
ORA-600 signalled during: alter database open

ORA-600 4194这个错误在数据库异常恢复中非常常见,因为库不是很重要,因此就是直接屏蔽掉故障回滚段,然后强制拉库,该库的恢复过程中,也直接使用隐含参数屏蔽回滚段
_corrupted_rollback_segments= RBS0, RBS1, RBS2, RBS3, RBS4, RBS5, RBS6, RBS_HDSYS,数据库依然无法open,进一步分析trace文件

Fri Jul 25 11:26:07 2014
ORACLE V8.1.7.0.0 - Production vsnsta=0
vsnsql=e vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: orcl

Redo thread mounted by this instance: 1

Oracle process number: 14

Windows thread id: 3648, image: ORACLE.EXE


*** SESSION ID:(11.1) 2014-07-25 11:26:07.843
*** 2014-07-25 11:26:07.843
ksedmp: internal or fatal error
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,
scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12 where us#=:1
----- Call Stack Trace -----

这里很明显看出来,数据库是在open过程中,update undo$表遭遇到ORA-600 4194,因为该过程需要使用系统回滚段,但是由于其所对应的undo和redo信息不一致,所以无法正常启动数据库.继续读trace文件

  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      space2: 0      #extents: 5      #blocks: 49    
                  last map  0x00000000  #maps: 0      offset: 4128  
      Highwater::  0x00400006  ext#: 0      blk#: 3      ext size: 9     
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 0     
  mapblk  0x00000000  offset: 0     
                   Unlocked
     Map Header:: next  0x00000000  #extents: 5    obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400003  length: 9     
   0x0040000c  length: 10    
   0x0040008f  length: 10    
   0x00400099  length: 10    
   0x004000a3  length: 10    
  
  TRN CTL:: seq: 0x003c chd: 0x004e ctl: 0x0050 inc: 0x00000000 nfb: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00400006.003c.25 scn: 0x0000.009a4009
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.003c.24 ext: 0x0  spc: 0x196   
    uba: 0x00000000.001f.14 ext: 0x1  spc: 0x16f6  
    uba: 0x00000000.0018.02 ext: 0x4  spc: 0x1f1a  
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0     
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0     
  TRN TBL::

通过这里可以看出来,数据库在启动的时候,使用system undo的block为为0x00400006,使用bbed清除掉该uba记录,让数据库启动的时候重新分配system undo block给数据库执行update undo$使用,数据库open成功

BBED> m /x 0x00000000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4188 to 4192           Dba:0x00000000
------------------------------------------------------------------------
 00000000 3c002400 00009601 00000000 1f001400 0100f616 00000000 18000200

BBED> m /x 0x0000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4028 to 4032           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00000000 3c005000 02800100 68000000 feffff7f 06004000 3c002400
Sat Jul 26 12:09:21 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3326 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Sat Jul 26 12:09:21 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Sat Jul 26 12:09:22 2014
Thread 1 advanced to log sequence 3327
Thread 1 opened at log sequence 3327
  Current log# 3 seq# 3327 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Sat Jul 26 12:09:22 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Jul 26 12:09:39 2014
Completed: alter database open

数据库启动ORA-08103故障恢复

数据库在open过程报ORA-08103错误导致数据库无法正确启动

Fri Jul 18 22:02:51 2014
SMON: enabling tx recovery
Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\udump\kemu3_ora_29788.trc:
ORA-00604: ?? SQL ?? 1 ????
ORA-08103: ??????

Fri Jul 18 22:02:51 2014
Database Characterset is ZHS16GBK
Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists

Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists

Fri Jul 18 22:02:51 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists

Fri Jul 18 22:02:52 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists

replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=18, OS id=29876
Fri Jul 18 22:02:53 2014
Errors in file d:\oracle\product\10.2.0\admin\kemu3\bdump\kemu3_smon_29704.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists

Fri Jul 18 22:02:54 2014
ORA-604 signalled during: alter database open...

对数据库启动过程做10046

PARSING IN CURSOR #22 len=210 dep=2 uid=0 oct=3 lid=0 tim=20960424464 hv=864012087 ad='3063f0b4'
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, 
maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
END OF STMT
EXEC #22:c=0,e=80,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=3,tim=20960424461
WAIT #22: nam='db file sequential read' ela= 5452 file#=1 block#=60213 blocks=1 obj#=4586 tim=20960429962
FETCH #22:c=0,e=5967,p=1,cr=1,cu=0,mis=0,r=0,dep=2,og=3,tim=20960430462
*** KEWUXS - encountered error: (ORA-00604: 递归 SQL 级别 2 出现错误
ORA-08103: 对象不再存在
)  
*** kewrwdbi_1: Error=13515 encountered during run_once
BINDS #21:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0a217744  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(20) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=0a217718  bln=32  avl=20  flg=05
  value="WRI$_ADV_DEFINITIONS"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=0a2176f4  bln=24  avl=02  flg=05
  value=1

这里很明显数据库启动过程,由于hist_head$的file 1 block 60213中的object_id 与 data_object_id 不匹配,从而出现ORA-08103错误,导致数据库无法正常启动,这里的故障的对象为hist_head$,非oracle核心对象,因此直接标记该block 为坏块(模拟普通ORA-08103并解决,模拟极端ORA-08103并解决,rman制造坏块,bbed修复坏块,bbed破坏数据文件),然后启动数据库,备份hist_head$表数据,然后truncate hist_head$,再插入hist_head$,整体完工.
在数据库open过程中,如果遇到ora-8103错误,导致数据库无法正常open,可以对其做10046定位到故障block和对象,然后判断对象是否数据库启动必须的对象,甚至是bootstarp$中对象,然后采取不同的处理方法.

分享一次ORA-01113 ORA-01110故障处理过程

数据库报ORA-01113 ORA-01110无法正常open

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'

alert日志

Wed Jul 16 17:17:56 2014
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1380870980
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jul 16 17:19:01 2014
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc  (incident=367575):
ORA-00700: soft internal error, arguments: [dbgrmblcp_corrupt_page], [D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\metadata\HM_FINDING.ams], 
[1245], [], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_367575\orcl_m000_3540_i367575.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc  (incident=367576):
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [1245], [0], [80], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_367576\orcl_m000_3540_i367576.trc
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_m000_3540.trc:
ORA-51106: check failed to complete due to an error.  See error below
ORA-00600: internal error code, arguments: [dbgrmblgp_get_page_1], [1245], [0], [80], [], [], [], [], [], [], [], []
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'

这里出现ORA-00700[dbgrmblcp_corrupt_page]与ORA-00600[dbgrmblgp_get_page_1]是Bug 10321285 : ORA-600 [DBGRMBLCP_CORRUPT_PAGE]

尝试recover database 遭遇ORA-00283 ORA-16433

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

alert日志信息

Wed Jul 16 17:36:33 2014
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Wed Jul 16 17:36:33 2014
Media Recovery failed with error 16433
Slave exiting with ORA-283 exception
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_pr00_3868.trc:
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Jul 16 17:36:45 2014
alter database open
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_3236.trc:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...

重建控制文件后恢复,遭遇ORA-600 2662错误

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.

Total System Global Area      10255212544 bytes
Fixed Size                        2264088 bytes
Variable Size                  4529849320 bytes
Database Buffers               5704253440 bytes
Redo Buffers                     18845696 bytes
SQL> @d:\ctl.sql

Control file created.

SQL> recover database using backup controlfile;
ORA-00279: change 13953689551797 generated at 07/16/2014 17:53:41 needed for
thread 1
ORA-00289: suggestion :
D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\ORCL\ARCHIVELOG\2014_07_16\O1_MF_1_1_%U_

.ARC
ORA-00280: change 13953689551797 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
d:\app\administrator\oradata\orcl\redo01.log
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [3248], [3635774399],
[3249], [3580470066], [4194545], [], [], [], [], [], []
Process ID: 3932
Session ID: 200 Serial number: 1

因为该数据库是11.2.0.3未打上scn patch,直接推进scn(可以使用event,隐含参数,oradebug,bbed等),数据库就正常open

ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]故障处理

数据库不能启动(或者启动后马上crash),alert日志报错ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]等错误

Tue Jul 08 23:36:06 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1409 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 2855, block 3
Recovery of Online Redo Log: Thread 1 Group 5 Seq 2855 Reading mem 0
  Mem# 0: /backup/oradata/ztmdb/redo05.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 2855, block 2822, scn 104627804
 0 data blocks read, 0 data blocks written, 1409 redo k-bytes read
Tue Jul 08 23:36:12 2014
Thread 1 advanced to log sequence 2856 (thread open)
Thread 1 opened at log sequence 2856
  Current log# 1 seq# 2856 mem# 0: /backup/oradata/ztmdb/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jul 08 23:36:13 2014
SMON: enabling cache recovery
[15126] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:10340864 end:10340964 diff:100 (1 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_smon_15100.trc  (incident=62061):
ORA-00600: internal error code, arguments: [17182], [0x2B6DFD23D7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62061/ztmdb_smon_15100_i62061.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_ora_15126.trc  (incident=62093):
ORA-00600: internal error code, arguments: [25027], [0], [875836979], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62093/ztmdb_ora_15126_i62093.trc
Block recovery from logseq 2856, block 98 to scn 104627948
Recovery of Online Redo Log: Thread 1 Group 1 Seq 2856 Reading mem 0
  Mem# 0: /backup/oradata/ztmdb/redo01.log
Block recovery completed at rba 2856.99.16, scn 0.104627949
ORACLE Instance ztmdb (pid = 32) - Error 600 encountered while recovering transaction (14, 2) on object 15113.
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_smon_15100.trc  (incident=62066):
ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B6DFD23D790], [], [], [], [], [], [], [], [], [], []
ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x2B6DFD23D7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62066/ztmdb_smon_15100_i62066.trc
PMON (ospid: 14978): terminating the instance due to error 474
Tue Jul 08 23:36:26 2014
System state dump requested by (instance=1, osid=14978 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_diag_14996_20140708233626.trc
Dumping diagnostic data in directory=[cdmp_20140708233626], requested by (instance=1, osid=14978 (PMON)), summary=[abnormal instance termination].
Tue Jul 08 23:36:27 2014
Instance terminated by PMON, pid = 14978

这里的错误比较明显,数据库进行完前滚后,smon进行回滚发现有部分block CORRUPTED 导致该问题,解决该问题思路就是:
1.设置event 屏蔽回滚
2.如果1不行,设置回滚段相关隐含参数,屏蔽回滚
3.逻辑方式重建数据库

补充知识点
ORA-600 [25027]

ERROR:
  Format: ORA-600 [25027] [a] [b]

VERSIONS:
  versions 9.2 and above


ARGUMENTS:
  Arg [a]  Tablespace Number (TSN)
  Arg [b]  Decimal Relative Data Block Address (RDBA)