oracle exadata force open

这个库的恢复有一些历史故事(【力荐】Exadata火线救援:10TB级数据修复经典案例详解!):xx运营商x2的1/4配置的oracle exadata机器,跑了近6年,最近有一个cell节点主机异常,在rebalance过程中,只有两个节点的cell其中一个节点坏了一个硬盘导致.导致asm diskgroup无法正常mount,最后该运营商运维三方通过amdu把该一体机中的数据文件全部抽出来,然后在恢复过程中出现大量错误无法解决,请求我们支持
数据库open过程报ORA-01555错误

Thu Jul  14 00:01:04 2016
alter database open
Thu Jul  14 00:01:04 2016
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /data/amdu/redo/DATA_EC_260.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul  14 00:01:05 2016
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0b26.9f080238):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_59546.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 83 with name "_SYSSMU83_1078760807$" too small
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_59546.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 83 with name "_SYSSMU83_1078760807$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 59546): terminating the instance due to error 704
Instance terminated by USER, pid = 59546
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (59546) as a result of ORA-1092

这个错误比较常见,可以通过推scn就可以解决,由于已经安装了scn patch,通过oradebug推scn解决该问题.

ORA-600 4194
这个ora 600 4194相对比较特殊,在SMON: enabling cache recovery之后立马报出来,然后实例直接open失败.

Thu Jul  14 00:06:15 2016
alter database open
Thu Jul  14 00:06:15 2016
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /data/amdu/redo/DATA_EC_263.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul  14 00:06:15 2016
SMON: enabling cache recovery
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc  (incident=1080450):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 3, block 3 to scn 12269096776739
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_263.f
Block recovery stopped at EOT rba 3.5.16
Block recovery completed at rba 3.5.16, scn 2856.2670179361
Block recovery from logseq 3, block 3 to scn 12269096776736
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_263.f
Block recovery completed at rba 3.5.16, scn 2856.2670179361
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 60038): terminating the instance due to error 600
Instance terminated by USER, pid = 60038
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (60038) as a result of ORA-1092

trace文件分析
打开数据库报ORA-600[4194]错误,对启动过程进行10046跟踪并且分析trace文件发现

PARSING IN CURSOR #140375370511672 len=148 dep=1 uid=0 oct=6 lid=0 tim=3501342849457766 hv=3540833987 
ad='a47df47a8' sqlid='5ansr7r9htpq3'
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,
scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE #140375370511672:c=27996,e=28041,p=66,cr=224,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=3501342849457765
BINDS #140375370511672:
 Bind#0
  oacdty=01 mxl=32(20) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=a47e093ca  bln=32  avl=20  flg=09
  value="_SYSSMU1_2856534670$"
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b92b0  bln=24  avl=03  flg=05
  value=1024
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9280  bln=24  avl=03  flg=05
  value=128
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9248  bln=24  avl=02  flg=05
  value=5
 Bind#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9218  bln=24  avl=02  flg=05
  value=1
 Bind#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b91e8  bln=24  avl=03  flg=05
  value=3398
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b91b8  bln=24  avl=05  flg=05
  value=1485261
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9180  bln=24  avl=06  flg=05
  value=1946693999
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8ec8  bln=24  avl=03  flg=05
  value=2847
 Bind#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e98  bln=24  avl=02  flg=05
  value=1
 Bind#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e68  bln=24  avl=02  flg=05
  value=2
 Bind#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e38  bln=24  avl=02  flg=05
  value=2
 Bind#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b92e0  bln=22  avl=02  flg=05
  value=1
WAIT #140375370511672: nam='db file sequential read' ela= 21 file#=1 block#=179020 blocks=1 obj#=0 tim=3501342849459353

*** 2016-07-14 03:14:09.548
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

很明显数据库是在update undo$的时候,读取到file 1 block 179020的时候报错,继续分析trace文件

Error 600 in redo application callback
Dump of change vector:
TYP:0 CLS:16 AFN:1 DBA:0x0042bb4c OBJ:4294967295 SCN:0x0b26.a88e815e SEQ:1 OP:5.1 ENC:0 RBL:0
ktudb redo: siz: 272 spc: 4906 flg: 0x0012 seq: 0x0f4c rec: 0x0d
            xid:  0x0000.01b.00000b63  
ktubl redo: slt: 27 rci: 0 opc: 11.1 [objn: 15 objd: 15 tsn: 0]
Undo type:  Regular undo        Begin trans    Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
             0x00000000  prev ctl uba: 0x0042bb4c.0f4c.0c 
prev ctl max cmt scn:  0x0b23.ccb9eb89  prev tx cmt scn:  0x0b23.ccb9ebac 
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4373321  prev bcl: 0 BuExt idx: 0 flg2: 0
KDO undo record:
KTB Redo 
op: 0x04  ver: 0x01  
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0000.01c.00000b65 uba: 0x0042bb4a.0f4c.1d
                      flg: C---    lkc:  0     scn: 0x0b25.7391ee5c
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x004000e1  hdba: 0x004000e0
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [20]  5f 53 59 53 53 4d 55 31 5f 32 38 35 36 35 33 34 36 37 30 24
col  2: [ 2]  c1 02
col  3: [ 3]  c2 0b 19
col  4: [ 3]  c2 02 1d
col  5: [ 6]  c5 14 2f 46 28 64
col  6: [ 3]  c2 1d 30
col  7: [ 5]  c4 02 31 35 3e
col  8: [ 3]  c2 22 63
col  9: [ 2]  c1 02
col 10: [ 2]  c1 04
col 11: [ 2]  c1 03
col 16: [ 2]  c1 03
Block after image is corrupt: 
buffer tsn: 0 rdba: 0x0042bb4c (1/179020)
scn: 0x0b26.a88e815e seq: 0x01 flg: 0x04 tail: 0x815e0201
frmt: 0x02 chkval: 0xf022 type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000094E07A000 to 0x000000094E07C000
94E07A000 0000A202 0042BB4C A88E815E 04010B26  [....L.B.^...&...]
94E07A010 0000F022 004E0000 00000B5B 1D1D0F4C  [".....N.[...L...]

我们知道在file 1 block 179020的时候redo和undo信息不匹配,出现了上述的ORA 600 4194的错误.进一步分析

Block image after block recovery:
buffer tsn: 0 rdba: 0x00400080 (1/128)
scn: 0x0b26.6389e19d seq: 0x01 flg: 0x04 tail: 0xe19d0e01
frmt: 0x02 chkval: 0x7c95 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000094DAB2000 to 0x000000094DAB4000
94DAB2000 0000A20E 00400080 6389E19D 04010B26  [......@....c&...]
94DAB2010 00007C95 00000000 00000000 00000000  [.|..............]
94DAB2020 00000000 00000015 000002FF 00001020  [............ ...]
94DAB2030 0000000D 0000004C 00000080 0042BB4C  [....L.......L.B.]


  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 21     #blocks: 767   
                  last map  0x00000000  #maps: 0      offset: 4128  
      Highwater::  0x0042bb4c  ext#: 13     blk#: 76     ext size: 128   
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 0     
  mapblk  0x00000000  offset: 13    
                   Unlocked
     Map Header:: next  0x00000000  #extents: 21   obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400081  length: 7     
   0x00413a38  length: 8     
   0x00400088  length: 8     
   0x00413a30  length: 8     
   0x0042b888  length: 8     
   0x0042b890  length: 8     
   0x0042b898  length: 8     
   0x0042b8a0  length: 8     
   0x0042b8a8  length: 8     
   0x0042b8b0  length: 8     
   0x0042b8b8  length: 8     
   0x0042b8c0  length: 8     
   0x0042ba80  length: 128   
   0x0042bb00  length: 128   
   0x0042bc00  length: 128   
   0x0042bc80  length: 128   
   0x0042bb80  length: 128   
   0x00400210  length: 8     
   0x00400218  length: 8     
   0x00400220  length: 8     
   0x00400228  length: 8     
  
  TRN CTL:: seq: 0x0f4c chd: 0x001b ctl: 0x0043 inc: 0x00000000 nfb: 0x0001
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x0042bb4c.0f4c.0c scn: 0x0b23.ccb9eb89
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x0042bb4c.0f4c.0c ext: 0xd  spc: 0x132a  
    uba: 0x00000000.0f4c.0c ext: 0xd  spc: 0x12f6  
    uba: 0x00000000.0f4c.01 ext: 0xd  spc: 0x1ec8  
    uba: 0x00000000.0f4c.04 ext: 0xd  spc: 0x1b86  
    uba: 0x00000000.0f4c.09 ext: 0xd  spc: 0x162c  
  TRN TBL::
 
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num
  ------------------------------------------------------------------------------------------------
   0x00    9    0x00  0x0b62  0x0011  0x0b25.2dacaf61  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x01    9    0x00  0x0b64  0x0024  0x0b24.a6a2cf7b  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x02    9    0x00  0x0b65  0x0036  0x0b25.7391eda0  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x03    9    0x00  0x0b4f  0x0007  0x0b24.337bf49b  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x04    9    0x00  0x0b64  0x0051  0x0b23.ff22c637  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x05    9    0x00  0x0b64  0x0022  0x0b26.4393eb1e  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x06    9    0x00  0x0b66  0x0058  0x0b24.335c794d  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x07    9    0x00  0x0b4f  0x001d  0x0b24.4e05f2af  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x08    9    0x00  0x0b65  0x005e  0x0b23.ff22c618  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x09    9    0x00  0x0b5f  0x0035  0x0b24.337bf3d9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0a    9    0x00  0x0b64  0x004f  0x0b25.7391ee5f  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x0b    9    0x00  0x0b64  0x0040  0x0b24.335c7bd7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0c    9    0x00  0x0b65  0x0002  0x0b25.7391e929  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x0d    9    0x00  0x0b4f  0x0033  0x0b24.a6a2caa5  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x0e    9    0x00  0x0b65  0x0008  0x0b23.ff22c616  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0f    9    0x00  0x0b61  0x0038  0x0b26.6389e195  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x10    9    0x00  0x0b4f  0x002a  0x0b24.bcff18b3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x11    9    0x00  0x0b5c  0x0059  0x0b25.2dacaf69  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x12    9    0x00  0x0b65  0x0026  0x0b25.2dacb0a8  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x13    9    0x00  0x0b66  0x0021  0x0b24.a6a2caaa  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x14    9    0x00  0x0b62  0x0009  0x0b24.337bf3d7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x15    9    0x00  0x0b63  0x0031  0x0b25.1b4e13ba  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x16    9    0x00  0x0b66  0x003b  0x0b25.2dacee5d  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x17    9    0x00  0x0b63  0x0034  0x0b26.6389e199  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x18    9    0x00  0x0b5d  0x002f  0x0b24.bcff18af  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x19    9    0x00  0x0b5b  0x004d  0x0b24.d60f78e3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x1a    9    0x00  0x0b60  0x005b  0x0b25.1b4e13be  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x1b    9    0x00  0x0b62  0x003e  0x0b23.ccb9ebac  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x1c    9    0x00  0x0b65  0x000a  0x0b25.7391ee5c  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x1d    9    0x00  0x0b64  0x002c  0x0b24.4e05f2b1  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x1e    9    0x00  0x0b64  0x0045  0x0b24.33255ae9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x1f    9    0x00  0x0b64  0x0015  0x0b25.1b4e13b5  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x20    9    0x00  0x0b63  0x0050  0x0b24.335c79a4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x21    9    0x00  0x0b5d  0x0001  0x0b24.a6a2caf0  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x22    9    0x00  0x0b65  0x000f  0x0b26.4393eb20  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x23    9    0x00  0x0b62  0x0042  0x0b24.337bf3d2  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x24    9    0x00  0x0b65  0x003c  0x0b24.a6a2d137  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x25    9    0x00  0x0b62  0x0020  0x0b24.335c795d  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x26    9    0x00  0x0b63  0x0052  0x0b25.2dacee48  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x27    9    0x00  0x0b4e  0x003a  0x0b25.7391ee58  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x28    9    0x00  0x0b65  0x0049  0x0b25.2dacb089  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x29    9    0x00  0x0b61  0x0030  0x0b24.bcff18bb  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x2a    9    0x00  0x0b65  0x0057  0x0b24.bcff18b5  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x2b    9    0x00  0x0b64  0x0054  0x0b25.2dacee55  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2c    9    0x00  0x0b61  0x000d  0x0b24.4e05f2b3  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2d    9    0x00  0x0b64  0x000e  0x0b23.ff22c611  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x2e    9    0x00  0x0b65  0x0053  0x0b25.7391e78a  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2f    9    0x00  0x0b66  0x0010  0x0b24.bcff18b1  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x30    9    0x00  0x0b63  0x0019  0x0b24.d60f78e1  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x31    9    0x00  0x0b65  0x001a  0x0b25.1b4e13bc  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x32    9    0x00  0x0b65  0x0037  0x0b24.a6a2d369  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x33    9    0x00  0x0b4f  0x0013  0x0b24.a6a2caa7  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x34    9    0x00  0x0b63  0x0043  0x0b26.6389e19b  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x35    9    0x00  0x0b65  0x0046  0x0b24.337bf409  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x36    9    0x00  0x0b52  0x0061  0x0b25.7391eda2  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x37    9    0x00  0x0b64  0x0018  0x0b24.bcff18ad  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x38    9    0x00  0x0b65  0x0017  0x0b26.6389e197  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x39    9    0x00  0x0b62  0x0006  0x0b24.335c7947  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3a    9    0x00  0x0b64  0x001c  0x0b25.7391ee5a  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3b    9    0x00  0x0b4d  0x002e  0x0b25.7391e730  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3c    9    0x00  0x0b65  0x0032  0x0b24.a6a2d144  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x3d    9    0x00  0x0b65  0x0016  0x0b25.2dacee59  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3e    9    0x00  0x0b63  0x002d  0x0b23.ff22c60b  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x3f    9    0x00  0x0b63  0x005f  0x0b24.335c7b57  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x40    9    0x00  0x0b65  0x0044  0x0b24.335c7bd9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x41    9    0x00  0x0b65  0x0029  0x0b24.bcff18b9  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x42    9    0x00  0x0b61  0x0014  0x0b24.337bf3d4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x43    9    0x00  0x0b5b  0xffff  0x0b26.6389e19d  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x44    9    0x00  0x0b4c  0x004b  0x0b24.335c7bdb  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x45    9    0x00  0x0b61  0x0039  0x0b24.335c7945  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x46    9    0x00  0x0b65  0x005a  0x0b24.337bf421  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x47    9    0x00  0x0b65  0x0027  0x0b25.7391eda8  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x48    9    0x00  0x0b62  0x004e  0x0b24.335c7953  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x49    9    0x00  0x0b63  0x0012  0x0b25.2dacb09f  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x4a    9    0x00  0x0b63  0x0023  0x0b24.335c7bf8  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4b    9    0x00  0x0b5b  0x004a  0x0b24.335c7bf3  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4c    9    0x00  0x0b63  0x003f  0x0b24.335c7b55  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4d    9    0x00  0x0b61  0x001f  0x0b25.1b4e13b3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x4e    9    0x00  0x0b5a  0x0025  0x0b24.335c795b  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4f    9    0x00  0x0b5f  0x005c  0x0b25.8ff588bb  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x50    9    0x00  0x0b63  0x004c  0x0b24.335c79a7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x51    9    0x00  0x0b64  0x005d  0x0b24.0c84cbf4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x52    9    0x00  0x0b65  0x002b  0x0b25.2dacee49  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x53    9    0x00  0x0b64  0x000c  0x0b25.7391e927  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x54    9    0x00  0x0b63  0x003d  0x0b25.2dacee56  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x55    9    0x00  0x0b64  0x0005  0x0b26.4393eada  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x56    9    0x00  0x0b64  0x0055  0x0b26.4393ead3  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x57    9    0x00  0x0b60  0x0041  0x0b24.bcff18b7  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x58    9    0x00  0x0b65  0x0060  0x0b24.335c794f  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x59    9    0x00  0x0b60  0x0028  0x0b25.2dacaf74  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x5a    9    0x00  0x0b61  0x0003  0x0b24.337bf499  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x5b    9    0x00  0x0b62  0x0000  0x0b25.1b4e13c0  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x5c    9    0x00  0x0b62  0x0056  0x0b25.950a6e4b  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x5d    9    0x00  0x0b63  0x001e  0x0b24.33255ae7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x5e    9    0x00  0x0b5f  0x0004  0x0b23.ff22c635  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x5f    9    0x00  0x0b64  0x000b  0x0b24.335c7bd5  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x60    9    0x00  0x0b61  0x0048  0x0b24.335c7950  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x61    9    0x00  0x0b65  0x0047  0x0b25.7391eda6  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
KQRCMT: Write failed with error=600 po=0xa47e092c0 cid=3
diagnostics : cid=3 hash=35e74caf flag=2a
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

到这里我们基本上明白了报错的file 1 block 179020是由FREE BLOCK POOL分配出来的,现在解决给问题的思路就是直接使用bbed分配一个新块即可.

ORA-600 6711
数据库无法正常open报ORA 600 6711错误

Thu Jul  14 04:04:28 2016
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 2, block 3
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_260.f
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 2, block 5, scn 12269633687653
 0 data blocks read, 0 data blocks written, 1 redo k-bytes read
Thu Jul  14 04:04:29 2016
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /data/amdu/redo/DATA_EC_263.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
After successful startup of the database, please remove
the parameters _allow_error_simulation and _smu_debug_mode
and restart the database
Thu Jul  14 04:04:29 2016
SMON: enabling cache recovery
Undo initialization finished serial:0 start:270626334 end:270626544 diff:210 (2 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Corrected file 19 plugged in read-only status in control file
Corrected file 81 plugged in read-only status in control file
Corrected file 88 plugged in read-only status in control file
Corrected file 93 plugged in read-only status in control file
Corrected file 128 plugged in read-only status in control file
Corrected file 130 plugged in read-only status in control file
Corrected file 131 plugged in read-only status in control file
Corrected file 163 plugged in read-only status in control file
Corrected file 181 plugged in read-only status in control file
Corrected file 184 plugged in read-only status in control file
Corrected file 186 plugged in read-only status in control file
Corrected file 191 plugged in read-only status in control file
Corrected file 214 plugged in read-only status in control file
Corrected file 220 plugged in read-only status in control file
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to ZHS16GBK
WARNING: event 8105 is set. This event disables failed
         online index [re]build cleanup
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Thu Jul  14 04:04:30 2016
QMNC started with pid=57, OS id=3549 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3401.trc  (incident=1440450):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440450/xifenfei_ora_3401_i1440450.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Error 600 in kwqmnpartition(), aborting txn 
Thu Jul  14 04:04:32 2016
Dumping diagnostic data in directory=[cdmp_20801214040432], requested by (instance=1, osid=3401), summary=[incident=1440450].
Thu Jul  14 04:04:32 2016
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_mmon_3329.trc  (incident=1440178):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440178/xifenfei_mmon_3329_i1440178.trc
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3401.trc  (incident=1440451):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440451/xifenfei_ora_3401_i1440451.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_mmon_3329.trc  (incident=1440179):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440179/xifenfei_mmon_3329_i1440179.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: alter database open...

数据库正常open失败,但是可以upgrade启动.根据对trace文件的分析,定位到问题在HISTGRM$表上面,进一步分析该表结构为

CREATE TABLE SYS.HISTGRM$
(
  OBJ#      NUMBER,
  COL#      NUMBER,
  ROW#      NUMBER,
  BUCKET    NUMBER,
  ENDPOINT  NUMBER,
  INTCOL#   NUMBER,
  EPVALUE   VARCHAR2(1000 BYTE),
  SPARE1    NUMBER,
  SPARE2    NUMBER
)
CLUSTER SYS.C_OBJ#_INTCOL#(OBJ#, INTCOL#);

这个和ORA-600 6711错误相匹配了

ERROR:
ORA-600 [6711] [a] [b] 1 [d]
VERSIONS:
versions 6.0 to 12.1
DESCRIPTION:
This error is generated when we find more blocks on a cluster key
chain than are supposed to be there.
Usually this indicates that the chain contains a loop within itself. We
cannot have more than 65535 blocks in a chain.
ARGUMENTS:
Arg [a] beginning DBA
Arg [b] table slot number (in table index)
Arg {c} dba of key next in chain
Arg [d] row slot of key next in chain

需要处理该错误,也就是需要处理CLUSTER SYS.C_OBJ#_INTCOL#,这个可以通过重建来实现,但是当重建之时发生

ORA-00600: internal error code, arguments: [kkoipt:invalid aptyp], [0], [0], [], [], [], [], [], [], [], [], []
ORA-08102: index key not found, obj# 39, file 1, block 1374829 (2)

这里错误比较明显obj#=39为obj$的i_obj4的index的记录和表不匹配,导致任何ddl无法执行,因此如果要处理C_OBJ#_INTCOL#就必须要先处理i_obj4的问题.通过一些技巧重建i_obj4,然后重建C_OBJ#_INTCOL#,数据库终于可以正常打开.由于大量数据字典不一致,exp/expdp导出依旧有问题,通过dblink直接拉数据到新库,完成本次恢复
补充说明:1. 在这个库的恢复过程中,我们还使用了大量的event和隐含参数,因为比较常规而且不涉及核心环节,因为未列举出来;2. 由于当时操作记录未能够保留日志因此相关操作步骤无法贴出来,本文只能提供恢复处理思路
再次提醒各位数据库做好备份,做好巡检工作,哪怕是强大的Oracle exadata也禁不起无备份折腾,数据重于一切

recreate crontrolfile lost datafile—-MISSING0000N

在Oracle职业生涯中,恢复过生产环境数据库也有几百个.对于Oracle恢复我还是相当的自信,今天因为自己的一时过于自信,对于环境错了错误的判断,简单问题复杂化,差点变成悲剧
数据库最初故障

Thu Sep 25 09:27:26 2014
MMON started with pid=15, OS id=1968 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = F:\oracle
Thu Sep 25 09:27:26 2014
ALTER DATABASE   MOUNT
Thu Sep 25 09:27:26 2014
MMNL started with pid=16, OS id=5976 
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_4624.trc:
ORA-00202: ????: ''F:\ORACLE\ORADATA\ORCL\CONTROL01.CTL
ORA-27070: ????/????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Thu Sep 25 09:28:31 2014
ORA-204 signalled during: ALTER DATABASE   MOUNT...

因为硬件或者系统层面问题,导致控制文件无法正常访问

重建控制文件

Fri Sep 26 12:28:44 2014
Successful mount of redo thread 1, with mount id 1387065723
Completed: CREATE CONTROLFILE REUSE DATABASE "orcl" RESETLOGS ARCHIVELOG
    MAXLOGFILES 5
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 2
    MAXLOGHISTORY 226
LOGFILE
  GROUP 1 'F:\oracle\oradata\orcl\REDO01.LOG'  SIZE 50M,  --redo log ????
  GROUP 2 'F:\oracle\oradata\orcl\REDO02.LOG'  SIZE 50M,  --redo log ????
  GROUP 3 'F:\oracle\oradata\orcl\REDO03.LOG'  SIZE 50M  --redo log ????
-- STANDBY LOGFILE
DATAFILE
  'F:\oracle\oradata\orcl\SYSAUX01.DBF',  --sysaux???????
  'F:\oracle\oradata\orcl\SYSTEM01.DBF',
  'F:\oracle\oradata\orcl\USERS01.DBF', --user????????
  'F:\oracle\oradata\orcl\UNDOTBS01.DBF' --undo???????
CHARACTER SET ZHS16GBK
Fri Sep 26 12:29:55 2014
alter database open resetlogs
ORA-1194 signalled during: alter database open resetlogs...

埋下了雷,创建控制文件中未全部列举出来所有数据文件

进行不完全恢复,尝试resetlogs库发现redo异常

Fri Sep 26 14:13:24 2014
ALTER DATABASE   MOUNT
Fri Sep 26 14:13:24 2014
MMNL started with pid=16, OS id=9024 
Successful mount of redo thread 1, with mount id 1387037444
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Fri Sep 26 14:14:08 2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Fri Sep 26 14:15:16 2014
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 2049 计数 6143 出错
ORA-00312: 联机日志 1 线程 1: 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Fri Sep 26 14:16:24 2014
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 1 计数 8191 出错
ORA-00312: 联机日志 1 线程 1: 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00333: 重做日志读取块 1 计数 8191 出错
ARCH: All Archive destinations made inactive due to error 333

使用隐含参数尝试拉库,报ORA-600[2662]

Fri Sep 26 14:16:45 2014
SMON: enabling cache recovery
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc  (incident=57761):
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Incident details in: f:\oracle\diag\rdbms\orcl\orcl\incident\incdir_57761\orcl_ora_3720_i57761.trc
Fri Sep 26 14:16:45 2014
ARC3 started with pid=23, OS id=9692 
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_3720.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [2662], [0], [38221304], [0], [38352371], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 3720): terminating the instance due to error 704
Instance terminated by USER, pid = 3720
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (3720) as a result of ORA-1092

数据库在未使用所有数据文件的情况下,进行了resetlogs操作,悲剧的本质已经注定,我的失误是没有评估好现状,还继续在错误的道路上越走越远.

我开始接手该库现况

Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Fri Sep 26 14:18:55 2014
alter database open
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open...
Fri Sep 26 14:19:31 2014
alter database open 
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open ...
Fri Sep 26 14:22:26 2014
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Fri Sep 26 14:22:26 2014
Media Recovery failed with error 16433
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...
Fri Sep 26 14:24:25 2014
ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'  
Media Recovery Start
Media Recovery failed with error 16433
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'  ...
Fri Sep 26 14:28:47 2014
alter database open read write
Errors in file f:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_8968.trc:
ORA-01113: 文件 1 需要介质恢复
ORA-01110: 数据文件 1: 'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF'
ORA-1113 signalled during: alter database open read write...
Fri Sep 26 14:31:48 2014
ALTER DATABASE RECOVER  datafile 'F:\oracle\oradata\orcl\SYSTEM01.DBF'  
Media Recovery Start
Media Recovery failed with error 16433
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 'F:\oracle\oradata\orcl\SYSTEM01.DBF'  ...

提示ORA-01110: 数据文件 1需要恢复,尝试recover操作

尝试recover操作

连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> recover database ;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.


SQL> alter database backup controlfile to trace as 'd:\ctl.txt';
alter database backup controlfile to trace as 'd:\ctl.txt'
*
第 1 行出现错误:
ORA-16433: 必须以读/写模式打开数据库。


SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

重建控制文件

SQL> shutdown immediate;
ORA-01109: 数据库未打开


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT
ORACLE 例程已经启动。

Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE orcl NORESETLOGS FORCE LOGGING ARCHIVELOG

  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 2921
  7  LOGFILE
  8    GROUP 1 'F:\ORACLE\ORADATA\ORCL\REDO01.LOG'  SIZE 50M,
  9    GROUP 2 'F:\ORACLE\ORADATA\ORCL\REDO02.LOG'  SIZE 50M,
 10    GROUP 3 'F:\ORACLE\ORADATA\ORCL\REDO03.LOG'  SIZE 50M
 11  DATAFILE
 12    'F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF',
 13    'F:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF',
 14    'F:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF',
 15    'F:\ORACLE\ORADATA\ORCL\USERS01.DBF'
 16  CHARACTER SET ZHS16GBK
 17  ;

控制文件已创建。

这一步严重发错,在恢复前未认真看alert日志,太依赖v$datafile查询出来结果,导致重建控制文件丢失数据文件,埋下大雷。根据前面alert日志报错ORA-600 2662,决定一并处理该问题,然后进行恢复

SQL> shutdown immediate;
ORA-01109: ??????


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup pfile='d:\pfile.txt'  mount;
ORACLE 例程已经启动。

Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [4194], [], [

数据库报ORA-600 4194,直接修改undo_management=manual,然后尝试启动数据库

SQL> conn / as sysdba
已连接到空闲例程。
SQL> startup pfile='d:\pfile.txt'
ORACLE 例程已经启动。

Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
数据库装载完毕。
数据库已经打开。
SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------

F:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF
F:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF
F:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF
F:\ORACLE\ORADATA\ORCL\USERS01.DBF
F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00005
F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00006

已选择6行。

SQL> alter database rename file 'F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00005' to 'F:\oracle\oradata\SOURCE_DATA1.DBF';

数据库已更改。

SQL> alter database rename file 'F:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\MISSING00006' to 'F:\oracle\oradata\SOURCE_idx1.DBF';

数据库已更改。

SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup mount pfile='d:\pfile.txt'
ORACLE 例程已经启动。

Total System Global Area  970895360 bytes
Fixed Size                  1375452 bytes
Variable Size             603980580 bytes
Database Buffers          360710144 bytes
Redo Buffers                4829184 bytes
数据库装载完毕。
SQL> alter datafile 5 online;
alter datafile 5 online
      *
第 1 行出现错误:
ORA-00940: 无效的 ALTER 命令


SQL> alter database datafile 5 online;

数据库已更改。

SQL> alter database datafile 6 online;

数据库已更改。

SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 5 belongs to an orphan incarnation
ORA-01110: data file 5: 'F:\ORACLE\ORADATA\SOURCE_DATA1.DBF'


SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-01139: RESETLOGS 选项仅在不完全数据库恢复后有效


SQL> alter database datafile 6 offline;

数据库已更改。

SQL> alter database datafile 5 offline;

数据库已更改。

SQL> recover database until cancel;
完成介质恢复。
SQL> alter database datafile 6 online;

数据库已更改。

SQL> alter database datafile 5 online;

数据库已更改。

SQL> alter database open resetlogs;

数据库已更改。

还好结合一些隐含参数侥幸恢复成功,差点到了要使用bbed的程度

这次的恢复告诉我:Oracle数据库恢复千万比大意,需要认真分析alert日志和咨询客户做了那些操作,不然可能导致万劫不复之禁地

又一起存储故障导致ORA-00333 ORA-00312恢复

数据库启动报ORA-00333 ORA-00312错误,无法正常open数据库

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048

Thu Aug 07 10:42:03  2014
ARC0: All Archive destinations made inactive due to error 333
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc:
ORA-00340: IO error processing online log 3 of thread 1
ORA-00345: redo log write error block 65238 count 13
ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 10:42:03  2014
LGWR: terminating instance due to error 340
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread 

Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread 

Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread 


Thu Aug 07 17:40:05  2014
ALTER DATABASE OPEN
Thu Aug 07 17:40:05  2014
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Thu Aug 07 17:40:06  2014
Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错

ORA-333 signalled during: ALTER DATABASE OPEN...

进一步检查发现在7月6日系统就已经报io异常

Sun Jul 06 10:05:23  2014
ARC0: All Archive destinations made inactive due to error 333
Sun Jul 06 10:06:07  2014
KCF: write/open error block=0xd03 online=1
     file=3 F:\ORADATA\SZCG\SYSAUX01.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Automatic datafile offline due to write error on
file 3: F:\ORADATA\SZCG\SYSAUX01.DBF
Sun Jul 06 10:06:23  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 10:36:54  2014
ARC1: All Archive destinations made inactive due to error 333
Thu Aug 07 10:37:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc:
ORA-01135: file 3 accessed for DML/query is offline
ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'

检查硬件发现raid一块盘完全损坏,另外一块盘也处于告警状态,保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
sysaux-block


因为redo完全损坏,使用工具跳过坏块,拷贝相关有坏块文件到其他目录,重命名相关文件尝试启动数据库,依然报ORA-00333 ORA-00312

Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错

ORA-333 signalled during: ALTER DATABASE OPEN...

设置隐含参数_allow_resetlogs_corruption,尝试强制拉库

Started redo scan
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Fri Aug 08 12:13:25  2014
Aborting crash recovery due to error 333
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错

ORA-333 signalled during: ALTER DATABASE OPEN...
Fri Aug 08 12:13:45  2014
ALTER DATABASE RECOVER  database until cancel  
Fri Aug 08 12:13:45  2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Aug 08 12:13:55  2014
ALTER DATABASE RECOVER    CANCEL  
Fri Aug 08 12:13:59  2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Fri Aug 08 12:13:59  2014
ALTER DATABASE RECOVER CANCEL 
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Fri Aug 08 12:14:12  2014
alter database open resetlogs
Fri Aug 08 12:14:13  2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Fri Aug 08 12:54:11  2014
alter tablespace sysaux offline
Fri Aug 08 12:54:11  2014
ORA-1109 signalled during: alter tablespace sysaux offline...
Fri Aug 08 13:05:30  2014
alter database open
Fri Aug 08 13:05:30  2014
ORA-1589 signalled during: alter database open...

在offline过程中,数据库检查到sysaux数据文件为offline状态,当表空间只有一个数据文件,而且该数据文件为offline,数据库将会尝试offline sysaux表空间,但是发现该表空间文件非正常scn,无法offline 表空间,导致resetlogs操作失败。这里是操作失误应该先online相关数据文件,然后再进行resetlogs操作

Sat Aug 09 11:56:03  2014
alter database datafile 3 online
Sat Aug 09 11:56:04  2014
Completed: alter database datafile 3 online
Sat Aug 09 11:56:08  2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Sat Aug 09 11:56:18  2014
ARCH: Encountered disk I/O error 19502
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 

Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 

ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001'
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式

Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 77983856
Resetting resetlogs activation ID 3562192628 (0xd452bef4)
Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared
Sat Aug 09 11:56:22  2014
Setting recovery target incarnation to 3
Sat Aug 09 11:56:23  2014
Assigning activation ID 3602586269 (0xd6bb1a9d)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=5900
Sat Aug 09 11:56:23  2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=5776
Sat Aug 09 11:56:24  2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG
Successful open of redo thread 1
Sat Aug 09 11:56:24  2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 09 11:56:24  2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Sat Aug 09 11:56:24  2014
ARC0: Becoming the heartbeat ARCH
Sat Aug 09 11:56:24  2014
SMON: enabling cache recovery
Sat Aug 09 11:56:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []

Sat Aug 09 11:56:26  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []

Sat Aug 09 11:56:26  2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 4516
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662这个错误很熟悉,直接推SCN,数据库open,但是报ORA-600 4194

Sat Aug 09 12:01:28  2014
SMON: enabling cache recovery
Dictionary check complete
Sat Aug 09 12:01:32  2014
SMON: enabling tx recovery
Sat Aug 09 12:01:32  2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=6116
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validating controlfile with logical metadata
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validation complete
Sat Aug 09 12:01:34  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []

Sat Aug 09 12:01:36  2014
Doing block recovery for file 2 block 319
Resuming block recovery (PMON) for file 2 block 319
Block recovery from logseq 2, block 56 to scn 1073742003
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery stopped at EOT rba 2.79.16
Block recovery completed at rba 2.79.16, scn 0.1073742002
Doing block recovery for file 2 block 153
Resuming block recovery (PMON) for file 2 block 153
Block recovery from logseq 2, block 56 to scn 1073741986
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery completed at rba 2.66.16, scn 0.1073741988
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-01595: error freeing extent (4) of rollback segment (10))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []

Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []

Sat Aug 09 12:01:36  2014
Completed: alter database open

尝试重建undo表空间并切换undo_tabspace到新undo表空间解决,因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中,所幸的是只有redo和sysaux文件损坏,如果是业务数据文件或者system数据文件损坏,恢复的后果可能更加麻烦,丢失数据可能更加多。再次说明:数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上

ORACLE 8.1.7 数据库ORA-600 4194故障恢复

一个817数据库报ORA-600 4194 无法正常启动

Fri Jul 25 10:49:47 2014
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT
Fri Jul 25 10:49:58 2014
ALTER DATABASE RECOVER  database  
Fri Jul 25 10:49:58 2014
Media Recovery Start
Media Recovery Log 
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Media Recovery Complete
Completed: ALTER DATABASE RECOVER  database  
Fri Jul 25 10:50:09 2014
alter database open

Beginning crash recovery of 1 threads
Fri Jul 25 10:50:09 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3320 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Fri Jul 25 10:50:09 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Fri Jul 25 10:50:09 2014
Thread 1 advanced to log sequence 3321
Thread 1 opened at log sequence 3321
  Current log# 3 seq# 3321 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Fri Jul 25 10:50:09 2014
SMON: enabling cache recovery
Fri Jul 25 10:50:10 2014
Errors in file D:\oracle\admin\ORCL\udump\ORA03216.TRC:
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []

Fri Jul 25 10:50:10 2014
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3321 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Fri Jul 25 10:50:10 2014
SMON: disabling cache recovery
Fri Jul 25 10:50:10 2014
ORA-600 signalled during: alter database open

ORA-600 4194这个错误在数据库异常恢复中非常常见,因为库不是很重要,因此就是直接屏蔽掉故障回滚段,然后强制拉库,该库的恢复过程中,也直接使用隐含参数屏蔽回滚段
_corrupted_rollback_segments= RBS0, RBS1, RBS2, RBS3, RBS4, RBS5, RBS6, RBS_HDSYS,数据库依然无法open,进一步分析trace文件

Fri Jul 25 11:26:07 2014
ORACLE V8.1.7.0.0 - Production vsnsta=0
vsnsql=e vsnxtr=3
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Oracle8i Release 8.1.7.0.0 - Production
JServer Release 8.1.7.0.0 - Production
Windows 2000 Version 5.2 Service Pack 2, CPU type 586
Instance name: orcl

Redo thread mounted by this instance: 1

Oracle process number: 14

Windows thread id: 3648, image: ORACLE.EXE


*** SESSION ID:(11.1) 2014-07-25 11:26:07.843
*** 2014-07-25 11:26:07.843
ksedmp: internal or fatal error
ORA-00600: ??????????: [4194], [12], [37], [], [], [], [], []
Current SQL statement for this session:
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,
scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12 where us#=:1
----- Call Stack Trace -----

这里很明显看出来,数据库是在open过程中,update undo$表遭遇到ORA-600 4194,因为该过程需要使用系统回滚段,但是由于其所对应的undo和redo信息不一致,所以无法正常启动数据库.继续读trace文件

  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      space2: 0      #extents: 5      #blocks: 49    
                  last map  0x00000000  #maps: 0      offset: 4128  
      Highwater::  0x00400006  ext#: 0      blk#: 3      ext size: 9     
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 0     
  mapblk  0x00000000  offset: 0     
                   Unlocked
     Map Header:: next  0x00000000  #extents: 5    obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400003  length: 9     
   0x0040000c  length: 10    
   0x0040008f  length: 10    
   0x00400099  length: 10    
   0x004000a3  length: 10    
  
  TRN CTL:: seq: 0x003c chd: 0x004e ctl: 0x0050 inc: 0x00000000 nfb: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x00400006.003c.25 scn: 0x0000.009a4009
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.003c.24 ext: 0x0  spc: 0x196   
    uba: 0x00000000.001f.14 ext: 0x1  spc: 0x16f6  
    uba: 0x00000000.0018.02 ext: 0x4  spc: 0x1f1a  
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0     
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0     
  TRN TBL::

通过这里可以看出来,数据库在启动的时候,使用system undo的block为为0x00400006,使用bbed清除掉该uba记录,让数据库启动的时候重新分配system undo block给数据库执行update undo$使用,数据库open成功

BBED> m /x 0x00000000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4188 to 4192           Dba:0x00000000
------------------------------------------------------------------------
 00000000 3c002400 00009601 00000000 1f001400 0100f616 00000000 18000200

BBED> m /x 0x0000
 File: D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF (0)
 Block: 2                Offsets: 4028 to 4032           Dba:0x00000000
------------------------------------------------------------------------
 00000000 00000000 3c005000 02800100 68000000 feffff7f 06004000 3c002400
Sat Jul 26 12:09:21 2014
Thread recovery: start rolling forward thread 1
Recovery of Online Redo Log: Thread 1 Group 2 Seq 3326 Reading mem 0
  Mem# 0 errs 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG
Sat Jul 26 12:09:21 2014
Thread recovery: finish rolling forward thread 1
Thread recovery: 0 data blocks read, 0 data blocks written, 3 redo blocks read
Crash recovery completed successfully
Sat Jul 26 12:09:22 2014
Thread 1 advanced to log sequence 3327
Thread 1 opened at log sequence 3327
  Current log# 3 seq# 3327 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1.
Sat Jul 26 12:09:22 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Jul 26 12:09:39 2014
Completed: alter database open

ORA-607/ORA-600[4194]不一定是重大灾难

以前解决过ORA-607/ORA-600[4194]和模拟过ORA-607/ORA-600[4194]错误,所以固定思维任务ORA-607/ORA-600[4194]可能就是重大灾难,通过这个案例来说明ORA-607/ORA-600[4194]可能也就是一个常规的不能再常规的错误:有一网友数据库因意外关闭电源导致启动过程出现ORA-00607/ORA-00600[4194]/ORA-00600[4097]的错误,使得数据库启动失败.

SMON: enabling tx recovery
Fri Aug 31 23:14:08 2012
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=19, OS id=15619
Fri Aug 31 23:14:10 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Fri Aug 31 23:14:12 2012
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri Aug 31 23:14:12 2012
Completed: alter database open
Fri Aug 31 23:14:14 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-01595: error freeing extent (2) of rollback segment (4))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:41 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Fri Aug 31 23:29:43 2012
Errors in file /u01/oradata/orcl/bdump/orcl_smon_15589.trc:
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Fri Aug 31 23:29:44 2012
Errors in file /u01/oradata/orcl/bdump/orcl_pmon_15577.trc:
ORA-00474: SMON process terminated with error
Fri Aug 31 23:29:44 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 15577

通过alert日志可以定位到SMON_SCN_TIME表或者其回滚操作可能异常,结合alert和trace分析,发现这次错误的操作主要sql语句为:

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [48], [34], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.COL_USAGE$
   SET EQUALITY_PREDS    = EQUALITY_PREDS +
                           DECODE(BITAND(:FLAG, 1), 0, 0, 1),
       EQUIJOIN_PREDS    = EQUIJOIN_PREDS +
                           DECODE(BITAND(:FLAG, 2), 0, 0, 1),
       NONEQUIJOIN_PREDS = NONEQUIJOIN_PREDS +
                           DECODE(BITAND(:FLAG, 4), 0, 0, 1),
       RANGE_PREDS       = RANGE_PREDS + DECODE(BITAND(:FLAG, 8), 0, 0, 1),
       LIKE_PREDS        = LIKE_PREDS + DECODE(BITAND(:FLAG, 16), 0, 0, 1),
       NULL_PREDS        = NULL_PREDS + DECODE(BITAND(:FLAG, 32), 0, 0, 1),
       TIMESTAMP         = :TIME
 WHERE OBJ# = :OBJN
   AND INTCOL# = :COLN

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4194], [17], [10], [], [], [], [], []
Current SQL statement for this session:
UPDATE SYS.MON_MODS$
   SET INSERTS       = INSERTS + :INS,
       UPDATES       = UPDATES + :UPD,
       DELETES       = DELETES + :DEL,
       FLAGS        =
       (DECODE(BITAND(FLAGS, :FLAG), :FLAG, FLAGS, FLAGS + :FLAG)),
       DROP_SEGMENTS = DROP_SEGMENTS + :DROPSEG,
       TIMESTAMP     = :TIME
 WHERE OBJ# = :OBJN

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO SMON_SCN_TIME
  (THREAD,
   TIME_MP,
   TIME_DP,
   SCN,
   SCN_WRP,
   SCN_BAS,
   NUM_MAPPINGS,
   TIM_SCN_MAP)
VALUES
  (0, :1, :2, :3, :4, :5, :6, :7)

这里主要涉及到对oracle的三张表的操作
COL_USAGE$:主要是在收集统计信息的时候作为是否需要收集列直方图信息参考
MON_MODS$:Oracle主要利用该表来记录那些表的数据发生改变,方便收集统计信息
SMON_SCN_TIME:记录SCN和TIME的对应关系
通过这里的分析可以确定这三张表中的数据对于数据库来说不是致命的基表信息,在数据库运行过程中可以清理掉这些信息,最多就是因为数据库性能的下降或者SCN和TIME互转功能不完善.

解决思路
完整的undo异常处理顺序
1.从alert中可以看出来数据库是在open之后由于SMON回滚到上述几条sql异常导致数据库down,所以可以尝试使用system回滚段启动数据库,看看是否可以屏蔽相关问题
2.如果方法1不可行,那使用event屏蔽smon对回滚段的相关操作,使得数据库正常启动
3.如果由于存在特殊事务,event无法屏蔽,尝试使用隐含参数处理该问题
4.如果隐含参数尚无法解决给问题考虑使用bbed
5.如果bbed不能解决,那只能选择dul或者其类似工具处理
这个案例中我们明确的看到是因为上面的三条sql回滚异常出现问题导致,对于这样的问题,经过测试使用方法1和2都能够顺利解决问题(open库之后需要重建undo,删除有问题undo表空间,修改参数[可能包括event],切换undo表空间).因为遇到几次ORA-607/ORA-600[4194]是因为system rollback损坏导致,所以这次开始也认为是一次比较复杂的恢复,最后证明这次是一种非常常规的恢复.对于ORACLE的数据库恢复有经验可能会比较快的定位问题,但是如果按照固定的思路去想可能会让自己走进死胡同.