硬件恢复之后,数据库无法open故障恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:硬件恢复之后,数据库无法open故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于硬件故障,客户恢复硬件之后,数据库无法正常启动,报ORA-00354 ORA-00353错误

/tmp/> sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 1 17:10:30 2021

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> recover database;

ORA-00283: recovery session canceled due to errors
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 86088 change 16135545783340 time
02/23/2021 13:53:24
ORA-00312: online log 2 thread 1: '/oradata02/redo02b.log'
ORA-00312: online log 2 thread 1: '/oradata01/redo02a.log'

由于redo损坏,数据库无法继续正常恢复,通过屏蔽一致性,force open库

SQL> alter database open resetlogs;
alter database open resetlogs 
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [3756], [3571444619], [3756], [3648471803], [4194545]
Process ID: 5104
Session ID: 576 Serial number: 3

这个错误比较简单,是由于scn问题导致,修改数据库scn启动库

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], []
Process ID: 5536
Session ID: 576 Serial number: 1

这个错误比较明显,修改回滚段,尝试启动库

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 6033
Session ID: 576 Serial number: 3

数据库依旧无法正常open,alert日志报错如下

ARC3 started with pid=30, OS id=6078 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Exception[type:SIGSEGV Address not mapped to object][ADDR:0x60173487F5][PC:0xC003B1C20,_memcpy()+64][flags:0x0,count:1]
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0x60173487F5][PC:0xC003B1C20,_memcpy()+64][flags:0x2,count:2]
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0x60173487F5][PC:0xC003B1C20,_memcpy()+64][flags:0x2,count:2]
Archived Log entry 2 added for thread 1 sequence 2 ID 0x506cafbb dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Mon Mar 01 17:44:44 2021
PMON (ospid: 5993): terminating the instance due to error 397
Mon Mar 01 17:44:45 2021
System state dump requested by (instance=1, osid=5993 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/diag/rdbms/xff/xff/trace/xff_diag_6001.trc
Instance terminated by PMON, pid = 5993

通过其启动过程分析,发现数据库卡在如下对象:

PARSING IN CURSOR #11529215044940435280 len=148 dep=1 uid=0 oct=6 lid=0 tim=223080942765 
hv=3540833987 ad='c000000d67a42778' sqlid='5ansr7r9htpq3'
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,
scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE #11529215044940435280:c=10000,e=8182,p=6,cr=55,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=223080942764
BINDS #11529215044940435280:
 Bind#0
  oacdty=01 mxl=32(20) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=c000000d5fd299aa  bln=32  avl=20  flg=09
  value="_SYSSMU1_3935275865$"
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6e078  bln=24  avl=02  flg=05
  value=3
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6e048  bln=24  avl=03  flg=05
  value=128
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6e010  bln=24  avl=02  flg=05
  value=5
 Bind#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6dfe0  bln=24  avl=02  flg=05
  value=1
 Bind#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6dfb0  bln=24  avl=04  flg=05
  value=28921
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6df80  bln=24  avl=05  flg=05
  value=1245262
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6df48  bln=24  avl=06  flg=05
  value=1217986655
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6dc90  bln=24  avl=03  flg=05
  value=3621
 Bind#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6dc60  bln=24  avl=01  flg=05
  value=0
 Bind#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6dc30  bln=24  avl=02  flg=05
  value=2
 Bind#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6dc00  bln=24  avl=02  flg=05
  value=2
 Bind#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=9fffffffbcc6e0a8  bln=22  avl=02  flg=05
  value=1
WAIT #11529215044940435280: nam='db file sequential read' ela= 21 file#=1 block#=530 blocks=1 obj#=0 tim=223080944352
Incident 528204 created, dump file: /oracle/diag/rdbms/xff/xff/incident/incdir_528204/xff_ora_6593_i528204.trc
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []

至此基本上可以确认是由于出现回滚段异常,继续查看日志发现

Error 600 in redo application callback
Dump of change vector:
TYP:0 CLS:16 AFN:1 DBA:0x00400212 OBJ:4294967295 SCN:0x0ea6.f4f2da14 SEQ:1 OP:5.1 ENC:0 RBL:0
ktudb redo: siz: 320 spc: 5892 flg: 0x0012 seq: 0x0072 rec: 0x08
            xid:  0x0000.004.000000bc  
ktubl redo: slt: 4 rci: 0 opc: 11.1 [objn: 15 objd: 15 tsn: 0]
Undo type:  Regular undo        Begin trans    Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
             0x00000000  prev ctl uba: 0x00400212.0072.07 
prev ctl max cmt scn:  0x0eac.d42963be  prev tx cmt scn:  0x0eac.d4296f48 
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4194446  prev bcl: 0 BuExt idx: 0 flg2: 0
KDO undo record:
KTB Redo 
op: 0x04  ver: 0x01  
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0000.060.000000bb uba: 0x00400212.0072.04
                      flg: C---    lkc:  0     scn: 0x0eac.d9736b46
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x004000e1  hdba: 0x004000e0
itli: 4  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [20]  5f 53 59 53 53 4d 55 31 5f 33 39 33 35 32 37 35 38 36 35 24
col  2: [ 2]  c1 02
col  3: [ 2]  c1 04
col  4: [ 3]  c2 02 1d
col  5: [ 6]  c5 0d 12 63 43 38
col  6: [ 3]  c2 25 16
col  7: [ 5]  c4 02 19 35 3f
col  8: [ 4]  c3 03 5a 16
col  9: [ 1]  80
col 10: [ 2]  c1 04
col 11: [ 2]  c1 03
col 16: [ 2]  c1 03
Block after image is corrupt: 
buffer tsn: 0 rdba: 0x00400212 (1/530)
scn: 0x0ea6.f4f2da14 seq: 0x01 flg: 0x04 tail: 0xda140201
frmt: 0x02 chkval: 0x9dd8 type: 0x02=KTU UNDO BLOCK

使用bbed对file 1 block 530进行处理

   struct ktuxcscn, 8 bytes                 @4148    
      ub4 kscnbas                           @4148     0xd42963be
      ub2 kscnwrp                           @4152     0x0eac
   struct ktuxcuba, 8 bytes                 @4156    
      ub4 kubadba                           @4156     0x00400212
      ub2 kubaseq                           @4160     0x0072
      ub1 kubarec                           @4162     0x07
   sb2 ktuxcflg                             @4164     1 (KTUXCFSK)
   ub2 ktuxcseq                             @4166     0x0072
   sb2 ktuxcnfb                             @4168     1
   ub4 ktuxcinc                             @4172     0x00000000
   sb2 ktuxcchd                             @4176     4
   sb2 ktuxcctl                             @4178     3
   ub2 ktuxcmgc                             @4180     0x8002
   ub4 ktuxcopt                             @4188     0x7ffffffe

数据库顺利open成功
20210301210425


后续建议客户逻辑导出数据,导入到新库

硬件故障导致ORA-600 2662错误处理

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:硬件故障导致ORA-600 2662错误处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

前几天恢复了一个40多T的CASE:ORA-00600: internal error code, arguments: [16513], [1403] 恢复,又一个近30T的库由于硬件故障,通过其他人一系列恢复之后,无法正常open,让我们提供技术支持:
故障最初原因是由于存储异常

Fri Feb 19 09:03:49 2021
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_3460.trc:
ORA-01114: 将块写入文件 849 时出现 IO 错误 (块 # 3871748)
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1167) 设备没有连接。
ORA-01114: 将块写入文件 849 时出现 IO 错误 (块 # 3871748)
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1167) 设备没有连接。

通过其他人一系列处理后,数据库报ORA-600 2662错误

Sat Feb 20 08:19:35 2021
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Feb 20 08:19:35 2021
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc(incident=1960181):
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc:
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_5304.trc:
ORA-00600:internal error code,arguments:[2662],[4],[2185364344], [4],[2185453722],[893388032],[],[],[],[],[],[]
Error 600 happened during db open, shutting down database
USER (ospid: 5304): terminating the instance due to error 600
Instance terminated by USER, pid = 5304
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (5304) as a result of ORA-1092
Sat Feb 20 08:19:42 2021
ORA-1092 : opitsk aborting process

通过对scn处理,数据库顺利绕过该错误,然后报ORA-600 4194错误

Doing block recovery for file 213 block 4688
No block recovery was needed
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_7048.trc(incident=1984136):
ORA-00600: internal error code, arguments: [4194], [38.4.1381252], [0], [], [],[],[],[],[],[],[],[]
Sat Feb 20 10:50:45 2021
Doing block recovery for file 213 block 4688
No block recovery was needed
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_7048.trc:
ORA-00600: internal error code, arguments: [4194], [38.4.1381252], [0], [], [],[],[],[],[],[],[],[]
SMON (ospid: 7048): terminating the instance due to error 474
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_ora_6652.trc(incident=1984185):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Sat Feb 20 10:50:52 2021
Instance terminated by SMON, pid = 7048

通过对异常事务进行处理,屏蔽smon进程进行回滚,数据库open成功,但是报ORA-600 4137错误

Sat Feb 20 10:53:46 2021
Sweep [inc][1992133]: completed
Stopping background process MMNL
Sat Feb 20 10:53:47 2021
Trace dumping is performing id=[cdmp_20210220105347]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc(incident=1992134):
ORA-00600: internal error code, arguments: [4137], [23.13.3094188], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance xifenfei (pid = 14) - Error 600 encountered while recovering transaction (23, 13).
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc:
ORA-00600: internal error code, arguments: [4137], [23.13.3094188], [0], [0], [], [], [], [], [], [], [], []
Sat Feb 20 10:53:47 2021
Sweep [inc2][1992133]: completed
Sat Feb 20 10:53:47 2021
Sweep [inc][1992134]: completed
Stopping background process MMON
Trace dumping is performing id=[cdmp_20210220105348]
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc(incident=1992135):
ORA-00600: internal error code, arguments: [4137], [38.4.1381252], [0], [0], [], [], [], [], [], [], [], []
Starting background process MMON
Starting background process MMNL
Sat Feb 20 10:53:48 2021
MMON started with pid=16, OS id=6448 
ALTER SYSTEM enable restricted session;
Sat Feb 20 10:53:48 2021
MMNL started with pid=36, OS id=6840 
ORACLE Instance xifenfei (pid = 14) - Error 600 encountered while recovering transaction (38, 4).
Errors in file d:\app\administrator\diag\rdbms\xifenfei\xifenfei\trace\xifenfei_smon_6576.trc:
ORA-00600: internal error code, arguments: [4137], [38.4.1381252], [0], [0], [], [], [], [], [], [], [], []
Sat Feb 20 10:53:49 2021
Sweep [inc][1992135]: completed
Trace dumping is performing id=[cdmp_20210220105349]
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open

对异常回滚段进行处理,数据库后端启动正常,不再报明显ORA-错误.通过hcheck.sql检查字典正常

HCheck Version 07MAY18 on 20-FEB-2021 11:35:11
----------------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
db_name: JYJG

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj                 ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- SourceNotInObj              ... 1102000100 <=  *All Rel* 02/20 11:35:11 PASS
.- IndIndparMismatch           ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- InvCorrAudit                ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- OversizedFiles              ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PoorStorage                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 02/20 11:35:12 PASS
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- OrphanedTabComPart          ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- MissingSum$                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- MissingDir$                 ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* 02/20 11:35:12 PASS
.- ObjSynMissing               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedUndo                ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndex               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTable               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- MissingPartCol              ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- OrphanedIndPartObj#         ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- FetUet                      ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- Uet0Check                   ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ExtentlessSeg               ... 1102000100 <= 1102000100 02/20 11:35:13 PASS
.- SeglessUET                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadInd$                     ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadTab$                     ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadIcolDepCnt               ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjIndDobj                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- ObjType0                    ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadOwner                    ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- StmtAuditOnCommit           ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadPublicObjects            ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadSegFreelist              ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- BadDepends                  ... 1102000100 <=  *All Rel* 02/20 11:35:13 PASS
.- CheckDual                   ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- ObjectNames                 ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* 02/20 11:35:14 PASS
.- ChkIotTs                    ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- BadNextObject               ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- DroppedROTS                 ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- FilBlkZero                  ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- OrphanedObjError            ... 1102000100 >  1102000000 02/20 11:35:15 PASS
.- ObjNotLob                   ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- MaxControlfSeq              ... 1102000100 <=  *All Rel* 02/20 11:35:15 PASS
.- SegNotInDeferredStg         ... 1102000100 >  1102000000 02/20 11:35:18 PASS
.- SystemNotRfile1             ... 1102000100 >   902000000 02/20 11:35:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
.- OrphanTrigger               ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
.- ObjNotTrigger               ... 1102000100 <=  *All Rel* 02/20 11:35:19 PASS
---------------------------------------
20-FEB-2021 11:35:19  Elapsed: 8 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

虽然字典正常,但是由于数据库屏蔽了一致性,建议客户在条件允许的情况下,进行逻辑迁移,排除风险隐患.

ORA-600 [4194]

在oracle恢复中ORA-600 4194是一个非常常见的错误,该错误的主要原因是由于redo记录和undo(rollback)记录不匹配.
ORA 600 4194错误原因以及含义

ERROR:              

  Format: ORA-600 [4194] [a] [b]
 
VERSIONS:           
  versions 6.0 to 12.1 

DESCRIPTION:

  A mismatch has been detected between Redo records and rollback (Undo) 
  records.

  We are validating the Undo record number relating to the change being 
  applied against the maximum undo record number recorded in the undo block.

  This error is reported when the validation fails.

ARGUMENTS:
  Arg [a] Maximum Undo record number in Undo block
  Arg [b] Undo record number from Redo block

ORA 600 4194 错误处理思路
第一步

Confirm whether the database is up and running or not.  If the database fails to start or crashes shortly 
after startup due to this error occurring, then try setting event 10513 at level 2 in the init.ora/spfile 
to disable transaction recovery and restart the instance, e.g.:

      event = "10513 trace name context forever, level 2"

This may allow the database to successfully open and stay up so that 
the required diagnostics/actions can be performed.

第二步

In the trace file there should be an undo segment header dump, and so check 
to see if the undo segment header shows an active transaction after recovery, e.g.:


TRN TBL    <---- Represents the Transaction table for the particular undo segment

index state cflags wrap# uel scn dba
--------------------------------------------------------------------------------------------- 
0x41 9 0x80 0x35ab6 0xffff 0x0695.38f6b959 0x1081e796 
0x42 9 0x80 0x35bb1 0x000e 0x0695.38f6b028 0x1081e793 
0x43 9 0x80 0x35b11 0x005d 0x0695.38f6b7ae 0x1081e795 
0x44 9 0x80 0x359f0 0x0036 0x0695.38f69a91 0x1081e78e 
0x45 10 0x80 0x35b1b 0x0000 0x0695.3a0aba4d 0x1081e796 
0x46 9 0x80 0x35bb7 0x001c 0x0695.38f69bde 0x1081e78f 
===================================

State ---> This column specifies the status of the transaction 
                  9 -----> represents a commited transaction
                  10 ---> Represents a active transaction
Dba -----> Undo block containing the undo records
                  Strictly speaking this is the block at the end of the undo chain.

You can see from the transaction table that there is an active transaction 
for this particular rollback/undo segment after recovery.
Therefore this rollback/undo segment and/or undo tablespace cannot be dropped without corrupting the database!  
Therefore recreating the UNDO tablespace is not an option.

第三步

From the trace file determine the affected undo segment, e.g.:

Block image after block recovery:

UNDO BLK: 
xid: 0x0015.02b.0001544b seq: 0x163e cnt: 0x12 irb: 0x12 icl: 0x0 flg: 0x0000

XID ==> Undo segment no + Slot no + Sequence no

Therefore, in this case the Undo Segment is:

USN# 0x15 (Hex) ==> 21 (Dec)  ==> _SYSSMU21$

So if and ONLY IF the transaction table shows no active transaction can the
 rollback/undo segment be offlined and dropped.Note however, 
that before you can confirm if the entire UNDO tablespace can be dropped, you would need to check the 
transaction tables of ALL active rollback/undo segments in the same wasy as the above.  
The steps required to drop the rollback/undo segment are fully detailed in Note:179952.1, 
but are briefly listed here for completeness:

If using Automatic Undo Management


Offline the undo segment using the _OFFLINE_ROLLBACK_SEGMENTS parameter and bounce the database as follows:

1.  Create  and edit the init.ora file for the instance to set the following parameters:

UNDO_MANAGEMENT=MANUAL 
_OFFLINE_ROLLBACK_SEGMENTS=(_SYSSMU21$)

2.  Open the database in restricted mode to prevent user access, e.g.:

connect / as sysdba
startup restrict pfile = '<Full path to init.ora file>';

3.  Drop the rollback/undo segment, e.g.:

drop rollback segment "_SYSSMU21";

4.  Shutdown the instance, and remove the init.ora parameters added in point 1 and restart the instance, e.g.:

shutdown immediate
startup


If SMON was recovering the transaction then this may not work as we cannot open the database if it is determined 
to be in an inconsistent state. I have reviewed a number of SRs where this approach was successful, 
so it is important to try it first but understand that it may fail and you will have to resort to 
a point in time recovery or forcing open the DB and recreating it.

第四步

Now we need to dump the undo block to see which object was affected.  
We noted in Step 2 that this is the active transaction (from the trace file): 

TRN TBL 

index state cflags wrap# uel scn dba 
0x45 10 0x80 0x35b1b 0x0000 0x0695.3a0aba4d 0x1081e796 

Dba----------------> Undo block containing the undo records 

dba--->0x1081e796 is the block containing the active transaction . 


Use the WebIV tools to convert this RDBA to block number (block#) and file number (file#), e.g.: 

V SPLIT ==> DBA (Hex) = File#,Block# (Hex File#,Block#) 
= ===== === ===== ============ 
V8 10,10 ==> 276948886 (0x1081e796) = 66,124822 (0x42 0x1e796) 


So the file# is 66 and the block# is 124822, so dump the block by issuing: 

SQL> Alter system dump datafile 66 block 124822; 


This will generate a trace file in the user_dump_dest.  The following is a sample of the information in the undo block:

UNDO BLK: 
xid: 0x000c.045.00035b1b seq: 0x1e14 cnt: 0x17 irb: 0x17 icl: 0x0 flg: 0x0000 

Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset 
--------------------------------------------------------------------------- 
0x01 0x1f8c 0x02 0x1f30 0x03 0x1ed4 0x04 0x1e78 0x05 0x1e1c 
0x06 0x1dc0 0x07 0x1d64 0x08 0x1d08 0x09 0x1cac 0x0a 0x1c50 
0x0b 0x1bf4 0x0c 0x1b98 0x0d 0x1b3c 0x0e 0x1ae0 0x0f 0x1a74 
0x10 0x1a18 0x11 0x19bc 0x12 0x1960 0x13 0x1904 0x14 0x187c 
0x15 0x181c 0x16 0x1798 0x17 0x173c 

* Rec #0x16 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) 
* Layer: 11 (Row) opc: 1 rci 0x00 
Undo type: Regular undo Begin trans Last buffer split: No 
Temp Object: No 
Tablespace Undo: No 
rdba: 0x00000000 
*----------------------------- 
uba: 0x1081e796.1e14.14 ctl max scn: 0x0695.38f69853 prv tx scn: 0x0695.38f698a1 
KDO undo record: 
KTB Redo 
op: 0x04 ver: 0x01 
op: L itl: scn: 0x0019.009.00034237 uba: 0x36c0cce4.1d2f.19 
flg: C--- lkc: 0 scn: 0x0695.38f6b96b 
KDO Op code: URP xtype: XA bdba: 0x35406893 hdba: 0x35406892 
itli: 1 ispac: 0 maxfr: 4863 
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0 
ncol: 1 nnew: 1 size: -1 
col 0: [ 4] c3 0e 36 2e 
*----------------------------- 

* Rec #0x17 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) 
* Layer: 11 (Row) opc: 1 rci 0x16 
Undo type: Regular undo Last buffer split: No 
Temp Object: No 
Tablespace Undo: No 
rdba: 0x00000000 
*----------------------------- 

From the trace file above:

UNDO BLK: 
xid: 0x000c.045.00035b1b seq: 0x1e14 cnt: 0x17 irb: 0x17 icl: 0x0 flg: 0x0000

The undo segment with the active transaction is segment is 0x000c (Hex) which is 12 (Dec) as the XID is:

      Undo segment no + Slot no + Sequence no

This step is often skipped because it was performed earlier in step 3, however it is a good idea to do this 
again now to make sure that the XID from the UNDO block matches the UNDO SEGMENT HEADER, 
this way you have followed all the chain, from the UNDO SEGMENT to UNDO BLOCK, back and forth.  
If there is a conflict here please check and make sure that the customer dumped the correct undo block.

Check for the value of irb which is an index which points you to the latest change done to the undo block.
This is the point from which a rollback would begin if one was issued.

From the trace file we see: 'irb: 0x17' so this points to record 0x17, 
so search for this particular string i.e 0x17 and it will take you to undo record 'REC #0x17', e.g.:

* Rec #0x17 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) 
* Layer: 11 (Row) opc: 1 rci 0x16 
Undo type: Regular undo Last buffer split: No 
Temp Object: No 
Tablespace Undo: No 
rdba: 0x00000000 
*----------------------------- 

Note the slot number (slt) is 0x45, the object number (objn) is the OBJECT_ID from dba_objects 
and data object number (objd) is the DATA_OBJECT_ID from dba_objects.  
These numbers may be the same but not necessarily, and so if the database is open then identify this object, e.g.:

        select object_name, owner, object_type, data_object_id from dba_objects where object_id = <objn>;

This is the object, which has an active transaction.  Note in the above trace file extract that rci 
has a value of 0x16 which means that this record is at the end of an undo chain.  
This means that the chain continues in another UNDO BLOCK.  
Please refer to unpublished Note:281504.1 for information on Undo chains.

So the next record that needs to be rolled back is present in REC #X016.  
If rci is 0x00 then it means that this is the first record present in the undo chain 
and so you can check to see if there is rdba info, e.g.:


* Rec #0x16 slt: 0x45 objn: 1485619(0x0016ab33) objd: 1485619 tblspc: 71(0x00000047) 
* Layer: 11 (Row) opc: 1 rci 0x00 
Undo type: Regular undo Begin trans Last buffer split: No 
Temp Object: No 
Tablespace Undo: No 
rdba: 0x00000000 
*----------------------------- 
uba: 0x1081e796.1e14.14 ctl max scn: 0x0695.38f69853 prv tx scn: 0x0695.38f698a1 
KDO undo record: 
KTB Redo 
op: 0x04 ver: 0x01 
op: L itl: scn: 0x0019.009.00034237 uba: 0x36c0cce4.1d2f.19 
flg: C--- lkc: 0 scn: 0x0695.38f6b96b 
KDO Op code: URP xtype: XA bdba: 0x35406893 hdba: 0x35406892 
itli: 1 ispac: 0 maxfr: 4863 
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0 
ncol: 1 nnew: 1 size: -1 
col 0: [ 4] c3 0e 36 2e 
*----------------------------- 


If the object is an Index, drop and recreate it.  If it is a table, 
then again the table would need to be dropped and recreated (or truncated) 
so that its object number changes and hence the rollback/undo is no longer required.  
If this isn't possible, then you have two options:

First take a backup of the database in its current state.  
This is critical in case anything goes wrong and you lose the opportunity to salvage the data! 

Option 1

 - Restore the undo segment datafile and the datafile containing the object and perform a full recovery.
   This can only be done if you have all the archived redo as you will need to do full recovery on these files.

OR 

Option 2

If option 1 is not possible, you can use the unsupported method, e.g.:

Specify the undo segment in the _OFFLINE_ROLLBACK_SEGMENTS parameter and try to drop the rollback segment.
If there is an active transaction then this is not likely to work and you will probably need 
to set the _CORRUPTED_ROLLBACK_SEGMENTS parameter as well

温馨提示:
1.隐含参数_OFFLINE_ROLLBACK_SEGMENTS/_CORRUPTED_ROLLBACK_SEGMENTS属于Oracle内部隐含参数,建议在Oracle support认可的情况下使用,因为使用之后可能导致数据库事务完整性彻底损坏
2.进行屏蔽事务之前,如果条件允许建议使用txchecker检查
2.使用上述方法恢复数据库之后,建议通过逻辑方式导出导入重建数据库