resetlogs失败故障恢复-ORA-01555

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:resetlogs失败故障恢复-ORA-01555

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库resetlogs报错

Tue Dec 19 15:21:23 2023
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1683789043
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Tue Dec 19 15:22:01 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1248 signalled during: alter database open resetlogs...
Tue Dec 19 16:16:26 2023
alter database datafile 83 offline
Completed: alter database datafile 83 offline
Tue Dec 19 16:19:13 2023
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Archived Log entry 50 added for thread 1 sequence 3657135 ID 0x5d907698 dest 1:
Tue Dec 19 16:20:01 2023
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00333: 重做日志读取块 8806400 计数 16384 出错
ORA-00312: 联机日志 2 线程 1: '/data/oradata/orcl/redo2.log'
ORA-27072: 文件 I/O 错误
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 4
Additional information: 8806400
Additional information: 4325376
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00333: 重做日志读取块 8806400 计数 16384 出错
ARCH: All Archive destinations made inactive due to error 333
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/data/arch/1_3657136_874715183.dbf' (error 333) (orcl)
Committing creation of archivelog '/data/arch/1_3657136_874715183.dbf' (error 333)
Tue Dec 19 16:20:46 2023
Archived Log entry 51 added for thread 1 sequence 3657132 ID 0x5d907698 dest 1:
Tue Dec 19 16:21:28 2023
Archived Log entry 52 added for thread 1 sequence 3657133 ID 0x5d907698 dest 1:
Tue Dec 19 16:22:13 2023
Archived Log entry 53 added for thread 1 sequence 3657134 ID 0x5d907698 dest 1:
RESETLOGS after incomplete recovery UNTIL CHANGE 161052517347
Resetting resetlogs activation ID 1569748632 (0x5d907698)
Tue Dec 19 16:23:43 2023
Setting recovery target incarnation to 3
Tue Dec 19 16:23:43 2023
Assigning activation ID 1683789043 (0x645c94f3)
LGWR: STARTING ARCH PROCESSES
Tue Dec 19 16:23:43 2023
ARC0 started with pid=40, OS id=5391 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thread 1 advanced to log sequence 2 (thread open)
Tue Dec 19 16:23:44 2023
ARC1 started with pid=41, OS id=5393 
Tue Dec 19 16:23:44 2023
ARC2 started with pid=42, OS id=5395 
ARC1: Archival started
Tue Dec 19 16:23:44 2023
ARC3 started with pid=43, OS id=5397 
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /data/oradata/orcl/redo2.log
Successful open of redo thread 1
Tue Dec 19 16:23:44 2023
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Dec 19 16:23:44 2023
SMON: enabling cache recovery
Tue Dec 19 16:23:44 2023
NSA2 started with pid=44, OS id=5399 
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0025.7f7d42df):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 27 (名称为 "_SYSSMU27_4233559991$") 过小
Errors in file /oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_94696.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 27 (名称为 "_SYSSMU27_4233559991$") 过小
Error 704 happened during db open, shutting down database
USER (ospid: 94696): terminating the instance due to error 704
Instance terminated by USER, pid = 94696
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (94696) as a result of ORA-1092

通过以上信息,可以的出来以下结论:
1. 客户的硬件或者文件系统可能有问题,通过系统日志进一步确认底层异常

Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bc ff c0 00 01 00 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568576
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568576
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bd 00 c0 00 01 00 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568832
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568832
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 08:28:38 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 47 bd 00 80 00 00 08 00
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev sdb, sector 1203568768
Dec 19 08:28:38 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 1203568768
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 28 20 00 01 00 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182688
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182688
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 29 20 00 01 00 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182944
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182944
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Sense Key : Medium Error [current] 
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb]  Add. Sense: Unrecovered read error
Dec 19 16:20:01 tdb2 kernel: sd 7:0:0:0: [sdb] CDB: Read(10): 28 00 9b 1a 29 00 00 00 08 00
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev sdb, sector 2602182912
Dec 19 16:20:01 tdb2 kernel: end_request: critical medium error, dev dm-3, sector 2602182912

2. 数据库在强制拉库的时候,很可能是屏蔽了一致性,导致文件头scn过小
3. 在resetlogs之前,先offline了83号文件,这个将导致该文件的reseltogs scn和其他文件不一致,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)
20231229095533



这个库由于客户在resetlogs之前offline了数据文件,导致一些麻烦,先使用Oracle Recovery Tools修改resetlogs scn
20231229100250

然后重建ctl,修改scn,打开数据库
20231229102556

hcheck检测字典一切正常

HCheck Version 07MAY18 on 26-12月-2023 18:44:20
----------------------------------------------
Catalog Version 11.2.0.1.0 (1102000100)
db_name: ORCL
                                   Catalog       Fixed           
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj                 ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- MissingOIDOnObjCol          ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- SourceNotInObj              ... 1102000100 <=  *All Rel* 12/26 18:44:20 
PASS
.- IndIndparMismatch           ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- InvCorrAudit                ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- OversizedFiles              ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PoorDefaultStorage          ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PoorStorage                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- PartSubPartMismatch         ... 1102000100 <= 1102000100 12/26 18:44:21 
PASS
.- TabPartCountMismatch        ... 1102000100 <=  *All Rel* 12/26 18:44:21 

*** 2023-12-26 18:44:21.507
PASS
.- OrphanedTabComPart          ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- MissingSum$                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- MissingDir$                 ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- DuplicateDataobj            ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- ObjSynMissing               ... 1102000100 <=  *All Rel* 12/26 18:44:21 
PASS
.- ObjSeqMissing               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedUndo                ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndex               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndexPartition      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndexSubPartition   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTable               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTablePartition      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedTableSubPartition   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- MissingPartCol              ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedSeg$                ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- OrphanedIndPartObj#         ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- DuplicateBlockUse           ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- FetUet                      ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- Uet0Check                   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ExtentlessSeg               ... 1102000100 <= 1102000100 12/26 18:44:22 
PASS
.- SeglessUET                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadInd$                     ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadTab$                     ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadIcolDepCnt               ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjIndDobj                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- TrgAfterUpgrade             ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjType0                    ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadOwner                    ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- StmtAuditOnCommit           ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadPublicObjects            ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadSegFreelist              ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadDepends                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 

*** 2023-12-26 18:44:22.571
PASS
.- CheckDual                   ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ObjectNames                 ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- BadCboHiLo                  ... 1102000100 <=  *All Rel* 12/26 18:44:22 
PASS
.- ChkIotTs                    ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- NoSegmentIndex              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- BadNextObject               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- DroppedROTS                 ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- FilBlkZero                  ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- DbmsSchemaCopy              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- OrphanedObjError            ... 1102000100 >  1102000000 12/26 18:44:23 
PASS
.- ObjNotLob                   ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- MaxControlfSeq              ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- SegNotInDeferredStg         ... 1102000100 >  1102000000 12/26 18:44:23 
PASS
.- SystemNotRfile1             ... 1102000100 >   902000000 12/26 18:44:23 

*** 2023-12-26 18:44:23.779
PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- OrphanTrigger               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
.- ObjNotTrigger               ... 1102000100 <=  *All Rel* 12/26 18:44:23 
PASS
---------------------------------------
26-12月-2023 18:44:23  Elapsed: 3 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

然后增加temp,导出数据数据,完成本次数据库救援

Oracle 19C 报ORA-704 ORA-01555故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 19C 报ORA-704 ORA-01555故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

异常断电导致数据库无法启动,尝试对数据文件进行recover操作,报ORA-00283 ORA-00742 ORA-00312错误,由于redo写丢失无法正常应用

D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期日 7月 30 07:49:19 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 9274 块 18057 中检测到写入丢失情况
ORA-00312: 联机日志 1 线程 1: 'D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG'

屏蔽数据一致性,尝试强制打开库,报ORA-00604,ORA-00704,ORA-01555错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 9 with name
"_SYSSMU9_4165470211$" too small
进程 ID: 4036
会话 ID: 2277 序列号: 40707

alert日志对应错误

2023-07-30T06:54:43.457383+08:00
.... (PID:5836): Clearing online redo logfile 1 complete
.... (PID:5836): Clearing online redo logfile 2 complete
.... (PID:5836): Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 3572089731 (0xd4e9c383)
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO03.LOG: Thread 1 Group 3 was previously cleared
2023-07-30T06:54:43.863676+08:00
Setting recovery target incarnation to 2
2023-07-30T06:54:44.816771+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2023-07-30T06:54:44.957395+08:00
Assigning activation ID 3664275149 (0xda6866cd)
2023-07-30T06:54:44.957395+08:00
TT00 (PID:4640): Gap Manager starting
2023-07-30T06:54:45.004305+08:00
Redo log for group 1, sequence 1 is not located on DAX storage
2023-07-30T06:54:46.176153+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG
Successful open of redo thread 1
2023-07-30T06:54:46.191771+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2023-07-30T06:54:46.223036+08:00
TT03 (PID:1816): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
2023-07-30T06:54:46.332398+08:00
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 
0x0000000017b852a7
):
2023-07-30T06:54:46.332398+08:00
select ctime, mtime, stime from obj$ where obj# = :1
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.348028+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Error 704 happened during db open, shutting down database
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc  (incident=474502):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-01092: ORACLE 实例终止。强制断开连接
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_474502\xff_ora_5836_i474502.trc
2023-07-30T06:54:47.785549+08:00
opiodr aborting process unknown ospid (5836) as a result of ORA-603
2023-07-30T06:54:47.816792+08:00
ORA-603 : opitsk aborting process
License high water mark = 6
USER (ospid: (prelim)): terminating the instance due to ORA error 

这类错误比较常见,参考以前类似恢复:
在数据库open过程中常遇到ORA-01555汇总
数据库open过程遭遇ORA-1555对应sql语句补充
Oracle Recovery Tools恢复—ORA-00704 ORA-01555故障
使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
对于本次故障,通过Oracle Recovery Tools工具快速处理
patch


open数据库成功

SQL> alter database open;

数据库已更改。

SQL>
SQL>
SQL> select status,count(1) from v$datafile group by status;

STATUS           COUNT(1)
-------------- ----------
SYSTEM                  1
ONLINE                 61

ORA-00600: internal error code, arguments: [16513], [1403] 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600: internal error code, arguments: [16513], [1403] 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户请求,存储异常断电之后,一个40多T的经分数据库无法正常启动,通过各方一系列操作之后,数据库依旧无法open,报错信息为:ORA-00600: internal error code, arguments: [16513], [1403], [28]
20210214193332


Sun Feb 14 00:20:09 BEIST 2021
SMON: enabling cache recovery
Sun Feb 14 00:20:09 BEIST 2021
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0f27.13cd4fc3):
Sun Feb 14 00:20:09 BEIST 2021
select ctime, mtime, stime from obj$ where obj# = :1
Sun Feb 14 00:20:09 BEIST 2021
Errors in file /oracle10g/db/admin/xifenfei/udump/xifenfei1_ora_177254.trc:
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Sun Feb 14 00:20:10 BEIST 2021
Errors in file /oracle10g/db/admin/xifenfei/udump/xifenfei1_ora_177254.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 177254
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过对启动过程进行跟踪

=====================
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=194171381576991 hv=429618617 ad='afcee60'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=257,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=194171381576990
BINDS #5:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1104b8ed0  bln=22  avl=02  flg=05
  value=28
EXEC #5:c=0,e=422,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=194171381577472
WAIT #5: nam='db file sequential read' ela= 274 file#=1 block#=110 blocks=1 obj#=36 tim=194171381577809
WAIT #5: nam='db file sequential read' ela= 257 file#=1 block#=78943 blocks=1 obj#=36 tim=194171381578123
WAIT #5: nam='db file sequential read' ela= 253 file#=1 block#=111 blocks=1 obj#=36 tim=194171381578416
WAIT #5: nam='db file sequential read' ela= 226 file#=1 block#=62 blocks=1 obj#=18 tim=194171381578692
=====================
PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=194171381579134 hv=361892850 ad='df87eb0'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #6:c=0,e=368,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=194171381579133
BINDS #6:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1105af9a8  bln=22  avl=02  flg=05
  value=92
EXEC #6:c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=194171381579736
WAIT #6: nam='db file sequential read' ela= 232 file#=1 block#=102 blocks=1 obj#=34 tim=194171381580011
WAIT #6: nam='db file sequential read' ela= 251 file#=1 block#=54 blocks=1 obj#=15 tim=194171381580307
FETCH #6:c=0,e=615,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,tim=194171381580369
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=584 us)'
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=282 us)'
WAIT #5: nam='db file sequential read' ela= 260 file#=4290 block#=365090 blocks=1 obj#=0 tim=194171381580721
FETCH #5:c=10000,e=3327,p=7,cr=7,cu=0,mis=0,r=0,dep=1,og=4,tim=194171381580817
*** 2021-02-14 02:32:40.430
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Current SQL statement for this session:
alter database open
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              70000090FC00AE4 ? 000000000 ?
ksedmp+0290          bl       ksedst               104C1CA58 ?
ksfdmp+02d8          bl       03F4BD0C             
kgeriv+0108          bl       _ptrgl               
kgesiv+0080          bl       kgeriv               7000008F6DB5890 ? 00000000C ?
                                                   7000008F6DB5890 ? 0FFFFFDC3 ?
                                                   0FFFFFFBF ?
ksesic2+0060         bl       kgesiv               5FFFEC580 ? 500000000 ?
                                                   FFFFFFFFFFEC590 ? 0F5FAF808 ?
                                                   7000008F5FAF7E8 ?
kqdpts+0158          bl       ksesic2              408100004081 ? 000000000 ?
                                                   00000057B ? 000000000 ?
                                                   00000001C ?
                                                   A006150571B05EF0 ?
                                                   1100CF0A8 ? 000000001 ?
kqrlfc+0274          bl       kqdpts               000000000 ?
kqlbplc+00b4         bl       03F4E6A8             
kqlblfc+0230         bl       kqlbplc              00000009D ?
adbdrv+1a9c          bl       03F4B66C             
opiexe+2db4          bl       adbdrv               
opiosq0+1ac8         bl       opiexe               1103B418C ? 000000000 ?
                                                   FFFFFFFFFFF9008 ?
kpooprx+016c         bl       opiosq0              300F1E1D4 ? 000000000 ?
                                                   000000000 ? A4FFFFFFFF9798 ?
                                                   000000000 ?
kpoal8+03cc          bl       kpooprx              FFFFFFFFFFFB814 ?
                                                   FFFFFFFFFFFB5C0 ?
                                                   1300000013 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103B4378 ?
opiodr+0b2c          bl       _ptrgl               
ttcpip+1020          bl       _ptrgl               
opitsk+117c          bl       01FBD04C             
opiino+09d0          bl       opitsk               1EFFFFD7E0 ? 000000000 ?
opiodr+0b2c          bl       _ptrgl               
opidrv+04a4          bl       opiodr               3C102B1398 ? 404C6FF30 ?
                                                   FFFFFFFFFFFF7A0 ? 0102B1390 ?
sou2o+0090           bl       opidrv               3C02705B3C ? 440663000 ?
                                                   FFFFFFFFFFFF7A0 ?
opimai_real+01bc     bl       01FB9BF4             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?
 
--------------------- Binary Stack Dump ---------------------

报错比较明显,数据库在查询obj$取obj#=28的数据的时候无法正常获取到该数据,从而报错ORA-600 16513错误.通过对相关block进行分析,发现有事务异常

BBED>  p ktbbh.ktbbhitl[1]
struct ktbbhitl[1], 24 bytes                @44
   struct ktbitxid, 8 bytes                 @44
      ub2 kxidusn                           @44       0x5c00
      ub2 kxidslt                           @46       0x0a00
      ub4 kxidsqn                           @48       0x48b4fc01
   struct ktbituba, 8 bytes                 @52
      ub4 kubadba                           @52       0x22928531
      ub2 kubaseq                           @56       0xe383
      ub1 kubarec                           @58       0x01
   ub2 ktbitflg                             @60       0x0120 (NONE)
   union _ktbitun, 2 bytes                  @62
      b2 _ktbitfsc                          @62       0
      ub2 _ktbitwrp                         @62       0x0000
   ub4 ktbitbas                             @64       0x95c2ff13

通过一些技巧处理规避掉该事务,然后启动库报熟悉的ORA-01555错误
20210214161333


相对比较简单参考(在数据库open过程中常遇到ORA-01555汇总),数据库顺利open成功,完成春节后第一个大库的恢复
20210214193344

oracle exadata force open

这个库的恢复有一些历史故事(【力荐】Exadata火线救援:10TB级数据修复经典案例详解!):xx运营商x2的1/4配置的oracle exadata机器,跑了近6年,最近有一个cell节点主机异常,在rebalance过程中,只有两个节点的cell其中一个节点坏了一个硬盘导致.导致asm diskgroup无法正常mount,最后该运营商运维三方通过amdu把该一体机中的数据文件全部抽出来,然后在恢复过程中出现大量错误无法解决,请求我们支持
数据库open过程报ORA-01555错误

Thu Jul  14 00:01:04 2016
alter database open
Thu Jul  14 00:01:04 2016
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: /data/amdu/redo/DATA_EC_260.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul  14 00:01:05 2016
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0b26.9f080238):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_59546.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 83 with name &quot;_SYSSMU83_1078760807$&quot; too small
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_59546.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 83 with name &quot;_SYSSMU83_1078760807$&quot; too small
Error 704 happened during db open, shutting down database
USER (ospid: 59546): terminating the instance due to error 704
Instance terminated by USER, pid = 59546
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (59546) as a result of ORA-1092

这个错误比较常见,可以通过推scn就可以解决,由于已经安装了scn patch,通过oradebug推scn解决该问题.

ORA-600 4194
这个ora 600 4194相对比较特殊,在SMON: enabling cache recovery之后立马报出来,然后实例直接open失败.

Thu Jul  14 00:06:15 2016
alter database open
Thu Jul  14 00:06:15 2016
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /data/amdu/redo/DATA_EC_263.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jul  14 00:06:15 2016
SMON: enabling cache recovery
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc  (incident=1080450):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 3, block 3 to scn 12269096776739
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_263.f
Block recovery stopped at EOT rba 3.5.16
Block recovery completed at rba 3.5.16, scn 2856.2670179361
Block recovery from logseq 3, block 3 to scn 12269096776736
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_263.f
Block recovery completed at rba 3.5.16, scn 2856.2670179361
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_60038.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 60038): terminating the instance due to error 600
Instance terminated by USER, pid = 60038
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (60038) as a result of ORA-1092

trace文件分析
打开数据库报ORA-600[4194]错误,对启动过程进行10046跟踪并且分析trace文件发现

PARSING IN CURSOR #140375370511672 len=148 dep=1 uid=0 oct=6 lid=0 tim=3501342849457766 hv=3540833987 
ad='a47df47a8' sqlid='5ansr7r9htpq3'
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,
scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE #140375370511672:c=27996,e=28041,p=66,cr=224,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=3501342849457765
BINDS #140375370511672:
 Bind#0
  oacdty=01 mxl=32(20) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=178 siz=32 off=0
  kxsbbbfp=a47e093ca  bln=32  avl=20  flg=09
  value=&quot;_SYSSMU1_2856534670$&quot;
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b92b0  bln=24  avl=03  flg=05
  value=1024
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9280  bln=24  avl=03  flg=05
  value=128
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9248  bln=24  avl=02  flg=05
  value=5
 Bind#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9218  bln=24  avl=02  flg=05
  value=1
 Bind#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b91e8  bln=24  avl=03  flg=05
  value=3398
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b91b8  bln=24  avl=05  flg=05
  value=1485261
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b9180  bln=24  avl=06  flg=05
  value=1946693999
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8ec8  bln=24  avl=03  flg=05
  value=2847
 Bind#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e98  bln=24  avl=02  flg=05
  value=1
 Bind#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e68  bln=24  avl=02  flg=05
  value=2
 Bind#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b8e38  bln=24  avl=02  flg=05
  value=2
 Bind#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fabae3b92e0  bln=22  avl=02  flg=05
  value=1
WAIT #140375370511672: nam='db file sequential read' ela= 21 file#=1 block#=179020 blocks=1 obj#=0 tim=3501342849459353

*** 2016-07-14 03:14:09.548
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

很明显数据库是在update undo$的时候,读取到file 1 block 179020的时候报错,继续分析trace文件

Error 600 in redo application callback
Dump of change vector:
TYP:0 CLS:16 AFN:1 DBA:0x0042bb4c OBJ:4294967295 SCN:0x0b26.a88e815e SEQ:1 OP:5.1 ENC:0 RBL:0
ktudb redo: siz: 272 spc: 4906 flg: 0x0012 seq: 0x0f4c rec: 0x0d
            xid:  0x0000.01b.00000b63  
ktubl redo: slt: 27 rci: 0 opc: 11.1 [objn: 15 objd: 15 tsn: 0]
Undo type:  Regular undo        Begin trans    Last buffer split:  No 
Temp Object:  No 
Tablespace Undo:  No 
             0x00000000  prev ctl uba: 0x0042bb4c.0f4c.0c 
prev ctl max cmt scn:  0x0b23.ccb9eb89  prev tx cmt scn:  0x0b23.ccb9ebac 
txn start scn:  0xffff.ffffffff  logon user: 0  prev brb: 4373321  prev bcl: 0 BuExt idx: 0 flg2: 0
KDO undo record:
KTB Redo 
op: 0x04  ver: 0x01  
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0000.01c.00000b65 uba: 0x0042bb4a.0f4c.1d
                      flg: C---    lkc:  0     scn: 0x0b25.7391ee5c
KDO Op code: URP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x004000e1  hdba: 0x004000e0
itli: 2  ispac: 0  maxfr: 4863
tabn: 0 slot: 1(0x1) flag: 0x2c lock: 0 ckix: 0
ncol: 17 nnew: 12 size: 0
col  1: [20]  5f 53 59 53 53 4d 55 31 5f 32 38 35 36 35 33 34 36 37 30 24
col  2: [ 2]  c1 02
col  3: [ 3]  c2 0b 19
col  4: [ 3]  c2 02 1d
col  5: [ 6]  c5 14 2f 46 28 64
col  6: [ 3]  c2 1d 30
col  7: [ 5]  c4 02 31 35 3e
col  8: [ 3]  c2 22 63
col  9: [ 2]  c1 02
col 10: [ 2]  c1 04
col 11: [ 2]  c1 03
col 16: [ 2]  c1 03
Block after image is corrupt: 
buffer tsn: 0 rdba: 0x0042bb4c (1/179020)
scn: 0x0b26.a88e815e seq: 0x01 flg: 0x04 tail: 0x815e0201
frmt: 0x02 chkval: 0xf022 type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000094E07A000 to 0x000000094E07C000
94E07A000 0000A202 0042BB4C A88E815E 04010B26  [....L.B.^...&amp;...]
94E07A010 0000F022 004E0000 00000B5B 1D1D0F4C  [&quot;.....N.[...L...]

我们知道在file 1 block 179020的时候redo和undo信息不匹配,出现了上述的ORA 600 4194的错误.进一步分析

Block image after block recovery:
buffer tsn: 0 rdba: 0x00400080 (1/128)
scn: 0x0b26.6389e19d seq: 0x01 flg: 0x04 tail: 0xe19d0e01
frmt: 0x02 chkval: 0x7c95 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000094DAB2000 to 0x000000094DAB4000
94DAB2000 0000A20E 00400080 6389E19D 04010B26  [......@....c&amp;...]
94DAB2010 00007C95 00000000 00000000 00000000  [.|..............]
94DAB2020 00000000 00000015 000002FF 00001020  [............ ...]
94DAB2030 0000000D 0000004C 00000080 0042BB4C  [....L.......L.B.]


  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 21     #blocks: 767   
                  last map  0x00000000  #maps: 0      offset: 4128  
      Highwater::  0x0042bb4c  ext#: 13     blk#: 76     ext size: 128   
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 0     
  mapblk  0x00000000  offset: 13    
                   Unlocked
     Map Header:: next  0x00000000  #extents: 21   obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x00400081  length: 7     
   0x00413a38  length: 8     
   0x00400088  length: 8     
   0x00413a30  length: 8     
   0x0042b888  length: 8     
   0x0042b890  length: 8     
   0x0042b898  length: 8     
   0x0042b8a0  length: 8     
   0x0042b8a8  length: 8     
   0x0042b8b0  length: 8     
   0x0042b8b8  length: 8     
   0x0042b8c0  length: 8     
   0x0042ba80  length: 128   
   0x0042bb00  length: 128   
   0x0042bc00  length: 128   
   0x0042bc80  length: 128   
   0x0042bb80  length: 128   
   0x00400210  length: 8     
   0x00400218  length: 8     
   0x00400220  length: 8     
   0x00400228  length: 8     
  
  TRN CTL:: seq: 0x0f4c chd: 0x001b ctl: 0x0043 inc: 0x00000000 nfb: 0x0001
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x0042bb4c.0f4c.0c scn: 0x0b23.ccb9eb89
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x0042bb4c.0f4c.0c ext: 0xd  spc: 0x132a  
    uba: 0x00000000.0f4c.0c ext: 0xd  spc: 0x12f6  
    uba: 0x00000000.0f4c.01 ext: 0xd  spc: 0x1ec8  
    uba: 0x00000000.0f4c.04 ext: 0xd  spc: 0x1b86  
    uba: 0x00000000.0f4c.09 ext: 0xd  spc: 0x162c  
  TRN TBL::
 
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num
  ------------------------------------------------------------------------------------------------
   0x00    9    0x00  0x0b62  0x0011  0x0b25.2dacaf61  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x01    9    0x00  0x0b64  0x0024  0x0b24.a6a2cf7b  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x02    9    0x00  0x0b65  0x0036  0x0b25.7391eda0  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x03    9    0x00  0x0b4f  0x0007  0x0b24.337bf49b  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x04    9    0x00  0x0b64  0x0051  0x0b23.ff22c637  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x05    9    0x00  0x0b64  0x0022  0x0b26.4393eb1e  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x06    9    0x00  0x0b66  0x0058  0x0b24.335c794d  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x07    9    0x00  0x0b4f  0x001d  0x0b24.4e05f2af  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x08    9    0x00  0x0b65  0x005e  0x0b23.ff22c618  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x09    9    0x00  0x0b5f  0x0035  0x0b24.337bf3d9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0a    9    0x00  0x0b64  0x004f  0x0b25.7391ee5f  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x0b    9    0x00  0x0b64  0x0040  0x0b24.335c7bd7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0c    9    0x00  0x0b65  0x0002  0x0b25.7391e929  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x0d    9    0x00  0x0b4f  0x0033  0x0b24.a6a2caa5  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x0e    9    0x00  0x0b65  0x0008  0x0b23.ff22c616  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x0f    9    0x00  0x0b61  0x0038  0x0b26.6389e195  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x10    9    0x00  0x0b4f  0x002a  0x0b24.bcff18b3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x11    9    0x00  0x0b5c  0x0059  0x0b25.2dacaf69  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x12    9    0x00  0x0b65  0x0026  0x0b25.2dacb0a8  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x13    9    0x00  0x0b66  0x0021  0x0b24.a6a2caaa  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x14    9    0x00  0x0b62  0x0009  0x0b24.337bf3d7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x15    9    0x00  0x0b63  0x0031  0x0b25.1b4e13ba  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x16    9    0x00  0x0b66  0x003b  0x0b25.2dacee5d  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x17    9    0x00  0x0b63  0x0034  0x0b26.6389e199  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x18    9    0x00  0x0b5d  0x002f  0x0b24.bcff18af  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x19    9    0x00  0x0b5b  0x004d  0x0b24.d60f78e3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x1a    9    0x00  0x0b60  0x005b  0x0b25.1b4e13be  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x1b    9    0x00  0x0b62  0x003e  0x0b23.ccb9ebac  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x1c    9    0x00  0x0b65  0x000a  0x0b25.7391ee5c  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x1d    9    0x00  0x0b64  0x002c  0x0b24.4e05f2b1  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x1e    9    0x00  0x0b64  0x0045  0x0b24.33255ae9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x1f    9    0x00  0x0b64  0x0015  0x0b25.1b4e13b5  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x20    9    0x00  0x0b63  0x0050  0x0b24.335c79a4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x21    9    0x00  0x0b5d  0x0001  0x0b24.a6a2caf0  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x22    9    0x00  0x0b65  0x000f  0x0b26.4393eb20  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x23    9    0x00  0x0b62  0x0042  0x0b24.337bf3d2  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x24    9    0x00  0x0b65  0x003c  0x0b24.a6a2d137  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x25    9    0x00  0x0b62  0x0020  0x0b24.335c795d  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x26    9    0x00  0x0b63  0x0052  0x0b25.2dacee48  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x27    9    0x00  0x0b4e  0x003a  0x0b25.7391ee58  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x28    9    0x00  0x0b65  0x0049  0x0b25.2dacb089  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x29    9    0x00  0x0b61  0x0030  0x0b24.bcff18bb  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x2a    9    0x00  0x0b65  0x0057  0x0b24.bcff18b5  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x2b    9    0x00  0x0b64  0x0054  0x0b25.2dacee55  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2c    9    0x00  0x0b61  0x000d  0x0b24.4e05f2b3  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2d    9    0x00  0x0b64  0x000e  0x0b23.ff22c611  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x2e    9    0x00  0x0b65  0x0053  0x0b25.7391e78a  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x2f    9    0x00  0x0b66  0x0010  0x0b24.bcff18b1  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x30    9    0x00  0x0b63  0x0019  0x0b24.d60f78e1  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x31    9    0x00  0x0b65  0x001a  0x0b25.1b4e13bc  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x32    9    0x00  0x0b65  0x0037  0x0b24.a6a2d369  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x33    9    0x00  0x0b4f  0x0013  0x0b24.a6a2caa7  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x34    9    0x00  0x0b63  0x0043  0x0b26.6389e19b  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x35    9    0x00  0x0b65  0x0046  0x0b24.337bf409  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x36    9    0x00  0x0b52  0x0061  0x0b25.7391eda2  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x37    9    0x00  0x0b64  0x0018  0x0b24.bcff18ad  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x38    9    0x00  0x0b65  0x0017  0x0b26.6389e197  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x39    9    0x00  0x0b62  0x0006  0x0b24.335c7947  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3a    9    0x00  0x0b64  0x001c  0x0b25.7391ee5a  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3b    9    0x00  0x0b4d  0x002e  0x0b25.7391e730  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3c    9    0x00  0x0b65  0x0032  0x0b24.a6a2d144  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x3d    9    0x00  0x0b65  0x0016  0x0b25.2dacee59  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x3e    9    0x00  0x0b63  0x002d  0x0b23.ff22c60b  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x3f    9    0x00  0x0b63  0x005f  0x0b24.335c7b57  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x40    9    0x00  0x0b65  0x0044  0x0b24.335c7bd9  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x41    9    0x00  0x0b65  0x0029  0x0b24.bcff18b9  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x42    9    0x00  0x0b61  0x0014  0x0b24.337bf3d4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x43    9    0x00  0x0b5b  0xffff  0x0b26.6389e19d  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x44    9    0x00  0x0b4c  0x004b  0x0b24.335c7bdb  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x45    9    0x00  0x0b61  0x0039  0x0b24.335c7945  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x46    9    0x00  0x0b65  0x005a  0x0b24.337bf421  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x47    9    0x00  0x0b65  0x0027  0x0b25.7391eda8  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x48    9    0x00  0x0b62  0x004e  0x0b24.335c7953  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x49    9    0x00  0x0b63  0x0012  0x0b25.2dacb09f  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x4a    9    0x00  0x0b63  0x0023  0x0b24.335c7bf8  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4b    9    0x00  0x0b5b  0x004a  0x0b24.335c7bf3  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4c    9    0x00  0x0b63  0x003f  0x0b24.335c7b55  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4d    9    0x00  0x0b61  0x001f  0x0b25.1b4e13b3  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x4e    9    0x00  0x0b5a  0x0025  0x0b24.335c795b  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x4f    9    0x00  0x0b5f  0x005c  0x0b25.8ff588bb  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x50    9    0x00  0x0b63  0x004c  0x0b24.335c79a7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x51    9    0x00  0x0b64  0x005d  0x0b24.0c84cbf4  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x52    9    0x00  0x0b65  0x002b  0x0b25.2dacee49  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x53    9    0x00  0x0b64  0x000c  0x0b25.7391e927  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x54    9    0x00  0x0b63  0x003d  0x0b25.2dacee56  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x55    9    0x00  0x0b64  0x0005  0x0b26.4393eada  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x56    9    0x00  0x0b64  0x0055  0x0b26.4393ead3  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x57    9    0x00  0x0b60  0x0041  0x0b24.bcff18b7  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x58    9    0x00  0x0b65  0x0060  0x0b24.335c794f  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x59    9    0x00  0x0b60  0x0028  0x0b25.2dacaf74  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x5a    9    0x00  0x0b61  0x0003  0x0b24.337bf499  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
   0x5b    9    0x00  0x0b62  0x0000  0x0b25.1b4e13c0  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x5c    9    0x00  0x0b62  0x0056  0x0b25.950a6e4b  0x0042bb4c  0x0000.000.00000000  0x00000001   0x00000000
   0x5d    9    0x00  0x0b63  0x001e  0x0b24.33255ae7  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x5e    9    0x00  0x0b5f  0x0004  0x0b23.ff22c635  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x5f    9    0x00  0x0b64  0x000b  0x0b24.335c7bd5  0x0042bb49  0x0000.000.00000000  0x00000001   0x00000000
   0x60    9    0x00  0x0b61  0x0048  0x0b24.335c7950  0x0042bb4b  0x0000.000.00000000  0x00000001   0x00000000
   0x61    9    0x00  0x0b65  0x0047  0x0b25.7391eda6  0x0042bb4a  0x0000.000.00000000  0x00000001   0x00000000
KQRCMT: Write failed with error=600 po=0xa47e092c0 cid=3
diagnostics : cid=3 hash=35e74caf flag=2a
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

到这里我们基本上明白了报错的file 1 block 179020是由FREE BLOCK POOL分配出来的,现在解决给问题的思路就是直接使用bbed分配一个新块即可.

ORA-600 6711
数据库无法正常open报ORA 600 6711错误

Thu Jul  14 04:04:28 2016
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 2, block 3
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /data/amdu/redo/DATA_EC_260.f
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 2, block 5, scn 12269633687653
 0 data blocks read, 0 data blocks written, 1 redo k-bytes read
Thu Jul  14 04:04:29 2016
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /data/amdu/redo/DATA_EC_263.f
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
After successful startup of the database, please remove
the parameters _allow_error_simulation and _smu_debug_mode
and restart the database
Thu Jul  14 04:04:29 2016
SMON: enabling cache recovery
Undo initialization finished serial:0 start:270626334 end:270626544 diff:210 (2 seconds)
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Corrected file 19 plugged in read-only status in control file
Corrected file 81 plugged in read-only status in control file
Corrected file 88 plugged in read-only status in control file
Corrected file 93 plugged in read-only status in control file
Corrected file 128 plugged in read-only status in control file
Corrected file 130 plugged in read-only status in control file
Corrected file 131 plugged in read-only status in control file
Corrected file 163 plugged in read-only status in control file
Corrected file 181 plugged in read-only status in control file
Corrected file 184 plugged in read-only status in control file
Corrected file 186 plugged in read-only status in control file
Corrected file 191 plugged in read-only status in control file
Corrected file 214 plugged in read-only status in control file
Corrected file 220 plugged in read-only status in control file
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE &lt;tablespace_name&gt; ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to ZHS16GBK
WARNING: event 8105 is set. This event disables failed
         online index [re]build cleanup
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Thu Jul  14 04:04:30 2016
QMNC started with pid=57, OS id=3549 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3401.trc  (incident=1440450):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440450/xifenfei_ora_3401_i1440450.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Error 600 in kwqmnpartition(), aborting txn 
Thu Jul  14 04:04:32 2016
Dumping diagnostic data in directory=[cdmp_20801214040432], requested by (instance=1, osid=3401), summary=[incident=1440450].
Thu Jul  14 04:04:32 2016
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_mmon_3329.trc  (incident=1440178):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440178/xifenfei_mmon_3329_i1440178.trc
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_ora_3401.trc  (incident=1440451):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440451/xifenfei_ora_3401_i1440451.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/trace/xifenfei_mmon_3329.trc  (incident=1440179):
ORA-00600: internal error code, arguments: [6711], [4293062], [1], [4318348], [0], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/xifenfei/xifenfei/incident/incdir_1440179/xifenfei_mmon_3329_i1440179.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: alter database open...

数据库正常open失败,但是可以upgrade启动.根据对trace文件的分析,定位到问题在HISTGRM$表上面,进一步分析该表结构为

CREATE TABLE SYS.HISTGRM$
(
  OBJ#      NUMBER,
  COL#      NUMBER,
  ROW#      NUMBER,
  BUCKET    NUMBER,
  ENDPOINT  NUMBER,
  INTCOL#   NUMBER,
  EPVALUE   VARCHAR2(1000 BYTE),
  SPARE1    NUMBER,
  SPARE2    NUMBER
)
CLUSTER SYS.C_OBJ#_INTCOL#(OBJ#, INTCOL#);

这个和ORA-600 6711错误相匹配了

ERROR:
ORA-600 [6711] [a] [b] 1 [d]
VERSIONS:
versions 6.0 to 12.1
DESCRIPTION:
This error is generated when we find more blocks on a cluster key
chain than are supposed to be there.
Usually this indicates that the chain contains a loop within itself. We
cannot have more than 65535 blocks in a chain.
ARGUMENTS:
Arg [a] beginning DBA
Arg [b] table slot number (in table index)
Arg {c} dba of key next in chain
Arg [d] row slot of key next in chain

需要处理该错误,也就是需要处理CLUSTER SYS.C_OBJ#_INTCOL#,这个可以通过重建来实现,但是当重建之时发生

ORA-00600: internal error code, arguments: [kkoipt:invalid aptyp], [0], [0], [], [], [], [], [], [], [], [], []
ORA-08102: index key not found, obj# 39, file 1, block 1374829 (2)

这里错误比较明显obj#=39为obj$的i_obj4的index的记录和表不匹配,导致任何ddl无法执行,因此如果要处理C_OBJ#_INTCOL#就必须要先处理i_obj4的问题.通过一些技巧重建i_obj4,然后重建C_OBJ#_INTCOL#,数据库终于可以正常打开.由于大量数据字典不一致,exp/expdp导出依旧有问题,通过dblink直接拉数据到新库,完成本次恢复
补充说明:1. 在这个库的恢复过程中,我们还使用了大量的event和隐含参数,因为比较常规而且不涉及核心环节,因为未列举出来;2. 由于当时操作记录未能够保留日志因此相关操作步骤无法贴出来,本文只能提供恢复处理思路
再次提醒各位数据库做好备份,做好巡检工作,哪怕是强大的Oracle exadata也禁不起无备份折腾,数据重于一切

某集团ebs数据库redo undo丢失导致悲剧

某集团的ebs系统因磁盘空间不足把redo和undo存放到raid 0之上,而且该库无任何备份。最终悲剧发生了,raid 0异常导致redo undo全部丢失,数据库无法正常启动(我接手之时数据库已经resetlogs过,但是未成功)

Sun Jul 27 11:31:27 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sun Jul 27 11:31:27 2014
Database Characterset is ZHS16GBK
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_454754.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 42 cannot be read at this time
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
Sun Jul 27 11:31:27 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_663670.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 41 cannot be read at this time
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 663670
ORA-1092 signalled during: ALTER DATABASE OPEN...

查询相关文件状态发现,undo表空间文件丢失,被offline处理
df_status
因为以前alert日志被清理,通过这里大概猜测是offline丢失的undo文件,然后resetlogs了数据库,现在处理方式为
使用_corrupted_rollback_segments屏蔽回滚段,然后尝试启动数据库

Tue Jul 29 11:40:39 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue Jul 29 11:40:39 2014
Database Characterset is ZHS16GBK
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_smon_569378.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Tue Jul 29 11:40:39 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_585786.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number  with name "" too small
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 585786
ORA-1092 signalled during: alter database open...

该错误是由于数据库启动需要找到对应的回滚段,但是由于undo异常导致该回滚段无法找到,因此出现该错误,解决方法是通过修改数据scn,让其不找回滚段,从而屏蔽该错误.数据库启动后,删除undo重新创建新undo

Tue Jul 29 15:59:22 2014
drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 41 failed verification check
ORA-01110: data file 41: '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo2.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01122: database file 42 failed verification check
ORA-01110: data file 42: '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-01565: error in identifying file '/prod/oracle/PROD/logdata/undo/undo1.dbf'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo2.dbf
Tue Jul 29 15:59:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_782490.trc:
ORA-01259: unable to delete datafile /prod/oracle/PROD/logdata/undo/undo1.dbf
Tue Jul 29 15:59:23 2014
Completed: drop tablespace undo2 including contents and datafiles
Tue Jul 29 15:59:56 2014
create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 15:59:57 2014
Completed: create undo tablespace undotbs1 datafile '/prod/oracle/PROD/logdata/undo_new01.dbf' size 100M autoextend on next 128M maxsize 30G
Tue Jul 29 16:00:03 2014
alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G
Completed: alter tablespace undotbs1 add datafile '/prod/oracle/PROD/logdata/undo_new02.dbf' size 100M autoextend on next 128M maxsize 30G

业务运行过程中,数据库报大量ORA-600 4097,ORA-600 kdsgrp1,ORA-600 kcfrbd_3错误

Tue Jul 29 16:07:03 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:07:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_950484.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:10:06 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:10:07 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_917702.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Tue Jul 29 16:12:45 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/bdump/prod_m000_880692.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:21:23 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:37 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:21:56 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:18 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1040638.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [41], [231381], [1], [12800], [12800], [], []
Tue Jul 29 16:22:28 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1105950.trc:
ORA-00600: 内部错误代码, 参数: [4097], [], [], [], [], [], [], []
Tue Jul 29 16:22:33 2014
Errors in file /prod/oracle/PROD/db/tech_st/10.2.0/admin/PROD_erpserver/udump/prod_ora_1159232.trc:
ORA-00600: 内部错误代码, 参数: [kcfrbd_3], [42], [61235], [1], [12800], [12800], [], []

出现该错误有几个原因和解决方法:
ORA-600 kdsgrp1 是因为相关坏块引起(tab,index,memory,cr block等),结合日志分析对象异常原因,根据具体情况确定对象然后选择合适处理方案(具体参考NOTE:1332252.1)
ORA-600 4097 由于数据库异常关闭然后open,创建回滚段,可能触发bug导致该问题(虽然说在当前版本修复,但是实际处理我确实按照NOTE:1030620.6解决)
ORA-600 kcfrbd_3 有事务的block被访问之后,根据回滚槽信息定位到相关回滚段,而正好新建的回滚段信息又和以前的名字编号一致,从而反馈出来是数据文件大小不够,从而出现该错误(具体参考NOTE:601798.1)
最终该数据库虽然恢复了,抢救了大量数据,但是对于ebs系统来说,丢失redo和undo数据的损失还是巨大的.再次温馨提示:数据库的redo,undo也很重要,数据库的备份更加重要