Oracle bug ORA-600 k2vcbk_2故障恢复

有朋友找到我说他们数据库无法启动,数据库启动报ORA-600[k2vcbk_2]错误,数据库版本为11.2.0.2 RAC,操作系统是AIX 6.1

SQL> recover database;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [],
[], [], [], [], []
Process ID: 7930020
Session ID: 49 Serial number: 14761

数据库节点1日志

Mon Sep 21 15:45:41 2015
Thread 1 advanced to log sequence 54076 (LGWR switch)
  Current log# 13 seq# 54076 mem# 0: +DG01/xifenfei/onlinelog/group_13.332.779459035
  Current log# 13 seq# 54076 mem# 1: +DG01/xifenfei/onlinelog/group_13.344.779582621
Mon Sep 21 15:45:44 2015
Archived Log entry 74655 added for thread 1 sequence 54075 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:56:18 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_18088342.trc  (incident=184348):
ORA-00600: 内部错误代码, 参数: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184348/xifenfei1_ora_18088342_i184348.trc
Mon Sep 21 15:56:34 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Error 600 trapped in 2PC on transaction 7.16.120119. Cleaning up.
Error stack returned to user:
ORA-00600: 内部错误代码, 参数: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_18088342.trc  (incident=184349):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-00600: 内部错误代码, 参数: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Mon Sep 21 15:56:34 2015
Dumping diagnostic data in directory=[cdmp_20150921155634], requested by (instance=1, osid=18088342), summary=[incident=184348].
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184349/xifenfei1_ora_18088342_i184349.trc
Mon Sep 21 15:56:35 2015
Sweep [inc][184349]: completed
Sweep [inc][184348]: completed
Sweep [inc2][184348]: completed
opiodr aborting process unknown ospid (18088342) as a result of ORA-603
Mon Sep 21 15:57:12 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_smon_7536810.trc  (incident=184274):
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184274/xifenfei1_smon_7536810_i184274.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Sep 21 15:57:16 2015
Dumping diagnostic data in directory=[cdmp_20150921155716], requested by (instance=1, osid=7536810 (SMON)), summary=[incident=184274].
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_smon_7536810.trc:
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7536810): terminating the instance due to error 474
Mon Sep 21 15:57:18 2015
ORA-1092 : opitsk aborting process

数据库节点2日志

Mon Sep 21 15:21:50 2015
Archived Log entry 74653 added for thread 2 sequence 23559 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:44:28 2015
Thread 2 advanced to log sequence 23561 (LGWR switch)
  Current log# 12 seq# 23561 mem# 0: +DG01/xifenfei/onlinelog/group_12.338.779457003
  Current log# 12 seq# 23561 mem# 1: +DG01/xifenfei/onlinelog/group_12.265.779582493
Mon Sep 21 15:44:31 2015
Archived Log entry 74654 added for thread 2 sequence 23560 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:45:31 2015
DISTRIB TRAN xifenfei.1ebab0a5.20.3.1533822
  is local tran 20.3.1533822 (hex=14.03.17677e)
  insert pending committed tran, scn=14590688068086 (hex=d45.28c781f6)
Mon Sep 21 15:45:31 2015
DISTRIB TRAN xifenfei.1ebab0a5.20.3.1533822
  is local tran 20.3.1533822 (hex=14.03.17677e))
  delete pending committed tran, scn=14590688068086 (hex=d45.28c781f6)
Mon Sep 21 15:56:35 2015
Dumping diagnostic data in directory=[cdmp_20150921155634], requested by (instance=1, osid=18088342), summary=[incident=184348].
Mon Sep 21 15:57:10 2015
Error 3135 trapped in 2PC on transaction 20.11.1534704. Cleaning up.
Error stack returned to user:
ORA-03135: 连接失去联系
opidcl aborting process unknown ospid (9175532) as a result of ORA-604
Mon Sep 21 15:57:17 2015
Dumping diagnostic data in directory=[cdmp_20150921155716], requested by (instance=1, osid=7536810 (SMON)), summary=[incident=184274].
Mon Sep 21 15:57:23 2015
Reconfiguration started (old inc 18, new inc 20)
List of instances:
 2 (myinst: 2) 
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Mon Sep 21 15:57:23 2015
 LMS 2: 3 GCS shadows cancelled, 1 closed, 0 Xw survived
Mon Sep 21 15:57:23 2015
 LMS 0: 2 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 21 15:57:23 2015
 LMS 1: 3 GCS shadows cancelled, 1 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Mon Sep 21 15:57:23 2015
minact-scn: Inst 2 is now the master inc#:20 mmon proc-id:6816208 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0d45.28c2bb5c gcalc-scn:0x0d45.28c3bd2e
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:20 new-inc#:20
Mon Sep 21 15:57:23 2015
Instance recovery: looking for dead threads
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Beginning instance recovery of 1 threads
 parallel recovery started with 31 processes
Started redo scan
Completed redo scan
 read 12626 KB redo, 1724 data blocks need recovery
Started redo application at
 Thread 1: logseq 54076, block 184416
Recovery of Online Redo Log: Thread 1 Group 13 Seq 54076 Reading mem 0
  Mem# 0: +DG01/xifenfei/onlinelog/group_13.332.779459035
  Mem# 1: +DG01/xifenfei/onlinelog/group_13.344.779582621
Completed redo application of 9.78MB
Completed instance recovery at
 Thread 1: logseq 54076, block 209669, scn 14590688357285
 1633 data blocks read, 1794 data blocks written, 12626 redo k-bytes read
Thread 1 advanced to log sequence 54077 (thread recovery)
Mon Sep 21 15:57:33 2015
Error 3113 trapped in 2PC on transaction 21.18.1965522. Cleaning up.
Redo thread 1 internally disabled at seq 54077 (SMON)
Error stack returned to user:
ORA-02050: 事务处理 21.18.1965522 已回退, 某些远程数据库可能有问题
ORA-03113: 通信通道的文件结尾
ORA-02063: 紧接着 line (起自 ZSK)
Mon Sep 21 15:57:34 2015
Archived Log entry 74656 added for thread 1 sequence 54076 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:34 2015
ARC0: Archiving disabled thread 1 sequence 54077
Archived Log entry 74657 added for thread 1 sequence 54077 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:35 2015
Thread 2 advanced to log sequence 23562 (LGWR switch)
  Current log# 8 seq# 23562 mem# 0: +DG01/xifenfei/onlinelog/group_8.334.779456945
  Current log# 8 seq# 23562 mem# 1: +DG01/xifenfei/onlinelog/group_8.267.779582453
Mon Sep 21 15:57:36 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei2/trace/xifenfei2_smon_6750672.trc  (incident=200218):
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei2/incident/incdir_200218/xifenfei2_smon_6750672_i200218.trc
Archived Log entry 74658 added for thread 2 sequence 23561 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:38 2015
minact-scn: master continuing after IR
Mon Sep 21 15:57:41 2015
Dumping diagnostic data in directory=[cdmp_20150921155741], requested by (instance=2, osid=6750672 (SMON)), summary=[incident=200218].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fatal internal error happened while SMON was doing instance transaction recovery.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei2/trace/xifenfei2_smon_6750672.trc:
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 6750672): terminating the instance due to error 474
Mon Sep 21 15:57:41 2015
ORA-1092 : opitsk aborting process
Mon Sep 21 15:57:42 2015
ORA-1092 : opitsk aborting process
Mon Sep 21 15:57:42 2015
License high water mark = 291
Instance terminated by SMON, pid = 6750672
USER (ospid: 18874814): terminating the instance

通过数据库日志大概可以看出来,由于节点2的分布式事事务异常,而在11.2.0.2中,分布式事务跨节点,引起节点2的pmon清理异常事务,但是由于bug,使得异常事务无法被清理掉,从而引起节点1 crash,节点1 crash之后节点2进行恢复,也因为分布式事务bug,导致smon回滚失败,实例也crash。无法进行回滚导致数据库无法正常启动,通过查询mos发现定位到是Bug 10222544 ORA-600 [k2vpci_2] from multi-branch distributed transaction
ORA-600-k2vpci_2


对于这类问题,由于分布事务无法清理,处理方法就是找出来事务人工提交或者直接屏蔽掉这个事务解决该问题

12C sysaux 异常恢复—wrong resetlogs 错误恢复

有朋友请求支援,他们数据库由于file 3 大量坏块,然后直接使用rman 备份还原了file 3,但是在recover过程中发现归档丢失,而且整个库在丢失归档的scn之后,还做过resetlogs操作,导致现在整个库无法正常启动,报ORA-01190错误,希望帮忙把file 3 给online起来,整个库正常open【当然在丢失sysaux的情况下,数据库可以open起来,但是这种情况下,迁移数据比较麻烦】

SQL> startup;
ORACLE 例程已经启动。

Total System Global Area 3.1868E+10 bytes
Fixed Size                  3601144 bytes
Variable Size            2.8655E+10 bytes
Database Buffers         3154116608 bytes
Redo Buffers               54804480 bytes
数据库装载完毕。
ORA-01190: 控制文件或数据文件 3 来自最后一个 RESETLOGS 之前
ORA-01110: 数据文件 3: 'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF'

Oracle Database Recovery Check Result结果显示[脚本]
wrong+resetlogs


尝试不完全恢复并使用隐含参数打开库

Fri Oct 02 19:10:12 2015
ALTER DATABASE RECOVER  database until cancel  
Fri Oct 02 19:10:12 2015
Media Recovery Start
 Started logmerger process
Fri Oct 02 19:10:12 2015
Media Recovery failed with error 16433
Fri Oct 02 19:10:14 2015
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Oct 02 19:10:37 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_5176.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:10:37 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_5176.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
ALTER DATABASE RECOVER  database until cancel  
Fri Oct 02 19:11:18 2015
Media Recovery Start
 Started logmerger process
Fri Oct 02 19:11:18 2015
Media Recovery failed with error 16433
Fri Oct 02 19:11:19 2015
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
alter database open resetlogs
ORA-1139 signalled during: alter database open resetlogs...
alter database open
Fri Oct 02 19:11:49 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_ora_4252.trc:
ORA-01190: 控制文件或数据文件 3 来自最后一个 RESETLOGS 之前
ORA-01110: 数据文件 3: 'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF'
ORA-1190 signalled during: alter database open...
Fri Oct 02 19:15:38 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_5292.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:15:38 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_5292.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:20:39 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_2276.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:20:39 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_2276.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:25:40 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_4804.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:25:40 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_4804.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:30:41 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_876.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:30:41 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_m000_876.trc:
ORA-16433: The database or pluggable database must be opened in read/write mode.
Fri Oct 02 19:32:40 2015
Shutting down instance (abort)

数据库遭遇ORA-16433,此类方法无法打开数据库,根据经验值出现此类问题,可能需要重建控制文件,但是由于其中file 3的resetlogs scn不正确,无法包含该文件重建控制文件

Fri Oct 02 20:10:55 2015
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Fri Oct 02 20:10:55 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_ora_5004.trc:
ORA-01189: ????????????? RESETLOGS
ORA-01110: ???? 3: 'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF'
ORA-1503 signalled during:  CREATE CONTROLFILE REUSE DATABASE "orcl" RESETLOGS FORCE LOGGING ARCHIVELOG
     MAXLOGFILES 16
     MAXLOGMEMBERS 3
     MAXDATAFILES 100
     MAXINSTANCES 8
     MAXLOGHISTORY 2921
 LOGFILE
 GROUP 3 'E:\APP\ORAADM\ORADATA\ORCL\REDO03.LOG' size 50M,
 GROUP 2 'E:\APP\ORAADM\ORADATA\ORCL\REDO02.LOG'  size 50M,
 GROUP 1 'E:\APP\ORAADM\ORADATA\ORCL\REDO01.LOG'  size 50M
 DATAFILE
'E:\APP\ORAADM\ORADATA\ORCL\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBSEED\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBSEED\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\UNDOTBS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\USERS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SAMPLE_SCHEMA_USERS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\EXAMPLE01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\NMSA_BACKUP01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE1.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE02.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE03.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE04.DBF'
 CHARACTER SET AL32UTF8
 ...

除掉file 3 继续重建控制文件

Fri Oct 02 20:33:11 2015
Successful mount of redo thread 1, with mount id 1419796614
Completed:  CREATE CONTROLFILE REUSE DATABASE "orcl" RESETLOGS FORCE LOGGING ARCHIVELOG
     MAXLOGFILES 16
     MAXLOGMEMBERS 3
     MAXDATAFILES 100
     MAXINSTANCES 8
     MAXLOGHISTORY 2921
 LOGFILE
 GROUP 3 'E:\APP\ORAADM\ORADATA\ORCL\REDO03.LOG' size 50M,
 GROUP 2 'E:\APP\ORAADM\ORADATA\ORCL\REDO02.LOG'  size 50M,
 GROUP 1 'E:\APP\ORAADM\ORADATA\ORCL\REDO01.LOG'  size 50M
 DATAFILE
'E:\APP\ORAADM\ORADATA\ORCL\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBSEED\SYSTEM01.DBF',
--'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBSEED\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\UNDOTBS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\USERS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SAMPLE_SCHEMA_USERS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\EXAMPLE01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\NMSA_BACKUP01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE1.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE02.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE03.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE04.DBF'
 CHARACTER SET AL32UTF8

继续恢复数据库

ALTER DATABASE OPEN
Fri Oct 02 20:34:57 2015
…………
Archived Log entry 3 added for thread 1 sequence 8 ID 0x54a083a3 dest 1:
Fri Oct 02 20:35:16 2015
Tablespace 'SYSAUX' #1 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #3 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00003' in the controlfile.
Corrected file 15 plugged in read-only status in control file
Corrected file 16 plugged in read-only status in control file
Corrected file 17 plugged in read-only status in control file
Corrected file 18 plugged in read-only status in control file
Corrected file 19 plugged in read-only status in control file
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
Fri Oct 02 20:35:19 2015
SMON: enabling tx recovery
Fri Oct 02 20:35:19 2015
*********************************************************************
WARNING: The following temporary tablespaces in container(CDB$ROOT)
         contain no files.
Starting background process SMCO
Fri Oct 02 20:35:19 2015
SMCO started with pid=45, OS id=1500 
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is AL32UTF8
No Resource Manager plan active
Fri Oct 02 20:35:21 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_ora_2220.trc:
ORA-00376: 此时无法读取文件 3
ORA-01111: 数据文件 3 名称未知 - 请重命名以更正文件
ORA-01110: 数据文件 3: 'C:\APP\ORAADM\PRODUCT\12.1.0\DBHOME_1\DATABASE\MISSING00003'
Fri Oct 02 20:35:21 2015
Errors in file E:\APP\ORAADM\diag\rdbms\orcl\oaorcl\trace\oaorcl_ora_2220.trc:
ORA-00376: 此时无法读取文件 3
ORA-01111: 数据文件 3 名称未知 - 请重命名以更正文件
ORA-01110: 数据文件 3: 'C:\APP\ORAADM\PRODUCT\12.1.0\DBHOME_1\DATABASE\MISSING00003'
Error 376 happened during db open, shutting down database
USER (ospid: 2220): terminating the instance due to error 376
Fri Oct 02 20:35:26 2015
Instance terminated by USER, pid = 2220
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (2220) as a result of ORA-1092

此时由于file 3 未包含在控制文件中,但是存在数据字典中,因此在数据库open的时候出现了默认文件名MISSING0003,尝试重命名改文件指定为存在的file 3,并且尝试恢复

SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area 3.1868E+10 bytes
Fixed Size                  3601144 bytes
Variable Size            2.8655E+10 bytes
Database Buffers         3154116608 bytes
Redo Buffers               54804480 bytes
数据库装载完毕。
SQL> alter database datafile 3 offline;

数据库已更改。

SQL> alter database rename file 'C:\APP\ORAADM\PRODUCT\12.1.0\DBHOME_1\DATABASE\
MISSING00003' to 'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF';

数据库已更改。

SQL> recover database until cancel;
ORA-00279: 更改 617412726 (在 10/02/2015 20:35:06 生成) 对于线程 1 是必需的
ORA-00289: 建议:
E:\APP\ORAADM\FAST_RECOVERY_AREA\ORCL\ARCHIVELOG\2015_10_02\O1_MF_1_9_%U_.ARC
ORA-00280: 更改 617412726 (用于线程 1) 在序列 #9 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
E:\APP\ORAADM\ORADATA\ORCL\REDO01.LOG
ORA-00310: archived log contains sequence 7; sequence 9 required
ORA-00334: archived log: 'E:\APP\ORAADM\ORADATA\ORCL\REDO01.LOG'


ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'E:\APP\ORAADM\ORADATA\ORCL\SYSTEM01.DBF'


SQL> recover database until cancel;
ORA-00279: 更改 617412726 (在 10/02/2015 20:35:06 生成) 对于线程 1 是必需的
ORA-00289: 建议:
E:\APP\ORAADM\FAST_RECOVERY_AREA\ORCL\ARCHIVELOG\2015_10_02\O1_MF_1_9_%U_.ARC
ORA-00280: 更改 617412726 (用于线程 1) 在序列 #9 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
E:\APP\ORAADM\ORADATA\ORCL\REDO03.LOG
已应用的日志。
完成介质恢复。
SQL> alter database datafile 3 online;

数据库已更改。

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-01122: 数据库文件 3 验证失败
ORA-01110: 数据文件 3: 'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF'
ORA-01202: 此文件的原型错误 - 创建时间错误

这里比较明显ORA-01202,由于创建控制文件之时没有file 3信息,因此导致控制文件中关于file 3的信息和该文件头的创建时间不一致(此处之时显示了时间不一致,如果通过bbed修改时间,后续可能还有很多东西不一致,因此通过bbed 一个个修改一个个尝试,理论可行,但实际可操作性不好),因此尝试直接使用bbed修改file 3文件头(由于是win环境,操作稍微麻烦点),把resetlogs信息修改和其他的一样

BBED> m /x 3c6b2b35
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  112 to  143           Dba:0x00c00002
------------------------------------------------------------------------
 3c6b2b35 386b2200 00000000 00000000 00000000 00000000 00004000 bb460000 

 <32 bytes per line>

BBED> set offset 116
        OFFSET          116

BBED> m /x 3137ca24
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  116 to  147           Dba:0x00c00002
------------------------------------------------------------------------
 3137ca24 00000000 00000000 00000000 00000000 00004000 bb460000 7dc12b35 

 <32 bytes per line>

BBED> m /x b9f8
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  484 to  515           Dba:0x00c00002
------------------------------------------------------------------------
 b9f8a424 00000000 e65e2435 01000000 d3410000 b89b0000 10000900 02000000 

 <32 bytes per line>

BBED> set offset +2
        OFFSET          486

BBED> m /x cc24
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  486 to  517           Dba:0x00c00002
------------------------------------------------------------------------
 cc240000 0000e65e 24350100 0000d341 0000b89b 00001000 09000200 00000000 

 <32 bytes per line>

BBED> m /x 87df offset 492
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  492 to  523           Dba:0x00c00002
------------------------------------------------------------------------
 87df2435 01000000 d3410000 b89b0000 10000900 02000000 00000000 00000000 

 <32 bytes per line>

BBED> 
BBED> m /x 2b35 offset +2
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  494 to  525           Dba:0x00c00002
------------------------------------------------------------------------
 2b350100 0000d341 0000b89b 00001000 09000200 00000000 00000000 00000000 

 <32 bytes per line>

BBED> d offset 140
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  140 to  171           Dba:0x00c00002
------------------------------------------------------------------------
 bb460000 7dc12b35 ba460000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> m /x 4248
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  140 to  171           Dba:0x00c00002
------------------------------------------------------------------------
 42480000 7dc12b35 ba460000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> d offset 148
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  148 to  179           Dba:0x00c00002
------------------------------------------------------------------------
 ba460000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> m /x 4148
 File: SYSAUX01.dbf (3)
 Block: 2                Offsets:  148 to  179           Dba:0x00c00002
------------------------------------------------------------------------
 41480000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> sum apply
Check value for File 3, Block 2:
current = 0xd0c8, required = 0xd0c8

BBED> verify
DBVERIFY - Verification starting
FILE = SYSAUX01.dbf
BLOCK = 1


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED

修改完file 3的文件头之后,再次重建控制文件,此次包含file 3

Fri Oct 02 21:19:58 2015
Successful mount of redo thread 1, with mount id 1419797885
Completed:  CREATE CONTROLFILE REUSE DATABASE "orcl" NORESETLOGS FORCE LOGGING ARCHIVELOG
     MAXLOGFILES 16
     MAXLOGMEMBERS 3
     MAXDATAFILES 100
     MAXINSTANCES 8
     MAXLOGHISTORY 2921
 LOGFILE
 GROUP 3 'E:\APP\ORAADM\ORADATA\ORCL\REDO03.LOG' size 50M,
 GROUP 2 'E:\APP\ORAADM\ORADATA\ORCL\REDO02.LOG'  size 50M,
 GROUP 1 'E:\APP\ORAADM\ORADATA\ORCL\REDO01.LOG'  size 50M
 DATAFILE
'E:\APP\ORAADM\ORADATA\ORCL\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBSEED\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBSEED\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\UNDOTBS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\USERS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SYSTEM01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SYSAUX01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\SAMPLE_SCHEMA_USERS01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\PDBORCL\EXAMPLE01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\NMSA_BACKUP01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE1.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE01.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE02.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE03.DBF',
'E:\APP\ORAADM\ORADATA\ORCL\V3XSPACE04.DBF'
 CHARACTER SET AL32UTF8

继续恢复数据库,数据库正常open,而且file 3 已经正常online,数据库可以直接导出来,至此恢复大体完成

ORA-01245 ORA-01110 恢复

有朋友找到我,说数据库做recover报ORA-01245和ORA-01110错误,无法继续恢复,请求支持

SQL> recover database using backup controlfile until cancel;
…………

第 1 行出现错误:
ORA-01245: RESETLOGS 完成时脱机文件 1 将丢失
ORA-01110: 数据文件 1: 'E:\APP\ADMINISTRATOR\ORADATA\HXV10\SYSTEM01.DBF'

通过Oracle Database Recovery Check检查数据库情况,发现datafile 1处于offline状态
oracle_recovery_check


Wed Aug 26 23:11:00 2015
alter database datafile 1 offline drop
Completed: alter database datafile 1 offline drop

从这里基本上可以知道为什么出现ORA-01245错误了,由于system表空间中文件被offline导致.

redo信息
oracle_recovery_check_redo

Mon Aug 24 22:38:35 2015
alter database clear unarchived logfile group 2
Clearing online log 2 of thread 1 sequence number 5705
Completed: alter database clear unarchived logfile group 2
Wed Aug 26 23:13:23 2015
alter database clear logfile group 3
Clearing online log 3 of thread 1 sequence number 5706
Completed: alter database clear logfile group 3

除当前redo之外,其他redo被clear

尝试恢复

SQL> alter database datafile 1 online;

数据库已更改。

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-01610: 使用 BACKUP CONTROLFILE 选项的恢复必须已完成


SQL> recover database using backup controlfile;
ORA-00279: 更改 63960710 (在 08/23/2015 17:01:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
E:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\HXV10\ARCHIVELOG\2015_08_27\O1_MF_1_570

5_%U_.ARC
ORA-00280: 更改 63960710 (用于线程 1) 在序列 #5705 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
E:\APP\ADMINISTRATOR\ORADATA\HXV10\REDO03.LOG
ORA-00310: 归档日志包含序列 5706; 要求序列 5705
ORA-00334: 归档日志: 'E:\APP\ADMINISTRATOR\ORADATA\HXV10\REDO03.LOG'


SQL> recover database using backup controlfile;
ORA-00279: 更改 63960710 (在 08/23/2015 17:01:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
E:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\HXV10\ARCHIVELOG\2015_08_27\O1_MF_1_570

5_%U_.ARC
ORA-00280: 更改 63960710 (用于线程 1) 在序列 #5705 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
E:\APP\ADMINISTRATOR\ORADATA\HXV10\REDO02.LOG
ORA-00339: 归档日志未包含任何重做
ORA-00334: 归档日志: 'E:\APP\ADMINISTRATOR\ORADATA\HXV10\REDO02.LOG'


SQL> recover database using backup controlfile;
ORA-00279: 更改 63960710 (在 08/23/2015 17:01:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
E:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\HXV10\ARCHIVELOG\2015_08_27\O1_MF_1_570

5_%U_.ARC
ORA-00280: 更改 63960710 (用于线程 1) 在序列 #5705 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
E:\APP\ADMINISTRATOR\ORADATA\HXV10\REDO01.LOG
ORA-00310: 归档日志包含序列 5707; 要求序列 5705
ORA-00334: 归档日志: 'E:\APP\ADMINISTRATOR\ORADATA\HXV10\REDO01.LOG'

数据库做恢复需要seq 5705的redo,但是redo已经被clear,导致现在数据库常规手段无法恢复,只用使用隐含参数屏蔽数据库前滚(一致性检查)

再次尝试打开数据库

ORACLE 例程已经启动。

Total System Global Area  778387456 bytes
Fixed Size                  1374808 bytes
Variable Size             486540712 bytes
Database Buffers          285212672 bytes
Redo Buffers                5259264 bytes
数据库装载完毕。
SQL> recover database using backup controlfile;
ORA-00279: 更改 63960710 (在 08/23/2015 17:01:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
E:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\HXV10\ARCHIVELOG\2015_08_27\O1_MF_1_570

5_%U_.ARC
ORA-00280: 更改 63960710 (用于线程 1) 在序列 #5705 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
介质恢复已取消。
SQL> alter database open resetlogs;

数据库已更改。

在数据库恢复中,请不要对system表空间数据文件进行offline操作,如果对此类文件进行offline操作,讲在数据库恢复过程中出现ORA-01245和ORA-01110错误,而且文件还会出现SYSOFF状态

记录一次由于坏块和不恰当恢复引起各种ORA-600案例

朋友让我帮忙处理一个不能open的库,打开alert日志一看,傻眼了,里面是各种ORA-600的错误应有尽有,被折腾的够惨
故障后重启,无法启动主要表现在block坏块,引起的各种ORA-600等错误

Mon Mar 02 16:09:27 2015
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 23 processes
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Completed redo application of 0.27MB
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3104 OF FILE 3
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3613 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 272 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 2512 OF FILE 3
Hex dump of (file 2, block 92889) in trace file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc
Corrupt block relative dba: 0x00816ad9 (file 2, block 92889)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 0 rdba: 0x6ad90000
 last change scn: 0x0000.00c6a052 seq: 0x1 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x5d7e
 consistency value in tail: 0xa0520001
 check value in block header: 0x0
 block checksum disabled
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p007_4196.trc  (incident=3833):
ORA-00600: internal error code, arguments: [4502], [1], [], [], [], [], [], [], [], [], [], []
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p013_4208.trc  (incident=3881):
ORA-00600: internal error code, arguments: [2037], [4259067], [4244307968], [159], [243], [0], [2162032704], [100728832], [], [], [], []
Slave exiting with ORA-1172 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p009_4200.trc:
ORA-01172: recovery of thread 1 stuck at block 3613 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p001_4184.trc:
ORA-01172: recovery of thread 1 stuck at block 2512 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc  (incident=3697):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3697/oa_dbw2_4158_i3697.trc
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0xD2DDB7, kcbs_shrink_pool()+705] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_mman_4152.trc  (incident=3673):
ORA-07445: exception encountered: core dump [kcbs_shrink_pool()+705] [SIGSEGV] [ADDR:0x0] [PC:0xD2DDB7] [SI_KERNEL(general_protection)] []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3673/oa_mman_4152_i3673.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc:
Mon Mar 02 16:09:34 2015
Instance terminated by DBW2, pid = 4158

第二次重启后增加新错误ORA-00600[17182]

Mon Mar 02 16:39:50 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p002_4321.trc  (incident=4993):
ORA-00600: internal error code, arguments: [17182], [0x7F548C2BDBA8], [], [], [], [], [], [], [], [], [], []

进行了一些恢复处理后,日志中报错
主要体现在进行了不完全恢复,而且应该是对redo进行了重命名或者redo头损坏锁引起的一系列提示

Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: alter  database open...
Tue Mar 03 11:17:59 2015
Sweep [inc][17178]: completed
Sweep [inc][17177]: completed
Sweep [inc2][17178]: completed
Tue Mar 03 11:18:00 2015
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 24 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Tue Mar 03 11:18:06 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-00266: name of archived log file needed
ORA-266 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL 
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
Slave exiting with ORA-1547 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
ORA-10879 signalled during: ALTER DATABASE RECOVER CANCEL ...
Tue Mar 03 11:18:06 2015
Checker run found 4 new persistent data failures
Tue Mar 03 11:18:13 2015
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 12986989
Resetting resetlogs activation ID 3278679642 (0xc36cae5a)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/oa/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

再一步折腾,增加了_allow_resetlogs_corruption= TRUE之后数据库报ORA-600[2662]

Tue Mar 03 11:19:26 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc  (incident=18195):
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_18195/oa_ora_6864_i18195.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 6864): terminating the instance due to error 704
Instance terminated by USER, pid = 6864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6864) as a result of ORA-1092
Tue Mar 03 11:19:29 2015
ORA-1092 : opitsk aborting process

进一步折腾,可以看出来undo已经被其offline,无法正常访问,导致系统报ORA-704和ORA-00376

Wed Mar 04 21:10:58 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17074): terminating the instance due to error 704
Instance terminated by USER, pid = 17074
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17074) as a result of ORA-1092
Wed Mar 04 21:11:00 2015
ORA-1092 : opitsk aborting process

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检测结果见附件(xifenfei_db_recover_20150304),这里可以知道undo 不知道怎么折腾的数据文件scn较大而且还offline,
通过一些列方法(bbed,隐含参数等)调整数据库scn,强制启动数据库,报如下错误

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,该错误怀疑是文件头scn不够大,块延迟清理导致,进一步增加scn尝试,最后依旧是ORA-00704/ORA-00604/ORA-01555错误

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,在scn上做手脚估计难以解决给问题,对其启动过程做10046和errorstack分析发现

PARSING IN CURSOR #3 len=202 dep=2 uid=0 oct=3 lid=0 tim=1425481940448439 hv=3819099649 ad='64ff91af8' sqlid='3nkd3g3ju5ph1'
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
END OF STMT
PARSE #3:c=1000,e=334,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1425481940448439
BINDS #3:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a6f0  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7f5b3253a6b8  bln=32  avl=06  flg=05
  value="PROPS$"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a688  bln=24  avl=02  flg=05
  value=1
EXEC #3:c=0,e=640,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=2853959010,tim=1425481940449147
WAIT #3: nam='db file sequential read' ela= 5 file#=1 block#=345 blocks=1 obj#=37 tim=1425481940449186
WAIT #3: nam='db file sequential read' ela= 4 file#=1 block#=44528 blocks=1 obj#=37 tim=1425481940449221
WAIT #3: nam='db file sequential read' ela= 3 file#=1 block#=5505 blocks=1 obj#=37 tim=1425481940449247

*** 2015-03-04 23:12:20.450
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
----- Current SQL Statement for this session (sql_id=g64r07v2jn8nq) -----
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'

这里可以发现是数据库在启动的过程中需要执行SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,而该语句递归调用了select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null 语句。既然这样通过一些方法避免数据库启动之时查询SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,果然数据库启动成功。

知识点补充
ORA-600 [4502] [a]

Arg [a] ITL entry with a lock count
Meaning: During ITL cleanout we clear all row locks but the ITL entry
	 still thinks there is an uncleared lock. Ie: ITL has a locked
	 row but there are no locked rows in the block 

大体意思是数据库发现undo 的itl已经被清除,但是block中的itl依然存在,从而出现ORA-600[4502],引起该问题除bug外主要原因是坏块

ORA-600 [2037] [a] [b] [d] [e] [f] [g]

Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
Arg [b] The Block format  
Arg 1 RDBA in the block itself
Arg [d] The block type
Arg [e] The sequence number
Arg [f] Flags, if set  
Arg [g] The return value from the block head/tail checker.
DESCRIPTION:
  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.
  The block has failed this check and this exception is raised

大体意思是在恢复过程中,正在检查的块,以确保它在应用任何变化向量之前不损坏。如果检查失败排除该异常ORA-600[2037],引起该问题除bug外主要原因是坏块

ORA-600 [kcbzpbuf_1],[a],[b]

Arg [a] Corruption reason
Arg [b] Calculate checksum flag

Corruption reason:
#define KCBH_GOOD    0                                     /* block is valid */
#define KCBH_ZERO    1             /* block header was entirely zero on disk */
#define KCBH_BROKEN  2      /* corruption could be from a partial disk write */
#define KCBH_CHKVAL  3               /* The check value for the block failed */
#define KCBH_CORRUPT 4     /* this is the wrong block or is not a data block */
#define KCBH_ZERONG  5               /* all zero block and it is not allowed */

Calculate checksum flag:
The possible values are 1 (Generate Checksum - db_block_checksum is enabled - default value) 
                        0 (do not generate checksum - db_block_checksum=false)

kcbzpbuf_1是该错误的源码函数

ORA-600 [17182] [a] [b] [d] [e]

DESCRIPTION:        
  Oracle has detected that the magic number in a memory chunk header has been overwritten.
  This is a heap (in memory) corruption and there is no underlying data corruption.
  The error may occur in the one of the process specific heaps 
  (the Call heap, PGA heap, or session heap) or in the shared heap (SGA).
1
ORACLE 发现在内存中重要的块头被重新,但是没有基础数据损坏,大部分和数据块或者内存损坏有关系.
<br>
ORA-600 [4552] [a] [b] 1 [d] [e]
1
DESCRIPTION:
  This assertion is raised because we are trying to unlock the rows in a 
  block, but receive an incorrect block type.
  The second argument is the block type received.

ORACLE尝试对某行进行解锁但是接收到了不正确的数据块类型,Arg [b]是接收到的数据块类型

ORA-600 [2662] [a] [b] [d] [e]

DESCRIPTION:

  A data block SCN is ahead of the current SCN.

  The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN 
  stored in a UGA variable.

  If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
  internal error.

ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg 1  dependent SCN WRAP
  Arg [d]  dependent SCN BASE 
  Arg [e]  Where present this is the DBA where the dependent SCN came from.

主要的含义就是oracle文件头scn比某个block dependent scn小从而出现该问题

ora-01092:ORACLE 实例终止 —access$表丢失恢复

有网友咨询数据库启动报ora-01092:ORACLE 实例终止。强制断开连接,请求帮忙处理
数据库版本

Trace file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_5648.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU                 : 1 - type 8664, 1 Physical Cores
Process Affinity    : 0x0000000000000000
Memory (Avail/Total): Ph:7605M/10239M, Ph+PgF:11979M/20477M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 18
Windows thread id: 5648, image: ORACLE.EXE (SHAD)

open数据库报ORA-01092: ORACLE 实例终止。强制断开连接

SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接

alert日志

Thread 1 opened at log sequence 1008
  Current log# 3 seq# 1008 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3964.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在
Error 704 happened during db open, shutting down database
USER (ospid: 3964): terminating the instance due to error 704
Instance terminated by USER, pid = 3964
ORA-1092 signalled during: ALTER DATABASE OPEN...
ORA-1092 : opiodr aborting process unknown ospid (3384_3964)

做10046分析日志

PARSE ERROR #1:len=56 dep=1 uid=0 oct=3 lid=0 tim=1796038335 err=942
select order#,columns,types from access$ where d_obj#=:1

*** 2015-01-27 21:24:50.794
----- Error Stack Dump -----
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在

通过这里可以知道数据库在启动的过程中由于无法访问access$表从而出现ORA-00942错误,又是由于该sql是数据库内部调用因为出现ORA-00604错误.
出现该错误的原因是由于:BUG:12733463 – ORA-704, ORA-604 AND ORA-942 ON TABLE ACCESS$ DURING STARTUP
官方提供方法

1. Shutdown (abort) the instance and clean up any OS structures used by the instance. 
    Eg: Ensure there is no shared memory, semaphores etc.. left lying around

2. Retry the startup.

3. If the error persists try and recover the database or recover from a backup.

惜分飞处理方法

startup  upgrade

 create table access$
 ( d_obj#        number not null,
   order#        number not null,
   columns       raw(126),
   types         number not null)
   storage (initial 10k next 100k maxextents unlimited pctincrease 0)
/

create index i_access1 on
  access$(d_obj#, order#)
  storage (initial 10k next 100k maxextents unlimited pctincrease 0)
/