一起ORA-600 3020故障恢复的大体思路

recover database 报ORA-600 3020

Recovery of Online Redo Log: Thread 1 Group 2 Seq 5729 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
Tue Aug 19 19:37:29 2014
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_pr0s_4296.trc  (incident=39403):
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39403\yygdb_pr0s_4296_i39403.trc
ORA-00600: internal error code, arguments: [3020], [2], [90586], [8479194], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 90586)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: 'E:\ORACLE\ORADATA\YYGDB\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6087
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39147):
ORA-00600: internal error code, arguments: [3020], [3], [240], [12583152], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 3: 'E:\ORACLE\ORADATA\YYGD
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39147\yygdb_ora_12460_i39147.trc
Tue Aug 19 19:37:31 2014
Trace dumping is performing id=[cdmp_20140819193731]
Tue Aug 19 19:37:32 2014
Recovery Slave PR0S previously exited with an exception
Shutting down recovery slaves due to error 10877
Media Recovery failed with error 10877
ORA-283 signalled during: ALTER DATABASE RECOVER  database  ...

使用allow 1 corruption跳3020错误继续恢复

Tue Aug 19 19:38:53 2014
ALTER DATABASE RECOVER  database allow 1 corruption  
Media Recovery Start
Fast Parallel Media Recovery enabled
 ALLOW CORRUPTION option must use serial recovery
Warning: Datafile 10 (D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT_DB.DBF) is offline during full 
database recovery and will not be recovered
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5729 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
CORRUPTING BLOCK 240 OF FILE 3 AND CONTINUING RECOVERY
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc:
ORA-10567: Redo is inconsistent with data block (file# 3, block# 240)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39148):
ORA-00600: 内部错误代码, 参数: [3020], [2], [90586], [8479194], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 90586)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 2: 'E:\ORACLE\ORADATA\YYGDB\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6087
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39148\yygdb_ora_12460_i39148.trc
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  database allow 1 corruption  ...
Tue Aug 19 19:38:56 2014
Trace dumping is performing id=[cdmp_20140819193856]
Tue Aug 19 19:38:59 2014
Sweep Incident[39148]: completed
Tue Aug 19 19:39:05 2014
ALTER DATABASE RECOVER  database allow 1 corruption  
Media Recovery Start
Fast Parallel Media Recovery enabled
 ALLOW CORRUPTION option must use serial recovery
Warning: Datafile 10 (D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT_DB.DBF) is offline during full 
database recovery and will not be recovered
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5729 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
CORRUPTING BLOCK 90586 OF FILE 2 AND CONTINUING RECOVERY
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc:
ORA-10567: Redo is inconsistent with data block (file# 2, block# 90586)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 2: 'E:\ORACLE\ORADATA\YYGDB\SYSAUX01.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6087
Completed: ALTER DATABASE RECOVER  database allow 1 corruption  

继续open数据库报ORA-01578错误,数据库无法open

Thread 1 opened at log sequence 5730
  Current log# 3 seq# 5730 mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Aug 19 19:39:34 2014
SMON: enabling cache recovery
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39149):
ORA-01578: ORACLE 数据块损坏 (文件号 3, 块号 240)
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39149\yygdb_ora_12460_i39149.trc
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39150):
ORA-00353: 日志损坏接近块 520 更改 101455257 时间 08/18/2014 08:22:54
ORA-00312: 联机日志 1 线程 1: 'E:\ORACLE\ORADATA\YYGDB\REDO01.LOG'
ORA-01578: ORACLE 数据块损坏 (文件号 3, 块号 240)
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39150\yygdb_ora_12460_i39150.trc
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_39149\yygdb_ora_12460_i39149.trc:
ORA-00354: 损坏重做日志块标头
ORA-00353: 日志损坏接近块 520 更改 101455257 时间 08/18/2014 08:22:54
ORA-00312: 联机日志 1 线程 1: 'E:\ORACLE\ORADATA\YYGDB\REDO01.LOG'
ORA-01578: ORACLE 数据块损坏 (文件号 3, 块号 240)
ORA-01110: 数据文件 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_12460.trc  (incident=39151):
Error 1578 happened during db open, shutting down database
USER (ospid: 12460): terminating the instance due to error 1578
Tue Aug 19 19:39:41 2014
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_dbw3_18508.trc  (incident=38659):
ORA-01578: ORACLE data block corrupted (file # , block # )
Tue Aug 19 19:39:41 2014
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_dbw5_12160.trc  (incident=38675):
ORA-01578: ORACLE data block corrupted (file # , block # )
Tue Aug 19 19:39:42 2014
Instance terminated by USER, pid = 12460
ORA-1092 signalled during: alter database open...
ORA-1092 : opiodr aborting process unknown ospid (5084_12460)

由于undo 表空间有坏块,导致数据库open失败,尝试修改undo_management= “MANUAL”,继续启动数据库

Tue Aug 19 19:50:06 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Started redo scan
Completed redo scan
 3 redo blocks read, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 5731, block 2, scn 101497289
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5731 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO01.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 5731, block 5, scn 101517294
 0 data blocks read, 0 data blocks written, 3 redo blocks read
Tue Aug 19 19:50:08 2014
Thread 1 advanced to log sequence 5732 (thread open)
Thread 1 opened at log sequence 5732
  Current log# 2 seq# 5732 mem# 0: E:\ORACLE\ORADATA\YYGDB\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Aug 19 19:50:08 2014
SMON: enabling cache recovery
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #3 is offline, but is part of an online tablespace.
data file 3: 'E:\ORACLE\ORADATA\YYGDB\UNDOTBS01.DBF'
File #10 is offline, but is part of an online tablespace.
data file 10: 'D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT_DB.DBF'
File #11 is offline, but is part of an online tablespace.
data file 11: 'D:\ORACLE\PRODUCT\11.1.0\DB_1\ORADATA\SAMPLE\LAYOUT.DBF'
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
         This condition can occur when a backup controlfile has
         been restored.  It may be necessary to add files to these
         tablespaces.  That can be done using the SQL statement:
 
         ALTER TABLESPACE <tablespace_name> ADD TEMPFILE
 
         Alternatively, if these temporary tablespaces are no longer
         needed, then they can be dropped.
           Empty temporary tablespace: TEMP
*********************************************************************
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan : on 4 X 8 NUMA system
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
If files cannot be cataloged, then manually delete them
using OS command.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
   then disabled.
**********************************************************
Hex dump of (file 1, block 7065) in trace file 
d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc
Corrupt block relative dba: 0x00401b99 (file 1, block 7065)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00401b99
 last change scn: 0x0000.060c1f83 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xac3b0601
 check value in block header: 0x2e13
 computed block checksum: 0xa4ac
Reread of rdba: 0x00401b99 (file 1, block 7065) found same corrupted data
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42814):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 7065)
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42814\yygdb_ora_14296_i42814.trc
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42815):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 7065)
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42815\yygdb_ora_14296_i42815.trc
Tue Aug 19 19:50:12 2014
Trace dumping is performing id=[cdmp_20140819195012]
Tue Aug 19 19:50:12 2014
Sweep Incident[42814]: completed
Hex dump of (file 1, block 7065) in trace file 
d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42814\yygdb_m000_11592_i42814_a.trc
Corrupt block relative dba: 0x00401b99 (file 1, block 7065)
Fractured block found during validation
Data in bad block:
 type: 6 format: 2 rdba: 0x00401b99
 last change scn: 0x0000.060c1f83 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xac3b0601
 check value in block header: 0x2e13
 computed block checksum: 0xa4ac
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Reread of blocknum=7065, file=E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF. found same corrupt data
Hex dump of (file 1, block 7065) in trace file 
d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc
Corrupt block relative dba: 0x00401b99 (file 1, block 7065)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00401b99
 last change scn: 0x0000.060c1f83 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xac3b0601
 check value in block header: 0x2e13
 computed block checksum: 0xa4ac
Reread of rdba: 0x00401b99 (file 1, block 7065) found same corrupted data
Corrupt Block Found
         TSN = 0, TSNAME = SYSTEM
         RFN = 1, BLK = 7065, RDBA = 4201369
         OBJN = 1164, OBJD = 1164, OBJECT = SYS_FBA_BARRIERSCN, SUBOBJECT = 
         SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42816):
ORA-01578: ORACLE 数据块损坏 (文件号 1, 块号 7065)
ORA-01110: 数据文件 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Incident details in: d:\oracle\diag\rdbms\yygdb\yygdb\incident\incdir_42816\yygdb_ora_14296_i42816.trc
Trace dumping is performing id=[cdmp_20140819195014]
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_ora_14296.trc  (incident=42817):
Starting background process FBDA
Tue Aug 19 19:50:18 2014
FBDA started with pid=86, OS id=17700 
replication_dependency_tracking turned off (no async multimaster replication found)
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_fbda_17700.trc  (incident=42910):
ORA-01578: ORACLE data block corrupted (file # 1, block # 7065)
ORA-01110: data file 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
Trace dumping is performing id=[cdmp_20140819195018]
Errors in file d:\oracle\diag\rdbms\yygdb\yygdb\trace\yygdb_fbda_17700.trc  (incident=42911):
ORA-01578: ORACLE data block corrupted (file # 1, block # 7065)
ORA-01110: data file 1: 'E:\ORACLE\ORADATA\YYGDB\SYSTEM01.DBF'
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
ORA-604 signalled during: alter database open...

数据库不完全open成功,报了604错误,通过分析undo$,直接使用_offline_rollback_segments屏蔽了status$=5的回滚段,数据库open正常,因为system有大量坏块,幸运的是使用exp导出来几个业务用户的表数据全部OK.
数据库备份重于一切,别寄希望数据库非常规恢复

又一起存储故障导致ORA-00333 ORA-00312恢复

数据库启动报ORA-00333 ORA-00312错误,无法正常open数据库

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc0_4724.trc:
ORA-00333: redo log read error block 63489 count 2048

Thu Aug 07 10:42:03  2014
ARC0: All Archive destinations made inactive due to error 333
Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00449: 后台进程 'LGWR' 因错误 340 异常终止
ORA-00340: 处理联机日志  (用于线程 ) 时出现 I/O 错误

Thu Aug 07 10:42:03  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_lgwr_884.trc:
ORA-00340: IO error processing online log 3 of thread 1
ORA-00345: redo log write error block 65238 count 13
ORA-00312: online log 3 thread 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: async read/write failed
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 10:42:03  2014
LGWR: terminating instance due to error 340
Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_8104.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread 

Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_1856.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread 

Thu Aug 07 10:42:05  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_6548.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 340
ORA-00340: IO error processing online log  of thread 


Thu Aug 07 17:40:05  2014
ALTER DATABASE OPEN
Thu Aug 07 17:40:05  2014
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Thu Aug 07 17:40:06  2014
Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错

ORA-333 signalled during: ALTER DATABASE OPEN...

进一步检查发现在7月6日系统就已经报io异常

Sun Jul 06 10:05:23  2014
ARC0: All Archive destinations made inactive due to error 333
Sun Jul 06 10:06:07  2014
KCF: write/open error block=0xd03 online=1
     file=3 F:\ORADATA\SZCG\SYSAUX01.DBF
     error=27070 txt: 'OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。'
Automatic datafile offline due to write error on
file 3: F:\ORADATA\SZCG\SYSAUX01.DBF
Sun Jul 06 10:06:23  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_arc1_2676.trc:
ORA-00333: redo log read error block 63489 count 2048
ORA-00312: online log 2 thread 1: 'F:\ORADATA\SZCG\REDO02.LOG'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 10:36:54  2014
ARC1: All Archive destinations made inactive due to error 333
Thu Aug 07 10:37:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_m000_5832.trc:
ORA-01135: file 3 accessed for DML/query is offline
ORA-01110: data file 3: 'F:\ORADATA\SZCG\SYSAUX01.DBF'

检查硬件发现raid一块盘完全损坏,另外一块盘也处于告警状态,保护现场拷贝文件过程中发现redo02,redo03,sysaux无法拷贝,使用rman检查发现
sysaux-block


因为redo完全损坏,使用工具跳过坏块,拷贝相关有坏块文件到其他目录,重命名相关文件尝试启动数据库,依然报ORA-00333 ORA-00312

Started redo scan
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Thu Aug 07 17:40:06  2014
Aborting crash recovery due to error 333
Thu Aug 07 17:40:06  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5168.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错

ORA-333 signalled during: ALTER DATABASE OPEN...

设置隐含参数_allow_resetlogs_corruption,尝试强制拉库

Started redo scan
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27070: 异步读取/写入失败
OSD-04016: 异步 I/O 请求排队时出错。
O/S-Error: (OS 1) 函数不正确。

Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错
ORA-00312: 联机日志 3 线程 1: 'F:\ORADATA\SZCG\REDO03.LOG'
ORA-27091: 无法将 I/O 排队
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 1) 函数不正确。

Fri Aug 08 12:13:25  2014
Aborting crash recovery due to error 333
Fri Aug 08 12:13:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_3892.trc:
ORA-00333: 重做日志读取块 63016 计数 8192 出错

ORA-333 signalled during: ALTER DATABASE OPEN...
Fri Aug 08 12:13:45  2014
ALTER DATABASE RECOVER  database until cancel  
Fri Aug 08 12:13:45  2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Fri Aug 08 12:13:55  2014
ALTER DATABASE RECOVER    CANCEL  
Fri Aug 08 12:13:59  2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Fri Aug 08 12:13:59  2014
ALTER DATABASE RECOVER CANCEL 
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Fri Aug 08 12:14:12  2014
alter database open resetlogs
Fri Aug 08 12:14:13  2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Fri Aug 08 12:54:11  2014
alter tablespace sysaux offline
Fri Aug 08 12:54:11  2014
ORA-1109 signalled during: alter tablespace sysaux offline...
Fri Aug 08 13:05:30  2014
alter database open
Fri Aug 08 13:05:30  2014
ORA-1589 signalled during: alter database open...

在offline过程中,数据库检查到sysaux数据文件为offline状态,当表空间只有一个数据文件,而且该数据文件为offline,数据库将会尝试offline sysaux表空间,但是发现该表空间文件非正常scn,无法offline 表空间,导致resetlogs操作失败。这里是操作失误应该先online相关数据文件,然后再进行resetlogs操作

Sat Aug 09 11:56:03  2014
alter database datafile 3 online
Sat Aug 09 11:56:04  2014
Completed: alter database datafile 3 online
Sat Aug 09 11:56:08  2014
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
Sat Aug 09 11:56:18  2014
ARCH: Encountered disk I/O error 19502
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 

Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 
ORA-27072: 文件 I/O 错误
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 1) 函数不正确。
ORA-19502: 文件 "F:\ARCHIVE\ARC01745_0814618167.001", 块编号 55297 写错误 (块大小 = 512) 

ARCH: I/O error 19502 archiving log 3 to 'F:\ARCHIVE\ARC01745_0814618167.001'
Sat Aug 09 11:56:18  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00265: 要求实例恢复, 无法设置 ARCHIVELOG 模式

Archive all online redo logfiles failed:265
RESETLOGS after incomplete recovery UNTIL CHANGE 77983856
Resetting resetlogs activation ID 3562192628 (0xd452bef4)
Online log F:\ORADATA\SZCG\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log F:\ORADATA\SZCG\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\REDO04.LOG: Thread 1 Group 4 was previously cleared
Sat Aug 09 11:56:22  2014
Setting recovery target incarnation to 3
Sat Aug 09 11:56:23  2014
Assigning activation ID 3602586269 (0xd6bb1a9d)
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=33, OS id=5900
Sat Aug 09 11:56:23  2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=34, OS id=5776
Sat Aug 09 11:56:24  2014
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: F:\ORADATA\SZCG\REDO01.LOG
Successful open of redo thread 1
Sat Aug 09 11:56:24  2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 09 11:56:24  2014
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Sat Aug 09 11:56:24  2014
ARC0: Becoming the heartbeat ARCH
Sat Aug 09 11:56:24  2014
SMON: enabling cache recovery
Sat Aug 09 11:56:25  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []

Sat Aug 09 11:56:26  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_4516.trc:
ORA-00600: 内部错误代码, 参数: [2662], [0], [77983864], [0], [77992379], [8388617], [], []

Sat Aug 09 11:56:26  2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 4516
ORA-1092 signalled during: alter database open resetlogs...

ORA-600 2662这个错误很熟悉,直接推SCN,数据库open,但是报ORA-600 4194

Sat Aug 09 12:01:28  2014
SMON: enabling cache recovery
Dictionary check complete
Sat Aug 09 12:01:32  2014
SMON: enabling tx recovery
Sat Aug 09 12:01:32  2014
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=34, OS id=6116
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validating controlfile with logical metadata
Sat Aug 09 12:01:34  2014
LOGSTDBY: Validation complete
Sat Aug 09 12:01:34  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []

Sat Aug 09 12:01:36  2014
Doing block recovery for file 2 block 319
Resuming block recovery (PMON) for file 2 block 319
Block recovery from logseq 2, block 56 to scn 1073742003
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery stopped at EOT rba 2.79.16
Block recovery completed at rba 2.79.16, scn 0.1073742002
Doing block recovery for file 2 block 153
Resuming block recovery (PMON) for file 2 block 153
Block recovery from logseq 2, block 56 to scn 1073741986
Sat Aug 09 12:01:36  2014
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: F:\ORADATA\SZCG\REDO02.LOG
Block recovery completed at rba 2.66.16, scn 0.1073741988
Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\bdump\szcg_smon_920.trc:
ORA-01595: error freeing extent (4) of rollback segment (10))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []

Sat Aug 09 12:01:36  2014
Errors in file d:\oracle\product\10.2.0\admin\szcg\udump\szcg_ora_5272.trc:
ORA-00600: internal error code, arguments: [4194], [21], [53], [], [], [], [], []

Sat Aug 09 12:01:36  2014
Completed: alter database open

尝试重建undo表空间并切换undo_tabspace到新undo表空间解决,因为数据库在恢复过程中使用了隐含参数强制拉库,不能保证数据一致性,强烈建议逻辑方式重建数据库
在本次故障中,所幸的是只有redo和sysaux文件损坏,如果是业务数据文件或者system数据文件损坏,恢复的后果可能更加麻烦,丢失数据可能更加多。再次说明:数据库备份非常重要,数据的安全性不能完全寄希望于硬件之上

记录一次ORA-00316 ORA-00312 redo异常恢复

正常运行的数据库报突然报ORA-00316: log 1 of thread 1, type 286 in header is not log file,异常终止

Fri Feb 21 08:44:42 2014
Thread 1 advanced to log sequence 591 (LGWR switch)
  Current log# 1 seq# 591 mem# 0: J:\ORADATA\ORCL\REDO01.LOG
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

Fri Feb 21 15:31:20 2014
LGWR: terminating instance due to error 316
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j001_11328.trc:
ORA-00316: log  of thread , type  in header is not log file

Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_j000_14116.trc:
ORA-00316: 日志  (用于线程 ) 标头中的类型  不是日志文件

Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw1_8964.trc:
ORA-00316: log  of thread , type  in header is not log file

Fri Feb 21 15:31:22 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_dbw0_10592.trc:
ORA-00316: log  of thread , type  in header is not log file
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。

Fri Feb 21 15:31:41 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_smon_11112.trc:
ORA-00316: log  of thread , type  in header is not log file

Fri Feb 21 15:31:41 2014
Instance terminated by LGWR, pid = 10312

数据库启动报ORA-00316错误

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

alert日志信息,报ORA-00316 ORA-00312

Sun Feb 23 13:54:08 2014
Started redo scan
Sun Feb 23 13:54:08 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5544.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

Sun Feb 23 13:54:08 2014
Aborting crash recovery due to error 316
Sun Feb 23 13:54:08 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5544.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: 'J:\ORADATA\ORCL\REDO01.LOG'

ORA-316 signalled during: alter database open...

通过dump redo header可以看出来redo header完全混乱了,里面很多数据文件内容在里面,初步估计系统或者硬件有问题(不稳定)导致该问题

LOG FILE #1: 
  (name #3) J:\ORADATA\ORCL\REDO01.LOG
 Thread 1 redo log links: forward: 2 backward: 0
 siz: 0x7d000 seq: 0x0000024f hws: 0x1 bsz: 512 nab: 0xffffffff flg: 0xa dup: 1
 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.4f4a1951
 Low scn: 0x0000.4f4e5400 02/21/2014 08:44:42
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 FILE HEADER:
	Software vsn=4280360459=0xff211e0b, Compatibility Vsn=103886850=0x6313002
	Db ID=842019892=0x32303434, Db Name='01a0001'
	Activation ID=1107558657=0x42040101
	Control Seq=842019892=0x32303434, File size=808464688=0x30303130
	File Number=2417, Blksiz=2013738803, File Type=286 UNKNOWN
 descrip:"00-440201|广东省xxxxxx研究院|00014402010037"
 thread: 767 nab: 0x534e4906 seq: 0xbcfebcbd hws: 0xffffffff eot: 55 dis: 66
 reset logs count: 0x7545245 scn: 0x0101.1e097178
 Low scn: 0x4537.2022002c 01/15/2022 20:59:37
 Next scn: 0x3734.46463135 04/30/2017 17:34:49
 Enabled scn: 0x4345.37333038 10/22/2015 02:04:16
 Thread closed scn: 0x3939.43303143 08/25/2024 00:11:31
 Log format vsn: 0x46343835 Disk cksum: 0x1d09 Calc cksum: 0x70ed
 Terminal Recovery Stop scn: 0x3734.35304141
 Terminal Recovery Stamp  03/17/2014 19:59:48
 Most recent redo scn: 0x3636.31303134
 Largest LWN: 758788710 blocks
 Miscellaneous flags: 0x41444534
 Thread internal enable indicator: thr: 263902399, seq: 808464496 scn: 0x3032.30343431

因为当前redo完全损坏,尝试不完全恢复并结合隐含参数(_allow_resetlogs_corruption)拉库,出现错误ORA-00704 ORA-00604 ORA-01555

Sun Feb 23 14:03:37 2014
SMON: enabling cache recovery
Sun Feb 23 14:03:39 2014
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.4f4e5405):
Sun Feb 23 14:03:39 2014
select ctime, mtime, stime from obj$ where obj# = :1
Sun Feb 23 14:03:39 2014
Errors in file e:\oracle\product\10.2.0\admin\ora10g\udump\orcl_ora_5504.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11$" too small

Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704

通过修改scn,让数据库顺利open