联系:手机/微信(+86 17813235971) QQ(107644445)
标题:先offline数据文件,再resetlogs导致恢复复杂的故障处理
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
本来是一个简单的数据文件被误删除,然后通过底层恢复出来数据文件,再启动库就可以的事情,结果由于对oracle的不了解和自以为是,直接把丢失的文件不存在的情况下,offline文件,然后尝试resetlogs打开库,并且进行了各种尝试,结果使得问题比较麻烦.
故障之后现象
通过分析alert日志大概的主要错误,大概梳理故障情况
1. 启动数据库报control03.ctl丢失
Fri Apr 17 21:53:03 2026 MMNL started with pid=16, OS id=3613 ORACLE_BASE from environment = /data/oracle Fri Apr 17 21:53:08 2026 alter database mount ORA-00210: cannot open the specified control file ORA-00202: control file: '/data/oracle/oradata/orcl/control03.ctl' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 3 ORA-00210: cannot open the specified control file ORA-00202: control file: '/data/oracle/oradata/orcl/control02.ctl' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 3 ORA-205 signalled during: alter database mount...
如果只是这个文件丢失(这里还没有看到其他数据文件丢失的报错),本身是一个非常简单的故障,直接修改control_files参数即可
2. 结果当时操作的人直接rectl
Fri Apr 17 21:57:01 2026
Successful mount of redo thread 1, with mount id 1758675116
Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 '/data/oracle/oradata/orcl/redo01.log' SIZE 50M BLOCKSIZE 512,
GROUP 2 '/data/oracle/oradata/orcl/redo02.log' SIZE 50M BLOCKSIZE 512,
GROUP 3 '/data/oracle/oradata/orcl/redo03.log' SIZE 50M BLOCKSIZE 512
DATAFILE
'/data/oracle/oradata/orcl/system01.dbf',
'/data/oracle/oradata/orcl/sysaux01.dbf',
'/data/oracle/oradata/orcl/undotbs01.dbf',
'/data/oracle/oradata/orcl/users01.dbf'
CHARACTER SET ZHS16GBK
3.然后启动数据库报错
Fri Apr 17 22:02:43 2026 ALTER DATABASE OPEN Beginning crash recovery of 1 threads parallel recovery started with 3 processes Started redo scan Completed redo scan read 39020 KB redo, 0 data blocks need recovery Started redo application at Thread 1: logseq 11590, block 2, scn 137806010 Recovery of Online Redo Log: Thread 1 Group 1 Seq 11590 Reading mem 0 Mem# 0: /data/oracle/oradata/orcl/redo01.log Completed redo application of 0.00MB Completed crash recovery at Thread 1: logseq 11590, block 78042, scn 137831847 0 data blocks read, 0 data blocks written, 39020 redo k-bytes read Fri Apr 17 22:02:44 2026 Thread 1 advanced to log sequence 11591 (thread open) Thread 1 opened at log sequence 11591 Current log# 2 seq# 11591 mem# 0: /data/oracle/oradata/orcl/redo02.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Fri Apr 17 22:02:44 2026 SMON: enabling cache recovery Successfully onlined Undo Tablespace 2. Dictionary check beginning Tablespace 'TEMP' #3 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'ERP_XXXX' #6 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'ERP_AAAA' #7 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'ABCD' #8 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'ERP_BBBB' #9 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'ERP_XXD' #10 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'ERP_12SF' #11 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'XXX14' #12 found in data dictionary, but not in the controlfile. Adding to controlfile. Tablespace 'P_ZY' #13 found in data dictionary, but not in the controlfile. Adding to controlfile. File #5 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00005' in the controlfile. File #6 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00006' in the controlfile. File #7 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00007' in the controlfile. File #8 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00008' in the controlfile. File #9 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00009' in the controlfile. File #10 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00010' in the controlfile. File #11 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00011' in the controlfile. File #12 found in data dictionary but not in controlfile. Creating OFFLINE file 'MISSING00012' in the controlfile.
4.然后尝试resetlogs操作
Sat Apr 18 05:55:10 2026 ALTER DATABASE MOUNT Successful mount of redo thread 1, with mount id 1758652862 Database mounted in Exclusive Mode Lost write protection disabled Completed: ALTER DATABASE MOUNT Sat Apr 18 05:55:14 2026 ALTER DATABASE OPEN RESETLOGS ORA-1139 signalled during: ALTER DATABASE OPEN RESETLOGS... Sat Apr 18 05:56:29 2026 Starting ORACLE instance (normal) ALTER DATABASE RECOVER DATABASE UNTIL CANCEL Media Recovery Start started logmerger process Parallel Media Recovery started with 4 slaves Sat Apr 18 05:56:29 2026 Warning: Datafile 5 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered Warning: Datafile 6 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered Warning: Datafile 7 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered Warning: Datafile 8 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered Media Recovery Not Required Completed: ALTER DATABASE RECOVER DATABASE UNTIL CANCEL Sat Apr 18 05:57:45 2026 ALTER DATABASE OPEN RESETLOGS RESETLOGS after complete recovery through change 137865786 Resetting resetlogs activation ID 1645665187 (0x6216dba3) Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc: ORA-00367: checksum error in log file header ORA-00322: log 1 of thread 1 is not current copy ORA-00312: online log 1 thread 1: '/data/oracle/oradata/orcl/redo01.log' Sat Apr 18 05:57:45 2026 Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc: ORA-00316: log 1 of thread 1, type 0 in header is not log file ORA-00312: online log 1 thread 1: '/data/oracle/oradata/orcl/redo01.log' Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc: ORA-00367: checksum error in log file header ORA-00322: log 2 of thread 1 is not current copy ORA-00312: online log 2 thread 1: '/data/oracle/oradata/orcl/redo02.log' Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc: ORA-00316: log 2 of thread 1, type 0 in header is not log file ORA-00312: online log 2 thread 1: '/data/oracle/oradata/orcl/redo02.log' Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc: ORA-00322: log 3 of thread 1 is not current copy ORA-00312: online log 3 thread 1: '/data/oracle/oradata/orcl/redo03.log' Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc: ORA-00367: checksum error in log file header ORA-00322: log 3 of thread 1 is not current copy ORA-00312: online log 3 thread 1: '/data/oracle/oradata/orcl/redo03.log' Sat Apr 18 05:57:46 2026 Setting recovery target incarnation to 2 Sat Apr 18 05:57:46 2026 Assigning activation ID 1758652862 (0x68d2e9be) Thread 1 opened at log sequence 1 Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/orcl/redo01.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Sat Apr 18 05:57:46 2026 SMON: enabling cache recovery Successfully onlined Undo Tablespace 2. Dictionary check beginning File #5 is offline, but is part of an online tablespace. data file 5: '/data/oracle/oradata/orcl/xxxx.dbf' File #6 is offline, but is part of an online tablespace. data file 6: '/data/oracle/oradata/orcl/xxxx.dbf' File #7 is offline, but is part of an online tablespace. data file 7: '/data/oracle/oradata/orcl/xxxx.dbf' File #8 is offline, but is part of an online tablespace. data file 8: '/data/oracle/oradata/orcl/xxxx.dbf'
到这一步悲剧基本上已经发生,犯了一个在oracle恢复里面比较忌讳的事情,有数据文件offline的情况下,执行resetlogs操作,导致部分数据文件的resetlogs信息没有被及时更新,导致一套库里面,被offline的这个部分数据文件resetlogs信息小于其他online的数据文件的。
5. 后续其他操作各种报错
Completed: ALTER DATABASE MOUNT Sun Apr 19 08:13:02 2026 ALTER DATABASE DATAFILE 5 OFFLINE DROP Sun Apr 19 08:13:02 2026 Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_9212.trc (incident=67094): ORA-00600: internal error code, arguments: [3600], [5], [14], [], [], [], [], [], [], [], [], [] Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_67094/orcl_dbw0_9212_i67094.trc Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_9212.trc: ORA-00600: internal error code, arguments: [3600], [5], [14], [], [], [], [], [], [], [], [], [] DBW0 (ospid: 9212): terminating the instance due to error 471
Tue Apr 21 22:31:23 2026 Assigning activation ID 1758985759 (0x68d7fe1f) Thread 1 opened at log sequence 1 Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/orcl/redo01.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Tue Apr 21 22:31:23 2026 SMON: enabling cache recovery Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc (incident=87950): ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], [] Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_87950/orcl_ora_4951_i87950.trc Errors in file /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_87950/orcl_ora_4951_i87950.trc: ORA-00339: archived log does not contain any redo ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log' ORA-00339: archived log does not contain any redo ORA-00334: archived log: '/data/oracle/oradata/orcl/redo02.log' ORA-00339: archived log does not contain any redo ORA-00334: archived log: '/data/oracle/oradata/orcl/redo02.log' ORA-00339: archived log does not contain any redo ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log' ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], [] Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc: ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], [] Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc: ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], [] Error 600 happened during db open, shutting down database USER (ospid: 4951): terminating the instance due to error 600
接手故障之后分析
使用obet工具直接快速的检查坏块情况和文件头信息,关于obet的介绍参考:
obet实现对数据文件坏块检测功能
Oracle数据块编辑工具( Oracle Block Editor Tool)-obet

dbv检测没任何坏块,比较好好的消息

但是检测数据文件头信息,发现有三种类型的resetlogs的信息,证明进行了多次部分文件的情况下进行了resetlogs操作
恢复处理
1. 使用Oracle Recovery Tools工具修改 resetlogs 信息
由于大量reseltogs 信息不一致,先使用Oracle Recovery Tools修改scn等相关信息Oracle Recovery Tools恢复案例总结—202505(注意选择resetlogs scn最大的文件为参照文件)

2. 重建ctl,打开库

比较幸运直接打开成功(本来也就应该成功,因为客户本身之前丢失主要业务文件的时候多次打开过库)
3. 然后增加temp文件,并expdp导出数据,完成本次恢复工作



