Oracle故障第一现场被恢复混乱的数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle故障第一现场被恢复混乱的数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户数据库断电异常之后,由于第三方进行了一系列的恢复,现场比较混乱,无法判断最初情况,通过Oracle Database Recovery Check收集的结果进行初步判断
1. 有三个文件处于丢失状态,并且数据库在故障之后被人强制resetlogs拉过库
file-missing


2. 数据文件头scn不一致,而且相差的日志序列还比较大(该数据库为非归档模式)
seq

3. 该库多次重建ctl(alert日志中也有相关记录)
rectl

现在恢复这个库需要做的几件事情:
1. 由于没有任何原始故障之后的控制文件,需要从服务器上找出来所有故障之时的数据文件,担心被人重建ctl使用了错误的数据文件
2. 对于三个file missing的进行分析,并确认磁盘上是否存在,是否是好的,如果是好的需要和现在的文件一起作为一个整体进行恢复,并打开库
3. 打开数据库过程可能遇到的错误处理

通过obet中近期增加的get_dbinfo功能来解析所有可能的数据文件头(obet官方说明),结合文件头的信息判断,发现磁盘上名称dbf结尾的文件号重复
1

这样的情况下,我们取filesize大,(scn大不一定正确,可能由于被强制resetlogs导致scn比正确的文件大),同时也结合这个收集的信息,确认三个丢失的文件中两个为undotbs1表空间文件,另外一个为112k的数据文件,这里让我学习到了新知识,oracle的数据文件最小可以多少个block(通过试验测试,最小可以16个block,文件大小即为:16+1(block 0)*block_size)

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on 星期三 5月 13 22:05:52 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 8k;
create tablespace tbs datafile 'e:/tbs01.dbf' size 8k
*
第 1 行出现错误:
ORA-03214: 指定的文件大小小于所需的最小值


SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 80k;
create tablespace tbs datafile 'e:/tbs01.dbf' size 80k
*
第 1 行出现错误:
ORA-03214: 指定的文件大小小于所需的最小值


SQL> create tablespace tbs datafile 'e:/tbs01.dbf' size 96k;

表空间已创建。

确认好相关信息之后,然后对于三个file missing状态的文件进行dbv检测确认undotbs01.dbf(file 3)基本上全部损坏(大量全0块),另外两个文件正常
obet_dbv


对于正常的文件通过obet修改相关scn信息
obet_resetlogs_scn

然后重建控制文件(丢弃undotbs01.dbf文件),由于确认undo已经异常,直接设置undo为manual管理方式并屏蔽回滚段,然后屏蔽一致性,强制打开数据库,结果报ORA-600 2662错误
ora-600-2662

使用Patch_SCN工具修改数据库scn(Patch_SCN工具说明)
patch_scn

然后数据库顺利打开,重建新undo,增加temp,删除老undo,导出数据完成本次恢复任务

一次断电引起的Oracle故障恢复-ora-600 2662故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次断电引起的Oracle故障恢复-ora-600 2662故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个oracle恢复case,由于断电导致oracle数据库异常,现场人员进行了一系列的恢复成功,但是没有成功open库,我接手故障之后尝试做recover 报ORA-16433错误

[oracle@xifenfei.com ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sun May 10 09:27:51 2026

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> 
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode.

根据经验这种错误一般是由于强制打开库失败导致,回溯oracle alert日志发现类似操作

Fri May 08 18:47:59 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 138986145572
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/xff/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Clearing online redo logfile 1 /data/oracle/oradata/xff/redo01.log
Clearing online log 1 of thread 1 sequence number 2389
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/xff/redo01.log'
………………
Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 677485461 (0x28619b95)
Online log /data/oracle/oradata/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /data/oracle/oradata/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /data/oracle/oradata/xff/redo03.log: Thread 1 Group 3 was previously cleared
Fri May 08 18:48:17 2026
Setting recovery target incarnation to 4
Fri May 08 18:48:17 2026
Assigning activation ID 677634815 (0x2863e2ff)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/xff/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri May 08 18:48:18 2026
SMON: enabling cache recovery
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc  (incident=170321):
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040], []
Fri May 08 18:48:20 2026
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040]
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_ora_37813.trc:
ORA-00600: internal error code, arguments: [2662], [32], [1547192107], [32], [1547212241], [12583040]
Error 600 happened during db open, shutting down database
USER (ospid: 37813): terminating the instance due to error 600
Instance terminated by USER, pid = 37813
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...

对于这种情况,先重建控制文件

SQL> @/tmp/rectl.sql

Control file created.

尝试recover database恢复

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 96 slaves
Sun May 10 09:33:03 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /data/oracle/oradata/xff/redo02.log
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_pr00_250074.trc  (incident=254397):
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Incident details in: /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_pr00_250074_i254397.trc
Sun May 10 09:33:04 2026
Media Recovery failed with error 399
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_pr00_250074.trc:
ORA-00283: recovery session canceled due to errors
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...
Sun May 10 09:33:04 2026
Sweep [inc][254397]: completed
Errors in file /data/oracle/diag/rdbms/xff/xff/trace/xff_m000_250348.trc  (incident=255173):
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Errors in file /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_m000_250348_i254397_a.trc:
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Errors in file /data/oracle/diag/rdbms/xff/xff/incident/incdir_254397/xff_m000_250348_i254397_a.trc:
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 3 change 138986165586 time 05/08/2026 18:55:37
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/xff/redo02.log'
Sun May 10 09:34:05 2026

由于无法正常recover 操作,数据库需要强制打开,考虑到之前该库open的过程有ORA-600 2662错误,这次在打开之前使用Patch_SCN调整SCN(关于Patch_SCN文章:Patch_SCN for Linux 功能完善

[oracle@xifenfei.com tmp]$ ./Patch_SCN 249756 0x205D38954E
Successfully obtained address automatically: 0x6001ae70
Original Oracle SCN at Address 0x6001ae70: 0x0
Are you sure you want to modify Oracle SCN? (yes/no): yes
New SCN at Address 0x6001ae70: 0x205d38954e
Oracle SCN successfully modified.

然后数据库顺利打开
open1


在expdp导出数据过程中遇到了硬件错误
ora-27072
io-error

为了安全性采用库只读情况下exp进行导出,运气不错所有数据顺利导出,完成本次数据恢复任务
exp