oracleasm createdisk破坏的acfs文件系统恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:oracleasm createdisk破坏的acfs文件系统恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到一个朋友请求,客户12.2.0.1的asm被执行了oracleasm createdisk,把之前的asmdisk给重建了
oracleasm-createdisk


根据以前恢复经验,这个故障会把前面1M的数据全部重置
删除asmlib磁盘导致磁盘组故障恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
这个客户有点特殊,他的asm 磁盘组中跑的不是oracle 数据库而是直接跑acfs,然后再里面跑mysql数据库,也就是利用grid实现mysql的底层高可用,acfs实现共享挂载(我的理解一次也只能启动一个节点的mysql),现在asm disk的头被oracleasm createdisk重置之后,导致asm磁盘组无法mount,从而acfs也无法mount.对于这个故障,让现场提供被破坏磁盘使用dd前面100M 发给我进行分析
使用kfed读取asm disk磁盘头信息

H:\TEMP\0423\0423>kfed read data3_100m
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                  1096040823 ; 0x00c: 0x41544177
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
005B78600 00000000 00000000 00000000 41544177  [............wATA]
005B78610 00000000 00000000 00000000 00000000  [................]
005B78620 4C43524F 4B534944 41544144 00000033  [ORCLDISKDATA3...]
005B78630 00000000 00000000 00000000 00000000  [................]
  Repeat 252 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

这里可以发现比较明显的asmlib的标记信息ORCLDISKDATA3,证明该磁盘被oracleasm createdisk重建之后,没有再进行kfed修复或者重建新磁盘组

SUCCESS: CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK '/dev/oracleasm/disks/DATA1' SIZE 1430507M
 DISK '/dev/oracleasm/disks/DATA2' SIZE 1430507M
 DISK '/dev/oracleasm/disks/DATA3' SIZE 1430507M
 ATTRIBUTE 'compatible.asm'='12.2.0.1','compatible.advm'='12.2.0.1','au_size'='4M'

SUCCESS: CREATE DISKGROUP CRS EXTERNAL REDUNDANCY  DISK '/dev/oracleasm/disks/CRS' SIZE 190732M
 ATTRIBUTE 'compatible.asm'='12.2.0.1','compatible.advm'='12.2.0.1','au_size'='4M' /* ASMCA */

通过alert日志中磁盘组的创建语句,确认该磁组是ausize为4M,这样的情况下,asm的磁盘头备份备份和au备份应该都是好的,直接通过winhex来确认
orcldisk


再次通过kfed来确认相关磁盘头信息

H:\TEMP\0423\0423>kfed read data3_100m aus=4096k blkn=1022 aun=1|grep name
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0

H:\TEMP\0423\0423>kfed read data3_100m aus=4096k blkn=0 aun=11|grep name
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0

基于上述情况,证明磁盘头和au的备份都还在,而且au的备份是4M,完全包含了被createdisk破坏的1M数据,直接吧这个au给还原回去理论上就可以了,但是很不幸,这样处理之后,crs启动过程依旧无法找到表决盘
vote


通过分析crs盘的情况,发现它的偏移量是错误的
voteoffset

通过分析是由于磁盘分区问题导致

磁盘 /dev/sdc:200.0 GB, 199996997632 字节,390619136 个扇区
Units = 扇区 of 1 * 512 = 512 bytes
扇区大小(逻辑/物理):512 字节 / 512 字节
I/O 大小(最小/最佳):512 字节 / 1048576 字节
磁盘标签类型:dos
磁盘标识符:0x00000000

   设备 Boot      Start         End      Blocks   Id  System
/dev/sdc1               1   390619135   195309567+  ee  GPT

磁盘 /dev/sdd:1500.0 GB, 1499996356608 字节,2929680384 个扇区
Units = 扇区 of 1 * 512 = 512 bytes
扇区大小(逻辑/物理):512 字节 / 512 字节
I/O 大小(最小/最佳):512 字节 / 1048576 字节
磁盘标签类型:gpt
Disk identifier: 99F1679E-DC32-4F6A-B85D-D91C87B09775


#         Start          End    Size  Type            Name
 1         2048   2929678335    1.4T  Linux LVM       

处理好分区问题之后,重启crs,一切自动恢复成功
acfs-mysql


至此这个被oracleasm createdisk的case完美恢复,数据0丢失

先offline数据文件,再resetlogs导致恢复复杂的故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:先offline数据文件,再resetlogs导致恢复复杂的故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

本来是一个简单的数据文件被误删除,然后通过底层恢复出来数据文件,再启动库就可以的事情,结果由于对oracle的不了解和自以为是,直接把丢失的文件不存在的情况下,offline文件,然后尝试resetlogs打开库,并且进行了各种尝试,结果使得问题比较麻烦.
故障之后现象
通过分析alert日志大概的主要错误,大概梳理故障情况

1. 启动数据库报control03.ctl丢失

Fri Apr 17 21:53:03 2026
MMNL started with pid=16, OS id=3613 
ORACLE_BASE from environment = /data/oracle
Fri Apr 17 21:53:08 2026
alter database mount
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/data/oracle/oradata/orcl/control03.ctl'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/data/oracle/oradata/orcl/control02.ctl'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-205 signalled during: alter database mount...

如果只是这个文件丢失(这里还没有看到其他数据文件丢失的报错),本身是一个非常简单的故障,直接修改control_files参数即可

2. 结果当时操作的人直接rectl

Fri Apr 17 21:57:01 2026
Successful mount of redo thread 1, with mount id 1758675116
Completed: CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/data/oracle/oradata/orcl/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/data/oracle/oradata/orcl/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/data/oracle/oradata/orcl/redo03.log'  SIZE 50M BLOCKSIZE 512
DATAFILE
  '/data/oracle/oradata/orcl/system01.dbf',
  '/data/oracle/oradata/orcl/sysaux01.dbf',
  '/data/oracle/oradata/orcl/undotbs01.dbf',
  '/data/oracle/oradata/orcl/users01.dbf'
CHARACTER SET ZHS16GBK

3.然后启动数据库报错

Fri Apr 17 22:02:43 2026
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 39020 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 11590, block 2, scn 137806010
Recovery of Online Redo Log: Thread 1 Group 1 Seq 11590 Reading mem 0
  Mem# 0: /data/oracle/oradata/orcl/redo01.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 11590, block 78042, scn 137831847
 0 data blocks read, 0 data blocks written, 39020 redo k-bytes read
Fri Apr 17 22:02:44 2026
Thread 1 advanced to log sequence 11591 (thread open)
Thread 1 opened at log sequence 11591
  Current log# 2 seq# 11591 mem# 0: /data/oracle/oradata/orcl/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Apr 17 22:02:44 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Tablespace 'TEMP' #3  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_XXXX' #6  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_AAAA' #7  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ABCD' #8  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_BBBB' #9  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_XXD' #10  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'ERP_12SF' #11  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'XXX14' #12  found in data dictionary,
but not in the controlfile. Adding to controlfile.
Tablespace 'P_ZY' #13 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #5 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00005' in the controlfile.
File #6 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00006' in the controlfile.
File #7 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00007' in the controlfile.
File #8 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00008' in the controlfile.
File #9 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00009' in the controlfile.
File #10  found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00010' in the controlfile.
File #11  found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00011' in the controlfile.
File #12 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00012' in the controlfile.

4.然后尝试resetlogs操作

Sat Apr 18 05:55:10 2026
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1758652862
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Sat Apr 18 05:55:14 2026
ALTER DATABASE OPEN RESETLOGS
ORA-1139 signalled during: ALTER DATABASE OPEN RESETLOGS...
Sat Apr 18 05:56:29 2026
Starting ORACLE instance (normal)
ALTER DATABASE RECOVER  DATABASE UNTIL CANCEL  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 4 slaves
Sat Apr 18 05:56:29 2026
Warning: Datafile 5 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Warning: Datafile 6 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Warning: Datafile 7 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Warning: Datafile 8 (/data/oracle/orcl/xxxx.dbf) is offline during full database recovery and will not be recovered
Media Recovery Not Required
Completed: ALTER DATABASE RECOVER  DATABASE UNTIL CANCEL  
Sat Apr 18 05:57:45 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS after complete recovery through change 137865786
Resetting resetlogs activation ID 1645665187 (0x6216dba3)
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/orcl/redo01.log'
Sat Apr 18 05:57:45 2026
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc:
ORA-00316: log 1 of thread 1, type 0 in header is not log file
ORA-00312: online log 1 thread 1: '/data/oracle/oradata/orcl/redo01.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/orcl/redo02.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc:
ORA-00316: log 2 of thread 1, type 0 in header is not log file
ORA-00312: online log 2 thread 1: '/data/oracle/oradata/orcl/redo02.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_2554.trc:
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/orcl/redo03.log'
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2549.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/orcl/redo03.log'
Sat Apr 18 05:57:46 2026
Setting recovery target incarnation to 2
Sat Apr 18 05:57:46 2026
Assigning activation ID 1758652862 (0x68d2e9be)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Apr 18 05:57:46 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
File #5 is offline, but is part of an online tablespace.
data file 5: '/data/oracle/oradata/orcl/xxxx.dbf'
File #6 is offline, but is part of an online tablespace.
data file 6: '/data/oracle/oradata/orcl/xxxx.dbf'
File #7 is offline, but is part of an online tablespace.
data file 7: '/data/oracle/oradata/orcl/xxxx.dbf'
File #8 is offline, but is part of an online tablespace.
data file 8: '/data/oracle/oradata/orcl/xxxx.dbf'

到这一步悲剧基本上已经发生,犯了一个在oracle恢复里面比较忌讳的事情,有数据文件offline的情况下,执行resetlogs操作,导致部分数据文件的resetlogs信息没有被及时更新,导致一套库里面,被offline的这个部分数据文件resetlogs信息小于其他online的数据文件的。

5. 后续其他操作各种报错

Completed: ALTER DATABASE   MOUNT
Sun Apr 19 08:13:02 2026
ALTER DATABASE DATAFILE 5 OFFLINE DROP
Sun Apr 19 08:13:02 2026
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_9212.trc  (incident=67094):
ORA-00600: internal error code, arguments: [3600], [5], [14], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_67094/orcl_dbw0_9212_i67094.trc
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_9212.trc:
ORA-00600: internal error code, arguments: [3600], [5], [14], [], [], [], [], [], [], [], [], []
DBW0 (ospid: 9212): terminating the instance due to error 471
Tue Apr 21 22:31:23 2026
Assigning activation ID 1758985759 (0x68d7fe1f)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Apr 21 22:31:23 2026
SMON: enabling cache recovery
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc  (incident=87950):
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_87950/orcl_ora_4951_i87950.trc
Errors in file /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_87950/orcl_ora_4951_i87950.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo02.log'
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo02.log'
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc:
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4951.trc:
ORA-00600: internal error code, arguments: [2662], [0], [137890858], [0], [137891091], [12583056], []
Error 600 happened during db open, shutting down database
USER (ospid: 4951): terminating the instance due to error 600

接手故障之后分析
使用obet工具直接快速的检查坏块情况和文件头信息,关于obet的介绍参考:
obet实现对数据文件坏块检测功能
Oracle数据块编辑工具( Oracle Block Editor Tool)-obet
dbv


dbv检测没任何坏块,比较好好的消息
reset

但是检测数据文件头信息,发现有三种类型的resetlogs的信息,证明进行了多次部分文件的情况下进行了resetlogs操作

恢复处理
1. 使用Oracle Recovery Tools工具修改 resetlogs 信息
由于大量reseltogs 信息不一致,先使用Oracle Recovery Tools修改scn等相关信息Oracle Recovery Tools恢复案例总结—202505(注意选择resetlogs scn最大的文件为参照文件)
orarec

2. 重建ctl,打开库
open-db

比较幸运直接打开成功(本来也就应该成功,因为客户本身之前丢失主要业务文件的时候多次打开过库)


3. 然后增加temp文件,并expdp导出数据,完成本次恢复工作

exp dmp导入报IMP-00098: INTERNAL ERROR: impgst2故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:exp dmp导入报IMP-00098: INTERNAL ERROR: impgst2故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户反馈,exp导出来的dmp无法导入到库中,而且原库已经被删除并且创建了新库,希望我们协助把dmp里面几个核心表给恢复出来
故障现象
imp导入dmp文件报 IMP-00098: INTERNAL ERROR: impgst2错误
imp


imp导入操作直接终止,无法恢复需要的数据
故障原因
分析导出日志,发现”ORA-24801: 在 OCI lob 函数中非法的参数值” 错误
ORA-24801

查询mos发现EXP-56 ORA-24801 During Export KB83982文章
nls

进一步和客户确认,他们确实修改过该库的字符集。基本上可以确认是由于修改字符集导致lob数据损坏,然后exp导出dmp中这些损坏的lob破坏了dmp的完整性,使得dmp无法正常导入
故障解决
对于这样的情况,由于损坏的lob比较靠前,而且不是客户业务用户中数据.处理方法有两种:
1. 直接使用winhex把损坏的lob表从dmp中剔除掉,然后导入数据
2. 直接使用工具从dmp中提取需要的表数据,以前处理过类似文章:
解决imp导入数据报IMP-00098错误
IMP-00098: INTERNAL ERROR: impgst2
exp dmp文件损坏(坏块/corruption)恢复—跳过dmp坏块

一次运气好的ORA-600 kcratr_nab_less_than_odr故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次运气好的ORA-600 kcratr_nab_less_than_odr故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户由于虚拟化环境空间不足,导致数据库异常,启动报ORA-600 kcratr_nab_less_than_odr错误

Mon Apr 06 00:13:16 2026
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Mon Apr 06 00:13:26 2026
Completed redo scan
 read 5480 KB redo, 459 data blocks need recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc  (incident=418959):
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_418959\orcl_ora_2324_i418959.trc
Aborting crash recovery due to error 600
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc:
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2324.trc:
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [53856], [40105], [43042], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...
Mon Apr 06 00:13:33 2026
Trace dumping is performing id=[cdmp_20260406001333]

由于客户自己不熟悉,故障之后,没有再次继续操作,一直保留着现场。这个故障一般是由于ctl写丢失导致,一般首先选择rectl
11111


然后尝试open库,运气不错,直接打开成功
22222

这样就完成了本次恢复工作,数据库一切正常,运气不错。对于ORA-600 kcratr_nab_less_than_odr错误大部分时候,可以这样简单的恢复,但是也遇到过rectl之后,继续报ORA-600等错误的情况:
ORA-600 kcratr_nab_less_than_odr和ORA-600 4194故障处理
ORA-600 kcratr_nab_less_than_odr和ORA-600 2662故障处理
ORA-600 kcratr_nab_less_than_odr和ORA-600 4193故障处理

obet一键恢复offline数据文件

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:obet一键恢复offline数据文件

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户有一个数据库由于异常断电之后无法正常启动,自行尝试恢复之后但是没有open成功,让可以通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本收集信息进行评估,发现两个问题:
1. 根据查询信息确认users01.dbf(file# 4)文件处于offline状态,而且checkpoint scn明显小于其他文件
offline


2. 通过分析alert日志确认客户在尝试offline file 4 之后open数据库报ORA-600 4194错误,数据库没有open成功

Mon Mar 30 06:26:30 2026
Starting ORACLE instance (normal)
Mon Mar 30 06:27:00 2026
ALTER DATABASE DATAFILE 4 OFFLINE DROP
Completed: ALTER DATABASE DATAFILE 4 OFFLINE DROP
Mon Mar 30 06:27:26 2026
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Started redo scan
Completed redo scan
 read 88 KB redo, 103 data blocks need recovery
Started redo application at
 Thread 1: logseq 3, block 3
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /home/oracle/app/oracle/oradata/orcl/redo03.log
Completed redo application of 0.07MB
Completed crash recovery at
 Thread 1: logseq 3, block 180, scn 415466134
 103 data blocks read, 103 data blocks written, 88 redo k-bytes read
Mon Mar 30 06:27:28 2026
Thread 1 advanced to log sequence 4 (thread open)
Thread 1 opened at log sequence 4
  Current log# 1 seq# 4 mem# 0: /home/oracle/app/oracle/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Mar 30 06:27:28 2026
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_17628.trc(incident=186839):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/orcl/orcl/orcl_smon_17628_i186839.trc
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x6C0C4B62] [PC:0x2297750, kgegpa()+40]
Exception [type: SIGBUS, Non-existent physical address] [ADDR:0x6C0C4B62] [PC:0x229597B, kgebse()+279]
Mon Mar 30 06:27:28 2026
PMON (ospid: 17604): terminating the instance due to error 397
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_17628.trc:
ORA-00328: archived log ends at change 415466135, need later change 415466136
ORA-00334: archived log: '/home/oracle/app/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Instance terminated by PMON, pid = 17604

接手这个故障之后,由于数据库是非归档模式,而且已经被屏蔽一致性强制打开过(open过程没成功),redo已经被clear过,因此基于这样的情况,直接上obet工具(Oracle Block Editor Tool修改file# 4的文件头状态
obet修复之前文件状态

STATUS	CHECKPOINT_TIME 			 FUZ CHECKPOINT_CHANGE# 	 ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE 2026-03-30 06:24:41			 YES	      415446014 	       1
ONLINE	2026-03-30 06:28:26			 YES	      415486184 	       7

obet修改文件头操作

OBET> open listfile.txt
Loaded 8 files from  datafile list 'listfile.txt'.

OBET> info

Loaded files (2 total):
----------------------------------------
Number  Path
----------------------------------------
     1  /home/oracle/app/oracle/oradata/orcl/system01.dbf
     4  /opt/oradata/orcl/users01.dbf
----------------------------------------

OBET> set mode edit
mode set to: edit


OBET> set file 4
filename set to: /opt/oradata/orcl/users01.dbf (file#4)

OBET> backup block 1
Created backup directory: backup_blk
Successfully backed up block 1 from current file to /tmp/backup_blk/users01.dbf_1.20260331092333


OBET> copy chkscn file 1 to file 4

Confirm Modify chkscn:
Source: file#1 (/home/oracle/app/oracle/oradata/orcl/system01.dbf)
Target: file#4 (/opt/oradata/orcl/users01.dbf)
Proceed? (Y/YES to confirm): y
Successfully copied checkpoint SCN information from file#1 to file#4.

OBET> exit
Exiting OBET.

再次查询文件头scn信息

STATUS	CHECKPOINT_TIME 			 FUZ CHECKPOINT_CHANGE# 	 ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE 2026-03-30 06:28:26			 NO	      415486184 	       1
ONLINE	2026-03-30 06:28:26			 YES	      415486184 	       7

尝试online文件,并open库成功

SQL> recover datafile 4;
Media recovery complete.
SQL> alter database datafile 4 online;

Database altered.

SQL> alter database open;

Database altered.

然后expdp导出数据成功,基本完成本次数据恢复工作
dump