一套近200T的,6个节点的RAC,由于存储管线链路不稳定,导致服务器经常性掉盘,引起asm 磁盘组频繁dismount/mount,数据库集群节点不停的重启,修复好链路问题之后,数据库启动报ORA-01113,ORA-01110

通过
Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测,发现有10个数据文件异常,无法正常恢复
该库比较大,有近200T,因此恢复需要各位谨慎(无法做现场备份,另外客户要求2天时间必须恢复好)
由于数据库是非归档模式,该库无法通过应用归档日志来实现对这些文件进行恢复,对于这种情况,直接使用dbms_diskgroup把数据文件头拷贝到文件系统中,类似操作
SQL> @dbms_diskgroup_get_block.sql +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header
Parameter 1:
ASM_file_name (required)
Parameter 2:
block_to_extract (required)
Parameter 3
number_of_blocks_to_extract (required)
Parameter 4:
FileSystem_File_Name (required)
old 14: v_AsmFilename := '&ASM_File_Name';
new 14: v_AsmFilename := '+DATA/xifenfei.dbf';
old 15: v_offstart := '&block_to_extract';
new 15: v_offstart := '1';
old 16: v_numblks := '&number_of_blocks_to_extract';
new 16: v_numblks := '1';
old 17: v_FsFilename := '&FileSystem_File_Name';
new 17: v_FsFilename := '/tmp/xff/xifenfei.dbf.header';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
Physical Block Size: 512
PL/SQL procedure successfully completed.
然后通过bbed修改相关scn
BBED> set filename 'xifenfei.dbf.header'
FILENAME xifenfei.dbf.header
BBED> set blocksize 16384
BLOCKSIZE 16384
BBED> map
File: xifenfei.dbf.header (0)
Block: 1 Dba:0x00000000
------------------------------------------------------------
Data File Header
struct kcvfh, 860 bytes @0
ub4 tailchk @16380
BBED> p kcvfh.kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0xa8061324
ub2 kscnwrp @488 0x0081
BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0xa8133e2b
ub2 kscnwrp @488 0x0081
然后把修改的数据文件头写回到asm中
SQL> @dbms_diskgroup_cp_block_to_asm.sql /tmp/xff/xifenfei.dbf.header +DATA/xifenfei.dbf 1 1
Parameter 1:
v_FsFileName (required)
Parameter 2:
v_AsmFileName (required)
Parameter 3
v_offstart (required)
Parameter 4
v_numblks (required)
old 16: v_FsFileName := '&v_FsFileName';
new 16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header';
old 17: v_AsmFileName := '&v_AsmFileName';
new 17: v_AsmFileName := '+DATA/xifenfei.dbf';
old 18: v_offstart := '&v_offstart';
new 18: v_offstart := '1';
old 19: v_numblks := '&v_numblks';
new 19: v_numblks := '1';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
PL/SQL procedure successfully completed.
查询文件头是否修改成功
[oracle@xff1 xff]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> set numw 16
SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);
CHECKPOINT_CHANGE#
------------------
556870614571
556870614571
SQL> recover datafile 295;
Media recovery complete.
通过上述操作,确认bbed修改文件头成功,后续类似方法对其他9个文件进行修改,并打开数据库
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
alert日志提示
Sat Aug 10 16:46:11 2024
ALTER DATABASE RECOVER datafile 295
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER datafile 295
Sat Aug 10 16:46:39 2024
ALTER DATABASE RECOVER database
Media Recovery Start
started logmerger process
Sat Aug 10 16:46:51 2024
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Sat Aug 10 16:46:54 2024
Parallel Media Recovery started with 64 slaves
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER database
Sat Aug 10 17:19:58 2024
alter database open
This instance was first to open
Sat Aug 10 17:19:58 2024
SUCCESS: diskgroup DATA was mounted
Sat Aug 10 17:19:58 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Sat Aug 10 17:20:10 2024
Picked broadcast on commit scheme to generate SCNs
Sat Aug 10 17:20:10 2024
SUCCESS: diskgroup REDO was mounted
Sat Aug 10 17:20:10 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Thread 1 opened at log sequence 124958
Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.ora
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 10 17:20:14 2024
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[33770] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Sat Aug 10 17:20:16 2024
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Starting background process GTX0
Sat Aug 10 17:20:16 2024
GTX0 started with pid=45, OS id=34119
Starting background process RCBG
Sat Aug 10 17:20:16 2024
RCBG started with pid=46, OS id=34121
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Aug 10 17:20:16 2024
QMNC started with pid=47, OS id=34134
Starting background process SMCO
Completed: alter database open
其他集群其他节点数据库,一切正常

检查数据字典一致性
SQL> @hcheck.sql
HCheck Version 07MAY18 on 10-AUG-2024 18:24:49
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF
Catalog Fixed
Procedure Name Version Vs Release Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS
.- MissingOIDOnObjCol ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS
.- SourceNotInObj ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS
.- OversizedFiles ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- PoorDefaultStorage ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- PoorStorage ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- TabPartCountMismatch ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- OrphanedTabComPart ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- MissingSum$ ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- MissingDir$ ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- DuplicateDataobj ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS
.- ObjSynMissing ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS
.- ObjSeqMissing ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS
.- OrphanedUndo ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndex ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexPartition ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexSubPartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- OrphanedTable ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- OrphanedTablePartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- OrphanedTableSubPartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- MissingPartCol ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- OrphanedSeg$ ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- OrphanedIndPartObj# ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- DuplicateBlockUse ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- FetUet ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- Uet0Check ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- SeglessUET ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- BadInd$ ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS
.- BadTab$ ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- BadIcolDepCnt ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- ObjIndDobj ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- TrgAfterUpgrade ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- ObjType0 ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- BadOwner ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- StmtAuditOnCommit ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- BadPublicObjects ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- BadSegFreelist ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- BadDepends ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- CheckDual ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- ObjectNames ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS
.- BadCboHiLo ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- ChkIotTs ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- NoSegmentIndex ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- BadNextObject ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- DroppedROTS ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- FilBlkZero ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- DbmsSchemaCopy ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- OrphanedObjError ... 1102000300 > 1102000000 08/10 18:24:54 PASS
.- ObjNotLob ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS
.- MaxControlfSeq ... 1102000300 <= *All Rel* 08/10 18:24:55 PASS
.- SegNotInDeferredStg ... 1102000300 > 1102000000 08/10 18:25:18 PASS
.- SystemNotRfile1 ... 1102000300 > 902000000 08/10 18:25:18 PASS
.- DictOwnNonDefaultSYSTEM ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS
.- OrphanTrigger ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS
.- ObjNotTrigger ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS
---------------------------------------
10-AUG-2024 18:25:18 Elapsed: 29 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)
PL/SQL procedure successfully completed.
Statement processed.
Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc
运气不错,数据字典本身没有损坏,业务直接运行,一切正常(主要原因是在光纤链路不稳定的情况下,客户已经没有往库中写入数据)