19c database startup fails with ORA-600 kcbzib_kcrsds_1

Contact: mobile/WeChat (+86 17813235971), QQ (107644445), or consult 惜分飞 via QQ

Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal liability.]

A 19c database ran into a problem for some reason, and the engineer on site used hidden parameters to force it open. As a result, startup now fails with ORA-00704 and ORA-600 [kcbzib_kcrsds_1]:

2024-08-24T06:11:25.494304+08:00
ALTER DATABASE OPEN
2024-08-24T06:11:25.494370+08:00
TMI: adbdrv open database BEGIN 2024-08-24 06:11:25.494324
Smart fusion block transfer is disabled:
  instance mounted in exclusive mode.
2024-08-24T06:11:25.515306+08:00
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
 Thread 1: Recovery starting at checkpoint rba (logseq 2 block 3), scn 286550073
2024-08-24T06:11:25.567011+08:00
Started redo scan
2024-08-24T06:11:25.587170+08:00
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
2024-08-24T06:11:25.595192+08:00
Started redo application at
 Thread 1: logseq 2, block 3, offset 0, scn 0x0000000011146839
2024-08-24T06:11:25.595552+08:00
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /dbf/RLZY/redo02.log
2024-08-24T06:11:25.595712+08:00
Completed redo application of 0.00MB
2024-08-24T06:11:25.596058+08:00
Completed crash recovery at
 Thread 1: RBA 2.3.0, nab 3, scn 0x000000001114683a
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Endian type of dictionary set to little
2024-08-24T06:11:25.648152+08:00
LGWR (PID:1614826): STARTING ARCH PROCESSES
2024-08-24T06:11:25.661738+08:00
TT00 (PID:1614908): Gap Manager starting
Starting background process ARC0
2024-08-24T06:11:25.677246+08:00
ARC0 started with pid=54, OS id=1614910 
2024-08-24T06:11:25.687525+08:00
LGWR (PID:1614826): ARC0: Archival started
LGWR (PID:1614826): STARTING ARCH PROCESSES COMPLETE
2024-08-24T06:11:25.687733+08:00
ARC0 (PID:1614910): Becoming a 'no FAL' ARCH
ARC0 (PID:1614910): Becoming the 'no SRL' ARCH
2024-08-24T06:11:25.696437+08:00
TMON (PID:1614886): STARTING ARCH PROCESSES
Starting background process ARC1
2024-08-24T06:11:25.711645+08:00
Thread 1 advanced to log sequence 3 (thread open)
Redo log for group 3, sequence 3 is not located on DAX storage
2024-08-24T06:11:25.715270+08:00
ARC1 started with pid=56, OS id=1614914 
Starting background process ARC2
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /dbf/RLZY/redo03.log
Successful open of redo thread 1
2024-08-24T06:11:25.728586+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Stopping change tracking
2024-08-24T06:11:25.734124+08:00
ARC2 started with pid=57, OS id=1614916 
Starting background process ARC3
2024-08-24T06:11:25.752891+08:00
ARC3 started with pid=58, OS id=1614918 
2024-08-24T06:11:25.752979+08:00
TMON (PID:1614886): ARC1: Archival started
TMON (PID:1614886): ARC2: Archival started
TMON (PID:1614886): ARC3: Archival started
TMON (PID:1614886): STARTING ARCH PROCESSES COMPLETE
2024-08-24T06:11:25.802551+08:00
ARC0 (PID:1614910): Archived Log entry 2828 added for T-1.S-2 ID 0x74f18f91 LAD:1
2024-08-24T06:11:25.806845+08:00
TT03 (PID:1614922): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc  (incident=124865):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_124865/xff_ora_1614892_i124865.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2024-08-24T06:11:25.871925+08:00
2024-08-24T06:11:26.772652+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2024-08-24T06:11:26.872265+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872351+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872412+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:26.872455+08:00
Error 704 happened during db open, shutting down database
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc  (incident=124866):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_124866/xff_ora_1614892_i124866.trc
opiodr aborting process unknown ospid (1614892) as a result of ORA-603
2024-08-24T06:11:27.498146+08:00
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_1614892.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2024-08-24T06:11:27.501122+08:00
ORA-603 : opitsk aborting process
License high water mark = 8
USER(prelim) (ospid: 1614892): terminating the instance due to ORA error 704
2024-08-24T06:11:28.526358+08:00
Instance terminated by USER(prelim), pid = 1614892
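
The error stack repeats many times in the alert log above, so it can help to reduce it to the distinct ORA codes in the order they first appear. A minimal sketch, assuming the log text has already been read into a string; the function name is my own and not part of any Oracle tool, and only the zero-padded five-digit form (ORA-00704, not ORA-603) is matched.

```python
import re

# Matches the zero-padded form used in error stacks, e.g. ORA-00704.
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")

def distinct_ora_codes(alert_text):
    """Return ORA codes in order of first appearance, without repeats."""
    seen = []
    for code in ORA_CODE.findall(alert_text):
        name = f"ORA-{code}"
        if name not in seen:
            seen.append(name)
    return seen

# A few lines copied from the alert log above.
sample = """\
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
ORA-00704: bootstrap process failure
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
"""

for name in distinct_ora_codes(sample):
    print(name)
```

For this case the list reads bottom-up as a chain: the internal error ORA-00600 breaks the bootstrap (ORA-00704), which terminates the session and the instance.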

The only official Oracle note on kcbzib_kcrsds_1 is: Bug 31887074 – sr21.1bigscn_hipu3 – trc – ksfdopn2 – ORA-600 [kcbzib_kcrsds_1] (Doc ID 31887074.8)


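Although Oracle has not published a solution for ORA-600 [kcbzib_kcrsds_1], a large number of past recovery cases show that the error can be bypassed by adjusting the Oracle SCN. Earlier similar recovery cases:
ORA-600 kcbzib_kcrsds_1 error
Handling an ORA-600 kcbzib_kcrsds_1 failure on a 12c database
ORA-00603 ORA-01092 ORA-600 kcbzib_kcrsds_1
Fixing ORA-600 kcbzib_kcrsds_1 after a forced open with damaged redo
One-click recovery of ORA-600 kcbzib_kcrsds_1 with the Patch SCN tool
Storage failure: handling ORA-600 kcbzib_kcrsds_1 after a forced open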
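
Since the workaround revolves around the SCN, it helps to read the SCN values in the alert log consistently. The crash-recovery messages above print the checkpoint SCN once in decimal (scn 286550073) and once in hex (scn 0x0000000011146839). The sketch below is a small helper of my own (the function name is hypothetical, not an Oracle utility) that normalizes both forms to decimal so they can be compared. It only reads log text and does not touch the database.

```python
import re

# Matches "scn 286550073" as well as "scn 0x0000000011146839".
SCN_PATTERN = re.compile(r"\bscn\s+(0x[0-9a-fA-F]+|\d+)")

def extract_scns(log_text):
    """Return every SCN mentioned in the text as a decimal integer, in order."""
    scns = []
    for token in SCN_PATTERN.findall(log_text):
        # Tokens with a 0x prefix are hexadecimal; plain digits are decimal.
        if token.lower().startswith("0x"):
            scns.append(int(token, 16))
        else:
            scns.append(int(token))
    return scns

# Three lines copied from the crash-recovery section of the alert log.
sample = """\
 Thread 1: Recovery starting at checkpoint rba (logseq 2 block 3), scn 286550073
 Thread 1: logseq 2, block 3, offset 0, scn 0x0000000011146839
 Thread 1: RBA 2.3.0, nab 3, scn 0x000000001114683a
"""

for scn in extract_scns(sample):
    print(scn)
```

The first two values are the same SCN (286550073) written two ways; the third is one higher (286550074), the SCN at which crash recovery completed.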

Resolving ORA-01031 raised by DBMS_SESSION.set_context

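I recently migrated a customer's Oracle database from 11.2.0.3 on AIX to 19.23 on Linux, using impdp with network_link in schema (user) mode. Afterwards one program failed with ORA-01031: insufficient privileges.

Tracing the program confirmed that the error is raised by the following call: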

 DBMS_SESSION.set_context (
         'back_exec',
         'back_exec_log_no',
         v_back_exec_log_no
……

Checking the executing user's privileges confirmed that EXECUTE ON SYS.DBMS_SESSION had already been granted. A simple test reproduces the problem:

SQL> show user;
USER is "SYS"

SQL> exec DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1');
BEGIN DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1'); END;

*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "SYS.DBMS_SESSION", line 114
ORA-06512: at line 1

SQL> GRANT EXECUTE ON SYS.DBMS_SESSION TO sys;

Grant succeeded.

SQL> exec DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1');
BEGIN DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1'); END;

*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "SYS.DBMS_SESSION", line 114
ORA-06512: at line 1

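So this is clearly not a privilege problem: even SYS gets the error, and an explicit grant changes nothing. The official documentation for DBMS_SESSION.SET_CONTEXT describes the namespace parameter as follows:

namespace: The namespace of the application context to be set, limited to 128 bytes. Exceeding the maximum permissible length will result in an error during the execution of the procedure.

The namespace has to exist as an application context, and it can only be set through the package named in the context's USING clause. Creating the context and repeating the call confirms this: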

SQL> create context test_ctx using sys.DBMS_SESSION;

Context created.

SQL> exec DBMS_SESSION.SET_CONTEXT('test_ctx', 'a', '1');

PL/SQL procedure successfully completed.

The official documentation has a related worked example: Example: Creating a Global Application Context that Uses a Client Session ID
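
A plausible explanation in a migration like this one, offered as my assumption rather than something the case confirms, is that the application context was never created on the target: a context does not belong to a schema, so a schema-mode import may not carry it over. A minimal sketch: given (namespace, schema, package) rows as one might collect them from DBA_CONTEXT on the source database, generate CREATE CONTEXT statements to review before running them on the target. The row values are hypothetical; only the namespace back_exec appears in the case above, and globally accessed or externally initialized contexts would need extra clauses that this sketch does not emit.

```python
def create_context_ddl(rows):
    """Build one CREATE CONTEXT statement per (namespace, schema, package) row."""
    statements = []
    for namespace, schema, package in rows:
        # The USING clause names the trusted package that is allowed to call
        # DBMS_SESSION.SET_CONTEXT for this namespace.
        statements.append(
            f"CREATE OR REPLACE CONTEXT {namespace} USING {schema}.{package};"
        )
    return statements

# Hypothetical row: APP_OWNER and PKG_BACK_EXEC are made-up names.
rows = [("BACK_EXEC", "APP_OWNER", "PKG_BACK_EXEC")]

for stmt in create_context_ddl(rows):
    print(stmt)
```

Generating text instead of executing it keeps the step reviewable: the statements can be checked against the application's actual package names before anything is run.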

Lost redo write causes ORA-600 kcrf_resilver_log_1

A customer had a hardware failure. After the hardware was repaired, database startup failed with ORA-600 [kcrf_resilver_log_1]:

Thu Aug 22 13:37:50 2024
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc  (incident=9767):
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []
Incident details in: e:\oracle\zy\diag\rdbms\orcl\orcl\incident\incdir_9767\orcl_ora_1640_i9767.trc
Thu Aug 22 13:37:55 2024
Trace dumping is performing id=[cdmp_20240822133755]
Aborting crash recovery due to error 600
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc:
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []
Errors in file e:\oracle\zy\diag\rdbms\orcl\orcl\trace\orcl_ora_1640.trc:
ORA-00600: internal error code, arguments: [kcrf_resilver_log_1], [0x7DCEBE020], [2], [], [], [], [], [], [], [], [], []

According to MOS, this error is usually caused by a lost redo log write.


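Recovering from this problem is not difficult; it usually comes down to forcing the database open. Earlier similar recovery cases:
Handling an ORA-600 kcrf_resilver_log_1 failure
Recovery from ORA-00600 [kcrf_resilver_log_1]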

Hardware failure causes ORA-01242, ORA-01122 and related errors

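A customer runs a multi-node RAC. In the morning they reported that the instances on two nodes had failed and asked for the cause. The database alert log on one of those nodes shows that access to datafile 1399 failed with ORA-01242, ORA-01122 and related errors, which crashed the instance: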

Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process

Further down the log, the clusterware tried to restart the instance, but it hit ORA-01186 and ORA-01122 and could not start:

ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122

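These errors come from abnormal access to the database file. In my experience this kind of problem is usually caused by a fault in the underlying layers, and the system messages log does show hardware disk errors: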

Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]  
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]  
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current] 
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]  
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed
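
To see at a glance which devices reported errors, the messages excerpt above can be tallied per device. A minimal sketch, assuming the two-line layout seen in this log, where an `sd ... [sdXX]` line is followed by a `Sense Key` line; the function name is my own and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

DEVICE_LINE = re.compile(r"kernel: sd [\d:]+ \[(sd[a-z]+)\]")
SENSE_LINE = re.compile(r"kernel: Sense Key : (.+?) \[current\]")

def count_sense_keys(lines):
    """Count (device, sense key) pairs; the device comes from the preceding sd line."""
    counts = Counter()
    current_device = None
    for line in lines:
        device = DEVICE_LINE.search(line)
        if device:
            current_device = device.group(1)
            continue
        sense = SENSE_LINE.search(line)
        if sense and current_device:
            counts[(current_device, sense.group(1))] += 1
    return counts

# A few lines copied from the messages log above.
sample = [
    "Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]",
    "Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current]",
    "Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]",
    "Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1",
    "Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]",
    "Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current]",
]

for (device, key), n in sorted(count_sense_keys(sample).items()):
    print(device, key, n)
```

In this incident the sense-key messages start about six minutes before the instance crash at 20:48:02 and are spread over several devices (sdap through sdat), which points at the storage side and not at a single database file.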

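The customer's further analysis found that a disk in the storage array had failed the day before and a hot spare had taken over. It is not known exactly why file access then failed; it may be related to the rebuild that was running at the time. Because this is a RAC environment and some of the remaining nodes were still running normally, the failed nodes were simply started again, and the database opened successfully.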


Writes on the nodes then failed with ORA-01187: cannot read from file because it failed verification tests.

After ALTER SYSTEM CHECK DATAFILES was run on all nodes, operations on every node returned to normal.

Recovering a 200T NOARCHIVELOG database with no backup

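A six-node RAC of nearly 200T suffered from unstable storage links. The servers kept losing disks, the ASM disk groups were dismounted and mounted over and over, and the cluster nodes restarted continuously. After the link problem was fixed, database startup failed with ORA-01113 and ORA-01110.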


Running the Oracle Database Recovery Check script showed that 10 datafiles were abnormal and could not be recovered normally.

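The database is large, nearly 200T, so the recovery had to be done with particular care (an on-site backup was not possible, and the customer required the recovery to be finished within 2 days).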

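Because the database is in NOARCHIVELOG mode, these files cannot be recovered by applying archived logs. In this situation, dbms_diskgroup was used to copy the datafile header out to the file system, with an operation like this: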

SQL> @dbms_diskgroup_get_block.sql  +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header

Parameter 1:
ASM_file_name (required)


Parameter 2:
block_to_extract (required)


Parameter 3
number_of_blocks_to_extract (required)


Parameter 4:
FileSystem_File_Name (required)

old  14:  v_AsmFilename := '&ASM_File_Name';
new  14:  v_AsmFilename := '+DATA/xifenfei.dbf';
old  15:  v_offstart := '&block_to_extract';
new  15:  v_offstart := '1';
old  16:  v_numblks := '&number_of_blocks_to_extract';
new  16:  v_numblks := '1';
old  17:  v_FsFilename := '&FileSystem_File_Name';
new  17:  v_FsFilename := '/tmp/xff/xifenfei.dbf.header';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384
Physical Block Size: 512

PL/SQL procedure successfully completed.

Then modify the relevant SCN with bbed:

BBED> set filename 'xifenfei.dbf.header'
	FILENAME       	xifenfei.dbf.header

BBED> set blocksize 16384
	BLOCKSIZE      	16384

BBED> map
 File: xifenfei.dbf.header (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header

 struct kcvfh, 860 bytes                    @0       

 ub4 tailchk                                @16380   


BBED> p kcvfh.kcvfhckp.kcvcpscn
struct kcvcpscn, 8 bytes                    @484     
   ub4 kscnbas                              @484      0xa8061324
   ub2 kscnwrp                              @488      0x0081

BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn;
struct kcvcpscn, 8 bytes                    @484     
   ub4 kscnbas                              @484      0xa8133e2b
   ub2 kscnwrp                              @488      0x0081

Then write the modified datafile header back to ASM:

SQL> @dbms_diskgroup_cp_block_to_asm.sql  /tmp/xff/xifenfei.dbf.header  +DATA/xifenfei.dbf 1 1 

Parameter 1:
v_FsFileName (required)


Parameter 2:
v_AsmFileName (required)


Parameter 3
v_offstart (required)


Parameter 4
v_numblks (required)

old  16: v_FsFileName := '&v_FsFileName';
new  16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header';
old  17: v_AsmFileName := '&v_AsmFileName';
new  17: v_AsmFileName := '+DATA/xifenfei.dbf';
old  18: v_offstart := '&v_offstart';
new  18: v_offstart := '1';
old  19:  v_numblks := '&v_numblks';
new  19:  v_numblks := '1';
File: +DATA/xifenfei.dbf
Type: 2 Data File
Size (in logical blocks): 3978880
Logical Block Size: 16384

PL/SQL procedure successfully completed.

Check whether the file header was modified successfully:

[oracle@xff1 xff]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> set numw 16
SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295);

CHECKPOINT_CHANGE#
------------------
      556870614571
      556870614571

SQL> recover datafile 295;
Media recovery complete.
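
The bbed output and the v$datafile_header query show the same checkpoint SCN in two notations: bbed prints a 16-bit wrap (kscnwrp) and a 32-bit base (kscnbas), while CHECKPOINT_CHANGE# is a single decimal number. A minimal sketch of the conversion, assuming the wrap/base layout shown by bbed for this 11.2.0.3 header; the function name is my own.

```python
def scn_from_wrap_base(kscnwrp, kscnbas):
    """Combine the 16-bit wrap and 32-bit base of a kcvcpscn into one SCN."""
    if not 0 <= kscnbas < 2**32:
        raise ValueError("kscnbas must fit in 32 bits")
    if not 0 <= kscnwrp < 2**16:
        raise ValueError("kscnwrp must fit in 16 bits")
    # The wrap occupies the bits above the 32-bit base.
    return (kscnwrp << 32) | kscnbas

# Values printed by bbed above.
before = scn_from_wrap_base(0x0081, 0xA8061324)  # file 295 before the change
after = scn_from_wrap_base(0x0081, 0xA8133E2B)   # value copied from file 1

print(before, after, after - before)
```

The converted value after the change is 556870614571, which is exactly the CHECKPOINT_CHANGE# reported for files 1 and 295. Before the change the header of file 295 was 862983 SCNs behind.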

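These steps confirm that the bbed change to the file header succeeded. The other 9 files were then modified in the same way, and the database was opened: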

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

The alert log shows:

Sat Aug 10 16:46:11 2024
ALTER DATABASE RECOVER  datafile 295  
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER  datafile 295  
Sat Aug 10 16:46:39 2024
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Sat Aug 10 16:46:51 2024
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Sat Aug 10 16:46:54 2024
Parallel Media Recovery started with 64 slaves
Media Recovery Complete (xff1)
Completed: ALTER DATABASE RECOVER  database  
Sat Aug 10 17:19:58 2024
alter database open
This instance was first to open
Sat Aug 10 17:19:58 2024
SUCCESS: diskgroup DATA was mounted
Sat Aug 10 17:19:58 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Sat Aug 10 17:20:10 2024
Picked broadcast on commit scheme to generate SCNs
Sat Aug 10 17:20:10 2024
SUCCESS: diskgroup REDO was mounted
Sat Aug 10 17:20:10 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Thread 1 opened at log sequence 124958
  Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.ora
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Aug 10 17:20:14 2024
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[33770] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Sat Aug 10 17:20:16 2024
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Starting background process GTX0
Sat Aug 10 17:20:16 2024
GTX0 started with pid=45, OS id=34119 
Starting background process RCBG
Sat Aug 10 17:20:16 2024
RCBG started with pid=46, OS id=34121 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Aug 10 17:20:16 2024
QMNC started with pid=47, OS id=34134 
Starting background process SMCO
Completed: alter database open
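
The "fuzzy backup" warnings in the alert log above give a quick cross-check that every patched datafile was actually picked up by recovery. A minimal sketch, assuming the alert log text is available as a string; the function name is my own.

```python
import re

# The warning wraps onto a second line in the log, so only the first line is matched.
FUZZY = re.compile(r"Recovering data file (\d+) from a fuzzy backup")

def fuzzy_datafiles(alert_text):
    """Return the sorted, de-duplicated file numbers flagged as fuzzy."""
    return sorted({int(n) for n in FUZZY.findall(alert_text)})

# Three of the warnings copied from the alert log above.
sample = """\
WARNING! Recovering data file 295 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
"""

print(fuzzy_datafiles(sample))
```

Applied to the full excerpt, this lists file 295 plus the nine files named during RECOVER DATABASE, which matches the 10 abnormal datafiles found by the check script.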

The database instances on the other cluster nodes also came up normally.


Check the consistency of the data dictionary:

SQL> @hcheck.sql
HCheck Version 07MAY18 on 10-AUG-2024 18:24:49
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

				   Catalog	 Fixed
Procedure Name			   Version    Vs Release    Timestamp
Result
------------------------------ ... ---------- -- ---------- --------------
------
.- LobNotInObj		       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- MissingOIDOnObjCol	       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- SourceNotInObj	       ... 1102000300 <=  *All Rel* 08/10 18:24:49 PASS
.- OversizedFiles	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorDefaultStorage	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- PoorStorage		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- OrphanedTabComPart	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingSum$		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- MissingDir$		       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- DuplicateDataobj	       ... 1102000300 <=  *All Rel* 08/10 18:24:50 PASS
.- ObjSynMissing	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- ObjSeqMissing	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedUndo 	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndex	       ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/10 18:24:51 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTable	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- MissingPartCol	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedSeg$ 	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- OrphanedIndPartObj#	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- DuplicateBlockUse	       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- FetUet		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- Uet0Check		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- SeglessUET		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadInd$		       ... 1102000300 <=  *All Rel* 08/10 18:24:52 PASS
.- BadTab$		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadIcolDepCnt	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjIndDobj		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- TrgAfterUpgrade	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjType0		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadOwner		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- StmtAuditOnCommit	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadPublicObjects	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadSegFreelist	       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadDepends		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- CheckDual		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- ObjectNames		       ... 1102000300 <=  *All Rel* 08/10 18:24:53 PASS
.- BadCboHiLo		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- ChkIotTs		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- NoSegmentIndex	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- BadNextObject	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DroppedROTS		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- FilBlkZero		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- DbmsSchemaCopy	       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- OrphanedObjError	       ... 1102000300 >  1102000000 08/10 18:24:54 PASS
.- ObjNotLob		       ... 1102000300 <=  *All Rel* 08/10 18:24:54 PASS
.- MaxControlfSeq	       ... 1102000300 <=  *All Rel* 08/10 18:24:55 PASS
.- SegNotInDeferredStg	       ... 1102000300 >  1102000000 08/10 18:25:18 PASS
.- SystemNotRfile1	       ... 1102000300 >   902000000 08/10 18:25:18 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- OrphanTrigger	       ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
.- ObjNotTrigger	       ... 1102000300 <=  *All Rel* 08/10 18:25:18 PASS
---------------------------------------
10-AUG-2024 18:25:18  Elapsed: 29 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc

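Fortunately, the data dictionary itself was not damaged, and the application went straight back into service with everything normal (mainly because the customer had already stopped writing to the database while the fibre links were unstable).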