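Contact: phone/WeChat (+86 17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal liability.]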
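A customer's ESXi system was encrypted by ransomware. The disk files were copied out and analyzed with tools; every disk file turned out to be partially damaged, and the recovery tools could not recognize them automatically.

No partition information could be found by scanning.

After confirming with the customer that the three disks were managed with LVM, I tried searching for LVM information.


Search for LV information.

Select the appropriate LV and read it.

Manually select the appropriate disks as PVs.

View the data in the LV directly.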
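Searching a damaged disk image for LVM information can be sketched as a scan for the LVM2 physical-volume label. This is only an illustration, not the tool used in this case. It assumes 512-byte sectors and the LVM2 on-disk label layout (the ASCII tag `LABELONE` at the start of a label sector, with the type string `LVM2 001` at byte offset 24). The function name `find_lvm_labels` is hypothetical.

```python
from typing import List

SECTOR_SIZE = 512          # assumption: 512-byte logical sectors
LABEL_ID = b"LABELONE"     # LVM2 label identifier at the start of a label sector
LABEL_TYPE = b"LVM2 001"   # label type string at byte offset 24 of that sector


def find_lvm_labels(image: bytes, sector_size: int = SECTOR_SIZE) -> List[int]:
    """Return byte offsets of sectors that look like LVM2 PV labels."""
    hits = []
    for offset in range(0, len(image) - sector_size + 1, sector_size):
        sector = image[offset:offset + sector_size]
        if sector[:8] == LABEL_ID and sector[24:32] == LABEL_TYPE:
            hits.append(offset)
    return hits


if __name__ == "__main__":
    # Synthetic 4-sector image with a label placed in sector 1.
    label = LABEL_ID + bytes(16) + LABEL_TYPE
    image = bytes(SECTOR_SIZE) + label.ljust(SECTOR_SIZE, b"\0") + bytes(2 * SECTOR_SIZE)
    print(find_lvm_labels(image))  # [512]
```

On a real image the data would be read in chunks rather than loaded whole, and a hit only marks a candidate PV; the metadata area behind the label still has to be parsed to rebuild the VG and LV layout.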

Title: Handling ORA-600 [kdsgrp1] when opening the database
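After the customer's server shut down abnormally, the database could not start. The alert log showed the following ORA-600 [kdsgrp1] errors, which prevented the database from opening: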
2025-12-05T16:17:23.315325+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
Undo initialization recovery: err:0 start: 7620046 end: 7620062 diff: 16 ms (0.0 seconds)
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc (incident=1243378):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\incident\incdir_1243378\his_ora_9672_i1243378.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-12-05T16:17:25.341421+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2025-12-05T16:17:25.533576+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc (incident=1243379):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\incident\incdir_1243379\his_ora_9672_i1243379.trc
2025-12-05T16:17:27.007101+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-12-05T16:17:27.008097+08:00
Undo initialization online undo segments: err:600 start: 7620062 end: 7623343 diff: 3281 ms (3.3 seconds)
2025-12-05T16:17:27.008097+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
2025-12-05T16:17:27.008097+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc (incident=1243380):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\incident\incdir_1243380\his_ora_9672_i1243380.trc
2025-12-05T16:17:28.540363+08:00
opiodr aborting process unknown ospid (9672) as a result of ORA-603
Starting the database manually failed with the same ORA-600 [kdsgrp1] error:
C:\Users\Administrator>sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Dec 5 16:45:48 2025
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> recover datafile 1;
Media recovery complete.
SQL> recover database;
Media recovery complete.
SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Process ID: 10800
Session ID: 5025 Serial number: 34108
The trace file for the error:
*** CLIENT DRIVER:(SQL*PLUS) 2025-12-05T16:17:23.924647+08:00
[TOC00000]
Jump to table of contents
Dump continued from file: D:\APP\ADMINISTRATOR\diag\rdbms\his\his\trace\his_ora_9672.trc
[TOC00001]
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
[TOC00001-END]
[TOC00002]
========= Dump for incident 1243378 (ORA 600 [kdsgrp1]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
kdsDumpState: cdb: 0 dspdb: 0 type: 1 info: 0x5 flag: 0x0
kdsDumpState: sample type: 0 layer: 0
* kdsgrp1-1: ***********************************************
row 0x00c02754.24 continuation at: 0x00c02754.24 file# 3 block# 10068 slot 36 not found (dscnt: 0)
state kdscurrid points to 0x00c28206.17
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 36 ..... nrows: 60
Dumping kcb descriptor:
kcbds 0x1e08fe71510: pdb 0, tsn 1, rdba 0x00c02754, afn 3, objd 11331, cls 1, tidflg 0x8 0x80 0x0
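The rdba values in the dump above can be decoded by hand. The sketch below assumes the usual smallfile-tablespace layout of a relative data block address, in which the top 10 bits hold the relative file number and the low 22 bits hold the block number. The helper name `decode_rdba` is hypothetical.

```python
from typing import Tuple

BLOCK_BITS = 22  # assumption: smallfile tablespace, 22 low bits = block number


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit rdba into (relative file#, block#)."""
    return rdba >> BLOCK_BITS, rdba & ((1 << BLOCK_BITS) - 1)


if __name__ == "__main__":
    # Values taken from the trace: the failing row piece and kdscurrid.
    print(decode_rdba(0x00C02754))  # (3, 10068)
    print(decode_rdba(0x00C28206))  # (3, 164358)
```

The first result matches the `file# 3 block# 10068` printed in the dump. The second is the block that `kdscurrid` points to, also in file 3.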
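From the trace we can basically confirm that the ORA-600 error is raised on file 3, block 10068, with data object ID 11331. Based on experience, my initial suspicion was that one of the WRH$_ objects in SYSAUX was damaged. To confirm this, I ran a 10046 trace on the startup process: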
EXEC #2063975471112:c=17341,e=35163,p=54,cr=283,cu=0,mis=1,r=0,dep=1,og=4,plh=794648223,tim=9717774784
WAIT #2063975471112: nam='db file sequential read' ela= 115 file#=3 block#=11434 blocks=1 obj#=11579 tim=9717774961
WAIT #2063975471112: nam='db file sequential read' ela= 93 file#=3 block#=11435 blocks=1 obj#=11579 tim=9717775098
WAIT #2063975471112: nam='db file sequential read' ela= 89 file#=3 block#=11436 blocks=1 obj#=11579 tim=9717775217
WAIT #2063975471112: nam='db file sequential read' ela= 98 file#=3 block#=11437 blocks=1 obj#=11579 tim=9717775361
WAIT #2063975471112: nam='db file sequential read' ela= 23087 file#=3 block#=11438 blocks=1 obj#=11579 tim=9717798471
WAIT #2063975471112: nam='db file sequential read' ela= 83 file#=3 block#=11439 blocks=1 obj#=11579 tim=9717798681
WAIT #2063975471112: nam='db file sequential read' ela= 183 file#=3 block#=10075 blocks=1 obj#=11332 tim=9717799100
WAIT #2063975471112: nam='db file sequential read' ela= 124 file#=3 block#=10078 blocks=1 obj#=11332 tim=9717799291
WAIT #2063975471112: nam='db file sequential read' ela= 115 file#=3 block#=165702 blocks=1 obj#=11332 tim=9717799460
WAIT #2063975471112: nam='db file sequential read' ela= 109 file#=3 block#=83355 blocks=1 obj#=11332 tim=9717799621
WAIT #2063975471112: nam='db file sequential read' ela= 139 file#=3 block#=165703 blocks=1 obj#=11332 tim=9717799816
WAIT #2063975471112: nam='db file sequential read' ela= 139 file#=3 block#=10079 blocks=1 obj#=11332 tim=9717800074
WAIT #2063975471112: nam='db file sequential read' ela= 112 file#=3 block#=165696 blocks=1 obj#=11332 tim=9717800248
WAIT #2063975471112: nam='db file sequential read' ela= 110 file#=3 block#=165701 blocks=1 obj#=11332 tim=9717800502
WAIT #2063975471112: nam='db file sequential read' ela= 107 file#=3 block#=83354 blocks=1 obj#=11332 tim=9717800665
WAIT #2063975471112: nam='db file sequential read' ela= 109 file#=3 block#=83356 blocks=1 obj#=11332 tim=9717800823
WAIT #2063975471112: nam='db file sequential read' ela= 107 file#=3 block#=83358 blocks=1 obj#=11332 tim=9717800978
WAIT #2063975471112: nam='db file sequential read' ela= 106 file#=3 block#=165698 blocks=1 obj#=11332 tim=9717801133
WAIT #2063975471112: nam='db file sequential read' ela= 106 file#=3 block#=164358 blocks=1 obj#=11331 tim=9717801298
WAIT #2063975471112: nam='db file sequential read' ela= 108 file#=3 block#=10068 blocks=1 obj#=11331 tim=9717801509
2025-12-05T16:52:21.382064+08:00
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
FETCH #2063975471112:c=1599462,e=1655487,p=20,cr=26,cu=0,mis=0,r=0,dep=1,og=4,plh=794648223,tim=9719430286
STAT #2063975471112 id=1 cnt=0 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=0 pr=0 pw=0 str=1 time=8 us)'
STAT #2063975471112 id=2 cnt=0 pid=1 pos=1 obj=0 op='HASH JOIN (cr=0 pr=0 pw=0 str=1 time=5 us cost=9 size=34 card=1)'
STAT #2063975471112 id=3 cnt=1 pid=2 pos=1 obj=11579 op='TABLE ACCESS FULL WRM$_SNAPSHOT (cr=12 pr=6 pw=0 str=1 time=23966 us cost=4 size=17 card=1)'
STAT #2063975471112 id=4 cnt=1 pid=3 pos=1 obj=0 op='SORT AGGREGATE (cr=6 pr=3 pw=0 str=1 time=23513 us)'
STAT #2063975471112 id=5 cnt=1 pid=4 pos=1 obj=11579 op='TABLE ACCESS FULL WRM$_SNAPSHOT (cr=6 pr=3 pw=0 str=1 time=23497 us cost=4 size=13 card=1)'
STAT #2063975471112 id=6 cnt=6 pid=2 pos=2 obj=11331 op='TABLE ACCESS BY INDEX ROWID BATCHED WRH$_UNDOSTAT (cr=13 pr=13 pw=0 str=1 time=2455 us cost=4 size=102 card=6)'
STAT #2063975471112 id=7 cnt=7 pid=6 pos=1 obj=11332 op='INDEX SKIP SCAN WRH$_UNDOSTAT_PK (cr=12 pr=12 pw=0 str=1 time=2304 us cost=2 size=0 card=6)'
2025-12-05T16:52:23.005928+08:00
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
<error barrier> at 0x000000275BED14D0 placed dbsdrv.c@4959
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
<error barrier> at 0x000000275BED14D0 placed dbsdrv.c@4959
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
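A trace like the one above can also be read mechanically: the last single-block read before the first ORA-00600 line names the file, block and object being visited when the error was raised. The following is a rough sketch that assumes the WAIT line format shown in this trace. The function name `last_read_before_error` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches single-block read waits as they appear in the 10046 trace above.
WAIT_RE = re.compile(
    r"nam='db file sequential read' ela= *\d+ "
    r"file#=(\d+) block#=(\d+) blocks=\d+ obj#=(-?\d+)"
)


def last_read_before_error(trace: str, marker: str = "ORA-00600") -> Optional[Tuple[int, int, int]]:
    """Return (file#, block#, obj#) of the last single-block read before `marker`."""
    head = trace.split(marker, 1)[0]  # whole trace if the marker never appears
    reads = WAIT_RE.findall(head)
    if not reads:
        return None
    file_no, block_no, obj_no = reads[-1]
    return int(file_no), int(block_no), int(obj_no)


if __name__ == "__main__":
    sample = (
        "WAIT #2063975471112: nam='db file sequential read' ela= 106 "
        "file#=3 block#=164358 blocks=1 obj#=11331 tim=9717801298\n"
        "WAIT #2063975471112: nam='db file sequential read' ela= 108 "
        "file#=3 block#=10068 blocks=1 obj#=11331 tim=9717801509\n"
        "ORA-00600: internal error code, arguments: [kdsgrp1]\n"
    )
    print(last_read_before_error(sample))  # (3, 10068, 11331)
```

For this trace the result is file 3, block 10068, obj# 11331, the same block and object reported in the incident dump.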
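At this point it is basically confirmed that the problem occurs when the index WRH$_UNDOSTAT_PK is used to look up the corresponding rows in WRH$_UNDOSTAT. The specific SQL is: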
PARSING IN CURSOR #2063975471112 len=233 dep=1 uid=0 oct=3 lid=0 tim=9717739563 hv=517686153 ad='1e1efda1f20' sqlid='d4m7ss0gdqhw9'
select max(maxconcurrency) from sys.wrh$_undostat where instance_number = :1 and dbid = :2 and snap_id in (select snap_id from dba_hist_snapshot where end_interval_time >(select max(end_interval_time)-7 from dba_hist_snapshot))
BINDS #2063975471112:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=48 off=0
  kxsbbbfp=8e8646e8 bln=22 avl=02 flg=05
  value=1
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0000 frm=00 csi=00 siz=0 off=24
  kxsbbbfp=8e864700 bln=22 avl=06 flg=01
  value=971429521
This kind of problem is relatively simple to handle: bypass the execution of this SQL during the Oracle open process, then open the database and rebuild the index on that table.
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2  FROM DBA_EXTENTS A
  3  WHERE FILE_ID = &FILE_ID
  4  AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 3
old 3: WHERE FILE_ID = &FILE_ID
new 3: WHERE FILE_ID = 3
Enter value for block_id: 10068
old 4: AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new 4: AND 10068 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
OWNER
--------------------------------------------------------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME
------------------ ------------------------------
PARTITION_NAME
--------------------------------------------------------------------------------
SYS
WRH$_UNDOSTAT
TABLE              SYSAUX
SQL> select index_name from dba_indexes where table_name='WRH$_UNDOSTAT';
INDEX_NAME
--------------------------------------------------------------------------------
WRH$_UNDOSTAT_PK
SQL> alter index WRH$_UNDOSTAT_PK REBUILD ONLINE;
Index altered.
With that, the database recovery was complete, and everything was normal after a restart.
Title: Rescuing data from an incomplete expdp dump whose import fails with ORA-39059 and ORA-39246
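A customer runs an NC system. At installation time the database was created on a fairly small partition, and after running for a while it ran out of space. The on-site technicians were not familiar with Oracle, and after a series of operations (dropping the business tablespace, copying the PDB, creating tablespaces, and so on) the database could not be recovered. They planned to restore from a backup dmp, but analysis showed that the only dmp still available, the most recent one, was an incomplete export and could not be imported normally (I handled a similar case before: handling the "ORA-39773: parse of metadata stream failed" error). The import attempt failed with ORA-39246: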
C:\Users\XFF>impdp system/oracle@127.0.0.1/orapdb directory=expdp_dir dumpfile=xxxxx_2025-12-01_0230.dmp logfile=1.log
Import: Release 19.0.0.0.0 - Production on Wed Dec 3 21:00:19 2025
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
ORA-39002: invalid operation
ORA-39059: dump file set is incomplete
ORA-39246: cannot locate master table within provided dump files
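Analysis of the export log from that run shows that the expdp export failed because the tablespace holding the expdp job table ran out of space: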

TABLE:"XIFENFEI"."EOM_MEASURE_POINT"
ORA-30032: the suspended (resumable) statement has timed out
ORA-01691: unable to extend lob segment XIFENFEI.SYS_LOB0000161267C00111$$ by 32 in tablespace NNC_DATA01
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: at "SYS.KUPW$WORKER", line 12620
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: at "SYS.KUPW$WORKER", line 11414
----- PL/SQL Call Stack -----
object handle  line number  object name
0xda5dae50     33476        package body SYS.KUPW$WORKER.WRITE_ERROR_INFORMATION
0xda5dae50     12641        package body SYS.KUPW$WORKER.DETERMINE_FATAL_ERROR
0xda5dae50     11602        package body SYS.KUPW$WORKER.CREATE_OBJECT_ROWS
0xda5dae50     15268        package body SYS.KUPW$WORKER.FETCH_XML_OBJECTS
0xda5dae50     3907         package body SYS.KUPW$WORKER.UNLOAD_METADATA
0xda5dae50     13736        package body SYS.KUPW$WORKER.DISPATCH_WORK_ITEMS
0xda5dae50     2429         package body SYS.KUPW$WORKER.MAIN
0x6524a4f0     2            anonymous block
KUPW: Object row index into parse items is: 1
KUPW: Parse item count is: 19
KUPW: In function CHECK_FOR_REMAP_NETWORK
KUPW: Nothing to remap
KUPW: In procedure BUILD_OBJECT_STRINGS - non-base info
KUPW: In procedure BUILD_SUBNAME_LIST with TABLE:XIFENFEI.EOM_MEASURE_POINT
KUPW: In function NEXT_PO_NUMBER
KUPW: PO number assigned: 34198 FORALL
KUPW: In procedure DETERMINE_FATAL_ERROR with ORA-30032: the suspended (resumable) statement has timed out
ORA-01691: unable to extend lob segment XIFENFEI.SYS_LOB0000161267C00111$$ by 32 in tablespace NNC_DATA01
Job "XIFENFEI"."SYS_EXPORT_SCHEMA_01" stopped due to fatal error at Mon Dec 1 06:33:21 2025 elapsed 0 04:03:18
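The export log shows that, after a large number of "0 KB 0 rows" entries had been exported, the tablespace ran out of space. The expdp job table could not extend, so the export was suspended, then timed out and terminated without ever completing. That is why the import fails with ORA-39059: dump file set is incomplete and ORA-39246: cannot locate master table within provided dump files. Analyzing the export log, we were lucky: every table that actually contained data had been exported completely. This gave me a first layer of confidence that no table data would be lost, because all of it had been written to this dmp. The non-table dictionary data (metadata), however, was incomplete, and for the application to run in full we needed a complete copy of the business dictionary information. A large number of backup dmp files had been deleted, and a lot of data had since been written to the same partition, so all we could do was try our luck: we imaged the disk, ran undelete recovery against the image, and found a compressed dmp from November 26 that was complete.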
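The check described above, that every table with data reached the dump, can be approximated by scanning the export log for per-table result lines. This is a rough sketch under stated assumptions: it expects lines in the English Data Pump form `. . exported "OWNER"."TABLE"  <size> KB|MB|GB  <n> rows`, and accepting the Chinese-locale keywords `导出了` and `行` is my assumption about how the localized log reads. The table names in the sample are hypothetical, and partition-level lines are not handled.

```python
import re
from typing import Dict

# One result line per exported table; the Chinese keywords are an assumption.
ROW_RE = re.compile(
    r'^\s*\. \. (?:exported|导出了)\s+"([^"]+)"\."([^"]+)"\s+'
    r'[\d.]+ (?:KB|MB|GB)\s+(\d+) (?:rows|行)',
    re.MULTILINE,
)


def tables_with_rows(log_text: str) -> Dict[str, int]:
    """Map OWNER.TABLE to row count for every exported table with rows > 0."""
    result = {}
    for owner, table, rows in ROW_RE.findall(log_text):
        if int(rows) > 0:
            result[f"{owner}.{table}"] = int(rows)
    return result


if __name__ == "__main__":
    # Hypothetical log excerpt; DEMO_ORDERS and DEMO_EMPTY are made-up names.
    sample = (
        '. . exported "XIFENFEI"."DEMO_ORDERS"      12.5 MB   48210 rows\n'
        '. . exported "XIFENFEI"."DEMO_EMPTY"          0 KB       0 rows\n'
    )
    print(tables_with_rows(sample))  # {'XIFENFEI.DEMO_ORDERS': 48210}
```

Comparing this set against the list of non-empty tables in the source schema (when that list is still available) shows whether any table with data is missing from the dump.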

Today I installed an Oracle 19.29 cluster on Oracle Linux 7.9, using the gridSetup -applyRU method:
su - grid
cd $ORACLE_HOME
./gridSetup.sh -applyRU /soft/38298204/
I/O scheduler - This task checks the I/O scheduler parameter configured
Error: PRVG-11975: The I/O scheduler parameter of device "/dev/sdm" did not match the expected value on nodes "hisdb2,hisdb2". [Expected scheduler = "none" ; Found scheduler = "mq-deadline"]
Cause: The I/O scheduler parameter of the indicated device was not the expected value on the indicated nodes.
Action: Change the I/O scheduler parameter using 'echo deadline > /sys/block/<device>/queue/scheduler' command of the indicated device to ensure it is the expected value.
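The gist is that the check found the disk's I/O scheduler set to mq-deadline, while the Action text asks for deadline (the check itself reports the expected value as "none") and recommends the echo deadline command quoted in the message above.
[grid@hisdb1 grid]$ cat /sys/block/sdm/queue/scheduler
[mq-deadline] kyber bfq none
The I/O scheduler is indeed mq-deadline. I looked up the relevant description: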
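mq-deadline is the multi-queue successor to the deadline scheduler. The core design goal is the same (scheduling by deadline to bound I/O latency), but the target hardware and performance characteristics differ significantly. In short, deadline is the older version designed for traditional single-queue devices (such as HDDs), while mq-deadline is the newer version optimized for modern multi-queue devices (such as SSD/NVMe).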
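The sysfs output shown earlier is easy to interpret in code: the scheduler in square brackets is the active one, and the remaining tokens are the alternatives the kernel offers for that device. A small sketch, with hypothetical helper names `active_scheduler` and `available_schedulers`:

```python
import re
from typing import List, Optional


def active_scheduler(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (currently active) scheduler name, if any."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def available_schedulers(sysfs_text: str) -> List[str]:
    """Return every scheduler name listed, with the brackets stripped."""
    return [token.strip("[]") for token in sysfs_text.split()]


if __name__ == "__main__":
    text = "[mq-deadline] kyber bfq none"  # content of /sys/block/sdm/queue/scheduler above
    print(active_scheduler(text))                    # mq-deadline
    print("deadline" in available_schedulers(text))  # False
    print("none" in available_schedulers(text))      # True
```

For this device the legacy `deadline` name is not among the listed schedulers, whereas `none`, the value the check reports as expected, is.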


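I took over a customer case in which the database could not start after a cloud platform hardware failure had been repaired. The alert log showed that, during open, the database reported ORA-01172: recovery of thread 1 stuck at block 2220167 of file 262, ORA-01151: use media recovery to recover block, restore backup if needed, and related errors. In other words, instance recovery hit logically corrupt blocks and could not complete: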
Sat Nov 01 14:29:10 2025
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Started redo scan
Completed redo scan
 read 7034 KB redo, 937 data blocks need recovery
Started redo application at
 Thread 1: logseq 296553, block 389408
Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0
  Mem# 0: /data/orcl/onlinelog/redo03a.log
Sat Nov 01 14:29:11 2025
Hex dump of (file 262, block 2220584) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p009_2648.trc
Sat Nov 01 14:29:11 2025
Hex dump of (file 262, block 2218886) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p007_2644.trc
Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1db86 (file 262, block 2218886)
Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1e228 (file 262, block 2220584)
Sat Nov 01 14:29:11 2025
Hex dump of (file 262, block 2219845) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_2646.trc
Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1df45 (file 262, block 2219845)
Sat Nov 01 14:29:11 2025
Hex dump of (file 262, block 2220167) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_2632.trc
Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1e087 (file 262, block 2220167)
Reread (file 262, block 2218886) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 2218886 OF FILE 262
Reread (file 262, block 2220584) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 2220584 OF FILE 262
Reread (file 262, block 2219845) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 2219845 OF FILE 262
Reread (file 262, block 2220167) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 2220167 OF FILE 262
Sat Nov 01 14:29:26 2025
Slave exiting with ORA-1172 exception
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p007_2644.trc:
ORA-01172: recovery of thread 1 stuck at block 2218886 of file 262
ORA-01151: use media recovery to recover block, restore backup if needed
Sat Nov 01 14:29:26 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_2646.trc:
ORA-10388: parallel query server interrupt (failure)
Sat Nov 01 14:29:26 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p009_2648.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_2646.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p009_2648.trc:
ORA-10388: parallel query server interrupt (failure)
Sat Nov 01 14:29:26 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_2632.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_2632.trc:
ORA-10388: parallel query server interrupt (failure)
Sat Nov 01 14:29:26 2025
Aborting crash recovery due to slave death, attempting serial crash recovery
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 7034 KB redo, 937 data blocks need recovery
Started redo application at
 Thread 1: logseq 296553, block 389408
Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0
  Mem# 0: /data/orcl/onlinelog/redo03a.log
Hex dump of (file 262, block 2220167) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2606.trc
Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1e087 (file 262, block 2220167)
Reread (file 262, block 2220167) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 1 STUCK AT BLOCK 2220167 OF FILE 262
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2606.trc:
ORA-01172: recovery of thread 1 stuck at block 2220167 of file 262
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2606.trc:
ORA-01172: recovery of thread 1 stuck at block 2220167 of file 262
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN...
After taking over the case, I tried recover database, which failed with ORA-600 [4552]:
SQL> recover database;
ORA-10562: Error occurred while applying redo to data block (file# 262, block# 2222153)
ORA-10564: tablespace XIFENFEI
ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf'
ORA-10560: block type '0'
ORA-00600: internal error code, arguments: [4552], [1], [0], [], [], [], [], [], [], [], [], []
The alert log entries corresponding to ORA-600 [4552]:
Sat Nov 01 17:49:58 2025
ALTER DATABASE RECOVER database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 16 slaves
Sat Nov 01 17:49:59 2025
Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0
  Mem# 0: /data/orcl/onlinelog/redo03a.log
Sat Nov 01 17:49:59 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr0c_28770.trc (incident=1018821):
ORA-00600: internal error code, arguments: [4552], [1], [0], [], [], [], [], [], [], [], [], []
Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_1018821/orcl_pr0c_28770_i1018821.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sat Nov 01 17:50:03 2025
Sweep [inc][1018821]: completed
Sweep [inc2][1018821]: completed
Slave exiting with ORA-10562 exception
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr0c_28770.trc:
ORA-10562: Error occurred while applying redo to data block (file# 262, block# 2222153)
ORA-10564: tablespace xifenfei
ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf'
ORA-10560: block type '0'
ORA-00600: internal error code, arguments: [4552], [1], [0], [], [], [], [], [], [], [], [], []
Recovery Slave PR0C previously exited with exception 10562
Media Recovery failed with error 448
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_28732.trc:
ORA-00283: recovery session canceled due to errors
ORA-00448: normal completion of background process
ORA-10562 signalled during: ALTER DATABASE RECOVER database ...
Since recovery at the whole-database level was not possible, I tried recovering the datafiles one group at a time:
SQL> recover datafile 1;
Media recovery complete.
…………
SQL> recover datafile 22,23,24,26,25,27,28,29,30;
Media recovery complete.
…………
SQL> recover datafile 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261;
Media recovery complete.
SQL> recover datafile 262;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [262], [2215808], [1101123456], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 262, block# 2215808, file offset is 972029952 bytes)
ORA-10564: tablespace XIFENFEI
ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf'
ORA-10560: block type '0'
SQL> recover datafile 263;
Media recovery complete.
Apart from datafile 262, every other file recovered successfully. The alert log entries for the ORA-600 [3020] error:
Sat Nov 01 17:53:37 2025
ALTER DATABASE RECOVER datafile 262
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0
  Mem# 0: /data/orcl/onlinelog/redo03a.log
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_28561.trc (incident=1018717):
ORA-00600: internal error code, arguments: [3020], [262], [2215808], [1101123456], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 262, block# 2215808, file offset is 972029952 bytes)
ORA-10564: tablespace xifenfei
ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf'
ORA-10560: block type '0'
Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_1018717/orcl_ora_28561_i1018717.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER datafile 262 ...
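The numbers in the ORA-600 [3020] message are consistent with one another, which is a useful sanity check when reading such errors. The sketch below assumes the smallfile rdba layout (10-bit relative file number, 22-bit block number) and the 8192-byte block size used for this database. That the reported file offset equals the true byte offset truncated to 32 bits is my observation from these particular values, not documented behaviour.

```python
BLOCK_BITS = 22    # assumption: smallfile tablespace rdba layout
BLOCK_SIZE = 8192  # assumption: 8 KB database block size


def encode_rdba(file_no: int, block_no: int) -> int:
    """Combine relative file# and block# into a 32-bit rdba."""
    return (file_no << BLOCK_BITS) | block_no


def byte_offset(block_no: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of a block from the start of its datafile."""
    return block_no * block_size


if __name__ == "__main__":
    print(encode_rdba(262, 2215808))           # 1101123456, the third ORA-600 argument
    print(byte_offset(2215808))                # 18151899136
    print(byte_offset(2215808) & 0xFFFFFFFF)   # 972029952, the "file offset" in ORA-10567
```

The third ORA-600 argument is therefore just the rdba of the failing block, and the real position of that block in the roughly 17 GB datafile is 18151899136 bytes, not the 972029952 shown in the message.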
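There are two ways to handle this situation:
1) During recovery, mark the blocks that raise errors as corrupt and continue, so that redo application completes normally; then repair the blocks that were marked corrupt.
2) Modify the datafile header directly so that redo application for these blocks in that file is skipped, tricking the database.
In this case the customer was in a hurry to restore service and needed the problem handled as quickly as possible, so I took the first approach. Here I used my self-developed m_scn (modify_scn) tool to quickly modify the relevant datafile information: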
[oracle@host-172-18-50-10 tmp]$ cat 1.txt
1@/data/orcl/datafile/system01.dbf
262@/data/orcl/datafile/xifenfei12.dbf
[oracle@host-172-18-50-10 tmp]$ ./m_scn 1.txt
Please Enter Password:
===== Starting Datafile Header modification program =====
Datafile list file: 1.txt
Operation Mode: Only Modify Datafile Header CheckPoint
Block Size: 8192
Log Path: /tmp/modify_scn
---------------------------------------------------------
Preparing Datafile list file...
Verifying Datafile existence...
Datafile verification passed
Initializing working directory...
Recovery script created: /tmp/modify_scn/backup/recover_datafile.sh
---------------------------------------------------------
Starting Datafile Header processing (total 2 files)...
[1/2] Processing Datafile Header: /data/orcl/datafile/system01.dbf (File number: 1)
 - Skipping file number 1 (control file)
---------------------------------------------------------
[2/2] Processing Datafile Header: /data/orcl/datafile/xifenfei12.dbf (File number: 262)
 - Backing up Datafile header...
 - Executing Datafile Header modification with block size 8192...
 - Datafile Header processing completed
---------------------------------------------------------
Cleaning up temporary files...
================= All operations completed =================
Note: Execute /tmp/modify_scn/backup/recover_datafile.sh operation for rollback
I then queried the relevant SCN information, confirmed that the modified file information was correct, and tried to recover datafile 262:
[oracle@host-172-18-50-10 tmp]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Nov 1 18:02:18 2025
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 4.1823E+10 bytes
Fixed Size 2262368 bytes
Variable Size 4294970016 bytes
Database Buffers 3.7447E+10 bytes
Redo Buffers 78614528 bytes
Database mounted.
SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
2 checkpoint_change#,
3 to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,
4 last_change#,
5 count(*) ROW_NUM
FROM v$datafile
6 7 GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
ORDER BY status, checkpoint_change#, checkpoint_time;
8
set numw 16
col CHECKPOINT_TIME for a40
set lines 150
set pages 1000
SELECT status,
to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
count(*) ROW_NUM
FROM v$datafile_header
GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
ORDER BY status, checkpoint_change#, checkpoint_time;
SELECT dd.FILE#,
dd.NAME,
dd.STATUS,
dd.checkpoint_change# dfile_chkp_change,
dh.checkpoint_change# dfile_hed_chkp_change,
dh.recover,
dh.fuzzy
FROM v$datafile dd,
v$datafile_header dh
WHERE dd.FILE#=dh.FILE#
AND dd.checkpoint_change#<>dh.checkpoint_change#;
STATUS CHECKPOINT_CHANGE# CHECKPOINT_TIME LAST_CHANGE# ROW_NUM
------- ------------------ ------------------- ---------------- ----------------
ONLINE 16816934458875 2025-11-01 17:58:28 16816934458875 258
RECOVER 16816934368799 2025-11-01 05:29:39 16816934456943 1
SYSTEM 16816934458875 2025-11-01 17:58:28 16816934458875 4
SQL> SQL> SQL> SQL> SQL> SQL> SQL> 2 3 4 5 6
STATUS CHECKPOINT_TIME FUZ CHECKPOINT_CHANGE# ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE 2025-11-01 17:58:28 NO 16816934458875 1
ONLINE 2025-11-01 17:58:28 NO 16816934458875 262
SQL> SQL> 2 3 4 5 6 7 8 9 10 11
FILE#
----------------
NAME
----------------------------------------------------------------------------------
STATUS DFILE_CHKP_CHANGE DFILE_HED_CHKP_CHANGE REC FUZ
------- ----------------- --------------------- --- ---
262
/data/orcl/datafile/xifenfei12.dbf
RECOVER 16816934368799 16816934458875 YES NO
SQL> recover datafile 262;
Media recovery complete.
The database then opened successfully:
SQL> alter database open;
Database altered.
Sat Nov 01 18:06:00 2025
ALTER DATABASE OPEN
Thread 1 opened at log sequence 296554
  Current log# 1 seq# 296554 mem# 0: /data/orcl/onlinelog/redo01a.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[33941] Successfully onlined Undo Tablespace 143.
Undo initialization finished serial:0 start:12793234 end:12793304 diff:70 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Nov 01 18:06:01 2025
QMNC started with pid=20, OS id=33973
Completed: ALTER DATABASE OPEN
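This essentially completed the recovery. Going forward, based on the alert log, a few tables that may be inconsistent because some data in file# 262 was lost need to be handled individually. Otherwise there were no major problems, and the customer's service was restored as quickly as possible.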