Recovering an ASM disk group containing 50+ PDBs after dd damage

Contact: phone/WeChat (+86 17813235971), QQ (107644445)

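Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]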

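Not long ago I recovered a case in which a customer had dd'ed 1-2 GB of data on 2 of 34 disks (see the earlier post "Recovering ASM disks damaged by a dd command mistakenly run in production"). This time another customer dd'ed 100 MB and 10 MB off two ASM disks. What makes this failure troublesome is that dd hit disk 0 and disk 1, the AU size is 4 MB, and the disk group holds more than 50 PDBs, so recovery is comparatively difficult. With persistent effort the customer's data was recovered to the greatest extent possible, avoiding further loss
The mistaken operation
Recently another customer mistakenly ran a dd command and damaged two ASM disks of the production database
[Screenshot 1]


Two facts can be confirmed from the screenshot above
1. The sdh disk had 10 MB overwritten by dd, and the sdg disk had 100 MB overwritten
2. sdg corresponds to asm-data1, and sdh corresponds to asm-data2
After the dd mistake, the DATA disk group started reporting errors and was then dismounted.
[Screenshot 2]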

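Analyzing the ASM disks in the original disk group
1. Locating the damaged disks within the disk group
The ASM alert log shows which disk group the damaged disks belonged to
[Screenshot 3]

From this we can confirm that asm-data1 is disk 0 of the DATA disk group and asm-data2 is disk 1 of the DATA disk group. In other words, disk 0 of DATA lost 100 MB to dd and disk 1 lost 10 MB
2. Confirming the AU size
The AU size was confirmed by examining the other disks in the disk group
kfdhdb.ausize: 4194304 ; 0x0bc: 0x00400000
This confirms that the disk group's AU size is 4 MB
3. The disk group was later expanded once by adding disks
[Screenshot 4]

In June 2025, three disks were added to the DATA disk group
4. sdh (asm-data2) had also been added to the ARCH disk group
[Screenshot 5]

Fortunately, the permissions of the sdh disk had not been changed correctly on the other node, so the add did not fully succeed and the disk was never rebalanced (had it been added and rebalanced normally, the consequences would have been far worse than they are now)
5. Combining the ASM logs, kfed, the physical disk sizes and other information, and reconstructing the damaged disk headers with kfed, I listed the state of every disk in the DATA disk group before the failure (to avoid confusion from udev aliases, physical disk names are shown directly)
[Screenshot 6]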

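The customer's situation
1. The customer had a Data Guard standby from roughly a year earlier (not synchronized since), which had already been activated by failover
2. The customer's backup system held a backup from roughly a month earlier, but the archived logs were missing; by the time I took over, the restored database had already been force-opened by the maintenance vendor
3. Before I started, professional engineers had already analyzed this environment but had not come up with a workable recovery plan
Why the recovery was difficult
1. Disk 0 of the disk group lost its first 100 MB to dd. This wiped the kfdhdb.f1b1locn record, so the pointer to ASM file 1 (the file directory) could not be read. Although the value was wiped, experience, or a comparison with disk 0 of another disk group, confirms that it points to AUN 10
2. The AU that f1b1locn points to stores ASM metadata files 1-255 and, for files 256-1023, the extent maps of their first 60 AUs. That AU sits at AUN 10 (offset aun * ausize = 40 MB), a location that was also overwritten by dd, so in principle the extent maps for the first 60 AUs of user files 256-1023 are completely lost
3. A tool scan showed that the AU holding the alias information was also within the first 100 MB of disk 0 (and so also overwritten). This rules out locating the start of each file directly through its alias, and the loss of aliases makes it harder to identify alias-named datafiles later (the full path of a file inside ASM cannot be obtained)
4. The database consists of more than 50 PDBs, which means plain database fragment scanning cannot recover it (every PDB was created from the default seed, so rfile 1, 4 and 9 are duplicated across PDBs)
5. Disks were added once along the way, which rebalanced the files in this ASM disk group, so the same block of some files is recorded on AUs of different disks (datafile blocks may be duplicated)
6. The ASM disks in this disk group are of unequal size, so file AUs are spread unevenly across the disks with no obvious pattern to follow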
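
The arithmetic behind difficulties 1 and 2 can be checked with a few lines of Python. This is only a sketch under stated assumptions: dd is assumed to have overwritten each disk from offset 0, "MB" is treated as MiB, and `f1b1_aun = 10` is the value the article infers rather than one read from disk. The helper names are made up for illustration.

```python
import re

MIB = 1024 * 1024

def parse_kfed_int(line):
    """Return the decimal value from a kfed field line such as
    'kfdhdb.ausize: 4194304 ; 0x0bc: 0x00400000'."""
    match = re.search(r":\s*(\d+)\s*;", line)
    if match is None:
        raise ValueError("not a kfed field line: %r" % line)
    return int(match.group(1))

def overwritten_aus(dd_bytes, au_size):
    """AU numbers touched when the first dd_bytes of a disk are overwritten
    (assumes the overwrite starts at offset 0)."""
    last_aun = (dd_bytes - 1) // au_size
    return range(0, last_aun + 1)

au_size = parse_kfed_int("kfdhdb.ausize: 4194304 ; 0x0bc: 0x00400000")
disk0_lost = overwritten_aus(100 * MIB, au_size)  # disk 0 lost 100 MB
disk1_lost = overwritten_aus(10 * MIB, au_size)   # disk 1 lost 10 MB

f1b1_aun = 10  # inferred in the article, not read from the damaged header
print("AU size (MiB):", au_size // MIB)
print("file directory offset (MiB):", f1b1_aun * au_size // MIB)
print("file directory AU overwritten:", f1b1_aun in disk0_lost)
print("first 60 AUs of a file (MiB):", 60 * au_size // MIB)
```

Under these assumptions AUN 10 falls inside the overwritten range of disk 0, which is why the extent maps for the first 60 AUs (240 MB) of files 256-1023 had to be rebuilt from other sources.
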
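Recovery steps
1. Reconstruct the disk headers of the damaged asm-data1 and asm-data2 with kfed so that the recovery tools can recognize them
2. Scan every disk in DATA with the tool, mainly for file extent map information and for ASM file allocation information in the ACD
2.1> A combined analysis of this information confirmed that the extent maps of ASM files numbered >= 1024 were complete, so their data could be recovered directly. The result looked like this
[Screenshot 7]

2.2> For files 256-1023, the ASM file extent information lacks AUs 0-59; combined with the partial AU allocation information obtained from the ACD, this data could be recovered as completely as possible (not 100% complete, because the ACD information on disk 0 was lost)
[Screenshot 8]

3. For datafiles whose absolute file number is not 1, 4 or 9 and whose ASM file number is below 1024, recover once more by rdba fragment reassembly, to cover cases where step 2.2 left a file incomplete because some AU information in the first 240 MB (60 AUs) was lost
After recovery by these several methods, the recovered datafiles as a whole looked like this (there were many files, so they were stored in several directories and named under several rules)
[Screenshot 9]

4. Because the ASM file directory and alias information were entirely lost and the database has more than 50 PDBs, there was no way to tell which PDB each of the thousand-plus recovered files belonged to. For this I wrote a small ad hoc program that reads the files and extracts file#, rfile#, ts#, tsname, file size, dbid, dbname, scn and so on (see "OBET (Oracle Block Editor Tool) second version released")
[Screenshot 10]

Matching this information against the historical control file identified the PDB each file belongs to
5. Using the file-to-PDB mapping from step 4, the recovered files were plugged into a new CDB through the dbms_pdb.recover package. Various errors came up while plugging in and opening the database, and each was resolved in turn
[Screenshots 11-15]

6. The customer's data recovery requirements were finally met (to ensure the data is not modified again, the customer required that no recovered PDB be opened in read-write mode)
[Screenshot 16]

The customer's operations vendor or application vendor then migrated or consolidated the required data into a new database and restored the business, completing this recovery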
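
The matching in step 4 can be pictured roughly as follows. This is not the author's program: it is a minimal sketch that assumes the header fields have already been extracted, that the absolute file# is unique within the CDB and unchanged since the historical control file was taken, and that rfile# and the tablespace name are good enough as a cross-check. The `FileInfo` type, the file names and the PDB names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileInfo:
    file_no: int    # absolute file#
    rfile_no: int   # relative file#
    ts_name: str    # tablespace name
    path: str = ""  # where the recovered file was written (recovered side)
    pdb: str = ""   # owning PDB (control-file side only)

def match_files(recovered, controlfile):
    """Pair each recovered file with a PDB, keyed on absolute file# and
    cross-checked on rfile# and tablespace name."""
    by_file_no = {c.file_no: c for c in controlfile}
    matched, unmatched = {}, []
    for r in recovered:
        c = by_file_no.get(r.file_no)
        if c is not None and (c.rfile_no, c.ts_name) == (r.rfile_no, r.ts_name):
            matched[r.path] = c.pdb
        else:
            unmatched.append(r.path)
    return matched, unmatched

# Hypothetical sample data, for illustration only.
controlfile = [
    FileInfo(12, 1, "SYSTEM", pdb="PDB_A"),
    FileInfo(13, 4, "SYSAUX", pdb="PDB_A"),
    FileInfo(27, 1, "SYSTEM", pdb="PDB_B"),
]
recovered = [
    FileInfo(27, 1, "SYSTEM", path="out/f27.dbf"),
    FileInfo(12, 1, "SYSTEM", path="out/f12.dbf"),
    FileInfo(99, 9, "UNDOTBS1", path="out/f99.dbf"),
]
matched, unmatched = match_files(recovered, controlfile)
print(matched)
print(unmatched)
```

Files that end up in `unmatched` would need manual review, for example files created after the historical control file was taken.
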

Handling abnormal system rollback segments in Oracle: ORA-600 4137/4193

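Initially the database's SYSAUX file could not be recovered normally, so the control file was rebuilt without the SYSAUX file and the database was opened. However, expdp could not export the data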

Export: Release 12.2.0.1.0 - Production on Wed Jun 24 17:18:04 2026
Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
ORA-31626: job does not exist
ORA-31637: cannot create job SYS_EXPORT_SCHEMA_02 for user SYS
ORA-06512: at "SYS.KUPV$FT", line 1140
ORA-06512: at "SYS.KUPV$FT", line 1741
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT_INT", line 823
ORA-39080: failed to create queues "KUPC$C_1_20260624171804" and "" for Data Pump job
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPC$QUE_INT", line 1541
ORA-00376: file 3 cannot be read at this time
ORA-06512: at "SYS.DBMS_AQADM", line 742
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8060
ORA-01110: data file 3: '/u01/app/oracle/product/12.2.0.1/dbhome_2/dbs/MISSING00003'
ORA-06512: at "SYS.DBMS_AQADM_SYSCALLS", line 912
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8036
ORA-06512: at "SYS.DBMS_AQADM", line 737
ORA-06512: at "SYS.KUPC$QUE_INT", line 1461
ORA-06512: at line 1
ORA-06512: at "SYS.KUPC$QUEUE_INT", line 158
ORA-06512: at "SYS.KUPV$FT_INT", line 758
ORA-06512: at "SYS.KUPV$FT", line 1645
ORA-06512: at "SYS.KUPV$FT", line 1101

After a flurry of operations by various parties, database startup began failing with ORA-600 4137 and ORA-600 4193, and the database could no longer be opened

2026-06-24T18:38:50.158906+08:00
alter database open
2026-06-24T18:38:50.182720+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
2026-06-24T18:38:50.219449+08:00
…………
2026-06-24T18:38:50.514016+08:00
ARC3: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc  (incident=304968):
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_304968/orcl1_ora_48840_i304968.trc
Use ADRCI or Support Workbench to package the incident.
ORACLE Instance orcl1 (pid = 53) - Error 600 encountered while recovering transaction (0, 77).
2026-06-24T18:38:51.313973+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc:
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []
2026-06-24T18:38:51.649361+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc  (incident=304969):
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_304969/orcl1_ora_48840_i304969.trc
2026-06-24T18:38:53.412782+08:00
opiodr aborting process unknown ospid (48840) as a result of ORA-603

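To open the damaged database and export its data normally, two problems need to be dealt with
1. The discarded SYSAUX file must be brought online normally, otherwise expdp cannot export user or full-database data
2. The ORA-600 4137/ORA-600 4193 errors during open must be resolved
Checking the SYSAUX file: because the rebuilt control file did not include the abnormal SYSAUX file, the current state of the file headers could not be queried from the database, so obet was used to parse the file header directly and obtain the relevant information (see "Oracle Block Editor Tool (obet)")
[Screenshot: res]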


For this situation, obet's functions for modifying the file header checkpoint SCN and resetlogs SCN allow a quick fix

OBET> set file 2
filename set to: /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf (file#2)

OBET> backup
Created backup directory: backup_blk
Successfully backed up block 1 from current file to /tmp/backup_blk/o1_mf_sysaux_go991cmw_.dbf_1.20260624191357

OBET> copy chkscn file 1 to file 2
Error: Edit mode not enabled. Use 'set mode edit' first.

OBET> set mode edit
mode set to: edit

OBET> copy chkscn file 1 to file 2

Confirm Modify chkscn:
Source: file#1 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_system_go990lcg_.dbf)
Target: file#2 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf)
Proceed? (Y/YES to confirm): yes
Successfully copied checkpoint SCN information from file#1 to file#2.

OBET> copy resetlogscn file 1 to file 2

Confirm Modify resetlogscn:
Source: file#1 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_system_go990lcg_.dbf)
Target: file#2 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf)
Proceed? (Y/YES to confirm): yes
Successfully copied resetlog SCN information from file#1 to file#2.

OBET> sum
Check value for File /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf, Block 1:
current = 0xF21B, required = 0x6651

OBET> sum apply

Confirm applying checksum:
File: /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf
Block: 1
Offset in block: 16 (file offset: 0x00002010)
Original value: 0xF21B
New value:      0x6651
Confirm? (Y/YES to proceed): y
Verification successful: Stored checksum matches calculated value (0x6651).
Checksum applied successfully.

OBET> tailchk
Check tailchk for File /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf, Block 1:
current = 0x010B0000, required = 0x010B0000

OBET>

The control file was then rebuilt to include this SYSAUX file. The attempt to open the database failed with ORA-600 4193

SYS@ORCL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
Process ID: 93550
Session ID: 1123 Serial number: 55884

Tracing the startup further confirmed that the error is raised on the update of undo$


PARSING IN CURSOR #140446136869016 len=160 dep=1 uid=0 oct=6 lid=0 tim=3161302405543 hv=1292341136 ad='9bbd4828' sqlid='8vyjutx6hg3wh'
update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE #140446136869016:c=11966,e=11918,p=18,cr=94,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=3161302405542
BINDS #140446136869016:
 Bind#0
  oacdty=01 mxl=32(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=9bbdac32  bln=32  avl=21  flg=09
  value="_SYSSMU12_3861134380$"
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda370  bln=24  avl=02  flg=05
  value=5
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda340  bln=24  avl=03  flg=05
  value=144
 Bind#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda310  bln=24  avl=02  flg=05
  value=5
 Bind#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda2e0  bln=24  avl=02  flg=05
  value=1
 Bind#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda2b0  bln=24  avl=04  flg=05
  value=46221
 Bind#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda280  bln=24  avl=05  flg=05
  value=30810931
 Bind#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda250  bln=24  avl=06  flg=05
  value=3399756014
 Bind#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda220  bln=24  avl=03  flg=05
  value=2429
 Bind#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda1f0  bln=24  avl=02  flg=05
  value=2
 Bind#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda1c0  bln=24  avl=02  flg=05
  value=4
 Bind#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda190  bln=24  avl=02  flg=05
  value=2
 Bind#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda3a0  bln=22  avl=02  flg=05
  value=12
WAIT #140446136869016: nam='db file sequential read' ela= 16 file#=1 block#=547 blocks=1 obj#=0 tim=3161302406306
2026-06-24T19:59:40.979075+08:00
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
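
To see which undo$ row this statement was touching, the bind values can be lined up with the columns. The sketch below assumes that Bind#N in the trace corresponds to the N-th placeholder in textual order of the statement (so the first bind is `:2` for `name` and the last is `:1` for `us#`); the values are copied from the trace above, and `map_binds` is a made-up helper name.

```python
import re

SQL = ("update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,"
       "user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,"
       "ts#=:12,spare1=:13 where us#=:1")

# value= lines of Bind#0 .. Bind#12, in trace order
BIND_VALUES = ["_SYSSMU12_3861134380$", 5, 144, 5, 1, 46221, 30810931,
               3399756014, 2429, 2, 4, 2, 12]

def map_binds(sql, values):
    """Map each 'column=:n' placeholder, in textual order, to its bind value."""
    columns = re.findall(r"([a-z_$#0-9]+)=:\d+", sql)
    if len(columns) != len(values):
        raise ValueError("placeholder count does not match bind count")
    return dict(zip(columns, values))

row = map_binds(SQL, BIND_VALUES)
for column, value in row.items():
    print(column, "=", value)
```

Read this way, the statement is updating the undo$ row with us#=12, whose name is _SYSSMU12_3861134380$.
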

The alert log also contains ORA-600 4137 and other errors

ORACLE Instance orcl1 (pid = 53) - Error 600 encountered while recovering transaction (0, 77).
2026-06-24T19:59:40.387459+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_83245.trc:
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []

This error confirms that a transaction in rollback segment 0, i.e. the SYSTEM rollback segment, is abnormal. The relevant trace is

[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
XID passed in = xid: 0x0000.04d.0000060a
XID from Undo block = xid: 0x0000.060.00000600
Dump of buffer cache at level 7 for pdb=0 tsn=0 rdba=4194432
BH (0x3ddfd26b8) file#: 1 rdba: 0x00400080 (1/128) class: 15 ba: 0x3ddb80000
  set: 166 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 3 obj: -1 objn: 0 tsn: [0/0] afn: 1 hint: f
  hash: [0x3ddece808,0xc6bdc2d8] lru: [0xbc1db108,0xbc1db108]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xa2267bc0,0xa2267bc0] objaq: [0xa2267bb0,0xa2267bb0]
  st: XCURRENT md: NULL fpin: 'ktuwh72: ktugus:ktuswr1' fscn: 0x980cfff669f tch: 1
  flags:
  LRBA: [0x0.0.0] LSCN: [0x0] HSCN: [0x0] HSUB: [65535]
  Printing buffer operation history (latest change first):
  cnt: 10
  01. sid:00 L353:gcur:set:MEXCL      02. sid:00 L145:zib:mk:EXCL       
  03. sid:00 L212:zib:bic:FSQ         04. sid:00 L122:zgb:set:st        
  05. sid:00 L830:olq1:clr:WRT+CKT    06. sid:00 L951:zgb:lnk:objq      
  07. sid:00 L372:zgb:set:MEXCL       08. sid:00 L123:zgb:no:FEN        
  09. sid:00 L083:zgb:ent:fn          10. sid:01 L203:w_ini_dc:bic:FVB  
  buffer tsn: 0 rdba: 0x00400080 (1/128)
  scn: 0x980cffc5958 seq: 0x01 flg: 0x04 tail: 0x59580e01
  frmt: 0x02 chkval: 0x2688 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
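
The two XIDs in the dump can be decoded to show the mismatch behind the ORA-600 [4137]. A minimal sketch, assuming the usual usn.slot.wrap layout for an XID, with the trace printing it in hexadecimal and the error argument printing it in decimal; the function names are made up for illustration.

```python
def parse_xid_hex(text):
    """Parse a trace-style XID such as '0x0000.04d.0000060a'
    into (usn, slot, wrap)."""
    body = text[2:] if text.lower().startswith("0x") else text
    usn, slot, wrap = (int(part, 16) for part in body.split("."))
    return usn, slot, wrap

def parse_xid_dec(text):
    """Parse the decimal form used in the error argument, e.g. '0.77.1546'."""
    usn, slot, wrap = (int(part) for part in text.split("."))
    return usn, slot, wrap

passed_in = parse_xid_hex("0x0000.04d.0000060a")      # XID passed in
in_undo_block = parse_xid_hex("0x0000.060.00000600")  # XID from Undo block
from_error = parse_xid_dec("0.77.1546")               # ORA-600 [4137] argument

print("passed in      :", passed_in)
print("in undo block  :", in_undo_block)
print("matches error  :", passed_in == from_error)
print("matches block  :", passed_in == in_undo_block)
```

The transaction being recovered is (0, 77, 1546), in undo segment 0, while the undo block it was led to carries a different XID.
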

Given this situation, the judgement was to clean up the relevant records in undo$ so that new rollback blocks get allocated

Block Header:
block type=0x0e (KTU UNDO HEADER W/UNLIMITED EXTENTS)
block format=0xa2 (oracle 10+)
block rdba=0x00400080 (file#=1, block#=128)
scn=0x0980.cff7c56d, seq=1, tail=0xc56d0e01
block checksum value=0x2683=9859, flag=4
  Extent Control Header
  -------------------------------------------------------------
  Extent Header:: extents: 10  blocks: 79
                  last map: 0x00000000  #maps: 0  offset: 4128
      Highwater:: 0x00400225  (rfile#=1,block#=549)
                  ext#: 6  blk#: 5   ext size:8
      #blocks in seg. hdr's freelists: 0
      #blocks below: 0
      mapblk: 0x00000000   offset: 6
      Map Header:: next: 0x00000000   #extents: 10  obj#: 0  flag: 0x40000000
  Extent Control Header
  -------------------------------------------------------------
   0x00400081  length: 7
   0x004206a8  length: 8
   0x004206b0  length: 8
   0x00400088  length: 8
   0x00400210  length: 8
   0x00400218  length: 8
   0x00400220  length: 8
   0x00400228  length: 8
   0x004206a0  length: 8
   0x00400230  length: 8
  TRN CTL:: seq: 0x0462 chd: 0x005e ctl: 0x000d inc: 0x00000000 nbf: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646(0x7ffffffe)
            uba: 0x00000225.0462.1d scn: 0x0980.cf1f2121
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.0462.1c ext: 0x6  spc: 0x11a2
    uba: 0x00000000.0462.26 ext: 0x6  spc: 0xc86
    uba: 0x00000000.0462.03 ext: 0x6  spc: 0x1e5c
    uba: 0x00000000.0460.03 ext: 0x4  spc: 0x1e5c
    uba: 0x00000000.043c.21 ext: 0x8  spc: 0xd2c
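
The rdba and tail values in this header can be cross-checked by hand. The sketch below assumes a smallfile tablespace (10-bit relative file number, 22-bit block number) and the common tail layout of low 16 bits of the SCN base, then block type, then sequence; the function names are illustrative.

```python
def decode_rdba(rdba):
    """Split a smallfile rdba into (relative file#, block#)."""
    return rdba >> 22, rdba & 0x3FFFFF

def expected_tail(scn_base, block_type, seq):
    """Tail check value: low 2 bytes of the SCN base, block type, sequence."""
    return ((scn_base & 0xFFFF) << 16) | (block_type << 8) | seq

# Values taken from the dumps above.
print(decode_rdba(0x00400080))  # undo segment header block
print(decode_rdba(0x00400225))  # Highwater
print(hex(expected_tail(0xCFF7C56D, 0x0E, 1)))
```

Both computed values agree with what the dump prints next to them: the header block is file 1 block 128, the high-water mark is file 1 block 549, and the tail matches 0xc56d0e01.
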

The database then opened successfully and expdp exported the data cleanly, completing this recovery

Using DeepSeek for Oracle recovery caused a major failure

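In one recovery case, the database showed as open when queried, one datafile was offline, and dropping the tablespace failed with an error that the offline file could not be read or written
[Screenshots 2, 3, 1]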


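From experience this looked like a minor problem: probably datafile 5 had gone offline, and because that file belongs to the undo tablespace, this was the result. The idea was to mask the abnormal rollback segments, or to force the file online. I tried the first option first: since the database was open, I queried it directly for abnormal rollback segments
[Screenshot: undo_seg]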

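No abnormal rollback segment could be found, which does not fit the usual picture, so I checked the file and tablespace information further
[Screenshots 4, 5, 6]
At this point the anomalies became apparent:
1. v$tablespace contains two tablespaces named undotbs1 (certainly wrong; the control file and ts$ are inconsistent)
2. ts$ contains only one of them, with ts#=9, and has no ts#=2
3. file$ has a row with ts#=2, so ts$ and file$ do not match, which is also wrong
Based on this I suspected that someone had manipulated the underlying dictionary and deleted rows from ts$. I asked the on-site engineer to re-verify every operation performed on this database, and it was eventually confirmed that, without his knowledge, another engineer had come in and performed just such an operation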
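
The inconsistency in points 2 and 3 is a set difference: tablespace numbers that file$ refers to but ts$ no longer has. A small sketch, assuming the ts# values have already been read out of both tables. Only the facts "ts$ has 9 but not 2" and "file$ has 2" come from this case; the other numbers in the sample lists are hypothetical.

```python
def orphan_ts_numbers(file_ts, ts_dollar):
    """ts# values referenced by file$ rows that have no row in ts$."""
    return sorted(set(file_ts) - set(ts_dollar))

# Illustrative values; only ts#=2 and ts#=9 are taken from the article.
ts_dollar = [0, 1, 3, 4, 9]   # ts# present in ts$
file_ts = [0, 1, 2, 4, 9]     # ts# referenced from file$

print(orphan_ts_numbers(file_ts, ts_dollar))
```

Any value this returns points at a datafile whose tablespace row is gone from the dictionary, which is the state this database was left in.
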
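[Screenshot: delete_ts]

From the chat records they provided and the current state of the database, it was further confirmed that they had most likely executed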

delete from ts$ where name='UNDOTB1';
delete from seg$ where ts#=2;

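No delete was run against file$. The dictionary rows were deleted by hand and clearly not cleaned up completely, so every operation on the database ends up checking the abnormal transactions.
[Screenshot: seg]


After these abnormal transactions were cleaned up, the database could be operated normally and the data was exported successfully
[Screenshot: expdp]

In a later conversation, the person who had run the dictionary deletes said he had followed advice provided by DeepSeek
[Screenshot: deepseek]

A friendly reminder: AI is quite capable these days and many questions can be answered by simply asking it, but you need the judgement to assess those answers and cannot just execute whatever it says. Unconventional database recovery in particular is irreversible, high-risk work that can cause major incidents, so it calls for caution and a prepared rollback plan.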

Taking over a database recovery that was one step from the finish

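I just handled a database recovery case in which the fix itself was very simple (set one parameter and start the database). To retrace the incident briefly: reportedly the customer copied the virtual machine without shutting it down, and after the copied virtual machine was started the database failed to start with ORA-00314 and ORA-00312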

Wed Jun 17 10:24:02 2026
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 3689345602
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Wed Jun 17 10:24:06 2026
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_60458.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
Aborting crash recovery due to error 314
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_60458.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_60458.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
ORA-314 signalled during: ALTER DATABASE OPEN...
Wed Jun 17 10:24:07 2026
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 1 of thread 1, expected sequence# 12269 doesn't match 9441
ORA-00312: online log 1 thread 1: '/oradata/orcl/redo01.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00312: online log 2 thread 1: '/oradata/orcl/redo02.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 3 of thread 1, expected sequence# 12265 doesn't match 9444
ORA-00312: online log 3 thread 1: '/oradata/orcl/redo03.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 4 of thread 1, expected sequence# 12266 doesn't match 9438
ORA-00312: online log 4 thread 1: '/oradata/orcl/redo04.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 5 of thread 1, expected sequence# 12267 doesn't match 9439
ORA-00312: online log 5 thread 1: '/oradata/orcl/redo05.log'
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_61170.trc:
ORA-00314: log 6 of thread 1, expected sequence# 12268 doesn't match 9440
ORA-00312: online log 6 thread 1: '/oradata/orcl/redo06.log'

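The gap between the redo sequence numbers in these errors is rather large. Personally I suspect this was not caused by a simple copy, but without access to the original scene it is hard to trace, so I will not speculate and will take it as a problem caused by the VM copy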
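
How large the gap is can be read off the alert log mechanically. The sketch below parses ORA-00314 lines of the English form shown above and reports, per redo log, the expected sequence# minus the sequence# found in the file; the sample text is an excerpt of the log lines in this post and `sequence_gaps` is a made-up helper name.

```python
import re

ORA_314 = re.compile(
    r"ORA-00314: log (\d+) of thread (\d+), "
    r"expected sequence# (\d+) doesn't match (\d+)"
)

def sequence_gaps(alert_text):
    """Return {(thread, log): expected_sequence - found_sequence}."""
    gaps = {}
    for log, thread, expected, found in ORA_314.findall(alert_text):
        gaps[(int(thread), int(log))] = int(expected) - int(found)
    return gaps

SAMPLE = """\
ORA-00314: log 1 of thread 1, expected sequence# 12269 doesn't match 9441
ORA-00314: log 2 of thread 1, expected sequence# 12271 doesn't match 9443
ORA-00314: log 3 of thread 1, expected sequence# 12265 doesn't match 9444
"""

for key, gap in sorted(sequence_gaps(SAMPLE).items()):
    print(key, gap)
```

For these three logs the redo files are roughly 2,800 sequences behind what the control file expects.
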
The customer force-opened the database with hidden parameters and hit ORA-1555

Wed Jun 17 10:42:50 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1838398816
Resetting resetlogs activation ID 3463997979 (0xce786a1b)
ORA-344 signalled during: ALTER DATABASE OPEN RESETLOGS...
Wed Jun 17 10:44:30 2026
ALTER DATABASE OPEN RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1838398816
Resetting resetlogs activation ID 3463997979 (0xce786a1b)
Wed Jun 17 10:45:13 2026
Setting recovery target incarnation to 3
Wed Jun 17 10:45:13 2026
Assigning activation ID 3689297678 (0xdbe6370e)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /oradata/orcl/redo01.log
Successful open of redo thread 1
Wed Jun 17 10:45:14 2026
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jun 17 10:45:14 2026
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.6d93bd66):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_18038.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 35 with name "_SYSSMU35_3782695576$" too small
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_18038.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 35 with name "_SYSSMU35_3782695576$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 18038): terminating the instance due to error 704
Instance terminated by USER, pid = 18038
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...
opiodr aborting process unknown ospid (18038) as a result of ORA-1092
Wed Jun 17 10:45:16 2026
ORA-1092 : opitsk aborting process

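This is a fairly classic error. Earlier articles summarize the specific SQL statements it can involve
"Summary of ORA-01555 errors commonly hit during database open"
"Supplement: SQL statements behind ORA-1555 during database open"
After the customer tried restarting several times, the database reported ORA-600 2662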

Wed Jun 17 10:55:34 2026
Thread 1 advanced to log sequence 3 (thread open)
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /oradata/orcl/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jun 17 10:55:34 2026
SMON: enabling cache recovery
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6707.trc  (incident=117868):
ORA-00600: internal error code, arguments: [2662], [0], [1838441753], [0], [1838486139], [12583040], 
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl/incident/incdir_117868/orcl_ora_6707_i117868.trc
Wed Jun 17 10:55:45 2026
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6707.trc:
ORA-00600: internal error code, arguments: [2662], [0], [1838441753], [0], [1838486139], [12583040], 
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6707.trc:
ORA-00600: internal error code, arguments: [2662], [0], [1838441753], [0], [1838486139], [12583040], 
Error 600 happened during db open, shutting down database
USER (ospid: 6707): terminating the instance due to error 600
Instance terminated by USER, pid = 6707
ORA-1092 signalled during: alter database open...
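
The ORA-600 [2662] arguments can be turned into an SCN gap. This sketch assumes the commonly described argument layout (current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then the rdba of the block carrying the dependent SCN) and a smallfile rdba; the function names are illustrative.

```python
def scn_gap_from_2662(args):
    """How far the dependent (block) SCN is ahead of the current SCN."""
    cur_wrap, cur_base, dep_wrap, dep_base = args[:4]
    current = (cur_wrap << 32) | cur_base
    dependent = (dep_wrap << 32) | dep_base
    return dependent - current

def decode_rdba(rdba):
    """Split a smallfile rdba into (relative file#, block#)."""
    return rdba >> 22, rdba & 0x3FFFFF

ARGS = [0, 1838441753, 0, 1838486139, 12583040]  # from the alert log above

print("SCN gap:", scn_gap_from_2662(ARGS))
print("block  :", decode_rdba(ARGS[4]))
```

Under that reading the block is only a few tens of thousands of SCNs ahead of the database, and it sits at file 3 block 128.
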

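The customer then rebuilt the control file and force-opened the database many more times, looping on ORA-600 2662 throughout, until ORA-600 4193/4194 errors finally appeared. The database never opened normally, and at that point the customer gave up on recovery.
After we took over the failure, we set undo to manual management mode and the database then started successfully
[Screenshot: open1]
expdp was then used to export the data, completing this recovery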

Handling incorrect datafile sizes after a hardware failure: Oracle fragment-scan recovery

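A friend in the hardware recovery business came to me saying that dbv reported DBV-00102 after a hardware recovery, and asked whether I could deal with it
[Screenshot: dbv-00102]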


This is a common dbv error, usually caused by a bad block 0 or by an incorrect file size. I asked him to send over the recovered files for checking

SQL> select name,bytes/1024/1024/1024 from v$datafile_header;

NAME                                                                             BYTES/1024/1024/1024
-------------------------------------------------------------------------------- --------------------
H:\BAIDUNETDISK\ORADATA\XXXXORCL\SYSTEM01.DBF                                             2.080078125
H:\BAIDUNETDISK\ORADATA\XXXXORCL\SYSAUX01.DBF                                             2.880859375
H:\BAIDUNETDISK\ORADATA\XXXXORCL\UNDOTBS01.DBF                                           9.0087890625
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS01.DBF                                          31.993408203125
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS02.DBF                                                8.1640625
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS03.DBF                                              7.958984375
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS04.DBF                                              7.958984375
H:\BAIDUNETDISK\ORADATA\XXXXORCL\USERS05.DBF                                                 7.890625

8 rows selected.

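This establishes that the actual size of the USERS02-USERS05 datafiles (as recorded in the datafile headers) is around 8 GB, while the recovered files are only around 4 GB
[Screenshot: 4g]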


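Checking the file sizes directly in the recovery tool
[Screenshot: rs]

Here it is fairly clear that although rs shows the files as being in good condition, the actual sizes are also wrong (lesson learned: do not rely too heavily on this status in future recoveries). According to the feedback, the site was a three-disk RAID 5 that had been forced online once along the way. The customer had also copied the data once with WinPE, and the size was the same as now, again nearly 4 GB short. My first thought was that the RAID had been assembled incorrectly, but checks on the other files showed them all to be fine, which ruled out a disk-level error. I suspected a filesystem anomaly instead. In that situation recovery at the filesystem level is certainly impossible, so I considered scanning with my self-developed OraScan tool (see "OraScan (Oracle fragment scan tool) user guide")
[Screenshot: ora1]
[Screenshot: ora2]

OraScan found the relevant blocks, and the datafiles were extracted from them
[Screenshot: file]

Checking the file with dbv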

C:\Users\XFF>dbv file=H:\BaiduNetdisk\xff\YFKJORCL.USERS.4.7.4.N.DBF

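DBVERIFY: Release 11.2.0.4.0 - Production on Sun Jun 7 18:06:30 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = H:\BAIDUNETDISK\XFF\YFKJORCL.USERS.4.7.4.N.DBF


DBVERIFY - Verification complete

Total Pages Examined         : 1043200
Total Pages Processed (Data) : 67167
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 37995
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 861109
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 76929
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 347454063 (0.347454063)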
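
The page count reported by dbv gives a quick size sanity check for the extracted file. This sketch assumes an 8 KB block size (not stated for this database in the post) and that v$datafile_header.bytes counts the data blocks without the operating-system header block; the 4 GiB figure for the truncated copy is only a round illustration of "around 4 GB", and the function names are made up.

```python
BLOCK_SIZE = 8192  # assumption: 8 KB database block size

def size_gib(blocks, block_size=BLOCK_SIZE):
    """Size in GiB of a datafile holding `blocks` database blocks."""
    return blocks * block_size / 2**30

def missing_gib(header_gib, on_disk_bytes):
    """How much of the file is missing relative to the header size."""
    return header_gib - on_disk_bytes / 2**30

pages_examined = 1043200  # 'Total Pages Examined' from dbv
print(size_gib(pages_examined))

# Illustrative only: a truncated copy of about 4 GiB
print(missing_gib(7.958984375, 4 * 2**30))
```

With an 8 KB block the examined pages add up to 7.958984375 GiB, the same figure the earlier query shows for USERS03.DBF and USERS04.DBF, so the extracted file has the size the header expects.
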

The extracted files were copied over the previously recovered USERS02-USERS05.DBF, and then I tried to open the database

SQL> recover database ;
Media recovery complete.
SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 3308
Session ID: 14 Serial number: 3

Checking the alert log for the cause

Sun Jun 07 14:43:51 2026
Recovery of Online Redo Log: Thread 1 Group 2 Seq 36464 Reading mem 0
  Mem# 0: H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO02.LOG
Completed: ALTER DATABASE RECOVER  database   
alter database open 
Beginning crash recovery of 1 threads
 parallel recovery started with 19 processes
Started redo scan
Completed redo scan
 read 2353 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 36464, block 15876
Recovery of Online Redo Log: Thread 1 Group 2 Seq 36464 Reading mem 0
  Mem# 0: H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO02.LOG
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 36464, block 20582, scn 347475303
 0 data blocks read, 0 data blocks written, 2353 redo k-bytes read
Sun Jun 07 14:43:57 2026
Errors in file c:\app\xff\diag\rdbms\yfkjorcl\o11201\trace\o11201_lgwr_2204.trc:
ORA-00314: log 3 of thread 1, expected sequence# 36462 doesn't match 32025
ORA-00312: online log 3 thread 1: 'H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO03.LOG'
Errors in file c:\app\xff\diag\rdbms\yfkjorcl\o11201\trace\o11201_lgwr_2204.trc:
ORA-00314: log 3 of thread 1, expected sequence# 36462 doesn't match 32025
ORA-00312: online log 3 thread 1: 'H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO03.LOG'
Errors in file c:\app\xff\diag\rdbms\yfkjorcl\o11201\trace\o11201_ora_3308.trc:
ORA-00314: log 1 of thread , expected sequence#  doesn't match 
ORA-00312: online log 3 thread 1: 'H:\BAIDUNETDISK\ORADATA\YFKJORCL\REDO03.LOG'
USER (ospid: 3308): terminating the instance due to error 314
Sun Jun 07 14:44:02 2026
Instance terminated by USER, pid = 3308

The database cannot open because a redo group is abnormal, but since recover database has already succeeded, that redo group can most likely be cleared

SQL> select group#,status from v$log;

    GROUP# STATUS
---------- ----------------
         1 INACTIVE
         3 INACTIVE
         2 CURRENT

SQL> alter database clear logfile group 3;

Database altered.

SQL> alter database open;

Database altered.

The database opened successfully; expdp was then used to export the data, completing this recovery.