记录block 0损坏,数据文件大量坏块,使用不当数据库版本恢复等各种操作之后的故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录block 0损坏,数据文件大量坏块,使用不当数据库版本恢复等各种操作之后的故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今天一个恢复行业的朋友让我帮忙看一个oracle故障,说是vmdk文件被加密(加密破坏的很少),然后他从里面恢复出来了oracle的数据库文件,客户那边拿到数据文件然后说有三个文件头(被重命名为.bak)损坏了,无法打开数据库,让我这边给他们分析处理.分析他们说的被破坏文件,确实有损坏(初步看是block 0)
block0


使用oracle自带的dbv进行检测

C:\Users\XFF>dbv file=H:\BaiduNetdisk\D\SYSTEM06.DBF.bak

DBVERIFY: Release 11.2.0.4.0 - Production on 星期五 7月 3 21:06:28 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBV-00107: 未知标头格式 (85) (1311029317)

由于block 0损坏,oracle自带的dbv无法检测,使用obet的dbv功能进行检测(obet实现对数据文件坏块检测功能)

===============================================
DBV (Data Block Verification) Report
Started: 2026-07-03 21:04:22
Block Size: 8192 bytes
Target File: ALL files in listfile
===============================================

File #1: H:\BaiduNetdisk\D\SYSTEM06.DBF.bak (12801 blocks) - Started: 2026-07-03 21:04:22
File #1: rfile=19 (0x00000013)  header_block_num=12800 (0x00003200)  filesize_status:OK
file 1, block 0: rdba error (expected 0, got 4083969), bad block
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 0 checksum error, 1 rdba error

File #1: H:\BaiduNetdisk\D\TSP_XXXXS06.DBF.bak (4194303 blocks) - Started: 2026-07-03 21:04:22
File #1: rfile=24 (0x00000018)  header_block_num=4194302 (0x003FFFFE)  filesize_status:OK
file 1, block 0: rdba error (expected 0, got 1367117), bad block
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 0 checksum error, 1 rdba error

File #1: H:\BaiduNetdisk\D\TSP_XXXXS07.DBF.bak (1933313 blocks) - Started: 2026-07-03 21:04:42
File #1: rfile=25 (0x00000019)  header_block_num=1933312 (0x001D8000)  filesize_status:OK
file 1, block 0: rdba error (expected 0, got 2445572), bad block
  File #1 completed: 0 all zero, 0 soft corrupted, 0 tailchk error, 0 checksum error, 1 rdba error


DBV completed at: 2026-07-03 21:04:51
===============================================
DBV Summary:
Total blocks checked: 6140414
Total all zero blocks found: 0
Total all rdba error blocks found: 3
Total all tailchk error blocks found: 0
Total all soft corrupted blocks found: 0
Total all checksum error blocks found: 0
Total bad blocks found: 3
Execution time: 29.00 seconds
===============================================

通过检测确认这三个数据文件就是block 0损坏其他的数据块是ok的.检查其他数据文件发现有一个数据文件有近2w个坏块

DBVERIFY: Release 11.2.0.4.0 - Production on 星期五 7月 3 12:55:08 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBVERIFY - 开始验证: FILE = H:\BAIDUNETDISK\ORCL\TSP_XXXX.DBF
页 409601 标记为损坏
Corrupt block relative dba: 0x01864001 (file 6, block 409601)
Bad header found during dbv: 
Data in bad block:
 type: 219 format: 1 rdba: 0xdf6e738f
 last change scn: 0xd7ed.50be2d8b seq: 0x44 flg: 0x52
 spare1: 0x39 spare2: 0x92 spare3: 0xb5c5
 consistency value in tail: 0x6e7debc9
 check value in block header: 0x6d54
 block checksum disabled

………………
DBVERIFY - 验证完成

检查的页总数: 4194302
处理的页总数 (数据): 1842107
失败的页总数 (数据): 0
处理的页总数 (索引): 2143604
失败的页总数 (索引): 0
处理的页总数 (其他): 13763
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 437
标记为损坏的总页数: 194391
流入的页总数: 2
加密的总页数        : 0
最高块 SCN            : 1869110545 (9.1869110545)

对于这个文件,由于客户有部分rman备份,尝试通过备份找出来这个文件历史文件

RMAN> run
2> {
3> set newname for datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\TSP_XXXX.DBF' 
4>   to 'H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman';
5> restore datafile 6;
6> }

正在执行命令: SET NEWNAME

启动 restore 于 03-7月 -26
使用通道 ORA_DISK_1

通道 ORA_DISK_1: 正在开始还原数据文件备份集
通道 ORA_DISK_1: 正在指定从备份集还原的数据文件
通道 ORA_DISK_1: 将数据文件 00006 还原到 H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman
通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250704_TC3TM4LJ_1_1.BAK.RESTORED0
通道 ORA_DISK_1: ORA-19870: 还原备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250704_TC3TM4LJ_1_1.BAK.RESTORED0 时出错
ORA-19599: 块编号 1065697 已在 backup piece H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250704_TC3TM4LJ_1_1.BAK.RESTORED0 中损 坏

故障转移到上一个备份

通道 ORA_DISK_1: 正在开始还原数据文件备份集
通道 ORA_DISK_1: 正在指定从备份集还原的数据文件
通道 ORA_DISK_1: 将数据文件 00006 还原到 H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman
通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250703_T33TJG9L_1_1.BAK
通道 ORA_DISK_1: ORA-19870: 还原备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250703_T33TJG9L_1_1.BAK 时出错
ORA-19599: 块编号 95352 已在 backup piece H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250703_T33TJG9L_1_1.BAK 中损坏

故障转移到上一个备份

通道 ORA_DISK_1: 正在开始还原数据文件备份集
通道 ORA_DISK_1: 正在指定从备份集还原的数据文件
通道 ORA_DISK_1: 将数据文件 00006 还原到 H:\BaiduNetdisk\orcl\TSP_XXXX.DBF_rman
通道 ORA_DISK_1: 正在读取备份片段 H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250702_SQ3TGRTU_1_1.BAK
通道 ORA_DISK_1: 段句柄 = H:\BAIDUNETDISK\BACKDB\FULLDB_ORCL_20250702_SQ3TGRTU_1_1.BAK 标记 = TAG20250702T000518
通道 ORA_DISK_1: 已还原备份片段 1
通道 ORA_DISK_1: 还原完成, 用时: 00:28:56
完成 restore 于 03-7月 -26

RMAN> exit

运气还不错,从备份里面找一份好的该文件的备份,然后通过使用该文件好的block替换最初损坏的block,实现该文件无坏块(由于这个文件比较靠前,而且已经写满,所以只差3天左右数据可能改变很小,因此可以采用这种替代方法最大限度恢复数据).数据文件检测和明显坏块处理完成,接下来开始打开数据库操作.
由于我这边的数据文件路径和原库的不一致,修改ctl中的datafile 路径报ORA-17503错误
rename


对于这些情况,通过修复block 0,然后尝试重命名数据文件成功

SQL> alter database rename file 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\TSP_XXX06.DBF' to 'H:\BaiduNetdisk\orcl\TSP_XXX06.DBF'
  2  ;

数据库已更改。

SQL> alter database rename file 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\TSP_XXX07.DBF' to 'H:\BaiduNetdisk\orcl\TSP_XXX07.DBF'
  2  ;

数据库已更改。

SQL> alter database rename file 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM06.DBF' to 'H:\BaiduNetdisk\orcl\SYSTEM06.DBF'
  2  ;

数据库已更改。

然后尝试打开数据库成功

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

尝试expdp导出数据,结果报UDE-22303

C:\Users\XFF>expdp "'/ as sysdba'" full=y dumpfile=expdp_0703_%U.dmp DIRECTORY=expdp_dir logfile=expdp_0703.log 
 parallel=4 compression=all EXCLUDE=STATISTICS,AUDIT

Export: Release 11.2.0.4.0 - Production on 星期五 7月 3 13:10:12 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

UDE-22303: 操作产生了 ORACLE 错误 22303
OCI-22303: 未找到类型 "SYS"."KU$_STATUS1020"

正常打开的数据库,没有明显坏块,出现这个错误,理论上不太应该,怀疑是客户的版本问题,查询组件版本
v1


确认该数据库版本可能是11.2.0.1而不是我现在看到的数据文件头为11.2.0.4,查询WRM$_DATABASE_INSTANCE,确认数据库之前版本是11.2.0.1.对于这种情况,直接简单的把compatible从11.2.0.4修改为11.2.0.0肯定不行,因为ctl,dbf,redo里面都写了11.2.0.4的信息
1. ctl报版本不匹配

SQL> startup mount pfile='i:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 3206836224 bytes
Fixed Size                  2180024 bytes
Variable Size             654314568 bytes
Database Buffers         2533359616 bytes
Redo Buffers               16982016 bytes
ORA-00201: ?????? 11.2.0.4.0 ? ORACLE ?? 11.2.0.0.0 ???
ORA-00202: ????: ''H:\BAIDUNETDISK\ORCL\CONTROL01.CTL''

2. 这种情况需要rectl,然后报dbf版本不对

CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG
*
第 1 行出现错误:
ORA-01503: CREATE CONTROLFILE ??
ORA-01130: ??????? 11.2.0.4.0 ? ORACLE ?? 11.2.0.0.0 ???
ORA-01110: ???? 1: 'H:\BAIDUNETDISK\ORCL\SYSTEM01.DBF'

这种情况,dbf中的版本信息无法通过重建解决,只能使用obet工具修改,每个文件类似修改(Oracle数据块编辑工具( Oracle Block Editor Tool)-obet)

OBET> set file 25
filename set to: H:\BAIDUNETDISK\ORCL\TSP_XXXX07.DBF (file#25)

OBET> set offset 24
offset set to: 24

OBET> m 0000

Confirm modification:
File: H:\BAIDUNETDISK\ORCL\TSP_XXXX07.DBF
Block: 1
Offset: 24 (file offset: 0x00002018)
Original value: 0004
New value:      0000
Confirm? (Y/YES to proceed): y
Verification successful: Data written correctly.
Modified 2 bytes at offset 0x00002018 successfully.

OBET> sum apply

Confirm applying checksum:
File: H:\BAIDUNETDISK\ORCL\TSP_XXXX07.DBF
Block: 1
Offset in block: 16 (file offset: 0x00002010)
Original value: 0x94C3
New value:      0x94C7
Confirm? (Y/YES to proceed): y
Verification successful: Stored checksum matches calculated value (0x94C7).
Checksum applied successfully.

OBET>

然后重建ctl报redo版本不兼容ORA-00331

CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG
*
第 1 行出现错误:
ORA-01503: CREATE CONTROLFILE 失败
ORA-00331: 日志版本 0.0.0.0.0 与 ORACLE 版本 11.2.0.0.0 不兼容
ORA-01517: 日志成员: 'H:\BAIDUNETDISK\ORCL\REDO01.LOG'

采用resetlogs方式重建(不读取redo信息),重建ctl成功,然后直接打开数据库

 37  CHARACTER SET AL32UTF8
 38  ;

控制文件已创建。

SQL> alter database open resetlogs;

数据库已更改。

然后expdp导出数据完成本次恢复任务

需要注意:dbv 检测controlfile可能不准

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:需要注意:dbv 检测controlfile可能不准

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

oracle dbv工具是Oracle数据库离线检测坏块的工具主要是用来检测数据文件坏块(物理和逻辑坏块),虽然在一定程度上面可以检测controlfile的坏块,但是不是特别准,在一次恢复案例中数据库启动报controlfile损坏,但是dbv检测是正常的
数据库在mount的过程中报controlfile 损坏ORA-00227
ctl


把控制文件从asm里面拷贝到文件系统,然后通过dbv进行检测,一切正常

[oracle@oracle1 ~]$ dbv blocksize=16384 file=/tmp/control01.ctl 

DBVERIFY: Release 11.2.0.4.0 - Production on Wed Jul 1 14:21:58 2026

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /tmp/control01.ctl


DBVERIFY - Verification complete

Total Pages Examined         : 1312
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 395
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 917
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 4294967295 (65535.4294967295)

这个故障通过重建ctl,打开数据库成功

达梦数据库redo异常强制拉库

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:达梦数据库redo异常强制拉库

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过模拟事务不提交,并删除掉redo来模仿达梦数据库故障的恢复过程
模拟达梦事务并不提交直接abort掉库

[dmdba@xifenfei ~]$ disql SYSDBA/Oracle123@localhost:5236

Server[localhost:5236]:mode is normal, state is open
login used time : 5.575(ms)
disql V8
SQL> 
SQL> create table t1 as select * from dba_objects;
executed successfully
used time: 85.929(ms). Execute id is 601.
SQL> select count(1) from t1;

LINEID     COUNT(1)            
---------- --------------------
1          1067

used time: 2.106(ms). Execute id is 602.
SQL> delete from t1;
affect rows 1067

used time: 3.726(ms). Execute id is 603.
SQL> shutdown abort;
executed successfully
used time: 1.324(ms). Execute id is 0.

删除掉redo文件

[dmdba@xifenfei ~]$ cd /dmdb/data/DAMENG/
[dmdba@xifenfei DAMENG]$ rm -rf DAMENG0
DAMENG01.log  DAMENG02.log  
[dmdba@xifenfei DAMENG]$ rm -rf DAMENG0*.log

尝试启动达梦数据库

[dmdba@xifenfei bin]$ ./dmserver  /dmdb/data/DAMENG/dm.ini 
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 9456
nsql_vm_t = 336
prjt2_vm_t = 176
ltid_vm_t = 272
nins2_vm_t = 1144
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 760
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 312
pscn_vm_t = 416
dist_vm_t = 1000
DM Database Server 64 V8 03134284458-20251113-301923-20178 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
/dmdb/data/DAMENG/DAMENG01.log not exist, can not startup

重命名现在的库文件,然后参考当时的创建库的init.log 进行重新初始化一个新库

[dmdba@xifenfei data]$ cat DAMENG_BAK/dminit_DAMENG_20260315093214.log 
start init database: V8, 2026-03-15 09:32:14
init params:
        db path: /dmdb/data/DAMENG
        db name: DAMENG
        auto overwrite: 0
        page size: 8192
        extent size: 16
        char_fix_storage: 0
        sql_log_forbid: 0
        secur_flag: 2
        enable mac: 0
        page checksum policy: 1
        time zone: +08:00
        string case sensitive: 1
        charset: 0
        page check mode: 3
        page check algorithm id: 0
        priv flag: 0
        env label: 0
        rlog enc flag: 0
        use new hash: 1
        blank pad mode: 0
        sec priv mode: 0
        huge with delta: 1
        rlog gen for huge: 1
        pseg_mgr_flag: 0
        auto_adj_para: 0
[dmdba@xifenfei data]$ dminit PATH=/dmdb/data/DAMENG  PAGE_SIZE=8 EXTENT_SIZE=16 
> LOG_SIZE=256 PORT_NUM=5236 CASE_SENSITIVE=Y CHARSET=0 DB_NAME=DAMENG INSTANCE_NAME=DMSERVER
> SYSDBA_PWD=Oracle123 SYSAUDITOR_PWD=Oracle123 RLOG_POSTFIX_NAME=log
initdb V8
db version: 0x7000d
file dm.key not found, use default license!
License will expire on 2026-11-13
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL

 log file path: /dmdb/data/DAMENG/DAMENG/DAMENG01.log


 log file path: /dmdb/data/DAMENG/DAMENG/DAMENG02.log

write to dir [/dmdb/data/DAMENG/DAMENG].
create dm database success. 2026-06-27 23:40:17

正常启动这个新库并干净关闭

[dmdba@xifenfei bin]$ disql SYSDBA/Oracle123@localhost:5236

Server[localhost:5236]:mode is normal, state is open
login used time : 5.523(ms)
disql V8
SQL> shutdown normal
2   ;
executed successfully
used time: 1.417(ms). Execute id is 0.
SQL> 

直接拷贝redo文件替换尝试启动

[dmdba@xifenfei bin]$ ./dmserver  /dmdb/data/DAMENG/dm.ini 
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 9456
nsql_vm_t = 336
prjt2_vm_t = 176
ltid_vm_t = 272
nins2_vm_t = 1144
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 760
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 312
pscn_vm_t = 416
dist_vm_t = 1000
DM Database Server 64 V8 03134284458-20251113-301923-20178 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2026-11-13
rfil grp log file error in (db_magic, permanent_magic), 
        log file /dmdb/data/DAMENG/DAMENG01.log is (475644558, 854749702), 
        dbfile is(92637567, 1763199417).
Floating point exception
[dmdba@xifenfei bin]$ 

提示db_magic和permanent_magic不匹配,使用dmmdf修改新库redo文件

[dmdba@xifenfei DAMENG]$ dmmdf type=1 FILE=SYSTEM.DBF
dmmdf V8
**********************************************************
1 db_magic=92637567
2 next_trxid=38064
3 pemnt_magic=1763199417
4 enable_page_check=3
**********************************************************
Please input which parameter you want to change(1-4), q to quit: Q

然后在使用dmmdf type=2 修改redo文件的db_magic和pemnt_magic修改之后,设置PSEG_RECV = 0,RLOG_CHECK_SPACE = 2尝试启动库

[dmdba@xifenfei bin]$ ./dmserver  /dmdb/data/DAMENG/dm.ini
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 9456
nsql_vm_t = 336
prjt2_vm_t = 176
ltid_vm_t = 272
nins2_vm_t = 1144
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 760
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 312
pscn_vm_t = 416
dist_vm_t = 1000
DM Database Server 64 V8 03134284458-20251113-301923-20178 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2026-11-13
file lsn: 49136
ndct db load finished, code:100
ndct fill fast pool finished
pseg_set_gtv_trxid_low next_trxid in mem:[40065]
pseg recv finished
nsvr_startup end.
uthr_pipe_create, create pipe[read:10, write:11]
uthr_pipe_create, create pipe[read:12, write:13]
uthr_pipe_create, create pipe[read:14, write:15]
uthr_pipe_create, create pipe[read:16, write:17]
uthr_pipe_create, create pipe[read:18, write:19]
uthr_pipe_create, create pipe[read:20, write:21]
uthr_pipe_create, create pipe[read:22, write:23]
uthr_pipe_create, create pipe[read:24, write:25]
uthr_pipe_create, create pipe[read:26, write:27]
uthr_pipe_create, create pipe[read:28, write:29]
uthr_pipe_create, create pipe[read:30, write:31]
uthr_pipe_create, create pipe[read:32, write:33]
uthr_pipe_create, create pipe[read:34, write:35]
uthr_pipe_create, create pipe[read:36, write:37]
uthr_pipe_create, create pipe[read:38, write:39]
uthr_pipe_create, create pipe[read:40, write:41]
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info finished, code:100.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.

查询拉起来库中的数据

[dmdba@xifenfei DAMENG]$ disql SYSDBA/Oracle123@localhost:5236

Server[localhost:5236]:mode is normal, state is open
login used time : 6.192(ms)
disql V8
SQL> select count(1) from t1;

LINEID     COUNT(1)            
---------- --------------------
1          1067

used time: 8.976(ms). Execute id is 601.
SQL> 

dd破坏包含50多个pdb的asm 磁盘组恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:dd破坏包含50多个pdb的asm 磁盘组恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

前段时间刚刚恢复了一个客户dd了34个磁盘中的2个磁盘的1-2G多的数据(在生产环境错误执行dd命令破坏asm磁盘故障恢复),这次又遇到一个客户dd了asm 的两个磁盘的100m和10m,这次故障麻烦的是由于dd了disk 0和disk 1,ausize为4M,有50多个pdb在该磁盘组中,相对恢复比较麻烦,通过不懈努力,最后终于最大限度恢复客户数据,避免了进一步的损失
故障误操作
近期又一个可以误执行了dd命令,破坏了生产库的两个asm disk磁盘
1


上述截图中可以确认两个信息
1.sdh盘被dd掉了10m,sdg盘被dd掉了100M
2.sdg对应的是asm-data1,sdh对应的是asm-data2
dd误操作之后,data磁盘组开始报错,然后直接dismount.
2

分析原磁盘组中asm磁盘情况
1.确定损坏磁盘在磁盘组中位置
通过asm的alert日志分析损坏的盘对应的磁盘组信息
3

基于这个信息,我们可以确认asm-data1 对应的是DATA磁盘组的disk 0,asm-data2对应的是DATA磁盘组的disk 1,也就是说现在DATA磁盘组的0号盘被dd了100M,1号盘被dd了10m
2.确认ausize大小
通过分析该磁盘组的其他磁盘确认ausize大小
kfdhdb.ausize: 4194304 ; 0x0bc: 0×00400000
可以确认该磁盘组的au大小为4M
3.磁盘组后续进行了一次加盘扩容
4

在25年6月份,对data磁盘组加了三块盘
4.sdh(asm-data2)磁盘还被加入到了arch磁盘组中
5

比较幸运由于在另外一个节点上sdh磁盘权限没有正确修改,导致该增加没有完全成功,也就是该磁盘没有被reblance(如果加入成功并正常reblance,那后果比现在严重很多)
5.通过结合asm日志以及kfed,磁盘物理大小等信息,并且通过kfed构造出来损坏的磁盘头信息,列出故障之前DATA磁盘组所有磁盘的情况(为了避免udev别名带来的影响,我直接使用物理磁盘名称来显示)
6

客户现场情况说明
1.客户有一个大概1年多之前的dataguard(后续没有继续同步),已经进行了failover激活
2.客户的备份系统中有一个大概1个月之前的备份,但是备份库缺少归档,我接手之时,已经被维护厂商强制拉起
3.在我恢复之前,有专业的工程师已经对其这个现场进行了分析,但是没有拿出好的恢复方案
恢复难点说明
1.该磁盘组的disk 0 被dd掉了100M,这个导致kfdhdb.f1b1locn记录被清空,也就无法获取到存储指向ASM file 1 文件目录表,这个值虽然被清空,但是根据经验或者对比其他磁盘组的disk 0 可以确认指定aun为10
2.f1b1locn指向的au中存储中asm元数据1-255以及256-1023的file的前面60个au的extent映射表,由于这个在aun为10(也就是aun*aus=40M)的位置,但是这个位置也已经被dd掉了,从原理上业务文件256-1023的前面60个au的extent映射表彻底丢失
3.通过工具扫描,发现存储别名信息的au也在disk 0的前面100M之内(也被dd掉了),导致通过别名直接定位文件的起点的恢复思路也不可行,而且别名丢失导致以别名的数据文件后续识别有一定的难度(无法获取asm里面文件的完整路径)
4.该库有50多套pdb组成,也就意味着通过数据库碎片扫描的方式无法恢复(因为每个pdb里面都是由默认的种子创建而成,也就意味着rfile 1,4,9是重复的)
5.由于中途加过一次磁盘,因引起这个asm里面的文件进行重新reblance,使得部分文件同一个block记录在不同磁盘的au上(数据文件块可能重复)
6.由于该磁盘组中asm disk 大小不等,导致文件au分布在各个磁盘上不均匀,无明显规律可循
恢复操作
1.通过kfed对损坏的asm-data1、asm-disk2的磁盘头进行构造,便于后续的恢复工具识别
2.通过工具对data中所有磁盘进行扫描,主要扫描文件extent映射表信息和ACD中关于asm文件的分配信息
2.1> 通过对这些信息进行综合分析,确认asm file >=1024的文件的extent映射表信息完整,数据可以直接恢复,效果类似
7

2.2> 对于256-1023号文件通过asm file的extent信息缺少0-59 au的数据,结合acd中获取的部分au分配信息,可以尽可能完整的恢复出来这些数据(由于disk 0中的acd信息丢失,所以不是100%完整)
8

3.对于绝对文件号非1,4,9的文件,而且asm file 小于1024的数据文件,结合rdba碎片重组的方式再一次进行恢复,避免上面2.2中恢复中前面240M(60个au),有部分au信息丢失导致文件不完整的情况
通过上述多种方法恢复,整体恢复数据文件类似(由于文件较多,存放多个目录和根据规则取了多种名字)
9

4.由于asm的文件目录,别名信息全部丢失,而且该库有50多个pdb,无法确认恢复出来的上千个文件和pdb对应的关系,对于这种情况,临时写了一个小程序,对这些文件进行读取,获取file#,rfile#,ts#,tsname,文件大小,dbid,dbname,scn等信息(obet(Oracle Block Editor Tool)第二版发布
10

通过把这些信息和历史的控制文件中的信息进行匹配,确认各个文件所属的pdb关系
5.通过4中获取的文件和pdb对应关系然后通过dbms_pdb.recover包实现把恢复的文件插入到新的cdb库中,在这个插入和open库过程中遇到各种错误,都一一解决
11
12
13
14
15

6.最终完成客户数据恢复要求(为了保证数据不被再次修改,客户要求所有恢复的pdb不能打开到读写模式)
16

然后由客户的运维厂商或者应用厂商把需要的数据迁移或者整合到新库中并恢复业务,完成本次恢复任务

Oracle数据库系统回滚段异常处理-ORA-600 4137/4193

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle数据库系统回滚段异常处理-ORA-600 4137/4193

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最初是由于数据库sysaux文件无法正常恢复,重建ctl抛弃sysaux文件,然后打开数据库,但是无法expdp导出数据

Export: Release 12.2.0.1.0 - Production on Wed Jun 24 17:18:04 2026
Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
ORA-31626: job does not exist
ORA-31637: cannot create job SYS_EXPORT_SCHEMA_02 for user SYS
ORA-06512: at "SYS.KUPV$FT", line 1140
ORA-06512: at "SYS.KUPV$FT", line 1741
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT_INT", line 823
ORA-39080: failed to create queues "KUPC$C_1_20260624171804" and "" for Data Pump job
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPC$QUE_INT", line 1541
ORA-00376: file 3 cannot be read at this time
ORA-06512: at "SYS.DBMS_AQADM", line 742
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8060
ORA-01110: data file 3: '/u01/app/oracle/product/12.2.0.1/dbhome_2/dbs/MISSING00003'
ORA-06512: at "SYS.DBMS_AQADM_SYSCALLS", line 912
ORA-06512: at "SYS.DBMS_AQADM_SYS", line 8036
ORA-06512: at "SYS.DBMS_AQADM", line 737
ORA-06512: at "SYS.KUPC$QUE_INT", line 1461
ORA-06512: at line 1
ORA-06512: at "SYS.KUPC$QUEUE_INT", line 158
ORA-06512: at "SYS.KUPV$FT_INT", line 758
ORA-06512: at "SYS.KUPV$FT", line 1645
ORA-06512: at "SYS.KUPV$FT", line 1101

然后通过各方人员一顿操作猛如虎,导致数据库启动报ORA-600 4137和ORA-600 4193错误,数据库无法open成功

2026-06-24T18:38:50.158906+08:00
alter database open
2026-06-24T18:38:50.182720+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
2026-06-24T18:38:50.219449+08:00
…………
2026-06-24T18:38:50.514016+08:00
ARC3: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc  (incident=304968):
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_304968/orcl1_ora_48840_i304968.trc
Use ADRCI or Support Workbench to package the incident.
ORACLE Instance orcl1 (pid = 53) - Error 600 encountered while recovering transaction (0, 77).
2026-06-24T18:38:51.313973+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc:
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []
2026-06-24T18:38:51.649361+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_48840.trc  (incident=304969):
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_304969/orcl1_ora_48840_i304969.trc
2026-06-24T18:38:53.412782+08:00
opiodr aborting process unknown ospid (48840) as a result of ORA-603

需要open故障库,并且正常导出数据,需要处理两个问题
1. 被抛弃的sysaux文件需要正常online起来,不然expdp无法正常导出用户或者全库数据
2. 需要解决open过程的ORA-600 4137/ORA-600 4193错误
对于sysaux文件进行检查,由于重建ctl没有包含异常的sysaux文件,因此无法直接从库中查询到当前各种文件头相关情况,通过obet直接解析文件头获取相关信息(Oracle数据块编辑工具( Oracle Block Editor Tool)-obet)
res


对于这种情况,可以使用obet的修改文件头checkpoint scn和resetlogs scn功能进行快速修复

OBET> set file 2
filename set to: /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf (file#2)

OBET> backup
Created backup directory: backup_blk
Successfully backed up block 1 from current file to /tmp/backup_blk/o1_mf_sysaux_go991cmw_.dbf_1.20260624191357

OBET> copy chkscn file 1 to file 2
Error: Edit mode not enabled. Use 'set mode edit' first.

OBET> set mode edit
mode set to: edit

OBET> copy chkscn file 1 to file 2

Confirm Modify chkscn:
Source: file#1 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_system_go990lcg_.dbf)
Target: file#2 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf)
Proceed? (Y/YES to confirm): yes
Successfully copied checkpoint SCN information from file#1 to file#2.

OBET> copy resetlogscn file 1 to file 2

Confirm Modify resetlogscn:
Source: file#1 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_system_go990lcg_.dbf)
Target: file#2 (/u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf)
Proceed? (Y/YES to confirm): yes
Successfully copied resetlog SCN information from file#1 to file#2.

OBET> sum
Check value for File /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf, Block 1:
current = 0xF21B, required = 0x6651

OBET> sum apply

Confirm applying checksum:
File: /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf
Block: 1
Offset in block: 16 (file offset: 0x00002010)
Original value: 0xF21B
New value:      0x6651
Confirm? (Y/YES to proceed): y
Verification successful: Stored checksum matches calculated value (0x6651).
Checksum applied successfully.

OBET> tailchk
Check tailchk for File /u02/app/oracle/oradata/PYJHYSYS/PYJHYSYS/datafile/o1_mf_sysaux_go991cmw_.dbf, Block 1:
current = 0x010B0000, required = 0x010B0000

OBET>

然后重建ctl,包含该sysaux,尝试打开数据库,报ORA-600 4193错误

SYS@ORCL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []
Process ID: 93550
Session ID: 1123 Serial number: 55884

进一步跟踪启动过程,确认报错在update undo$上


PARSING IN CURSOR 
#140446136869016
 len=160 dep=1 uid=0 oct=6 lid=0 tim=3161302405543 hv=1292341136 ad='9bbd4828' sqlid='8vyjutx6hg3wh'
update /*+ rule */ undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,
xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1
END OF STMT
PARSE 
#140446136869016
:c=11966,e=11918,p=18,cr=94,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=3161302405542
BINDS 
#140446136869016
:
 Bind
#0
  oacdty=01 mxl=32(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=9bbdac32  bln=32  avl=21  flg=09
  value="_SYSSMU12_3861134380$"
 Bind
#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda370  bln=24  avl=02  flg=05
  value=5
 Bind
#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda340  bln=24  avl=03  flg=05
  value=144
 Bind
#3
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda310  bln=24  avl=02  flg=05
  value=5
 Bind
#4
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda2e0  bln=24  avl=02  flg=05
  value=1
 Bind
#5
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda2b0  bln=24  avl=04  flg=05
  value=46221
 Bind
#6
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda280  bln=24  avl=05  flg=05
  value=30810931
 Bind
#7
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda250  bln=24  avl=06  flg=05
  value=3399756014
 Bind
#8
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda220  bln=24  avl=03  flg=05
  value=2429
 Bind
#9
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda1f0  bln=24  avl=02  flg=05
  value=2
 Bind
#10
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda1c0  bln=24  avl=02  flg=05
  value=4
 Bind
#11
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda190  bln=24  avl=02  flg=05
  value=2
 Bind
#12
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fbc23eda3a0  bln=22  avl=02  flg=05
  value=12
WAIT 
#140446136869016
: nam='db file sequential read' ela= 16 file#=1 block#=547 blocks=1 obj#=0 tim=3161302406306
2026-06-24T19:59:40.979075+08:00
ORA-00600: internal error code, arguments: [4193], [1112], [1122], [], [], [], [], [], [], [], [], []

alert日志中还有ORA-600 4137等错误

ORACLE Instance orcl1 (pid = 53) - Error 600 encountered while recovering transaction (0, 77).
2026-06-24T19:59:40.387459+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_83245.trc:
ORA-00600: internal error code, arguments: [4137], [0.77.1546], [0], [0], [], [], [], [], [], [], [], []

通过这个报错,可以确认是由于0号回滚段,也就是rollback中事务异常,获取相关的trace

[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
XID passed in = xid: 0x0000.04d.0000060a
XID from Undo block = xid: 0x0000.060.00000600
Dump of buffer cache at level 7 for pdb=0 tsn=0 rdba=4194432
BH (0x3ddfd26b8) file#: 1 rdba: 0x00400080 (1/128) class: 15 ba: 0x3ddb80000
  set: 166 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 3 obj: -1 objn: 0 tsn: [0/0] afn: 1 hint: f
  hash: [0x3ddece808,0xc6bdc2d8] lru: [0xbc1db108,0xbc1db108]
  ckptq: [NULL] fileq: [NULL]
  objq: [0xa2267bc0,0xa2267bc0] objaq: [0xa2267bb0,0xa2267bb0]
  st: XCURRENT md: NULL fpin: 'ktuwh72: ktugus:ktuswr1' fscn: 0x980cfff669f tch: 1
  flags:
  LRBA: [0x0.0.0] LSCN: [0x0] HSCN: [0x0] HSUB: [65535]
  Printing buffer operation history (latest change first):
  cnt: 10
  01. sid:00 L353:gcur:set:MEXCL      02. sid:00 L145:zib:mk:EXCL       
  03. sid:00 L212:zib:bic:FSQ         04. sid:00 L122:zgb:set:st        
  05. sid:00 L830:olq1:clr:WRT+CKT    06. sid:00 L951:zgb:lnk:objq      
  07. sid:00 L372:zgb:set:MEXCL       08. sid:00 L123:zgb:no:FEN        
  09. sid:00 L083:zgb:ent:fn          10. sid:01 L203:w_ini_dc:bic:FVB  
  buffer tsn: 0 rdba: 0x00400080 (1/128)
  scn: 0x980cffc5958 seq: 0x01 flg: 0x04 tail: 0x59580e01
  frmt: 0x02 chkval: 0x2688 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS

基于这样的情况,可以判断通过清理undo$中的相关记录,让其重新分配新的回滚块

Block Header:
block type=0x0e (KTU UNDO HEADER W/UNLIMITED EXTENTS)
block format=0xa2 (oracle 10+)
block rdba=0x00400080 (file#=1, block#=128)
scn=0x0980.cff7c56d, seq=1, tail=0xc56d0e01
block checksum value=0x2683=9859, flag=4
  Extent Control Header
  -------------------------------------------------------------
  Extent Header:: extents: 10  blocks: 79
                  last map: 0x00000000  
#maps
: 0  offset: 4128
      Highwater:: 0x00400225  (rfile#=1,block#=549)
                  ext#: 6  blk#: 5   ext size:8
      
#blocks
 in seg. hdr's freelists: 0
      
#blocks
 below: 0
      mapblk: 0x00000000   offset: 6
      Map Header:: next: 0x00000000   
#extents
: 10  obj#: 0  flag: 0x40000000
  Extent Control Header
  -------------------------------------------------------------
   0x00400081  length: 7
   0x004206a8  length: 8
   0x004206b0  length: 8
   0x00400088  length: 8
   0x00400210  length: 8
   0x00400218  length: 8
   0x00400220  length: 8
   0x00400228  length: 8
   0x004206a0  length: 8
   0x00400230  length: 8
  TRN CTL:: seq: 0x0462 chd: 0x005e ctl: 0x000d inc: 0x00000000 nbf: 0x0000
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646(0x7ffffffe)
            uba: 0x00000225.0462.1d scn: 0x0980.cf1f2121
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x00000000.0462.1c ext: 0x6  spc: 0x11a2
    uba: 0x00000000.0462.26 ext: 0x6  spc: 0xc86
    uba: 0x00000000.0462.03 ext: 0x6  spc: 0x1e5c
    uba: 0x00000000.0460.03 ext: 0x4  spc: 0x1e5c
    uba: 0x00000000.043c.21 ext: 0x8  spc: 0xd2c

然后数据库打开成功,使用expdp完美导出数据,完成本次恢复任务