Oracle Recovery Tools Recovery Case Summary (202505)

Contact: phone/WeChat (+86 17813235971), QQ (107644445)


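Author: 惜分飞. © All rights reserved. [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]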

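The Oracle Recovery Tools utility was developed some time ago and has since been used in a large number of customer recovery cases, where it has greatly improved recovery efficiency, especially on Windows platforms where bbed or a similar tool would otherwise be needed. The following cases summarize how the tool has been used in practice:
Oracle Recovery Tools: repairing corrupt free blocks
Oracle Recovery Tools: batch repair of corrupt blocks in practice
Oracle Recovery Tools: fast recovery from ORA-19909
Oracle Recovery Tools: resolving ORA-600 3020
Oracle Recovery Tools: recovering from csc higher than block scn
Oracle Recovery Tools: recovering from MISSING00000 file failures
Oracle Recovery Tools: fast recovery of datafiles omitted when recreating the ctl file
One-click recovery from ORA-01113 ORA-01110 with Oracle Recovery Tools
Oracle Recovery Tools: resolving ORA-01190, ORA-01248 and similar failures
Oracle Recovery Tools: quickly fixing a sysaux file that cannot be brought online
Oracle Recovery Tools: recovering from ORA-00704 ORA-01555
ORA-01113 ORA-01110 errors do not always require Oracle Recovery Tools
Oracle Recovery Tools: resolving ORA-00279 ORA-00289 ORA-00280
Oracle Recovery Tools: repairing ORA-600 6101/kdxlin:psno out of range
Oracle Recovery Tools: one-click fix for ORA-00376 ORA-01110 (file offline)
Oracle Recovery Tools: repairing ORA-00742 and ORA-600 ktbair2: illegal inheritance
Oracle Recovery Tools: fast recovery of a database that cannot start after a power failure (ORA-01555, MISSING000 and similar problems)
Software download: OraRecovery download
Usage: usage instructions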

ORA-600 kddummy_blkchk: database restart loop


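A 10g RAC running on an HP platform suddenly failed; after that, it restarted automatically each time it had been running for a while. The customer asked us to analyze and resolve the problem.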

Thu May  8 06:23:21 2025
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Thu May  8 06:23:21 2025
Thread 1 opened at log sequence 74302
  Current log# 1 seq# 74302 mem# 0: /dev/vgdata/rrac_redo01
Successful open of redo thread 1
Thu May  8 06:23:21 2025
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu May  8 06:23:21 2025
SMON: enabling cache recovery
Thu May  8 06:23:22 2025
Successfully onlined Undo Tablespace 1.
Thu May  8 06:23:22 2025
SMON: enabling tx recovery
Thu May  8 06:23:22 2025
Database Characterset is ZHS16CGB231280
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 4
replication_dependency_tracking turned off (no async multimaster replication found)
Thu May  8 06:23:23 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Starting background process QMNC
QMNC started with pid=22, OS id=15792
Thu May  8 06:23:25 2025
ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.
Thu May  8 06:23:25 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Thu May  8 06:23:26 2025
Completed: ALTER DATABASE OPEN
Thu May  8 06:23:26 2025
Doing block recovery for file 118 block 333578
Block recovery from logseq 74302, block 22 to scn 46740761996
Thu May  8 06:23:26 2025
Recovery of Online Redo Log: Thread 1 Group 1 Seq 74302 Reading mem 0
  Mem# 0: /dev/vgdata/rrac_redo01
Block recovery stopped at EOT rba 74302.33.16
Block recovery completed at rba 74302.33.16, scn 10.3791089036
Thu May  8 06:23:33 2025
Trace dumping is performing id=[cdmp_20250508062324]
Thu May  8 06:25:55 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Thu May  8 06:25:58 2025
ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.
Thu May  8 06:27:32 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
Doing block recovery for file 118 block 333578
Block recovery from logseq 74302, block 372 to scn 46740952565
Thu May  8 06:27:41 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.
Thu May  8 06:27:43 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Doing block recovery for file 118 block 333578
Block recovery from logseq 74302, block 372 to scn 46740952565
Thu May  8 06:27:45 2025
Recovery of Online Redo Log: Thread 1 Group 1 Seq 74302 Reading mem 0
  Mem# 0: /dev/vgdata/rrac_redo01
Block recovery completed at rba 74302.394.16, scn 10.3791279606
Thu May  8 06:27:47 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Thu May  8 06:28:07 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_pmon_15690.trc:
ORA-00474: SMON process terminated with error
Thu May  8 06:28:07 2025
PMON: terminating instance due to error 474

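The restarts are caused by the smon process failing, which shuts the instance down; RAC then brings the database back up automatically, producing the restart loop. The main reason smon fails is the message "ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794." It means a transaction needs to be rolled back, but the rollback hits ORA-600 kddummy_blkchk and cannot complete, so smon fails and the instance crashes. Handling this problem is relatively straightforward: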
1. Use the ORA-600 kddummy_blkchk arguments to identify the failing object

ORA-00600: internal error code, arguments: [kddummy_blkchk], [a], [b], [c]

ARGUMENTS:
Arg [a] Absolute file number
Arg [b] Block number
Arg [c] Internal error code returned from kcbchk() which indicates the problem encountered.

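2. Disable the transaction rollback, open the database, then rebuild the object identified either by querying dba_extents or from the object information in the transaction error, which completes the recovery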
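The lookup in the two steps above can be sketched as follows. This is only an illustration, not the procedure actually used on the customer system: it pulls the file number, block number and check code out of the ORA-600 line, pulls the transaction and object number out of the Error 607 line, and prints the dictionary queries one would then run. The helper names are hypothetical, reading the pair (9, 33) as undo segment number and slot is an assumption, and so is treating the reported object number as either object_id or data_object_id.

```python
import re

# Sample lines copied from the alert log in this post.
ALERT_LINES = [
    "ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []",
    "ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.",
]

BLKCHK_RE = re.compile(r"\[kddummy_blkchk\], \[(\d+)\], \[(\d+)\], \[(\d+)\]")
TXN_RE = re.compile(r"recovering transaction \((\d+), (\d+)\) on object (\d+)")


def parse_alert(lines):
    """Return distinct (file#, block#, check code) and (usn, slot, object#) tuples."""
    blocks, txns = set(), set()
    for line in lines:
        m = BLKCHK_RE.search(line)
        if m:
            blocks.add(tuple(int(g) for g in m.groups()))
        m = TXN_RE.search(line)
        if m:
            txns.add(tuple(int(g) for g in m.groups()))
    return sorted(blocks), sorted(txns)


def extent_query(file_no, block_no):
    """Build a dba_extents lookup for the segment that contains the block."""
    return (
        "SELECT owner, segment_name, segment_type FROM dba_extents "
        f"WHERE file_id = {file_no} "
        f"AND {block_no} BETWEEN block_id AND block_id + blocks - 1;"
    )


def object_query(object_no):
    """Build a dba_objects lookup; the number may be object_id or data_object_id."""
    return (
        "SELECT owner, object_name, object_type FROM dba_objects "
        f"WHERE object_id = {object_no} OR data_object_id = {object_no};"
    )


blocks, txns = parse_alert(ALERT_LINES)
for file_no, block_no, code in blocks:
    print(extent_query(file_no, block_no))
for usn, slot, object_no in txns:
    print(object_query(object_no))
```

Both queries should point at the same segment; if they disagree, that is worth investigating before anything is rebuilt.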

Currently known bugs related to ORA-00600: internal error code, arguments: [kddummy_blkchk], [a], [b]

Bug Fixed Description
12349316 11.2.0.4, 12.1.0.2, 12.2.0.1 DBMS_SPACE_ADMIN.TABLESPACE_FIX_BITMAPS fails with ORA-600 [kddummy_blkchk] / ORA-600 [kdBlkCheckError] / ORA-607
17325413 11.2.0.3.BP23, 11.2.0.4.2, 11.2.0.4.BP04, 12.1.0.1.3, 12.1.0.2, 12.2.0.1 Drop column with DEFAULT value and NOT NULL definition ends up with Dropped Column Data still on Disk leading to Corruption
13715932 11.2.0.4, 12.1.0.1 ORA-600 [kddummy_blkchk] [18038] while adding extents to a large datafile
12417369 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 Block corruption from rollback on compressed table
10324526 10.2.0.5.4, 11.1.0.7.8, 11.2.0.2.3, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.1 ORA-600 [kddummy_blkchk] [6106] / corruption on COMPRESS table in TTS
10113224 11.2.0.3, 12.1.0.1 Index coalesce may generate invalid redo if blocks in the buffer cache are invalid/corrupted
9726702 11.2.0.3, 12.1.0.1 DBMS_SPACE_ADMIN.assm_segment_verify reports HWM related inconsistencies
9724970 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Block Corruption with PDML UPDATE. ORA-600 [4511] OERI[kdblkcheckerror] by block check
9711859 10.2.0.5.1, 11.1.0.7.6, 11.2.0.2, 12.1.0.1 ORA-600 [ktsptrn_fix-extmap] / ORA-600 [kdblkcheckerror] during extent allocation caused by bug 8198906
9581240 11.1.0.7.9, 11.2.0.2, 12.1.0.1 Corruption / ORA-600 [kddummy_blkchk] [6101] / ORA-600 [7999] after RENAME operation during ROLLBACK
9350204 11.2.0.3, 12.1.0.1 Spurious ORA-600 [kddummy_blkchk] .. [6145] during CR operations on tables with ROWDEPENDENCIES
9231605 11.1.0.7.4, 11.2.0.1.3, 11.2.0.1.BP02, 11.2.0.2, 12.1.0.1 Block corruption with missing row on a compressed table after DELETE
9119771 11.2.0.2, 12.1.0.1 OERI [kddummy_blkchk]…[6108] from ‘SHRINK SPACE CASCADE’
9019113 11.2.0.1.BP02, 11.2.0.2, 12.1.0.1 ORA-600 [17182] ORA-7445 [memcpy] ORA-600 [kdBlkCheckError] for OLTP COMPRESS table in OLTP Compression REDO during RECOVERY
8951812 11.2.0.2, 12.1.0.1 Corrupt index by rebuild online. Possible OERI [kddummy_blkchk] by SMON
8720802 10.2.0.5, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.1 Add check for row piece pointing to itself (db_block_checking,dbv,rman,analyze)
8331063 11.2.0.3, 12.1.0.1 Corrupt Undo. ORA-600 [2015] in Undo Block During Rollback
6523037 11.2.0.1.BP07, 11.2.0.2.2, 11.2.0.2.BP01, 11.2.0.3, 12.1.0.1 Corruption / ORA-600 [kddummy_blkchk] [6110] on update
8277580 11.1.0.7.2, 11.2.0.1, 11.2.0.2, 12.1.0.1 Corruption on compressed tables during Recovery and Quick Multi Delete (QMD).
9964102 11.2.0.1 OERI:2015 / OERI:kddummy_blkchk / undo corruption from supplemental logging with compressed tables
8613137 11.1.0.7.2, 11.2.0.1 ORA-600 updating table with DEFERRED constraints
8360192 11.1.0.7.6, 11.2.0.1 ORA-600 [kdBlkCheckError] [6110] / corruption from insert
8239658 10.2.0.5, 11.2.0.1 Dump / corruption writing row to compressed table
8198906 10.2.0.5, 11.2.0.1 OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
7715244 11.1.0.7.2, 11.2.0.1 Corruption on compressed tables. Error codes 6103 / 6110
7662491 10.2.0.4.2, 10.2.0.5, 11.1.0.7.4, 11.2.0.1 Array Update can corrupt a row. Errors OERI[kghstack_free1] or OERI[kddummy_blkchk][6110]
7411865 10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1 OERI:13030 / ORA-1407 / block corruption from UPDATE .. RETURNING DML with trigger
7331181 11.2.0.1 ORA-1555 or OERI [kddummy_blkchk] [file#] [block#] [6126] during CR Rollback in query
7293156 11.1.0.7, 11.2.0.1 ORA-600 [2023] by Parallel Transaction Rollback when applying Multi-block undo Head-piece / Tail-piece
7041254 11.1.0.7.5, 11.2.0.1 ORA-19661 during RMAN restore check logical of compressed backup / IOT dummy key
6760697 10.2.0.4.3, 10.2.0.5, 11.1.0.7, 11.2.0.1 DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY does not detect certain segment header block corruption
6647480 10.2.0.4.4, 10.2.0.5, 11.1.0.7.3, 11.2.0.1 Corruption / OERI [kddummy_blkchk] .. [18021] with ASSM
6134368 10.2.0.5, 11.2.0.1 ORA-1407 / block corruption from UPDATE .. RETURNING DML with trigger – SUPERCEDED
6057203 10.2.0.4, 11.1.0.7, 11.2.0.1 Corruption with zero length column (ZLC) / OERI [kcbchg1_6] from Parallel update
6653934 10.2.0.4.2, 10.2.0.5, 11.1.0.7 Dump / block corruption from ONLINE segment shrink with ROWDEPENDENCIES
6674196 10.2.0.4, 10.2.0.5, 11.1.0.6 OERI / buffer cache corruption using ASM, OCFS or any ksfd client like ODM
5599596 10.2.0.4, 11.1.0.6 Block corruption / OERI [kddummy_blkchk] on clustered or compressed tables
5496041 10.2.0.4, 11.1.0.6 OERI[6006] / index corruption on compressed index
5386204 10.2.0.4.1, 10.2.0.5, 11.1.0.6 Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment
5363584 10.2.0.4, 11.1.0.6 Array insert into table can corrupt redo
4602031 10.2.0.2, 11.1.0.6 Block corruption from UPDATE or MERGE into compressed table
4493447 11.1.0.6 Spurious ORA-600 [kddummy_blkchk] [file#] [block#] [6145] on rollback of array update
4329302 11.1.0.6 OERI [kddummy_blkchk] [file#] [block#] [6145] on rollback of update with logminer
6075487 10.2.0.4 OERI[kddummy_blkchk]..[18020/18026] for DDL on plugged ASSM tablespace with FLASHBACK
4054640 10.1.0.5, 10.2.0.1 Block corruption / OERI [kddummy_blkchk] at physical standby
4000840 10.1.0.4, 10.2.0.1, 9.2.0.7 Update of a row with more than 255 columns can cause block corruption
3772033 9.2.0.7, 10.1.0.4, 10.2.0.1 OERI[ktspfmb_create1] creating a LOB in ASSM using 2k blocksize

 

A case of an ASM disk added to a VG, recovered so that the database could be opened directly


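Without realizing that the disk was in use as an ASM disk, the customer partitioned it, created a PV on it, added it to a VG, and allocated the space to an LV, which caused the database to fail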


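Analysis at the operating system level confirmed that the customer had taken over one of the disks in the DATA disk group, which caused the database errors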

WARNING: ASMB force dismounting group 2 (DATA) due to failover
SUCCESS: diskgroup DATA was dismounted
2025-05-04T07:03:19.910082+08:00
KCF: read, write or open error, block=0x201544 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.918972+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwc_18507.trc:
2025-05-04T07:03:19.952045+08:00
KCF: read, write or open error, block=0x2013e7 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.964538+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw7_18486.trc:
2025-05-04T07:03:19.967133+08:00
KCF: read, write or open error, block=0x230e71 online=1
        file=105 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_64.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.973289+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw2_18466.trc:
2025-05-04T07:03:19.978514+08:00
KCF: read, write or open error, block=0x1f6e91 online=1
        file=86 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_52.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.991060+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwd_18511.trc:
2025-05-04T07:03:19.995762+08:00
KCF: read, write or open error, block=0x7f8 online=1
        file=15 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/undotbs01.dbf'
        error=15078 txt: ''
2025-05-04T07:03:20.006862+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwa_18498.trc:
2025-05-04T07:03:20.020739+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_imr0_18937.trc:
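To get a quick overview of which datafiles were hit, the KCF entries in the log above can be grouped by file. The sketch below is only an illustration: it assumes each entry keeps the three-line layout shown (the KCF line, then file=, then error=), uses two entries copied from the log as sample input, and the function name is hypothetical.

```python
import re
from collections import Counter

# Two entries copied from the alert log above.
ALERT_TEXT = """\
KCF: read, write or open error, block=0x201544 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
KCF: read, write or open error, block=0x7f8 online=1
        file=15 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/undotbs01.dbf'
        error=15078 txt: ''
"""

KCF_RE = re.compile(
    r"KCF: read, write or open error, block=(0x[0-9a-f]+) online=\d+\s+"
    r"file=(\d+) '([^']+)'\s+error=(\d+)"
)


def summarize_kcf(text):
    """Count KCF I/O errors per (file number, file name, error code)."""
    hits = Counter()
    for _block_hex, file_no, path, err in KCF_RE.findall(text):
        hits[(int(file_no), path.rsplit("/", 1)[-1], int(err))] += 1
    return hits


for (file_no, name, err), count in sorted(summarize_kcf(ALERT_TEXT).items()):
    print(f"file {file_no} ({name}): {count} error(s), code {err}")
```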

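The customer was fairly lucky: after the disk was repurposed, little data was written to the corresponding LV, so only a small part of the disk was overwritten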

[root@rac01 rules.d]# df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/nlas-root  800G  272G  528G   34% /
devtmpfs               284G     0  284G    0% /dev
tmpfs                  284G  637M  283G    1% /dev/shm
tmpfs                  284G  4.0G  280G    2% /run
tmpfs                  284G     0  284G    0% /sys/fs/cgroup
/dev/mapper/nlas-home  200G   64M  200G    1% /home
/dev/sda1              197M  158M   40M   80% /boot
tmpfs                   57G   40K   57G    1% /run/user/0
tmpfs                   57G   48K   57G    1% /run/user/1000
[root@rac01 rules.d]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  nlas lvm2 a--  564.00g    0 
  /dev/sdb1  nlas lvm2 a--   <2.00t 1.51t
[root@rac01 rules.d]# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  nlas   2   3   0 wz--n- 2.55t 1.51t
[root@rac01 rules.d]# lvs
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home nlas -wi-ao---- 200.00g                                                    
  root nlas -wi-ao---- 800.00g                                                    
  swap nlas -wi-ao----  64.00g                                                    

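Low-level analysis of the disk showed that the backup copies of the disk header were all damaged as well. Deeper analysis confirmed that f1b1 was in the 10th AU of the sdb disk. Using that information, the disk group was loaded with the dul tool and its metadata analyzed; all the metadata needed to recover the data could be loaded normally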
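As a rough illustration of where such a metadata block sits on the raw device, the byte offset can be computed from the AU number and the block number inside that AU. The sketch below rests on several assumptions that are not confirmed by the case itself: a 4 MiB AU (the 19c default mentioned in this post), 4 KiB ASM metadata blocks, AU numbering starting at 0, and f1b1 being block 1 of AU 10. The function name is hypothetical.

```python
AU_SIZE = 4 * 1024 * 1024   # assumed: 4 MiB allocation unit
ASM_BLOCK_SIZE = 4096       # assumed: 4 KiB ASM metadata block


def asm_block_offset(au_number, block_in_au, au_size=AU_SIZE, block_size=ASM_BLOCK_SIZE):
    """Return the byte offset of a metadata block, counted from the start of the ASM disk."""
    if au_number < 0 or block_in_au < 0:
        raise ValueError("AU number and block number must be non-negative")
    if block_in_au * block_size >= au_size:
        raise ValueError("block number does not fit inside one AU")
    return au_number * au_size + block_in_au * block_size


# Assumed location of f1b1: AU 10, block 1.
offset = asm_block_offset(10, 1)
print(offset)                     # 41947136
print(offset // ASM_BLOCK_SIZE)   # 10241, usable as a skip= count with a 4 KiB block size
```

An offset like this is only meaningful relative to where the ASM disk actually starts; if the disk was presented through a partition, the partition start has to be added.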


The data was extracted directly to a file system with dul, and the database was then opened successfully

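RMAN was then used to check for corrupt blocks (fewer than 5,000 corrupt blocks in a database of more than 3 TB, which is a very good result), and the objects with corrupt blocks were dealt with, completing the recovery. Several factors contributed to such a good outcome:
1) After the ASM disk was added to the VG and allocated to an LV, writes were stopped immediately, which avoided the risk of new data overwriting the ASM disk
2) Because this is a 19c database, the default AU size is 4 MB, so the database file data sits further back on the disk and is somewhat less likely to be overwritten
3) The file system is xfs, which overwrites far less than ext4 would
4) The disk is an SSD in a cloud environment, and TRIM was not triggered
Summary of earlier cases of similar ASM disk recovery:
Recovery of an ASM disk added to a VG
Recovery of an ASM disk damaged by dd
Recovery of a lost ASM disk partition
pvid=yes prevents ASM from mounting
Recovery of a damaged ASM disk header on Windows
Another case of an ASM disk added to a VG
Recovery of an ASM disk group damaged by pvcreate on an ASM disk
Recovery of an ASM disk that was added to another disk group
Yet another recovery of an ASM disk mistakenly added to a VG followed by an LV extension
Another recovery of an ASM disk formatted as an ext3 file system
A complete recovery of an ASM disk formatted as NTFS
Recovery of an ASM diskgroup that could not mount after a pvid was mistakenly set on an ASM disk
Recovery of an ASM disk that was partitioned and formatted as ext4
Oracle ASM disk format recovery: formatted as an ext4 file system
A recovery case with zero data loss after oracleasm createdisk recreated an ASM disk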

CHECKDB found N allocation errors and M consistency errors


A friend asked for help with a database failure; dbcc checkdb reported the following errors

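Server: Msg 8905, Level 16, State 1, Line 1
Extent (1:5144) in database ID 8 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
Server: Msg 8929, Level 16, State 1, Line 1
Object ID 2: Errors found in text ID 800849920 owned by data record identified by RID = (1:143:7) id = 1218103380 and indid = 4.
Server: Msg 8961, Level 16, State 1, Line 1
Table error: Object ID 2. The text, ntext, or image node at page (1:3813), slot 0, text ID 800849920 does not match its reference from page (1:489), slot 4.
DBCC results for 'myhis'.
CHECKDB found 1 allocation errors and 0 consistency errors not associated with any single object.
DBCC results for 'sysobjects'.
There are 905 rows in 13 pages for object 'sysobjects'.
DBCC results for 'sysindexes'.
There are 635 rows in 26 pages for object 'sysindexes'.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'sysindexes' (object ID 2).
DBCC results for 'syscolumns'.
………………
There are 0 rows in 0 pages for object 'yj_sqd_taoc'.
DBCC results for 'h_zdytj'.
There are 0 rows in 0 pages for object 'h_zdytj'.
CHECKDB found 1 allocation errors and 4 consistency errors in database 'myhis'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.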

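The main errors are:
1. Extent (1:5144) in database ID 8 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
2. Table error: Object ID 2. The text, ntext, or image node at page (1:3813), slot 0, text ID 800849920 does not match its reference from page (1:489), slot 4.
3. CHECKDB found 0 allocation errors and 2 consistency errors in table 'sysindexes' (object ID 2)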
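When the DBCC output is long, the error messages and the per-object summary lines can be tallied with a short script. The sketch below is illustrative only: the sample text is an English rendering of a few lines from the output in this post, the patterns accept both the English wording and the Chinese-locale wording of the original output, and the function name is hypothetical.

```python
import re
from collections import Counter

# A few lines from the DBCC output in this post, in English wording.
DBCC_OUTPUT = """\
Server: Msg 8905, Level 16, State 1, Line 1
Server: Msg 8929, Level 16, State 1, Line 1
Server: Msg 8961, Level 16, State 1, Line 1
CHECKDB found 1 allocation errors and 0 consistency errors not associated with any single object.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'sysindexes' (object ID 2).
CHECKDB found 1 allocation errors and 4 consistency errors in database 'myhis'.
"""

MSG_RE = re.compile(r"(?:Msg|消息) (\d+)")
SUMMARY_RE = re.compile(
    r"CHECKDB (?:found|发现了) (\d+) (?:allocation errors and|个分配错误和) "
    r"(\d+) (?:consistency errors|个一致性错误)"
)


def summarize_dbcc(text):
    """Return a count per message number and the (allocation, consistency) pairs in order."""
    messages = Counter(int(n) for n in MSG_RE.findall(text))
    summaries = [(int(a), int(c)) for a, c in SUMMARY_RE.findall(text)]
    return messages, summaries


messages, summaries = summarize_dbcc(DBCC_OUTPUT)
print(dict(messages))
print(summaries)
```

The last pair in the list is the database-wide total, so it is the quickest thing to compare between a run before the repair and a run after it.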

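This database runs SQL Server 2000, which makes it somewhat more troublesome to handle (the version is so old that many tools support it poorly). In the end, a SQL recovery tool and the All Tasks -> Import Data function in the SQL console were used to migrate the individual abnormal tables separately, which completed the job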


Running dbcc again showed everything normal, and the customer's business returned to normal as well

Recovering from a damaged dm.ctl file


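The Dameng (DM) database also has a file similar to Oracle's control file (control0x.ctl). In DM it is usually called dm.ctl, and its actual location is determined by the CTL_PATH parameter in dm.ini (the default DM parameter file name). In some situations the database fails because the control file is damaged or lost; this post tests recovery from the loss of dm.ctl.
Create a tablespace and a table, then insert data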

SQL> create tablespace tbs_xff datafile 'tbs_xff01.dbf' size 128;
executed successfully
used time: 52.996(ms). Execute id is 1303.

SQL> create table t_xff tablespace tbs_xff as 
2   select * from dba_objects;
executed successfully
used time: 71.055(ms). Execute id is 1304.
SQL> insert into t_xff select * from dba_objects;
affect rows 1094

used time: 28.759(ms). Execute id is 1305.
SQL> insert into t_xff select * from dba_objects;
affect rows 1094

used time: 13.395(ms). Execute id is 1306.
SQL> insert into t_xff select * from dba_objects;
affect rows 1094

used time: 13.867(ms). Execute id is 1307.
SQL> insert into t_xff select * from dba_objects;
affect rows 1094

used time: 13.838(ms). Execute id is 1308.
SQL> insert into t_xff select * from t_xff;
affect rows 5470

used time: 5.677(ms). Execute id is 1309.
SQL> select count(1) from t_xff;

LINEID     COUNT(1)            
---------- --------------------
1          10940

used time: 1.949(ms). Execute id is 1310.
SQL> insert into t_xff select * from t_xff;
affect rows 10940

used time: 35.035(ms). Execute id is 1311.
SQL> insert into t_xff select * from t_xff;
affect rows 21880

used time: 38.489(ms). Execute id is 1312.
SQL> insert into t_xff select * from t_xff;
affect rows 43760

used time: 86.242(ms). Execute id is 1313.
SQL> insert into t_xff select * from t_xff;
affect rows 87520

used time: 272.804(ms). Execute id is 1314.
SQL> insert into t_xff select * from t_xff;
affect rows 175040

used time: 529.733(ms). Execute id is 1315.
SQL> insert into t_xff select * from t_xff;
affect rows 350080

used time: 00:00:01.090. Execute id is 1316.
SQL> select count(1) from t_xff;

LINEID     COUNT(1)            
---------- --------------------
1          700160

used time: 0.352(ms). Execute id is 1317.

Kill the DM process and delete the dm.ctl file

[dmdba@localhost ctl_bak]$ ps -ef|grep dmserver
dmdba     2216  1963  1 21:25 pts/1    00:00:10 dmserver /home/dmdba/dmdbms/data/htdb/dm.ini
dmdba     2401  1963  0 21:40 pts/1    00:00:00 grep --color=auto dmserver
[dmdba@localhost ctl_bak]$ kill -9 2216
[dmdba@localhost ctl_bak]$ 
[1]+  Killed                  nohup dmserver /home/dmdba/dmdbms/data/htdb/dm.ini  (wd: ~/dmdbms/log)
(wd now: ~/dmdbms/data/htdb/ctl_bak)
[dmdba@localhost htdb]$ 
[dmdba@localhost htdb]$ rm -rf dm.ctl 
[dmdba@localhost htdb]$ 

Starting the DM database now fails with an error

[dmdba@localhost htdb]$ dmserver /home/dmdba/dmdbms/data/htdb/dm.ini
file dm.key not found, use default license!
Read ini error, name:CTL_PATH, value:/home/dmdba/dmdbms/data/htdb/dm.ctl
dmserver startup failed, code = -803 [Invalid ini config value]
nsvr_ini_file_read failed, 1
[dmdba@localhost htdb]$ 

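Start the database directly from a backup ctl file
The CTL_BAK_PATH parameter in dm.ini controls where control file backups are stored (usually the ctl_bak directory under the DM database directory), and CTL_BAK_NUM controls how many backup files are kept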

[dmdba@localhost htdb]$ cd ctl_bak/
[dmdba@localhost ctl_bak]$ ls -lhtra
total 96K
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 26 21:25 dm_20250426212532_981690.ctl
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 26 21:36 dm_20250426213636_596321.ctl
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 26 21:36 dm_20250426213636_599137.ctl
-rw-r--r--. 1 dmdba dmsys 6.0K Apr 26 21:36 dm_20250426213636_604672.ctl
-rw-r--r--. 1 dmdba dmsys 6.0K Apr 26 21:36 dm_20250426213636_607133.ctl
-rw-r--r--. 1 dmdba dmsys 6.0K Apr 26 21:36 dm_20250426213636_610925.ctl
drwxr-xr-x. 2 dmdba dmsys 4.0K Apr 26 21:36 .
-rw-r--r--. 1 dmdba dmsys 6.0K Apr 26 21:36 dm_20250426213636_630282.ctl
drwxr-xr-x. 6 dmdba dmsys 4.0K Apr 26 21:41 ..
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 27  2025 dm_20250427034852_563000.ctl
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 27  2025 dm_20250427034905_993889.ctl
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 27  2025 dm_20250427035646_961400.ctl
-rw-r--r--. 1 dmdba dmsys 5.5K Apr 27  2025 dm_20250427041036_574397.ctl
[dmdba@localhost ctl_bak]$ 
[dmdba@localhost ctl_bak]$ cp dm_20250427041036_574397.ctl ../dm.ctl
[dmdba@localhost ctl_bak]$ dmserver /home/dmdba/dmdbms/data/htdb/dm.ini
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 1440
nsql_vm_t = 328
prjt2_vm_t = 176
ltid_vm_t = 216
nins2_vm_t = 1120
nset2_vm_t = 272
ndlck_vm_t = 192
ndel2_vm_t = 768
slct2_vm_t = 352
nli2_vm_t = 200
aagr2_vm_t = 304
pscn_vm_t = 376
dist_vm_t = 960
DM Database Server 64 V8 03134284336-20250117-257733-20132 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2026-01-17
begin redo pwr log collect, last ckpt lsn: 46175 ...
redo pwr log collect finished
main rfil[/home/dmdba/dmdbms/data/htdb/htdb01.log]'s grp collect 0 valid pwr record, discard 1135 invalid pwr record
EP[0]'s cur_lsn[61627], file_lsn[61627]
begin redo log recover, last ckpt lsn: 46175 ...
redo log recover finished 0
ndct db load finished, code:0
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
pseg_set_gtv_trxid_low next_trxid in mem:[26039]
pseg_collect_mgr_items, total collect 0 active_trxs, 0 cmt_trxs, 0 pre_cmt_trxs, 0 to_release_trxs,
    0 active_pages, 0 cmt_pages, 0 pre_cmt_pages, 0 to_release_pages, 0 mgr pages, 0 mgr recs!
next_trxid in mem:[28041]
next_trxid = 30043.
pseg recv finished
nsvr_startup end.
uthr_pipe_create, create pipe[read:10, write:11]
uthr_pipe_create, create pipe[read:12, write:13]
uthr_pipe_create, create pipe[read:14, write:15]
uthr_pipe_create, create pipe[read:16, write:17]
uthr_pipe_create, create pipe[read:18, write:19]
uthr_pipe_create, create pipe[read:20, write:21]
uthr_pipe_create, create pipe[read:22, write:23]
uthr_pipe_create, create pipe[read:24, write:25]
uthr_pipe_create, create pipe[read:26, write:27]
uthr_pipe_create, create pipe[read:28, write:29]
uthr_pipe_create, create pipe[read:30, write:31]
uthr_pipe_create, create pipe[read:32, write:33]
uthr_pipe_create, create pipe[read:34, write:35]
uthr_pipe_create, create pipe[read:36, write:37]
uthr_pipe_create, create pipe[read:38, write:39]
uthr_pipe_create, create pipe[read:40, write:41]
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info finished, code:0.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.
purg2_crash_cmt_trx end, total 0 trx, 0 pages purged
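The choice of backup file in the session above can be sketched in a few lines. This is an illustration under assumptions: the backup names are taken to follow the pattern dm_<YYYYMMDDhhmmss>_<fraction>.ctl seen in the listing, the trailing number is treated as a sub-second tiebreaker, and the function name is hypothetical.

```python
import re
from datetime import datetime

# A few names copied from the ctl_bak listing above.
BACKUPS = [
    "dm_20250426212532_981690.ctl",
    "dm_20250426213636_630282.ctl",
    "dm_20250427035646_961400.ctl",
    "dm_20250427041036_574397.ctl",
]

CTL_RE = re.compile(r"^dm_(\d{14})_(\d+)\.ctl$")


def newest_ctl_backup(names):
    """Return the backup whose embedded timestamp is the latest, or None if none match."""
    dated = []
    for name in names:
        m = CTL_RE.match(name)
        if m:
            stamp = datetime.strptime(m.group(1), "%Y%m%d%H%M%S")
            dated.append((stamp, int(m.group(2)), name))
    return max(dated)[2] if dated else None


print(newest_ctl_backup(BACKUPS))
```

The newest backup is only a starting point: a backup taken before a tablespace or datafile was added will not describe it, so the result is worth checking against the database's actual files.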

The database starts successfully; query the relevant information

SQL> select TABLESPACE_NAME from dba_tablespaces;

LINEID     TABLESPACE_NAME
---------- ---------------
1          SYSTEM
2          ROLL
3          TEMP
4          MAIN
5          TBS_XFF
6          MAIN

6 rows got

used time: 4.257(ms). Execute id is 602.
SQL> select count(*) from t_xff;

LINEID     COUNT(*)            
---------- --------------------
1          700160

used time: 2.893(ms). Execute id is 603.

Recover by rebuilding the ctl file
1. Use dmctlcvt to convert a backup ctl file to a txt file

[dmdba@localhost htdb]$ dmctlcvt  help
DMCTLCVT V8
version: 03134284336-20250117-257733-20132

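Format: ./dmctlcvt KEYWORD=value
Note: the control file name must be dm.ctl, dmmpp.ctl, or dss.ctl

Keyword             Description
--------------------------------------------------------------------------------
TYPE                1 Convert a control file to a text file (the control file name in the source path must be dm.ctl, dmmpp.ctl, or dss.ctl)
                    2 Convert a text file to a control file (the control file name in the destination path must be dm.ctl, dmmpp.ctl, or dss.ctl)
SRC                 Source file path
DEST                Destination file path
DCR_INI             Path of the dmdcr.ini file
HELP                Print help information

Examples: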
./dmctlcvt TYPE=1 SRC=/opt/dmdbms/data/dameng/dm.ctl DEST=/opt/dmdbms/data/dameng/dmctl.txt
./dmctlcvt TYPE=2 SRC=/opt/dmdbms/data/dameng/dmctl.txt DEST=/opt/dmdbms/data/dameng/dm.ctl
[dmdba@localhost htdb]$ dmctlcvt TYPE=1 SRC=dm_20250426213636_630282.ctl DEST=/tmp/ctl.txt
DMCTLCVT V8
convert ctl to txt success!

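2. Check whether the tablespace and datafile information in ctl.txt is correct. If anything is missing, complete it by referring to the creation records in the DM log and to the entries for the other tablespaces, and remember to update the next_ts_id value (next_ts_id=6 here)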
2025-04-26 21:36:36.600 [INFO] database P0000002216 T0000000000000002387 ifun_add_file_low initialize file[0] of ts[5], file_path[/home/dmdba/dmdbms/data/htdb/tbs_xff01.dbf]!


3. Rebuild the ctl file from the txt file

[dmdba@localhost ctl_bak]$ dmctlcvt TYPE=2 SRC=/tmp/ctl.txt DEST=/home/dmdba/dmdbms/data/htdb/dm.ctl
DMCTLCVT V8
convert txt to ctl success!

4. Start the database and verify

[dmdba@localhost ~]$ dmserver path=/home/dmdba/dmdbms/data/htdb/dm.ini
file dm.key not found, use default license!
version info: develop
csek2_vm_t = 1440
nsql_vm_t = 328
prjt2_vm_t = 176
ltid_vm_t = 216
nins2_vm_t = 1120
nset2_vm_t = 272
…………
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info finished, code:0.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.
purg2_crash_cmt_trx end, total 0 trx, 0 pages purged
[dmdba@localhost ctl_bak]$ disql
disql V8
username:sysdba
password:
[-2501]:Invalid username or password.
username:sysdba
password:

Server[LOCALHOST:5236]:mode is normal, state is open
login used time : 4.624(ms)
SQL> select tablespace_name from dba_tablespaces;

LINEID     TABLESPACE_NAME
---------- ---------------
1          SYSTEM
2          ROLL
3          TEMP
4          MAIN
5          TBS_XFF
6          MAIN

6 rows got

used time: 9.326(ms). Execute id is 701.
SQL> select count(1) from t_xff;

LINEID     COUNT(1)            
---------- --------------------
1          700160

used time: 2.916(ms). Execute id is 702.
SQL> 
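For step 2 above, the tablespace id and datafile path that may be missing from ctl.txt can be pulled out of the DM log rather than copied by hand. The sketch below is illustrative: it matches only the ifun_add_file_low message format shown in this post, the function name is hypothetical, and treating next_ts_id as the highest tablespace id plus one is an assumption that happens to agree with the value 6 used here.

```python
import re

# The log line quoted in step 2, split across string literals for readability.
DM_LOG_LINES = [
    "2025-04-26 21:36:36.600 [INFO] database P0000002216 T0000000000000002387 "
    "ifun_add_file_low initialize file[0] of ts[5], "
    "file_path[/home/dmdba/dmdbms/data/htdb/tbs_xff01.dbf]!",
]

ADD_FILE_RE = re.compile(
    r"ifun_add_file_low initialize file\[(\d+)\] of ts\[(\d+)\], file_path\[([^\]]+)\]"
)


def files_from_log(lines):
    """Collect the tablespace id, file id and path of every datafile creation record."""
    found = []
    for line in lines:
        m = ADD_FILE_RE.search(line)
        if m:
            found.append(
                {"ts_id": int(m.group(2)), "file_id": int(m.group(1)), "path": m.group(3)}
            )
    return found


entries = files_from_log(DM_LOG_LINES)
# Assumption: next_ts_id is one more than the highest tablespace id seen in the log.
suggested_next_ts_id = max(e["ts_id"] for e in entries) + 1
print(entries)
print(suggested_next_ts_id)
```

In a real recovery the highest id should be taken across the tablespaces already present in ctl.txt as well, not only those found in the log.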

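Based on the above, damage to or loss of the DM control file can generally be recovered by either of these two methods. It is also noticeable that the DM control file carries fewer consistency checks than Oracle's control file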