ORA-600 kokiasg1 failure analysis (all core dictionary sequences maliciously deleted from obj$)

Contact: Mobile/WeChat (+86 17813235971) QQ (107644445) Consult 惜分飞 via QQ

Title: ORA-600 kokiasg1 failure analysis (all core dictionary sequences maliciously deleted from obj$)

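Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]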

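Failure summary: the customer shut down the database normally, but the next startup failed with ORA-600 kokiasg1. Analysis of the startup showed that the cause was the missing IDGEN1$ sequence. After that was fixed the database started, but the background processes logged large numbers of ORA-600 12803, ORA-600 15264 and similar errors, and application users could not log in. Deeper analysis found that the entries for all core dictionary sequences had been deleted from obj$, while the rows for these objects' obj# still existed in seq$. The initial suspicion is that someone maliciously deleted the core dictionary sequence objects from obj$.
Database startup fails with ORA-600 kokiasg1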

SQL> startup ;
ORACLE instance started.

Total System Global Area 1.4531E+10 bytes
Fixed Size                  2295256 bytes
Variable Size            2181040680 bytes
Database Buffers         1.2314E+10 bytes
Redo Buffers               33193984 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [],
[], [], [], [], []
Process ID: 5628
Session ID: 122 Serial number: 3

The corresponding alert log entries:

Thu Jul 03 16:35:25 2025
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Thu Jul 03 16:35:26 2025
Stopping background process CJQ0
Stopping background process QMNC
Stopping background process MMNL
Stopping background process MMON
License high water mark = 272
All dispatchers and shared servers shutdown
Thu Jul 03 16:35:54 2025
alter database close normal
Thu Jul 03 16:35:54 2025
SMON: disabling tx recovery
SMON: disabling cache recovery
Thu Jul 03 16:35:54 2025
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Thread 1 closed at log sequence 296590
Successful close of redo thread 1
Completed: alter database close normal
alter database dismount
Shutting down archive processes
Archiving is disabled
Completed: alter database dismount
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Thu Jul 03 16:36:02 2025
Stopping background process VKTM
Thu Jul 03 16:36:07 2025
Instance shutdown complete
Thu Jul 03 16:36:19 2025
Adjusting the default value of parameter parallel_max_servers
from 640 to 270 due to the value of parameter processes (300)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 16
Number of processor cores in the system is 8
Number of processor sockets in the system is 1
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =52
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Windows NT Version V6.2  
CPU                 : 16 - type 8664, 8 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:24712M/32767M, Ph+PgF:14089M/39123M 
System parameters with non-default values:
  processes                = 300
  sessions                 = 480
  nls_language             = "SIMPLIFIED CHINESE"
  nls_territory            = "CHINA"
  sga_target               = 13920M
  control_files            = "D:\APP\ADMINISTRATOR\ORADATA\orcl\CONTROL01.CTL"
  control_files            = "D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\orcl\CONTROL02.CTL"
  db_block_size            = 8192
  compatible               = "11.2.0.4.0"
  db_recovery_file_dest    = "D:\app\Administrator\fast_recovery_area"
  db_recovery_file_dest_size= 10G
  undo_tablespace          = "UNDOTBS1"
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
  job_queue_processes      = 10
  audit_file_dest          = "D:\APP\ADMINISTRATOR\ADMIN\orcl\ADUMP"
  audit_trail              = "DB"
  db_name                  = "orcl"
  open_cursors             = 300
  pga_aggregate_target     = 4639M
  diagnostic_dest          = "D:\APP\ADMINISTRATOR"
Thu Jul 03 16:36:20 2025
PMON started with pid=2, OS id=13088 
Thu Jul 03 16:36:20 2025
PSP0 started with pid=3, OS id=16168 
Thu Jul 03 16:36:21 2025
VKTM started with pid=4, OS id=7948 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Thu Jul 03 16:36:21 2025
GEN0 started with pid=5, OS id=4192 
Thu Jul 03 16:36:21 2025
DIAG started with pid=6, OS id=8232 
Thu Jul 03 16:36:21 2025
DBRM started with pid=7, OS id=16436 
Thu Jul 03 16:36:21 2025
DIA0 started with pid=8, OS id=11400 
Thu Jul 03 16:36:21 2025
MMAN started with pid=9, OS id=11108 
Thu Jul 03 16:36:21 2025
DBW0 started with pid=10, OS id=12232 
Thu Jul 03 16:36:21 2025
DBW1 started with pid=11, OS id=7368 
Thu Jul 03 16:36:21 2025
LGWR started with pid=12, OS id=13520 
Thu Jul 03 16:36:21 2025
CKPT started with pid=13, OS id=11952 
Thu Jul 03 16:36:21 2025
SMON started with pid=14, OS id=9304 
Thu Jul 03 16:36:21 2025
RECO started with pid=15, OS id=17136 
Thu Jul 03 16:36:21 2025
MMON started with pid=16, OS id=1984 
Thu Jul 03 16:36:21 2025
MMNL started with pid=17, OS id=2568 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'
starting up 1 shared server(s) ...
ORACLE_BASE from environment = D:\app\Administrator
Thu Jul 03 16:36:22 2025
alter database mount exclusive
Successful mount of redo thread 1, with mount id 1287723014
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Thread 1 opened at log sequence 296590
  Current log# 1 seq# 296590 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\orcl\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[15144] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:3680275922 end:3680276032 diff:110 (1 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc  (incident=7579):
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_7579\orcl_ora_15144_i7579.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc:
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc:
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 15144): terminating the instance due to error 600
Instance terminated by USER, pid = 15144
ORA-1092 signalled during: alter database open...

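Tracing the startup shows that the error is probably related to the IDGEN1$ object: immediately before the ORA-600, a recursive query with the bind value IDGEN1$ fetches no rows (r=0):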

PARSING IN CURSOR #615624160 len=30 dep=1 uid=0 oct=3 lid=0 tim=752975051401
   hv=3013659460 ad='7ffbd8f025d0' sqlid='6d8vr86tu1ku4'
select TOTAL from SYS.ID_GENS$
END OF STMT
PARSE #615624160:c=15625,e=2775,p=2,cr=14,cu=0,mis=1,r=0,dep=1,og=4,plh=1676180847,tim=752975051401
EXEC #615624160:c=0,e=6,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1676180847,tim=752975051452
WAIT #615624160: nam='db file sequential read' ela= 126 file#=1 block#=3440 blocks=1 obj#=514 tim=752975051594
WAIT #615624160: nam='db file sequential read' ela= 48 file#=1 block#=3441 blocks=1 obj#=514 tim=752975051671
FETCH #615624160:c=0,e=224,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1676180847,tim=752975051687
STAT #615624160 id=1 cnt=1 pid=0 pos=1 obj=514 op='TABLE ACCESS FULL ID_GENS$ (cr=3 pr=2 pw=0 time=223 us)'
CLOSE #615624160:c=0,e=15,dep=1,type=0,tim=752975051716
BINDS #12720440:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=80 off=0
  kxsbbbfp=24b1b128  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=0 off=24
  kxsbbbfp=24b1b140  bln=32  avl=07  flg=01
  value="IDGEN1$"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=56
  kxsbbbfp=24b1b160  bln=22  avl=02  flg=01
  value=1
EXEC #12720440:c=0,e=107,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051842
FETCH #12720440:c=0,e=5,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051856
CLOSE #12720440:c=0,e=0,dep=1,type=3,tim=752975051870
Incident 161 created, dump file: C:\APP\XFF\diag\rdbms\orcl\orcl\incident\incdir_161\orcl_ora_1880_i161.trc
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []

ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
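Reading a 10046 trace excerpt like the one above comes down to pairing each cursor's bind values with the r= (rows) field of its FETCH line. The following is a minimal Python sketch of that pairing, assuming only the line formats visible in this excerpt (BINDS #n:, indented value=... lines, FETCH #n:...,r=N,...); it is an illustration rather than a complete trace parser, and the function name is hypothetical.

```python
import re

BINDS_RE = re.compile(r"^BINDS #(\d+):")
VALUE_RE = re.compile(r"^\s*value=(.*)$")
FETCH_RE = re.compile(r"^FETCH #(\d+):.*?\br=(\d+),")


def fetches_with_binds(lines):
    """Return (cursor, bind_values, rows_fetched) for each FETCH of a cursor that had a BINDS section."""
    binds = {}       # cursor number -> bind values captured from its latest BINDS section
    current = None   # cursor whose BINDS section is being read, if any
    results = []
    for line in lines:
        m = BINDS_RE.match(line)
        if m:
            current = m.group(1)
            binds[current] = []
            continue
        m = VALUE_RE.match(line)
        if m and current is not None:
            # strip the quotes around character binds such as value="IDGEN1$"
            binds[current].append(m.group(1).strip().strip('"'))
            continue
        m = FETCH_RE.match(line)
        if m and m.group(1) in binds:
            results.append((m.group(1), binds[m.group(1)], int(m.group(2))))
        if line and not line[0].isspace():
            current = None  # any new top-level trace record ends the BINDS section


    return results


excerpt = """BINDS #12720440:
 Bind#0
  value=0
 Bind#1
  value="IDGEN1$"
 Bind#2
  value=1
EXEC #12720440:c=0,e=107,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051842
FETCH #12720440:c=0,e=5,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051856""".splitlines()

print(fetches_with_binds(excerpt))
```

On this excerpt the function returns one entry for cursor 12720440 with binds ['0', 'IDGEN1$', '1'] and 0 rows fetched, which is consistent with the IDGEN1$ sequence being absent from obj$.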

MOS confirms that when the IDGEN1$ sequence is missing, startup fails with ORA-600 kokiasg1.

A tool was used to recover the obj$ table into a new database:

E:\dump>imp test/oracle file=SYS_OBJ$.dmp full=y

Import: Release 11.2.0.4.0 - Production on Sat Jul 5 09:34:42 2025

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Export file created by EXPORT:V08.01.07 via conventional path

Warning: the objects were exported by SYS, not by you

import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
export server uses UTF8 NCHAR character set (possible ncharset conversion)
. importing SYS's objects into TEST
. importing SYS's objects into TEST
. . importing table                          "OBJ$"     103764 rows imported
Import terminated successfully without warnings.

Querying test.obj$ confirms that there is no object named IDGEN1$:

SQL> select * from test.obj$ where name='IDGEN1$';

no rows selected

SQL>

Query the IDGEN1$ object in the obj$ dictionary of a healthy database:

SQL> select owner#, obj#,type# from obj$ where name='IDGEN1$';

    OWNER#       OBJ#      TYPE#
---------- ---------- ----------
         0       1229          6

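In test.obj$ recovered from the failed database, query the objects around obj# 1229, and compare with the same query on the healthy database: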

SQL> select owner#, obj#,type#,name from test.obj$ where obj# in(1228,1229,1230);

    OWNER#       OBJ#      TYPE# NAME
---------- ---------- ---------- ------------------------------
         0       1228          2 DST$TRIGGER_TABLE
         0       1230         13 BFILE

SQL> select owner#, obj#,type#,name from obj$ where obj# in(1228,1229,1230);

    OWNER#       OBJ#      TYPE# NAME
---------- ---------- ---------- ------------------------------
         0       1228          2 DST$TRIGGER_TABLE
         0       1229          6 IDGEN1$
         0       1230         13 BFILE

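At this point the preliminary conclusion is that the failed database cannot start because the IDGEN1$ sequence is missing. The fix is relatively simple: while the database is in the process of opening, open a new session and create the IDGEN1$ sequence.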

Restarting the database then succeeds, but attempting to log in to the database fails with ORA-600 12803.

Checking the alert log again shows a large number of ORA-600 errors:

Fri Jul 04 15:57:13 2025
Errors in file C:\APP\XFF\diag\rdbms\orcl\orcl\trace\orcl_ora_27788.trc  (incident=12239):
ORA-00600: internal error code, arguments: [12803], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Jul 04 15:58:04 2025
Errors in file C:\APP\XFF\diag\rdbms\orcl\orcl\trace\orcl_mmon_1976.trc  (incident=12184):
ORA-00600: internal error code, arguments: [15264], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

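These ORA-600 errors suggested that something was still wrong at the dictionary level. Because the original error was a sequence problem, this time I focused on the system sequences and used dul to recover the seq$ table into the test user: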

E:\dump>imp test/oracle file=SYS_seq$.dmp full=y

Import: Release 11.2.0.4.0 - Production on Sat Jul 5 10:10:17 2025

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Export file created by EXPORT:V08.01.07 via conventional path

Warning: the objects were exported by SYS, not by you

import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
export server uses UTF8 NCHAR character set (possible ncharset conversion)
. importing SYS's objects into TEST
. importing SYS's objects into TEST
. . importing table                          "SEQ$"        359 rows imported
Import terminated successfully without warnings.

The query shows that the earlier sequence (obj# 1229) is, surprisingly, still present in seq$, although its obj$ record is gone:

SQL> select * from test.seq$ where obj#=1229;

      OBJ# INCREMENT$   MINVALUE   MAXVALUE     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ---------- ----------
 HIGHWATER AUDIT$                                      FLAGS
---------- -------------------------------------- ----------
      1229         50          1 1.0000E+28          0          0       1000
  60267151 --------------------------------                0

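This shows that the sequence was not removed with DROP SEQUENCE but more likely by deleting directly from obj$. A test confirms that after a normal DROP SEQUENCE, the rows in both obj$ and seq$ are removed together: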

SQL> create sequence xxxx;

Sequence created.

SQL> select obj#,type# from obj$ where name='XXXX';

      OBJ#      TYPE#
---------- ----------
     87383          6

SQL> SELECT * FROM SEQ$ WHERE OBJ#=87383;

      OBJ# INCREMENT$   MINVALUE   MAXVALUE     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ---------- ----------
 HIGHWATER AUDIT$                                      FLAGS
---------- -------------------------------------- ----------
     87383          1          1 1.0000E+28          0          0         20
         1 --------------------------------                0


SQL> DROP SEQUENCE XXXX;

Sequence dropped.

SQL> SELECT * FROM SEQ$ WHERE OBJ#=87383;

no rows selected

SQL> select obj#,type# from obj$ where name='XXXX';

no rows selected
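The experiment shows that a normal DROP SEQUENCE removes both the obj$ row and the seq$ row, so a seq$ row with no matching obj$ row of type# 6 (the type# of IDGEN1$ and XXXX above) points to obj$ having been modified directly. A minimal Python sketch of that consistency check, assuming both tables have already been extracted (for example into the test schema as above) and loaded as plain Python values; the function name is hypothetical.

```python
SEQUENCE_TYPE = 6  # obj$.type# for sequences, as seen for IDGEN1$ and XXXX above


def orphaned_seq_rows(obj_rows, seq_obj_nums):
    """Return sorted obj# values present in seq$ but without an obj$ row of type# 6.

    obj_rows: iterable of (obj#, type#) pairs taken from obj$
    seq_obj_nums: iterable of obj# values taken from seq$
    """
    sequence_objs = {obj for obj, obj_type in obj_rows if obj_type == SEQUENCE_TYPE}
    return sorted(set(seq_obj_nums) - sequence_objs)


# obj# 1228 and 1230 exist in the recovered obj$, 1229 only in seq$
print(orphaned_seq_rows([(1228, 2), (1230, 13)], [1229]))
```

In a consistent dictionary this list should be empty; here it would contain 1229.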

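Following this reasoning, the next step is to check whether other system sequences were deleted as well. The approach: in a healthy database, list the obj# of the SYS sequences, then compare them against the obj$ and seq$ tables in the test user.
Find the names of SYS sequence objects in test.obj$: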

SQL> select name,obj#,type# from test.obj$ where obj# in(
  2  select obj# from sys.obj$ where owner#=0 and type#=6)
  3  and type#=6;

no rows selected
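Because seq$ stores no names, the names have to come from the healthy database. The obj# comparison for 1228-1230 above suggests that core dictionary objects have the same obj# in databases of the same release; assuming that holds, the comparison just described can be sketched in Python as follows. The helper is hypothetical, and its inputs are result sets you would pull from the healthy database and from the recovered test.obj$ and test.seq$.

```python
def missing_sys_sequences(healthy, recovered_seq_objs, recovered_seq_rows):
    """Report SYS sequences that exist in a healthy database but not in the recovered obj$.

    healthy: {obj#: name} for owner#=0 and type#=6 in a healthy database of the same release
    recovered_seq_objs: obj# values with type#=6 in the recovered obj$
    recovered_seq_rows: obj# values present in the recovered seq$
    Returns {obj#: (name, still_in_seq$)}.
    """
    seq_objs = set(recovered_seq_objs)
    seq_rows = set(recovered_seq_rows)
    return {
        obj: (name, obj in seq_rows)
        for obj, name in sorted(healthy.items())
        if obj not in seq_objs
    }


print(missing_sys_sequences({1229: "IDGEN1$"}, [], [1229]))
```

Entries flagged True still have their seq$ row, so their attributes (increment, cache, highwater) are recoverable even though the name must be taken from the healthy database.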

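The query confirms that in the failed database the names of all built-in core sequences owned by SYS were deleted (explicitly removed from obj$); an analysis of seq$ confirms this.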

SQL> select name,ctime from test.obj$ where type#=6 and owner#=0;

no rows selected

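These checks show that essentially all dictionary sequences had been deleted from the failed database's obj$ (a healthy database should have more than 130). From here the handling is relatively simple: construct CREATE statements for the system sequences from the contents of seq$ and create them; the database then returns to normal, completing this recovery.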
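The exact statements used in this case are not shown. As a minimal sketch of the idea, the Python below turns one seq$ row plus a name taken from a healthy database into a CREATE SEQUENCE statement. Two assumptions to flag: START WITH is set to HIGHWATER, which is conservative (it may skip values that were cached but never used, but should not reissue values already handed out), and the MAXVALUE used here is a placeholder, because SQL*Plus only displays it rounded as 1.0000E+28 and the exact value should be read from seq$. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SeqRow:
    """The seq$ columns needed to rebuild a CREATE SEQUENCE statement."""
    increment: int   # INCREMENT$
    minvalue: int    # MINVALUE
    maxvalue: int    # MAXVALUE (read the exact value; SQL*Plus shows it rounded)
    cycle: int       # CYCLE#, 0 means NOCYCLE
    order: int       # ORDER$, 0 means NOORDER
    cache: int       # CACHE, 0 means NOCACHE
    highwater: int   # HIGHWATER, used as START WITH (assumption)


def create_sequence_ddl(owner: str, name: str, row: SeqRow) -> str:
    """Build a CREATE SEQUENCE statement from one seq$ row and a name from a healthy database."""
    parts = [
        f"CREATE SEQUENCE {owner}.{name}",
        f"INCREMENT BY {row.increment}",
        f"MINVALUE {row.minvalue}",
        f"MAXVALUE {row.maxvalue}",
        f"START WITH {row.highwater}",
        f"CACHE {row.cache}" if row.cache > 0 else "NOCACHE",
        "CYCLE" if row.cycle else "NOCYCLE",
        "ORDER" if row.order else "NOORDER",
    ]
    return " ".join(parts)


# Values from the seq$ row of obj# 1229 shown above.
exact_maxvalue = 10**28 - 1  # placeholder: replace with the exact MAXVALUE read from seq$
row_1229 = SeqRow(increment=50, minvalue=1, maxvalue=exact_maxvalue,
                  cycle=0, order=0, cache=1000, highwater=60267151)
print(create_sequence_ddl("SYS", "IDGEN1$", row_1229))
```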

ORA-00756 ORA-10567: recovery with zero data loss

After the customer's virtualization failure was repaired, starting the database failed with ORA-600 kcratr_scan_lastbwr.

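This is a fairly common error, and usually a simple recover is enough. Sometimes, however, recovery then fails with ORA-600 3020 or with errors such as ORA-00756/ORA-10567, and this time we were unlucky enough to hit the latter: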

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 4005760, file
offset is 2750414848 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649

Next, try recovering the files one at a time:

SQL> recover datafile 10;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 4005760, file
offset is 2750414848 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649

SQL> recover datafile 9;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 9, block# 4158754, file
offset is 4003741696 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 9: '/u02/oradata/pacsdb/pacs55.3.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76660
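ORA-10567 prints a file offset next to the block number. Assuming an 8 KB db_block_size (not shown in these excerpts), the offsets printed for the three blocks reported in this case (4005760, 4158754 and, later, 3822912) equal block# × 8192 reduced modulo 2^32, which suggests the message truncates the real offset to 32 bits; the real offsets are all beyond 30 GB. A small Python check of that arithmetic, with hypothetical function names:

```python
BLOCK_SIZE = 8192  # assumption: 8 KB db_block_size for this database


def true_offset(block_no, block_size=BLOCK_SIZE):
    """Byte offset of a block inside the datafile (block 0 is the OS header block)."""
    return block_no * block_size


def printed_offset(block_no, block_size=BLOCK_SIZE):
    """The value printed by ORA-10567 here: the true offset truncated to 32 bits."""
    return true_offset(block_no, block_size) % 2**32


for file_no, block_no in [(10, 4005760), (9, 4158754), (10, 3822912)]:
    print(file_no, block_no, true_offset(block_no), printed_offset(block_no))
```

If such a block ever has to be read directly from the file, the offset should be computed from block# and block size rather than copied from the error message.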

Check the two affected files with dbv:

[oracle@oradb ~]$ dbv file=/u02/oradata/pacsdb/pacs55.3.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on Sat Jun 28 23:02:15 2025

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u02/oradata/pacsdb/pacs55.3.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2482487
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1655515
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 25017
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 15919
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 15364
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 311133131196 (72.1895485884)
[oracle@oradb ~]$ dbv file=/u02/oradata/pacsdb/pacs55.4.dbf 

DBVERIFY: Release 19.0.0.0.0 - Production on Sat Jun 28 23:04:59 2025

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u02/oradata/pacsdb/pacs55.4.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2466409
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1683244
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 16977
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 15909
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 11763
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 311133133727 (72.1895488415)
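dbv prints the highest block SCN both as a decimal number and in wrap.base form; the two are related by SCN = wrap × 2^32 + base. The same notation appears in the block recovery messages of the kddummy_blkchk case later in this document (scn 10.3791089036 is 46740761996). A minimal Python sketch of the conversion in both directions, with hypothetical function names:

```python
def scn_from_wrap_base(text):
    """Convert the 'wrap.base' form printed by dbv and the alert log into one SCN number."""
    wrap, base = (int(part) for part in text.split("."))
    return wrap * 2**32 + base


def wrap_base_from_scn(scn):
    """Inverse conversion: split an SCN into its wrap (high-order part) and base (low 32 bits)."""
    wrap, base = divmod(scn, 2**32)
    return f"{wrap}.{base}"


print(scn_from_wrap_base("72.1895485884"), scn_from_wrap_base("72.1895488415"))
```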

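This confirms that the datafiles themselves contain no corrupt blocks: a lost redo write, or some bug, made redo application fail for a small number of blocks, and the errors are all on index blocks. The failing blocks were therefore handled at a low level so that redo is simply not applied to them, and recovery was then completed. Data in all other blocks is preserved, which minimizes the loss (unlike modifying the file header SCN or forcing the database open).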

SQL> select file#,fuzzy from v$datafile_header;

     FILE# FUZ
---------- ---
	 1 NO
	 2 NO
	 3 NO
	 4 NO
	 5 NO
	 7 NO
	 8 NO
	 9 YES
	10 YES
	11 NO
	12 NO

     FILE# FUZ
---------- ---
	13 NO
	14 NO
	15 NO
	16 NO
	17 NO
	18 NO
	19 NO

18 rows selected.

SQL> recover  datafile 9 ;
Media recovery complete.
SQL> recover  datafile 10 ;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 3822912, file
offset is 1252524032 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649
 
SQL> recover  datafile 10;
Media recovery complete.

The database then opens normally; rebuild the affected objects:

SQL> alter database open;

Database altered.

SQL> select owner,object_name,object_type from dba_objects where data_object_id in(76649,76660);

OWNER   OBJECT_NAME          OBJECT_TYPE
------- -------------------- -----------
PACS55  STUDYINFO_DIAGRPTID  INDEX
PACS55  PACS_STUDYINFO_PK    INDEX

SQL> alter index PACS55.STUDYINFO_DIAGRPTID rebuild online parallel 4;

Index altered.

SQL> alter index PACS55.PACS_STUDYINFO_PK rebuild online parallel 4;

Index altered.

SQL> alter index PACS55.STUDYINFO_DIAGRPTID noparallel;

Index altered.

SQL> alter index PACS55.PACS_STUDYINFO_PK noparallel;

Index altered.

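At this point the database is fully recovered and the application can use it directly, with zero business data lost. We were fairly lucky this time: had table data been affected instead of indexes, it would have been more troublesome, but the data could still be recovered to the greatest extent possible (certainly with better results than forcing the database open or modifying file headers).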

Recovering a datafile that shrank to 32 KB

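Recently, after a customer restarted the OS, one of the database's datafiles shrank to 32 KB. I did not get the original scene (the customer had already attempted an RMAN restore). The alert log shows the initial error: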

Wed Jun 18 13:09:23 2025
alter database open
Block change tracking file is current.
Read of datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF' (fno 14) header failed with ORA-01210
Hex dump of (file 14, block 1) in trace file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_11208.trc
Corrupt block relative dba: 0x03800001 (file 14, block 1)
Completely zero block found during datafile header read
Rereading datafile 14 header failed with ORA-01210
Hex dump of (file 14, block 1) in trace file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_11208.trc
Corrupt block relative dba: 0x03800001 (file 14, block 1)
Completely zero block found during datafile header read
Errors in file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_11208.trc:
ORA-01122: database file 14 failed verification check
ORA-01110: data file 14: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF'
ORA-01210: data file header is media corrupt
ORA-1122 signalled during: alter database open...
Wed Jun 18 13:09:23 2025
Checker run found 1 new persistent data failures
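The relative dba 0x03800001 in the log above encodes both the file and the block number. For a smallfile tablespace the top 10 bits hold the relative file number and the low 22 bits hold the block number (a bigfile tablespace uses all 32 bits for the block number, so this sketch assumes smallfile). A minimal Python version of the decoding, with hypothetical function names:

```python
def decode_rdba(rdba):
    """Split a smallfile relative DBA into (relative file#, block#): 10 bits file, 22 bits block."""
    return rdba >> 22, rdba & 0x3FFFFF


def encode_rdba(file_no, block_no):
    """Build a smallfile relative DBA from a relative file# and a block#."""
    if not (0 <= file_no < 1024 and 0 <= block_no < 2**22):
        raise ValueError("out of range for a smallfile rdba")
    return (file_no << 22) | block_no


print(decode_rdba(0x03800001))
```

Decoding 0x03800001 gives file 14, block 1, matching the "(file 14, block 1)" in the alert log, that is, the datafile header block.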

After some unknown operation on the customer side (probably renaming file 14), the error became:

Thu Jun 19 16:04:19 2025
alter database open
Thu Jun 19 16:04:21 2025
Errors in file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_dbw0_13000.trc:
ORA-01157: cannot identify/lock data file 14 - see DBWR trace file
ORA-01110: data file 14: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF'
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
Errors in file d:\app\administrator\diag\rdbms\ORCL\ORCL\trace\ORCL_ora_12328.trc:
ORA-03113: end-of-file on communication channel
ORA-3113 signalled during: alter database open...

According to the customer, file 14 had shrunk to 32 KB; it is the file that was renamed with a .bak suffix.

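One of these files, bak0618, was restored by RMAN (the backups contained no valid copy of file 14, so the restored file only has the size at which the file was originally created):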

Thu Jul 07 16:57:05 2022
alter tablespace wasion add datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.dbf' size 10g autoextend on
Completed: alter tablespace wasion add datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.dbf' size 10g autoextend on


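So the file is definitely damaged, and there is no valid RMAN backup of it. Analysis of the backup script showed that each backup set holds one datafile, uncompressed, split into multiple 10 GB pieces.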

That part is fine, but at the end the script deletes backup files older than two days directly with an OS-level command.

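Here lies the problem: because the disk ran short of space, some backups failed, but the OS-level deletion still ran as usual. The earlier valid backups were deleted while the later ones had not succeeded, which is the main reason this file could not be restored. A serious reminder: manage RMAN backups with RMAN's own retention policy wherever possible, not with OS commands. Given the situation, an undelete tool was used to recover some of this file's backup sets, which were then cataloged into the control file: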

RMAN> list backup of datafile 14;


List of Backup Sets
===================


BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
35251   Incr 0  10.89G     DISK        00:01:20     15-JUN-25
  List of Datafiles in backup set 35251
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  14   0  Incr 758850903  15-JUN-25 D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF

  Backup Set Copy #2 of backup set 35251
  Device Type Elapsed Time Completion Time Compressed Tag
  ----------- ------------ --------------- ---------- ---
  DISK        00:01:20     26-JUN-25       NO         TAG20250615T220003

    List of Backup Pieces for backup set 35251 Copy #2
    BP Key  Pc# Status      Piece Name
    ------- --- ----------- ----------
    78307   1   AVAILABLE   H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1
    78308   2   AVAILABLE   H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_2_1

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
35266   Incr 0  1.81G      DISK        00:00:00     17-JUN-25
  List of Datafiles in backup set 35266
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  14      Full 759283192  17-JUN-25 D:\APP\ADMINISTRATOR\ORADATA\ORCL\WASION08.DBF

  Backup Set Copy #1 of backup set 35266
  Device Type Elapsed Time Completion Time Compressed Tag
  ----------- ------------ --------------- ---------- ---
  DISK        00:00:00     26-JUN-25       NO         TAG20250617T220049

    List of Backup Pieces for backup set 35266 Copy #1
    BP Key  Pc# Status      Piece Name
    ------- --- ----------- ----------
            1   DELETED                        <--- one backup piece file is missing
    78309   2   AVAILABLE   H:\BAIDUNETDISK\202506191452\L0_ORCL_20250617_79022_5E3S94MC_2_1
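The listing shows that backup set 35266 is incomplete: piece 1 is DELETED. A trivial Python sketch of that check over the Pc# and Status columns; the expected number of pieces per set is an assumption here (taken as the highest Pc# listed), and the function name is hypothetical. Note that this only checks catalog status, not content: as the restore attempt on backup set 35251 shows, pieces reported as AVAILABLE can still fail to restore.

```python
def unusable_pieces(pieces, total):
    """Return piece numbers 1..total that are not AVAILABLE.

    pieces: {piece_no: status} as shown by LIST BACKUP (Pc# and Status columns)
    total: number of pieces the backup set is expected to have
    """
    return [n for n in range(1, total + 1) if pieces.get(n) != "AVAILABLE"]


backup_sets = {
    35251: ({1: "AVAILABLE", 2: "AVAILABLE"}, 2),
    35266: ({1: "DELETED", 2: "AVAILABLE"}, 2),
}
for bs_key, (pieces, total) in backup_sets.items():
    print(bs_key, unusable_pieces(pieces, total))
```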

Attempt to restore from these backup pieces with RMAN:

RMAN> run
2> {
3> SET NEWNAME FOR DATAFILE 14 to 'H:\BaiduNetdisk\202506191452\14.dbf';
4> restore datafile 14;
5> }

executing command: SET NEWNAME

Starting restore at 26-JUN-25
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00014 to H:\BAIDUNETDISK\202506191452\14.DBF
channel ORA_DISK_1: restoring section 1 of 2
channel ORA_DISK_1: reading from backup piece H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 06/26/2025 08:35:53
ORA-19870: error while restoring backup piece H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1
ORA-00600: internal error code, arguments: [krbvalmrange_badfno], [1], [14], [], [], [], [], [], [], [], [], []

Errors in the alert log:

Thu Jun 26 08:25:26 2025
Checker run found 39 new persistent data failures
Thu Jun 26 08:35:51 2025
Datafile rdba reconstruction error, expected block greater than 804966, got 322047 for datafile 14
Corrupt block 804352 found during reading backup piece, 
file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1, corr_type=4
Reread of blocknum=804352, file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1. found valid data
Datafile rdba reconstruction error, expected block greater than 324095, got 55516 for datafile 14
Corrupt block 806400 found during reading backup piece, 
file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1, corr_type=4
Reread of blocknum=806400, file=H:\BAIDUNETDISK\202506191452\L0_ORCL_20250615_78847_VV3S3RQP_1_1. found valid data
Errors in file C:\APP\XFF\diag\rdbms\ORCL\orcl\trace\orcl_ora_19208.trc  (incident=177):
ORA-00600: internal error code, arguments: [krbvalmrange_badfno], [1], [14], [], [], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\ORCL\orcl\incident\incdir_177\orcl_ora_19208_i177.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jun 26 08:35:52 2025

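Further analysis with tools, together with the ORA-600 krbvalmrange_badfno error, essentially confirms that some of the blocks inside the undeleted backup piece files belong to other datafiles, which is why a normal restore fails. Given this, a tool was used to force-extract the recoverable blocks of datafile 14.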
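The tool used here is not named. As a hedged illustration of the idea behind such forced extraction, the Python sketch below walks a buffer in 8 KB steps, reads the 4-byte rdba stored at byte offset 4 of an Oracle block header, and keeps the chunks that claim to belong to file 14. Assumptions: 8 KB blocks, little-endian byte order (x86, as on this Windows server), smallfile rdba encoding, and blocks sitting at block-aligned offsets. That last point is not guaranteed inside RMAN backup pieces, so a real tool would search for valid block headers and verify checksums. Function names are hypothetical.

```python
import struct

BLOCK_SIZE = 8192  # assumption: 8 KB blocks


def scan_blocks(data, block_size=BLOCK_SIZE, byteorder="<"):
    """Yield (offset, file#, block#) for each block_size-aligned chunk of data.

    Reads the 4-byte rdba at byte 4 of the block header and decodes it as a
    smallfile rdba. Chunks whose first 16 bytes are all zero are skipped.
    """
    for offset in range(0, len(data) - block_size + 1, block_size):
        if not any(data[offset:offset + 16]):
            continue
        (rdba,) = struct.unpack_from(byteorder + "I", data, offset + 4)
        yield offset, rdba >> 22, rdba & 0x3FFFFF


def blocks_of_file(data, file_no, block_size=BLOCK_SIZE):
    """Return {block#: offset} for chunks whose rdba claims to belong to file_no."""
    return {blk: off for off, fno, blk in scan_blocks(data, block_size) if fno == file_no}
```

Chunks whose rdba names another file are exactly the kind of foreign blocks that make RMAN raise krbvalmrange_badfno; filtering on file# simply sets them aside.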

Next, carving fragments at the disk level located some blocks of the file that had not been overwritten.

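Combining the blocks force-extracted from the RMAN backups with the non-overwritten blocks carved from the disk, a check confirmed that roughly 2/3 of the data had been recovered.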
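The article does not say how the two sets of blocks were combined. One plausible rule, shown here only as an assumption, is to keep, for each block number, the copy whose header SCN is highest. The sketch reads the SCN from the block header (4-byte base at offset 8, 2-byte wrap at offset 12, per the commonly documented block header layout; little-endian assumed) and reports the fraction of the file's blocks for which some copy was found. Function names are hypothetical.

```python
import struct


def block_scn(block, byteorder="<"):
    """SCN stored in a block header: base (4 bytes at offset 8) plus wrap (2 bytes at offset 12)."""
    base, wrap = struct.unpack_from(byteorder + "IH", block, 8)
    return wrap * 2**32 + base


def merge_candidates(*sources):
    """Combine candidate copies of the same datafile blocks from several sources.

    Each source maps block# -> block bytes. For every block#, keep the copy whose
    header SCN is highest (assumption: the newest copy is the one to keep).
    """
    merged = {}
    for source in sources:
        for block_no, block in source.items():
            if block_no not in merged or block_scn(block) > block_scn(merged[block_no]):
                merged[block_no] = block
    return merged


def coverage(merged, total_blocks):
    """Fraction of the file's blocks for which at least one copy was found."""
    return len(merged) / total_blocks
```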

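Using the recovered file together with the other files of this tablespace, dul was used to extract the data into a new database, salvaging as much of it as possible.

This failure should never have happened, or at least should not have been this severe:
1. RMAN backups were maintained with an OS-level policy: even when a backup had failed, the files were still deleted at the OS level, leaving the damaged file without a single valid backup.
2. After the failure there was no awareness of preserving the scene: large amounts of data were written to the disk holding the 32 KB datafile (a nearly 1 TB datafile was copied on the same disk, and RMAN also restored by default into the original file location).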

ORA-600 kddummy_blkchk: database restarting in a loop

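A 10g RAC on the HP platform suddenly failed, and afterwards the database kept restarting itself after running for a while. The customer asked us to analyze and resolve it.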

Thu May  8 06:23:21 2025
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Thu May  8 06:23:21 2025
Thread 1 opened at log sequence 74302
  Current log# 1 seq# 74302 mem# 0: /dev/vgdata/rrac_redo01
Successful open of redo thread 1
Thu May  8 06:23:21 2025
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu May  8 06:23:21 2025
SMON: enabling cache recovery
Thu May  8 06:23:22 2025
Successfully onlined Undo Tablespace 1.
Thu May  8 06:23:22 2025
SMON: enabling tx recovery
Thu May  8 06:23:22 2025
Database Characterset is ZHS16CGB231280
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 4
replication_dependency_tracking turned off (no async multimaster replication found)
Thu May  8 06:23:23 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Starting background process QMNC
QMNC started with pid=22, OS id=15792
Thu May  8 06:23:25 2025
ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.
Thu May  8 06:23:25 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Thu May  8 06:23:26 2025
Completed: ALTER DATABASE OPEN
Thu May  8 06:23:26 2025
Doing block recovery for file 118 block 333578
Block recovery from logseq 74302, block 22 to scn 46740761996
Thu May  8 06:23:26 2025
Recovery of Online Redo Log: Thread 1 Group 1 Seq 74302 Reading mem 0
  Mem# 0: /dev/vgdata/rrac_redo01
Block recovery stopped at EOT rba 74302.33.16
Block recovery completed at rba 74302.33.16, scn 10.3791089036
Thu May  8 06:23:33 2025
Trace dumping is performing id=[cdmp_20250508062324]
Thu May  8 06:25:55 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Thu May  8 06:25:58 2025
ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.
Thu May  8 06:27:32 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
Doing block recovery for file 118 block 333578
Block recovery from logseq 74302, block 372 to scn 46740952565
Thu May  8 06:27:41 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.
Thu May  8 06:27:43 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Doing block recovery for file 118 block 333578
Block recovery from logseq 74302, block 372 to scn 46740952565
Thu May  8 06:27:45 2025
Recovery of Online Redo Log: Thread 1 Group 1 Seq 74302 Reading mem 0
  Mem# 0: /dev/vgdata/rrac_redo01
Block recovery completed at rba 74302.394.16, scn 10.3791279606
Thu May  8 06:27:47 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_smon_15721.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []
Thu May  8 06:28:07 2025
Errors in file /users/oracle/admin/orcl/bdump/orcl1_pmon_15690.trc:
ORA-00474: SMON process terminated with error
Thu May  8 06:28:07 2025
PMON: terminating instance due to error 474

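The restarts happen because the SMON process fails and brings the instance down, after which RAC automatically starts the database again, producing a restart loop. SMON fails mainly because of "ORACLE Instance orcl1 (pid = 13) - Error 607 encountered while recovering transaction (9, 33) on object 775794.": there is a transaction to roll back, but the rollback hits ORA-600 kddummy_blkchk and cannot complete, so SMON fails and the instance crashes. This problem is relatively simple to handle:
1. Use the ORA-600 kddummy_blkchk arguments to identify the affected object: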

ORA-00600: internal error code, arguments: [kddummy_blkchk], [a], [b], [c]

ARGUMENTS:
Arg [a] Absolute file number
Arg [b] Block number
Arg [c] Internal error code returned from kcbchk() which indicates the problem encountered.

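2. Suppress the rollback of the failing transaction, open the database, then rebuild the object identified via dba_extents (or from the object reported in the transaction recovery error), completing the recovery.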
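Step 1 can be scripted. The minimal Python sketch below assumes the alert log line formats shown above: it extracts file#, block# and the kcbchk code from the ORA-600 line, extracts the transaction pair and object number from the SMON "Error 607" line, and builds the usual dba_extents lookup (dba_extents.file_id is the absolute file number, which matches Arg [a]). Function names are hypothetical, and reading the (9, 33) pair as undo segment number and slot is an assumption.

```python
import re

BLKCHK_RE = re.compile(r"\[kddummy_blkchk\],\s*\[(\d+)\],\s*\[(\d+)\],\s*\[(\d+)\]")
TXN_RE = re.compile(
    r"Error 607 encountered while recovering transaction \((\d+), (\d+)\) on object (\d+)"
)


def parse_blkchk(line):
    """Return (absolute file#, block#, kcbchk code) from an ORA-600 [kddummy_blkchk] line, or None."""
    m = BLKCHK_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None


def parse_smon_607(line):
    """Return ((usn, slot), object#) from the SMON 'Error 607' line, or None.

    The (usn, slot) meaning of the pair is an assumption.
    """
    m = TXN_RE.search(line)
    if not m:
        return None
    usn, slot, obj = (int(g) for g in m.groups())
    return (usn, slot), obj


def extent_query(file_no, block_no):
    """SQL text that finds the segment owning file_no/block_no in dba_extents."""
    return (
        "SELECT owner, segment_name, segment_type FROM dba_extents "
        f"WHERE file_id = {file_no} "
        f"AND {block_no} BETWEEN block_id AND block_id + blocks - 1"
    )


line = "ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [333578], [18019], [], [], [], []"
file_no, block_no, code = parse_blkchk(line)
print(extent_query(file_no, block_no))
```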

Currently known bugs related to ORA-00600: internal error code, arguments: [kddummy_blkchk], [a], [b], [c]:

Bug Fixed Description
12349316 11.2.0.4, 12.1.0.2, 12.2.0.1 DBMS_SPACE_ADMIN.TABLESPACE_FIX_BITMAPS fails with ORA-600 [kddummy_blkchk] / ORA-600 [kdBlkCheckError] / ORA-607
17325413 11.2.0.3.BP23, 11.2.0.4.2, 11.2.0.4.BP04, 12.1.0.1.3, 12.1.0.2, 12.2.0.1 Drop column with DEFAULT value and NOT NULL definition ends up with Dropped Column Data still on Disk leading to Corruption
13715932 11.2.0.4, 12.1.0.1 ORA-600 [kddummy_blkchk] [18038] while adding extents to a large datafile
12417369 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 Block corruption from rollback on compressed table
10324526 10.2.0.5.4, 11.1.0.7.8, 11.2.0.2.3, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.1 ORA-600 [kddummy_blkchk] [6106] / corruption on COMPRESS table in TTS
10113224 11.2.0.3, 12.1.0.1 Index coalesce may generate invalid redo if blocks in the buffer cache are invalid/corrupted
9726702 11.2.0.3, 12.1.0.1 DBMS_SPACE_ADMIN.assm_segment_verify reports HWM related inconsistencies
9724970 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check
9711859 10.2.0.5.1, 11.1.0.7.6, 11.2.0.2, 12.1.0.1 ORA-600 [ktsptrn_fix-extmap] / ORA-600 [kdblkcheckerror] during extent allocation caused by bug 8198906
9581240 11.1.0.7.9, 11.2.0.2, 12.1.0.1 Corruption / ORA-600 [kddummy_blkchk] [6101] / ORA-600 [7999] after RENAME operation during ROLLBACK
9350204 11.2.0.3, 12.1.0.1 Spurious ORA-600 [kddummy_blkchk] .. [6145] during CR operations on tables with ROWDEPENDENCIES
9231605 11.1.0.7.4, 11.2.0.1.3, 11.2.0.1.BP02, 11.2.0.2, 12.1.0.1 Block corruption with missing row on a compressed table after DELETE
9119771 11.2.0.2, 12.1.0.1 OERI [kddummy_blkchk]…[6108] from ‘SHRINK SPACE CASCADE’
9019113 11.2.0.1.BP02, 11.2.0.2, 12.1.0.1 ORA-600 [17182] ORA-7445 [memcpy] ORA-600 [kdBlkCheckError] for OLTP COMPRESS table in OLTP Compression REDO during RECOVERY
8951812 11.2.0.2, 12.1.0.1 Corrupt index by rebuild online. Possible OERI [kddummy_blkchk] by SMON
8720802 10.2.0.5, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.1 Add check for row piece pointing to itself (db_block_checking,dbv,rman,analyze)
8331063 11.2.0.3, 12.1.0.1 Corrupt Undo. ORA-600 [2015] in Undo Block During Rollback
6523037 11.2.0.1.BP07, 11.2.0.2.2, 11.2.0.2.BP01, 11.2.0.3, 12.1.0.1 Corruption / ORA-600 [kddummy_blkchk] [6110] on update
8277580 11.1.0.7.2, 11.2.0.1, 11.2.0.2, 12.1.0.1 Corruption on compressed tables during Recovery and Quick Multi Delete (QMD).
9964102 11.2.0.1 OERI:2015 / OERI:kddummy_blkchk / undo corruption from supplemental logging with compressed tables
8613137 11.1.0.7.2, 11.2.0.1 ORA-600 updating table with DEFERRED constraints
8360192 11.1.0.7.6, 11.2.0.1 ORA-600 [kdBlkCheckError] [6110] / corruption from insert
8239658 10.2.0.5, 11.2.0.1 Dump / corruption writing row to compressed table
8198906 10.2.0.5, 11.2.0.1 OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
7715244 11.1.0.7.2, 11.2.0.1 Corruption on compressed tables. Error codes 6103 / 6110
7662491 10.2.0.4.2, 10.2.0.5, 11.1.0.7.4, 11.2.0.1 Array Update can corrupt a row. Errors OERI[kghstack_free1] or OERI[kddummy_blkchk][6110]
7411865 10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1 OERI:13030 / ORA-1407 / block corruption from UPDATE .. RETURNING DML with trigger
7331181 11.2.0.1 ORA-1555 or OERI [kddummy_blkchk] [file#] [block#] [6126] during CR Rollback in query
7293156 11.1.0.7, 11.2.0.1 ORA-600 [2023] by Parallel Transaction Rollback when applying Multi-block undo Head-piece / Tail-piece
7041254 11.1.0.7.5, 11.2.0.1 ORA-19661 during RMAN restore check logical of compressed backup / IOT dummy key
6760697 10.2.0.4.3, 10.2.0.5, 11.1.0.7, 11.2.0.1 DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY does not detect certain segment header block corruption
6647480 10.2.0.4.4, 10.2.0.5, 11.1.0.7.3, 11.2.0.1 Corruption / OERI [kddummy_blkchk] .. [18021] with ASSM
6134368 10.2.0.5, 11.2.0.1 ORA-1407 / block corruption from UPDATE .. RETURNING DML with trigger – SUPERCEDED
6057203 10.2.0.4, 11.1.0.7, 11.2.0.1 Corruption with zero length column (ZLC) / OERI [kcbchg1_6] from Parallel update
6653934 10.2.0.4.2, 10.2.0.5, 11.1.0.7 Dump / block corruption from ONLINE segment shrink with ROWDEPENDENCIES
6674196 10.2.0.4, 10.2.0.5, 11.1.0.6 OERI / buffer cache corruption using ASM, OCFS or any ksfd client like ODM
5599596 10.2.0.4, 11.1.0.6 Block corruption / OERI [kddummy_blkchk] on clustered or compressed tables
5496041 10.2.0.4, 11.1.0.6 OERI[6006] / index corruption on compressed index
5386204 10.2.0.4.1, 10.2.0.5, 11.1.0.6 Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment
5363584 10.2.0.4, 11.1.0.6 Array insert into table can corrupt redo
4602031 10.2.0.2, 11.1.0.6 Block corruption from UPDATE or MERGE into compressed table
4493447 11.1.0.6 Spurious ORA-600 [kddummy_blkchk] [file#] [block#] [6145] on rollback of array update
4329302 11.1.0.6 OERI [kddummy_blkchk] [file#] [block#] [6145] on rollback of update with logminer
6075487 10.2.0.4 OERI[kddummy_blkchk]..[18020/18026] for DDL on plugged ASSM tablespace with FLASHBACK
4054640 10.1.0.5, 10.2.0.1 Block corruption / OERI [kddummy_blkchk] at physical standby
4000840 10.1.0.4, 10.2.0.1, 9.2.0.7 Update of a row with more than 255 columns can cause block corruption
3772033 9.2.0.7, 10.1.0.4, 10.2.0.1 OERI[ktspfmb_create1] creating a LOB in ASSM using 2k blocksize
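You can narrow the list above to candidates for a given Arg 1 (the internal error code returned by kcbchk()) with a small Python filter over rows copied from it. This is a sketch. The function names are hypothetical, and a code match in the description is only a first pass. Whether a bug actually applies also depends on the database version and the operation involved, and the code does not check either.

```python
import re

# One fix-version token, e.g. "11.2.0.4", "11.2.0.4.2", "11.2.0.3.BP23", "11.2.0.2.GIPSU05".
VERSION = r"\d+(?:\.\d+)+(?:\.[A-Z]+\d+)?"
ROW_RE = re.compile(rf"^(\d+)\s+({VERSION}(?:,\s*{VERSION})*)\s+(.*)$")

# A few rows copied verbatim from the list above.
ROWS = [
    "10324526 10.2.0.5.4, 11.1.0.7.8, 11.2.0.2.3, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.1 ORA-600 [kddummy_blkchk] [6106] / corruption on COMPRESS table in TTS",
    "6523037 11.2.0.1.BP07, 11.2.0.2.2, 11.2.0.2.BP01, 11.2.0.3, 12.1.0.1 Corruption / ORA-600 [kddummy_blkchk] [6110] on update",
    "8360192 11.1.0.7.6, 11.2.0.1 ORA-600 [kdBlkCheckError] [6110] / corruption from insert",
    "7715244 11.1.0.7.2, 11.2.0.1 Corruption on compressed tables. Error codes 6103 / 6110",
    "7662491 10.2.0.4.2, 10.2.0.5, 11.1.0.7.4, 11.2.0.1 Array Update can corrupt a row. Errors OERI[kghstack_free1] or OERI[kddummy_blkchk][6110]",
    "4493447 11.1.0.6 Spurious ORA-600 [kddummy_blkchk] [file#] [block#] [6145] on rollback of array update",
]


def parse_bug_row(row):
    """Split 'bug# versions description' into (bug, [fix versions], description)."""
    m = ROW_RE.match(row.strip())
    if m is None:
        return None
    bug, versions, desc = m.groups()
    return int(bug), [v.strip() for v in versions.split(",")], desc


def bugs_mentioning_code(rows, code):
    """Bug numbers whose description mentions the given kcbchk error code."""
    pattern = re.compile(rf"\b{code}\b")
    hits = []
    for row in rows:
        parsed = parse_bug_row(row)
        if parsed and pattern.search(parsed[2]):
            hits.append(parsed[0])
    return hits


if __name__ == "__main__":
    print(bugs_mentioning_code(ROWS, 6110))
```

Splitting out the fix versions separately keeps the code open to a later version comparison. That check is deliberately left out here, because the patch-level suffixes (BP, GIPSU) do not order simply.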

 

A case record: an ASM disk was added to a VG, and the database was recovered and opened directly

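Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]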

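Without realizing the disk was in use as an ASM disk, the customer partitioned it and created a PV on it. The PV was added to a VG and its space was allocated to an LV, which caused the database to fail.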

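Analysis at the operating system level confirmed that the customer had repurposed one disk of the DATA disk group, which caused the database errors: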

WARNING: ASMB force dismounting group 2 (DATA) due to failover
SUCCESS: diskgroup DATA was dismounted
2025-05-04T07:03:19.910082+08:00
KCF: read, write or open error, block=0x201544 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.918972+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwc_18507.trc:
2025-05-04T07:03:19.952045+08:00
KCF: read, write or open error, block=0x2013e7 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.964538+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw7_18486.trc:
2025-05-04T07:03:19.967133+08:00
KCF: read, write or open error, block=0x230e71 online=1
        file=105 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_64.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.973289+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw2_18466.trc:
2025-05-04T07:03:19.978514+08:00
KCF: read, write or open error, block=0x1f6e91 online=1
        file=86 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_52.dbf'
        error=15078 txt: ''
2025-05-04T07:03:19.991060+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwd_18511.trc:
2025-05-04T07:03:19.995762+08:00
KCF: read, write or open error, block=0x7f8 online=1
        file=15 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/undotbs01.dbf'
        error=15078 txt: ''
2025-05-04T07:03:20.006862+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwa_18498.trc:
2025-05-04T07:03:20.020739+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_imr0_18937.trc:
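The KCF messages above print each block number in hex, with the file number on the following line. The minimal Python sketch below (function name hypothetical) pairs each `block=0x...` line with the `file=N '...'` line after it. It converts the block to decimal, which is the form a file_id/block_id lookup such as dba_extents expects. It assumes the two-line layout shown in this log and ignores the other lines.

```python
import re

BLOCK_RE = re.compile(r"block=0x([0-9a-fA-F]+)")
FILE_RE = re.compile(r"file=(\d+) '([^']*)'")

SAMPLE = """\
KCF: read, write or open error, block=0x201544 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
        error=15078 txt: ''
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbwc_18507.trc:
KCF: read, write or open error, block=0x2013e7 online=1
        file=102 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_61.dbf'
KCF: read, write or open error, block=0x230e71 online=1
        file=105 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_64.dbf'
KCF: read, write or open error, block=0x1f6e91 online=1
        file=86 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/xifenfei_52.dbf'
KCF: read, write or open error, block=0x7f8 online=1
        file=15 '+DATA/ORCL/F7D939D6DBE06C71E053C30114AC1F10/DATAFILE/undotbs01.dbf'
"""


def kcf_errors(log_text):
    """Return (file#, decimal block#, file name) for each KCF read/write/open error."""
    found = []
    block = None
    for line in log_text.splitlines():
        m = BLOCK_RE.search(line)
        if m:
            block = int(m.group(1), 16)   # KCF prints the block number in hex
            continue
        m = FILE_RE.search(line)
        if m and block is not None:
            found.append((int(m.group(1)), block, m.group(2)))
            block = None
    return found


if __name__ == "__main__":
    for file_no, block_no, name in kcf_errors(SAMPLE):
        print(file_no, block_no, name.rsplit("/", 1)[-1])
```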

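This customer was fairly lucky. After the disk was repurposed, not much data was written to the corresponding LV, so only a small portion of the disk was overwritten.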

[root@rac01 rules.d]# df -h
Filesystem             Size  Used Avail  Use% Mounted on
/dev/mapper/nlas-root  800G  272G  528G   34% /
devtmpfs               284G     0  284G    0% /dev
tmpfs                  284G  637M  283G    1% /dev/shm
tmpfs                  284G  4.0G  280G    2% /run
tmpfs                  284G     0  284G    0% /sys/fs/cgroup
/dev/mapper/nlas-home  200G   64M  200G    1% /home
/dev/sda1              197M  158M   40M   80% /boot
tmpfs                   57G   40K   57G    1% /run/user/0
tmpfs                   57G   48K   57G    1% /run/user/1000
[root@rac01 rules.d]# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  nlas lvm2 a--  564.00g    0 
  /dev/sdb1  nlas lvm2 a--   <2.00t 1.51t
[root@rac01 rules.d]# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  nlas   2   3   0 wz--n- 2.55t 1.51t
[root@rac01 rules.d]# lvs
  LV   VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home nlas -wi-ao---- 200.00g                                                    
  root nlas -wi-ao---- 800.00g                                                    
  swap nlas -wi-ao----  64.00g                                                    
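The LVM output above also gives a rough idea of how much of the former ASM disk was handed out. /dev/sda2 is fully allocated (PFree 0), so whatever the three LVs need beyond its 564 GiB must come from /dev/sdb1. The Python check below uses only the figures printed above. It ignores LVM metadata overhead, which is an assumption. It also measures allocation only; how much was actually written, which determines what was overwritten, is not visible in this output.

```python
# Figures copied from the pvs/lvs output above. Lowercase LVM units (g, t) are
# powers of 1024, so everything here is in GiB/TiB.
LV_SIZES_GIB = {"home": 200.00, "root": 800.00, "swap": 64.00}
SDA2_SIZE_GIB = 564.00      # /dev/sda2, PFree 0: fully allocated
SDB1_PSIZE_TIB = 2.00       # printed as "<2.00t", i.e. slightly below 2 TiB
SDB1_PFREE_TIB = 1.51


def lv_space_on_sdb1_gib():
    """LV space that cannot fit on sda2 and therefore must sit on sdb1."""
    return sum(LV_SIZES_GIB.values()) - SDA2_SIZE_GIB


def sdb1_allocated_from_pvs_gib():
    """Cross-check from pvs (PSize - PFree); only as precise as the 2-decimal TiB figures."""
    return (SDB1_PSIZE_TIB - SDB1_PFREE_TIB) * 1024


if __name__ == "__main__":
    print(f"from lvs: {lv_space_on_sdb1_gib():.0f} GiB")
    print(f"from pvs: about {sdb1_allocated_from_pvs_gib():.0f} GiB")
```

Both routes land at roughly 500 GiB of sdb1 allocated to LVs. The small gap comes from the two-decimal rounding in the pvs columns.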

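Low-level analysis of the disk showed that even the backup copies of the disk header were damaged. Deeper analysis confirmed that f1b1 was located on the 10th AU of the sdb disk. Using this information, we loaded the disk group with the DUL tool and analyzed the metadata. All metadata needed to recover the data could be loaded normally.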
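Locating a given AU on the raw disk is plain arithmetic: AU number times AU size. The Python sketch below assumes the 4 MB AU size that this article cites as the 19c default, and the 4 KiB ASM metadata block size. It also reads "the 10th AU" as AU number 10, counted from 0, which is an assumption. Offsets are relative to the start of the device ASM was using; whether that was the whole /dev/sdb or a partition is not stated. The dd helper and the output file name are hypothetical. The command only reads from the device into a file.

```python
AU_SIZE = 4 * 1024 * 1024   # 4 MiB AU, the 19c default cited in this article
META_BLOCK = 4096           # ASM metadata block size


def au_offset(au_number, au_size=AU_SIZE):
    """Byte offset of the first byte of an AU, counted from the start of the ASM disk."""
    return au_number * au_size


def meta_block_offset(au_number, block_no, au_size=AU_SIZE, block_size=META_BLOCK):
    """Byte offset of a 4 KiB metadata block inside a given AU."""
    if not 0 <= block_no < au_size // block_size:
        raise ValueError("block number is outside the AU")
    return au_offset(au_number, au_size) + block_no * block_size


def dd_read_au_cmd(device, au_number, out_file, au_size=AU_SIZE):
    """dd command that copies one whole AU; with bs equal to the AU size, skip counts AUs."""
    return f"dd if={device} of={out_file} bs={au_size} skip={au_number} count=1"


if __name__ == "__main__":
    print(au_offset(10))                                  # start of AU 10 in bytes
    print(dd_read_au_cmd("/dev/sdb", 10, "au10.bin"))     # hypothetical output file
```

With these assumptions, AU 10 starts 40 MiB (41,943,040 bytes) into the disk. That is early on the device, which is why the metadata is exposed to anything written at the front of a new PV.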

We used DUL to extract the data directly to the file system and then opened the database successfully.

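We then checked for corrupt blocks with RMAN. The database was more than 3 TB and had fewer than 5,000 corrupt blocks, which is a very good result. We dealt with the objects containing corrupt blocks and completed the recovery. Several factors made such a good result possible: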
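1) Writes stopped immediately after the ASM disk was added to the VG and allocated to an LV. This avoided the risk of new data overwriting the ASM disk.
2) Because this is a 19c database, the default AU size is 4 MB. The database file data therefore sits relatively far from the start of the disk, which somewhat lowered the chance of it being overwritten.
3) Because the file system is XFS, much less was overwritten than would have been with ext4.
4) The disk is a cloud SSD, and TRIM was not triggered.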
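To put "fewer than 5,000 corrupt blocks in a database of more than 3 TB" into proportion, the Python arithmetic below converts the block count to bytes. The 8 KB block size is an assumption; the article does not state db_block_size. The text gives both inputs as bounds, so the result is an upper bound on the corrupt share.

```python
BLOCK_SIZE = 8192          # assumed db_block_size; the article does not state it
TIB = 1024 ** 4


def corrupt_share(corrupt_blocks, db_bytes, block_size=BLOCK_SIZE):
    """Fraction of the database's bytes that lie in corrupt blocks."""
    return corrupt_blocks * block_size / db_bytes


if __name__ == "__main__":
    # "Fewer than 5000" blocks over "more than 3T" gives an upper bound.
    share = corrupt_share(5000, 3 * TIB)
    print(f"{5000 * BLOCK_SIZE / 1024**2:.1f} MiB corrupt, at most {share:.5%} of the data")
```

Under these assumptions, about 39 MiB is damaged, roughly one byte in eighty thousand.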
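Summary of earlier cases of similar ASM disk failures and recoveries:
Recovery of an ASM disk added to a VG
Recovery of an ASM disk damaged by dd
Recovery of lost ASM disk partitions
pvid=yes preventing ASM from mounting
Recovery of an abnormal ASM disk header on Windows
Another case of an ASM disk added to a VG
Recovery of an ASM disk group broken by running pvcreate on an ASM disk
Recovery of an ASM disk that was added to another disk group
Yet another recovery of an ASM disk mistakenly added to a VG and used to extend an LV
Another recovery of an ASM disk formatted as an ext3 file system
A perfect recovery of an ASM disk formatted as NTFS
Recovery of an ASM disk group that could not mount because a PVID was mistakenly set on an ASM disk
Recovery of an ASM disk that was partitioned and formatted as ext4
Oracle ASM disk format recovery: formatted as an ext4 file system
A zero-data-loss recovery case after an ASM disk was recreated with oracleasm createdisk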