ORA-704 ORA-604 ORA-1426故障分析处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-704 ORA-604 ORA-1426故障分析处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

服务器异常断电,通过分析alert日志发现现场的一些操作,数据库启动最初报ORA-00322 ORA-00312错

Tue Sep 23 20:06:52 2025
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m000_6056.trc:
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG'
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m000_6056.trc:
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG'
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m000_6056.trc:
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG'

通过隐含参数强制拉库报ORA-00704 ORA-00604 ORA-01426错误

Tue Sep 23 23:53:52 2025
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 444541390
Resetting resetlogs activation ID 1705450279 (0x65a71b27)
Online log D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG: Thread 1 Group 3 was previously cleared
Tue Sep 23 23:53:53 2025
Setting recovery target incarnation to 6
Tue Sep 23 23:53:53 2025
Assigning activation ID 1740400222 (0x67bc665e)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Sep 23 23:53:53 2025
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_7792.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01426: 数字溢出
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_7792.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01426: 数字溢出
Error 704 happened during db open, shutting down database
USER (ospid: 7792): terminating the instance due to error 704
Instance terminated by USER, pid = 7792
ORA-1092 signalled during: alter database open resetlogs...

我接手故障之后尝试启动库,依旧报ORA-01092 ORA-00704 ORA-00604 ORA-01426错误无法启动库

C:\Users\XFF>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期四 9月 25 22:50:21 2025

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


SQL> recover database;
完成介质恢复。
SQL> alter database open ;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01426: numeric overflow
进程 ID: 18152
会话 ID: 14 序列号: 1

这个报错是number数据溢出,那就可能是由于number类型的数据值不对,通过对启动过程跟踪

PARSING IN CURSOR #5 len=74 dep=1 uid=0 oct=3 lid=0 tim=1677733680660 hv=3309402135 
    ad='7fffeef07300' sqlid='5n1fs4m2n2y0r'
select pos#,intcol#,col#,spare1,bo#,spare2,spare3 from icol$ where obj#=:1
END OF STMT
BINDS #5:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1c1d70b8  bln=22  avl=03  flg=05
  value=425
EXEC #5:c=0,e=69,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=299250003,tim=1677733680725
WAIT #5: nam='db file sequential read' ela= 152 file#=1 block#=386 blocks=1 obj#=42 tim=1677733680892
FETCH #5:c=0,e=178,p=1,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=299250003,tim=1677733680913
FETCH #5:c=0,e=3,p=0,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=299250003,tim=1677733680931
FETCH #5:c=0,e=2,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=4,plh=299250003,tim=1677733680943
CLOSE #5:c=0,e=1,dep=1,type=3,tim=1677733680957

*** 2025-09-25 21:29:42.634
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=10, mask=0x0)
----- Error Stack Dump -----
ORA-01426: 数字溢出
----- Current SQL Statement for this session (sql_id=bkdusjx00dsmc) -----
select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,i.flags, i.property,i.pctfree$,i.initrans,
i.maxtrans,i.blevel,i.leafcnt,i.distkey, i.lblkkey,i.dblkkey,i.clufac,i.cols,i.analyzetime,i.samplesize,
i.dataobj#, nvl(i.degree,1),nvl(i.instances,1),i.rowcnt,mod(i.pctthres$,256),i.indmethod#,
i.trunccnt,nvl(c.unicols,0),nvl(c.deferrable#+c.valid#,0), nvl(i.spare1,i.intcols),i.spare4,
spare2,spare6, decode(i.pctthres$,null,null, mod(trunc(i.pctthres$/256),256)) 
from ind$ i, (select enabled, min(cols) unicols, min(to_number(bitand(defer,1))) 
deferrable#, min(to_number(bitand(defer,4))) valid# from cdef$ where obj#=:1 and enabled > 1 group by enabled)
 c where i.obj#=c.enabled(+) and i.bo#=:1 order by i.obj#

通过上述可以是在执行上述sql的时候,遭遇异常,进一步查看trace,发现在该block中有异常信息

Block header dump:  0x004000a5
 Object id on Block? Y
 seg/obj: 0x2  csc: 0x00.10daa102  itc: 2  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.01b.00000009  0x00c001b2.0003.09  C---    0  scn 0x0000.000021b5
0x02   0x000a.010.0003827e  0x00c00349.35ea.2f  --U-    2  fsc 0x0000.10daa11d
bdba: 0x004000a5

这个部分可以确认,异常对象是dataobj#为2(c_obj#的cluster),rdba为0x004000a5(file 1 block 165),对应的block dump中有以下信息异常

tab 3, row 4, @0x19a6
tl: 366 fb: -CH-FL-- lb: 0x0  cc: 28 cki: 4
col  0: [ 3]  c2 05 1b
col  1: [ 3]  c2 1d 1b
col  2: [44]
 80 02 c1 02 03 c2 1d 59 01 80 02 c1 03 02 c1 0b 02 c1 03 03 c2 03 38 ff 02
 c1 02 03 c2 15 33 01 80 02 c1 02 03 c2 02 3d 04 c3 04 62
col  3: [35]
 02 c1 02 02 c1 02 03 c2 17 10 07 78 7c 07 1b 0f 04 24 04 c3 04 62 23 04 c3
 04 62 23 02 c1 03 ff ff ff 02
col  4: [193]
 03 6c 00 1c 04 03 c2 05 1a 03 c2 05 1a 01 80 02 c1 02 03 c2 1d 51 01 80 02
 c1 03 02 c1 0b 02 c1 03 03 c2 03 38 ff 02 c1 02 03 c2 15 33 01 80 02 c1 02
 03 c2 02 3d 04 c3 04 5c 3d 02 c1 02 02 c1 02 03 c2 17 13 07 78 7c 07 1b 0f
 04 24 04 c3 04 62 23 04 c3 04 62 23 02 c1 03 ff ff ff 02 c1 03 6c 00 1c 02
 03 c2 05 16 03 c2 05 16 01 80 02 c1 02 03 c2 1d 11 01 80 02 c1 03 02 c1 0b
 02 c1 03 03 c2 03 38 ff 02 c1 04 02 c1 03 02 c1 02 02 c1 02 02 c1 0f 03 c2
 19 3b 02 c1 02 02 c1 02 03 c2 19 3b 07 78 7c 07 1b 17 04 23 03 c2 19 3b 03
 c2 19 3b 02 c1 03 ff ff ff 02 c1 03 6c 00 1c 01 03 c2
col  5: [ 5]  14 03 c2 05 14
col  6: [ 1]  80
col  7: [ 2]  c1 02
col  8: [ 3]  c2 1a 35
col  9: [ 1]  80
col 10: [ 2]  c1 03
col 11: [ 2]  c1 0b
col 12: [ 2]  c1 03
col 13: [ 3]  c2 03 38
col 14: *NULL*
col 15: [ 2]  c1 02
col 16: [ 3]  c2 15 33
col 17: [ 2]  c1 02
col 18: [ 2]  c1 02
col 19: [ 2]  c1 05
col 20: [ 3]  c2 08 4c
col 21: [ 2]  c1 02
col 22: [ 2]  c1 02
col 23: [ 2]  c1 05
col 24: [ 7]  78 6e 03 1e 0b 12 32
col 25: [ 3]  c2 08 4c
col 26: [ 3]  c2 08 4c
col 27: [ 2]  c1 03

这个里面显示是c_obj#这个簇中的第三个表第四行记录,通过查询正常库,确认第三个表是什么对象

SQL> select obj# from tab$ where dataobj#=2 and tab#=3;

      OBJ#
----------
        19

SQL> select name from obj$ where obj#=19;

NAME
------------------------------
IND$

通过上述可以去人tab 3为ind$对象,进一步分析ind$的表结构

SQL> desc ind$
 名称                                      是否为空? 类型
 ----------------------------------------- -------- ----------------------------
 OBJ#                                      NOT NULL NUMBER
 DATAOBJ#                                           NUMBER
 TS#                                       NOT NULL NUMBER
 FILE#                                     NOT NULL NUMBER
 BLOCK#                                    NOT NULL NUMBER
 BO#                                       NOT NULL NUMBER
 INDMETHOD#                                NOT NULL NUMBER
 COLS                                      NOT NULL NUMBER
 PCTFREE$                                  NOT NULL NUMBER
 INITRANS                                  NOT NULL NUMBER
 MAXTRANS                                  NOT NULL NUMBER
 PCTTHRES$                                          NUMBER
 TYPE#                                     NOT NULL NUMBER
 FLAGS                                     NOT NULL NUMBER
 PROPERTY                                  NOT NULL NUMBER
 BLEVEL                                             NUMBER
 LEAFCNT                                            NUMBER
 DISTKEY                                            NUMBER
 LBLKKEY                                            NUMBER
 DBLKKEY                                            NUMBER
 CLUFAC                                             NUMBER
 ANALYZETIME                                        DATE
 SAMPLESIZE                                         NUMBER
 ROWCNT                                             NUMBER
 INTCOLS                                   NOT NULL NUMBER
 DEGREE                                             NUMBER
 INSTANCES                                          NUMBER
 TRUNCCNT                                           NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             DATE

对dump出来的block记录进行转换为实际值

SQL> select utl_raw.cast_to_number('c2051b') value from dual;

     VALUE
----------
       426

SQL> select utl_raw.cast_to_number('c21d1b') value from dual;

     VALUE
----------
      2826

SQL>  select utl_raw.cast_to_number('02c10202c10203c2171007787c071b0f042404c304622304c304622302c103ffffff02')
  2  value from dual;
 select utl_raw.cast_to_number('02c10202c10203c2171007787c071b0f042404c304622304c304622302c103ffffff02')
        *
第 1 行出现错误:
ORA-06502: PL/SQL: 数字或值错误
ORA-06512: 在 "SYS.UTL_RAW", line 388

通过上述分析证明,在ind$表的obj#为426行的记录的第三列无法转换为正常number记录,证明该值异常,从而导致数据库在执行sql_id=bkdusjx00dsmc这个sql的时候报错,从而无法open库,dbv对system文件进行检测发现有一些逻辑层面损坏

C:\Users\XFF>dbv file=H:\BaiduNetdisk\orcl\SYSTEM01.DBF

DBVERIFY: Release 11.2.0.4.0 - Production on 星期四 9月 25 19:17:17 2025

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = H:\BAIDUNETDISK\ORCL\SYSTEM01.DBF
Block Checking: DBA = 4194469, Block Type = KTB-managed data block
data header at 0x5c2425c
kdbchk: row does not end within block
        table=1  slot=0
        len=923  offset=7757  dtl=8096 aclen=923
页 165 失败, 校验代码为 6103
itl[2] has higher commit scn(0x0000.000ac920) than block scn (0x0000.000a4a37)
页 21478 失败, 校验代码为 6056
Block Checking: DBA = 4263330, Block Type = KTB-managed data block
**** row 100: row length 65627 past end of block
**** row 100: row skipped so other stats may be wrong
**** row 102: key out of order
**** row 102: lock value 4 is larger than maximum itl 2
**** row 102: bad flag value 193
**** row 105: key out of order
**** row 105: lock value 4 is larger than maximum itl 2
**** row 105: bad flag value 193
**** row 106: row length 65537 past end of block
**** row 106: row skipped so other stats may be wrong
**** row 108: row length 65627 past end of block
**** row 108: row skipped so other stats may be wrong
**** row 109: key out of order
**** row 109: lock value 4 is larger than maximum itl 2
**** row 109: bad flag value 193
**** row 110: key out of order
**** row 110: lock value 239 is larger than maximum itl 2
**** row 112: row length 65627 past end of block
**** row 112: row skipped so other stats may be wrong
**** row 113: key out of order
**** row 113: lock value 4 is larger than maximum itl 2
**** row 113: bad flag value 193
**** row 114: row length 65537 past end of block
**** row 114: row skipped so other stats may be wrong
**** row 116: key out of order
**** row 116: lock value 4 is larger than maximum itl 2
**** row 116: bad flag value 193
**** key (begin=0x1a14, len=17) overlaps with another
        begin = 0x1a1c len = 11
---- end index block validation
页 69026 失败, 校验代码为 6401
Block Checking: DBA = 4263331, Block Type = KTB-managed data block
**** row 121: row offset 3123 out of valid range
**** row 122: row offset 3106 out of valid range
**** row 123: lock value 214 is larger than maximum itl 2
**** row 123: bad flag value 128
**** row 124: row length 65537 past end of block
**** row 124: row skipped so other stats may be wrong
**** row 125: key out of order
**** row 129: row offset 2979 out of valid range
**** row 130: row offset 2962 out of valid range
**** row 131: row offset 3225 out of valid range
**** row 135: committed with rsl and/or ras flag
**** row 136: key out of order
**** row 136: lock value 100 is larger than maximum itl 2
**** row 137: lock value 193 is larger than maximum itl 2
**** row 141: row length 65537 past end of block
**** row 141: row skipped so other stats may be wrong
**** row 142: rsl is 0 with ras flag
**** row 143: key out of order
**** row 143: lock value 195 is larger than maximum itl 2
**** row 147: key out of order
**** row 149: key out of order
**** row 154: key out of order
**** row 154: lock value 72 is larger than maximum itl 2
**** row 155: key out of order
**** row 155: lock value 73 is larger than maximum itl 2
**** row 157: key out of order
**** row 157: lock value 195 is larger than maximum itl 2
**** row 159: row offset 8168 out of valid range
**** row 161: row offset 8138 out of valid range
**** row 163: row length 65943 past end of block
**** row 163: row skipped so other stats may be wrong
**** actual free space = 2756 < kdxcoavs = 3155
**** key (begin=0x11fa, len=17) overlaps with another
        begin = 0x1209 len = 9
---- end index block validation
页 69027 失败, 校验代码为 6401
itl[1] has higher commit scn(0x0000.000c021f) than block scn (0x0000.0009a123)
页 72609 失败, 校验代码为 6056
Block Checking: DBA = 4273153, Block Type = KTB-managed data block
data header at 0x5ac525c
kdbchk: row locked by non-existent transaction
        table=0   slot=93
        lockid=3   ktbbhitc=2
页 78849 失败, 校验代码为 6101
Block Checking: DBA = 4273154, Block Type = KTB-managed data block
data header at 0x5ac725c
kdbchk: bad row offset slot 113 offs 473 fseo 474 dtl 8168 bhs 72
页 78850 失败, 校验代码为 6135
Block Checking: DBA = 4273155, Block Type = KTB-managed data block
data header at 0x5ac925c
kdbchk: bad row offset slot 17 offs 342 fseo 394 dtl 8168 bhs 72
页 78851 失败, 校验代码为 6135
Block Checking: DBA = 4273185, Block Type = KTB-managed data block
data header at 0x5b0525c
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=4   ktbbhitc=2
页 78881 失败, 校验代码为 6101
Block Checking: DBA = 4273189, Block Type = KTB-managed data block
data header at 0x5b0d25c
kdbchk: row locked by non-existent transaction
        table=0   slot=0
        lockid=2   ktbbhitc=2
页 78885 失败, 校验代码为 6101
Block Checking: DBA = 4273192, Block Type = KTB-managed data block
data header at 0x5b1325c
kdbchk: bad row offset slot 62 offs 569 fseo 1122 dtl 8168 bhs 72
页 78888 失败, 校验代码为 6135
Block Checking: DBA = 4282166, Block Type = KTB-managed data block
**** row 0: row offset 1457 out of valid range
**** row 1: row offset 1440 out of valid range
**** row 2: row offset 1423 out of valid range
**** row 3: row offset 1406 out of valid range
**** row 4: row offset 1389 out of valid range
**** row 5: row offset 1372 out of valid range
**** row 6: row offset 1355 out of valid range
**** row 7: row offset 1338 out of valid range
**** row 25: row offset 3055 out of valid range
**** row 26: row offset 3038 out of valid range
**** row 27: row offset 3021 out of valid range
**** row 28: row offset 3004 out of valid range
**** row 29: row offset 2987 out of valid range
**** row 30: row offset 2970 out of valid range
**** row 31: row offset 2953 out of valid range
**** row 32: row offset 2936 out of valid range
**** row 33: row offset 2919 out of valid range
**** row 34: row offset 2902 out of valid range
**** row 35: row offset 2885 out of valid range
**** row 36: row offset 2868 out of valid range
**** row 37: row offset 2851 out of valid range
**** row 38: row offset 2834 out of valid range
**** row 39: row offset 1321 out of valid range
**** row 40: row offset 2817 out of valid range
**** row 41: row offset 2800 out of valid range
**** row 42: row offset 2783 out of valid range
**** row 43: row offset 2766 out of valid range
**** row 44: row offset 2749 out of valid range
**** row 45: row offset 2732 out of valid range
**** row 46: row offset 2715 out of valid range
**** row 47: row offset 2698 out of valid range
**** row 48: row offset 2681 out of valid range
**** row 49: row offset 2664 out of valid range
**** row 50: row offset 2647 out of valid range
**** row 51: row offset 2630 out of valid range
**** row 52: row offset 2613 out of valid range
**** row 53: row offset 2596 out of valid range
**** row 54: row offset 2579 out of valid range
**** row 55: row offset 2562 out of valid range
**** row 56: row offset 2545 out of valid range
**** row 57: row offset 2528 out of valid range
**** row 58: row offset 2511 out of valid range
**** row 59: row offset 2494 out of valid range
**** row 60: row offset 2477 out of valid range
**** row 61: row offset 2460 out of valid range
**** row 62: row offset 2443 out of valid range
**** row 63: row offset 2426 out of valid range
**** row 64: row offset 2409 out of valid range
**** row 65: row offset 2392 out of valid range
**** row 66: row offset 2375 out of valid range
**** row 67: row offset 2358 out of valid range
**** row 68: row offset 2341 out of valid range
**** row 69: row offset 2324 out of valid range
**** row 70: row offset 2307 out of valid range
**** row 71: row offset 2290 out of valid range
**** row 72: row offset 2273 out of valid range
**** row 73: row offset 2256 out of valid range
**** row 74: row offset 2239 out of valid range
**** row 75: row offset 2222 out of valid range
**** row 76: row offset 2205 out of valid range
**** row 77: row offset 2188 out of valid range
**** row 78: row offset 2171 out of valid range
**** row 79: row offset 2154 out of valid range
**** row 80: row offset 2137 out of valid range
**** row 81: row offset 2120 out of valid range
**** row 82: row offset 2103 out of valid range
**** row 83: row offset 2086 out of valid range
**** row 84: row offset 2069 out of valid range
**** row 85: row offset 2052 out of valid range
**** row 86: row offset 2035 out of valid range
**** row 87: row offset 2018 out of valid range
**** row 88: row offset 2001 out of valid range
**** row 89: row offset 1984 out of valid range
**** row 90: row offset 1967 out of valid range
**** row 91: row offset 1950 out of valid range
**** row 92: row offset 1933 out of valid range
**** row 93: row offset 1916 out of valid range
**** row 94: row offset 1899 out of valid range
**** row 95: row offset 1882 out of valid range
**** row 96: row offset 1865 out of valid range
**** row 97: row offset 1848 out of valid range
**** row 98: row offset 1831 out of valid range
**** row 99: row offset 1814 out of valid range
**** row 100: row offset 1797 out of valid range
**** row 101: row offset 1780 out of valid range
**** row 102: row offset 1763 out of valid range
**** row 103: row offset 1746 out of valid range
**** row 104: row offset 1729 out of valid range
**** row 105: row offset 1712 out of valid range
**** row 106: row offset 1695 out of valid range
**** row 107: row offset 1678 out of valid range
**** row 108: row offset 1661 out of valid range
**** row 109: row offset 1644 out of valid range
**** row 110: row offset 1627 out of valid range
**** row 111: row offset 1610 out of valid range
**** row 112: row offset 1593 out of valid range
**** row 113: row offset 1576 out of valid range
**** row 114: row offset 1559 out of valid range
**** row 115: row offset 1542 out of valid range
**** row 116: row offset 1525 out of valid range
**** row 117: row offset 1508 out of valid range
**** row 118: row offset 1491 out of valid range
**** row 119: row offset 1474 out of valid range
**** actual rows locked by itl 2  = 95 != # in trans. header = 198
**** actual rows marked deleted = 95 != kdxlende = 123
---- end index block validation
页 87862 失败, 校验代码为 6401
Block Checking: DBA = 4282170, Block Type = KTB-managed data block
**** kdxcofbo = 434 != 590
---- end index block validation
页 87866 失败, 校验代码为 6401
Block Checking: DBA = 4301433, Block Type = KTB-managed data block
data header at 0x5bb525c
kdbchk: avsp(899) > tosp(867)
页 107129 失败, 校验代码为 6128


DBVERIFY - 验证完成

检查的页总数: 956160
处理的页总数 (数据): 919530
失败的页总数 (数据): 10
处理的页总数 (索引): 13114
失败的页总数 (索引): 4
处理的页总数 (其他): 3222
处理的总页数 (段)  : 1
失败的总页数 (段)  : 0
空的页总数: 20294
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 443681161 (0.443681161)

基于上述情况,现在基本上可以确定是由于datafile 1 block 165中的ind$记录的第四行出现损坏,导致数据库无法正常查询ind$记录,从而使得数据库无法open.已经定位到该问题了,处理起来相对比较简单,使用bbed对其进行修复,让数据库可以正常查询ind$表记录,从而正常open库

SQL> recover database;
完成介质恢复。
SQL> shutdown immediate;
ORA-01109: ??????


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup mount pfile='d:/pfile.txt'
ORACLE 例程已经启动。

Total System Global Area 4275781632 bytes
Fixed Size                  2182592 bytes
Variable Size             822084160 bytes
Database Buffers         3439329280 bytes
Redo Buffers               12185600 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

然后到粗数据,完成本次恢复任务.

删除数据库文件并部分覆盖情况下Oracle恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:删除数据库文件并部分覆盖情况下Oracle恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于磁盘空间满了,人工删除了数据库30多个数据文件,导致数据库无法正常工作,然后又被人offline这些文件启动数据库,并运行了一段时间,导致写入了大量的trace和部分数据库归档日志,导致被删除的数据文件发生了覆盖,对于这样情况,通过底层反删除工具对磁盘进行扫描,发现了部分被删除文件,但是大小基本上显示0kb
dbf-0kb


对于这种情况,os层面反删除恢复,肯定无法恢复出来合适的数据文件,只能做底层数据块扫描恢复,参考以前类似case:
rm -rf误删Oracle数据库恢复
win系统删除oracle数据文件恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
解决一次硬件恢复之后数据文件0kb的故障恢复case
这个客户的情况相对复杂一些:
1. 该磁盘分区中有历史库(也就是说单纯的软件直接按照rdba方式无法直接区分出来合适的数据块,儿实现数据重组)
2. 删除的文件比较多(33个数据文件),分区较大(5T+)
3. 删除文件之后,分区还写入了不少数据,会引起一些覆盖和导致碎片数量增加,导致工作量增加和恢复效果变差
通过对客户alert日志分析,发现一个好消息,客户数据每个数据文件是固定大小(没有设置自增长),这种情况,一般来说数据比较连续,碎片相对比较容易区分出来.
add_datafile

通过工具扫描识别出来oracle block,并把结果记录到数据库中,然后通过人工在数据库中对其进行挑选识别,然后生成dd语句恢复出来数据文件,比如这个只是被覆盖了文件头的22号文件,就比较容易恢复
dd-ok

对于一些碎片严重的文件,就需要人工生成大量dd语句来恢复
dd

对于所有恢复出来的文件,使用工具检查坏块情况
check

然后把这些数据文件中的数据恢复到新库中,完成本次数据恢复工作,最大限度抢救客户数据.

一次非常幸运的ORA-600 16703(tab$被清空)故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次非常幸运的ORA-600 16703(tab$被清空)故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

这次的ORA-600 16703的故障比较特殊,客户的一套rac运行了5年多没有重启,这次由于异常导致其中一个节点重启,然后触发了tab$被清空,异常节点启动报ORA-600 16703错误.朋友在故障之后,第一时间没有对在运行的节点进行重启(虽然也无法对外提供业务服务),使得恢复工作相对简单一些,恢复效果也是最完美的.这个是我在对于软件安装介质注入恶意脚本,300天之后重启触发tab$被清空的相关恢复case中,最完美的一次(以前遇到过一次客户是虚拟化环境通过cdp回退然后类似方法处理ORA-600 16703直接把orachk备份表插入到tab$恢复),凸显了这位朋友在故障发生之后对于问题的准确判断和果断的应对能力.
有朋友和我反馈,他们数据库突然报大量ORA-600错误,业务无法正常操作,我分析相关日志确认:节点2重启之后节点1开始报大量ORA-600错误,但是节点一直处于open状态

Fri Jul 25 15:28:53 2025
Decreasing number of real time LMS from 3 to 0
Fri Jul 25 15:29:18 2025
Reconfiguration started (old inc 13, new inc 15)
List of instances:
 1 2 (myinst: 1) 
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Jul 25 15:29:18 2025
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Jul 25 15:29:18 2025
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Jul 25 15:29:18 2025
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Fri Jul 25 15:29:20 2025
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:15 errcnt:0
Fri Jul 25 15:30:07 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_252634.trc  (incident=77234):
ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [],[],[],[],[],[],[],[]
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl1/incident/incdir_77234/orcl1_ora_252634_i77234.trc
Fri Jul 25 15:30:18 2025
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Jul 25 15:30:19 2025
Sweep [inc][77234]: completed
Sweep [inc2][77234]: completed
Fri Jul 25 15:30:27 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_261587.trc  (incident=76487):
ORA-00600: internal error code, arguments: [ktsircinfo_num1],[0],[0],[0], [], [], [], [], [], [], [], []

通过grep筛选报错信息

[root@iZbp11c0qyuuo1gr7j98upZ tmp]# egrep "ORA-00600|ORA-07445" alert_1.txt |sort -u
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:delseg], [28941391], [], [], [], []
ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [DEL], [0], [35038924], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [277736], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [28829570], [], [], [], [], [], [], []
ORA-07445: exception encountered: core dump [qknSetParent()+9] [SIGSEGV] [ADDR:0x10354] 
   [PC:0x1A48B9B] [Address not mapped to object] []
ORA-07445: exception encountered: core dump [qksxaMoveQbAnnotations()+168] [SIGSEGV]
   [ADDR:0x20304] [PC:0x1594954] [Address not mapped to object] []
ORA-07445: 出现异常错误: 核心转储 [qknExpRegIni_int()+87] [SIGSEGV] [ADDR:0x8C] 
   [PC:0x1A4D729] [Address not mapped to object] []
ORA-07445: 出现异常错误: 核心转储 [qksxaMoveQbAnnotations()+168] [SIGSEGV] [ADDR:0x0] 
   [PC:0x1594954] [SI_KERNEL(general_protection)] []

既然是由于节点2重启导致节点1报错,那分析节点2重启相关情况,第一次重启成功之后,数据库开始报ORA-600错误

Fri Jul 25 15:29:29 2025
QMNC started with pid=46, OS id=363757 
Fri Jul 25 15:29:31 2025
minact-scn: Inst 2 is a slave inc#:15 mmon proc-id:363622 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Jul 25 15:29:33 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248519):
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248519/orcl2_m003_363779_i248519.trc
Starting background process SMCO
Fri Jul 25 15:29:35 2025
SMCO started with pid=57, OS id=363802 
Fri Jul 25 15:29:35 2025
Completed: ALTER DATABASE OPEN /* db agent *//* {2:23784:2} */
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248520):
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248520/orcl2_m003_363779_i248520.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x10] [PC:0x2FDA4BB,kgmdelsis()+219][flags:0x0,count:1]
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248521):
ORA-07445: exception encountered: core dump [kgmdelsis()+219] [SIGSEGV] 
  [ADDR:0x10] [PC:0x2FDA4BB] [Address not mapped to object] []
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248521/orcl2_m003_363779_i248521.trc
Use ADRCI or Support Workbench to package the incident.
Fri Jul 25 15:29:47 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_363808.trc  (incident=248559):
ORA-00600: internal error code, arguments: [kkposds2], [18446744073709551615], [18446744073709551615], 
  [18446744073709551615], [], [], [], [], [], [], [], []

然后第二次重启数据库无法open成功,而是报ORA-600 16703错误

ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */
Picked broadcast on commit scheme to generate SCNs
ARCH: STARTING ARCH PROCESSES
Fri Jul 25 15:41:23 2025
ARC0 started with pid=39, OS id=369231 
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Fri Jul 25 15:41:24 2025
ARC1 started with pid=40, OS id=369242 
Fri Jul 25 15:41:24 2025
ARC2 started with pid=41, OS id=369244 
Fri Jul 25 15:41:24 2025
ARC3 started with pid=42, OS id=369246 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 2 opened at log sequence 33585
  Current log# 7 seq# 33585 mem# 0: +DATA/orcl/onlinelog/group_7.269.1011373611
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc  (incident=260494):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_260494/orcl2_ora_369210_i260494.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
SUCCESS: diskgroup FRA was mounted
Fri Jul 25 15:41:30 2025
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 369210): terminating the instance due to error 704
Instance terminated by USER, pid = 369210
ORA-1092 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */...
opiodr aborting process unknown ospid (369210) as a result of ORA-1092
Fri Jul 25 15:41:31 2025
ORA-1092 : opitsk aborting process

到这一步基本上就清晰了,大概率是遭遇到以前恢复的类似case,tab$数据被清空导致,类似案例
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
通过在故障主机上找到安装介质,验证md5确认该程序是被注入恶意代码程序
md5


这个库由于还有一个节点处于open状态,相对处理比较简单,直接把备份的表数据反向插入回去即可

SYS@orcl1> select count(1) from ORACHK3C08C86E063530510ACD937;

  COUNT(1)
----------
     20696

SYS@orcl1> insert into tab$ select * from ORACHK3C08C86E063530510ACD937;

20696 rows created.

SYS@orcl1> commit;

Commit complete.

SYS@orcl1> select object_name,to_char(CREATED,'yyyy-mm-dd hh24:mi:ss') from dba_objects 
          2 where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP');

OBJECT_NAME                                  TO_CHAR(CREATED,'YY
-------------------------------------------- -------------------
DBMS_SUPPORT_DBMONITORP                      2019-06-19 17:06:46
DBMS_SUPPORT_DBMONITOR                       2019-06-19 17:06:46

然后清理掉恶意脚本,分别重启两个节点,完成数据恢复任务
2025-07-26_215903_578


这次故障能够快速顺利的恢复,和客户发现故障之后保留第一现场,没有把一个open的节点也重启有很大关系,open的节点也重启了,那后续恢复工作会麻烦很多,效果可能也没有这样的完美.

ORA-600 kokiasg1故障分析(obj$中核心字典序列全部被恶意删除)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kokiasg1故障分析(obj$中核心字典序列全部被恶意删除)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

故障总结:客户正常关闭数据库,然后启动报ORA-600 kokiasg1错误,通过对启动分析确认是由于IDGEN1$序列丢失导致,修复该故障之后,数据库启动成功,但是后台大量报ORA-600 12803,ORA-600 15264等错误,业务用户无法登录.经过深入分析,发现数据库字典obj$中所有核心字典的序列全部被删除,但是在seq$中这些对象的obj#记录还存在.初步怀疑是有人恶意删除了obj$中字典核心序列对象导致.
数据库启动报ORA-600 kokiasg1错误

SQL> startup ;
ORACLE 例程已经启动。

Total System Global Area 1.4531E+10 bytes
Fixed Size                  2295256 bytes
Variable Size            2181040680 bytes
Database Buffers         1.2314E+10 bytes
Redo Buffers               33193984 bytes
数据库装载完毕。
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kokiasg1], [], [], [], [], [], [],
[], [], [], [], []
进程 ID: 5628
会话 ID: 122 序列号: 3

对应的alert日志信息

Thu Jul 03 16:35:25 2025
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Thu Jul 03 16:35:26 2025
Stopping background process CJQ0
Stopping background process QMNC
Stopping background process MMNL
Stopping background process MMON
License high water mark = 272
All dispatchers and shared servers shutdown
Thu Jul 03 16:35:54 2025
alter database close normal
Thu Jul 03 16:35:54 2025
SMON: disabling tx recovery
SMON: disabling cache recovery
Thu Jul 03 16:35:54 2025
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Thread 1 closed at log sequence 296590
Successful close of redo thread 1
Completed: alter database close normal
alter database dismount
Shutting down archive processes
Archiving is disabled
Completed: alter database dismount
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Thu Jul 03 16:36:02 2025
Stopping background process VKTM
Thu Jul 03 16:36:07 2025
Instance shutdown complete
Thu Jul 03 16:36:19 2025
Adjusting the default value of parameter parallel_max_servers
from 640 to 270 due to the value of parameter processes (300)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 16
Number of processor cores in the system is 8
Number of processor sockets in the system is 1
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =52
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Windows NT Version V6.2  
CPU                 : 16 - type 8664, 8 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:24712M/32767M, Ph+PgF:14089M/39123M 
System parameters with non-default values:
  processes                = 300
  sessions                 = 480
  nls_language             = "SIMPLIFIED CHINESE"
  nls_territory            = "CHINA"
  sga_target               = 13920M
  control_files            = "D:\APP\ADMINISTRATOR\ORADATA\orcl\CONTROL01.CTL"
  control_files            = "D:\APP\ADMINISTRATOR\FAST_RECOVERY_AREA\orcl\CONTROL02.CTL"
  db_block_size            = 8192
  compatible               = "11.2.0.4.0"
  db_recovery_file_dest    = "D:\app\Administrator\fast_recovery_area"
  db_recovery_file_dest_size= 10G
  undo_tablespace          = "UNDOTBS1"
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
  job_queue_processes      = 10
  audit_file_dest          = "D:\APP\ADMINISTRATOR\ADMIN\orcl\ADUMP"
  audit_trail              = "DB"
  db_name                  = "orcl"
  open_cursors             = 300
  pga_aggregate_target     = 4639M
  diagnostic_dest          = "D:\APP\ADMINISTRATOR"
Thu Jul 03 16:36:20 2025
PMON started with pid=2, OS id=13088 
Thu Jul 03 16:36:20 2025
PSP0 started with pid=3, OS id=16168 
Thu Jul 03 16:36:21 2025
VKTM started with pid=4, OS id=7948 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Thu Jul 03 16:36:21 2025
GEN0 started with pid=5, OS id=4192 
Thu Jul 03 16:36:21 2025
DIAG started with pid=6, OS id=8232 
Thu Jul 03 16:36:21 2025
DBRM started with pid=7, OS id=16436 
Thu Jul 03 16:36:21 2025
DIA0 started with pid=8, OS id=11400 
Thu Jul 03 16:36:21 2025
MMAN started with pid=9, OS id=11108 
Thu Jul 03 16:36:21 2025
DBW0 started with pid=10, OS id=12232 
Thu Jul 03 16:36:21 2025
DBW1 started with pid=11, OS id=7368 
Thu Jul 03 16:36:21 2025
LGWR started with pid=12, OS id=13520 
Thu Jul 03 16:36:21 2025
CKPT started with pid=13, OS id=11952 
Thu Jul 03 16:36:21 2025
SMON started with pid=14, OS id=9304 
Thu Jul 03 16:36:21 2025
RECO started with pid=15, OS id=17136 
Thu Jul 03 16:36:21 2025
MMON started with pid=16, OS id=1984 
Thu Jul 03 16:36:21 2025
MMNL started with pid=17, OS id=2568 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'
starting up 1 shared server(s) ...
ORACLE_BASE from environment = D:\app\Administrator
Thu Jul 03 16:36:22 2025
alter database mount exclusive
Successful mount of redo thread 1, with mount id 1287723014
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Thread 1 opened at log sequence 296590
  Current log# 1 seq# 296590 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\orcl\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[15144] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:3680275922 end:3680276032 diff:110 (1 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc  (incident=7579):
ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_7579\orcl_ora_15144_i7579.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc:
ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_15144.trc:
ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 15144): terminating the instance due to error 600
Instance terminated by USER, pid = 15144
ORA-1092 signalled during: alter database open...

对数据库启动过程进行跟踪确认报错可能和IDGEN1$对象有关系

PARSING IN CURSOR #615624160 len=30 dep=1 uid=0 oct=3 lid=0 tim=752975051401
   hv=3013659460 ad='7ffbd8f025d0' sqlid='6d8vr86tu1ku4'
select TOTAL from SYS.ID_GENS$
END OF STMT
PARSE #615624160:c=15625,e=2775,p=2,cr=14,cu=0,mis=1,r=0,dep=1,og=4,plh=1676180847,tim=752975051401
EXEC #615624160:c=0,e=6,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1676180847,tim=752975051452
WAIT #615624160: nam='db file sequential read' ela= 126 file#=1 block#=3440 blocks=1 obj#=514 tim=752975051594
WAIT #615624160: nam='db file sequential read' ela= 48 file#=1 block#=3441 blocks=1 obj#=514 tim=752975051671
FETCH #615624160:c=0,e=224,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1676180847,tim=752975051687
STAT #615624160 id=1 cnt=1 pid=0 pos=1 obj=514 op='TABLE ACCESS FULL ID_GENS$ (cr=3 pr=2 pw=0 time=223 us)'
CLOSE #615624160:c=0,e=15,dep=1,type=0,tim=752975051716
BINDS #12720440:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=80 off=0
  kxsbbbfp=24b1b128  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=01 csi=852 siz=0 off=24
  kxsbbbfp=24b1b140  bln=32  avl=07  flg=01
  value="IDGEN1$"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=0 off=56
  kxsbbbfp=24b1b160  bln=22  avl=02  flg=01
  value=1
EXEC #12720440:c=0,e=107,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051842
FETCH #12720440:c=0,e=5,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2853959010,tim=752975051856
CLOSE #12720440:c=0,e=0,dep=1,type=3,tim=752975051870
Incident 161 created, dump file: C:\APP\XFF\diag\rdbms\orcl\orcl\incident\incdir_161\orcl_ora_1880_i161.trc
ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []

ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kokiasg1], [], [], [], [], [], [], [], [], [], [], []

从mos中确认当数据库缺少IDGEN1$序列的时候,启动会报ORA-600 kokiasg1错误.
ORA-600-kokiasg1


使用工具恢复obj$表到新库中

E:\dump>imp test/oracle file=SYS_OBJ$.dmp full=y

Import: Release 11.2.0.4.0 - Production on 星期六 7月 5 09:34:42 2025

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

经由常规路径由 EXPORT:V08.01.07 创建的导出文件

警告: 这些对象由 SYS 导出, 而不是当前用户

已经完成 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集中的导入
导出服务器使用 UTF8 NCHAR 字符集 (可能的 ncharset 转换)
. 正在将 SYS 的对象导入到 TEST
. 正在将 SYS 的对象导入到 TEST
. . 正在导入表                          "OBJ$"导入了      103764 行
成功终止导入, 没有出现警告。

查询test.obj$表确认没有IDGEN1$对象名称记录

SQL> select * from test.obj$ where name='IDGEN1$';

未选定行

SQL>

查询正常obj$字典中关于IDGEN1$对象信息

SQL> select owner#, obj#,type# from obj$ where name='IDGEN1$';

    OWNER#       OBJ#      TYPE#
---------- ---------- ----------
         0       1229          6

在故障库恢复出来的test.obj$中查询obj#为1229附近对象

SQL> select owner#, obj#,type#,name from test.obj$ where obj# in(1228,1229,1230);

    OWNER#       OBJ#      TYPE# NAME
---------- ---------- ---------- ------------------------------
         0       1228          2 DST$TRIGGER_TABLE
         0       1230         13 BFILE

SQL> select owner#, obj#,type#,name from obj$ where obj# in(1228,1229,1230);

    OWNER#       OBJ#      TYPE# NAME
---------- ---------- ---------- ------------------------------
         0       1228          2 DST$TRIGGER_TABLE
         0       1229          6 IDGEN1$
         0       1230         13 BFILE

目前看初步判断故障库确实由于IDGEN1$序列丢失导致无法启动,处理过程相对比较简单,在数据库open的过程中,打开新会话创建IDGEN1$序列序列
11
22


然后重启数据库,即可正常启动成功,让看尝试登录数据库报ora-600 12803错误
ORA-600-12803

再次检查alert日志大量ORA-600错误

Fri Jul 04 15:57:13 2025
Errors in file C:\APP\XFF\diag\rdbms\orcl\orcl\trace\orcl_ora_27788.trc  (incident=12239):
ORA-00600: 内部错误代码, 参数: [12803], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Jul 04 15:58:04 2025
Errors in file C:\APP\XFF\diag\rdbms\orcl\orcl\trace\orcl_mmon_1976.trc  (incident=12184):
ORA-00600: 内部错误代码, 参数: [15264], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

基于这样ORA-600错误,初步怀疑字典层面还有问题,因为最初的错误是序列异常,所以这次我重点对系统队列进行分析,通过dul把seq$表恢复到test用户中

E:\dump>imp test/oracle file=SYS_seq$.dmp full=y

Import: Release 11.2.0.4.0 - Production on 星期六 7月 5 10:10:17 2025

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


连接到: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

经由常规路径由 EXPORT:V08.01.07 创建的导出文件

警告: 这些对象由 SYS 导出, 而不是当前用户

已经完成 ZHS16GBK 字符集和 AL16UTF16 NCHAR 字符集中的导入
导出服务器使用 UTF8 NCHAR 字符集 (可能的 ncharset 转换)
. 正在将 SYS 的对象导入到 TEST
. 正在将 SYS 的对象导入到 TEST
. . 正在导入表                          "SEQ$"导入了         359 行
成功终止导入, 没有出现警告。

查询发现之前的序列(obj=1229)的竟然还在seq$中(obj$中没有了记录)

SQL> select * from test.seq$ where obj#=1229;

      OBJ# INCREMENT$   MINVALUE   MAXVALUE     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ---------- ----------
 HIGHWATER AUDIT$                                      FLAGS
---------- -------------------------------------- ----------
      1229         50          1 1.0000E+28          0          0       1000
  60267151 --------------------------------                0

这种现象证明seq 不是通过drop sequence命令删除,而可能直接delete obj$表进行删除,通过试验重现正常删除seq之后,obj$和seq$都会同步被删除

SQL> create sequence xxxx;

序列已创建。

SQL> select obj#,type# from obj$ where name='XXXX';

      OBJ#      TYPE#
---------- ----------
     87383          6

SQL> SELECT * FROM SEQ$ WHERE OBJ#=87383;

      OBJ# INCREMENT$   MINVALUE   MAXVALUE     CYCLE#     ORDER$      CACHE
---------- ---------- ---------- ---------- ---------- ---------- ----------
 HIGHWATER AUDIT$                                      FLAGS
---------- -------------------------------------- ----------
     87383          1          1 1.0000E+28          0          0         20
         1 --------------------------------                0


SQL> DROP SEQUENCE XXXX;

序列已删除。

SQL> SELECT * FROM SEQ$ WHERE OBJ#=87383;

未选定行

SQL> select obj#,type# from obj$ where name='XXXX';

未选定行

想到这里,那进一步分析,是否还有其他的系统序列被删除,分析思路是:在一个正常的库里面找出来SYS的seq的obj#,然后和test用户里面的obj$,seq$表里面对比
找出来test.obj$中sys用户的seq对象名字

SQL> select name,obj#,type# from test.obj$ where obj# in(
  2  select obj# from sys.obj$ where owner#=0 and type#=6)
  3  and type#=6;

未选定行

通过查询确认故障库中sys下面系统自带的核心seq的对象名称全部被删除(obj$中明确被删除),分析seq$中情况确认
QQ20250705-102429

SQL> select name,ctime from test.obj$ where type#=6 and owner#=0;

未选定行

通过上述相关核实,故障库中的obj$中系统字典seq基本上被删除(正常情况应该有130多个).对于这种情况,后续的类此比较简单,通过seq$表内容,构造出来系统 seq的创建语句,对其进行创建,然后数据库恢复正常,完成本次恢复工作.

ORA-00756 ORA-10567故障数据0丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00756 ORA-10567故障数据0丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户虚拟化故障修复之后,数据库启动报ORA-600 kcratr_scan_lastbwr错误
kcratr_scan_lastbwr


这个是一个比较常见的错误,一般recover 下就ok了,但是有些时候会出现ORA-600 3020或者类似ORA-00756 ORA-10567的错误,比如这次不幸就遇到了该错误

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 4005760, file
offset is 2750414848 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649

然后尝试单个文件recover恢复

SQL> recover datafile 10;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 4005760, file
offset is 2750414848 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649

SQL> recover datafile 9;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 9, block# 4158754, file
offset is 4003741696 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 9: '/u02/oradata/pacsdb/pacs55.3.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76660

通过dbv检查这两个异常文件

[oracle@oradb ~]$ dbv file=/u02/oradata/pacsdb/pacs55.3.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on Sat Jun 28 23:02:15 2025

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u02/oradata/pacsdb/pacs55.3.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2482487
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1655515
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 25017
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 15919
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 15364
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 311133131196 (72.1895485884)
[oracle@oradb ~]$ dbv file=/u02/oradata/pacsdb/pacs55.4.dbf 

DBVERIFY: Release 19.0.0.0.0 - Production on Sat Jun 28 23:04:59 2025

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u02/oradata/pacsdb/pacs55.4.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2466409
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1683244
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 16977
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 15909
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 11763
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 311133133727 (72.1895488415)

确定数据文件本身没有坏块,只是redo写丢失或者某种bug导致少量block应用redo的时候异常,而且报错是index,直接通过底层处理报错的block,让其这些报错的block直接不应用日志,然后完成recover操作,其他数据块数据不会丢失(最大限度减少损失,而不是直接修改文件头scn,或者强制拉库的方式来处理)

SQL> select file#,fuzzy from v$datafile_header;

     FILE# FUZ
---------- ---
	 1 NO
	 2 NO
	 3 NO
	 4 NO
	 5 NO
	 7 NO
	 8 NO
	 9 YES
	10 YES
	11 NO
	12 NO

     FILE# FUZ
---------- ---
	13 NO
	14 NO
	15 NO
	16 NO
	17 NO
	18 NO
	19 NO

18 rows selected.

SQL> recover  datafile 9 ;
Media recovery complete.
SQL> recover  datafile 10 ;
ORA-00283: recovery session canceled due to errors
ORA-00756: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 3822912, file
offset is 1252524032 bytes)
ORA-10564: tablespace PACS55
ORA-01110: data file 10: '/u02/oradata/pacsdb/pacs55.4.dbf'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 76649
 
SQL> recover  datafile 10;
Media recovery complete.

正常open数据库成功,并rebuild 异常的对象

SQL> alter database open;

Database altered.

SQL> select owner,object_name,object_type from dba_objects where data_object_id in(76649,76660);

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-----------------------
PACS55
STUDYINFO_DIAGRPTID
INDEX

PACS55
PACS_STUDYINFO_PK
INDEX

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-----------------------


SQL> alter index PACS55.STUDYINFO_DIAGRPTID rebuild online parallel 4;

Index altered.

SQL> alter index PACS55.PACS_STUDYINFO_PK rebuild online parallel 4;

Index altered.

SQL> 
SQL> 
SQL> 
SQL> alter index PACS55.STUDYINFO_DIAGRPTID noparallel;
alter index PACS55.PACS_STUDYINFO_PK noparallel;
Index altered.

SQL> 

Index altered.

至此该库完美恢复业务可以直接使用,业务数据0丢失.这次运气比较好,如果是表数据异常,可能会麻烦一点,但是也可以最大限度恢复(肯定比强制拉库,或者修改文件头的方式效果好)