ORA-704 ORA-604 ORA-942 恢复

接到客户请求,oracle数据库停机重启维护之后,无法正常启动,请求我们给予协助
数据库启动报ORA-00704 ORA-00604 ORA-00942错误

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> startup open
ORACLE 例程已经启动。

Total System Global Area 1288949760 bytes
Fixed Size                  1376520 bytes
Variable Size             377491192 bytes
Database Buffers          897581056 bytes
Redo Buffers               12500992 bytes
数据库装载完毕。
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
进程 ID: 2756
会话 ID: 5 序列号: 9

alert日志报错

SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_2756.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在
Errors in file d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_2756.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在
Error 704 happened during db open, shutting down database
USER (ospid: 2756): terminating the instance due to error 704
Instance terminated by USER, pid = 2756
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (2756) as a result of ORA-1092
Fri Nov 30 12:51:26 2018
ORA-1092 : opitsk aborting process

根据这些年的恢复经验,恢复过相关错误的主要有:
ORA-01092 ORA-00704 ORA-00942
Oracle 11g丢失access$恢复方法
11.1.0.7版本也会出现access$表丢失导致数据库无法启动
总结主要两类:1. 由于某种bug导致access$表丢失的故障,另外一种是由于坏块导致数据库核心基表损坏引起,对于这个库进行分析

dbv 检查坏块

D:\APP\ADMINISTRATOR\ORADATA\ORCL> dbv file=system01.dbf

DBVERIFY: Release 10.2.0.3.0 - Production on 星期五 11月 30 15:11:24 2018

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = system01.dbf


DBVERIFY - 验证完成

检查的页总数: 93440
处理的页总数 (数据): 61979
失败的页总数 (数据): 0
处理的页总数 (索引): 13560
失败的页总数 (索引): 0
处理的页总数 (其它): 3067
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 14834
标记为损坏的总页数: 0
流入的页总数: 0
最高块 SCN            : 659587683 (0.659587683)

通过dbv检查数据库,确定system没有坏块
10046跟踪数据库启动

SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area 1288949760 bytes
Fixed Size                  1376520 bytes
Variable Size             377491192 bytes
Database Buffers          897581056 bytes
Redo Buffers               12500992 bytes
数据库装载完毕。

SQL> oradebug setmypid
已处理的语句
SQL> alter session set db_file_multiblocK_read_count=1;

会话已更改。

SQL> ALTER SESSION SET EVENTS '704 trace name errorstack level 3';

会话已更改。

SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
已处理的语句
SQL> oradebug TRACEFILE_NAME
d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_1132.trc

--trace信息
PARSE ERROR #3:len=208 dep=1 uid=0 oct=9 lid=0 tim=5838844893 err=942
CREATE UNIQUE INDEX I_OBJ1 ON OBJ$(OBJ#,OWNER#,TYPE#) PCTFREE 10 INITRANS 2 MAXTRANS 255 
STORAGE (  INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645
 PCTINCREASE 0 OBJNO 36 EXTENTS (FILE 1 BLOCK 336))

*** 2018-11-30 13:18:07.593
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在

通过这一步基本上可以判断是由于obj$表丢失导致数据库创建I_OBJ1 index不成功,从而使得数据库无法正常启动。通过一些技巧修复出来obj$表,尝试启动数据库

SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], [],
[], [], [], []
进程 ID: 836
会话 ID: 355 序列号: 6027

alert日志报错

Fri Nov 30 15:33:47 2018
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_836.trc  (incident=648694):
ORA-00600: 内部错误代码, 参数: [4000], [6], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\xff\xff\incident\incdir_648694\xff_ora_836_i648694.trc
Errors in file d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_836.trc:
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [4000], [6], [], [], [], [], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_836.trc:
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [4000], [6], [], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 836): terminating the instance due to error 704
Fri Nov 30 15:33:57 2018
Instance terminated by USER, pid = 836
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (836) as a result of ORA-1092

发现数据库启动报ORA-00704和ORA-00600: internal error code, arguments: [4000], [6], [], [], [], [], [], [],错误,根据以往经验,该问题是由于回滚段异常导致
分析trace信息

Dump continued from file: d:\app\administrator\diag\rdbms\xff\xff\trace\xff_ora_2124.trc
ORA-00600: 内部错误代码, 参数: [4000], [6], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 651102 (ORA 600 [4000]) ========

*** 2018-11-30 15:46:32.125
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for txff session (sql_id=6apq2rjyxmxpj) -----
select line#, sql_text from bootstrap$ where obj# != :1

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
_skdstdst()+121      CALLrel  _kgdsdst()           E6B1614 2
_ksedst1()+93        CALLrel  _skdstdst()          E6B1614 0 1 436646 435BE2
                                                   436646
_ksedst()+49         CALLrel  _ksedst1()           0 1
_dbkedDefDump()+367  CALLrel  _ksedst()            0
2                                                  
_ksedmp()+44         CALLrel  _dbkedDefDump()      3 2
_ksfdmp()+56         CALLrel  _ksedmp()            3EB
_dbgexPhaseII()+164  CALLreg  00000000             C378A28 3EB
0                                                  
_dbgexProcessError(  CALLrel  _dbgexPhaseII()      E2B0454 E2B6E50 E6B6078
)+2061                                             
_dbgeExecuteForErro  CALLrel  _dbgexProcessError(  E2B0454 E2B6E50 1 0 E2B0454
r()+43                        )                    E2B6E50
__VInfreq__dbgePost  CALLrel  _dbgeExecuteForErro  E2B0454 E2B6E50 0 1 0
ErrorKGE()+260                r()                  
_dbkePostKGE_kgsf()  CALLrel  _dbgePostErrorKGE()  C378A28 E2BD274 258
+56                                                
_kgeade()+299        CALLreg  00000000             C378A28 E2BD274 258
_kgeriv_int()+79     CALLrel  _kgeade()            C378A28 C378B50 E2BD274 258 0
                                                   FA0 0 0 0 1 E6B68A8
_kgeriv()+22         CALLrel  _kgeriv_int()        C378A28 E2BD274 FA0 0 1
                                                   E6B68A8
_kgeasi()+107        CALLrel  _kgeriv()            C378A28 E2BD274 FA0 1 E6B68A8
__VInfreq__ktuGetUs  CALLrel  _kgeasi()            C378A28 E2BD274 FA0 2 1 0 6 0
egDba()+123                                        
_ktrgcm()+5147       CALLrel  _ktuGetUsegDba()     6 E6B6E78 0 0 E6B6F48 0
_ktrget2()+596       CALLrel  _ktrgcm()            F9FCEAC
_kdst_fetch()+816    CALLrel  _ktrget2()           F9FCEAC F9FCE24 303 0
_kdstf11001010000km  CALLrel  _kdst_fetch()        1 F9FCEA8 E6B71C8
()+2806                                            
_kdsttgr()+5944      CALLrel  _kdstf11001010000km  F9FCEA8 0 557B044C F9FCDF8
                              ()                   2F0C3A6 E6B7894
_qertbFetch()+767    CALLrel  _kdsttgr()           F9FCEA8 0 557B044C F9FCDF8
                                                   557B0498 2F0C3A6 E6B7894 1
_opifch2()+2729      CALLptr  00000000             E6E7228 0 0 2 26180001
_opifch()+53         CALLrel  _opifch2()           89 5 E6B7A04
_opiodr()+1248       CALLreg  00000000             5 2 E6B81FC
_rpidrus()+186       CALLrel  _opiodr()            5 2 E6B81FC 2
_rpidru()+90         CALLrel  _rpidrus()           E6B7D58
_rpiswu2()+557       CALLrel  _rpidru()            E6B813C
_rpidrv()+1242       CALLrel  _rpiswu2()           6D9A295C 0 6D9A29A8 2 E6B8184
                                                   0 6D9A2A28 0 0 544632 5448D6
                                                   E6B813C 8
_rpifch()+43         CALLrel  _rpidrv()            2 5 E6B81FC 8
_kqlbebs()+1213      CALLrel  _rpifch()            2 2 2 F013232 FA0 1 0 E6B8634
                                                   0 0 0 0 0
_kqlblfc()+175       CALLrel  _kqlbebs()           0 E6BBBD4
_adbdrv()+16992      CALLrel  _kqlblfc()           0 E6BBBD4
_opiexe()+13594      CALLrel  _adbdrv()            4A 6E6CBC48 6DDEDB6C E6BBD68
                                                   6D60697 6E6CBC48
_opiosq0()+6248      CALLrel  _opiexe()            4 0 E6BC734
_kpooprx()+277       CALLrel  _opiosq0()           3 E E6BC9A0 A4 0
_kpoal8()+632        CALLrel  _kpooprx()           E6BF0A4 E6BD420 1B 1 0 A4
_opiodr()+1248       CALLreg  00000000             5E 1C E6BF0A0
_ttcpip()+1051       CALLreg  00000000             5E 1C E6BF0A0 0
_opitsk()+1404       CALL???  00000000             C3832A8 5E E6BF0A0 0 E6BED30
                                                   E6BF1CC 53E52E 0 E6BF1F8
_opiino()+980        CALLrel  _opitsk()            0 0
_opiodr()+1248       CALLreg  00000000             3C 4 E6BFBF4
_opidrv()+1201       CALLrel  _opiodr()            3C 4 E6BFBF4 0
_sou2o()+55          CALLrel  _opidrv()            3C 4 E6BFBF4
_opimai_real()+124   CALLrel  _sou2o()             E6BFC04 3C 4 E6BFBF4
_opimai()+125        CALLrel  _opimai_real()       2 E6BFC2C
_OracleThreadStart@  CALLrel  _opimai()            2 E6BFF6C 7C9BA7F4 E6BFC34 0
4()+830                                            E6BFD04
7C82484C             CALLreg  00000000             E5BFF9C 0 0 E5BFF9C 0 E6BFFC4
00000000             CALL???  00000000             
 

--------------------- Binary Stack Dump ---------------------


Block header dump:  0x0040020b
 Object id on Block? Y
 seg/obj: 0x3b  csc: 0x00.27508136  itc: 1  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.015.0005e9e9  0x00c0052c.916f.10  --U-    1  fsc 0x0000.27508263

报错比较明显该block比较异常,通过bbed修改相关block信息,相关处理参考:
重现ORA-600 4000异常
通过bbed解决ORA-00600[4000]案例
记录一次ORA-600 4000数据库故障恢复
ORACLE 8.1.7 数据库ORA-600 4000故障恢复

处理之后数据库正常open

Fri Nov 30 15:57:34 2018
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Nov 30 15:57:38 2018
Starting background process QMNC
Fri Nov 30 15:57:38 2018
QMNC started with pid=23, OS id=1152 
Completed: alter database open

0kb数据文件恢复或者文件丢失恢复

接到一个朋友恢复请求,由于rose频繁切换导致文件系统部分数据文件变化为0kb和文件丢失.
故障现象
部分数据文件变化为0kb和文件丢失.
file_lost
file_size_0


这里比较明显,数据库的users03变为了0kb和users04丢失.数据库alert日志报错信息如下:

Completed: alter database mount exclusive
alter database open
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-27047: ??????????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 38) 已到文件结尾。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 8 - ??? DBWR ????
ORA-01110: ???? 8: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS04.DBF'
ORA-27041: ??????
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_12040.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-1157 signalled during: alter database open...
Fri May 04 09:35:10 2018
Checker run found 2 new persistent data failures

alert日志的报错也比较明显,users03是文件超过了大小(大小为0kb,读取之后肯定超过大小),users04提示无法打开文件(文件在文件系统层面已经丢失).现在问题比较明显由于文件系统故障导致文件大小为0和丢失

碎片扫描恢复
常规的方法肯定无法恢复,比较好的方法只能是底层碎片扫描重组,结合多种扫描工具,最后发现一个做底层恢复的朋友的工具效果不错,扫描结果如下
file_scan


通过工具分析坏块情况

C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.7_10.ora

DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.7_10.ora

………………

页 382565 标记为损坏
Corrupt block relative dba: 0x01c5d665 (file 7, block 382565)
Completely zero block found during dbv:

页 382566 标记为损坏
Corrupt block relative dba: 0x01c5d666 (file 7, block 382566)
Completely zero block found during dbv:

页 382567 标记为损坏
Corrupt block relative dba: 0x01c5d667 (file 7, block 382567)
Completely zero block found during dbv:



DBVERIFY - 验证完成

检查的页总数: 1374720
处理的页总数 (数据): 27582
失败的页总数 (数据): 0
处理的页总数 (索引): 20114
失败的页总数 (索引): 0
处理的页总数 (其他): 1319752
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 7271
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271996 (0.228271996)


C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.8_8.ora

DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.8_8.ora


DBVERIFY - 验证完成

检查的页总数: 1136896
处理的页总数 (数据): 36639
失败的页总数 (数据): 0
处理的页总数 (索引): 57038
失败的页总数 (索引): 0
处理的页总数 (其他): 1043218
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271997 (0.228271997)

C:\Users\Administrator>

scan_resulte


这里通过分析恢复的两个文件总的block数量2511618,其中连续损坏7271个block损坏,由于出现问题之后,数据库被offline这两个文件继续启动运行了几个小时,导致少量block被覆盖,恢复软件直接置空.后续的恢复比较顺利,正常open数据库,然后处理坏块对象(正好不是业务核心表的lob字段,所有部分丢失影响不是非常大).

温馨提醒:
1. 数据文件和备份不要放在同一个阵列上,更不能是同一个分区(卷)上
2. 出现此类问题之后,应当理解停止对该分区的任何写操作,方式丢失或者大小为0的文件被覆盖.

ORA-00704 ORA-00702 恢复

数据库启动报ORA-01092 ORA-00704 ORA-00702错误

SQL> startup
ORACLE instance started.

Total System Global Area 3056513024 bytes
Fixed Size                  2257152 bytes
Variable Size             704646912 bytes
Database Buffers         2332033024 bytes
Redo Buffers               17575936 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Process ID: 27344
Session ID: 191 Serial number: 3

bootstrap-ORA-00702


查看alert日志

Mon Apr 09 16:22:34 2018
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1383493834
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Mon Apr 09 16:22:39 2018
ALTER DATABASE OPEN
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_27344.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_27344.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Error 704 happened during db open, shutting down database
USER (ospid: 27344): terminating the instance due to error 704
Instance terminated by USER, pid = 27344
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (27344) as a result of ORA-1092
Mon Apr 09 16:22:40 2018
ORA-1092 : opitsk aborting process

错误比较明显是由于数据库open过程中bootstrap异常导致,出现此类错误一般是由于软件介质和db不匹配或者bootstrap表的block故障导致.

官方说明

Versions 9.2, 10.1, 10.2, 11.1, 11.2, 12.1

Error:  ORA-00702 bootstrap verison '%s' inconsistent with version '%s' 
---------------------------------------------------------------------------
Cause:  The reading version of the boostrap is incompatible with the current 
	bootstrap version. 
Action: Restore a version of the software that is compatible with this 
	bootstrap version

由于bootstrap$以及其中相关表处理比较特殊,如果您遭遇此类bootstrap$相关异常无法解决,需要恢复支持,请联系我们
Phone:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

ORA-00600: internal error code, arguments: [16703], [1403], [20]

继续上篇的tab$被清空(ORA-600 16703故障解析—tab$表被清空),导致数据库启动异常的case
数据库日志分析
数据库open成功同时报ORA-7445 kokeicbegin和ORA-600 kzrini:!uprofile错误
ora-600-kzrini-uprofile


再次启动数据库直接报ORA-600 16703错误
ora-600-16703

这里有一个其他现象,就是数据库在open成功的同时(同一秒内),开始报异常.重启之后直接报
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
根据10046分析结果

=====================
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #140048443935120:c=0,e=390,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905161433
=====================
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,        nvl(degree,1), nvl(instances,1) from ind$ where bo# = :1 and obj# = :2
END OF STMT
PARSE #140048443934176:c=1000,e=601,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162088
=====================
PARSING IN CURSOR #140048443933232 len=70 dep=1 uid=0 oct=3 lid=0 tim=1499185905162444 hv=3377894161 ad='84f13d70' sqlid='32d4jrb4pd4sj'
select charsetid, charsetform from col$  where obj# = :1 and col# = :2
END OF STMT
PARSE #140048443933232:c=0,e=294,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905162443
=====================
PARSING IN CURSOR #140048443932288 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499185905247020 hv=429618617 ad='84f0f2d0' sqlid='4krwuz0ctqxdt'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #140048443932288:c=0,e=549,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499185905247019
BINDS #140048443932288:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91b87bd0  bln=22  avl=02  flg=05
  value=20
EXEC #140048443932288:c=2000,e=2686,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499185905249810
WAIT #140048443932288: nam='db file sequential read' ela= 6205 file#=1 block#=337 blocks=1 obj#=36 tim=1499185905256132
WAIT #140048443932288: nam='db file sequential read' ela= 3739 file#=1 block#=338 blocks=1 obj#=36 tim=1499185905266704
WAIT #140048443932288: nam='db file sequential read' ela= 4966 file#=1 block#=241 blocks=1 obj#=18 tim=1499185905271761
FETCH #140048443932288:c=0,e=21976,p=3,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=1218588913,tim=1499185905271820
STAT #140048443932288 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=3 pw=0 time=21993 us)'
STAT #140048443932288 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=2 pw=0 time=16923 us)'
CLOSE #140048443932288:c=0,e=63,dep=1,type=0,tim=1499185905271941
BINDS #140048443935120:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5f91c07de8  bln=22  avl=02  flg=05
  value=20
EXEC #140048443935120:c=1000,e=795,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2970138452,tim=1499185905272802
WAIT #140048443935120: nam='db file sequential read' ela= 3197 file#=1 block#=169 blocks=1 obj#=3 tim=1499185905276069
WAIT #140048443935120: nam='db file sequential read' ela= 3459 file#=1 block#=170 blocks=1 obj#=3 tim=1499185905404084
WAIT #140048443935120: nam='db file sequential read' ela= 6358 file#=1 block#=145 blocks=1 obj#=4 tim=1499185905410548
FETCH #140048443935120:c=999,e=137805,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=2970138452,tim=1499185905410635
STAT #140048443935120 id=1 cnt=0 pid=0 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=3 pw=0 time=137810 us)'
STAT #140048443935120 id=2 cnt=1 pid=1 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=2 pw=0 time=131330 us)'
 
*** 2017-07-05 00:31:46.094
Incident 176395 created, dump file: /oracle/diag/rdbms/orcl/orcl2/incident/incdir_176395/orcl_ora_51261_i176395.trc
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []

以及以往恢复经验和mos,基本上可以确定是由于tab$和obj$记录不匹配导致该问题.而且是obj#=20的记录在tab$和obj$中不匹配.

分析tab$和obj$记录

Data UnLoader: 11.2.0.1.5 - Internal Only - on Wed Jul 05 01:28:53 2017
with 64-bit io functions and the decompression option
 
Copyright (c) 1994 2017 Bernard van Duijnen All rights reserved.
 
 Strictly Oracle Internal Use Only
 
 
Found db_id = 1334610369
Found db_name = orcl
DUL> unload table TAB$( OBJ# number, DATAOBJ# number,
  2      TS# number, FILE# number, BLOCK# number,
  3      BOBJ# number, TAB# number, COLS number, CLUCOLS number,
  4      PCTFREE$ ignore, PCTUSED$ ignore, INITRANS ignore, MAXTRANS ignore,
  5      FLAGS ignore, AUDIT$ ignore, ROWCNT ignore, BLKCNT ignore,
  6      EMPCNT ignore, AVGSPC ignore, CHNCNT ignore, AVGRLN ignore,
  7      AVGSPC_FLB ignore, FLBCNT ignore,
  8      ANALYZETIME ignore, SAMPLESIZE ignore,
  9      DEGREE ignore, INSTANCES ignore,
 10      INTCOLS ignore, KERNELCOLS number, PROPERTY number)
 11      cluster  C_OBJ#(OBJ#)
 12      storage ( tablespace 0 segobjno 2 tabno 1 file 1 block 144);
. unloading table                      TAB$       0 rows unloaded
DUL> unload table OBJ$( OBJ# number, DATAOBJ# number, OWNER# number,
  2      NAME clean varchar2(30), NAMESPACE ignore, SUBNAME clean varchar2(30),
  3      TYPE# number, CTIME ignore, MTIME ignore, STIME ignore,
  4      STATUS ignore, REMOTEOWNER ignore, LINKNAME ignore,
  5      FLAGS ignore, OID$ hexraw)
  6      storage ( tablespace 0 segobjno 18 file 1 block 240);
. unloading table                      OBJ$   89804 rows unloaded
DUL>

这里可以明确表示tab$无记录,obj$有记录,从而启动的过程出现ORA-600 16703错误可以很好解释.由于数据库启动成功和报错几乎同时进行,以及上面看到的tab$记录不存在的情况,初步怀疑是有startup触发器清空tab$表记录
工具分析触发器表trigger$
startup-trigger
这里果然看到一个after startup on database的触发器,名字为DBMS_SUPPORT_DBMONITOR,而它调用的是DBMS_SUPPORT_DBMONITORP存储.

工具分析pl/sql表source$
DBMS_SUPPORT_DBMONITOR


对wraped加密的程序进行解密
DBMS_SUPPORT_DBMONITOR-unwraped

这里可以明确的看到DBMS_SUPPORT_DBMONITORP存储过程备份tab$表到orachk中然后delete tab$表,现在已经明确是由于DBMS_SUPPORT_DBMONITOR触发器在数据库重启之后开始执行调用DBMS_SUPPORT_DBMONITORP程序,如果判断数据库创建时间大于等于300天,便干掉tab$表,实现数据库破坏.

查找DBMS_SUPPORT_DBMONITOR等程序源头
分析相关程序创建时间,通过obj$表的ctime和name来判断
DBMS_SUPPORT_DBMONITOR-ctime
bootstrap-ctime

这里可以看出来DBMS_SUPPORT_DBMONITOR和DBMS_SUPPORT_DBMONITORP的创建时间基本上和数据库核心对象的创建时间相差无几,我们可以大概排除掉,是由于pl sql等工具连接数据库导致该发问题(类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案),很可能是在dbca创建库的过程中就已经带有了DBMS_SUPPORT_DBMONITOR等程序,如果这样那很可能是由于数据库的安装介质被破坏导致该问题.

分析DBMS_SUPPORT_DBMONITOR来源
prvtsupp
20170711001626

这里已经很清晰,由于prvtsupp.plb被人注入了恶意脚本从而使得数据库被创建了DBMS_SUPPORT_DBMONITOR的触发器和DBMS_SUPPORT_DBMONITORP的存储过程,从而实现了对数据库的而易破坏.
md5

通过md5码对比,可以确定是有人对软件的安装介质进行了破坏,从而实现了恶意代码的注入,从而实现了数据库300天之后重启之后无法正常启动而是出现类似ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []的错误.提醒各位一定要从官方途径下载oracle安装介质,如果是从其他互联网途径下载一定要验证md5,确保文件没有被人恶意篡改,造成无可挽回的损坏.如果真的不幸遇到这类问题,请保护现场联系我们
Tel:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
可以实现业务数据0丢失恢复,最大限度抢救您的数据

ora-600 2037 ORA-7445 kcbs_dump_adv_state

有客户系统断电,导致数据库无法启动,让我们帮忙解决,通过分析主要是ORA-600 2037和ORA-7445 _kcbs_dump_adv_state等错误,通过人工recover解决.
数据库报ORA-03113,无法启动成功

C:\Documents and Settings\Administrator>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on 星期五 5月 12 09:50:36 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 1258291200 bytes
Fixed Size                  1250548 bytes
Variable Size             218106636 bytes
Database Buffers         1031798784 bytes
Redo Buffers                7135232 bytes
数据库装载完毕。
ORA-03113: 通信通道的文件结束

分析alert日志

Fri May 12 09:50:43 2017
ALTER DATABASE OPEN
Fri May 12 09:50:43 2017
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Fri May 12 09:50:43 2017
Started redo scan
Fri May 12 09:50:43 2017
Completed redo scan
 1240 redo blocks read, 277 data blocks need recovery
Fri May 12 09:50:44 2017
Started redo application at
 Thread 1: logseq 5881, block 41179
Fri May 12 09:50:44 2017
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5881 Reading mem 0
  Mem# 0 errs 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\REDO01.LOG
Fri May 12 09:50:44 2017
Completed redo application
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []

Fri May 12 09:50:44 2017
Hex dump of (file 3, block 14004) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during crash/instance recovery
Data in bad block:
 type: 255 format: 7 rdba: 0x06010601
 last change scn: 0xa206.a2060601 seq: 0xb4 flg: 0x36
 spare1: 0x1 spare2: 0x6 spare3: 0x673
 consistency value in tail: 0x1b0a0708
 check value in block header: 0x36b4
 computed block checksum: 0xe4f5
Fri May 12 09:50:44 2017
Hex dump of (file 9, block 65507) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p003_6056.trc
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during crash/instance recovery
Data in bad block:
 type: 3 format: 6 rdba: 0x06020601
 last change scn: 0xa206.a2060602 seq: 0xe3 flg: 0xff
 spare1: 0x1 spare2: 0x6 spare3: 0x6dc
 consistency value in tail: 0xc1028001
 check value in block header: 0xffe3
 computed block checksum: 0xff01
Fri May 12 09:50:44 2017
Reread of rdba: 0x00c036b4 (file 3, block 14004) found different data
Fri May 12 09:50:44 2017
Reread of rdba: 0x0240ffe3 (file 9, block 65507) found different data
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p005_6060.trc:
ORA-00600: internal error code,arguments:[2037],[17442602],[2718302723],[255],[9],[203],[657105414],[2147549568]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc:
ORA-07445:exception encountered:core dump[ACCESS_VIOLATION][_kclcomplete+79][PC:0x72B0C7][ADDR:0x220][UNABLE_TO_READ][]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-07445: exception encountered:core dump[ACCESS_VIOLATION][_kcbzdh+2496][PC:0x4A4928][ADDR:0xB][UNABLE_TO_READ][]
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p012_6128.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kcbs_dump_adv_state+723] 
                                 [PC:0x5975A3] [ADDR:0xCBC0CBB2] [UNABLE_TO_READ] []
ORA-00600:internal error code,arguments:[2037],[17430318],[2718303745],[128],[1],[203],[4147028486],[2147549568]

错误比较明显由于坏块导致应用日志恢复异常,主要错误集中在ORA-600 2037,ORA-7445 _kcbs_dump_adv_state,ORA-7445_kcbzdh,ORA-7445 _kclcomplete等

dbv检查数据文件

E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF

DBVERIFY: Release 10.2.0.1.0 - Production on 星期五 5月 12 09:57:39 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF

页 13353 标记为损坏
Corrupt block relative dba: 0x00c03429 (file 3, block 13353)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x3429a206
 last change scn: 0x066f.066f3429 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x8c96
 consistency value in tail: 0x06018001
 check value in block header: 0x0
 block checksum disabled

页 14004 标记为损坏
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x36b4a206
 last change scn: 0x0673.067336b4 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xfb97
 consistency value in tail: 0x06010210
 check value in block header: 0x0
 block checksum disabled

页 15261 标记为损坏
Corrupt block relative dba: 0x00c03b9d (file 3, block 15261)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0x3b9da206
 last change scn: 0x0673.06733b9d seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x0
 consistency value in tail: 0x06018001
 check value in block header: 0x5549
 block checksum disabled



DBVERIFY - 验证完成

检查的页总数: 58880
处理的页总数 (数据): 19318
失败的页总数 (数据): 0
处理的页总数 (索引): 18610
失败的页总数 (索引): 0
处理的页总数 (其它): 13747
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 7202
标记为损坏的总页数: 3
流入的页总数: 0
最高块 SCN            : 178325323 (0.178325323)


E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02

DBVERIFY: Release 10.2.0.1.0 - Production on 星期五 5月 12 10:10:24 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02

页 65507 标记为损坏
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0xffe3a206
 last change scn: 0x06dc.06dcffe3 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xb32
 consistency value in tail: 0x060102ff
 check value in block header: 0x0
 block checksum disabled



DBVERIFY - 验证完成

检查的页总数: 1310720
处理的页总数 (数据): 34102
失败的页总数 (数据): 0
处理的页总数 (索引): 30270
失败的页总数 (索引): 0
处理的页总数 (其它): 10850
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1235497
标记为损坏的总页数: 1
流入的页总数: 0
最高块 SCN            : 178325221 (0.178325221)

确实如alert日志报错,file 3和9 都出现坏块导致实例恢复无法进行。根据错误ORA-600 2037和ORA-7445 _kcbs_dump_adv_state,初步判断和During Startup (Open Database) Alert Log Shows ORA-600[2037] and ORA-7445[kcbs_dump_adv_state] (Doc ID 551993.1)文章描述相符(而且版本也相符)

尝试recover datafile部分file

E:\>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on 星期五 5月 12 10:16:00 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> recover datafile 1;
完成介质恢复。
SQL> recover datafile 2;
完成介质恢复。
SQL> recover datafile 3;
完成介质恢复。
SQL> recover datafile 4;
完成介质恢复。
SQL> recover datafile 9;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr1_lastbwr], [], [], [], [], [], [], []

ORA-00600 kcratr1_lastbwr错误比较明显,见ORA-00600:[Kcratr1_lastbwr] During Database Startup after a Crash (Doc ID 393984.1)

通过recover database处理

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

然后通过查询dba_extents 处理坏块对象

补充ORA-600 2037错误

Format: ORA-600 [2037] [a] [b] 1 [d] [e] [f] [g]


VERSIONS:
  versions 8.0 and above

DESCRIPTION:

  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.

  The block has failed this check and this exception is raised.

ARGUMENTS:
  Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
  Arg [b] The Block format  
  Arg 1 RDBA in the block itself
  Arg [d] The block type
  Arg [e] The sequence number
  Arg [f] Flags, if set  
  Arg [g] The return value from the block head/tail checker.