再现ORA-600 4000故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:再现ORA-600 4000故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个10g的库,由于redo损坏导致无法正常recover成功
20231018111911


正常途径无法open成功,尝试强制打开库

Wed Oct 18 11:23:25 2023
alter database open resetlogs
Wed Oct 18 11:23:25 2023
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 236715370
Resetting resetlogs activation ID 4009817354 (0xef00f50a)
Wed Oct 18 11:23:26 2023
Setting recovery target incarnation to 3
Wed Oct 18 11:23:26 2023
Assigning activation ID 4271120987 (0xfe94225b)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\ORADATA\EAIS\REDO01.LOG
Successful open of redo thread 1
Wed Oct 18 11:23:26 2023
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Oct 18 11:23:26 2023
SMON: enabling cache recovery
Wed Oct 18 11:23:27 2023
Errors in file d:\hys_db\admin\eais\udump\eais_ora_10916.trc:
ORA-00600: 内部错误代码, 参数: [4000], [9], [], [], [], [], [], []

Wed Oct 18 11:23:28 2023
Errors in file d:\hys_db\admin\eais\udump\eais_ora_10916.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [4000], [9], [], [], [], [], [], []

Wed Oct 18 11:23:28 2023
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Wed Oct 18 11:23:28 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_pmon_1764.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_reco_9628.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_smon_9884.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_ckpt_10096.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_lgwr_8932.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_dbw0_9892.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_mman_9472.trc:
ORA-00704: bootstrap process failure

Wed Oct 18 11:23:29 2023
Errors in file d:\hys_db\admin\eais\bdump\eais_psp0_7692.trc:
ORA-00704: bootstrap process failure

Instance terminated by USER, pid = 10916
ORA-1092 signalled during: alter database open resetlogs...

resetlogs失败,报ora-600 4000错误,查看相关trace文件

*** 2023-10-18 11:23:27.103
ksedmp: internal or fatal error
ORA-00600: 内部错误代码, 参数: [4000], [9], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
_ksedst+38           CALLrel  _ksedst1+0           0 1
_ksedmp+898          CALLrel  _ksedst+0            0
_ksfdmp+14           CALLrel  _ksedmp+0            3
603A816A             CALLreg  00000000             6AE8258 3
603A8550             CALLrel  603A80D8             6AE8258 4FB1090 FA0 1 8C886B0
__VInfreq__ktudba+1  CALLrel  _kgeasi+0            6AE8258 4FB1090 FA0 2 1 0 9 0
26                                                 
_ktrgcm+851          CALLrel  _ktudba+0            9 8C88B68 0 0
_ktrgtc+526          CALLrel  _ktrgcm+0            4FC753C
_kdsgrp+937          CALLrel  _ktrgtc+0            4FC753C 4FC74E0 1F1FB00
                                                   8C88CB4 240 2AED19C 1F1F71C
_kdsfbrcb+432        CALLrel  _kdsgrp+0            4FC7538 0 4FC7538
_qertbFetchByRowID+  CALLrel  _kdsfbrcb+0          4FC7538 4FDF004 0 1 0 0
4344                                               4FC74E0 22E8078 8C88E14 0
_opifch2+3104        CALL???  00000000             8DA05838 20135FC 8C890AC 1
_opifch+51           CALLrel  _opifch2+0           89 5 8C891F0
_opiodr+1099         CALLreg  00000000             5 2 8C89938
_rpidrus+178         CALLrel  _opiodr+0            5 2 8C89938 5
_rpidru+88           CALLrel  _rpidrus+0           8C894C4
_rpiswu2+426         CALLreg  00000000             8C8988C
_rpidrv+1461         CALLrel  _rpiswu2+0           90F3CCB4 0 8C89860 2 8C89880
                                                   0 8C89860 0 84DF58 84E018
                                                   8C8988C 8
_rpifch+44           CALLrel  _rpidrv+0            5 5 8C89938 8
_kqdpts+166          CALLrel  _rpifch+0            5 5 5 3 90AE2383 7 C 0 0 0 0
                                                   0 0 0 5 2 90AE237C 7 C 0 0 0
                                                   0 0 0 0
_kqrlfc+478          CALLrel  _kqdpts+0            90AE21F4
_kqlbplc+102         CALLrel  _kqrlfc+0            
_kqlblfc+240         CALLrel  _kqlbplc+0           0
_adbdrv+12911        CALLrel  _kqlblfc+0           0 8C8D034
_opiexe+11778        CALLrel  _adbdrv+0            
_opiosq0+6088        CALLrel  _opiexe+0            4 0 8C8D894
_kpooprx+232         CALLrel  _opiosq0+0           3 E 8C8D9AC A4
_kpoal8+775          CALLrel  _kpooprx+0           8C8F6CC 8C8E248 1D 1 0 A4
_opiodr+1099         CALLreg  00000000             5E 17 8C8F6C8
60FEFF8D             CALLreg  00000000             5E 17 8C8F6C8 0
_opitsk+1017         CALL???  00000000             
_opiino+1087         CALLrel  _opitsk+0            0 0
_opiodr+1099         CALLreg  00000000             3C 4 8C8FC60
_opidrv+819          CALLrel  _opiodr+0            3C 4 8C8FC60 0
_sou2o+45            CALLrel  _opidrv+0            3C 4 8C8FC60
_opimai_real+112     CALLrel  _sou2o+0             8C8FC54 3C 4 8C8FC60
_opimai+92           CALLrel  _opimai_real+0       2 8C8FC8C
_OracleThreadStart@  CALLrel  _opimai+0            
4+708                                              
74A93438             CALLptr  00000000             
76F09830             CALLreg  00000000             
76F09800             CALLrel  76F0980B             
 
--------------------- Binary Stack Dump ---------------------

通过一些技巧处理数据库正常open,导出数据完成本次恢复,参考以前恢复类似case:

重现ORA-600 4000异常
通过bbed解决ORA-00600[4000]案例
记录一次ORA-600 4000数据库故障恢复
ORACLE 8.1.7 数据库ORA-600 4000故障恢复

记录一次ORA-600 4000数据库故障恢复

ORA-600[4000]错误
一朋友数据库因为当前redo丢失,在恢复的过程中启动报ORA-600[4000]错误

SMON: enabling cache recovery
Thu May 30 16:24:17 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1458370
ORA-1092 signalled during: alter database open resetlogs...

分析trace文件

*** 2013-05-30 16:24:17.979
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
--确定是obj$对象异常,通过某种手段找到obj$的objid和dataobjid均为16,对应16进制为12

Block header dump:  0x0040007a
 Object id on Block? Y
 seg/obj: 0x12  csc: 0xc1e.a329e76f  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0053.02a.000598bd  0x0d407e46.4f52.2f  --U-    1  fsc 0x0000.a329e772

这里比较明显obj$对象在rdba为0040007a的block上,scn为0c1e.a329e76f(13325725984623)且未提交的事务,这样的现象就决定了处理的特殊性(不是因为块延迟清理导致访问undo现象,该现象直接推进scn解决,而该情况不行)

数据文件头scn

SQL> SELECT DISTINCT CHECKPOINT# FROM V$DATAFILE_HEADER;

    CHECKPOINT_CHANGE#
-------------------------
           13324676536960

bbed查看文件头scn

   struct kcvfhckp, 160 bytes               @484     
      struct kcvcpscn, 8 bytes              @484     
         ub4 kscnbas                        @484      0x649c9a80
         ub2 kscnwrp                        @488      0x0c1e

这里看到的文件头scn也是为13324676536960(0c1e.649c9a80)和sql查询结果一致,也就是说数据库中的obj$的某个对象含有事务,且scn大于文件头scn(因为当前redo丢失,无法前滚,所以出现该情况),当数据库访问obj$的时候,为了事务的一致性,就需要访问undo(这里提示为83 回滚段),而undo异常,所以smon进程回滚失败,数据库无法正常启动

使用bbed提交事务

BBED> map
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122                                   Dba:0x0040007a
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 48 bytes                     @20      

 struct kdbh, 14 bytes                      @68      

 struct kdbt[1], 4 bytes                    @82      

 sb2 kdbr[108]                              @86      

 ub1 freespace[802]                         @302     

 ub1 rowdata[7084]                          @1104    

 ub4 tailchk                                @8188    


BBED> p ktbbh
struct ktbbh, 48 bytes                      @20      
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24      
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28      
      ub4 kscnbas                           @28       0xa329e76f
      ub2 kscnwrp                           @32       0x0c1e
   b2 ktbbhict                              @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44      
      struct ktbitxid, 8 bytes              @44      
         ub2 kxidusn                        @44       0x0053
         ub2 kxidslt                        @46       0x002a
         ub4 kxidsqn                        @48       0x000598bd
      struct ktbituba, 8 bytes              @52      
         ub4 kubadba                        @52       0x0d407e46
         ub2 kubaseq                        @56       0x4f52
         ub1 kubarec                        @58       0x2f
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)<--需要提交
      union _ktbitun, 2 bytes               @62      
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0xa329e772

BBED> set count 32
        COUNT           32

BBED> set offset 60
        OFFSET          60

BBED> d
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 20010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb 

 <32 bytes per line>

BBED> m /x 8001
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 80010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb 

 <32 bytes per line>

BBED> sum apply
Check value for File 1, Block 122:
current = 0xafd6, required = 0xafd6

尝试open数据库ORA-600[2662]解决

Thu May 30 21:16:00 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532996],[3102],[2745973074],[4194397],[],[]
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:01 2013
Trace dumping is performing id=[cdmp_20130530211601]
Thu May 30 21:16:02 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:03 2013
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:05 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:08 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_pmon_958764.trc:
ORA-00474: SMON process terminated with error
Thu May 30 21:16:08 2013
PMON: terminating instance due to error 474

数据库在open过程中遇到大量ORA-00600[2662],这个是因为数据库中文件头的scn小于访问的数据块scn导致该问题,解决方法推荐scn,如果数据库的scn本身就很大(和时间理论scn较接近),推进过程中可能遇到如下错误,这个时候就需要选择合适的方法/合适的值来推进scn

SQL> startup pfile=/home/oracle/pfile force
ORACLE instance started.

Total System Global Area 5.5835E+10 bytes
Fixed Size                  2177056 bytes
Variable Size            3.2867E+10 bytes
Database Buffers         2.2951E+10 bytes
Redo Buffers               14598144 bytes
Database mounted.
ORA-01052: required destination LOG_ARCHIVE_DUPLEX_DEST is not specified

后面的工作因为没有redo前滚,而且该库故障时有大量事务在跑,现在无法前滚,导致大量的undo回滚段异常,index和data不一致等故障,需要做的就是屏蔽undo seg,重建undo,重建库