ORA-600 2663分析

在大家熟悉的ORA-600 2662错误中还有一个类似的ORA-600 2663的错误,以下是对该2663的解释说明

ERROR:
  Format: ORA-600 [2663] [a] [b] {c} [d]

VERSIONS:
  versions 10.1 to 11.1

DESCRIPTION:
  A data block SCN is ahead of the current SCN.

  The ORA-600 [2663] occurs when an SCN is compared to the dependent SCN 
  stored in a UGA variable.

  If the SCN is less than the dependent SCN then we signal the ORA-600 [2663]
  internal error.

ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg {c}  dependent SCN WRAP
  Arg [d]  dependent SCN BASE 

FUNCTIONALITY:
  Kernel Cache Redo File Redo Generation

IMPACT:
  INSTANCE FAILURE
  POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:
  There are different situations where ORA-600 [2663] can be raised.

  It can be raised on startup or duing database operation.

  If not using RAC, Real Application Clusters,, check that 2 instances 
  have not mounted the same database.

  Check for SMON traces and have the alert.log and trace files ready
  to send to support.

  Check the SCN difference [argument d]-[argument b].

  If the SCNs in the error are very close, then try to shutdown and startup
  the instance several times. 

  In some situations, the SCN increment during startup may permit the 
  database to open. Keep track of the number of times you attempted a 
  startup.

ORA-600 2662分析

在数据库恢复中,ORA-600 2662我想是很多人都非常熟悉的错误,下文是对于该错误的一些解释
ORA-600 2662解释说明

ERROR:              

  Format: ORA-600 [2662] [a] [b] {c} [d] [e]
 
VERSIONS:
  versions 6.0 to 12.1
 
DESCRIPTION:

  A data block SCN is ahead of the current SCN.

  The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN 
  stored in a UGA variable.

  If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
  internal error.

ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg {c}  dependent SCN WRAP
  Arg [d]  dependent SCN BASE 
  Arg [e]  Where present this is the DBA where the dependent SCN came from.

出现ORA-600 2662可能的原因

  (1) doing an open resetlogs with _ALLOW_RESETLOGS_CORRUPTION enabled    
  (2) a hardware problem, like a faulty controller, resulting in a failed 
      write to the control file or the redo logs     
  (3) restoring parts of the database from backup and not doing the 
      appropriate recovery     
  (4) restoring a control file and not doing a RECOVER DATABASE USING BACKUP 
      CONTROLFILE     
  (5) having _DISABLE_LOGGING set during crash recovery                      
  (6) problems with the DLM in a parallel server environment      
  (7) a bug   

ORA-600 2662解决方法

   (1) if the SCNs in the error are very close, attempting a startup several 
       times will bump up the dscn every time we open the database even if 
       open fails. The database will open when dscn=scn.
      
   (2)You can bump the SCN either on open or while the database is open 
      using Event:ADJUST_SCN 
      Be aware that you should rebuild the database if you use this
      option.

_ALLOW_RESETLOGS_CORRUPTION 参数

我相信_ALLOW_RESETLOGS_CORRUPTION 这个参数一定很多人都熟悉,是redo异常恢复的杀手锏之一,以下文章是来自官方的解释

DB_Parameter _ALLOW_RESETLOGS_CORRUPTION 
========================================
 
This documentation has been prepared avoiding the mention of the complex 
structures from the code and to simply give an insight to the 'damage it could 
cause'.  The usage of this parameter leads to an in-consistent Database with no 
other alternative but to rebuild the complete Database.  This parameter could 
be used when we realize that there are no stardard options available and are 
convinced that the customer understands the implications of using the Oracle's 
secret parameter.  The factors to be considered are ;-- 
 
1. Customer does not have a good backup. 
2. A lot of time and money has been invested after the last good backup and     
   there is no possibility for reproduction of the lost data. 
3. The customer has to be ready to export the full database and import it     
   back after creating a new one. 
4. There is no 100% guarantee that by using this parameter the database would 
   come up. 
5. Oracle does not support the database after using this parameter for        
   recovery.    
6. ALL OPTIONS including the ones mentioned in the action part of the error   
   message have been tried. 
 
 

 
By setting _ALLOW_RESETLOGS_CORRUPTION=TRUE, certain consistency checks are 
SKIPPED during database open stage.  This basically means it does not check 
the datafile headers as to what the status was before the shutdown and how it 
was shutdown.  The following cases mention few of the checks that were skipped. 
 
Case-I 
------ 
Verification that the datafile present has not been restored from a BACKUP 
taken before the database was opened successfully by using RESETLOGS.   

ORA-01190: control file or data file %s is from before the last RESETLOGS
    Cause: Attempting to use a data file when the log reset information in  
           the file does not match the control file.  Either the data file or  
           the control file is a backup that was made before the most recent 
           ALTER DATABASE OPEN RESETLOGS. 
   Action: Restore file from a more recent backup. 

 
Case-II 
------- 
Verification that the status bit of the datafile is not in a FUZZY state. 
The datafile could be in this state due to the database going down when the  
 - Datafile was on-line and open 
 - Datafile was not closed cleanly (maybe due to OS). 

ORA-01194: file %s needs more recovery to be consistent 
    Cause: An incomplete recover session was started, but an insufficient 
           number of logs were applied to make the file consistent.  The  
           reported file was not closed cleanly when it was last opened by 
           the database.  It must be recovered to a time when it was not  
           being updated.  The most likely cause of this error is forgetting 
           to restore the file from a backup before doing incomplete  
           recovery. 
   Action: Either apply more logs until the file is consistent or restore the 
           file from an older backup and repeat recovery. 
 

Case-III 
-------- 
Verification that the COMPLETE recover strategies have been applied for 
recovering the datafile and not any of the INCOMPLETE recovery options.  
Basically because the complete recovery is one in which we even apply the 
ON-LINE redo log files and open the DB without reseting the logs. 

ORA-01113: file '%s' needs media recovery starting at log sequence # %s 
    Cause: An attempt was made to open a database file that is in need of  
           media recovery. 
   Action: First apply media recovery to the file. 

 
Case-IV 
------- 
Verification that the datafile has been recovered through an END BACKUP if the 
control file indicates that it was in backup mode. 
This is useful when the DB has crashed while in hot backup mode and we lost 
all log files in DB version's less than V7.2. 

ORA-01195: on-line backup of file %s needs more recovery to be consistent" 
    Cause: An incomplete recovery session was started, but an insufficient  
           number of logs were applied to make the file consistent.  The 
           reported file is an on-line backup which must be recovered to the 
           time the backup ended. 
   Action: Either apply more logs until the file is consistent or resotre 
          the database files from an older backup and repeat recovery. 
 
In version 7.2, we could simply issue the ALTER DATABASE DATAFILE xxxx END 
BACKUP statement and proceed with the recovery.  But again to issue this 
statement, we need to have the ON-LINE redo logs or else we still are forced to
use this parameter. 
 

Case-V 
------ 
Verification that the data file status is not still in (0x10) MEDIA recovery 
FUZZY. 
When recovery is started, a flag is set in the datafile header status flag to 
indicate that the file is presently in media recovery.  This is reset when 
recovery is completed and at times when it has not been reset we are forced to 
use this paramter. 

ORA-01196: file %s is inconsistent due to a failed media recovery session 
    Cause: The file was being recovered but the recovery did not terminate 
           normally.  This left the file in an inconsistent state.  No more  
           recovery was successfully completed on this file. 
   Action: Either apply more logs until the file is consistent or restore the 
           backup again and repeat recovery. 
 
 
Case-VI 
------- 
Verification that the datafile has been restored form a proper backup to 
correspond with the log files.  This situation could happen when we have 
decided that the data file is invalid since its SCN is ahead of the last 
applied logs SCN but it has not failed on one of the ABOVE CHECKS. 

ORA-01152: file '%s' was not restored from a sufficientluy old backup" 
    Cause: A manual recovery session was started, but an insufficient number 
           of logs were applied to make the database consistent.  This file is 
           still in the future of the last log applied.  Note that this  
           mistake can not always be caught. 
   Action: Either apply more logs until the database is consistent or 
           restore the database file from an older backup and repeat  
           recovery.

使用_ALLOW_RESETLOGS_CORRUPTION 参数需谨慎,因为该参数可能导致数据库逻辑不一致,甚至可能把本来很简单的一个恢复弄的非常复杂甚至不可恢复的后果,建议在oracle support支持下使用.另外使用该参数resetlogs库之后,强烈建议通过逻辑方式重建库

ORA-600 999

有网友数据库启动报ORA-600 999错误,无法正常open,让我们介入分析,帮忙恢复其中部分数据
数据库启动报ORA-600 999

Sun Jul 31 23:09:36 2016
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc  (incident=179779):
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_179779\orcl_smon_3356_i179779.trc
No Resource Manager plan active
Starting background process QMNC
Sun Jul 31 23:09:37 2016
QMNC started with pid=20, OS id=5068 
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (7, 1).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc:
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
Completed: alter database open
Sun Jul 31 23:09:38 2016
db_recovery_file_dest_size of 8680 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Trace dumping is performing id=[cdmp_20160731230939]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc  (incident=179785):
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (7, 1).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_3356.trc:
ORA-00600: internal error code, arguments: [999], [0x7FFAE748013], [], [], [], [], [], [], [], [], [], []
Sun Jul 31 23:09:41 2016
Starting background process CJQ0
Sun Jul 31 23:09:41 2016
CJQ0 started with pid=25, OS id=2572 
Process debug not enabled via parameter _debug_enable
Trace dumping is performing id=[cdmp_20160731230942]
PMON (ospid: 3948): terminating the instance due to error 474
Sun Jul 31 23:09:48 2016
opiodr aborting process unknown ospid (2592) as a result of ORA-1092
Sun Jul 31 23:09:48 2016
ORA-1092 : opitsk aborting process
Sun Jul 31 23:09:52 2016
Instance terminated by PMON, pid = 3948

设置_offline_rollback_segments数据库启动正常

Sun Jul 31 23:18:13 2016
ALTER DATABASE OPEN
Thread 1 opened at log sequence 16
  Current log# 1 seq# 16 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 5.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc  (incident=182188):
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_182188\orcl_smon_4372_i182188.trc
Doing block recovery for file 3 block 224
Resuming block recovery (PMON) for file 3 block 224
Block recovery from logseq 16, block 2945 to scn 15431544
Recovery of Online Redo Log: Thread 1 Group 1 Seq 16 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Trace dumping is performing id=[cdmp_20160731231815]
Block recovery stopped at EOT rba 16.2952.16
Block recovery completed at rba 16.2952.16, scn 0.15431543
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Sun Jul 31 23:18:19 2016
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc  (incident=182189):
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_182189\orcl_smon_4372_i182189.trc
Starting background process QMNC
Sun Jul 31 23:18:19 2016
QMNC started with pid=20, OS id=4920 
Doing block recovery for file 3 block 224
Resuming block recovery (PMON) for file 3 block 224
Block recovery from logseq 16, block 2945 to scn 15431544
Recovery of Online Redo Log: Thread 1 Group 1 Seq 16 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Block recovery completed at rba 16.2952.16, scn 0.15431545
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4372.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Starting background process SMCO
Sun Jul 31 23:18:19 2016
SMCO started with pid=21, OS id=3176 
Sun Jul 31 23:18:20 2016
Trace dumping is performing id=[cdmp_20160731231820]
Completed: ALTER DATABASE OPEN

尝试删除异常回滚段

Sun Jul 31 23:15:07 2016
drop rollback segment "_SYSSMU7_1101470402$"
Sun Jul 31 23:15:07 2016
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 224, RDBA = 12583136
         OBJN = -1, OBJD = -1, OBJECT = , SUBOBJECT = 
         SEGMENT OWNER = , SEGMENT TYPE = 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_5300.trc  (incident=181035):
ORA-00600: 内部错误代码, 参数: [kdBlkCheckError], [3], [224], [38508], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_181035\orcl_ora_5300_i181035.trc
Doing block recovery for file 3 block 224
Resuming block recovery (PMON) for file 3 block 224
Block recovery from logseq 14, block 8682 to scn 15397854
Recovery of Online Redo Log: Thread 1 Group 2 Seq 14 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Block recovery completed at rba 14.8688.16, scn 0.15397855
ORA-607 signalled during: drop rollback segment "_SYSSMU7_1101470402$"...
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 224, RDBA = 12583136
         OBJN = -1, OBJD = -1, OBJECT = , SUBOBJECT = 
         SEGMENT OWNER = , SEGMENT TYPE = 

从这里看,我们可以确定file 3 block 224异常,导致删除回滚段异常.和mos官方给出来的案例类似,由于undo坏块导致数据库报ORA-600 999错误

mos中ORA-600 999报错信息
官方的益处ORA-600[999]报错,也是由于undo坏块引起和本文的报错基本上一致
ORA-600-999


因为只要部分数据,直接屏蔽回滚段,数据库不再crash,导出需要对象即可

找出来asm 磁盘组中数据文件别名对应的文件号

前段时间有多个朋友问我,在amdu中,如果数据文件命名不是omf的方式,该如何找出来数据文件的asm file_number,从而实现通过amdu对不能mount的磁盘组中的数据文件进行恢复,这里通过测试给出来处理方法.根据我们对asm的理解,asm file_number 6为asm file的别名文件记录所在地,我们通过分析kfed这些au中的记录即可获得相关数据文件的别名对应的asm文件号

模拟各种别名

D:\app\product\10.2.0\db_1\bin>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.3.0 - Production on 星期三 7月 27 22:48:48 2016

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.


连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DATA/ora10g/datafile/system.256.914797317
+DATA/ora10g/datafile/undotbs1.258.914797317
+DATA/ora10g/datafile/sysaux.257.914797317
+DATA/ora10g/datafile/users.259.914797317

SQL> create tablespace xifenfei
  2  datafile '+data/xifenfei01.dbf' size 10M;

表空间已创建。

SQL> alter tablespace xifenfei add
  2  datafile '+data/ora10g/datafile/xifenfei02.dbf' size 10m;

表空间已更改。

SQL> alter tablespace xifenfei add
  2  datafile '+data/ora10g/xifenfei03.dbf' size 10m;

表空间已更改。

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+DATA/ora10g/datafile/system.256.914797317
+DATA/ora10g/datafile/undotbs1.258.914797317
+DATA/ora10g/datafile/sysaux.257.914797317
+DATA/ora10g/datafile/users.259.914797317
+DATA/xifenfei01.dbf
+DATA/ora10g/datafile/xifenfei02.dbf
+DATA/ora10g/xifenfei03.dbf

已选择7行。

分析磁盘组和别名信息

SQL> select name from v$asm_disk;

NAME
------------------------------
DATA_0000
DATA_0001

SQL> select path from v$asm_disk;

PATH
-----------------------------------------
H:\ASMDISK\ASMDISK1.DD
H:\ASMDISK\ASMDISK2.DD

SQL> SELECT NAME,FILE_NUMBER FROM V$ASM_ALIAS where file_number<>4294967295;

NAME                           FILE_NUMBER
------------------------------ -----------
SYSTEM.256.914797317                   256
SYSAUX.257.914797317                   257
UNDOTBS1.258.914797317                 258
USERS.259.914797317                    259
XIFENFEI.266.918341361                 266
XIFENFEI.267.918341389                 267
xifenfei02.dbf                         267
XIFENFEI.268.918341409                 268
Current.260.914797381                  260
group_1.261.914797385                  261
group_2.262.914797385                  262
group_3.263.914797387                  263
TEMP.264.914797393                     264
spfile.265.914797421                   265
spfileora10g.ora                       265
xifenfei03.dbf                         268
xifenfei01.dbf                         266

已选择17行。

SQL> SELECT NAME,FILE_NUMBER FROM V$ASM_ALIAS;

NAME                           FILE_NUMBER
------------------------------ -----------
ORA10G                          4294967295
DATAFILE                        4294967295
SYSTEM.256.914797317                   256
SYSAUX.257.914797317                   257
UNDOTBS1.258.914797317                 258
USERS.259.914797317                    259
XIFENFEI.266.918341361                 266
XIFENFEI.267.918341389                 267
xifenfei02.dbf                         267
XIFENFEI.268.918341409                 268
CONTROLFILE                     4294967295
Current.260.914797381                  260
ONLINELOG                       4294967295
group_1.261.914797385                  261
group_2.262.914797385                  262
group_3.263.914797387                  263
TEMPFILE                        4294967295
TEMP.264.914797393                     264
PARAMETERFILE                   4294967295
spfile.265.914797421                   265
spfileora10g.ora                       265
xifenfei03.dbf                         268
xifenfei01.dbf                         266

已选择23行。

从sql查询,我们可以确定xifenfei0n.dbf对应的文件号分别为:xifenfei01.dbf==>266,xifenfei02.dbf==>267,xifenfei03.dbf==>268

通过kfed file 6所在位置

www.xifenfei.com>kfed read H:\ASMDISK\ASMDISK1.DD |grep f1b1
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002
kfdhdb.f1b1fcn.base:                  0 ; 0x100: 0x00000000
kfdhdb.f1b1fcn.wrap:                  0 ; 0x104: 0x00000000

www.xifenfei.com>kfed read H:\ASMDISK\ASMDISK1.DD aun=2 blkn=6|grep kfffde|more
kfffde[0].xptr.au:                   26 ; 0x4a0: 0x0000001a
kfffde[0].xptr.disk:                  0 ; 0x4a4: 0x0000
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk:                  48 ; 0x4a7: 0x30
kfffde[1].xptr.au:           4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk:              65535 ; 0x4ac: 0xffff

从这里我们可以确定别名的au只有一个位于disk 0, au 26(0x1a)的位置
通过kfed分析别名

www.xifenfei.com>kfed read H:\ASMDISK\ASMDISK1.DD aun=26 |more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 11 ; 0x002: KFBTYP_ALIASDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 6 ; 0x008: file=6
kfbh.check: 1563703526 ; 0x00c: 0x5d3438e6
kfbh.fcn.base: 3461 ; 0x010: 0x00000d85
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kffdnd.bnode.incarn: 1 ; 0x000: A=1 NUMM=0x0
kffdnd.bnode.frlist.number: 4294967295 ; 0x004: 0xffffffff
kffdnd.bnode.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number: 4294967295 ; 0x00c: 0xffffffff
kffdnd.overfl.incarn: 0 ; 0x010: A=0 NUMM=0x0
kffdnd.parent.number: 0 ; 0x014: 0x00000000
kffdnd.parent.incarn: 1 ; 0x018: A=1 NUMM=0x0
kffdnd.fstblk.number: 0 ; 0x01c: 0x00000000
kffdnd.fstblk.incarn: 1 ; 0x020: A=1 NUMM=0x0
kfade[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfade[0].entry.hash: 2080305534 ; 0x028: 0x7bfef17e
kfade[0].entry.refer.number: 1 ; 0x02c: 0x00000001
kfade[0].entry.refer.incarn: 1 ; 0x030: A=1 NUMM=0x0
kfade[0].name: ORA10G ; 0x034: length=6
kfade[0].fnum: 4294967295 ; 0x