Hitting ORA-07445 kgegpa during database recovery

Contact: mobile/WeChat (+86 13429648788), QQ (107644445); consult 惜分飞 via QQ


Author: 惜分飞 © All rights reserved [May not be reproduced in any form without my consent; I reserve the right to pursue legal action otherwise.]

A customer asked for help with a recovery: the database failed to open with ORA-600 [2662]

Fri Apr 24 19:52:58 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 15491509441794
Resetting resetlogs activation ID 1460987657 (0x5714e709)
Fri Apr 24 19:52:59 2020
Setting recovery target incarnation to 3
Fri Apr 24 19:52:59 2020
Assigning activation ID 1566342598 (0x5d5c7dc6)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Apr 24 19:52:59 2020
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc  (incident=8561):
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_8561\orcl_ora_3860_i8561.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc:
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3860.trc:
ORA-00600: 内部错误代码, 参数: [2662], [3606], [3857372426], [3606], [3857377059], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3860): terminating the instance due to error 600
Instance terminated by USER, pid = 3860
ORA-1092 signalled during: alter database open resetlogs...
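
The arguments of the ORA-600 [2662] above can be read as two SCNs and a block address. The sketch below assumes the commonly cited layout (current SCN wrap/base, dependent SCN wrap/base, then the DBA of the block), which this post does not itself spell out, and the usual split of a DBA into a 10-bit relative file number and a 22-bit block number. The function names are illustrative only.

```python
# Decode the ORA-600 [2662] arguments from the alert log above.
# Assumed layout (not stated in the post): [a],[b] = current SCN wrap/base,
# [c],[d] = dependent SCN wrap/base, [e] = DBA of the block being read.

def scn(wrap, base):
    """Combine a 32-bit wrap and a 32-bit base into a single SCN."""
    return (wrap << 32) | base

def decode_dba(dba):
    """Split a data block address into (relative file#, block#)."""
    return dba >> 22, dba & 0x3FFFFF

args = [3606, 3857372426, 3606, 3857377059, 12583040]

current = scn(args[0], args[1])      # SCN the database is at
dependent = scn(args[2], args[3])    # SCN recorded in the block
gap = dependent - current            # how far the block is "ahead"
file_no, block_no = decode_dba(args[4])

print(f"current SCN   = {current}")
print(f"dependent SCN = {dependent} (ahead by {gap})")
print(f"block         = file {file_no}, block {block_no}")
```

Under these assumptions the block's SCN is 4633 ahead of the database SCN, and the block is file 3, block 128. The current SCN that comes out is also within a few changes of the `UNTIL CHANGE 15491509441794` reported by the resetlogs.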

This error is fairly common. Adjusting the database SCN got past it, but the next startup attempt failed with the following errors

SQL> startup mount pfile='d:/pfile.txt';
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结尾
进程 ID: 5884
会话 ID: 66 序列号: 3

The alert log for this open attempt:

Fri Apr 24 20:57:49 2020
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x898ADE43] [PC:0x9287D88, kgegpa()+38]
Dump file d:\app\administrator\diag\rdbms\orcl\orcl\trace\alert_orcl.log
Fri Apr 24 20:57:49 2020
ORACLE V11.2.0.1.0 - 64bit Production vsnsta=0
vsnsql=16 vsnxtr=3
Windows NT Version V6.1  
CPU                 : 16 - type 8664, 16 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:21429M/32767M, Ph+PgF:54255M/65533M 
Fri Apr 24 20:57:49 2020
Errors in file 
ORA-07445: caught exception [ACCESS_VIOLATION] at [kgegpa()+38] [0x0000000009287D88]
Fri Apr 24 20:57:52 2020
PMON (ospid: 2496): terminating the instance due to error 397
Instance terminated by PMON, pid = 2496

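The main error here is ORA-07445 at kgegpa. Based on past recovery experience, this problem is very likely undo-related, so I dealt with undo and then started the database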

SQL> startup mount pfile='d:/pfile.txt' ;
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

SMON: enabling tx recovery
Database Characterset is ZHS16GBK
SMON: Restarting fast_start parallel rollback
Fri Apr 24 21:01:28 2020
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p000_4360.trc  (incident=13377):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13377\orcl_p000_4360_i13377.trc
Stopping background process MMNL
Doing block recovery for file 3 block 296
Resuming block recovery (PMON) for file 3 block 296
Block recovery from logseq 3, block 25 to scn 15491947056761
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Block recovery completed at rba 3.25.16, scn 3607.20090
Doing block recovery for file 6 block 165592
Resuming block recovery (PMON) for file 6 block 165592
Block recovery from logseq 3, block 33 to scn 15491947056769
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: Y:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Block recovery completed at rba 3.58.16, scn 3607.20098
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc  (incident=13321):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13321\orcl_smon_4912_i13321.trc
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Stopping background process MMON
Fri Apr 24 21:01:29 2020
Trace dumping is performing id=[cdmp_20200424210129]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc  (incident=13322):
ORA-00600: internal error code, arguments: [4137], [12.30.1712324], [0], [0], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13322\orcl_smon_4912_i13322.trc
ORACLE Instance orcl (pid = 14) - Error 600 encountered while recovering transaction (12, 30).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_4912.trc:
ORA-00600: internal error code, arguments: [4137], [12.30.1712324], [0], [0], [], [], [], [], [], [], [], []
Completed: alter database open upgrade
Fri Apr 24 21:01:30 2020
MMON started with pid=16, OS id=4980 
Fri Apr 24 21:01:31 2020
Sweep [inc][13322]: completed
Corrupt block relative dba: 0x00c395ee (file 3, block 234990)
Fractured block found during buffer read
Data in bad block:
 type: 2 format: 2 rdba: 0x00c395ee
 last change scn: 0x0e16.e5ead38b seq: 0x2b flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xdb720232
 check value in block header: 0xebe2
 computed block checksum: 0xb60b
Reading datafile'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'for corruption at rdba: 0x00c395ee (file 3,block 234990)
Reread (file 3, block 234990) found same corrupt data
Corrupt Block Found
         TSN = 2, TSNAME = UNDOTBS1
         RFN = 3, BLK = 234990, RDBA = 12817902
         OBJN = 0, OBJD = -1, OBJECT = , SUBOBJECT = 
         SEGMENT OWNER = , SEGMENT TYPE = 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m001_4852.trc  (incident=13641):
ORA-01578: ORACLE data block corrupted (file # 3, block # 234990)
ORA-01110: data file 3: 'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTBS01.DBF'
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_13641\orcl_m001_4852_i13641.trc

With the database open, I created a new undo tablespace and dropped the old one:

SQL> create undo tablespace undotbs2 datafile 
2   'Y:\APP\ADMINISTRATOR\ORADATA\ORCL\undo_xff02.dbf' size 128M autoextend on;

表空间已创建。

SQL> drop tablespace undotbs1 including contents and datafiles;

表空间已删除。

SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> create spfile from pfile='d:/pfile.txt';

文件已创建。

SQL> startup mount
ORACLE 例程已经启动。

Total System Global Area 1.3696E+10 bytes
Fixed Size                  2188768 bytes
Variable Size            6878661152 bytes
Database Buffers         6777995264 bytes
Redo Buffers               37044224 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

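The ORA-600 [4198] and ORA-600 [4137] errors that kept appearing after startup, together with the corrupt undo block, all confirm that the problem came from damaged undo. After rebuilding a new undo tablespace the database opened normally, and the customer was asked to export the data and import it into a new database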
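
The second argument of ORA-600 [4137] in the log above looks like a transaction ID. A minimal sketch, assuming the usual `usn.slot.sqn` reading (undo segment number, transaction slot, sequence), which is consistent with the "recovering transaction (12, 30)" line in the same log; the function name is illustrative.

```python
# Split the transaction-id style argument of ORA-600 [4137].
# Assumption: the three dot-separated numbers are undo segment number,
# transaction table slot, and sequence (wrap) number.

def parse_xid(arg):
    usn, slot, sqn = (int(part) for part in arg.split("."))
    return {"usn": usn, "slot": slot, "sqn": sqn}

xid = parse_xid("12.30.1712324")
print(f"undo segment {xid['usn']}, slot {xid['slot']}, sequence {xid['sqn']}")
```

Knowing the undo segment number narrows down which rollback segment SMON was working on when transaction recovery failed.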

Notes on recovering a 200 TB database


A customer recovery request: a 6-node 11.2.0.3 RAC running in NOARCHIVELOG mode, with close to 200 TB of data
[Image: df_size]


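A storage power loss brought down all 6 database nodes. After the hardware was restored the database could not start normally, reporting the following errors: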

SQL> recover database;
ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for
thread 4
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0/db_1/dbs/arch4_322810_870181839.dbf
ORA-00280: change 318472018583 for thread 4 is in sequence #322810

Wed Aug 28 11:19:55 2019
ALTER DATABASE RECOVER  DATABAE 
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 14 Seq 552 Reading mem 0
  Mem# 0: +REDO/xff/log2.ora
Recovery of Online Redo Log: Thread 2 Group 15 Seq 126 Reading mem 0
  Mem# 0: +REDO/xff/log3.ora
Recovery of Online Redo Log: Thread 3 Group 18 Seq 122 Reading mem 0
  Mem# 0: +REDO/xff/log6.ora
ORA-279 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Aug 28 11:21:31 2019
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 
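
The ORA-00279/ORA-00280 messages above name the exact redo that recovery is waiting for. A small sketch that pulls the change number, thread and sequence out of such output; the function name is illustrative, and the regular expression only covers the English message text shown here.

```python
import re

# SQL*Plus output copied from the session above (ORA-00279 wraps over two lines).
MESSAGES = """\
ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for
thread 4
ORA-00280: change 318472018583 for thread 4 is in sequence #322810
"""

def needed_redo(text):
    """Return the SCN, thread and log sequence named by ORA-00280, or None."""
    flat = " ".join(text.split())  # undo the line wrapping
    match = re.search(
        r"ORA-00280: change (\d+) for thread (\d+) is in sequence #(\d+)", flat
    )
    if match is None:
        return None
    change, thread, sequence = (int(g) for g in match.groups())
    return {"scn": change, "thread": thread, "sequence": sequence}

needed = needed_redo(MESSAGES)
print(needed)
```

The thread and sequence returned here are what has to be looked up among the online redo logs to see whether that log still exists.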

Recovery needs thread 4 sequence #322810, so I queried the redo log information
[Image: redo]


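The redo has already been overwritten, so the database cannot be recovered and opened by normal means. After bypassing the consistency checks and forcing the database open: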

Wed Aug 28 12:40:15 2019
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc  (incident=244209):
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244209/xff1_smon_51338_i244209.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Wed Aug 28 12:40:16 2019
ORACLE Instance xff1 (pid = 26) - Error 600 encountered while recovering transaction (44, 47).
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc:
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Wed Aug 28 12:40:20 2019
Exception[type: SIGSEGV,Address not mapped to object][ADDR:0x5122000000C8][PC:0xE1B4D3,ktugru()+87][flags:0x0,count:1]
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p086_54066.trc  (incident=245017):
ORA-07445:exception encountered:core dump [ktugru()+87][SIGSEGV][ADDR:0x5122000000C8][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_245017/xff1_p086_54066_i245017.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Aug 28 12:40:20 2019
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p000_53873.trc  (incident=244305):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244305/xff1_p000_53873_i244305.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

The errors point to damaged undo. After masking the rollback segments, the database opened normally with no errors at all

Wed Aug 28 12:57:15 2019
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[57676] Successfully onlined Undo Tablespace 22.
Undo initialization finished serial:0 start:2386111306 end:2386112316 diff:1010 (10 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Wed Aug 28 12:57:17 2019
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:57624 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
No Resource Manager plan active
Starting background process GTX0
Wed Aug 28 12:57:18 2019
GTX0 started with pid=45, OS id=57777 
Starting background process RCBG
Wed Aug 28 12:57:18 2019
RCBG started with pid=46, OS id=57779 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Aug 28 12:57:19 2019
QMNC started with pid=47, OS id=57788 
Completed: ALTER DATABASE OPEN

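The follow-up work involved creating a new undo tablespace, dropping the old undo and handling a few similar issues, after which the database was essentially back to normal
[Image: OPEN]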


Recovering a corrupt block in the Oracle SYSTEM tablespace: C_TS$

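A friend phoned to say that the database at one of their customers had failed, that another company had worked on it for a day without being able to recover it, and asked me to help. When I took over, the database had already been damaged almost beyond recognition. From the alert log I worked out roughly what had caused the failure and what the previous company had done; after that comes the recovery I carried out once I took over, repairing the block with bbed. A large number of ORA-600 errors appeared during this recovery, mainly ORA-00600 4000, ORA-00600 2662, ORA-00600 2663, ORA-00600 krhpfh_03-1209, ORA-00600 3600, ORA-00600 ktsitbs_info1, ORA-00600 4137, ORA-00600 4511, ORA-00600 4198 and ORA-00600 6807
Cause of the failure: a lost redo file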

Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_1404.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 9 thread 1: '/data2/oradata/redo0902.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_1404.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 9 thread 1: '/data2/oradata/redo0902.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
Thu Nov 20 11:28:39 2014
LGWR: terminating instance due to error 313
Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_1394.trc:
ORA-00313: open failed for members of log group  of thread 
Thu Nov 20 11:28:39 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_ckpt_1406.trc:
ORA-00313: open failed for members of log group  of thread 
Instance terminated by LGWR, pid = 1404

An attempt to recover by clearing the redo file

Thu Nov 20 13:04:16 2014
alter database clear logfile group 9
Thu Nov 20 13:04:16 2014
ORA-1624 signalled during: alter database clear logfile group 9...
Thu Nov 20 13:04:45 2014
alter database clear logfile group 9
Thu Nov 20 13:04:46 2014
ORA-1624 signalled during: alter database clear logfile group 9...
Thu Nov 20 13:04:59 2014
alter database clear unarchived logfile group 9
Thu Nov 20 13:04:59 2014
ORA-1624 signalled during: alter database clear unarchived logfile group 9...
Thu Nov 20 13:05:00 2014
alter database clear unarchived logfile group 9
Thu Nov 20 13:05:00 2014
ORA-1624 signalled during: alter database clear unarchived logfile group 9...

An incomplete recovery followed by an attempt to open the database with resetlogs

ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile  ...
Thu Nov 20 13:49:01 2014
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Thu Nov 20 13:49:02 2014
Media Recovery Log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
Errors with log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Thu Nov 20 13:49:02 2014
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Thu Nov 20 13:49:02 2014
Media Recovery Log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
Errors with log /opt/oracle/flash_recovery_area/xifenfei/archivelog/2014_11_20/o1_mf_1_285999_%u_.arc
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Thu Nov 20 13:49:02 2014
ALTER DATABASE RECOVER CANCEL 
Thu Nov 20 13:49:03 2014
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 
Thu Nov 20 13:49:33 2014
alter database open resetlogs
Thu Nov 20 13:49:34 2014
ORA-1113 signalled during: alter database open resetlogs...

Using the hidden parameter

_allow_resetlogs_corruption= TRUE

an incomplete recovery was performed; the attempt to open the database then failed with ORA-600 [4000]

Thu Nov 20 14:35:02 2014
ALTER DATABASE   MOUNT
Thu Nov 20 14:35:07 2014
Setting recovery target incarnation to 2
Thu Nov 20 14:35:07 2014
Successful mount of redo thread 1, with mount id 4039504598
Thu Nov 20 14:35:07 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Thu Nov 20 14:40:33 2014
ALTER DATABASE RECOVER  database until cancel  
Thu Nov 20 14:40:33 2014
Media Recovery Start
Thu Nov 20 14:40:33 2014
Media Recovery failed with error 1610
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 14:41:23 2014
ALTER DATABASE RECOVER  database using backup controlfile until cancel  

Thu Nov 20 14:43:08 2014
alter database open resetlogs
Thu Nov 20 14:43:08 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 31293973571
Resetting resetlogs activation ID 3855216310 (0xe5c9eeb6)
Online log /data2/oradata/redo0802.log: Thread 1 Group 8 was previously cleared
Online log /data2/oradata/redo0902.log: Thread 1 Group 9 was previously cleared
Thu Nov 20 14:43:14 2014
Setting recovery target incarnation to 3
Thu Nov 20 14:43:14 2014
Assigning activation ID 4039504598 (0xf0c5f2d6)
Thread 1 opened at log sequence 1
  Current log# 9 seq# 1 mem# 0: /data2/oradata/redo0902.log
Successful open of redo thread 1
Thu Nov 20 14:43:14 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 14:43:14 2014
SMON: enabling cache recovery
Thu Nov 20 14:43:14 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_1844.trc:
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 14:43:16 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_1844.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 14:43:16 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1844
ORA-1092 signalled during: alter database open resetlogs...

An attempt to mask the rollback segments with a hidden parameter

_corrupted_rollback_segments= _SYSSMU1$, _SYSSMU2$,…………

The error was still ORA-600 [4000]

Thu Nov 20 15:09:21 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 15:09:21 2014
SMON: enabling cache recovery
Thu Nov 20 15:09:21 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_624.trc:
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 15:09:23 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_624.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []
Thu Nov 20 15:09:23 2014
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 624
ORA-1092 signalled during: alter database open

After several restarts and resetlogs attempts, the database started reporting ORA-600 [2662]

Successful open of redo thread 1
Thu Nov 20 17:13:24 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 17:13:24 2014
SMON: enabling cache recovery
Thu Nov 20 17:13:24 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_7967.trc:
ORA-00600: internal error code, arguments: [2662], [7], [1229382552], [7], [1229560642], [8388633], [], []
Thu Nov 20 17:13:25 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_7967.trc:
ORA-00600: internal error code, arguments: [2662], [7], [1229382552], [7], [1229560642], [8388633], [], []
Thu Nov 20 17:13:25 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 7967
ORA-1092 signalled during: ALTER DATABASE OPEN...
Thu Nov 20 17:18:23 2014
USER: terminating instance due to error 1092
Instance terminated by USER, pid = 7967

The undo datafiles were taken offline and another open was attempted

Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Thu Nov 20 17:52:31 2014
ALTER DATABASE RECOVER  database until cancel  
Thu Nov 20 17:52:31 2014
Media Recovery Start
 parallel recovery started with 15 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 17:53:42 2014
ALTER DATABASE RECOVER CANCEL 
Thu Nov 20 17:53:44 2014
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Thu Nov 20 17:56:34 2014
alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline
Thu Nov 20 17:56:35 2014
Completed: alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline
Thu Nov 20 17:57:01 2014
alter database datafile '/data2/oradata/undotbs02.dbf' offline
Thu Nov 20 17:57:02 2014
Completed: alter database datafile '/data2/oradata/undotbs02.dbf' offline
Thu Nov 20 17:57:26 2014
alter database datafile '/data2/oradata/undotbs03.dbf' offline
Thu Nov 20 17:57:27 2014
Completed: alter database datafile '/data2/oradata/undotbs03.dbf' offline
Thu Nov 20 17:57:43 2014
alter database open resetlogs
Thu Nov 20 17:57:43 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Thu Nov 20 17:58:58 2014
alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline drop
Thu Nov 20 17:58:58 2014
Completed: alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline drop
Thu Nov 20 17:59:15 2014
alter database open resetlogs
Thu Nov 20 17:59:15 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
ORA-1245 signalled during: alter database open resetlogs...
Thu Nov 20 17:59:35 2014
alter database datafile '/data2/oradata/undotbs02.dbf' offline drop
Thu Nov 20 17:59:35 2014
Completed: alter database datafile '/data2/oradata/undotbs02.dbf' offline drop
Thu Nov 20 17:59:50 2014
alter database datafile '/data2/oradata/undotbs03.dbf' offline drop
Thu Nov 20 17:59:50 2014
Completed: alter database datafile '/data2/oradata/undotbs03.dbf' offline drop
Thu Nov 20 18:00:07 2014
alter database open resetlogs
Thu Nov 20 18:00:07 2014
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 31294173628
Resetting resetlogs activation ID 4039492628 (0xf0c5c414)
Online log /data2/oradata/redo0802.log: Thread 1 Group 8 was previously cleared
Thu Nov 20 18:00:14 2014
Setting recovery target incarnation to 8
Thu Nov 20 18:00:14 2014
Assigning activation ID 4039504142 (0xf0c5f10e)
Thread 1 opened at log sequence 1
  Current log# 9 seq# 1 mem# 0: /data2/oradata/redo0902.log
Successful open of redo thread 1
Thu Nov 20 18:00:15 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Nov 20 18:00:15 2014
SMON: enabling cache recovery
Thu Nov 20 18:00:15 2014
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
File #2 is offline, but is part of an online tablespace.
data file 2: '/opt/oracle/oradata/xifenfei/undotbs01.dbf'
File #100 is offline, but is part of an online tablespace.
data file 100: '/data2/oradata/undotbs02.dbf'
Thu Nov 20 18:00:28 2014
File #185 is offline, but is part of an online tablespace.
data file 185: '/data2/oradata/undotbs03.dbf'
Dictionary check complete
Thu Nov 20 18:00:35 2014
SMON: enabling tx recovery
Thu Nov 20 18:00:36 2014
Database Characterset is ZHS16CGB231280
Thu Nov 20 18:00:37 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_28472.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 185 cannot be read at this time
ORA-01110: data file 185: '/data2/oradata/undotbs03.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Thu Nov 20 18:00:37 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_28450.trc:
ORA-00604: error occurred at recursive SQL level 
Thu Nov 20 18:00:37 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw0_28446.trc:
ORA-00604: error occurred at recursive SQL level 
Instance terminated by USER, pid = 28472
ORA-1092 signalled during: alter database open resetlogs...

Some unknown operation then left file 1 block 60 corrupt, most likely the result of a mistaken bbed edit

Thu Nov 20 19:18:15 2014
SMON: enabling cache recovery
Thu Nov 20 19:18:16 2014
Hex dump of (file 1, block 60) in trace file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_13232.trc
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during buffer read
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Reread of rdba: 0x0040003c (file 1, block 60) found same corrupted data
Successfully onlined Undo Tablespace 1.
Thu Nov 20 19:18:16 2014
SMON: enabling tx recovery
Thu Nov 20 19:18:17 2014
Database Characterset is ZHS16CGB231280
Thu Nov 20 19:18:17 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_13232.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 185 cannot be read at this time
ORA-01110: data file 185: '/data2/oradata/undotbs03.dbf'
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 13232
ORA-1092 signalled during: alter database open...

Another attempt at incomplete recovery followed by resetlogs

ALTER DATABASE RECOVER  database until cancel  
Thu Nov 20 19:33:41 2014
Media Recovery Start
Datafile 2 is on orphaned branch
          File status = 4
        Abs fuzzy SCN = 0
 Hot backup fuzzy SCN = 0
Thu Nov 20 19:33:41 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_20878.trc:
ORA-00600: internal error code, arguments: [krhpfh_03-1209], [2], [864151207], [864153315], [1229402557], [7], [0], [0]
ORA-01110: data file 2: '/opt/oracle/oradata/xifenfei/undotbs01.dbf'
Thu Nov 20 19:33:42 2014
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 19:34:06 2014
alter database open resetlogs
Thu Nov 20 19:34:06 2014
ORA-1139 signalled during: alter database open resetlogs...
Thu Nov 20 19:34:17 2014
alter database open
Thu Nov 20 19:34:17 2014
ORA-1190 signalled during: alter database open...
Thu Nov 20 19:35:57 2014
ALTER DATABASE RECOVER  database until cancel  
Thu Nov 20 19:35:57 2014
Media Recovery Start
Datafile 2 is on orphaned branch
          File status = 4
        Abs fuzzy SCN = 0
 Hot backup fuzzy SCN = 0
Thu Nov 20 19:35:58 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_20878.trc:
ORA-00600: internal error code, arguments: [krhpfh_03-1209], [2], [864151207], [864153315], [1229402557], [7], [0], [0]
ORA-01110: data file 2: '/opt/oracle/oradata/xifenfei/undotbs01.dbf'
Thu Nov 20 19:35:59 2014
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Thu Nov 20 19:37:19 2014
alter database open resetlogs
Thu Nov 20 19:37:19 2014
ORA-1139 signalled during: alter database open resetlogs...

Continuing to try to open the database produced ORA-600 [3600]

Thu Nov 20 19:43:14 2014
alter database datafile '/opt/oracle/oradata/xifenfei/undotbs01.dbf' offline drop
Thu Nov 20 19:43:14 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw0_20856.trc:
ORA-00600: internal error code, arguments: [3600], [2], [14], [], [], [], [], []
Thu Nov 20 19:43:15 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw0_20856.trc:
ORA-00600: internal error code, arguments: [3600], [2], [14], [], [], [], [], []
Thu Nov 20 19:43:15 2014
DBW0: terminating instance due to error 471
Instance terminated by DBW0, pid = 20856

After several more restarts and resetlogs attempts in between, ORA-600 [2663] also appeared

Fri Nov 21 12:35:12 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 21 12:35:12 2014
SMON: enabling cache recovery
Fri Nov 21 12:35:13 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_15596.trc:
ORA-00600: internal error code, arguments: [2663], [7], [1229543007], [7], [1229560642], [], [], []
Fri Nov 21 12:35:14 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_15596.trc:
ORA-00600: internal error code, arguments: [2663], [7], [1229543007], [7], [1229560642], [], [], []
Fri Nov 21 12:35:14 2014
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Fri Nov 21 12:35:14 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_mman_15572.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Fri Nov 21 12:35:14 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw1_15576.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Instance terminated by USER, pid = 15596
ORA-1092 signalled during: ALTER DATABASE OPEN..

Further attempts to open the database produced ORA-600 [ktsitbs_info1]

SMON: enabling cache recovery
Fri Nov 21 13:54:25 2014
Hex dump of (file 1, block 60) in trace file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_21111.trc
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during buffer read
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Reread of rdba: 0x0040003c (file 1, block 60) found same corrupted data
Fri Nov 21 13:54:25 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_21111.trc:
ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []
Fri Nov 21 13:54:27 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_21111.trc:
ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 21111
ORA-1092 signalled during: alter database open...

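The above covers the cause of the customer's database failure and, roughly, how it had been handled so far. What follows is what I did after taking over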
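
When an alert log is this long, a quick tally of the distinct ORA-600 first arguments helps to see which errors the earlier attempts ran into. The following is a minimal sketch, not a tool used in this recovery; the sample lines are copied from the log excerpts above and the function name is illustrative.

```python
import re
from collections import Counter

# Capture the first bracketed argument of an ORA-00600 line.
ORA600 = re.compile(r"ORA-00600: .*?\[([^\]]*)\]")

def tally_ora600(lines):
    """Count ORA-00600 occurrences by first argument, skipping empty ones."""
    counts = Counter()
    for line in lines:
        match = ORA600.search(line)
        if match and match.group(1):
            counts[match.group(1)] += 1
    return counts

sample = [
    "ORA-00600: internal error code, arguments: [4000], [17], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [2662], [7], [1229382552], [7], [1229560642], [8388633], [], []",
    "ORA-00600: internal error code, arguments: [krhpfh_03-1209], [2], [864151207], [864153315], [1229402557], [7], [0], [0]",
    "ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []",
]

counts = tally_ora600(sample)
for code, n in counts.most_common():
    print(f"ORA-600 [{code}]: {n}")
```

In practice the lines would come from reading the alert log file rather than from a hard-coded list.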


Checking system01.dbf with dbv gave the following result

HNDX-DB% dbv file=/opt/oracle/oradata/xifenfei/system01.dbf

DBVERIFY: Release 10.2.0.1.0 - Production on Fri Nov 21 16:22:37 2014

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /opt/oracle/oradata/xifenfei/system01.dbf
Page 60 is marked corrupt
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during dbv: 
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0

Corrupt block relative dba: 0x004001f2 (file 1, block 498)
Bad check value found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x004001f2
 last change scn: 0x0007.49499ca1 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x9ca10601
 check value in block header: 0xe458
 computed block checksum: 0x9720


DBVERIFY - Verification complete

Total Pages Examined         : 786432
Total Pages Processed (Data) : 201131
Total Pages Failing   (Data) : 2
Total Pages Processed (Index): 221394
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 60265
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 303641
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Highest block SCN            : 1229823477 (7.1229823477)

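This shows that the database has two corrupt blocks. Based on experience with bootstrap$, block 60 is very likely C_TS$ (my first impression is that its type field is abnormal), and block 498 is probably seq$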
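
The impression that the type field of block 60 is abnormal can be cross-checked against the block tail reported by dbv. The sketch below assumes the commonly described tail layout (low 16 bits of the SCN base, then the block type, then the sequence number); the post does not state this layout, and the helper names are illustrative.

```python
# Compare the header fields dbv printed with the "consistency value in tail".
# Assumed tail layout: (scn_base & 0xFFFF) << 16 | block_type << 8 | seq.

def expected_tail(scn_base, block_type, seq):
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)

def split_tail(tail):
    return {"scn_low": tail >> 16, "type": (tail >> 8) & 0xFF, "seq": tail & 0xFF}

# Values copied from the dbv output above.
blocks = {
    60:  {"type": 128, "scn_base": 0xEBE04BC9, "seq": 0x2, "tail": 0x4BC90602},
    498: {"type": 6,   "scn_base": 0x49499CA1, "seq": 0x1, "tail": 0x9CA10601},
}

for block_no, b in blocks.items():
    want = expected_tail(b["scn_base"], b["type"], b["seq"])
    status = "matches" if want == b["tail"] else "differs from"
    print(f"block {block_no}: header implies tail 0x{want:08x}, "
          f"which {status} actual 0x{b['tail']:08x}; "
          f"tail says type {split_tail(b['tail'])['type']}")
```

Under that assumption the tail of block 60 still records type 6 while the header says 128, which fits a damaged header byte. Block 498 has a header and tail that agree, in line with dbv reporting only a bad check value for it.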

Tracing the database startup with event 10046 produced the following trace file

PARSING IN CURSOR #1 len=275 dep=2 uid=0 oct=3 lid=0 tim=27978051403575 hv=3408408745 ad='7df93cd0'
select name,online$,contents$,undofile#,undoblock#,blocksize,dflmaxext,dflinit,dflincr,dflextpct,dflminext, 
dflminlen, owner#,scnwrp,scnbas, NVL(pitrscnwrp, 0), NVL(pitrscnbas, 0), dflogging, bitmapped, inc#, flags, 
plugged, NVL(spare1,0), NVL(spare2,0) from ts$ where ts#=:1
END OF STMT
PARSE #1:c=0,e=92,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=27978051403569
BINDS #1:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=ffffffff7dbac9a8  bln=22  avl=02  flg=05
  value=2
EXEC #1:c=0,e=310,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=27978051404296
WAIT #1: nam='db file sequential read' ela= 42 file#=1 block#=60 blocks=1 obj#=-1 tim=27978051404449
Hex dump of (file 1, block 60)
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during buffer read
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0
Reread of rdba: 0x0040003c (file 1, block 60) found same corrupted data
FETCH #1:c=10000,e=4072,p=1,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=27978051408438
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=16 op='TABLE ACCESS CLUSTER TS$ (cr=2 pr=1 pw=0 time=4075 us)'
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=7 op='INDEX UNIQUE SCAN I_TS# (cr=1 pr=0 pw=0 time=13 us)'
*** 2014-11-22 14:44:43.235
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktsitbs_info1], [2], [], [], [], [], [], []
Current SQL statement for this session:
select max(maxconcurrency) from sys.wrh$_undostat  where instance_number = :1 and dbid = :2 
and snap_id in   (select snap_id from dba_hist_snapshot where end_interval_time >     
(select max(end_interval_time)-7 from dba_hist_snapshot))

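The trace shows that startup fails with ORA-00600 [ktsitbs_info1], [2]. The second argument, 2, is clearly a tablespace number: it matches the bind value 2 for ts# in the query against ts$. Because the ts$ block is corrupt, the tablespace information cannot be read from ts$, the data dictionary looks inconsistent, and this error is raised. The key to recovering this database is therefore repairing file 1 block 60.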
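One way to judge what the damaged header of block 60 should contain is to compare it with the block's tail. The sketch below assumes the tail value is built from the low 16 bits of the SCN base, the block type byte, and the sequence byte. The values reported for both corrupt blocks fit that layout. The helper names are hypothetical.

```python
def expected_tail(scn_base, block_type, seq):
    """Build the 4-byte tail value from header fields.

    Assumption: tail = low 16 bits of the SCN base, then the block type
    byte, then the sequence byte.
    """
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


def type_implied_by_tail(tail):
    """Extract the block type byte recorded in the tail value."""
    return (tail >> 8) & 0xFF


# Block 60 as reported: type 128, scn base 0xebe04bc9, seq 0x2, tail 0x4bc90602.
reported_tail = 0x4BC90602
print(hex(expected_tail(0xEBE04BC9, 128, 0x2)))  # tail the damaged header would need
print(hex(expected_tail(0xEBE04BC9, 6, 0x2)))    # tail for a type 6 header
print(type_implied_by_tail(reported_tail))       # type recorded in the tail
```

Under that assumption the tail of block 60 still records type 6, so only the first bytes of the header look damaged, and restoring a type 6 header makes the header and tail agree again.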

Attempt to repair file 1 block 60 with bbed

HNDX-DB% bbed password=blockedit mode=edit

BBED: Release 2.0.0.0.0 - Limited Production on Sat Nov 22 15:16:26 2014

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> set filename '/opt/oracle/oradata/xifenfei/system01.dbf'
        FILENAME        /opt/oracle/oradata/xifenfei/system01.dbf

BBED> set block 8192
        BLOCK#          8192

BBED> set block 60
        BLOCK#          60

BBED> set count 64
        COUNT           64

BBED> map
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60                                    Dba:0x00000000
------------------------------------------------------------
BBED-00400: invalid blocktype (128)


BBED> set block 61
        BLOCK#          61

BBED> map
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 61                                    Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 72 bytes                     @20      

 struct kdbh, 14 bytes                      @92      

 struct kdbt[3], 12 bytes                   @106     

 sb2 kdbr[2]                                @118     

 ub1 freespace[7959]                        @122     

 ub1 rowdata[107]                           @8081    

 ub4 tailchk                                @8188    


BBED> p kcbh
struct kcbh, 20 bytes                       @0       
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x0040003d
   ub4 bas_kcbh                             @8        0x0000235b
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x04 (KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x7a85
   ub2 spare3_kcbh                          @18       0x0000

BBED> set block 60
        BLOCK#          60

BBED> d
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 80000000 0040003c ebe04bc9 00050204 6faa0000 01000000 00000006 29b3a204 
 00040ca0 00020200 00000000 000a0000 00000002 0080009b 00000100 80000000 

 <32 bytes per line>

BBED> d block 61
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 61               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 06a20000 0040003d 0000235b 00000104 7a850000 01000000 00000006 00001837 
 00001738 00020200 00000000 0007002e 00000002 00800075 00012300 80000000 

 <32 bytes per line>

BBED> set block 60
        BLOCK#          60

BBED> m /x 06a2
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 06a20000 0040003c ebe04bc9 00050204 6faa0000 01000000 00000006 29b3a204 
 00040ca0 00020200 00000000 000a0000 00000002 0080009b 00000100 80000000 

 <32 bytes per line>

BBED> map
 File: /opt/oracle/oradata/xifenfei/system01.dbf (0)
 Block: 60                                    Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 72 bytes                     @20      

 struct kdbh, 14 bytes                      @92      

 struct kdbt[3], 12 bytes                   @106     

 sb2 kdbr[2]                                @118     

 ub1 freespace[7598]                        @122     

 ub1 rowdata[468]                           @7720    

 ub4 tailchk                                @8188    


BBED> sum apply
Check value for File 0, Block 60:
current = 0xe908, required = 0xe908

BBED> verify
DBVERIFY - Verification starting
FILE = /opt/oracle/oradata/xifenfei/system01.dbf
BLOCK = 60


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

BBED> 
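The value that `sum apply` reports can be cross-checked from the edit itself. This is a minimal sketch, assuming the block check value is a 16-bit XOR over the whole block and that the block was checksum-consistent before the edit (dbv reported a computed checksum of 0x0). The two bytes written, 06a2, are the type and format bytes seen in the neighbouring block 61. The function names are made up for illustration.

```python
def xor16(data):
    """XOR all big-endian 16-bit words of data together."""
    acc = 0
    for i in range(0, len(data), 2):
        acc ^= int.from_bytes(data[i:i + 2], "big")
    return acc


def chkval_after_edit(old_chkval, old_bytes, new_bytes):
    """Check value that keeps the whole-block XOR unchanged after an edit.

    Assumes the edited range starts at an even offset and has even length,
    so it covers whole 16-bit words.
    """
    return old_chkval ^ xor16(old_bytes) ^ xor16(new_bytes)


# Old check value 0x6faa; bytes 0-1 of block 60 changed from 8000 to 06a2.
new_chkval = chkval_after_edit(0x6FAA, bytes.fromhex("8000"), bytes.fromhex("06a2"))
print(hex(new_chkval))  # prints 0xe908
```

The result matches the "required = 0xe908" line printed by bbed, which supports the XOR assumption for this block.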

Attempt to start the database

Sat Nov 22 15:51:33 2014
alter database open
Sat Nov 22 15:51:34 2014
Thread 1 opened at log sequence 7
  Current log# 8 seq# 7 mem# 0: /data2/oradata/redo0802.log
Successful open of redo thread 1
Sat Nov 22 15:51:34 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Nov 22 15:51:34 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Nov 22 15:51:34 2014
Database Characterset is ZHS16CGB231280
Hex dump of (file 1, block 498) in trace file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_2818.trc
Corrupt block relative dba: 0x004001f2 (file 1, block 498)
Bad check value found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x004001f2
 last change scn: 0x0007.49499ca1 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x9ca10601
 check value in block header: 0xe458
 computed block checksum: 0x9720
Reread of rdba: 0x004001f2 (file 1, block 498) found same corrupted data
Sat Nov 22 15:51:35 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_2803.trc:
ORA-00600: internal error code, arguments: [4000], [12], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=18, OS id=3000
Sat Nov 22 15:51:36 2014
Completed: alter database open
Sat Nov 22 15:51:36 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_3010.trc:
ORA-00600: internal error code, arguments: [6807], [AUDSES$], [144], [], [], [], [], []
Sat Nov 22 15:51:37 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_mmon_2809.trc:
ORA-00600: internal error code, arguments: [6807], [WRI$_ALERT_SEQUENCE], [8783], [], [], [], [], []
Sat Nov 22 15:51:37 2014
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Sat Nov 22 15:51:38 2014
ORA-600 encountered when generating server alert SMG-3000
Sat Nov 22 15:51:38 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_mmon_2809.trc:
ORA-00600: internal error code, arguments: [ktcpoptx_0], [0x772705E60], [], [], [], [], [], []

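The main errors are ORA-600 4000 and ORA-600 6807. ORA-600 6807 is fairly clearly caused by the corrupt seq$ block, which breaks the AUDSES$ sequence. ORA-600 4000 is most likely a rollback segment problem, so the next step is to analyse the rollback segments.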

SQL> select name,ts#,status$ from undo$;

NAME                                  TS#    STATUS$
------------------------------ ---------- ----------
SYSTEM                                  0          2
_SYSSMU1$                               1          2
_SYSSMU2$                               1          2
_SYSSMU3$                               1          2
…………
_SYSSMU168$                             1          2
_SYSSMU169$                             1          2

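This is very abnormal. Once the database is open, the SYSTEM rollback segment should never be in STATUS$=2 (OFFLINE), and it is also abnormal for every other rollback segment to be OFFLINE. In addition, trying to drop the undo tablespace raises ORA-01561, and SYSTEM does not appear in dba_rollback_segs (I forgot to save that query output).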

SQL> drop tablespace undotbs1 including contents; 
drop tablespace undotbs1 including contents
*
ERROR at line 1:
ORA-01561: failed to remove all objects in the tablespace specified

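All of this makes me strongly suspect that bbed was used to modify undo$ and related base tables, leaving the undo information in the system tablespace in a mess. When I reported this to the customer, they remembered that the company that had worked on the recovery the day before backed up system01.dbf before running bbed. That felt like a lifesaver. I really dread people who do not understand bbed tinkering with it.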

Check the backup file with dbv

DBVERIFY - Verification starting : FILE = /data3/backup/system01.dbf_bak
Page 60 is marked corrupt
Corrupt block relative dba: 0x0040003c (file 1, block 60)
Bad header found during dbv: 
Data in bad block:
 type: 128 format: 0 rdba: 0x0040003c
 last change scn: 0x0005.ebe04bc9 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4bc90602
 check value in block header: 0x6faa
 computed block checksum: 0x0

Block Checking: DBA = 4194802, Block Type = KTB-managed data block
data header at 0x1002ef05c
kdbchk: row locked by non-existent transaction
        table=0   slot=4
        lockid=1   ktbbhitc=2
Page 498 failed with check code 6101


DBVERIFY - Verification complete

Total Pages Examined         : 786432
Total Pages Processed (Data) : 201131
Total Pages Failing   (Data) : 1
Total Pages Processed (Index): 221394
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 60265
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 303641
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 1229823477 (7.1229823477)

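Good news: there is only one physically corrupt block and one logically corrupt block. I already know how to repair the physical corruption in block 60, and the logical corruption can be skipped by setting a hidden parameter. Modify the affected block with bbed (same steps as above).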

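Start the database again. The two dd commands first copy the leading two 8192-byte blocks of the current system01.dbf over the start of the backup copy, and the backup is then renamed into place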

dd if=/opt/oracle/oradata/xifenfei/system01.dbf bs=8192 count=2 of=/tmp/system01.2
dd if=/tmp/system01.2 of=/data3/backup/system01.dbf_bak bs=8192 count=2 conv=notrunc 
Sat Nov 22 17:52:50 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Sat Nov 22 17:53:38 2014
alter database rename file '/opt/oracle/oradata/xifenfei/system01.dbf' to '/data3/backup/system01.dbf_bak'
Sat Nov 22 17:53:39 2014
Completed: alter database rename file '/opt/oracle/oradata/xifenfei/system01.dbf' to '/data3/backup/system01.dbf_bak'
Sat Nov 22 17:55:43 2014
alter database open
Sat Nov 22 17:55:48 2014
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=18, OS id=15858
Sat Nov 22 17:56:10 2014
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=17, OS id=15879
Sat Nov 22 17:56:19 2014
Thread 1 opened at log sequence 7
  Current log# 8 seq# 7 mem# 0: /data2/oradata/redo0802.log
Successful open of redo thread 1
Sat Nov 22 17:56:19 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Nov 22 17:56:19 2014
SMON: enabling cache recovery
SMON: enabling tx recovery
Sat Nov 22 17:56:20 2014
ARC1: STARTING ARCH PROCESSES
Sat Nov 22 17:56:20 2014
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Sat Nov 22 17:56:22 2014
Database Characterset is ZHS16CGB231280
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Sat Nov 22 17:56:33 2014
ARC2: Archival started
ARC1: STARTING ARCH PROCESSES COMPLETE
ARC1: Becoming the heartbeat ARCH
ARC2 started with pid=23, OS id=15928
QMNC started with pid=25, OS id=15996
Sat Nov 22 17:57:11 2014
Completed: alter database open
Sat Nov 22 17:57:18 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_16010.trc:
ORA-00600: internal error code, arguments: [4511], [], [], [], [], [], [], []
Sat Nov 22 17:57:26 2014
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_16012.trc:
ORA-00600: internal error code, arguments: [4511], [], [], [], [], [], [], []
Sat Nov 22 17:58:17 2014
Starting background process EMN0
Sat Nov 22 18:00:03 2014
Shutting down instance: further logons disabled
EMN0 started with pid=71, OS id=16421
Sat Nov 22 18:00:12 2014
SMON: Restarting fast_start parallel rollback
Sat Nov 22 18:00:23 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_p000_15951.trc:
ORA-00600: internal error code, arguments: [4198], [9], [], [], [], [], [], []
Sat Nov 22 18:00:24 2014
Stopping background process CJQ0
Sat Nov 22 18:00:24 2014
Stopping background process QMNC
Sat Nov 22 18:00:27 2014
Doing block recovery for file 2 block 41
Block recovery from logseq 7, block 180883 to scn 214748389244
Sat Nov 22 18:00:27 2014
Recovery of Online Redo Log: Thread 1 Group 8 Seq 7 Reading mem 0
  Mem# 0 errs 0: /data2/oradata/redo0802.log
Block recovery stopped at EOT rba 7.180988.16
Block recovery completed at rba 7.180988.16, scn 50.24441
Sat Nov 22 18:00:32 2014
Stopping background process MMNL
Sat Nov 22 18:00:38 2014
Stopping background process MMON
Sat Nov 22 18:00:41 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_15395.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sat Nov 22 18:00:42 2014
ORACLE Instance xifenfei (pid = 9) - Error 600 encountered while recovering transaction (3, 4).
Sat Nov 22 18:00:42 2014
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_15395.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []

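These are all fairly routine errors, and a query of undo$ now looks normal. I created a new undo tablespace and dropped the old one, after which the alert log showed no further errors. The recovery is complete at this point. I advised the customer to rebuild the database with export/import.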