ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches – AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.3 to 12.2.0.1 [Release 11.2 to 12.2]
IBM AIX on POWER Systems (64-bit)
A problem has been discovered in the latest SP patches for IBM AIX 6.1 and 7.1 (SP 08 and SP 03) on systems where Oracle 11.2.0.3, 11.2.0.4, 12.1, or 12.2 is running. It can cause ORA-600 errors and possible database corruption after one of the following:

upgrade from AIX 6.1.9.7 to SP08
upgrade from AIX 7.1.4.2 to SP03
running on one of the oslevels listed below in this note.

This is only known to impact Oracle 11.2.0.3.x, 11.2.0.4.x, 12.1.0.2, or 12.2.0.1 on AIX platforms. It has been observed on various Oracle PSU versions.
The symptoms observed so far are ORA-600 memory related failures with examples below.
Additionally, Redo log corruption has been observed in at least two cases.

DESCRIPTION

Database Corruption and/or ORA-600 ORA-7445 errors after applying IBM AIX SP patches – After update from AIX 6.1.9.7 to SP08 or AIX 7.1.4.2 to SP03 (note that the earlier service packs, SP 07 and SP 02, are not impacted).

OCCURRENCE

The only changes were upgrades to the latest IBM SP patches.

upgraded from AIX 6.1.9.7 to SP08  –> SP08 has the problem.
upgraded from AIX 7.1.4.2 to SP03  –> SP03 has the problem.

To check whether the AIX patch level is exposed to this risk, run the following command:

# oslevel -s

If any of the following are listed, exposure to this problem exists:

6100-09-08
7100-03-08
7100-04-03
7200-00-03
7200-01-00
7200-01-01
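
The exposure check above can be sketched as a small script. This is a minimal illustration, not part of the original note: the function name is hypothetical, and it assumes that `oslevel -s` prints the level followed by a build suffix (for example `7100-04-03-1642`), so only the first ten characters are compared.

```python
# Affected levels, copied from the list in this note.
AFFECTED_LEVELS = {
    "6100-09-08",
    "7100-03-08",
    "7100-04-03",
    "7200-00-03",
    "7200-01-00",
    "7200-01-01",
}


def is_exposed(oslevel_output: str) -> bool:
    """Return True if the text printed by `oslevel -s` names an affected level.

    Assumption: the output looks like TTTT-LL-SS or TTTT-LL-SS-YYWW,
    so the first 10 characters identify the technology level and SP.
    """
    level = oslevel_output.strip()[:10]
    return level in AFFECTED_LEVELS


# Sample values for illustration only; on a real host, pass in the
# actual output of `oslevel -s`.
for sample in ("7100-04-03-1642", "7100-04-02-1614"):
    print(sample, "exposed" if is_exposed(sample) else "not listed")
```

On an AIX host the input string would come from running `oslevel -s`; the samples here are hard-coded so the sketch runs anywhere.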

SYMPTOMS

The following ORA-600 errors have been observed. Not every error has to occur, and not all customers have seen all of these errors.

=========================================================================================
ORA-00600: internal error code, arguments: [kkoipt:invalid aptyp], [0], [0], [], [], [], [], [], [], [], [], []
Optimizer – Maps the structures from memory
=========================================================================================
ORA-00600: internal error code, arguments: [kghssgai2], [1], [32], [], [], [], [], [], [], [], [], []
–looks to be pga related allocations
Generic memory Heap manager -we can’t have both a heap and an allocation function passed in to us
=========================================================================================
ORA-00600: internal error code, arguments: [qkkAssignKey:1], [], [], [], [], [], [], [], [], [], [], []
qkkAssignKey – copy keys from source to destination key
=========================================================================================
ORA-00600: internal error code, arguments: [kclgclks_3], [454], [2431642561], [], [], [], [], [], [], [], [], []
kclgclks – CR Server request
=========================================================================================
ORA-00600: internal error code, arguments: [kkqvmRmViewFromLst1], [], [], [], [], [], [], [], [], [], [], []
View Merging – list management
=========================================================================================
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_1], [0x082024000], [rpi role space], [], [], [], [], [], [], [], [], []
shared heap manager Stack segment underflow, failure to follow stack discipline.
assert no previous chunk in this segment
=========================================================================================
ORA-00600: internal error code, arguments: [qerghFetch.y], [], [], [], [], [], [], [], [], [], [], []
Implements hash aggregation for query source
=========================================================================================
ORA-00600: internal error code, arguments: [qeshQBNextLoad.1], [], [], [], [], [], [], [], [], [], [], []
Hash Table Infrastructure -get Next buffer during Load
=========================================================================================
ORA-00600: internal error code, arguments: [qkshtQBGet:1], [], [], [], [], [], [], [], [], [], [], []
gets memory pointer for a query block.
Make sure the query block pointer is not NULL
=========================================================================================
ORA-00600: internal error code, arguments: [qeshIHBuildOnPartition block missed], [], [], [], [], [], [], [], [], [], [], []
Hash Table Infrastructure
update the partition at the end.
=========================================================================================
ORA-00600: internal error code, arguments: [kghssgfr2], [1]
=========================================================================================
ORA-07445: exception encountered: core dump [PC:0x0] [SIGILL] [ADDR:0x0] [PC:0x0] [Illegal opcode]
=========================================================================================
ORA-00600 [kkogbro: no kkoaptyp]
=========================================================================================
ORA-00600: internal error code, arguments: [kewrose_1], [600]
========================================================================================
ORA-00600: internal error code, arguments: [1868], [0x000000000], [], [], [], [], [], [], [], [], [], []
Core dumps are also possible.

—————

Redo log corruption with checksum error has also been observed.

Known examples below:

example 1:

Alert.log messages:

ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 73804 change 8112409541614 time 12/07/2016 07:12:25
ORA-00334: archived log: ‘/dev/rredo13’
ORA-07445: exception encountered: core dump [pkrdi()+780] [SIGSEGV] [ADDR:0x0] [PC:0x10367B26C] [Invalid permissions for mapped

—————

There have also been transient database block corruptions and control file block corruptions with checksum errors, where a reread finds valid data.

example 2 (transient database block corruption with checksum error):

Corrupt block relative dba: 0x5a066b2f (file 360, block 420655)
Bad check value found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x5a066b2f
last change scn: 0x00cc.6a826294 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x62940601
check value in block header: 0x9e7d
computed block checksum: 0x0           —> 0x0 means that checksum is good when printing the error message (transient problem)
Reading datafile ‘Datafile name’ for corruption at rdba: 0x5a066b2f (file 360, block 420655)
Reread (file 360, block 420655) found valid data
Hex dump of (file 360, block 420655) in trace file ….
Repaired corruption at (file 360, block 420655)
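
The relation between the `rdba` value and the `(file, block)` pair printed in these messages can be checked by hand. The sketch below assumes the usual smallfile layout, in which the upper 10 bits of the 32-bit relative data block address hold the relative file number and the lower 22 bits hold the block number; the function name is illustrative.

```python
from typing import Tuple


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit relative DBA into (relative file number, block number).

    Assumption: smallfile tablespace layout, 10 bits file + 22 bits block.
    """
    file_no = rdba >> 22          # upper 10 bits
    block_no = rdba & 0x3FFFFF    # lower 22 bits
    return file_no, block_no


# The rdba from example 2 in this note.
print(decode_rdba(0x5A066B2F))
```

For the value in example 2 this yields file 360 and block 420655, matching the alert log line.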

example 3 (transient control file corruption with checksum error):

Hex dump of (file 0, block 1) in trace file …
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Bad check value found during control file header read
Data in bad block:
type: 21 format: 2 rdba: 0x00000001
last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x00001501
check value in block header: 0xca35
computed block checksum: 0x0                 —> 0x0 means that checksum is good when printing the error message (transient problem)
Errors in file ..:
ORA-00202: control file: ‘/oracle/dbs/control_01.ctl’
Errors in file …
ORA-00227: corrupt block detected in control file: (block 1, # blocks 1)
ORA-00202: control file: ‘/oracle/dbs/control_01.ctl’

WORKAROUND

There is no workaround that avoids the problem, but if log corruption is encountered, one possible workaround is to clear the unarchived redo log. The fix is to roll back the IBM SP or apply the updated fixes.

Syntax to clear logfile:

alter database clear <unarchived> logfile group <integer>;
alter database clear <unarchived> logfile ‘<filename>’;

PATCHES

The fix is now available from IBM.

It can be downloaded for the above releases via:

ftp://aix.software.ibm.com/aix/ifixes/

Affected AIX Levels     Fixed In           iFix / APAR (ftp://aix.software.ibm.com/aix/ifixes/)
6100-09-08               6100-09-09      IV93840
7100-03-08               7100-03-09      IV93884
7100-04-03               7100-04-04      IV93845
7200-00-03               7200-00-04      IV93883
7200-01-01               7200-01-02      IV93885
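
The table can also be kept as a lookup keyed by the affected level. This is only a sketch: the names are hypothetical, the data is copied from the table above, and a level that has no row in the table (such as 7200-01-00) returns `None` rather than a guessed fix.

```python
from typing import Optional, Tuple

# affected level -> (level containing the fix, iFix / APAR), from the table above
FIXES = {
    "6100-09-08": ("6100-09-09", "IV93840"),
    "7100-03-08": ("7100-03-09", "IV93884"),
    "7100-04-03": ("7100-04-04", "IV93845"),
    "7200-00-03": ("7200-00-04", "IV93883"),
    "7200-01-01": ("7200-01-02", "IV93885"),
}


def fix_for(oslevel_output: str) -> Optional[Tuple[str, str]]:
    """Return (fixed-in level, APAR) for an `oslevel -s` value, or None.

    Assumption: only the first 10 characters (TTTT-LL-SS) matter.
    """
    return FIXES.get(oslevel_output.strip()[:10])


print(fix_for("7100-04-03-1642"))
print(fix_for("7200-01-00"))
```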

The fix will be included in the next AIX Service Packs to be released.

IBM HIPER APAR
Abstract: PROBLEMS CAN OCCUR WITH THREAD_CPUTIME AND THREAD_CPUTIME_FAST

This APAR corrects an issue with system call thread_cputime_self with floating point registers which is exposed by Oracle Database 11gR2.

PROBLEM SUMMARY:
The thread_cputime or thread_cputime_fast interfaces can
cause invalid data in the FP/VMX/VSX registers if the thread
page faults in this function

For more information see the following from IBM:

http://www-01.ibm.com/support/docview.wss?uid=isg1SSRVPOAIX71HIPER170303-1247

Reference: ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches – AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01 (Doc ID 2237498.1)

ORA-600 [kcrfr_update_nab_2] recovery support

APPLIES TO:

Oracle Database – Enterprise Edition – Version 10.2.0.2 to 10.2.0.4 [Release 10.2]
Information in this document applies to any platform.
Oracle Server Enterprise Edition – Version: 10.2.0.2 to 10.2.0.4

SYMPTOMS

After a database crash or a shutdown abort, the database cannot be opened, and the alert.log shows the error ORA-600 [kcrfr_update_nab_2].

==> In the alert.log we can see that the database crashes while performing instance recovery:

Tue Oct 07 13:30:28 2008
Starting ORACLE instance (normal)
..
ALTER DATABASE OPEN
Tue Oct 07 13:30:39 2008
Beginning crash recovery of 1 threads
Tue Oct 07 13:30:39 2008
Started redo scan
Tue Oct 07 13:30:41 2008
Errors in file ….ORCL_ora_3148.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x3C2C5CD0], [2], [], [], []
Tue Oct 07 13:30:46 2008
Aborting crash recovery due to error 600

==> In the trace file we can see the following error stack

start recovery at logseq 18989, block 1312, scn 0

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x3C2C5CD0], [2], [], [], [],
Current SQL statement for this session:
alter database open
—– Call Stack Trace:
ksedst <- ksedmp <- ksfdmp <- kgerinv <- kgeasnmierr
<- kcrfr_update_nab <- kcrfr_read <- kcrfr_read_buffer <- kcrfrgv <- kcratr1
<- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv
<- opiexe <- opiosq0 <- kpooprx <- kpoal8 <- opiodr
<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv
<- sou2o <- opimai_real <- opimai <- OracleThreadStart@

CAUSE

This issue has been reported in the following bugs:

Bug 5692594
Hdr: 5692594 10.2.0.1 RDBMS 10.2.0.1 RECOVERY PRODID-5 PORTID-226 ORA-600
Abstract: AFTER DATABASE CRASHED DOESN’T OPEN ORA-600 [KCRFR_UPDATE_NAB_2]
Status: 95,Closed, Vendor OS Problem
 

Bug 6655116
Hdr: 6655116 10.2.0.3 RDBMS 10.2.0.3 RECOVERY PRODID-5 PORTID-23
Abstract: INSTANCES CRASH WITH ORA-600 [KCRFR_UPDATE_NAB_2] AFTER DISK FAILURE

Status: 95,Closed, Vendor OS Problem

EXPLANATION
The assert ORA-600: [kcrfr_update_nab_2] is a direct result of a lost write in the current online log that we are attempting to resolve. This confirms the theory that this is an OS/hardware lost write issue, not an internal Oracle bug. In fact, the assert ORA-600: [kcrfr_update_nab_2] is how we detect a lost log write.

SOLUTION

Several bugs match this issue, and all of them have been closed as Vendor OS Problem.
The error is caused by corruption in the online redo log, probably a lost write in the file.

The best solution in this case is to restore the database from backup and recover it up to the sequence before the current online redo log.

 

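With this kind of problem, the lost write means the database cannot be opened directly. If needed, you can contact us for professional Oracle database recovery support.
Phone:17813235971    QQ:107644445    E-Mail:dba@xifenfei.com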

Alter database open ORA-7445 [kkcnrli0] signalled

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.3 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.

SYMPTOMS

ALTER DATABASE OPEN signaled ORA-7445 [kkcnrli0].

The alert log shows:

Wed Jan 17 12:09:17 2018
CJQ0 started with pid=174, OS id=71144
Completed: ALTER DATABASE OPEN /* db agent *//* {1:39551:2} */
Wed Jan 17 12:09:18 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x198F36F, kkcnrli0()+639] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/zdlra/zdlra1/trace/zdlra1_q003_71365.trc (incident=247784):
ORA-07445: exception encountered: core dump [kkcnrli0()+639] [SIGSEGV] [ADDR:0x0] [PC:0x198F36F] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/zdlra/zdlra1/incident/incdir_247784/zdlra1_q003_71365_i247784.trc

 

The trace file shows:

*** SERVICE NAME:(SYS$BACKGROUND) 2018-01-17 12:09:18.108
*** MODULE NAME:(Streams) 2018-01-17 12:09:18.108 <—————-
*** CLIENT DRIVER:() 2018-01-17 12:09:18.108
*** ACTION NAME:(QMON Slave) 2018-01-17 12:09:18.108 <————–

========= Dump for incident 247784 (ORA 7445 [kkcnrli0]) ========


….

*** 2018-01-17 12:09:18.110
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
[TOC00004]
—– SQL Statement (None) —–
Current SQL information unavailable – no cursor.
[TOC00004-END]

[TOC00005]
—– Call Stack Trace —–
… kkcnrli0 kkcnrli kponPurgeUnreachLoc kwqmnslv kwsbsmspm
kwsbgcbkms ksvrdp opirip opidrv sou2o
opimai_real ssthrdmain main

 

CAUSE

Bug 17722075 – ORA-7445 [kkcnrli] in Qnnn process during ALTER DATABASE OPEN (Doc ID 17722075.8)

SOLUTION

Either upgrade to one of the releases that contain the fix:

12.2.0.1 (Base Release)
12.1.0.2.160719 (Jul 2016) Database Patch Set Update (DB PSU)
12.1.0.2.160719 (Jul 2016) Database Proactive Bundle Patch
11.2.0.4.160719 Exadata Database Bundle Patch (Jul 2016)
12.1.0.2.160719 (Jul 2016) Bundle Patch for Windows Platforms

or

check whether a one-off patch is available for the RDBMS version recorded in your oraInventory

or

simply work around the problem by restarting the database.

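ORA-600 2037 ORA-7445 kcbs_dump_adv_state

A customer's system lost power and the database could no longer start, so they asked us for help. Analysis showed that the main errors were ORA-600 2037 and ORA-7445 _kcbs_dump_adv_state, and the problem was resolved by manual recovery.
The database reports ORA-03113 and cannot be started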

C:\Documents and Settings\Administrator>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Fri May 12 09:50:36 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1258291200 bytes
Fixed Size                  1250548 bytes
Variable Size             218106636 bytes
Database Buffers         1031798784 bytes
Redo Buffers                7135232 bytes
Database mounted.
ORA-03113: end-of-file on communication channel

Analyzing the alert log

Fri May 12 09:50:43 2017
ALTER DATABASE OPEN
Fri May 12 09:50:43 2017
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Fri May 12 09:50:43 2017
Started redo scan
Fri May 12 09:50:43 2017
Completed redo scan
 1240 redo blocks read, 277 data blocks need recovery
Fri May 12 09:50:44 2017
Started redo application at
 Thread 1: logseq 5881, block 41179
Fri May 12 09:50:44 2017
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5881 Reading mem 0
  Mem# 0 errs 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\REDO01.LOG
Fri May 12 09:50:44 2017
Completed redo application
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []

Fri May 12 09:50:44 2017
Hex dump of (file 3, block 14004) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during crash/instance recovery
Data in bad block:
 type: 255 format: 7 rdba: 0x06010601
 last change scn: 0xa206.a2060601 seq: 0xb4 flg: 0x36
 spare1: 0x1 spare2: 0x6 spare3: 0x673
 consistency value in tail: 0x1b0a0708
 check value in block header: 0x36b4
 computed block checksum: 0xe4f5
Fri May 12 09:50:44 2017
Hex dump of (file 9, block 65507) in trace file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p003_6056.trc
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during crash/instance recovery
Data in bad block:
 type: 3 format: 6 rdba: 0x06020601
 last change scn: 0xa206.a2060602 seq: 0xe3 flg: 0xff
 spare1: 0x1 spare2: 0x6 spare3: 0x6dc
 consistency value in tail: 0xc1028001
 check value in block header: 0xffe3
 computed block checksum: 0xff01
Fri May 12 09:50:44 2017
Reread of rdba: 0x00c036b4 (file 3, block 14004) found different data
Fri May 12 09:50:44 2017
Reread of rdba: 0x0240ffe3 (file 9, block 65507) found different data
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p005_6060.trc:
ORA-00600: internal error code,arguments:[2037],[17442602],[2718302723],[255],[9],[203],[657105414],[2147549568]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p000_6024.trc:
ORA-07445:exception encountered:core dump[ACCESS_VIOLATION][_kclcomplete+79][PC:0x72B0C7][ADDR:0x220][UNABLE_TO_READ][]
Fri May 12 09:50:44 2017
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p006_6072.trc:
ORA-07445: exception encountered:core dump[ACCESS_VIOLATION][_kcbzdh+2496][PC:0x4A4928][ADDR:0xB][UNABLE_TO_READ][]
ORA-00600: internal error code, arguments: [6110], [193], [3], [], [], [], [], []
Errors in file e:\oracle\product\10.2.0\admin\xff\bdump\xff_p012_6128.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kcbs_dump_adv_state+723] 
                                 [PC:0x5975A3] [ADDR:0xCBC0CBB2] [UNABLE_TO_READ] []
ORA-00600:internal error code,arguments:[2037],[17430318],[2718303745],[128],[1],[203],[4147028486],[2147549568]

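The errors are fairly clear: corrupt blocks caused redo application to fail during recovery. The main errors are ORA-600 2037, ORA-7445 _kcbs_dump_adv_state, ORA-7445 _kcbzdh, and ORA-7445 _kclcomplete.

Checking the data files with dbv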

E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF

DBVERIFY: Release 10.2.0.1.0 - Production on Fri May 12 09:57:39 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\SYSAUX01.DBF

Page 13353 is marked corrupt
Corrupt block relative dba: 0x00c03429 (file 3, block 13353)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x3429a206
 last change scn: 0x066f.066f3429 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x8c96
 consistency value in tail: 0x06018001
 check value in block header: 0x0
 block checksum disabled

Page 14004 is marked corrupt
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Bad header found during dbv:
Data in bad block:
 type: 1 format: 6 rdba: 0x36b4a206
 last change scn: 0x0673.067336b4 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xfb97
 consistency value in tail: 0x06010210
 check value in block header: 0x0
 block checksum disabled

Page 15261 is marked corrupt
Corrupt block relative dba: 0x00c03b9d (file 3, block 15261)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0x3b9da206
 last change scn: 0x0673.06733b9d seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x0
 consistency value in tail: 0x06018001
 check value in block header: 0x5549
 block checksum disabled



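DBVERIFY - Verification complete

Total Pages Examined         : 58880
Total Pages Processed (Data) : 19318
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 18610
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 13747
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 7202
Total Pages Marked Corrupt   : 3
Total Pages Influx           : 0
Highest block SCN            : 178325323 (0.178325323)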


E:\>dbv file=E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02

DBVERIFY: Release 10.2.0.1.0 - Production on Fri May 12 10:10:24 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = E:\ORACLE\PRODUCT\10.2.0\ORADATA\xff\xff_BSE02

Page 65507 is marked corrupt
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
Bad header found during dbv:
Data in bad block:
 type: 2 format: 6 rdba: 0xffe3a206
 last change scn: 0x06dc.06dcffe3 seq: 0x0 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0xb32
 consistency value in tail: 0x060102ff
 check value in block header: 0x0
 block checksum disabled



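DBVERIFY - Verification complete

Total Pages Examined         : 1310720
Total Pages Processed (Data) : 34102
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 30270
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 10850
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1235497
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 178325221 (0.178325221)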

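As the alert log errors indicated, files 3 and 9 both contain corrupt blocks, which prevents instance recovery from proceeding. Based on the errors ORA-600 2037 and ORA-7445 _kcbs_dump_adv_state, the initial assessment is that this matches the article During Startup (Open Database) Alert Log Shows ORA-600[2037] and ORA-7445[kcbs_dump_adv_state] (Doc ID 551993.1), and the version matches as well.

Trying recover datafile on some of the files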
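
When the alert log or dbv output is long, the corrupt blocks can be collected per file with a short script. The following is a rough sketch, assuming the messages keep the `Corrupt block relative dba: 0x... (file N, block M)` wording seen above; the function name and the sample text are illustrative, with the sample lines copied from the dbv runs in this post.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the "Corrupt block relative dba" lines as they appear in this post.
CORRUPT_RE = re.compile(
    r"Corrupt block relative dba: 0x[0-9a-fA-F]+ \(file (\d+), block (\d+)\)"
)


def corrupt_blocks(log_text: str) -> Dict[int, List[int]]:
    """Return {file number: sorted list of corrupt block numbers}."""
    found = defaultdict(set)
    for match in CORRUPT_RE.finditer(log_text):
        found[int(match.group(1))].add(int(match.group(2)))
    return {file_no: sorted(blocks) for file_no, blocks in sorted(found.items())}


SAMPLE = """\
Corrupt block relative dba: 0x00c03429 (file 3, block 13353)
Corrupt block relative dba: 0x00c036b4 (file 3, block 14004)
Corrupt block relative dba: 0x00c03b9d (file 3, block 15261)
Corrupt block relative dba: 0x0240ffe3 (file 9, block 65507)
"""

for file_no, blocks in corrupt_blocks(SAMPLE).items():
    print(f"file {file_no}: blocks {blocks}")
```

Using a set per file means a block reported several times (for example once in the alert log and once by dbv) is listed only once.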

E:\>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Fri May 12 10:16:00 2017

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> recover datafile 1;
Media recovery complete.
SQL> recover datafile 2;
Media recovery complete.
SQL> recover datafile 3;
Media recovery complete.
SQL> recover datafile 4;
Media recovery complete.
SQL> recover datafile 9;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []

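The ORA-00600 kcratr1_lastbwr error is fairly clear-cut; see ORA-00600:[Kcratr1_lastbwr] During Database Startup after a Crash (Doc ID 393984.1)

Handling it with recover database

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

Then query dba_extents to identify and deal with the objects affected by the corrupt blocks

Additional notes on the ORA-600 2037 error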
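
The post does not show the dba_extents query itself. As a hedged sketch, the script below generates one lookup statement per corrupt block, using the `(file, block)` pairs reported by dbv earlier. It relies on the documented `DBA_EXTENTS` columns `OWNER`, `SEGMENT_NAME`, `SEGMENT_TYPE`, `FILE_ID`, `BLOCK_ID`, and `BLOCKS`; the function name is hypothetical, and it assumes the file number in the messages equals the absolute `FILE_ID`, which is the common case but not guaranteed.

```python
def extent_lookup_sql(file_id: int, block: int) -> str:
    """Build a query that finds the segment owning one block.

    Assumption: file_id is the absolute file number used by DBA_EXTENTS.
    """
    return (
        "SELECT owner, segment_name, segment_type\n"
        "  FROM dba_extents\n"
        f" WHERE file_id = {int(file_id)}\n"
        f"   AND {int(block)} BETWEEN block_id AND block_id + blocks - 1;"
    )


# Corrupt blocks reported by dbv in this post.
CORRUPT = [(3, 13353), (3, 14004), (3, 15261), (9, 65507)]

for file_id, block in CORRUPT:
    print(extent_lookup_sql(file_id, block))
    print()
```

A query that returns no rows suggests the block is not allocated to any segment, in which case there may be no object to repair.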

Format: ORA-600 [2037] [a] [b] [c] [d] [e] [f] [g]


VERSIONS:
  versions 8.0 and above

DESCRIPTION:

  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.

  The block has failed this check and this exception is raised.

ARGUMENTS:
  Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
  Arg [b] The Block format  
  Arg [c] RDBA in the block itself
  Arg [d] The block type
  Arg [e] The sequence number
  Arg [f] Flags, if set  
  Arg [g] The return value from the block head/tail checker.

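Handling ORA-600 13013 errors on the MON_MODS$ table

A friend reported that the database would run for a while after startup and then crash on its own, and asked us to help find the cause. Analysis showed that the smon process was hitting ORA-600 13013, which brought the database down.
Error messages in the alert log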

Thu Aug  4 18:39:44 2016
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=33, OS id=22935
Thu Aug  4 18:39:44 2016
Completed: ALTER DATABASE OPEN
Thu Aug  4 18:39:44 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Aug  4 18:48:41 2016
Thread 1 advanced to log sequence 86746
  Current log# 3 seq# 86746 mem# 0: /opt/ora10/oradata/ora10g/redo03.log
Thu Aug  4 18:58:13 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:58:56 2016
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Thu Aug  4 18:59:06 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_smon_22449.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [482], [4198075], [40], [4198075], [17], []
Thu Aug  4 18:59:08 2016
Errors in file /opt/ora10/admin/ora10g/bdump/ora10g_pmon_22413.trc:
ORA-00474: SMON process terminated with error
Thu Aug  4 18:59:08 2016
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 22413

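From these messages we can see that the ORA-600 13013 error caused the database to crash, and that there is also a message of the form "SMON was doing flushing of monitored table stats". Based on experience, this is very likely related to SMON collecting information about DML operations on tables.

Meaning of ORA-600 [13013]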

ORA-600 [13013] [a] [b] {c} [d] [e] [f] 


This format relates to Oracle Server 8.0.3 to 10.1 

Arg [a] Passcount 
Arg [b] Data Object number 
Arg {c} Tablespace Relative DBA of block containing the row to be updated 
Arg [d] Row Slot number 
Arg [e] Relative DBA of block being updated (should be same as {c}) 
Arg [f] Code 
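
To make the mapping concrete, the arguments of the error seen in the alert log can be labelled according to the format above. This is a sketch only: the field names are invented for readability, the documented format is stated for 8.0.3 to 10.1 and is assumed to apply to this 10g database, and the file/block split assumes the usual 10-bit file, 22-bit block layout of a relative DBA.

```python
import re
from typing import Dict

# Labels for Arg [a]..[f], in the order given in the format description.
FIELDS = [
    "passcount",
    "data_object_number",
    "row_block_rdba",
    "row_slot",
    "updated_block_rdba",
    "code",
]

# Error line copied from the alert log in this post.
LINE = (
    "ORA-00600: internal error code, arguments: "
    "[13013], [5001], [482], [4198075], [40], [4198075], [17], []"
)


def parse_ora600_13013(line: str) -> Dict[str, int]:
    """Label the arguments of an ORA-600 [13013] line."""
    args = re.findall(r"\[([^\]]*)\]", line)
    if not args or args[0] != "13013":
        raise ValueError("not an ORA-600 [13013] line")
    values = [int(a) for a in args[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


parsed = parse_ora600_13013(LINE)
rdba = parsed["row_block_rdba"]
print(parsed)
print("file", rdba >> 22, "block", rdba & 0x3FFFFF)
```

Here the data object number is 482 and both RDBA arguments are identical, as the format description says they should be.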

Using this error information and the description in How to resolve ORA-00600 [13013], [5001] [ID 816784.1], we can identify the object involved.

Object corresponding to ORA-600 13013

SQL> select object_name from dba_objects where object_id=482;

OBJECT_NAME
--------------------------------------------------------------------------------
MON_MODS$

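This object is precisely the table used to monitor DML changes, and smon performs operations on it. I previously wrote an article on this: MON_MODS$ and MON_MODS_ALL$ count the number of DML operations.
Handling ORA-600 13013 on the MON_MODS$ table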

SQL> analyze table mon_mods$ validate structure cascade;

analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

 
SQL> select index_name from dba_indexes where table_name='MON_MODS$';

INDEX_NAME
------------------------------
I_MON_MODS$_OBJ

SQL> ALTER INDEX I_MON_MODS$_OBJ REBUILD;

Index altered.

SQL> analyze table mon_mods$ validate structure cascade;
analyze table mon_mods$ validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

SQL> CREATE TABLE MON_MODS_BAK AS SELECT * FROM MON_MODS$;

Table created.

SQL> SELECT COUNT(*) FROM MON_MODS$;

  COUNT(*)
----------
      1247

SQL> C/MON_MODS$/MON_MODS_BAK;
  1* SELECT COUNT(*) FROM MON_MODS_BAK
SQL> /

  COUNT(*)
----------
      1247

SQL> TRUNCATE TABLE MON_MODS$;

Table truncated.

SQL> INSERT INTO MON_MODS$ SELECT * FROM MON_MODS_BAK;

1247 rows created.

SQL> COMMIT;

Commit complete.

SQL>  analyze table mon_mods$ validate structure cascade;

Table analyzed.

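This completes the handling of the ORA-600 13013 error on the MON_MODS$ table. It can of course also be resolved by recreating the I_MON_MODS$_OBJ index, but not by ALTER INDEX ... REBUILD. The database no longer crashes because of this error.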