ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]故障处理

数据库不能启动(或者启动后马上crash),alert日志报错ORA-00600[17182],ORA-00600[25027],ORA-00600[kghfrempty:ds]等错误

Tue Jul 08 23:36:06 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1409 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 2855, block 3
Recovery of Online Redo Log: Thread 1 Group 5 Seq 2855 Reading mem 0
  Mem# 0: /backup/oradata/ztmdb/redo05.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 2855, block 2822, scn 104627804
 0 data blocks read, 0 data blocks written, 1409 redo k-bytes read
Tue Jul 08 23:36:12 2014
Thread 1 advanced to log sequence 2856 (thread open)
Thread 1 opened at log sequence 2856
  Current log# 1 seq# 2856 mem# 0: /backup/oradata/ztmdb/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Jul 08 23:36:13 2014
SMON: enabling cache recovery
[15126] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:10340864 end:10340964 diff:100 (1 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_smon_15100.trc  (incident=62061):
ORA-00600: internal error code, arguments: [17182], [0x2B6DFD23D7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62061/ztmdb_smon_15100_i62061.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_ora_15126.trc  (incident=62093):
ORA-00600: internal error code, arguments: [25027], [0], [875836979], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62093/ztmdb_ora_15126_i62093.trc
Block recovery from logseq 2856, block 98 to scn 104627948
Recovery of Online Redo Log: Thread 1 Group 1 Seq 2856 Reading mem 0
  Mem# 0: /backup/oradata/ztmdb/redo01.log
Block recovery completed at rba 2856.99.16, scn 0.104627949
ORACLE Instance ztmdb (pid = 32) - Error 600 encountered while recovering transaction (14, 2) on object 15113.
Errors in file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_smon_15100.trc  (incident=62066):
ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B6DFD23D790], [], [], [], [], [], [], [], [], [], []
ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] []
ORA-00600: internal error code, arguments: [17182], [0x2B6DFD23D7A0], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ztmdb/ztmdb/incident/incdir_62066/ztmdb_smon_15100_i62066.trc
PMON (ospid: 14978): terminating the instance due to error 474
Tue Jul 08 23:36:26 2014
System state dump requested by (instance=1, osid=14978 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /home/oracle/diag/rdbms/ztmdb/ztmdb/trace/ztmdb_diag_14996_20140708233626.trc
Dumping diagnostic data in directory=[cdmp_20140708233626], requested by (instance=1, osid=14978 (PMON)), summary=[abnormal instance termination].
Tue Jul 08 23:36:27 2014
Instance terminated by PMON, pid = 14978

这里的错误比较明显,数据库进行完前滚后,smon进行回滚发现有部分block CORRUPTED 导致该问题,解决该问题思路就是:
1.设置event 屏蔽回滚
2.如果1不行,设置回滚段相关隐含参数,屏蔽回滚
3.逻辑方式重建数据库

补充知识点
ORA-600 [25027]

ERROR:
  Format: ORA-600 [25027] [a] [b]

VERSIONS:
  versions 9.2 and above


ARGUMENTS:
  Arg [a]  Tablespace Number (TSN)
  Arg [b]  Decimal Relative Data Block Address (RDBA)

处理smon清理临时段导致数据库异常案例

一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现.让我帮忙处理
alert日志报ORA-00600[kafspa:columnBuffer1]

Wed Aug  8 10:55:31 2012
Completed: ALTER DATABASE OPEN
Wed Aug  8 10:55:41 2012
Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_12160.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:55:47 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:55:47 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:55:58 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:55:59 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:10 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 4 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:11 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:22 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 5 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:32 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:43 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:53 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:04 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:57:14 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:25 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:57:35 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:38 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856.trc:
ORA-00474: SMON process terminated with error
Wed Aug  8 10:57:38 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 8856

这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常

Non-fatal internal error happenned while SMON was doing temporary segment drop.

出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系

SMON encountered 8 out of maximum 100 non-fatal internal errors.

这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉

SQL> col name for a32
SQL> col value for a24
SQL> col description for a70
SQL> set linesize 150
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  
SQL> /
Enter value for param: smon_internal_errlimit
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%smon_internal_errlimit%')

NAME                             VALUE                    DESCRIPTION
-------------------------------- ------------------------ -----------------------------------
_smon_internal_errlimit          100                      limit of SMON internal errors

分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/ora10/product
System name:	Linux
Node name:	DBN-HLD-155-2
Release:	2.6.18-92.el5PAE
Version:	#1 SMP Tue Apr 29 13:31:02 EDT 2008
Machine:	i686
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON)

*** SERVICE NAME:() 2012-08-08 10:55:20.208
*** SESSION ID:(274.1) 2012-08-08 10:55:20.208
*** 2012-08-08 10:55:20.208
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], []
Current SQL statement for this session:
select file#, block#, ts# from seg$ where type# = 3

果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts# from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭.

解决问题

--修改两个参数
event='10061 trace name context forever, level 10'
_smon_internal_errlimit=1000000

--启动数据库
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  1267716 bytes
Variable Size             385878012 bytes
Database Buffers         1174405120 bytes
Redo Buffers               15507456 bytes
Database mounted.
Database opened.

因为屏蔽了smon回收临时段,数据库未出现开始时错误,观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库中.

导致该异常sql分析

SQL> select file#, block#, ts# from seg$ where type# = 3;
select file#, block#, ts# from seg$ where type# = 3
                               *
ERROR at line 1:
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]


SQL> set autot trace exp
SQL> select file#, block#, ts# from seg$ where type# = 3;

Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   150 |   389   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| SEG$ |    10 |   150 |   389   (1)| 00:00:05 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TYPE#"=3)

SQL> select count(rowid) from seg$;

Execution Plan
----------------------------------------------------------
Plan hash value: 763549841

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |   389   (1)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|   2 |   TABLE ACCESS FULL| SEG$ | 45682 |   535K|   389   (1)| 00:00:05 |
---------------------------------------------------------------------------

SQL> select /*+ full(t) */ count(*) from seg$;

Execution Plan
----------------------------------------------------------
Plan hash value: 763549841

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    29   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| SEG$ |  3223 |    29   (0)| 00:00:01 |
-------------------------------------------------------------------

SQL> select ts# from seg$;

Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3223 |  9669 |    29   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| SEG$ |  3223 |  9669 |    29   (0)| 00:00:01 |
--------------------------------------------------------------------------

SQL> set autot off
SQL> select count(rowid) from seg$;                                                                                 
SQL> /

COUNT(ROWID)
------------
       45727

SQL> select /*+ full(t) */ count(*) from seg$;

  COUNT(*)
----------
     45727

SQL> select  ts# from seg$;
0
0
……
7
7

ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]

SQL> !dbv file='/oracle/ora10/oradata/ora10g/system01.dbf'

DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 9 14:05:09 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf
Block Checking: DBA = 4225673, Block Type = KTB-managed data block
data header at 0xb6d8225c
kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72
Page 31369 failed with check code 6135
Block Checking: DBA = 4236289, Block Type = KTB-managed data block
data header at 0xb6d7225c
kdbchk: tosp bad (-13399)
Page 41985 failed with check code 6127


DBVERIFY - Verification complete

Total Pages Examined         : 192000
Total Pages Processed (Data) : 47588
Total Pages Failing   (Data) : 2
Total Pages Processed (Index): 40929
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1784
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 101699
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 1775671440 (0.1775671440)

SQL> ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01498: block check failure - see trace file

通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析