obj$坏块exp不能执行原因探讨

上篇(obj$坏块exp/expdp导出不能执行),验证了在obj$有坏块的情况下,不能执行exp/expdp操作,这篇是说明是什么原因导致在obj$有坏块的情况下exp不能正常执行
一.启动数据库级别会话跟踪

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jan 15 11:37:07 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> create pfile='/tmp/pfile' from spfile;

File created.

--------------------------------------------------
在pfile中添加
event='10046 trace name context forever,level 12'
--------------------------------------------------

SQL> startup pfile='/tmp/pfile' force
ORACLE instance started.

Total System Global Area  622149632 bytes
Fixed Size                  2230912 bytes
Variable Size             398460288 bytes
Database Buffers          213909504 bytes
Redo Buffers                7548928 bytes
Database mounted.
Database opened.

二.执行单表导出,找到trace文件

[oracle@node1 trace]$ exp "'/ as sysdba'" tables=chf.t1 file=/tmp/xifenfei.dmp \
> log=/tmp/xifenfei.log INDEXES =n  COMPRESS =n CONSISTENT =n GRANTS =n \
> STATISTICS =none TRIGGERS =n CONSTRAINTS =n

Export: Release 11.2.0.3.0 - Production on Sun Jan 15 11:48:50 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing option
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
Note: grants on tables/views/sequences/roles will not be exported
Note: indexes on tables will not be exported
Note: constraints on tables will not be exported

About to export specified tables via Conventional Path ...
Current user changed to CHF
. . exporting table                             T1

--另外会话观察
Tasks: 241 total,   1 running, 240 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.9%us,  1.2%sy,  0.0%ni, 85.1%id,  4.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8165060k total,  7168288k used,   996772k free,   266028k buffers
Swap:  8289500k total,      168k used,  8289332k free,  4653408k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
 4829 oracle    18   0 69812  12m 9144 S 51.1  0.2   0:03.64 exp               tables=chf.t1 file=/tmp/xifenfei.dmp log=/tmp/xifenfei.log INDEXES =n COMPRESS
 4830 oracle    18   0  829m  62m  58m D 27.9  0.8   0:03.85 oraclechf (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))  

[oracle@node1 trace]$ ll |grep 4830
-rw-r----- 1 oracle oinstall 14101447 01-15 11:49 chf_ora_4830.trc
-rw-r----- 1 oracle oinstall    75398 01-15 11:49 chf_ora_4830.trm
1

<strong>三.阅读trace文件</strong>
因为是obj$对象出现坏块,导致exp不能执行,如果是使用了obj$表的index,那么不会每次都报错,而我测试了多次都报错,所以怀疑是对obj$表进行全表扫描导致该错误发生,而使得exp不能继续下去。所以这次查找trace文件,重点是关注obj$表的全表扫描操作,经过耐心查找,终于发现了一个对obj$全表扫描的操作
1
PARSING IN CURSOR #46986932266584 len=41 dep=0 uid=0 oct=3 lid=0 tim=1326599330636591 hv=2311813821 ad='7be773c8' sqlid='ftx7dd64wqypx'
SELECT COUNT(*)      FROM   SYS.EXU81JAVT
END OF STMT
PARSE #46986932266584:c=2999,e=2938,p=5,cr=23,cu=0,mis=1,r=0,dep=0,og=1,plh=23986678,tim=1326599330636590
WAIT #46986932266584: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636682
WAIT #46986932266584: nam='SQL*Net message from client' ela= 42 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636738
EXEC #46986932266584:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=23986678,tim=1326599330636788
WAIT #46986932266584: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636810
WAIT #46986932266584: nam='SQL*Net message from client' ela= 91 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636913
WAIT #46986932266584: nam='SQL*Net message to client' ela= 9 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668126
FETCH #46986932266584:c=30995,e=31256,p=0,cr=989,cu=0,mis=0,r=1,dep=0,og=1,plh=23986678,tim=1326599330668198
STAT #46986932266584 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=989 pr=0 pw=0 time=31173 us)'
STAT #46986932266584 id=2 cnt=1 pid=1 pos=1 obj=90724 op='TABLE ACCESS FULL OBJ$ (cr=989 pr=0 pw=0 time=31156 us cost=220 size=18270 card=522)'
WAIT #46986932266584: nam='SQL*Net message from client' ela= 76 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668403
CLOSE #46986932266584:c=0,e=10,dep=0,type=0,tim=1326599330668452
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668481
WAIT #0: nam='SQL*Net message from client' ela= 113 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668606

四.对EXU81JAVT对象深究

SQL> select object_type from dba_objects where object_name='EXU81JAVT';

OBJECT_TYPE
-------------------
VIEW

SQL> set long 1000
SQL> select TEXT from dba_views where view_name='EXU81JAVT';

TEXT
------------------------------------------------------
SELECT  obj#
        FROM    sys.obj$
        WHERE   name LIKE '%DbmsJava' AND
                type# = 29 AND
                owner# = 0 AND
                status = 1


SQL> SELECT  obj#
  2          FROM    sys.obj$
  3          WHERE   name LIKE '%DbmsJava' AND
  4                type# = 29 AND
  5                owner# = 0 AND
  6                status = 1     ;

      OBJ#
----------
     17671

SQL> select name from obj$ where obj#=17671;

NAME
------------------------------
oracle/aurora/rdbms/DbmsJava

现在稳定已经定位到,是因为exp判断是否使用了java,是去找”/oracle/aurora/rdbms/DbmsJava”.这个对象的,如果java enabled,那么它就会使用dbms_java做一些转换,实际上oracle是查找视图exu81javt来确定DbmsJava的。
这里的EXU81JAVT是查询obj$而是通过name LIKE ‘%DbmsJava’,导致index不能正常使用,从而使得obj$全表扫描,而obj$有坏块,从而使得exp在obj$有坏块的情况下,不能正常执行

obj$坏块exp/expdp导出不能正常执行

今天有个朋友的多个库同时出现了obj$表出现坏块,总计数据量在100T-200T之间,而且是非归档模式,幸好数据不是很重要,不然将是一个非常大的悲剧。
我通过试验模拟证明obj$表如果出现坏块(具体方法见:bbed破坏数据文件),数据库的不能通过逻辑导出,然后建库。
一.alert发现坏块

Sat Jan 14 17:36:53 2012
Errors in file /opt/oracle/diag/rdbms/chf/chf/trace/chf_smon_493.trc:
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

二.检查坏块对象

[oracle@node1 chf]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jan 14 17:40:29 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> col owner for a10
SQL> col SEGMENT_NAME for a15
SQL> col SEGMENT_TYPE for a10
SQL> col TABLESPACE_NAME for a10
SQL> col PARTITION_NAME for a10
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 95369
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 95369 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER      SEGMENT_NAME    SEGMENT_TY TABLESPACE PARTITION_
---------- --------------- ---------- ---------- ----------
SYS        OBJ$            TABLE      SYSTEM

三.验证坏块方法
1.sql查询

SQL> select /*+ full(obj$) */ count(*) from obj$;
select /*+ full(obj$) */ count(*) from obj$
                                       *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

2.dump数据文件

SQL> alter system dump datafile 1 block 95369;

System altered.

--查看dump文件
Start dump data blocks tsn: 0 file#:1 minblk 95369 maxblk 95369
Block dump from cache:
Dump of buffer cache at level 4 for tsn=0 rdba=4289673
BH (0x6aff6a88) file#: 1 rdba: 0x00417489 (1/95369) class: 1 ba: 0x6af3c000
  set: 19 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 25,19
  dbwrid: 0 obj: 90724 objn: 90724 tsn: 0 afn: 1 hint: f
  hash: [0x6d7f6088,0x838655e0] lru: [0x6aff6ca0,0x6aff6a40]
  ckptq: [NULL] fileq: [NULL] objq: [0x6b3f2458,0x6a3e03c8] objaq: [0x6b3f2468,0x6a3e03d8]
  st: XCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 0
  flags: only_sequential_access auto_bmr_tried
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
BH (0x6d7f5fd8) file#: 1 rdba: 0x00417489 (1/95369) class: 1 ba: 0x6d72a000
  set: 23 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 16,28
  dbwrid: 0 obj: 90724 objn: 90724 tsn: 0 afn: 1 hint: f
  hash: [0x838655e0,0x6aff6b38] lru: [0x60ff3ac0,0x83b346e8]
  lru-flags: on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh11: kdst_fetch' tch: 0 lfb: 33
  flags:
Block dump from disk:
buffer tsn: 0 rdba: 0x00417489 (1/95369)
scn: 0x0000.00db0299 seq: 0x01 flg: 0x06 tail: 0x39393332
frmt: 0x02 chkval: 0x05f8 type: 0x06=trans data
Hex dump of corrupt header 2 = BROKEN
Dump of memory from 0x00002B5631A02A00 to 0x00002B5631A02A14
2B5631A02A00 0000A206 00417489 00DB0299 06010000  [.....tA.........]
2B5631A02A10 000005F8                             [....]     

SQL>   select object_name from dba_objects where object_id=90724;

OBJECT_NAME
----------------------------------
OBJ$
       

3.bbed

[oracle@node1 chf]$ bbed
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Sat Jan 14 18:28:25 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> set filename 'system01.dbf'
        FILENAME        ./system01.dbf

BBED> set blocksize 8192
        BLOCKSIZE       8192

BBED> set block 95369
        BLOCK#          95369

BBED> verify
DBVERIFY - Verification starting
FILE = ././system01.dbf
BLOCK = 95368

Block 95368 is corrupt
Corrupt block relative dba: 0x00417489 (file 0, block 95369)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x00417488
 last change scn: 0x0000.00da8bce seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x65636238
 check value in block header: 0x9925
 computed block checksum: 0x8a94


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED

4.dbv

[oracle@node1 chf]$ dbv file=system01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Sat Jan 14 18:29:43 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /opt/oracle/oradata/chf/system01.dbf
Page 95369 is influx - most likely media corrupt
Corrupt block relative dba: 0x00417489 (file 1, block 95369)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x00417489
 last change scn: 0x0000.00db0299 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x39393332
 check value in block header: 0x5f8
 computed block checksum: 0xe93



DBVERIFY - Verification complete

Total Pages Examined         : 172800
Total Pages Processed (Data) : 132246
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 15726
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3548
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 21278
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 16682372 (0.16682372)

5.rman

[oracle@node1 chf]$ rman target / 

Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jan 14 18:30:54 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CHF (DBID=3444205684)

RMAN> backup check logical validate datafile 1;

Starting backup at 2012-01-14 18:31:29
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=223 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/opt/oracle/oradata/chf/system01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    FAILED 0              21278        172802          16682540  
  File Name: /opt/oracle/oradata/chf/system01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       1              132247                  
  Other      0              3548            

validate found one or more corrupt blocks
See trace file /opt/oracle/diag/rdbms/chf/chf/trace/chf_ora_2429.trc for details
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2               
Control File OK     0              882             
Finished backup at 2012-01-14 18:31:34


SQL>  select file#,block#,blocks from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS
---------- ---------- ----------
         1      95369          1

6.ANALYZE

SQL> ANALYZE TABLE sys.OBJ$ VALIDATE STRUCTURE;
ANALYZE TABLE sys.OBJ$ VALIDATE STRUCTURE
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

三.测试逻辑导出数据
1.exp导出单个表

[oracle@node1 chf]$ exp "'/ as sysdba'" tables=chf.t_undo file=/tmp/chf.dmp log=/tmp/chf.log

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 17:41:35 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing option
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'
EXP-00000: Export terminated unsuccessfully

2.expdp导出单个表

[oracle@node1 chf]$ expdp "'/ as sysdba'" dumpfile=xifenfei.dmp  tables=chf.t_odu

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 17:55:09 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TABLE_01":  "/******** AS SYSDBA" dumpfile=xifenfei.dmp tables=chf.t_odu 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 6 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [TABLE:"CHF"."T_ODU"] 
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x7a8608a8     20462  package body SYS.KUPW$WORKER
0x7a8608a8      9028  package body SYS.KUPW$WORKER
0x7a8608a8     10935  package body SYS.KUPW$WORKER
0x7a8608a8      2728  package body SYS.KUPW$WORKER
0x7a8608a8      9697  package body SYS.KUPW$WORKER
0x7a8608a8      1775  package body SYS.KUPW$WORKER
0x7a864160         2  anonymous block

Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [TABLE:"CHF"."T_ODU"] 
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x7a8608a8     20462  package body SYS.KUPW$WORKER
0x7a8608a8      9028  package body SYS.KUPW$WORKER
0x7a8608a8     10935  package body SYS.KUPW$WORKER
0x7a8608a8      2728  package body SYS.KUPW$WORKER
0x7a8608a8      9697  package body SYS.KUPW$WORKER
0x7a8608a8      1775  package body SYS.KUPW$WORKER
0x7a864160         2  anonymous block

Job "SYS"."SYS_EXPORT_TABLE_01" stopped due to fatal error at 17:55:24

3.exp表空间传输

[oracle@node1 chf]$ exp userid=\'/ as sysdba\' tablespaces=users file=/tmp/users.dmp transport_tablespace=y

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 18:00:21 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing option
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'
EXP-00000: Export terminated unsuccessfully

4.expdp表空间传输

[oracle@node1 chf]$ expdp userid=\'/ as sysdba\' dumpfile=xienfei.dmp  transport_tablespaces=users

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 18:12:12 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TRANSPORTABLE_01":  userid="/******** AS SYSDBA" dumpfile=xienfei.dmp transport_tablespaces=users 
ORA-39123: Data Pump transportable tablespace job aborted
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

Job "SYS"."SYS_EXPORT_TRANSPORTABLE_01" stopped due to fatal error at 18:12:14

四.总结obj$表坏块
1.在执行导出过程中,应该会去查询obj$表,而该表因出现坏块,不能被正常方法,导致逻辑备份异常终止。
2.如果你真的出现了obj$坏块,而你没有备份,那么恭喜你,悲剧的人生开始了。该对象出现问题,逻辑备份都不能工作,就算你有心重建库也不会给你这个机会,可能你不得不借助odu/dul之类的工具去解决问题。再次提醒各位,备份重于一切,安全重于泰山。

ORA-600 kcratr_nab_less_than_odr故障解决

朋友的数据库服务器出现ORA-00600[kcratr_nab_less_than_odr],不能open数据库
1.open数据库报ORA-00600[kcratr_nab_less_than_odr]

SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

2.查看alert日志

Wed Jan 11 13:56:16 2012
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Started redo scan
Completed redo scan
 read 54591 KB redo, 0 data blocks need recovery
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc  (incident=818557):
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Incident details in: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_818557\dbdms_ora_3936_i818557.trc
Aborting crash recovery due to error 600
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Trace dumping is performing id=[cdmp_20120110214555]

3.查看trace文件

Trace file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1 
CPU                 : 2 - type 8664, 2 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:2250M/4060M, Ph+PgF:5868M/8119M 
Instance name: dbdms
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 3108, image: ORACLE.EXE (SHAD)
…………………………
WARNING! Crash recovery of thread 1 seq 99189 is
ending at redo block 43531 but should not have ended before
redo block 43569
Incident 826550 created, dump file: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_826550\dbdms_ora_3108_i826550.trc
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

通过alert和trace中的内容可以知道,数据库需要恢复到rba到43569,但是因为某种原因实例恢复的时候,只能利用1 thread 99189 seq#,恢复rba到43531。从而导致数据库无法正常open

This Problem is caused by Storage Problem of the Database Files. 
The Subsystem (eg. SAN) crashed while the Database was open. 
The Database then crashed since the Database Files were not accessible anymore. 
This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.

4.解决方法

SQL> SELECT STATUS FROM V$INSTANCE;

STATUS
------------
MOUNTED

--尝试直接recover database
SQL> RECOVER DATABASE ;
ORA-00283: 恢复会话因错误而取消
ORA-00264: 不要求恢复
--提示不用恢复

--再打开数据库,还是kcratr_nab_less_than_odr错误警告
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

--尝试不完全恢复
SQL> RECOVER DATABASE UNTIL CANCEL;
ORA-10879: error signaled in parallel recovery slave
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01152: 文件 1 没有从过旧的备份中还原
ORA-01110: 数据文件 1: 'D:\DBDMS\DATA\SYSTEM01.DBF'

--重建控制文件
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS 'D:/1.TXT';

数据库已更改。

SQL> SHUTDOWN IMMEDIATE;
ORA-01109: 数据库未打开


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT;
ORACLE 例程已经启动。

Total System Global Area  417546240 bytes
Fixed Size                  2176328 bytes
Variable Size             268438200 bytes
Database Buffers          138412032 bytes
Redo Buffers                8519680 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "DBDMS" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 18688
  7  LOGFILE
  8    GROUP 1 'D:\DBDMS\LOG\REDO01.LOG'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 'D:\DBDMS\LOG\REDO02.LOG'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 'D:\DBDMS\LOG\REDO03.LOG'  SIZE 50M BLOCKSIZE 512
 11  DATAFILE
 12    'D:\DBDMS\DATA\SYSTEM01.DBF',
 13    'D:\DBDMS\DATA\SYSAUX01.DBF',
 14    'D:\DBDMS\DATA\RBSG01.DBF',
 15    'D:\DBDMS\DATA\DATA01.DBF',
 16    'D:\DBDMS\DATA\INDX01.DBF',
 17    'D:\DBDMS\DATA\DATA02.DBF',
 18    'D:\DBDMS\DATA\DATA03.DBF',
 19    'D:\DBDMS\DATA\DATA04.DBF',
 20    'D:\DBDMS\DATA\INDX02.DBF',
 21    'D:\DBDMS\DATA\SYSTEM02.DBF'
 22  CHARACTER SET ZHS16GBK
 23  ;

控制文件已创建。

--继续尝试恢复
SQL> RECOVER DATABASE ;
完成介质恢复。
SQL> ALTER DATABASE OPEN;

数据库已更改。
--open成功

在这次恢复中,主要就是重建控制文件,然后直接恢复成功,如果redo有损坏,那么可能需要使用不完全恢复,然后使用resetlogs打开数据库

ORA-01555 caused by SQL statement below

一.发现ORA-01555

Mon Dec 26 10:08:22 2011
ORA-01555 caused by SQL statement below (Query Duration=49146 sec, SCN: 0x0b4b.17f5ae42):
Mon Dec 26 10:08:22 2011
SELECT COMPANY_ID,
       COMPANY_MOBILE,
       TO_CHAR(NVL(REG_DATE, SYSDATE - 100), 'yyyymmddhh24miss'),
       BAK_FIELD2
  FROM TAB_XN_COMPANY
 WHERE (COMPANY_STATUS = 1 OR
       (COMPANY_STATUS = 3 AND
       NVL(UNREG_DATE, SYSDATE + 100) >=
       TO_DATE('20111226094500', 'yyyymmddhh24miss')))
   AND NVL(REG_DATE, SYSDATE - 100) <=
       TO_DATE('20111226095959', 'yyyymmddhh24miss')
   AND PAY_TYPE > 0

二.数据库状态

[oracle@ora02 ~]$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Wed Jan 4 10:48:17 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> show parameter undo;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     10800
undo_suppress_errors                 boolean     FALSE
undo_tablespace                      string      UNDOTBS1

SQL> select sum(maxbytes)/1024/1024/1024,
   2 SUM(USER_BYTES)/1024/1024/1024 FROM dba_data_files where tablespace_NAME='UNDOTBS1';

SUM(MAXBYTES)/1024/1024/1024 SUM(USER_BYTES)/1024/1024/1024
---------------------------- ------------------------------
                  61.9999847                     32.6834106

SQL> SELECT DISTINCT STATUS "状态",
  2                  COUNT(*) "EXTENT数量",
  3                  SUM(BYTES) / 1024 / 1024 / 1024 "UNDO大小"
  4    FROM DBA_UNDO_EXTENTS
  5   GROUP BY STATUS;

状态      EXTENT数量   UNDO大小
--------- ---------- ----------
ACTIVE             1 .000976563
EXPIRED         2549 31.2333298
UNEXPIRED          3 .000175476

通过undo_retention保留时间为10800秒,而该sql执行了49146秒,在这49146秒钟,TAB_XN_COMPANY表中的数据被修改,而且被修改的undo数据在10800秒后被覆盖导致,导致原查询语句不能获取到scn小于或者等于查询时候的数据块内容(在undo中),所以出现ORA-01555。从这里也可以看出来,在undo空间还剩余的情况下,如果超过了undo_retention限制,undo内容还是有可能被覆盖,而不是使用未使用的undo

三.出现ORA-1555原因
The ORA-1555 errors can happen when a query is unable to access enough undo to build a copy of the data at the time the query started. Committed “versions” of blocks are maintained along with newer uncommitted “versions” of those blocks so that queries can access data as it existed in the database at the time of the query. These are referred to as “consistent read” blocks and are maintained using Oracle undo management.
就是一个查询要访问某个数据块,而这个数据块在这个查询执行过程中修改过,那么该查询需要查询undo中数据块,而undo中该数据块已经不存在,从而出现ORA-1555

四.ORA-1555解决方法
Case 1 – Rollback Overwritten
1.缩短sql运行时间
2.增加undo_retention,这个同时需要考虑undo空间大小
3.减少commit(rollback)次数
4.在一条sql中尽量使数据块访问一次
4.1)Using a full table scan rather than an index lookup
4.2)Introducing a dummy sort so that we retrieve all the data, sort it and then sequentially visit these data blocks.

Case 2 – Rollback Transaction Slot Overwritten
这种问题,主要是延迟块清理导致,一般建议在进行大批量的dml操作后,使用全表(全index)扫描执行一遍,或者收集全部统计信息

ORA-07445[kslgetl()+120]/ORA-00108错误解决

一.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

二.alert中发现ORA-07445[kslgetl()+120]/ORA-00108错误

Mon Dec 19 09:19:42 2011
found dead dispatcher 'D000', pid = (13, 1)
Mon Dec 19 09:19:42 2011
dispatcher 'D000' encountered error getting listening address
Mon Dec 19 09:19:42 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_16297.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Mon Dec 19 09:19:45 2011
found dead dispatcher 'D000', pid = (21, 2)
Mon Dec 19 09:19:45 2011
dispatcher 'D000' encountered error getting listening address
Mon Dec 19 09:19:45 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_16299.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously

三.trace文件信息

Oracle process number: 15
Unix process pid: 10607, image: oracle@gongantest (D000)

Warning: keltnfy call to ldmInit failed with error 46
*** 2011-12-19 19:21:40.100
network error encountered getting listening address:
  NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
  NS Secondary Error: TNS-12560: TNS:protocol adapter error
  NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x208, PC: [0x7a06b8, kslgetl()+120]
*** 2011-12-19 19:21:40.107
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
ssexhd()+629         call     ksedmp()             000000003 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
__restore_rt()+0     call     ssexhd()             00000000B ? 2B91BE8F8D70 ?
                                                   2B91BE8F8C40 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
kslgetl()+120        signal   __restore_rt()       0600E7560 ? 0000000E8 ?
                                                   09AAE5728 ? 0000009A9 ?
                                                   000003980 ? 09AAE5740 ?
ksfglt()+108         call     kslgetl()            0600E7560 ? 000000001 ?
                                                   09AAE5728 ? 0000009A9 ?
                                                   000003980 ? 09AAE5740 ?

四.在MOS上找到相关文章[ID 1298804.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7 - Release: 11.1 to 11.1
Information in this document applies to any platform.
Symptoms

The following errors are seen in the trace file written by an ORA-7445 [kslgetl]:

network error encountered getting listening address:
NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
NS Secondary Error: TNS-12560: TNS:protocol adapter error
NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x130, PC: [0x82f09dc, kslgetl()+80]

The trace file indicates that there is no session:

Current SQL information unavailable - no session.

The Call Stack Trace in the ORA-7445 trace file contains a function list similar to:

 kslgetl <- PGOSF57_ksfglt
<- kghfre <- kmnsbf <- nsbfr <- nsiofrrg <- nsiocancel
<- nsopen_shutitdown <- nsclose <- nsgblclose <- nsgblTRMHelper <- nsgblRealTerm
<- nlstdstp <- npinlt <- ksuabt <- opidrv <- sou2o
<- opimai_real <- main <- libc_start_main

Cause

The trace file first reports: Warning: keltnfy call to ldmInit failed with error 46 

The ORA-7445 is not the starting point here. This exception is just a spin-off from ORA-180 and it is possible that different internal errors may be seen, such as ORA-600 [504], depending on what is happening when the ORA-180 is encountered.

The cause for the ORA-180 is related to the inital message at the beginning of the trace file: "keltnfy call to ldmInit failed with error 46" and this is followed by: "network error encountered getting listening address:" 

The error code (here: 46) is the key for solving the issue.
This warning says that ldmInit() returned error 46 which is LDMERR_HOST_NOT_FOUND (host not found).

This error is returned if the OS call gethostbyname() fails with an error. So these appears to be a network specific issue.
Solution

1) Check permission on /etc/hosts

$ ls -l /etc/hosts -rw-r--r-- 2 root root 194 Oct 17 2006 /etc/hosts


Check if /etc/hosts file is correctly configured


<ip address> <fully qualified hostname> <simple or short hostname> <alias, if applicable> ( all of this on one line ).


2) Check the hostname:

$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname


3) Check if /etc/nodename is correctly configured
If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.


$ nslookup <shortname> $ nslookup <long name> $ nslookup <ip address>

The forward and reverse lookup should succeed and return consistent address/info.


4) Check nsswitch.conf


$ more nsswitch.confhosts: files dnsMake sure host lookup is also done through the /etc/hosts file and not just dns. It is recommended that FILES come first before DNS.


Also, check the resolv.conf. This makes sure that the DNS is working properly.

在这里看到虽然数据库的版本不一样,alert和trace中的错误不完全一致,但是很相似。是由于主机名不能被正常访问导致,所以尝试这从主机名相关部分着手解决。

五.查看主机名相关信息,解决问题

[oracle@gongantest ~]$  ls -l /etc/hosts
-rw-r--r-- 2 root root 176 Dec 16 13:43 /etc/hosts
[oracle@gongantest ~]$ hostname
gongantest
[oracle@gongantest ~]$ ping gongantest
ping: unknown host gongantest
[oracle@gongantest ~]$ more  /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
#192.168.11.60    gongantest
--发现改主机名的dns被注释掉
[oracle@gongantest ~]$ ping gongantest
PING gongantest (192.168.11.60) 56(84) bytes of data.
From 192.168.9.66 icmp_seq=2 Destination Host Unreachable
From 192.168.9.66 icmp_seq=3 Destination Host Unreachable
From 192.168.9.66 icmp_seq=4 Destination Host Unreachable

--- gongantest ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3999ms, pipe 3
--除掉注释,测试不通
[root@gongantest ~]# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:14:22:10:96:CA  
          inet addr:192.168.9.66  Bcast:192.168.11.255  Mask:255.255.252.0
          inet6 addr: fe80::214:22ff:fe10:96ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4207222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2482964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:1156547557 (1.0 GiB)  TX bytes:565900103 (539.6 MiB)
--发现该机器的ip为192.168.9.66,修改hosts文件。
--初步确定出现问题的原因是因为机器迁移,使用了新ip,注释掉了hosts文件中错误ip
[root@gongantest ~]# vi /etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
192.168.9.66    gongantest

[oracle@gongantest ~]$ ping gongantest
PING gongantest (192.168.9.66) 56(84) bytes of data.
64 bytes from gongantest (192.168.9.66): icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from gongantest (192.168.9.66): icmp_seq=2 ttl=64 time=0.031 ms
64 bytes from gongantest (192.168.9.66): icmp_seq=3 ttl=64 time=0.033 ms

--- gongantest ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.031/0.033/0.037/0.007 ms
--ping测试正常

继续观察

--观察d000进程是否启动
[oracle@gongantest ~]$ ps -ef|grep ora_
oracle   10180     1  0 13:23 ?        00:00:00 ora_pmon_gaxt
oracle   10182     1  0 13:23 ?        00:00:00 ora_psp0_gaxt
oracle   10184     1  0 13:23 ?        00:00:00 ora_mman_gaxt
oracle   10186     1  0 13:23 ?        00:00:00 ora_dbw0_gaxt
oracle   10188     1  0 13:23 ?        00:00:02 ora_lgwr_gaxt
oracle   10190     1  0 13:23 ?        00:00:00 ora_ckpt_gaxt
oracle   10192     1  0 13:23 ?        00:00:00 ora_smon_gaxt
oracle   10194     1  0 13:23 ?        00:00:00 ora_reco_gaxt
oracle   10196     1  0 13:23 ?        00:00:00 ora_cjq0_gaxt
oracle   10198     1  0 13:23 ?        00:00:00 ora_mmon_gaxt
oracle   10200     1  0 13:23 ?        00:00:00 ora_mmnl_gaxt
oracle   10204     1  0 13:23 ?        00:00:00 ora_s000_gaxt
oracle   10210     1  0 13:23 ?        00:00:00 ora_arc0_gaxt
oracle   10212     1  0 13:23 ?        00:00:00 ora_arc1_gaxt
oracle   10214     1  0 13:23 ?        00:00:00 ora_qmnc_gaxt
oracle   10218     1  0 13:23 ?        00:00:00 ora_j000_gaxt
oracle   10222     1  0 13:24 ?        00:00:00 ora_q000_gaxt
oracle   10234     1  0 13:24 ?        00:00:00 ora_q001_gaxt
oracle   10609     1  0 13:32 ?        00:00:00 ora_d000_gaxt
oracle   10639  9962  0 13:35 pts/0    00:00:00 grep ora_

--观察alert日志未出现该错误
Mon Dec 26 13:32:18 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_10607.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Mon Dec 26 13:32:21 2011
found dead dispatcher 'D000', pid = (13, 85)
Mon Dec 26 13:36:04 2011
Thread 1 advanced to log sequence 232 (LGWR switch)
  Current log# 1 seq# 232 mem# 0: /opt/oracle/oradata/gaxt/redo01.log

至此ORA-07445[kslgetl()+120]/ORA-00108错误解决