obj$坏块exp不能执行原因探讨

上篇(obj$坏块exp/expdp导出不能执行),验证了在obj$有坏块的情况下,不能执行exp/expdp操作,这篇是说明是什么原因导致在obj$有坏块的情况下exp不能正常执行
一.启动数据库级别会话跟踪

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jan 15 11:37:07 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> create pfile='/tmp/pfile' from spfile;

File created.

--------------------------------------------------
在pfile中添加
event='10046 trace name context forever,level 12'
--------------------------------------------------

SQL> startup pfile='/tmp/pfile' force
ORACLE instance started.

Total System Global Area  622149632 bytes
Fixed Size                  2230912 bytes
Variable Size             398460288 bytes
Database Buffers          213909504 bytes
Redo Buffers                7548928 bytes
Database mounted.
Database opened.

二.执行单表导出,找到trace文件

[oracle@node1 trace]$ exp "'/ as sysdba'" tables=chf.t1 file=/tmp/xifenfei.dmp \
> log=/tmp/xifenfei.log INDEXES =n  COMPRESS =n CONSISTENT =n GRANTS =n \
> STATISTICS =none TRIGGERS =n CONSTRAINTS =n

Export: Release 11.2.0.3.0 - Production on Sun Jan 15 11:48:50 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing option
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
Note: grants on tables/views/sequences/roles will not be exported
Note: indexes on tables will not be exported
Note: constraints on tables will not be exported

About to export specified tables via Conventional Path ...
Current user changed to CHF
. . exporting table                             T1

--另外会话观察
Tasks: 241 total,   1 running, 240 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.9%us,  1.2%sy,  0.0%ni, 85.1%id,  4.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8165060k total,  7168288k used,   996772k free,   266028k buffers
Swap:  8289500k total,      168k used,  8289332k free,  4653408k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
 4829 oracle    18   0 69812  12m 9144 S 51.1  0.2   0:03.64 exp               tables=chf.t1 file=/tmp/xifenfei.dmp log=/tmp/xifenfei.log INDEXES =n COMPRESS
 4830 oracle    18   0  829m  62m  58m D 27.9  0.8   0:03.85 oraclechf (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))  

[oracle@node1 trace]$ ll |grep 4830
-rw-r----- 1 oracle oinstall 14101447 01-15 11:49 chf_ora_4830.trc
-rw-r----- 1 oracle oinstall    75398 01-15 11:49 chf_ora_4830.trm
1

<strong>三.阅读trace文件</strong>
因为是obj$对象出现坏块,导致exp不能执行,如果是使用了obj$表的index,那么不会每次都报错,而我测试了多次都报错,所以怀疑是对obj$表进行全表扫描导致该错误发生,而使得exp不能继续下去。所以这次查找trace文件,重点是关注obj$表的全表扫描操作,经过耐心查找,终于发现了一个对obj$全表扫描的操作
1
PARSING IN CURSOR #46986932266584 len=41 dep=0 uid=0 oct=3 lid=0 tim=1326599330636591 hv=2311813821 ad='7be773c8' sqlid='ftx7dd64wqypx'
SELECT COUNT(*)      FROM   SYS.EXU81JAVT
END OF STMT
PARSE #46986932266584:c=2999,e=2938,p=5,cr=23,cu=0,mis=1,r=0,dep=0,og=1,plh=23986678,tim=1326599330636590
WAIT #46986932266584: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636682
WAIT #46986932266584: nam='SQL*Net message from client' ela= 42 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636738
EXEC #46986932266584:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=23986678,tim=1326599330636788
WAIT #46986932266584: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636810
WAIT #46986932266584: nam='SQL*Net message from client' ela= 91 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330636913
WAIT #46986932266584: nam='SQL*Net message to client' ela= 9 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668126
FETCH #46986932266584:c=30995,e=31256,p=0,cr=989,cu=0,mis=0,r=1,dep=0,og=1,plh=23986678,tim=1326599330668198
STAT #46986932266584 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=989 pr=0 pw=0 time=31173 us)'
STAT #46986932266584 id=2 cnt=1 pid=1 pos=1 obj=90724 op='TABLE ACCESS FULL OBJ$ (cr=989 pr=0 pw=0 time=31156 us cost=220 size=18270 card=522)'
WAIT #46986932266584: nam='SQL*Net message from client' ela= 76 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668403
CLOSE #46986932266584:c=0,e=10,dep=0,type=0,tim=1326599330668452
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668481
WAIT #0: nam='SQL*Net message from client' ela= 113 driver id=1650815232 #bytes=1 p3=0 obj#=91 tim=1326599330668606

四.对EXU81JAVT对象深究

SQL> select object_type from dba_objects where object_name='EXU81JAVT';

OBJECT_TYPE
-------------------
VIEW

SQL> set long 1000
SQL> select TEXT from dba_views where view_name='EXU81JAVT';

TEXT
------------------------------------------------------
SELECT  obj#
        FROM    sys.obj$
        WHERE   name LIKE '%DbmsJava' AND
                type# = 29 AND
                owner# = 0 AND
                status = 1


SQL> SELECT  obj#
  2          FROM    sys.obj$
  3          WHERE   name LIKE '%DbmsJava' AND
  4                type# = 29 AND
  5                owner# = 0 AND
  6                status = 1     ;

      OBJ#
----------
     17671

SQL> select name from obj$ where obj#=17671;

NAME
------------------------------
oracle/aurora/rdbms/DbmsJava

现在稳定已经定位到,是因为exp判断是否使用了java,是去找”/oracle/aurora/rdbms/DbmsJava”.这个对象的,如果java enabled,那么它就会使用dbms_java做一些转换,实际上oracle是查找视图exu81javt来确定DbmsJava的。
这里的EXU81JAVT是查询obj$而是通过name LIKE ‘%DbmsJava’,导致index不能正常使用,从而使得obj$全表扫描,而obj$有坏块,从而使得exp在obj$有坏块的情况下,不能正常执行

obj$坏块exp/expdp导出不能正常执行

今天有个朋友的多个库同时出现了obj$表出现坏块,总计数据量在100T-200T之间,而且是非归档模式,幸好数据不是很重要,不然将是一个非常大的悲剧。
我通过试验模拟证明obj$表如果出现坏块(具体方法见:bbed破坏数据文件),数据库的不能通过逻辑导出,然后建库。
一.alert发现坏块

Sat Jan 14 17:36:53 2012
Errors in file /opt/oracle/diag/rdbms/chf/chf/trace/chf_smon_493.trc:
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

二.检查坏块对象

[oracle@node1 chf]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jan 14 17:40:29 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> col owner for a10
SQL> col SEGMENT_NAME for a15
SQL> col SEGMENT_TYPE for a10
SQL> col TABLESPACE_NAME for a10
SQL> col PARTITION_NAME for a10
SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 95369
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 95369 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER      SEGMENT_NAME    SEGMENT_TY TABLESPACE PARTITION_
---------- --------------- ---------- ---------- ----------
SYS        OBJ$            TABLE      SYSTEM

三.验证坏块方法
1.sql查询

SQL> select /*+ full(obj$) */ count(*) from obj$;
select /*+ full(obj$) */ count(*) from obj$
                                       *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

2.dump数据文件

SQL> alter system dump datafile 1 block 95369;

System altered.

--查看dump文件
Start dump data blocks tsn: 0 file#:1 minblk 95369 maxblk 95369
Block dump from cache:
Dump of buffer cache at level 4 for tsn=0 rdba=4289673
BH (0x6aff6a88) file#: 1 rdba: 0x00417489 (1/95369) class: 1 ba: 0x6af3c000
  set: 19 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 25,19
  dbwrid: 0 obj: 90724 objn: 90724 tsn: 0 afn: 1 hint: f
  hash: [0x6d7f6088,0x838655e0] lru: [0x6aff6ca0,0x6aff6a40]
  ckptq: [NULL] fileq: [NULL] objq: [0x6b3f2458,0x6a3e03c8] objaq: [0x6b3f2468,0x6a3e03d8]
  st: XCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 0
  flags: only_sequential_access auto_bmr_tried
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
BH (0x6d7f5fd8) file#: 1 rdba: 0x00417489 (1/95369) class: 1 ba: 0x6d72a000
  set: 23 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 16,28
  dbwrid: 0 obj: 90724 objn: 90724 tsn: 0 afn: 1 hint: f
  hash: [0x838655e0,0x6aff6b38] lru: [0x60ff3ac0,0x83b346e8]
  lru-flags: on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
  st: FREE md: NULL fpin: 'kdswh11: kdst_fetch' tch: 0 lfb: 33
  flags:
Block dump from disk:
buffer tsn: 0 rdba: 0x00417489 (1/95369)
scn: 0x0000.00db0299 seq: 0x01 flg: 0x06 tail: 0x39393332
frmt: 0x02 chkval: 0x05f8 type: 0x06=trans data
Hex dump of corrupt header 2 = BROKEN
Dump of memory from 0x00002B5631A02A00 to 0x00002B5631A02A14
2B5631A02A00 0000A206 00417489 00DB0299 06010000  [.....tA.........]
2B5631A02A10 000005F8                             [....]     

SQL>   select object_name from dba_objects where object_id=90724;

OBJECT_NAME
----------------------------------
OBJ$
       

3.bbed

[oracle@node1 chf]$ bbed
Password: 

BBED: Release 2.0.0.0.0 - Limited Production on Sat Jan 14 18:28:25 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> set filename 'system01.dbf'
        FILENAME        ./system01.dbf

BBED> set blocksize 8192
        BLOCKSIZE       8192

BBED> set block 95369
        BLOCK#          95369

BBED> verify
DBVERIFY - Verification starting
FILE = ././system01.dbf
BLOCK = 95368

Block 95368 is corrupt
Corrupt block relative dba: 0x00417489 (file 0, block 95369)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x00417488
 last change scn: 0x0000.00da8bce seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x65636238
 check value in block header: 0x9925
 computed block checksum: 0x8a94


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED

4.dbv

[oracle@node1 chf]$ dbv file=system01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Sat Jan 14 18:29:43 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /opt/oracle/oradata/chf/system01.dbf
Page 95369 is influx - most likely media corrupt
Corrupt block relative dba: 0x00417489 (file 1, block 95369)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x00417489
 last change scn: 0x0000.00db0299 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x39393332
 check value in block header: 0x5f8
 computed block checksum: 0xe93



DBVERIFY - Verification complete

Total Pages Examined         : 172800
Total Pages Processed (Data) : 132246
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 15726
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3548
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 21278
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 16682372 (0.16682372)

5.rman

[oracle@node1 chf]$ rman target / 

Recovery Manager: Release 11.2.0.3.0 - Production on Sat Jan 14 18:30:54 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CHF (DBID=3444205684)

RMAN> backup check logical validate datafile 1;

Starting backup at 2012-01-14 18:31:29
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=223 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/opt/oracle/oradata/chf/system01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    FAILED 0              21278        172802          16682540  
  File Name: /opt/oracle/oradata/chf/system01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       1              132247                  
  Other      0              3548            

validate found one or more corrupt blocks
See trace file /opt/oracle/diag/rdbms/chf/chf/trace/chf_ora_2429.trc for details
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2               
Control File OK     0              882             
Finished backup at 2012-01-14 18:31:34


SQL>  select file#,block#,blocks from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS
---------- ---------- ----------
         1      95369          1

6.ANALYZE

SQL> ANALYZE TABLE sys.OBJ$ VALIDATE STRUCTURE;
ANALYZE TABLE sys.OBJ$ VALIDATE STRUCTURE
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

三.测试逻辑导出数据
1.exp导出单个表

[oracle@node1 chf]$ exp "'/ as sysdba'" tables=chf.t_undo file=/tmp/chf.dmp log=/tmp/chf.log

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 17:41:35 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing option
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'
EXP-00000: Export terminated unsuccessfully

2.expdp导出单个表

[oracle@node1 chf]$ expdp "'/ as sysdba'" dumpfile=xifenfei.dmp  tables=chf.t_odu

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 17:55:09 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TABLE_01":  "/******** AS SYSDBA" dumpfile=xifenfei.dmp tables=chf.t_odu 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 6 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [TABLE:"CHF"."T_ODU"] 
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x7a8608a8     20462  package body SYS.KUPW$WORKER
0x7a8608a8      9028  package body SYS.KUPW$WORKER
0x7a8608a8     10935  package body SYS.KUPW$WORKER
0x7a8608a8      2728  package body SYS.KUPW$WORKER
0x7a8608a8      9697  package body SYS.KUPW$WORKER
0x7a8608a8      1775  package body SYS.KUPW$WORKER
0x7a864160         2  anonymous block

Processing object type TABLE_EXPORT/TABLE/TABLE
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [TABLE:"CHF"."T_ODU"] 
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x7a8608a8     20462  package body SYS.KUPW$WORKER
0x7a8608a8      9028  package body SYS.KUPW$WORKER
0x7a8608a8     10935  package body SYS.KUPW$WORKER
0x7a8608a8      2728  package body SYS.KUPW$WORKER
0x7a8608a8      9697  package body SYS.KUPW$WORKER
0x7a8608a8      1775  package body SYS.KUPW$WORKER
0x7a864160         2  anonymous block

Job "SYS"."SYS_EXPORT_TABLE_01" stopped due to fatal error at 17:55:24

3.exp表空间传输

[oracle@node1 chf]$ exp userid=\'/ as sysdba\' tablespaces=users file=/tmp/users.dmp transport_tablespace=y

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 18:00:21 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing option
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'
EXP-00000: Export terminated unsuccessfully

4.expdp表空间传输

[oracle@node1 chf]$ expdp userid=\'/ as sysdba\' dumpfile=xienfei.dmp  transport_tablespaces=users

Export: Release 11.2.0.3.0 - Production on Sat Jan 14 18:12:12 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TRANSPORTABLE_01":  userid="/******** AS SYSDBA" dumpfile=xienfei.dmp transport_tablespaces=users 
ORA-39123: Data Pump transportable tablespace job aborted
ORA-01578: ORACLE data block corrupted (file # 1, block # 95369)
ORA-01110: data file 1: '/opt/oracle/oradata/chf/system01.dbf'

Job "SYS"."SYS_EXPORT_TRANSPORTABLE_01" stopped due to fatal error at 18:12:14

四.总结obj$表坏块
1.在执行导出过程中,应该会去查询obj$表,而该表因出现坏块,不能被正常方法,导致逻辑备份异常终止。
2.如果你真的出现了obj$坏块,而你没有备份,那么恭喜你,悲剧的人生开始了。该对象出现问题,逻辑备份都不能工作,就算你有心重建库也不会给你这个机会,可能你不得不借助odu/dul之类的工具去解决问题。再次提醒各位,备份重于一切,安全重于泰山。

诡异dblink问题解决–dblink insert操作数据类型发生改变

1.展示列属性

[oracle@saas-xunzhi-db1 ~]$ sqlplus testga/testga

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jan 12 16:26:56 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> desc t_xifenfei;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PROCESS_INSTANCE_ID                       NOT NULL NUMBER(10)
 STATUS_NEW                                         NUMBER
 PACKAGE_NAME                                       VARCHAR2(50)

SQL> desc wf_proc_inst@oldmoa
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER(10)
 NAME                                               VARCHAR2(500)
 PROC_ID                                   NOT NULL NUMBER(10)
 PARE_INST_ID                              NOT NULL NUMBER(10)
 PARE_ACTI_ID                              NOT NULL NUMBER(10)
 INST_CREATOR                                       VARCHAR2(40)
 CREAT_DATE                                         DATE
 STATUS                                             CHAR(1)
 ATT1                                               NUMBER(10)
 ATT2                                               VARCHAR2(255)
 SEQVALUE                                           VARCHAR2(50)

从这里可以看出,原表的status列是char,而目标表的STATUS_NEW列是number类型

2.通过dblink查询列值

SQL> select t.inst_id as ID,
  2         case w.status
  3           when '0' then
  4            4
  5           when '2' then
  6            3
  7           else
  8            1
  9         end as  status_new,
 10         p.name as PACKAGE_NAME
 11    from wf_proc_inst@oldmoa i
 12    left join wf_proc_info_inst@oldmoa t on t.inst_id = i.id
 13    left join wf_proc_type@oldmoa p on t.type_id = p.id
 14    left join wf_proc_inst@oldmoa w on t.inst_id = w.id
 15   where t.inst_id = i.id
 16     and t.proc_id <> 53
 17     and t.is_ok = 1
 18     AND t.inst_id <= 4837 
 19     AND t.inst_id>=4735;

        ID STATUS_NEW PACKAGE_NAME
---------- ---------- --------------------------
      4755          3 呈批处理
      4836          3 公文处理

从这里可以看出来,case工作正常,传入的char类型,通过case转换为number类型

3.使用dblink插入数据并查询

SQL> INSERT INTO t_xifenfei
  2  select t.inst_id as ID,
  3         case w.status
  4           when '0' then
  5            4
  6           when '2' then
  7            3
  8           else
  9            1
 10         end as  status_new,
 11         p.name as PACKAGE_NAME
 12    from wf_proc_inst@oldmoa i
 13    left join wf_proc_info_inst@oldmoa t on t.inst_id = i.id
 14    left join wf_proc_type@oldmoa p on t.type_id = p.id
 15    left join wf_proc_inst@oldmoa w on t.inst_id = w.id
 16   where t.inst_id = i.id
 17     and t.proc_id <> 53
 18     and t.is_ok = 1
 19     AND t.inst_id <= 4837 
 20     AND t.inst_id>=4735;

2 rows created.

SQL> commit;

Commit complete.

SQL> col package_name for a10
SQL> select * from t_xifenfei;

PROCESS_INSTANCE_ID STATUS_NEW PACKAGE_NA
------------------- ---------- ----------
               4755          1 呈批处理
               4836          1 公文处理

这里出现异常,怀疑case中传入的值,都走到else那边了,怀疑是在使用dblink 的时候,char的类型发生的改变,转为了number,导致case匹配失败,所以走到else。

4.在dblink端创建视图,插入数据

SQL> CREATE VIEW GZZJ.v_xifenfei
  2  AS
  3  select t.inst_id as ID,
  4         case to_char(w.status)
  5           when '0' then
  6            4
  7           when '2' then
  8            3
  9           else
 10            1
 11         end as  status_new,
 12         p.name as PACKAGE_NAME
 13    from GZZJ.wf_proc_inst i
 14    left join GZZJ.wf_proc_info_inst t on t.inst_id = i.id
 15    left join GZZJ.wf_proc_type p on t.type_id = p.id
 16    left join GZZJ.wf_proc_inst w on t.inst_id = w.id
 17   where t.inst_id = i.id
 18     and t.proc_id <> 53
 19     and t.is_ok = 1
 20     AND t.inst_id <= 4837 
 21     AND t.inst_id>=4735;

View created.

SQL> insert into t_xifenfei 
  2  select * from v_xifenfei@oldmoa;

2 rows created.

SQL> commit;

Commit complete.

SQL> select * from t_xifenfei;

PROCESS_INSTANCE_ID STATUS_NEW PACKAGE_NA
------------------- ---------- ----------
               4755          3 呈批处理
               4836          3 公文处理

通过远程视图,接触数据类型发生改变异常(怀疑是dblink 在insert时候,char数据类型变成了number,但是未得到官方或者权威的确认)

RBA和实例恢复关系

1.关于heartbeat和checkpoint
在这次的删除过程中我没有执行commit,而是直接abort数据库。整个删除过程执行了近6分钟,控制文件的心跳每三秒进行一次,心跳是把low cache rba记录到了控制文件中,而没有真正的把全部的脏数据写入到磁盘( 只有发生了checkpoint时候,才会把相关的脏数据写入到磁盘,而这里的控制文件的heartbeat和checkpoint是两回事,checkpoint一般是在切换日志,数据文件正常离线,执行begin backup命令时发生,昨晚晚上后面的一个困惑就是上面的英文描述,让我把这两者搞混淆了)

2.三种rba解释
low rba :在buffer cache中的数据块第一次数据改变所对应的RAB。
high rba :在buffer cache中的数据块最近一次数据改变时所对应的RAB。
on-disk rba:是 lgwr 写日志文件的最末位置的地址。

3.实例恢复过程解释
实例恢复的时候,是从控制文件heartbeat记录的low rba开始读redo log数据(会多读取一点,因为heartbeat是每三秒执行一次,假设在2.9秒的时候,数据库异常down了,控制文件中记录的还是2.9秒前的low rba,这个时候,从该点开始读取redo),恢复到on-disk rba,而不是high rba(high rba一般情况下会大于on-disk rba,但是因为high rba比on-disk rba多的部分记录在redo log buffer中,在实例恢复的时候,因为其未被记录到redo log file中,所以不能被恢复,其实也没有必要恢复,因为该数据肯定是没有commit或者rollback)

具体内容和实验请见:RBA和实例恢复关系

ORA-600 kcratr_nab_less_than_odr故障解决

朋友的数据库服务器出现ORA-00600[kcratr_nab_less_than_odr],不能open数据库
1.open数据库报ORA-00600[kcratr_nab_less_than_odr]

SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

2.查看alert日志

Wed Jan 11 13:56:16 2012
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Started redo scan
Completed redo scan
 read 54591 KB redo, 0 data blocks need recovery
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc  (incident=818557):
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Incident details in: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_818557\dbdms_ora_3936_i818557.trc
Aborting crash recovery due to error 600
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
Errors in file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Trace dumping is performing id=[cdmp_20120110214555]

3.查看trace文件

Trace file d:\dbdms\diag\rdbms\dbdms\dbdms\trace\dbdms_ora_3108.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1 
CPU                 : 2 - type 8664, 2 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:2250M/4060M, Ph+PgF:5868M/8119M 
Instance name: dbdms
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 3108, image: ORACLE.EXE (SHAD)
…………………………
WARNING! Crash recovery of thread 1 seq 99189 is
ending at redo block 43531 but should not have ended before
redo block 43569
Incident 826550 created, dump file: d:\dbdms\diag\rdbms\dbdms\dbdms\incident\incdir_826550\dbdms_ora_3108_i826550.trc
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []
ORA-00600: ??????, ??: [kcratr_nab_less_than_odr], [1], [99189], [43531], [43569], [], [], [], [], [], [], []

通过alert和trace中的内容可以知道,数据库需要恢复到rba到43569,但是因为某种原因实例恢复的时候,只能利用1 thread 99189 seq#,恢复rba到43531。从而导致数据库无法正常open

This Problem is caused by Storage Problem of the Database Files. 
The Subsystem (eg. SAN) crashed while the Database was open. 
The Database then crashed since the Database Files were not accessible anymore. 
This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.

4.解决方法

SQL> SELECT STATUS FROM V$INSTANCE;

STATUS
------------
MOUNTED

--尝试直接recover database
SQL> RECOVER DATABASE ;
ORA-00283: 恢复会话因错误而取消
ORA-00264: 不要求恢复
--提示不用恢复

--再打开数据库,还是kcratr_nab_less_than_odr错误警告
SQL> ALTER DATABASE OPEN;
ALTER DATABASE OPEN
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [99189],
[43531], [43569], [], [], [], [], [], [], []

--尝试不完全恢复
SQL> RECOVER DATABASE UNTIL CANCEL;
ORA-10879: error signaled in parallel recovery slave
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01152: 文件 1 没有从过旧的备份中还原
ORA-01110: 数据文件 1: 'D:\DBDMS\DATA\SYSTEM01.DBF'

--重建控制文件
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS 'D:/1.TXT';

数据库已更改。

SQL> SHUTDOWN IMMEDIATE;
ORA-01109: 数据库未打开


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> STARTUP NOMOUNT;
ORACLE 例程已经启动。

Total System Global Area  417546240 bytes
Fixed Size                  2176328 bytes
Variable Size             268438200 bytes
Database Buffers          138412032 bytes
Redo Buffers                8519680 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "DBDMS" NORESETLOGS  NOARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 18688
  7  LOGFILE
  8    GROUP 1 'D:\DBDMS\LOG\REDO01.LOG'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 'D:\DBDMS\LOG\REDO02.LOG'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 'D:\DBDMS\LOG\REDO03.LOG'  SIZE 50M BLOCKSIZE 512
 11  DATAFILE
 12    'D:\DBDMS\DATA\SYSTEM01.DBF',
 13    'D:\DBDMS\DATA\SYSAUX01.DBF',
 14    'D:\DBDMS\DATA\RBSG01.DBF',
 15    'D:\DBDMS\DATA\DATA01.DBF',
 16    'D:\DBDMS\DATA\INDX01.DBF',
 17    'D:\DBDMS\DATA\DATA02.DBF',
 18    'D:\DBDMS\DATA\DATA03.DBF',
 19    'D:\DBDMS\DATA\DATA04.DBF',
 20    'D:\DBDMS\DATA\INDX02.DBF',
 21    'D:\DBDMS\DATA\SYSTEM02.DBF'
 22  CHARACTER SET ZHS16GBK
 23  ;

控制文件已创建。

--继续尝试恢复
SQL> RECOVER DATABASE ;
完成介质恢复。
SQL> ALTER DATABASE OPEN;

数据库已更改。
--open成功

在这次恢复中,主要就是重建控制文件,然后直接恢复成功,如果redo有损坏,那么可能需要使用不完全恢复,然后使用resetlogs打开数据库