使用asm disk header 自动备份信息恢复异常asm disk header

通过参考kamus的Where is the backup of ASM disk header block,发现从10.2.0.5开始的asm确实存在自动备份asm disk header功能.有了这个功能对于那些不备份asm disk header的同学,提供了一层保证,也增加了asm的安全性.
对于10.2.0.5.0以及以后版本,不管au size是多少,asm disk header自动备份存储的位置是第2个au的倒数第2个block.
计算方法:AU中包含的block num[AU_SIZE/block_size]*2-2[因为从第一个块从0计数],通过该方法计算结论为:
1M AU在510
2M AU在1022
4M AU在2046
8M AU在4094
16M AU在8190
32M AU在16382
64M AU在32766
1.对比备份asm disk header

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "xifenfei.com"  from dual;

xifenfei.com
-------------------
2012-06-17 09:41:19

SQL>  select group_number,DISK_NUMBER,PATH,HEADER_STATUS 
   2  from v$asm_disk where group_number<>0;

GROUP_NUMBER DISK_NUMBER PATH            HEADER_STATU
------------ ----------- --------------- ------------
           1           1 /dev/raw/raw2   MEMBER
           1           0 /dev/raw/raw1   MEMBER

SQL> select group_number,name,BLOCK_SIZE,ALLOCATION_UNIT_SIZE from v$asm_diskgroup;

GROUP_NUMBER NAME                           BLOCK_SIZE ALLOCATION_UNIT_SIZE
------------ ------------------------------ ---------- --------------------
           1 DATA                                 4096              1048576

rac1->  kfed read /dev/raw/raw1 blknum=510|>/tmp/xifenfei.510
rac1->  kfed read /dev/raw/raw1 blknum=0|>/tmp/xifenfei.0
rac1-> ll /tmp/xifenfei*
-rw-r--r--  1 oracle oinstall 6606 Jun 14 04:11 /tmp/xifenfei.0
-rw-r--r--  1 oracle oinstall 6606 Jun 14 04:12 /tmp/xifenfei.510
rac1-> diff /tmp/xifenfei.510 /tmp/xifenfei.0
--通过对比发现两者无不同记录返回,证明他们记录内容完全相同

2.尝试破坏asm disk header

rac1-> dd if=/dev/zero of=/dev/raw/raw1 bs=4096 count=1
1+0 records in
1+0 records out
rac1->  kfed read /dev/raw/raw1 blknum=0
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

SQL> select group_number,DISK_NUMBER,PATH,HEADER_STATUS 
   2 from v$asm_disk where group_number<>0;

GROUP_NUMBER DISK_NUMBER PATH            HEADER_STATU
------------ ----------- --------------- ------------
           1           1 /dev/raw/raw2   MEMBER
           1           0 /dev/raw/raw1   CANDIDATE

SQL> alter diskgroup  data dismount;

Diskgroup altered.

SQL> alter diskgroup  data mount;
alter diskgroup  data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

3.使用kfed repair修改损坏asm disk header

rac1-> kfed  repair '/dev/raw/raw1'
rac1->  kfed read /dev/raw/raw1 blknum=0
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                   883602253 ; 0x00c: 0x34aab34d
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
…………

SQL> alter diskgroup  data mount;

Diskgroup altered.

4.使用kfed merge恢复asm disk header

rac1-> dd if=/dev/zero of=/dev/raw/raw1 bs=4096 count=1
1+0 records in
1+0 records out
rac1->  kfed read /dev/raw/raw1 blknum=0
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

SQL> alter diskgroup  data dismount;

Diskgroup altered.

SQL> alter diskgroup  data mount;
alter diskgroup  data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

rac1->  kfed merge /dev/raw/raw1 /tmp/xifenfei.510

SQL> alter diskgroup  data mount;

Diskgroup altered.

通过试验证明在10.2.0.5及其以后版本中,对于备份的asm disk header我们可以通过使用kfed repair和kfed merge来恢复.

11G RAC库 ORA-00600[ktubko_1]错误

数据库版本信息

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "www.orasos.com" from dual;

www.orasos.com
-------------------
2012-06-12 14:44:53

数据库启动报ORA-00600[ktubko_1]错误

Database Characterset is ZHS16GBK
Errors in file /u01/diag/rdbms/xff/XFF1/trace/XFF1_smon_17248.trc  (incident=21754):
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/diag/rdbms/xff/XFF1/incident/incdir_21754/XFF1_smon_17248_i21754.trc
Tue Jun 12 10:37:10 2012
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance XFF1 (pid = 19) - Error 600 encountered while recovering transaction (4, 2) on object 5887.
Errors in file /u01/diag/rdbms/xff/XFF1/trace/XFF1_smon_17248.trc:
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []

查看trace文件

Incorrect next uba in kturCurrBackoutOneChg while backing out xid: 0x0004.002.0000022b uba: 0x00c02068.00af.3b
Undo record:
ktubu redo: slt: 2 rci: 58 opc: 10.22 objn: 5887 objd: 5887 tsn: 1
Undo type:  Regular undo       Undo type:  Last buffer split:  No 
Tablespace Undo:  No 
             0x00000000
index undo for leaf key operations
KTB Redo 
op: 0x04  ver: 0x01  
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0003.003.0000030a uba: 0x00c0200c.010a.15
                      flg: C---    lkc:  0     scn: 0x0000.00118c55
Dump kdilk : itl=3, kdxlkflg=0x1 sdc=0 indexid=0x800df2 block=0x00800df3
(kdxlre): restore leaf row (clear leaf delete flags)
key :(5):  02 c1 0d 01 80
keydata/bitmap: (6):  00 81 0f 41 00 01
Undo block: tsn 0x2 rdba: 0xc02068
Dump of buffer cache at level 4 for tsn=2 rdba=12591208
BH (0x33ff7264) file#: 3 rdba: 0x00c02068 (3/8296) class: 24 ba: 0x33f24000

*** 2012-06-12 10:36:40.265
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 0 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f
  hash: [0x3e78a49c,0x3e78a49c] lru: [0x33ff73ec,0x33ff723c]
  obj-flags: object_ckpt_list
  ckptq:[0x3e05cd68,0x33ff7f0c]fileq:[0x3e05cda4,0x3e05cda4]objq:[0x3b9094a4,0x3b9094a4]objaq:[0x33ff7acc,0x3b909494]
  st: XCURRENT md: NULL fpin: 'ktuwh23: ktubko' tch: 1
  flags: buffer_dirty redo_since_read
  LRBA: [0x15.26.0] LSCN: [0x0.11acb6] HSCN: [0x0.11acb6] HSUB: [1]
Data block dump: tsn: 0x1 rdba: 0x800df3
Dump of buffer cache at level 3 for tsn=1 rdba=8392179
BH (0x33ff70b4) file#: 2 rdba: 0x00800df3 (2/3571) class: 1 ba: 0x33f20000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
  dbwrid: 0 obj: 5887 objn: 5887 tsn: 1 afn: 2 hint: f
  hash: [0x3e7d85e4,0x3e7d85e4] lru: [0x33ff723c,0x3e05c760]
  ckptq: [NULL] fileq: [NULL] objq: [0x3b9092a8,0x3b9092a8] objaq: [0x3b9092a0,0x3b9092a0]
  st: XCURRENT md: NULL fpin: 'kdiwh27: kdiulk' tch: 1
  flags:
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800df3 (2/3571)
  scn: 0x0000.00118c67 seq: 0x01 flg: 0x06 tail: 0x8c670601
  frmt: 0x02 chkval: 0x5d04 type: 0x06=trans data

Block header dump:  0x00800df3
 Object id on Block? Y
 seg/obj: 0x16ff  csc: 0x00.118c60  itc: 3  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x800df0 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0006.01d.00000319  0x00c00332.009b.15  --U-    1  fsc 0x000f.00118c67
0x03   0x0003.003.0000030a  0x00c0200c.010a.15  C---    0  scn 0x0000.00118c55


kcra_dump_redo_internal: skipped for critical process
Dumping redo for undo$
kcra_dump_redo_internal: skipped for critical process

*** 2012-06-12 10:36:43.906
Incident 21754 created, dump file: /u01/diag/rdbms/xff/XFF1/incident/incdir_21754/XFF1_smon_17248_i21754.trc
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []

ORACLE Instance XFF1 (pid = 19) - Error 600 encountered while recovering transaction (4, 2) on object 5887.

*** 2012-06-12 10:37:10.646
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00600: internal error code, arguments: [ktubko_1], [], [], [], [], [], [], [], [], [], [], []
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            BFFD84FC ? 2 ?
ksedst1()+77         call     skdstdst()           BFFD84FC ? 0 ? 1 ? 8592A48 ?
                                                   85928C6 ? 8592A48 ?
ksedst()+33          call     ksedst1()            0 ? 1 ?
dbkedDefDump()+2704  call     ksedst()             0 ? 0 ? 3FFE17CC ? 3E7C5344 ?
                                                   1 ? 1 ?
ksedmp()+47          call     dbkedDefDump()       3 ? 0 ?
kturRecoverTxn()+52  call     ksedmp()             3 ? B6B3FEA4 ? 2C ?
471                                                B6B3FE38 ? 1 ? B6B3FE58 ?
kturRecoverUndoSegm  call     kturRecoverTxn()     BFFD9078 ? 2 ? 1 ? 0 ? 11 ?
ent()+1091                                         4 ? 1 ?
kturRecoverActiveTx  call     kturRecoverUndoSegm  4 ? 0 ? 1 ? 0 ? FFFF ? 11 ?
ns()+931                      ent()                4 ?
ktprbeg()+281        call     kturRecoverActiveTx  10 ? 0 ?
                              ns()                 
ktmmon()+13050       call     ktprbeg()            0 ? 1 ? 0 ? B6B5B72C ? 0 ?
                                                   0 ?
ktmSmonMain()+174    call     ktmmon()             20018C2C ? B6B46D2C ?
                                                   114E7B00 ? 0 ? 0 ? B6B59F50 ?
ksbrdp()+826         call     00000000             20018C2C ? 432A884E ? 0 ? 0 ?
                                                   0 ? 0 ?
opirip()+559         call     ksbrdp()             0 ? 0 ? 0 ? 0 ? 0 ? 0 ?
opidrv()+515         call     opirip()             32 ? 4 ? BFFDB20C ?
sou2o()+80           call     opidrv()             32 ? 4 ? BFFDB20C ?
opimai_real()+230    call     sou2o()              BFFDB1F0 ? 32 ? 4 ?
                                                   BFFDB20C ?
ssthrdmain()+212     call     00000000             3 ? BFFDB338 ? 0 ? 4318AF14 ?
                                                   BFFDB2F4 ? 4317E670 ?
main()+147           call     ssthrdmain()         3 ? BFFDB338 ?
__libc_start_main()  call     00000000             1 ? BFFDB434 ? BFFDB43C ?
+220                                               4317E828 ? 0 ? 1 ?
_start()+33          call     __libc_start_main()  856F1C4 ? 1 ? BFFDB434 ?
                                                   BCF1440 ? BCF1430 ?
                                                   43170790 ?
 

--------------------- Binary Stack Dump ---------------------

通过上面的trace可以看出是2/3571中包含了事务,但是和3/8296[4号回滚段]回滚中的信息不相符,从而出现了在数据库启动回滚的时候出现该错误.查询mos[ID 1318986.1]发现这个是数据库的Bug 10205230比较相似,虽说在11.2.0.2中修复而且在asm中不受该影响,我这里库是11.2.0.3的asm rac照样出现该bug.

解决方法
通过alert日志提示object可以找到object_id=5887.当然也可以通过trace中的rdba来确定

SQL> col OBJECT_NAME for a30 
SQL> select object_name,object_type,owner from dba_objects where object_id=5887;

OBJECT_NAME                    OBJECT_TYPE         OWNER
------------------------------ ------------------- ------------------------------
WRI$_ADV_MESSAGE_GROUPS_PK     INDEX               SYS


SQL> alter index sys.WRI$_ADV_MESSAGE_GROUPS_PK rebuild online;

Index altered.

补充说明:如果损坏对象是表,需要使用DBMS_REPAIR跳过坏块,然后重建表

重启数据库观察
数据库已经正常,开始报undo回滚段错误的记录已经不再存在,数据库恢复正常

Tue Jun 12 13:50:43 2012
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Tue Jun 12 13:51:11 2012
No Resource Manager plan active
Tue Jun 12 13:52:01 2012
Starting background process GTX0
Tue Jun 12 13:52:01 2012
GTX0 started with pid=29, OS id=14234 
Starting background process RCBG
Tue Jun 12 13:52:04 2012
RCBG started with pid=41, OS id=14238 
replication_dependency_tracking turned off (no async multimaster replication found)
Tue Jun 12 13:54:01 2012
Starting background process QMNC
Tue Jun 12 13:54:01 2012
QMNC started with pid=42, OS id=14279 
Tue Jun 12 13:57:26 2012
Completed: ALTER DATABASE OPEN

使用bbed让rac中的sysaux数据文件online

一个朋友的11g rac库的sysaux表空间因某种原因缺少历史归档,导致无法正常online,是的数据库的很多功能受限.通过实现展示恢复过程.
模拟环境

SQL> select name,file#,status from v$datafile;

NAME                                                      FILE# STATUS
---------------------------------------------------- ---------- -------
+XIFENFEI/xff/datafile/system.256.776961315                   1 SYSTEM
+XIFENFEI/xff/datafile/sysaux.257.776961315                   2 ONLINE
+XIFENFEI/xff/datafile/undotbs1.258.776961317                 3 ONLINE
+XIFENFEI/xff/datafile/user_dd.dbf                            4 ONLINE
+XIFENFEI/xff/datafile/undotbs2.264.776961693                 5 ONLINE
+XIFENFEI/asm/datafile/xifenfei01.dbf.268.781967893           6 ONLINE

6 rows selected.

SQL> alter database datafile 2 offline;

Database altered.

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     14
Next log sequence to archive   15
Current log sequence           15
SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     19
Next log sequence to archive   19
Current log sequence           20

--删除部分归档日志
[grid@rac1 ~]$ asmcmd
ASMCMD> ls
DATA/
XIFENFEI/
ASMCMD> cd data
ASMCMD> ls
XFF/
rac-cluster/
ASMCMD> cd xff 
ASMCMD> ls
ARCHIVELOG/
CONTROLFILE/
ONLINELOG/
ASMCMD> cd archivelog
ASMCMD> ls
2012_03_03/
2012_04_13/
2012_04_30/
2012_05_01/
2012_05_24/
2012_06_12/
ASMCMD> cd 2012_06_12
ASMCMD> ls
thread_1_seq_15.280.785752747
thread_1_seq_16.281.785752845
thread_1_seq_17.282.785752929
thread_1_seq_18.283.785753043
thread_1_seq_19.284.785753115
ASMCMD> rm thread_1_seq_16.281.785752845
ASMCMD> rm thread_1_seq_15.280.785752747

尝试online 数据文件

SQL> alter database datafile 2 online;
alter database datafile 2 online
*
ERROR at line 1:
ORA-01113: file 2 needs media recovery
ORA-01110: data file 2: '+XIFENFEI/xff/datafile/sysaux.257.776961315'


SQL> recover datafile 2;
ORA-00279: change 1155352 generated at 06/12/2012 08:20:10 needed for thread 1
ORA-00289: suggestion :
+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747
ORA-00280: change 1155352 for thread 1 is in sequence #15


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log
'+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747'
ORA-17503: ksfdopn:2 Failed to open file
+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747
ORA-15012: ASM file
'+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747' does not exist


ORA-00308: cannot open archived log
'+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747'
ORA-17503: ksfdopn:2 Failed to open file
+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747
ORA-15012: ASM file
'+DATA/xff/archivelog/2012_06_12/thread_1_seq_15.280.785752747' does not exist

准备bbed修改数据文件
现在datafile 2不能恢复,我们需要修改的就是该datafile header 相关的scn等信息,另外拷贝一个数据文件出来做修改时候参考

RMAN> copy datafile 2 to  '/tmp/auxsys.dbf_rman';

Starting backup at 2012-06-12 08:59:07
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 instance=XFF1 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=+XIFENFEI/xff/datafile/sysaux.257.776961315
output file name=/tmp/auxsys.dbf_rman tag=TAG20120612T090029 RECID=1 STAMP=785754322
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:03:50
Finished backup at 2012-06-12 09:05:36

RMAN>  copy datafile 4 to '/tmp/user.dbf_rman';

Starting backup at 2012-06-12 09:09:28
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=+XIFENFEI/xff/datafile/user_dd.dbf
output file name=/tmp/user.dbf_rman tag=TAG20120612T090932 RECID=2 STAMP=785754582
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
Finished backup at 2012-06-12 09:09:48

bbed修改datafile header

[oracle@rac1 tmp]$ bbed password=blockedit listfile=/tmp/o_bbed  mode=edit

BBED: Release 2.0.0.0.0 - Limited Production on Tue Jun 12 09:37:30 2012

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> info
 File#  Name                                                        Size(blks)
 -----  ----                                                        ----------
     1  /tmp/auxsys.dbf_rman                                                 0
     2  /tmp/user.dbf_rman                                                   0

BBED> set file 2 block 1
        FILE#           2
        BLOCK#          1


BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x0011a787
      ub2 kscnwrp                           @488      0x0000
   ub4 kcvcptim                             @492      0x2ed5a9cd
   ub2 kcvcpthr                             @496      0x0001
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x00000014
         ub4 kcrbabno                       @504      0x000000c5
         ub2 kcrbabof                       @508      0x0010
   ub1 kcvcpetb[0]                          @512      0x02
   ub1 kcvcpetb[1]                          @513      0x00
   ub1 kcvcpetb[2]                          @514      0x00
   ub1 kcvcpetb[3]                          @515      0x00
   ub1 kcvcpetb[4]                          @516      0x00
   ub1 kcvcpetb[5]                          @517      0x00
   ub1 kcvcpetb[6]                          @518      0x00
   ub1 kcvcpetb[7]                          @519      0x00

BBED> p kcvfhcpc                             
ub4 kcvfhcpc                                @140      0x00000086

BBED> p kcvfhccc                             
ub4 kcvfhccc                                @148      0x00000085

BBED> set file 1 block 1
        FILE#           1
        BLOCK#          1

BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x0011a118
      ub2 kscnwrp                           @488      0x0000
   ub4 kcvcptim                             @492      0x2ed59e3a
   ub2 kcvcpthr                             @496      0x0001
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000000f
         ub4 kcrbabno                       @504      0x0000c4ed
         ub2 kcrbabof                       @508      0x0010
   ub1 kcvcpetb[0]                          @512      0x02
   ub1 kcvcpetb[1]                          @513      0x00
   ub1 kcvcpetb[2]                          @514      0x00
   ub1 kcvcpetb[3]                          @515      0x00
   ub1 kcvcpetb[4]                          @516      0x00
   ub1 kcvcpetb[5]                          @517      0x00
   ub1 kcvcpetb[6]                          @518      0x00
   ub1 kcvcpetb[7]                          @519      0x00

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x00000079

BBED>  p kcvfhccc 
ub4 kcvfhccc                                @148      0x00000078

/*
确定需要修改项kscnbas/kcvcptim/kcvfhcpc/kcvfhccc的相关信息
*/

BBED> set count 16
        COUNT           16

BBED> d file 2 block 1 offset 484                            
 File: /tmp/user.dbf_rman (2)
 Block: 1                Offsets:  484 to  499           Dba:0x00800001
------------------------------------------------------------------------
 87a71100 00001000 cda9d52e 01000000 

 <32 bytes per line>

BBED> m /x 87a71100 file 1 block 1 offset 484
BBED-00209: invalid number (87a71100)


BBED> m /x 87a7 file 1 block 1 offset 484
 File: /tmp/auxsys.dbf_rman (1)
 Block: 1                Offsets:  484 to  499           Dba:0x00400001
------------------------------------------------------------------------
 87a71100 00000000 3a9ed52e 01000000 

 <32 bytes per line>

BBED> d file 2 block 1 offset 492
 File: /tmp/user.dbf_rman (2)
 Block: 1                Offsets:  492 to  507           Dba:0x00800001
------------------------------------------------------------------------
 cda9d52e 01000000 14000000 c5000000 

 <32 bytes per line>

BBED> m /x cda9d52e file 1 block 1 offset 492
BBED-00209: invalid number (cda9d52e)


BBED> d file 1 block 1 offset 492
 File: /tmp/auxsys.dbf_rman (1)
 Block: 1                Offsets:  492 to  507           Dba:0x00400001
------------------------------------------------------------------------
 3a9ed52e 01000000 0f000000 edc40000 

 <32 bytes per line>

BBED> m /x cda9 file 1 block 1 offset 492
 File: /tmp/auxsys.dbf_rman (1)
 Block: 1                Offsets:  492 to  507           Dba:0x00400001
------------------------------------------------------------------------
 cda9d52e 01000000 0f000000 edc40000 

 <32 bytes per line>

BBED> d file 1 block 1 offset 140
 File: /tmp/auxsys.dbf_rman (1)
 Block: 1                Offsets:  140 to  155           Dba:0x00400001
------------------------------------------------------------------------
 79000000 2970bc2e 78000000 00000000 

 <32 bytes per line>

BBED> d file 2 block 1 offset 140
 File: /tmp/user.dbf_rman (2)
 Block: 1                Offsets:  140 to  155           Dba:0x00800001
------------------------------------------------------------------------
 86000000 2970bc2e 85000000 00000000 

 <32 bytes per line>

BBED> m /x 86000000 file 1 block 1 offset 140
BBED-00209: invalid number (86000000)


BBED> m /x 8600 file 1 block 1 offset 140
 File: /tmp/auxsys.dbf_rman (1)
 Block: 1                Offsets:  140 to  155           Dba:0x00400001
------------------------------------------------------------------------
 86000000 2970bc2e 78000000 00000000 

 <32 bytes per line>

BBED> d file 2 block 1 offset 148
 File: /tmp/user.dbf_rman (2)
 Block: 1                Offsets:  148 to  163           Dba:0x00800001
------------------------------------------------------------------------
 85000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> m /x 8500 file 1 block 1 offset 148
 File: /tmp/auxsys.dbf_rman (1)
 Block: 1                Offsets:  148 to  163           Dba:0x00400001
------------------------------------------------------------------------
 85000000 00000000 00000000 00000000 

 <32 bytes per line>

BBED> set file 1 block 1
        FILE#           1
        BLOCK#          1

BBED> p kcvfhckp
struct kcvfhckp, 36 bytes                   @484     
   struct kcvcpscn, 8 bytes                 @484     
      ub4 kscnbas                           @484      0x0011a787
      ub2 kscnwrp                           @488      0x0000
   ub4 kcvcptim                             @492      0x2ed5a9cd
   ub2 kcvcpthr                             @496      0x0001
   union u, 12 bytes                        @500     
      struct kcvcprba, 12 bytes             @500     
         ub4 kcrbaseq                       @500      0x0000000f
         ub4 kcrbabno                       @504      0x0000c4ed
         ub2 kcrbabof                       @508      0x0010
   ub1 kcvcpetb[0]                          @512      0x02
   ub1 kcvcpetb[1]                          @513      0x00
   ub1 kcvcpetb[2]                          @514      0x00
   ub1 kcvcpetb[3]                          @515      0x00
   ub1 kcvcpetb[4]                          @516      0x00
   ub1 kcvcpetb[5]                          @517      0x00
   ub1 kcvcpetb[6]                          @518      0x00
   ub1 kcvcpetb[7]                          @519      0x00

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x00000086

BBED>  p kcvfhccc 
ub4 kcvfhccc                                @148      0x00000085

BBED> sum apply
Check value for File 1, Block 1:
current = 0x48c4, required = 0x48c4

使用修改后数据文件尝试online

SQL> alter database rename file '+XIFENFEI/xff/datafile/sysaux.257.776961315' to '/tmp/auxsys.dbf_rman';

Database altered.

SQL> recover database datafile 2 ;
ORA-00274: illegal recovery option DATAFILE


SQL> recover database datafile 2;
ORA-00274: illegal recovery option DATAFILE


SQL> recover datafile 2;
ORA-00283: recovery session canceled due to errors
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '/tmp/auxsys.dbf_rman'
ORA-01207: file is more recent than control file - old control file

尝试重建控制文件

SQL> alter database backup controlfile to trace as '/tmp/xifenfei.ctl';

Database altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
ORACLE instance started.

Total System Global Area  535662592 bytes
Fixed Size                  1346140 bytes
Variable Size             411043236 bytes
Database Buffers          117440512 bytes
Redo Buffers                5832704 bytes
SQL> @xifenfei_ctl

CREATE CONTROLFILE REUSE DATABASE "XFF" NORESETLOGS  ARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-12720: operation requires database is in EXCLUSIVE mode

--在rac中重建控制文件需要设置cluster_database=FALSE 

SQL> alter system set  cluster_database=FALSE scope=spfile;

System altered.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> STARTUP NOMOUNT
ORACLE instance started.

Total System Global Area  535662592 bytes
Fixed Size                  1346140 bytes
Variable Size             411043236 bytes
Database Buffers          117440512 bytes
Redo Buffers                5832704 bytes
SQL> @xifenfei_ctl

Control file created.

online数据文件
重建控制文件恢复数据库之后 datafile 2自动online成功,省去了手工处理麻烦,如果没有自动online,请手工处理

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> col name for a52
SQL> select name,file#,status from v$datafile;

NAME                                                      FILE# STATUS
---------------------------------------------------- ---------- -------
+XIFENFEI/xff/datafile/system.256.776961315                   1 SYSTEM
/tmp/auxsys.dbf_rman                                          2 ONLINE
+XIFENFEI/xff/datafile/undotbs1.258.776961317                 3 ONLINE
+XIFENFEI/xff/datafile/user_dd.dbf                            4 ONLINE
+XIFENFEI/xff/datafile/undotbs2.264.776961693                 5 ONLINE
+XIFENFEI/asm/datafile/xifenfei01.dbf.268.781967893           6 ONLINE

6 rows selected.

文件系统中的datafile 2 恢复到asm中

SQL> alter database datafile 2 offline;

Database altered.

RMAN> copy datafile 2 to '+XIFENFEI';

Starting backup at 2012-06-12 10:55:42
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=21 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00002 name=/tmp/auxsys.dbf_rman
output file name=+XIFENFEI/xff/datafile/sysaux.257.785761227 tag=TAG20120612T105800 RECID=1 STAMP=785762097
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:16:24
Finished backup at 2012-06-12 11:15:05


RMAN> switch datafile 2 to copy;

datafile 2 switched to datafile copy "+XIFENFEI/xff/datafile/sysaux.257.785761227"

RMAN> recover datafile 2;

Starting recover at 2012-06-12 11:30:32
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:01:30

Finished recover at 2012-06-12 11:34:11

RMAN> sql 'alter database datafile 2 online';

sql statement: alter database datafile 2 online

验证和收尾工作

SQL> select name,file#,status from v$datafile;

NAME                                                      FILE# STATUS
---------------------------------------------------- ---------- -------
+XIFENFEI/xff/datafile/system.256.776961315                   1 SYSTEM
+XIFENFEI/xff/datafile/sysaux.257.785761227                   2 ONLINE
+XIFENFEI/xff/datafile/undotbs1.258.776961317                 3 ONLINE
+XIFENFEI/xff/datafile/user_dd.dbf                            4 ONLINE
+XIFENFEI/xff/datafile/undotbs2.264.776961693                 5 ONLINE
+XIFENFEI/asm/datafile/xifenfei01.dbf.268.781967893           6 ONLINE


SQL> alter system set  cluster_database=true scope=spfile;

System altered.

--然后重启节点

resmgr:cpu quantum等待

昨天晚上数据库升级(使用exp/imp从9i升级到11g),开启业务,数据库出现很多resmgr:cpu quantum等待

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> SELECT sid,event FROM v$session WHERE wait_class#<>6;

       SID EVENT
---------- ----------------------------------------------------------------
        27 resmgr:cpu quantum
        38 resmgr:cpu quantum
        43 resmgr:cpu quantum
        46 resmgr:cpu quantum
       113 resmgr:cpu quantum
       118 resmgr:cpu quantum
       125 resmgr:cpu quantum
       140 resmgr:cpu quantum
       143 resmgr:cpu quantum
       199 resmgr:cpu quantum
       205 resmgr:cpu quantum

       SID EVENT
---------- ----------------------------------------------------------------
       212 resmgr:cpu quantum
       220 resmgr:cpu quantum
       221 resmgr:cpu quantum
       223 resmgr:cpu quantum
       238 resmgr:cpu quantum
       241 resmgr:cpu quantum
       301 resmgr:cpu quantum
       313 resmgr:cpu quantum
       314 resmgr:cpu quantum
       405 resmgr:cpu quantum
       410 resmgr:cpu quantum

       SID EVENT
---------- ----------------------------------------------------------------
       415 resmgr:cpu quantum
       435 resmgr:cpu quantum
       502 resmgr:cpu quantum
       503 resmgr:cpu quantum
       509 resmgr:cpu quantum
       510 resmgr:cpu quantum
       512 resmgr:cpu quantum
       521 resmgr:cpu quantum
       526 resmgr:cpu quantum
       528 resmgr:cpu quantum
       532 resmgr:cpu quantum

       SID EVENT
---------- ----------------------------------------------------------------
       533 enq: TX - row lock contention
       589 resmgr:cpu quantum
       596 resmgr:cpu quantum
       600 resmgr:cpu quantum
       609 resmgr:cpu quantum
       611 resmgr:cpu quantum
       625 resmgr:cpu quantum
       635 null event
       707 resmgr:cpu quantum
       727 resmgr:cpu quantum
       731 SQL*Net message to client

44 rows selected.

查询alert日志

Sat Jun 09 06:00:00 2012
Setting Resource Manager plan SCHEDULER[0x310C]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Sun Jun 10 02:00:00 2012
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter

Sun Jun 10 06:00:00 2012
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Mon Jun 11 02:00:00 2012
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter

Mon Jun 11 22:00:00 2012
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Tue Jun 12 02:00:00 2012
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter

Tue Jun 12 22:00:00 2012
Setting Resource Manager plan SCHEDULER[0x3108]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Wed Jun 13 02:00:00 2012
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter

从这里可以看出来,因为SCHEDULER定时启动和关闭资源管理的DEFAULT_MAINTENANCE_PLAN从而导致在晚上10点到2点Resource Manager plan处于启用状态.上线测试刚好在晚上2点之前,所有当时查询的时候发现很多resmgr:cpu quantum等待是因为Resource Manager plan启用导致(使用SCHEDULER控制其启用和关闭),很多情况下数据库跑的应用比较单一,不是十分的需要启动资源管理.
在11g中关闭方法如下

1. Set the current resource manager plan to null (or another plan that is not restrictive):

alter system set resource_manager_plan='' scope=both;

2. Change the active windows to use the null resource manager plan (or other nonrestrictive plan) using:

execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN',''); 
execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');

3. Then, for each window_name (WINDOW_NAME from DBA_SCHEDULER_WINDOWS), run:

execute dbms_scheduler.set_attribute('<window name>','RESOURCE_PLAN','');

SQL> select WINDOW_NAME  from DBA_SCHEDULER_WINDOWS;

WINDOW_NAME
------------------------------
MONDAY_WINDOW
TUESDAY_WINDOW
WEDNESDAY_WINDOW
THURSDAY_WINDOW
FRIDAY_WINDOW
SATURDAY_WINDOW
SUNDAY_WINDOW
WEEKNIGHT_WINDOW
WEEKEND_WINDOW

9 rows selected.

OPTIMIZER_INDEX_CACHING和OPTIMIZER_INDEX_COST_ADJ参数说明

OPTIMIZER_INDEX_COST_ADJ参数说明
OPTIMIZER_INDEX_COST_ADJ lets you tune optimizer behavior for access path selection to be more or less index friendly—that is, to make the optimizer more or less prone to selecting an index access path over a full table scan.
The default for this parameter is 100 percent, at which the optimizer evaluates index access paths at the regular cost. Any other value makes the optimizer evaluate the access path at that percentage of the regular cost. For example, a setting of 50 makes the index access path look half as expensive as normal.
这个数反映执行多块IO(全表扫描)的成本与执行单个IO(索引读取)的成本。保持为100,则多块IO与单块IO成本相同。设为50优化程序认为访问单块IO的成本为多块IO的一半。

OPTIMIZER_INDEX_COST_ADJ参数试验

SQL> create table t_xifenfei
  2  as
  3   select object_id,object_name from dba_objects where rownum<101;

Table created.

SQL> create index ind_t_xifenfei on t_xifenfei(object_id);

Index created.

SQL> EXEC DBMS_STATS.gather_table_stats(user,'T_XIFENFEI',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

SQL>  show parameter OPTIMIZER_INDEX;

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
optimizer_index_caching              integer                0
optimizer_index_cost_adj             integer                100

SQL> set linesize 150
SQL> set autot trace ext
SQL> select object_name from t_xifenfei where object_id>100;


Execution Plan
----------------------------------------------------------
Plan hash value: 2444553208

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    11 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_XIFENFEI     |     1 |    11 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_T_XIFENFEI |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">100)

SQL> alter session set OPTIMIZER_INDEX_COST_ADJ=1000;

Session altered.

SQL> select object_name from t_xifenfei where object_id>100;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |    11 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T_XIFENFEI |     1 |    11 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID">100)

通过试验证明修改OPTIMIZER_INDEX_COST_ADJ会导致执行计划走index还是全表扫描

OPTIMIZER_INDEX_CACHING参数说明
OPTIMIZER_INDEX_CACHING lets you adjust the behavior of cost-based optimization to favor nested loops joins and IN-list iterators.The cost of executing an index using an IN-list iterator or of executing a nested loops join when an index is used to access the inner table depends on the caching of that index in the buffer cache. The amount of caching depends on factors that the optimizer cannot predict, such as the load on the system and the block access patterns of different users.You can modify the optimizer’s assumptions about index caching for nested loops joins and IN-list iterators by setting this parameter to a value between 0 and 100 to indicate the percentage of the index blocks the optimizer should assume are in the cache. Setting this parameter to a higher value makes nested loops joins and IN-list iterators look less expensive to the optimizer. As a result, it will be more likely to pick nested loops joins over hash or sort-merge joins and to pick indexes using IN-list iterators over other indexes or full table scans. The default for this parameter is 0, which results in default optimizer behavior.
这个表明的是在nested loops joins and IN-list iterators的时候,如果使用了OPTIMIZER_INDEX_CACHING参数,表明两个表关联的时候优化器考虑index cache的比例,从而选择不同的执行计划.而不是网上所说的优化器考虑所有情况下的index的cache情况(这个参数只有在nested loops joins and IN-list iterators表关联的时候的index才会被优化器考虑[index cache的比例]).进一步说明:这个参数影响两个表关联的时候是选择hash jion还是nested loops joins/sort-merge joins

总结说明
1.关于OPTIMIZER_INDEX_CACHING的参数效果我未试验出来(可能方法不正确)
2.根据网上建议在oltp系统中设置
optimizer_index_caching = 0 optimizer_index_cost_adj = 100 的默认值,一般时候数据仓库报表系统。
optimizer_index_caching = 90 optimizer_index_cost_adj = 25-50 一般时候事务处理/OLTP系统
3.设置这些参数可能存在bug