dul支持ORACLE 12C CDB数据库恢复

熟悉dul的朋友都知道dul是通过file# 1 block 1的kcvfhrdb找到bootstarp$的segment header(其实kcvfhrdb就是bootstarp$ segment header的rdba地址),然后通过bootstarp$中存储的相关sql找对一些基础的基表对象(obj$,tab$,col$,seg$等),然后通过他们定位到具体的对象的segment记录,从而通过segment找到extent分布,然后按照extent恢复数据(如果丢失system的情况,是通过扫描来确定extent属于哪个segment,然后恢复,该情况不在本次讨论范围之类)。在ORACLE 12C之前,一个实例最多都只有一个数据库,也就是说,在一个完整的数据库中只会存在一个bootstarp$,只要通过file# 1 block 1 定位到kcvfhrdb就可以读取数据库中的所有内容.但是从12C开始数据库引入了CDB的概念,也就是在一个CDB数据库中有了多个PDB数据库,那这些PDB数据库如果要编写类似dul之类工具将如何恢复出来,这里根据自己对于CDB的理解,先普及一些在CDB数据库中和bootstarp$表有关知识
bootstarp$表在每个PDB中都存在,可以通过bbed证明

--查看pdb相关信息
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           READ WRITE NO
         5 ORA11G                         MOUNTED

SQL>  select con_id,header_file,header_block from cdb_segments where segment_name='BOOTSTRAP$';

    CON_ID HEADER_FILE HEADER_BLOCK
---------- ----------- ------------
         4          11          520
         1           1          520
         2           5          520
----因为有部分库未read write,所以查询cdb_segments未显示

--file 1
RMAN> copy datafile 1 to '/tmp/system_01.dbf';

BBED> set block 1
        BLOCK#          1

BBED> map
 File: /tmp/system_01.dbf (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
 Data File Header

 struct kcvfh, 1112 bytes                   @0       

 ub4 tailchk                                @8188    

BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x00400208

SQL> select to_number('400208','xxxxxxxxxx') from dual;

TO_NUMBER('400208','XXXXXXXXXX')
--------------------------------
                         4194824

SQL> select dbms_utility.data_block_address_block(4194824) "block",
  2  dbms_utility.data_block_address_file(4194824) "file" from dual;

     block       file
---------- ----------
       520          1

----可以知道bootstarp$起点的rdba为4194824,在rfile# 1 block# 520上

--file 11
RMAN> copy datafile 11 to '/tmp/system_11.dbf';

BBED> set filename '/tmp/system_11.dbf'
        FILENAME        /tmp/system_11.dbf

BBED> set block 1
        BLOCK#          1

BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x00400208

---显示的rdba地址完全与file# 1中的kcvfhrdb相同,也就是表示rfile# 1 block# 520

--验证未mount pdb,并且从11.2.0.4升级到12.1.0.1
ASMCMD> cp system01.dbf /tmp/system_18.dbf
copying +data/ora11g/system01.dbf -> /tmp/system_18.dbf

BBED> set filename '/tmp/system_18.dbf'
        FILENAME        /tmp/system_18.dbf

BBED>  set block 1
        BLOCK#          1

BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x0041ad40

SQL> select to_number('41ad40','xxxxxxxxx') from dual;

TO_NUMBER('41AD40','XXXXXXXXX')
-------------------------------
                        4304192

SQL> select dbms_utility.data_block_address_block(4304192) "block",
  2  dbms_utility.data_block_address_file(4304192) "file" from dual;

     block       file
---------- ----------
    109888          1
----可以知道bootstarp$起点的rdba为4304192,在rfile# 1 block# 109888上

查询contrainer$视图确认bootstarp$

SQL> select a.con_id#, a.dbid, a.rdba, dbms_utility.data_block_address_file(a.rdba) "file",
2    dbms_utility.data_block_address_block(a.rdba) "block"from container$ a;

   CON_ID#       DBID       RDBA       file      block
---------- ---------- ---------- ---------- ----------
         1 1922813718    4194824          1        520
         5 4211303690    4304192          1     109888
         2 4048821679    4194824          1        520
         4 3872456618    4194824          1        520
         3 3313918585    4194824          1        520

通过上面的知识点,我们明确,在ORACLE 12C CDB设计理念中,为了和12C之前的版本兼用(12C之前的版本可以通过PDB插入到CDB中),也为了方便用户在操作PDB时候和传统数据库一样,没有任何区别,所以它把每个PDB的rdba的计算方法认为PDB内部的RELFILE#是从1开始(也就是说每个rdba都是相对于自己的pdb而言),所以这里的contrainer$查询出来的rdba的地址就比较好理解(并非是绝对文件号,而是相对文件号,即表示pdb的第一个数据文件[传统的system01.dbf])

rdba中的file#和cdb中的file#关系

SQL> show con_name;

CON_NAME
------------------------------
PDB2
SQL> select file#, RELFILE#   from file$;

     FILE#   RELFILE#
---------- ----------
        12          4
        11          1
        13         13

SQL> show con_name;

CON_NAME
------------------------------
CDB$ROOT
SQL> select file#, RELFILE#   from file$;

     FILE#   RELFILE#
---------- ----------
         1          1
         3          3
         5
         6          6
         2
         4          4

6 rows selected.

通过这里的分析,就可以清晰的知道当前的dul是完全可以处理ORACLE 12C的CDB数据库.

dul恢复CDB中PDB数据

--在pdb中创建测试表
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           READ WRITE NO
         5 ORA11G                         MOUNTED

SQL> alter session set container=pdb2;

Session altered.

SQL> show con_name;

CON_NAME
------------------------------
PDB2

SQL> show con_id

CON_ID
------------------------------
3

SQL> create user xff identified by xifenfei;

User created.

SQL> grant dba to xff;

Grant succeeded.

SQL> create table xff.t_xifenfei tablespace users 
  2  as select * from dba_objects;

Table created.

SQL> alter system checkpoint;

System altered.

SQL> select count(*) from xff.t_xifenfei;

  COUNT(*)
----------
     90756

--使用dul抽取数据
[oracle@xifenfei dul]$ ./dul


 Strictly Oracle Internal Use Only


DUL: Warning: Recreating file "dul.log"
Disk group DATA, dul group_cid 0
Discovered disk /dev/sdb as diskgroup DATA, disk number 0 size 20480 Mb File1 starts at 10, dul_disk_cid 0

DUL: Warning: Dictionary cache DC_ASM_EXTENTS is empty
Probing for attributes in File9, the attribute directory, for disk group DATA
attribute name "_extent_sizes", value "1 4 16"
attribute name "_extent_counts", value "20000 20000 214748367"
Oracle data file size 283123712 bytes, block size 8192
Found db_id = 1922813718
Found db_name = CDB
Oracle data file size 713039872 bytes, block size 8192
DUL> bootstrap;
Probing file = 1, block = 520
. unloading table                BOOTSTRAP$
DUL: Warning: block number is non zero but marked deferred trying to process it anyhow
      60 rows unloaded
DUL: Warning: Dictionary cache DC_BOOTSTRAP is empty
Reading BOOTSTRAP.dat 60 entries loaded
Parsing Bootstrap$ contents
DUL: Warning: Recreating file "dict.ddl"
Generating dict.ddl for version 11
 OBJ$: segobjno 18, file 1 block 240
 TAB$: segobjno 2, tabno 1, file 1  block 144
 COL$: segobjno 2, tabno 5, file 1  block 144
 USER$: segobjno 10, tabno 1, file 1  block 208
Running generated file "@dict.ddl" to unload the dictionary tables
. unloading table                      OBJ$   90758 rows unloaded
. unloading table                      TAB$    2363 rows unloaded
. unloading table                      COL$  106731 rows unloaded
. unloading table                     USER$     124 rows unloaded
Reading USER.dat 124 entries loaded
Reading OBJ.dat 90758 entries loaded and sorted 90758 entries
Reading TAB.dat 2363 entries loaded
Reading COL.dat 106685 entries loaded and sorted 106685 entries
Reading BOOTSTRAP.dat 60 entries loaded

DUL: Warning: Recreating file "dict.ddl"
Generating dict.ddl for version 11
 OBJ$: segobjno 18, file 1 block 240
 TAB$: segobjno 2, tabno 1, file 1  block 144
 COL$: segobjno 2, tabno 5, file 1  block 144
 USER$: segobjno 10, tabno 1, file 1  block 208
 TABPART$: segobjno 692, file 1 block 4528
 INDPART$: segobjno 697, file 1 block 4568
 TABCOMPART$: segobjno 714, file 1 block 9880
 INDCOMPART$: segobjno 719, file 0 block 0
 TABSUBPART$: segobjno 704, file 1 block 9928
 INDSUBPART$: segobjno 709, file 0 block 0
 IND$: segobjno 2, tabno 3, file 1  block 144
 ICOL$: segobjno 2, tabno 4, file 1  block 144
 LOB$: segobjno 2, tabno 6, file 1  block 144
 COLTYPE$: segobjno 2, tabno 7, file 1  block 144
 TYPE$: segobjno 619, tabno 1, file 1  block 1528
 COLLECTION$: segobjno 619, tabno 2, file 1  block 1528
 ATTRIBUTE$: segobjno 619, tabno 3, file 1  block 1528
 LOBFRAG$: segobjno 725, file 1 block 4616
 LOBCOMPPART$: segobjno 728, file 0 block 0
 UNDO$: segobjno 15, file 1 block 224
 TS$: segobjno 6, tabno 2, file 1  block 176
 PROPS$: segobjno 126, file 1 block 1096
Running generated file "@dict.ddl" to unload the dictionary tables
. unloading table                      OBJ$
DUL: Warning: Recreating file "OBJ.ctl"
   90758 rows unloaded
. unloading table                      TAB$
DUL: Warning: Recreating file "TAB.ctl"
    2363 rows unloaded
. unloading table                      COL$
DUL: Warning: Recreating file "COL.ctl"
  106731 rows unloaded
. unloading table                     USER$
DUL: Warning: Recreating file "USER.ctl"
     124 rows unloaded
. unloading table                  TABPART$     234 rows unloaded
. unloading table                  INDPART$     155 rows unloaded
. unloading table               TABCOMPART$       1 row  unloaded

DUL: Error: dc_segment_header(dataobj#=719, ts#=0, fil=0, blk=0) failed
DUL: Warning: Nothing to unload from empty delayed segment creation table INDCOMPART$
. unloading table               TABSUBPART$      32 rows unloaded

DUL: Error: dc_segment_header(dataobj#=709, ts#=0, fil=0, blk=0) failed
DUL: Warning: Nothing to unload from empty delayed segment creation table INDSUBPART$
. unloading table                      IND$    4237 rows unloaded
. unloading table                     ICOL$    6290 rows unloaded
. unloading table                      LOB$     849 rows unloaded
. unloading table                  COLTYPE$    2567 rows unloaded
. unloading table                     TYPE$    3651 rows unloaded
. unloading table               COLLECTION$    1345 rows unloaded
. unloading table                ATTRIBUTE$   13755 rows unloaded
. unloading table                  LOBFRAG$       6 rows unloaded

DUL: Error: dc_segment_header(dataobj#=728, ts#=0, fil=0, blk=0) failed
DUL: Warning: Nothing to unload from empty delayed segment creation table LOBCOMPPART$
. unloading table                     UNDO$       1 row  unloaded
. unloading table                       TS$       4 rows unloaded
. unloading table                    PROPS$      38 rows unloaded
Reading USER.dat 124 entries loaded
Reading OBJ.dat 90758 entries loaded and sorted 90758 entries
Reading TAB.dat 2363 entries loaded
Reading COL.dat 106685 entries loaded and sorted 106685 entries
Reading TABPART.dat 234 entries loaded and sorted 234 entries
Reading TABCOMPART.dat 1 entries loaded and sorted 1 entries
Reading TABSUBPART.dat 32 entries loaded and sorted 32 entries
Reading INDPART.dat 155 entries loaded and sorted 155 entries
Reading IND.dat 4237 entries loaded
Reading LOB.dat 849 entries loaded
Reading ICOL.dat 6290 entries loaded
Reading COLTYPE.dat 2567 entries loaded
Reading TYPE.dat 3651 entries loaded
Reading ATTRIBUTE.dat 13755 entries loaded
Reading COLLECTION.dat
DUL: Warning: Increased the size of DC_COLLECTIONS from 1024 to 8192 entries
 1345 entries loaded
Reading BOOTSTRAP.dat 60 entries loaded
Reading LOBFRAG.dat 6 entries loaded and sorted 6 entries
Reading UNDO.dat 1 entries loaded
Reading TS.dat 4 entries loaded
Reading PROPS.dat 38 entries loaded
Database character set is ZHS16GBK
Database national character set is AL16UTF16
DUL> unload table xff.t_xifenfei;
. unloading table                T_XIFENFEI   90756 rows unloaded

核对结果

SQL> create table xff.t_xifenfei_new as select  * from xff.t_xifenfei where 1=0;

Table created.

[oracle@xifenfei dul]$ sqlldr xff/xifenfei@pdb2 control=XFF_T_XIFENFEI.ctl 

SQL*Loader: Release 12.1.0.1.0 - Production on Sun Jun 2 18:08:04 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

Path used:      Conventional
Commit point reached - logical record count 64
Commit point reached - logical record count 128
Commit point reached - logical record count 192
Commit point reached - logical record count 256
Commit point reached - logical record count 320
Commit point reached - logical record count 384
Commit point reached - logical record count 448
Commit point reached - logical record count 512
Commit point reached - logical record count 576
…………
Commit point reached - logical record count 90589
Commit point reached - logical record count 90653
Commit point reached - logical record count 90717
Commit point reached - logical record count 90756

Table "XFF"."T_XIFENFEI_NEW":
  90756 Rows successfully loaded.

Check the log file:
  XFF_T_XIFENFEI.log
for more information about the load.

SQL> select count(*) from xff.t_xifenfei_new;

  COUNT(*)
----------
     90756

通过分析12C的bootstarp$表分布,和dul恢复数据库原理,通过变动实现dul完美恢复CDB中的pdb数据

ORACLE 12C RMAN 功能增强

在ORACLE 12C中对rman的功能有了不少增强,在以前的文章中写过RMAN RECOVER TABLE功能,这里另外补充rman增强的两个小功能(sql语句和数据文件分割)
数据库版本

SQL>  select * from v$version;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
PL/SQL Release 12.1.0.1.0 - Production                                                    0
CORE    12.1.0.1.0      Production                                                        0
TNS for Linux: Version 12.1.0.1.0 - Production                                            0
NLSRTL Version 12.1.0.1.0 - Production                                                    0

rman对sql语句支持增强

[oracle@xifenfei tmp]$ rman target /

Recovery Manager: Release 12.1.0.1.0 - Production on Sat Jun 1 14:07:50 2013

Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.

connected to target database: CDB (DBID=1922813718)

RMAN> select sysdate from dual;

using target database control file instead of recovery catalog
SYSDATE  
---------
01-JUN-13

RMAN> alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';

Statement processed

RMAN>  select sysdate from dual;

SYSDATE            
-------------------
2013-06-01 14:16:48

RMAN> desc v$log

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GROUP#                                             NUMBER                      
 THREAD#                                            NUMBER                      
 SEQUENCE#                                          NUMBER                      
 BYTES                                              NUMBER                      
 BLOCKSIZE                                          NUMBER                      
 MEMBERS                                            NUMBER                      
 ARCHIVED                                           VARCHAR2(3)                 
 STATUS                                             VARCHAR2(16)                
 FIRST_CHANGE#                                      NUMBER                      
 FIRST_TIME                                         DATE                        
 NEXT_CHANGE#                                       NUMBER                      
 NEXT_TIME                                          DATE                        
 CON_ID                                             NUMBER                      

这里看到rman只是sql语句中的select和desc用法

rman分割数据文件增强

RMAN>  CONFIGURE DEVICE TYPE DISK PARALLELISM 3;

old RMAN configuration parameters:
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET;
new RMAN configuration parameters:
CONFIGURE DEVICE TYPE DISK PARALLELISM 3 BACKUP TYPE TO BACKUPSET;
new RMAN configuration parameters are successfully stored

RMAN> backup incremental level 1 section size 30M datafile 1 format '/tmp/system_%U.rman';

Starting backup at 01-JUN-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=27 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=269 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=24 device type=DISK
no parent backup or copy of datafile 1 found
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/cdb/system01.dbf
backing up blocks 1 through 3840
channel ORA_DISK_1: starting piece 1 at 01-JUN-13
channel ORA_DISK_2: starting incremental level 1 datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/cdb/system01.dbf
……………………
backing up blocks 96001 through 99840
channel ORA_DISK_3: starting piece 26 at 01-JUN-13
channel ORA_DISK_1: finished piece 24 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_24_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:08
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATA/cdb/system01.dbf
backing up blocks 99841 through 101120
channel ORA_DISK_1: starting piece 27 at 01-JUN-13
channel ORA_DISK_2: finished piece 25 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_25_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:07
channel ORA_DISK_3: finished piece 26 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_26_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_3: backup set complete, elapsed time: 00:00:06
channel ORA_DISK_1: finished piece 27 at 01-JUN-13
piece handle=/tmp/system_02ob3pg1_27_1.rman tag=TAG20130601T144518 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 01-JUN-13

备份文件情况

[oracle@xifenfei tmp]$ ll -ltr system*
-rw-r----- 1 oracle dba 14761984 Jun  1 14:45 system_02ob3pg1_1_1.rman
-rw-r----- 1 oracle dba  9535488 Jun  1 14:45 system_02ob3pg1_2_1.rman
-rw-r----- 1 oracle dba 16973824 Jun  1 14:45 system_02ob3pg1_4_1.rman
-rw-r----- 1 oracle dba 18284544 Jun  1 14:45 system_02ob3pg1_3_1.rman
-rw-r----- 1 oracle dba 12804096 Jun  1 14:45 system_02ob3pg1_5_1.rman
-rw-r----- 1 oracle dba 29163520 Jun  1 14:45 system_02ob3pg1_6_1.rman
-rw-r----- 1 oracle dba 31326208 Jun  1 14:46 system_02ob3pg1_7_1.rman
-rw-r----- 1 oracle dba 30851072 Jun  1 14:46 system_02ob3pg1_8_1.rman
-rw-r----- 1 oracle dba 30801920 Jun  1 14:46 system_02ob3pg1_9_1.rman
-rw-r----- 1 oracle dba 23977984 Jun  1 14:46 system_02ob3pg1_11_1.rman
-rw-r----- 1 oracle dba 28508160 Jun  1 14:46 system_02ob3pg1_10_1.rman
-rw-r----- 1 oracle dba 30277632 Jun  1 14:46 system_02ob3pg1_12_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:46 system_02ob3pg1_13_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:47 system_02ob3pg1_14_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:47 system_02ob3pg1_15_1.rman
-rw-r----- 1 oracle dba 30507008 Jun  1 14:47 system_02ob3pg1_17_1.rman
-rw-r----- 1 oracle dba 30834688 Jun  1 14:47 system_02ob3pg1_16_1.rman
-rw-r----- 1 oracle dba 31498240 Jun  1 14:47 system_02ob3pg1_18_1.rman
-rw-r----- 1 oracle dba 30244864 Jun  1 14:47 system_02ob3pg1_19_1.rman
-rw-r----- 1 oracle dba 29016064 Jun  1 14:47 system_02ob3pg1_20_1.rman
-rw-r----- 1 oracle dba 29212672 Jun  1 14:47 system_02ob3pg1_21_1.rman
-rw-r----- 1 oracle dba 30728192 Jun  1 14:47 system_02ob3pg1_22_1.rman
-rw-r----- 1 oracle dba 29384704 Jun  1 14:47 system_02ob3pg1_23_1.rman
-rw-r----- 1 oracle dba 26566656 Jun  1 14:47 system_02ob3pg1_24_1.rman
-rw-r----- 1 oracle dba 24928256 Jun  1 14:48 system_02ob3pg1_25_1.rman
-rw-r----- 1 oracle dba 19324928 Jun  1 14:48 system_02ob3pg1_26_1.rman
-rw-r----- 1 oracle dba  6291456 Jun  1 14:48 system_02ob3pg1_27_1.rman

在12C之前的版本,ORACLE 11GR2只是对于全备的备份集备份(非增量,非copy备份方式)方式支持数据文件分割备份功能,对于11.2之前的版本均不支持该功能.在12C中rman可以支持对于全备,增量备份,copy备份全部支持分割数据文件备份(CONTROLFILE,SPFILE不支持)

跳过rman坏块恢复

在有些情况下,我们仅有一份rman备份,而这个时候rman 备份有出现坏块,使得我们的还原/恢复工作无法继续下去,导致数据大量丢失.我们可以通过设置event 19548/19549来跳过坏块,最大程度抢救数据
rman备份数据文件

C:\Users\XIFENFEI>rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jun 6 20:31:19 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: XIFENFEI (DBID=1422012639)

RMAN> backup tablespace users format 'f:/users_bak.rman';

Starting backup at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=197 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
channel ORA_DISK_1: starting piece 1 at 06-JUN-13
channel ORA_DISK_1: finished piece 1 at 06-JUN-13
piece handle=F:\USERS_BAK.RMAN tag=TAG20130606T203154 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 06-JUN-13

切换归档日志

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            E:\oracle\product\11.2.0\dbhome_1\RDBMS
Oldest online log sequence     95
Next log sequence to archive   97
Current log sequence           97

重命名数据文件

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

--------------------------------------
e:\oracle\oradata\XIFENFEI>move USERS01.DBF USERS01_bak.DBF
移动了         1 个文件。
--------------------------------------

SQL> startup
ORACLE instance started.

Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             327159204 bytes
Database Buffers           83886080 bytes
Redo Buffers                6053888 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'

破坏备份集
破坏前


破坏后


这里很明显,我通过ue把rman备份集中的T修改为了A,肯定破坏了文件,使之出现坏块

rman还原数据文件

C:\Users\XIFENFEI>rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jun 6 21:02:41 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: XIFENFEI (DBID=1422012639, not open)

RMAN> restore datafile 4;

Starting restore at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to E:\ORACLE\ORADATA\XIFENFEI\USERS
01.DBF
channel ORA_DISK_1: reading from backup piece F:\USERS_BAK.RMAN
channel ORA_DISK_1: ORA-19870: error while restoring backup piece F:\USERS_BAK.R
MAN
ORA-19612: datafile 4 not restored due to missing or corrupt data

failover to previous backup

creating datafile file number=4 name=E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF
Finished restore at 06-JUN-13

这里可以清晰的看到rman报ORA-19612错误,restore 失败,alert日志为:

Thu Jun 06 21:02:31 2013
ALTER DATABASE OPEN
Errors in file E:\ORACLE\diag\rdbms\xifenfei\xff\trace\xff_dbw0_7400.trc:
ORA-01157: ????/?????? 4 - ??? DBWR ????
ORA-01110: ???? 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'
ORA-27041: ??????
OSD-04002: unable to open file
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\ORACLE\diag\rdbms\xifenfei\xff\trace\xff_ora_4272.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: 'E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF'
ORA-1157 signalled during: ALTER DATABASE OPEN...
Thu Jun 06 21:02:33 2013
Checker run found 1 new persistent data failures
Thu Jun 06 21:03:23 2013
Corrupt block 101 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=3
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=101, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.

rman备份集有坏块,导致rman还原无法正常进行下去,还原后的数据文件大小


观察已经正常还原出来数据文件情况

SQL> select CHECKPOINT_CHANGE#,file# from v$datafile_header;

CHECKPOINT_CHANGE#      FILE#
------------------ ----------
           1571582          1
           1571582          2
           1571582          3
             18379          4
           1571582          5
           1571582          6
           1571582          7

SQL> recover database datafile 4 ;
ORA-00274: illegal recovery option DATAFILE


SQL> recover datafile 4;
ORA-00279: change 18379 generated at 01/20/2013 17:13:56 needed for thread 1
ORA-00289: suggestion :
E:\ORACLE\PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000001_0805223583.0001
ORA-00280: change 18379 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

rman只是还原了很小的一部分文件,做恢复提示需要从归档日志seq 1开始(某些情况可能需要其他归档,总之不是正常情况),证明rman还原异常

设置event事件还原

SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup pfile='e:/pfile.txt' mount;
ORACLE instance started.

Total System Global Area  418484224 bytes
Fixed Size                  1385052 bytes
Variable Size             327159204 bytes
Database Buffers           83886080 bytes
Redo Buffers                6053888 bytes
Database mounted.
SQL> show parameter event;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
event                                string      19548 trace name context forev
                                                 er, 19549 trace name context f
                                                 orever
Event 19548:This will attempt to restore content of the corrupted block if it is possible.
Event 19549:This will suppress erroring out during restore

rman还原数据文件

RMAN> restore datafile 4;

Starting restore at 06-JUN-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to E:\ORACLE\ORADATA\XIFENFEI\USERS
01.DBF
channel ORA_DISK_1: reading from backup piece F:\USERS_BAK.RMAN
channel ORA_DISK_1: piece handle=F:\USERS_BAK.RMAN tag=TAG20130606T203154
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35
Finished restore at 06-JUN-13

这里证明数据库rman有坏块通过rman还原成功,alert日志提示如下

Thu Jun 06 21:29:53 2013
WARNING: The block that appears to be block number 100
         in file 4 is corrupt in backup piece F:\USERS_BAK.RMAN.
         Such blocks would usually be formatted as empty
         in the restored file, but event 19548 has been
         set to include the block as-is in the restored
         file.
Corrupt block 102 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=-2
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=102, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.
…………
Corrupt block 258 found during reading backup piece, file=F:\USERS_BAK.RMAN, corr_type=-2
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Reread of blocknum=258, file=F:\USERS_BAK.RMAN, found same corrupt data
Continuing reading piece F:\USERS_BAK.RMAN, no other copies available.
WARNING: some data in the backup of file 4 was missing
         or corrupt.  Event 19549 has been set to allow
         the file to be restored anyway.
           backup header block count: 5369
           backup actual block count: 5212
              backup header checksum: -218250743
              backup actual checksum: 1442665538
Full restore complete of datafile 4 E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF.  Elapsed time: 0:00:25 
  checkpoint is 1570136
  last deallocation scn is 1508457

这里rman还原依然遇到很多坏块,但是均跳过坏块,还是完整的恢复出来的数据文件(大小)


rman还原数据文件

RMAN> recover datafile 4;

Starting recover at 06-JUN-13
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 94 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000094_0805223583.0001
archived log for thread 1 with sequence 95 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000095_0805223583.0001
archived log for thread 1 with sequence 96 is already on disk as file E:\ORACLE\
PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000096_0805223583.0001
archived log file name=E:\ORACLE\PRODUCT\11.2.0\DBHOME_1\RDBMS\ARC0000000094_080
5223583.0001 thread=1 sequence=94
media recovery complete, elapsed time: 00:00:00
Finished recover at 06-JUN-13

这里可以明显的看到在recover过程中数据库应用的是备份后的所有归档,数据文件是正常被还原出来(坏块除外)

查询对象

SQL> alter database open;

Database altered.

SQL> conn test/test
Connected.
SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
STB101                         TABLE

SQL> select count(*) from stb101;
select count(*) from stb101
                     *
ERROR at line 1:
ORA-08103: object no longer exists

dbv检查坏块

e:\oracle\oradata\XIFENFEI>dbv file=USERS01.DBF

DBVERIFY: Release 11.2.0.3.0 - Production on Thu Jun 6 23:59:49 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = E:\ORACLE\ORADATA\XIFENFEI\USERS01.DBF

Page 100 is marked corrupt
Corrupt block relative dba: 0x01000064 (file 4, block 100)
Bad check value found during dbv:
Data in bad block:
 type: 30 format: 2 rdba: 0x01000064
 last change scn: 0x0000.00004890 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x48901e01
 check value in block header: 0x8311
 computed block checksum: 0x20



DBVERIFY - Verification complete

Total Pages Examined         : 12320
Total Pages Processed (Data) : 4952
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 7069
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 298

证明设置了event之后,rman确实跳过了备份集中的坏块,而且是直接还原了坏块内容,证明了event 19548和19549作用

补充说明
在非特殊情况下强烈不建议设置相关event跳过rman中的坏块来还原/恢复数据库,这样将对数据的丢失,甚至数据库是否可以正常open不好评估,rman备份重要,确保rman备份可用也很重要.

记录因磁盘头被重写,抢救redo恢复经历

客户使用赛门铁克做同城异地容灾部署extent rac,因某种情况导致主备容灾不同步,然后在主库中进行了若干操作,导致主库所有裸设备丢失,然后进行了一些列的操作,主库识别了裸设备,但是oracle出现异常
数据库裸设备异常

Fri May 31 22:07:39 2013
ORA-00202: control file: '/dev/rcontrol2'
ORA-27047: unable to read the header block of file
Additional information: 2
ORA-205 signalled during: ALTER DATABASE   MOUNT...

使用备份还原控制文件后,查询数据文件头v$datafile_header.error全部为”WRONG FILE TYPE”,使用bbed去查看,10个block以内全部是0,证明数据库文件头也损坏。因为客户的数据库虽然有rman备份,但涉及到memory,对redo的信息也很敏感,所以希望能够在他们当前的情况下评估redo是否可以应用,确保他们的数据不丢失.验证文件头

DATA FILE #1: 
  (name #4) /dev/rsystem
creation size=128000 block size=8192 status=0xe head=4 tail=4 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:22469 scn: 0x0000.7b4f9d86 05/29/2013 22:09:50
 Stop scn: 0xffff.ffffffff 05/15/2013 00:08:31
 Creation Checkpointed at scn:  0x0000.00000009 05/20/2007 21:52:41
 thread:1 rba:(0x1.3.10)
 enabled  threads:  01000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Offline scn: 0x0000.00000000 prev_range: 0
 Online Checkpointed at scn:  0x0000.00000000 
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
 Hot Backup end marker scn: 0x0000.00000000
 aux_file is NOT DEFINED 
File header version cannot be determined due to corruption
Dump may be suspect
 V10 STYLE FILE HEADER:
	Software vsn=0=0x0, Compatibility Vsn=0=0x0
	Db ID=0=0x0, Db Name=''
	Activation ID=0=0x0
	Control Seq=0=0x0, File size=0=0x0
	File Number=0, Blksiz=0, File Type=0 UNKNOWN
Tablespace #0 -   rel_fn:0 
Creation   at   scn: 0x0000.00000000 01/01/1988 00:00:00
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
 reset logs count:0x0 scn: 0x0000.00000000 reset logs terminal rcv data:0x0 scn: 0x0000.00000000
 prev reset logs count:0x0 scn: 0x0000.00000000 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000
 recovered at 01/01/1988 00:00:00
 status:0x0 root dba:0x00000000 chkpt cnt: 0 ctl cnt:0
begin-hot-backup file size: 0
Checkpointed at scn:  0x0000.00000000 
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
Backup Checkpointed at scn:  0x0000.00000000 
 thread:0 rba:(0x0.0.0)
 enabled  threads:  00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  00000000 00000000 00000000 00000000 00000000 00000000
External cache id: 0x0 0x0 0x0 0x0
Absolute fuzzy scn: 0x0000.00000000
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp  01/01/1988 00:00:00

这里可以看出来文件头完全损坏,dump redo header

LOG FILE #1: 
  (name #1) /dev/rredo11
 Thread 1 redo log links: forward: 2 backward: 0
 siz: 0x3c000 seq: 0x00003f9c hws: 0x2 bsz: 512 nab: 0x1763c flg: 0x1 dup: 1
 Archive links: fwrd: 0 back: 0 Prev scn: 0x0000.7b458379
 Low scn: 0x0000.7b4583ad 05/29/2013 01:32:39
 Next scn: 0x0000.7b4fa0e5 05/29/2013 22:17:59
 FILE HEADER:
	Software vsn=0=0x0, Compatibility Vsn=0=0x0
	Db ID=0=0x0, Db Name=''
	Activation ID=0=0x0
	Control Seq=0=0x0, File size=0=0x0
	File Number=0, Blksiz=0, File Type=0 UNKNOWN
 descrip:""
 thread: 0 nab: 0x0 seq: 0x00000000 hws: 0x0 eot: 0 dis: 0
 reset logs count: 0x0 scn: 0x0000.00000000
 Low scn: 0x0000.00000000 01/01/1988 00:00:00
 Next scn: 0x0000.00000000 01/01/1988 00:00:00
 Enabled scn: 0x0000.00000000 01/01/1988 00:00:00
 Thread closed scn: 0x0000.00000000 01/01/1988 00:00:00
 Log format vsn: 0x0 Disk cksum: 0x0 Calc cksum: 0x0
 Terminal Recovery Stop scn: 0x0000.00000000
 Terminal Recovery Stamp  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000
 Largest LWN: 0 blocks
 Miscellaneous flags: 0x0
 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000

验证redo header已经异常,dump redo logfile全部提示文件头错误


现在情况已经很明显,客户的库因为使用了裸设备,online 磁盘的过程中,所有的裸设备卷已经重写了文件头,oracle的datafile header信息不在了,无法正常操作完成.我们决定使用rman备份来恢复该数据库,然后想办法处理redo

注册带库备份集
在rman恢复过程中,我们遇到一个问题,客户的库rman备份策略有问题,一周一个全备,每天一个增量备份,一次归档备份,最后一次增量备份后备份了控制文件,但是最后一次归档备份之后无控制文件,而且是归档的备份发生在增量备份之后,因为是使用了带库无catalog库,我们增量恢复之后,数据不一致需要归档,但是归档,而归档的备份未记录在还原出来的控制文件中,需要人工注册带库的备份集到控制文件中

--涉及到3个节点都有配置不同的NB_ORA_CLIENT=pysa,如果在同一个节点中还原归档日志,需要配置如下
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS  'ENV=(NB_ORA_CLIENT=pysa)';

--如果在默认节点直接分配sbt通道即可
configure default device type to sbt_tape;

--注册带库备份集
catalog device type 'sbt_tape' backuppiece 'al_21395_1_816744765';

分析redo
通过ue打开dd出来的redo文件,我们分析得到20000h(10*8192)全部为0,应该是和赛门铁克存储管理系统有关系,后面开始是aix的设备头信息

正常redo文件信息

该库redo信息对比(获得aix偏移量)

对比正常redo起点信息和经验我们定位到aix的设备头偏移量为1000h(4096),整体偏移量为21000h(10*8192+4096)
该库的redo起点为21000h,也就是说,我们需要执行的dd语句为类似语句(redo 大小为120M)

dd if=/dev/vx/rdsk/dg/redo31 bs=512 skip=264 count=245761 of=/arch/xifenfei/redo31

dd出来所有数据后,对先要已经应用了归档的库,继续尝试recover redo

SQL> recover database using backup controlfile;
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:08 needed for thread
1
ORA-00289: suggestion : /arch/1_16289_623109141.dbf
ORA-00280: change 2069348436 for thread 1 is in sequence #16289


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo12
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:08 needed for thread
2


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo23
ORA-00279: change 2069348436 generated at 05/30/2013 16:18:07 needed for thread
3
ORA-00289: suggestion : /arch/3_3898_623109141.dbf
ORA-00280: change 2069348436 for thread 3 is in sequence #3898


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo31
ORA-00279: change 2069455373 generated at 05/30/2013 20:03:26 needed for thread
1
ORA-00289: suggestion : /arch/1_16290_623109141.dbf
ORA-00280: change 2069455373 for thread 1 is in sequence #16290
ORA-00278: log file '/arch/xifenfei/redo12' no longer needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo11
ORA-00279: change 2069475956 generated at 05/30/2013 20:04:09 needed for thread
3
ORA-00289: suggestion : /arch/3_3899_623109141.dbf
ORA-00280: change 2069475956 for thread 3 is in sequence #3899
ORA-00278: log file '/arch/xifenfei/redo31' no longer needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/arch/xifenfei/redo33
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;

Database altered.

到这一步我们完整的通过dd跳过了由于赛门铁克管理磁盘导致的磁盘头损坏的块,从裸设备中复制出redo到文件系统,然后进行恢复,完整的抢救了客户的数据,减少了客户的损坏.这里温馨提示对于非常重要的系统(涉及钱),强烈建议redo多路冗余,光依靠存储容灾,备份,dg依然不够

记录一次ORA-600 4000数据库故障恢复

ORA-600[4000]错误
一朋友数据库因为当前redo丢失,在恢复的过程中启动报ORA-600[4000]错误

SMON: enabling cache recovery
Thu May 30 16:24:17 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/udump/xifenfei1_ora_1458370.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Thu May 30 16:24:19 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1458370
ORA-1092 signalled during: alter database open resetlogs...

分析trace文件

*** 2013-05-30 16:24:17.979
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [83], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
--确定是obj$对象异常,通过某种手段找到obj$的objid和dataobjid均为16,对应16进制为12

Block header dump:  0x0040007a
 Object id on Block? Y
 seg/obj: 0x12  csc: 0xc1e.a329e76f  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0053.02a.000598bd  0x0d407e46.4f52.2f  --U-    1  fsc 0x0000.a329e772

这里比较明显obj$对象在rdba为0040007a的block上,scn为0c1e.a329e76f(13325725984623)且未提交的事务,这样的现象就决定了处理的特殊性(不是因为块延迟清理导致访问undo现象,该现象直接推进scn解决,而该情况不行)

数据文件头scn

SQL> SELECT DISTINCT CHECKPOINT# FROM V$DATAFILE_HEADER;

    CHECKPOINT_CHANGE#
-------------------------
           13324676536960

bbed查看文件头scn

   struct kcvfhckp, 160 bytes               @484     
      struct kcvcpscn, 8 bytes              @484     
         ub4 kscnbas                        @484      0x649c9a80
         ub2 kscnwrp                        @488      0x0c1e

这里看到的文件头scn也是为13324676536960(0c1e.649c9a80)和sql查询结果一致,也就是说数据库中的obj$的某个对象含有事务,且scn大于文件头scn(因为当前redo丢失,无法前滚,所以出现该情况),当数据库访问obj$的时候,为了事务的一致性,就需要访问undo(这里提示为83 回滚段),而undo异常,所以smon进程回滚失败,数据库无法正常启动

使用bbed提交事务

BBED> map
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122                                   Dba:0x0040007a
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 48 bytes                     @20      

 struct kdbh, 14 bytes                      @68      

 struct kdbt[1], 4 bytes                    @82      

 sb2 kdbr[108]                              @86      

 ub1 freespace[802]                         @302     

 ub1 rowdata[7084]                          @1104    

 ub4 tailchk                                @8188    


BBED> p ktbbh
struct ktbbh, 48 bytes                      @20      
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24      
      ub4 ktbbhsg1                          @24       0x00000012
      ub4 ktbbhod1                          @24       0x00000012
   struct ktbbhcsc, 8 bytes                 @28      
      ub4 kscnbas                           @28       0xa329e76f
      ub2 kscnwrp                           @32       0x0c1e
   b2 ktbbhict                              @36       1
   ub1 ktbbhflg                             @38       0x02 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x00000000
   struct ktbbhitl[0], 24 bytes             @44      
      struct ktbitxid, 8 bytes              @44      
         ub2 kxidusn                        @44       0x0053
         ub2 kxidslt                        @46       0x002a
         ub4 kxidsqn                        @48       0x000598bd
      struct ktbituba, 8 bytes              @52      
         ub4 kubadba                        @52       0x0d407e46
         ub2 kubaseq                        @56       0x4f52
         ub1 kubarec                        @58       0x2f
      ub2 ktbitflg                          @60       0x2001 (KTBFUPB)<--需要提交
      union _ktbitun, 2 bytes               @62      
         b2 _ktbitfsc                       @62       0
         ub2 _ktbitwrp                      @62       0x0000
      ub4 ktbitbas                          @64       0xa329e772

BBED> set count 32
        COUNT           32

BBED> set offset 60
        OFFSET          60

BBED> d
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 20010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb 

 <32 bytes per line>

BBED> m /x 8001
 File: /oradata/sys/xifenfei/system01.dbf (1)
 Block: 122              Offsets:   60 to   91           Dba:0x0040007a
------------------------------------------------------------------------
 80010000 a329e772 0001006c ffff00ea 040c0368 03680000 006c1f7c 1f3c1efb 

 <32 bytes per line>

BBED> sum apply
Check value for File 1, Block 122:
current = 0xafd6, required = 0xafd6

尝试open数据库ORA-600[2662]解决

Thu May 30 21:16:00 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532996],[3102],[2745973074],[4194397],[],[]
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:01 2013
Trace dumping is performing id=[cdmp_20130530211601]
Thu May 30 21:16:02 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:03 2013
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Thu May 30 21:16:05 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_smon_819664.trc:
ORA-00600: internal error code, arguments:[2662],[3102],[2737532997],[3102],[2745973074],[4194397],[],[]
Thu May 30 21:16:08 2013
Errors in file /u02/oracle/app/oracle/admin/xifenfei/bdump/xifenfei1_pmon_958764.trc:
ORA-00474: SMON process terminated with error
Thu May 30 21:16:08 2013
PMON: terminating instance due to error 474

数据库在open过程中遇到大量ORA-00600[2662],这个是因为数据库中文件头的scn小于访问的数据块scn导致该问题,解决方法推荐scn,如果数据库的scn本身就很大(和时间理论scn较接近),推进过程中可能遇到如下错误,这个时候就需要选择合适的方法/合适的值来推进scn

SQL> startup pfile=/home/oracle/pfile force
ORACLE instance started.

Total System Global Area 5.5835E+10 bytes
Fixed Size                  2177056 bytes
Variable Size            3.2867E+10 bytes
Database Buffers         2.2951E+10 bytes
Redo Buffers               14598144 bytes
Database mounted.
ORA-01052: required destination LOG_ARCHIVE_DUPLEX_DEST is not specified

后面的工作因为没有redo前滚,而且该库故障时有大量事务在跑,现在无法前滚,导致大量的undo回滚段异常,index和data不一致等故障,需要做的就是屏蔽undo seg,重建undo,重建库