存储异常导致ORA-10562故障恢复

朋友数据库由于存储变动,导致数据库瞬间hang住,然后直接crash,之后无法正常启动,请求技术支持.
数据库报ORA-00600[2131]错误
不能mount,可以通过重建控制文件解决

Mon Nov 30 20:35:38 2015
alter database mount
Mon Nov 30 20:35:38 2015
NOTE: Loaded library: System 
Mon Nov 30 20:35:38 2015
SUCCESS: diskgroup DATADG was mounted
Mon Nov 30 20:35:38 2015
NOTE: dependency between database xifenfei and diskgroup resource ora.DATADG.dg is established
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_26450.trc  (incident=3032256):
ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: alter database mount...

尝试recover数据库

Mon Nov 30 20:45:53 2015
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 80 slaves
Mon Nov 30 20:45:56 2015
Recovery of Online Redo Log: Thread 2 Group 11 Seq 617 Reading mem 0
  Mem# 0: +DATADG/xifenfei/redo011.log
Recovery of Online Redo Log: Thread 1 Group 4 Seq 5410 Reading mem 0
  Mem# 0: +DATADG/xifenfei/redo04.log
Recovery of Online Redo Log: Thread 1 Group 5 Seq 5411 Reading mem 0
  Mem# 0: +DATADG/xifenfei/redo05.log
Mon Nov 30 20:46:07 2015
Recovery of Online Redo Log: Thread 1 Group 6 Seq 5412 Reading mem 0
  Mem# 0: +DATADG/xifenfei/redo06.log
Mon Nov 30 20:46:13 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC] [PC:0x95FB502, kdxlin()+4088] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_pr13_30480.trc  (incident=3032568):
ORA-07445: 出现异常错误: 核心转储 [kdxlin()+4088] [SIGSEGV] [ADDR:0xC] [PC:0x95FB502] [Address not mapped to object] []
Mon Nov 30 20:46:17 2015
Sweep [inc][3032568]: completed
Sweep [inc2][3032568]: completed
Mon Nov 30 20:46:31 2015
Slave exiting with ORA-10562 exception
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_pr13_30480.trc:
ORA-10562: Error occurred while applying redo to data block (file# 2, block# 165054)
ORA-10564: tablespace SYSAUX
ORA-01110: 数据文件 2: '+DATADG/xifenfei/datafile/sysaux.265.861925867'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 271
ORA-00607: 当更改数据块时出现内部错误
ORA-00602: 内部编程异常错误
ORA-07445: 出现异常错误: 核心转储 [kdxlin()+4088] [SIGSEGV] [ADDR:0xC] [PC:0x95FB502] [Address not mapped to object] []
Mon Nov 30 20:46:31 2015
Recovery Slave PR13 previously exited with exception 10562
Mon Nov 30 20:46:33 2015
Checker run found 28 new persistent data failures
Mon Nov 30 20:46:35 2015
Media Recovery failed with error 448
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_pr00_30400.trc:
ORA-00283: 恢复会话因错误而取消
ORA-00448: 后台进程正常结束
ORA-10562 signalled during: ALTER DATABASE RECOVER  database  ...

通过这里可以看到,由于在recover 操作之时,由于某种原因redo的数据无法apply到file 2 block 165054中,导致数据库recover database失败.

按照数据文件recover操作

SQL> recover datafile 1;
Media recovery complete.
SQL> recover datafile 3,4,5,6,7;
Media recovery complete.
SQL> recover datafile 8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28;              
Media recovery complete.
SQL> recover datafile 2;
ORA-00283: recovery session canceled due to errors
ORA-10562: Error occurred while applying redo to data block (file# 2, block#
165054)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: '+DATADG/xifenfei/datafile/sysaux.265.861925867'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 271
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kdxlin()+4088] [SIGSEGV]
[ADDR:0xC] [PC:0x95FB502] [Address not mapped to object] []

错误提示和recover database一样,那我们只能让恢复跳过该block继续恢复,因为根据经验判断data object# 271不是系统核心对象,不会影响数据库的启动

跳过坏块继续恢复

SQL> recover  datafile 2 allow 1 corruption;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [2], [69793], [8458401], [],
[], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 69793, file
offset is 571744256 bytes)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: '+DATADG/xifenfei/datafile/sysaux.265.861925867'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 272


SQL> recover  datafile 2 allow 1 corruption;
Media recovery complete.
SQL> alter database open;

Database altered.

出现了ORA-600[3020] 继续跳过坏块,然后数据库顺利open,别忘记加tempfile

处理异常对象

SQL> select object_name,object_type from dba_objects where data_object_id in(272,271);

OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
-------------------
SMON_SCN_TIME_TIM_IDX
INDEX

SMON_SCN_TIME_SCN_IDX
INDEX

SQL> select index_name from dba_indexes where table_name='SMON_SCN_TIME';

INDEX_NAME
------------------------------
SMON_SCN_TIME_TIM_IDX
SMON_SCN_TIME_SCN_IDX

SQL> set pages 1000
SQL> set long 1000
SQL> Select dbms_metadata.get_ddl('TABLE','SMON_SCN_TIME','SYS') FROM DUAL ;

DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS')
--------------------------------------------------------------------------------

  CREATE TABLE "SYS"."SMON_SCN_TIME"
   (    "THREAD" NUMBER,
        "TIME_MP" NUMBER,
        "TIME_DP" DATE,
        "SCN_WRP" NUMBER,
        "SCN_BAS" NUMBER,
        "NUM_MAPPINGS" NUMBER,
        "TIM_SCN_MAP" RAW(1200),
        "SCN" NUMBER DEFAULT 0,
        "ORIG_THREAD" NUMBER DEFAULT 0           /* for downgrade */
   ) CLUSTER "SYS"."SMON_SCN_TO_TIME_AUX" ("THREAD")

SQL> analyze table smon_scn_time validate structure cascade online;
analyze table smon_scn_time validate structure cascade online
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 2, block # 165054)
ORA-01110: data file 2: '+DATADG/xifenfei/datafile/sysaux.265.861925867'

SQL> truncate CLUSTER "SYS"."SMON_SCN_TO_TIME_AUX";

Cluster truncated.

关于SMON_SCN_TIME部分处理,可以参考:关于SMON_SCN_TIME若干问题说明.至此数据库基本上恢复完成,而且运气非常好,恢复的非常完美,数据实现0丢失.

asm disk格式化恢复

接到网友请求,由于操作人员粗心把asm disk的磁盘映射到另外的机器上,并且格式化为了win ntfs文件系统,导致asm 磁盘组异常,数据库无法使用
asm 日志报ORA-27072错

Mon Nov 30 12:00:13 2015
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27070: async read/write failed
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。
WARNING: IO Failed. group:1 disk(number.incarnation):0.0xf0f0bbfb disk_path:\\.\ORCLDISKDATA0
	 AU:1 disk_offset(bytes):2093056 io_size:4096 operation:Write type:synchronous
	 result:I/O error process_id:868
WARNING: disk 0.4042308603 (DATA_0000) not responding to heart beat
ERROR: too many offline disks in PST (grp 1)
WARNING: Disk DATA_0000 in mode 0x7f will be taken offline
Mon Nov 30 12:00:13 2015
NOTE: process 576:37952 initiating offline of disk 0.4042308603 (DATA_0000) with mask 0x7e in group 1
WARNING: Disk DATA_0000 in mode 0x7f is now being taken offline
NOTE: initiating PST update: grp = 1, dsk = 0/0xf0f0bbfb, mode = 0x15
kfdp_updateDsk(): 5 
kfdp_updateDskBg(): 5 
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):1.0xf0f0bbfc disk_path:\\.\ORCLDISKDATA1
	 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):1.0xf0f0bbfc disk_path:\\.\ORCLDISKDATA1
	 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xf0f0bbfd disk_path:\\.\ORCLDISKDATA2
	 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xf0f0bbfd disk_path:\\.\ORCLDISKDATA2
	 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):3.0xf0f0bbfe disk_path:\\.\ORCLDISKDATA3
	 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):3.0xf0f0bbfe disk_path:\\.\ORCLDISKDATA3
	 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):4.0xf0f0bbff disk_path:\\.\ORCLDISKDATA4
	 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):4.0xf0f0bbff disk_path:\\.\ORCLDISKDATA4
	 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):6.0xf0f0bc01 disk_path:\\.\ORCLDISKDATA6
	 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):6.0xf0f0bc01 disk_path:\\.\ORCLDISKDATA6
	 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):7.0xf0f0bc02 disk_path:\\.\ORCLDISKDATA7
	 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):7.0xf0f0bc02 disk_path:\\.\ORCLDISKDATA7
	 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous
	 result:I/O error process_id:868
ERROR: no PST quorum in group: required 1, found 0
WARNING: Disk DATA_0000 in mode 0x7f offline aborted
Mon Nov 30 12:00:14 2015
SQL> alter diskgroup DATA dismount force /* ASM SERVER */ 
NOTE: cache dismounting (not clean) group 1/0xBB404B03 (DATA) 
Mon Nov 30 12:00:14 2015
NOTE: halting all I/Os to diskgroup DATA
Mon Nov 30 12:00:14 2015
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=367.7265 last written ABA 367.7265
NOTE: cache dismounted group 1/0xBB404B03 (DATA) 
kfdp_dismount(): 6 
kfdp_dismountBg(): 6 
NOTE: De-assigning number (1,0) from disk (\\.\ORCLDISKDATA0)
NOTE: De-assigning number (1,1) from disk (\\.\ORCLDISKDATA1)
NOTE: De-assigning number (1,2) from disk (\\.\ORCLDISKDATA2)
NOTE: De-assigning number (1,3) from disk (\\.\ORCLDISKDATA3)
NOTE: De-assigning number (1,4) from disk (\\.\ORCLDISKDATA4)
NOTE: De-assigning number (1,5) from disk (\\.\ORCLDISKDATA5)
NOTE: De-assigning number (1,6) from disk (\\.\ORCLDISKDATA6)
NOTE: De-assigning number (1,7) from disk (\\.\ORCLDISKDATA7)
SUCCESS: diskgroup DATA was dismounted
NOTE: cache deleting context for group DATA 1/-1153414397
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER */
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA

这里的asm日志很明显由于asm disk无法正常访问,报ORA-27072错误,磁盘组强制dismount.

分析磁盘情况
asm-disk1
asm-disk2


通过与客户沟通,确定从I到O本为asm disk 被格式化为了NTFS文件系统的磁盘,结合asmtool分析可以发现还有一个asm disk没有格式化掉,该磁盘组中一个共有8个磁盘格式化掉了7个.

通过kfed分析磁盘信息

C:\Users\Administrator>kfed read '\\.\J:'
kfbh.endian:                        235 ; 0x000: 0xeb
kfbh.hard:                           82 ; 0x001: 0x52
kfbh.type:                          144 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         78 ; 0x003: 0x4e
kfbh.block.blk:               542328404 ; 0x004: T=0 NUMB=0x20534654
kfbh.block.obj:                 2105376 ; 0x008: TYPE=0x0 NUMB=0x2020
kfbh.check:                        2050 ; 0x00c: 0x00000802
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                    63488 ; 0x014: 0x0000f800
kfbh.spare1:                   16711743 ; 0x018: 0x00ff003f
kfbh.spare2:                       2048 ; 0x01c: 0x00000800
ERROR!!!, failed to get the oracore error message

C:\Users\Administrator>kfed read '\\.\J:' blkn=2
kfbh.endian:                         70 ; 0x000: 0x46
kfbh.hard:                           73 ; 0x001: 0x49
kfbh.type:                           76 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         69 ; 0x003: 0x45
kfbh.block.blk:                  196656 ; 0x004: T=0 NUMB=0x30030
kfbh.block.obj:                33563364 ; 0x008: TYPE=0x0 NUMB=0x22e4
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                    65537 ; 0x010: 0x00010001
kfbh.fcn.wrap:                    65592 ; 0x014: 0x00010038
kfbh.spare1:                        416 ; 0x018: 0x000001a0
kfbh.spare2:                       1024 ; 0x01c: 0x00000400
ERROR!!!, failed to get the oracore error message

C:\Users\Administrator>kfed read '\\.\J:' blkn=256
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:              2147483648 ; 0x004: T=1 NUMB=0x0
kfbh.block.obj:              2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check:                    17662471 ; 0x00c: 0x010d8207
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message

C:\Users\Administrator>kfed read '\\.\J:' blkn=510
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     254 ; 0x004: T=0 NUMB=0xfe
kfbh.block.obj:              2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check:                   717599272 ; 0x00c: 0x2ac5b228
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:    ORCLDISKDATA6 ; 0x000: length=13
kfdhdb.driver.reserved[0]:   1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]:           54 ; 0x00c: 0x00000036
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
…………

通过分析,可以确定asm disk的备份block没有被覆盖,原则上可以通过备份block实现磁盘组恢复,从而减小了恢复难度

kfed恢复磁盘头

C:\Users\Administrator> kfed repair '\\.\J:'
C:\Users\Administrator>kfed read '\\.\J:' 
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     254 ; 0x004: T=0 NUMB=0xfe
kfbh.block.obj:              2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check:                   717599272 ; 0x00c: 0x2ac5b228
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:    ORCLDISKDATA6 ; 0x000: length=13
kfdhdb.driver.reserved[0]:   1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]:           54 ; 0x00c: 0x00000036
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
…………

确定asm disk相关信息
对于7个被格式化的磁盘都进行类似处理之后,通过工具看到相关磁盘信息如下
asm-disk3


恢复处理
根据ntfs的文件系统分布,我们可以知道,虽然asm disk header备份block正常,但是asm disk中间部分依旧有不少au会被破坏
ntfs


这样的情况,不合适直接使用工具拷贝出来datafile(由于可能记录block的字典正好被覆盖,导致拷贝出来的文件异常,在恢复过程中我们也做了试验小文件拷贝ok,大文件拷贝然后使用dbv检测有很多坏块),我们采用工具(asm disk header 彻底损坏恢复)从底层扫描直接重组出来asm disk中的数据文件,然后结合拷贝出来的控制文件,redo文件,参数文件,然后通过重命名相关路径,然后直接open数据库

Q:\>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期三 1月 22 16:08:18 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


SQL> set pages 1000
SQL> col name for a100
SQL> set lines 150
SQL> select file#,name from v$datafile;

     FILE# NAME
---------- --------------------------------------------------------------------
         1 +DATA/vspdb/datafile/system.256.778520603
         2 +DATA/vspdb/datafile/sysaux.257.778520603
         3 +DATA/vspdb/datafile/undotbs1.258.778520603
         4 +DATA/vspdb/datafile/users.259.778520603
         5 +DATA/vspdb/datafile/vsp_tbs.293.779926097
        …………
       147 +DATA/vspdb/datafile/index_dg.418.864665747
       148 +DATA/vspdb/datafile/data_dg.419.864667053
       149 +DATA/vspdb/datafile/vsp_mm_tbs.420.890410367
       150 +DATA/vspdb/datafile/vsp_mm_tbs.421.890410457

SQL> select member from v$logfile;

MEMBER
-------------------------------------------------------------------------------------
+DATA/vspdb/onlinelog/group_7.263.862676593
+DATA/vspdb/onlinelog/group_7.262.862676601
+DATA/vspdb/onlinelog/group_4.410.862652291
+DATA/vspdb/onlinelog/group_4.411.862652307
+DATA/vspdb/onlinelog/group_5.412.862653715
+DATA/vspdb/onlinelog/group_5.413.862653727
+DATA/vspdb/onlinelog/group_6.414.862676425
+DATA/vspdb/onlinelog/group_6.415.862676433

重命名数据文件和redo文件,open数据库

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

已用时间:  00: 00: 04.51

由于部分block被覆盖,使用空块代替,导致数据访问到该block就会出现ora-8103(模拟普通ORA-08103并解决,模拟极端ORA-08103并解决)错误,对于该种对象,最简单处理方法就是直接通过dul抽出来数据然后truncate table重新导入数据,当然如果你想彻底安全逻辑方式重建库最靠谱

ORA-15032 ORA-15040 ORA-15042 asm故障恢复

接到一个朋友恢复请求,19个lun的asm 磁盘组,由于其中一个lun有问题,他们进行了增加一个新lun,删除老lun的方法操作,但是操作一半hang住了(因为坏的lun是底层损坏,无法完成rebalance),然后存储工程师继续修复异常lun,非常幸运异常lun修复好了,但是高兴过了头,直接从存储上删除了新加入的lun(已经rebalance一部分数据进去了),这个时候asm dg彻底趴下了,不能mount成功,请求恢复支持。由于某种原因,无法从lun层面恢复,只能让我们提供数据库层面恢复

Mon Sep 21 19:52:35 2015
SQL> alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: Assigning number (1,20) to disk (/dev/rhdisk116)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DG_XFF_0020
NOTE: requesting all-instance disk validation for group=1
Mon Sep 21 19:52:44 2015
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xb94738f1 (DG_XFF) on local instance.
NOTE: initiating PST update: grp = 1
Mon Sep 21 19:52:44 2015
GMON updating group 1 at 25 for pid 27, osid 12124486
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xb94738f1 (DG_XFF)
GMON querying group 1 at 26 for pid 18, osid 10092734
NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path:/dev/rhdisk116
GMON querying group 1 at 27 for pid 18, osid 10092734
SUCCESS: refreshed membership for 1/0xb94738f1 (DG_XFF)
Mon Sep 21 19:52:47 2015
SUCCESS: alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: starting rebalance of group 1/0xb94738f1 (DG_XFF) at power 1
Starting background process ARB0
Mon Sep 21 19:52:47 2015
ARB0 started with pid=28, OS id=10944804
NOTE: assigning ARB0 to group 1/0xb94738f1 (DG_XFF) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DG_XFF
Mon Sep 21 20:35:06 2015
SQL> ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */
NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: Assigning number (1,0) to disk (/dev/rhdisk10)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk11)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk16)
NOTE: Assigning number (1,3) to disk (/dev/rhdisk17)
NOTE: Assigning number (1,4) to disk (/dev/rhdisk22)
NOTE: Assigning number (1,5) to disk (/dev/rhdisk23)
NOTE: Assigning number (1,6) to disk (/dev/rhdisk28)
NOTE: Assigning number (1,7) to disk (/dev/rhdisk29)
NOTE: Assigning number (1,8) to disk (/dev/rhdisk33)
NOTE: Assigning number (1,9) to disk (/dev/rhdisk34)
NOTE: Assigning number (1,10) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,11) to disk (/dev/rhdisk40)
NOTE: Assigning number (1,12) to disk (/dev/rhdisk41)
NOTE: Assigning number (1,13) to disk (/dev/rhdisk45)
NOTE: Assigning number (1,14) to disk (/dev/rhdisk46)
NOTE: Assigning number (1,15) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,16) to disk (/dev/rhdisk52)
NOTE: Assigning number (1,17) to disk (/dev/rhdisk53)
NOTE: Assigning number (1,18) to disk (/dev/rhdisk57)
NOTE: Assigning number (1,19) to disk (/dev/rhdisk58)
Wed Sep 30 11:08:07 2015
NOTE: start heartbeating (grp 1)
GMON querying group 1 at 33 for pid 35, osid 4194488
NOTE: Assigning number (1,20) to disk ()
GMON querying group 1 at 34 for pid 35, osid 4194488
NOTE: cache dismounting (clean) group 1/0xDD6F975A (DG_XFF)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xDD6F975A (DG_XFF)
NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache deleting context for group DG_XFF 1/0xdd6f975a
GMON dismounting group 1 at 35 for pid 35, osid 4194488
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "20" is missing from group number "1"
ERROR: ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */

这里比较明显,由于存储工程师直接删除了lun,这里导致磁盘组DG_XFF丢失asm disk 20,使得磁盘组无法直接mount,由于该磁盘组已经进行了较长时间的rebalance,丢失的盘中已经有大量数据(包括元数据),因此就算修改pst让磁盘组mount起来(不一定成功),也会丢失大量数据,也不一定可以直接拿出来里面的数据,如果只是加入盘,但是由于某种原因没有做rebalance,那我们直接可以通过修改pst,使得磁盘组mount起来。因此对于这样的情况,我们能够做的,只能从底层扫描磁盘,生成数据文件(因为有部分文件的元数据在丢失lun之上,如果直接使用现存元数据信息,直接拷贝,或者unload数据都会丢失大量数据),然后再进一步unload数据,完成恢复。需要恢复磁盘信息

grp# dsk# bsize ausize disksize diskname        groupname       path
---- ---- ----- ------ -------- --------------- --------------- -------------
   1    0  4096  4096K   179200 DG_XFF_0000     DG_XFF          /dev/rhdisk10
   1    1  4096  4096K   179200 DG_XFF_0001     DG_XFF          /dev/rhdisk11
   1    2  4096  4096K   179200 DG_XFF_0002     DG_XFF          /dev/rhdisk16
   1    3  4096  4096K   179200 DG_XFF_0003     DG_XFF          /dev/rhdisk17
   1    4  4096  4096K   179200 DG_XFF_0004     DG_XFF          /dev/rhdisk22
   1    5  4096  4096K   179200 DG_XFF_0005     DG_XFF          /dev/rhdisk23
   1    6  4096  4096K   179200 DG_XFF_0006     DG_XFF          /dev/rhdisk28
   1    7  4096  4096K   179200 DG_XFF_0007     DG_XFF          /dev/rhdisk29
   1    8  4096  4096K   179200 DG_XFF_0008     DG_XFF          /dev/rhdisk33
   1    9  4096  4096K   179200 DG_XFF_0009     DG_XFF          /dev/rhdisk34
   1   10  4096  4096K   179200 DG_XFF_0010     DG_XFF          /dev/rhdisk4
   1   11  4096  4096K   179200 DG_XFF_0011     DG_XFF          /dev/rhdisk40
   1   12  4096  4096K   179200 DG_XFF_0012     DG_XFF          /dev/rhdisk41
   1   13  4096  4096K   179200 DG_XFF_0013     DG_XFF          /dev/rhdisk45
   1   14  4096  4096K   179200 DG_XFF_0014     DG_XFF          /dev/rhdisk46
   1   15  4096  4096K   179200 DG_XFF_0015     DG_XFF          /dev/rhdisk5
   1   16  4096  4096K   179200 DG_XFF_0016     DG_XFF          /dev/rhdisk52
   1   17  4096  4096K   179200 DG_XFF_0017     DG_XFF          /dev/rhdisk53
   1   18  4096  4096K   179200 DG_XFF_0018     DG_XFF          /dev/rhdisk57
   1   19  4096  4096K   179200 DG_XFF_0019     DG_XFF          /dev/rhdisk58

这次运气比较好,丢失的磁盘组只是一个业务磁盘组,而且里面只有19个表空间,10个分区表,因此在数据字典完成的情况下,恢复10个分区表(一共6443个分区)的数据,整体恢复效果如下:
RECOVER


从整体数据量看恢复比例为:6003.26953/6027.26935*100%=99.6018127%,对于丢失了一个已经rebalance的大部分的lun,依旧能够恢复如此的数据,整体看非常理想.不得不说,老熊威武

Oracle bug ORA-600 k2vcbk_2故障恢复

有朋友找到我说他们数据库无法启动,数据库启动报ORA-600[k2vcbk_2]错误,数据库版本为11.2.0.2 RAC,操作系统是AIX 6.1

SQL> recover database;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [],
[], [], [], [], []
Process ID: 7930020
Session ID: 49 Serial number: 14761

数据库节点1日志

Mon Sep 21 15:45:41 2015
Thread 1 advanced to log sequence 54076 (LGWR switch)
  Current log# 13 seq# 54076 mem# 0: +DG01/xifenfei/onlinelog/group_13.332.779459035
  Current log# 13 seq# 54076 mem# 1: +DG01/xifenfei/onlinelog/group_13.344.779582621
Mon Sep 21 15:45:44 2015
Archived Log entry 74655 added for thread 1 sequence 54075 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:56:18 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_18088342.trc  (incident=184348):
ORA-00600: 内部错误代码, 参数: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184348/xifenfei1_ora_18088342_i184348.trc
Mon Sep 21 15:56:34 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Error 600 trapped in 2PC on transaction 7.16.120119. Cleaning up.
Error stack returned to user:
ORA-00600: 内部错误代码, 参数: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_18088342.trc  (incident=184349):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-00600: 内部错误代码, 参数: [kturPOTS_0], [], [], [], [], [], [], [], [], [], [], []
Mon Sep 21 15:56:34 2015
Dumping diagnostic data in directory=[cdmp_20150921155634], requested by (instance=1, osid=18088342), summary=[incident=184348].
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184349/xifenfei1_ora_18088342_i184349.trc
Mon Sep 21 15:56:35 2015
Sweep [inc][184349]: completed
Sweep [inc][184348]: completed
Sweep [inc2][184348]: completed
opiodr aborting process unknown ospid (18088342) as a result of ORA-603
Mon Sep 21 15:57:12 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_smon_7536810.trc  (incident=184274):
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_184274/xifenfei1_smon_7536810_i184274.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Sep 21 15:57:16 2015
Dumping diagnostic data in directory=[cdmp_20150921155716], requested by (instance=1, osid=7536810 (SMON)), summary=[incident=184274].
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_smon_7536810.trc:
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7536810): terminating the instance due to error 474
Mon Sep 21 15:57:18 2015
ORA-1092 : opitsk aborting process

数据库节点2日志

Mon Sep 21 15:21:50 2015
Archived Log entry 74653 added for thread 2 sequence 23559 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:44:28 2015
Thread 2 advanced to log sequence 23561 (LGWR switch)
  Current log# 12 seq# 23561 mem# 0: +DG01/xifenfei/onlinelog/group_12.338.779457003
  Current log# 12 seq# 23561 mem# 1: +DG01/xifenfei/onlinelog/group_12.265.779582493
Mon Sep 21 15:44:31 2015
Archived Log entry 74654 added for thread 2 sequence 23560 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:45:31 2015
DISTRIB TRAN xifenfei.1ebab0a5.20.3.1533822
  is local tran 20.3.1533822 (hex=14.03.17677e)
  insert pending committed tran, scn=14590688068086 (hex=d45.28c781f6)
Mon Sep 21 15:45:31 2015
DISTRIB TRAN xifenfei.1ebab0a5.20.3.1533822
  is local tran 20.3.1533822 (hex=14.03.17677e))
  delete pending committed tran, scn=14590688068086 (hex=d45.28c781f6)
Mon Sep 21 15:56:35 2015
Dumping diagnostic data in directory=[cdmp_20150921155634], requested by (instance=1, osid=18088342), summary=[incident=184348].
Mon Sep 21 15:57:10 2015
Error 3135 trapped in 2PC on transaction 20.11.1534704. Cleaning up.
Error stack returned to user:
ORA-03135: 连接失去联系
opidcl aborting process unknown ospid (9175532) as a result of ORA-604
Mon Sep 21 15:57:17 2015
Dumping diagnostic data in directory=[cdmp_20150921155716], requested by (instance=1, osid=7536810 (SMON)), summary=[incident=184274].
Mon Sep 21 15:57:23 2015
Reconfiguration started (old inc 18, new inc 20)
List of instances:
 2 (myinst: 2) 
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Mon Sep 21 15:57:23 2015
 LMS 2: 3 GCS shadows cancelled, 1 closed, 0 Xw survived
Mon Sep 21 15:57:23 2015
 LMS 0: 2 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 21 15:57:23 2015
 LMS 1: 3 GCS shadows cancelled, 1 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Mon Sep 21 15:57:23 2015
minact-scn: Inst 2 is now the master inc#:20 mmon proc-id:6816208 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0d45.28c2bb5c gcalc-scn:0x0d45.28c3bd2e
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:20 new-inc#:20
Mon Sep 21 15:57:23 2015
Instance recovery: looking for dead threads
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Beginning instance recovery of 1 threads
 parallel recovery started with 31 processes
Started redo scan
Completed redo scan
 read 12626 KB redo, 1724 data blocks need recovery
Started redo application at
 Thread 1: logseq 54076, block 184416
Recovery of Online Redo Log: Thread 1 Group 13 Seq 54076 Reading mem 0
  Mem# 0: +DG01/xifenfei/onlinelog/group_13.332.779459035
  Mem# 1: +DG01/xifenfei/onlinelog/group_13.344.779582621
Completed redo application of 9.78MB
Completed instance recovery at
 Thread 1: logseq 54076, block 209669, scn 14590688357285
 1633 data blocks read, 1794 data blocks written, 12626 redo k-bytes read
Thread 1 advanced to log sequence 54077 (thread recovery)
Mon Sep 21 15:57:33 2015
Error 3113 trapped in 2PC on transaction 21.18.1965522. Cleaning up.
Redo thread 1 internally disabled at seq 54077 (SMON)
Error stack returned to user:
ORA-02050: 事务处理 21.18.1965522 已回退, 某些远程数据库可能有问题
ORA-03113: 通信通道的文件结尾
ORA-02063: 紧接着 line (起自 ZSK)
Mon Sep 21 15:57:34 2015
Archived Log entry 74656 added for thread 1 sequence 54076 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:34 2015
ARC0: Archiving disabled thread 1 sequence 54077
Archived Log entry 74657 added for thread 1 sequence 54077 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:35 2015
Thread 2 advanced to log sequence 23562 (LGWR switch)
  Current log# 8 seq# 23562 mem# 0: +DG01/xifenfei/onlinelog/group_8.334.779456945
  Current log# 8 seq# 23562 mem# 1: +DG01/xifenfei/onlinelog/group_8.267.779582453
Mon Sep 21 15:57:36 2015
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei2/trace/xifenfei2_smon_6750672.trc  (incident=200218):
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xifenfei/xifenfei2/incident/incdir_200218/xifenfei2_smon_6750672_i200218.trc
Archived Log entry 74658 added for thread 2 sequence 23561 ID 0x5a0bc0e1 dest 1:
Mon Sep 21 15:57:38 2015
minact-scn: master continuing after IR
Mon Sep 21 15:57:41 2015
Dumping diagnostic data in directory=[cdmp_20150921155741], requested by (instance=2, osid=6750672 (SMON)), summary=[incident=200218].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fatal internal error happened while SMON was doing instance transaction recovery.
Errors in file /oracle/diag/rdbms/xifenfei/xifenfei2/trace/xifenfei2_smon_6750672.trc:
ORA-00600: internal error code, arguments: [k2vcbk_2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 6750672): terminating the instance due to error 474
Mon Sep 21 15:57:41 2015
ORA-1092 : opitsk aborting process
Mon Sep 21 15:57:42 2015
ORA-1092 : opitsk aborting process
Mon Sep 21 15:57:42 2015
License high water mark = 291
Instance terminated by SMON, pid = 6750672
USER (ospid: 18874814): terminating the instance

通过数据库日志大概可以看出来,由于节点2的分布式事事务异常,而在11.2.0.2中,分布式事务跨节点,引起节点2的pmon清理异常事务,但是由于bug,使得异常事务无法被清理掉,从而引起节点1 crash,节点1 crash之后节点2进行恢复,也因为分布式事务bug,导致smon回滚失败,实例也crash。无法进行回滚导致数据库无法正常启动,通过查询mos发现定位到是Bug 10222544 ORA-600 [k2vpci_2] from multi-branch distributed transaction
ORA-600-k2vpci_2


对于这类问题,由于分布事务无法清理,处理方法就是找出来事务人工提交或者直接屏蔽掉这个事务解决该问题

文件头损坏ORA-01122 ORA-01210恢复

有朋友数据文件头出现错误ORA-01122和ORA-01210等错误,数据库无法正常open。
ORA-01210


因为平台是win,他们找我咨询win bbed,因为回老家电脑没有带,无法提供win的bbed.我通过dd部分文件头,然后在linux平台分析发现是该文件的文件头block大量坏块

bbed分析坏块情况

BBED> show all
        FILE#           0
        BLOCK#          1
        OFFSET          0
        DBA             0x00000000 (0 0,1)
        FILENAME        /tmp/30.dbf
        BIFILE          bifile.bbd
        LISTFILE       
        BLOCKSIZE       8192
        MODE            Browse
        EDIT            Unrecoverable
        IBASE           Dec
        OBASE           Dec
        WIDTH           80
        COUNT           512
        LOGFILE         log.bbd
        SPOOL           No

BBED> set count 64
        COUNT           64

BBED> map
 File: /tmp/30.dbf (0)
 Block: 1                                     Dba:0x00000000
------------------------------------------------------------
BBED-00400: invalid blocktype (27)


BBED> d
 File: /tmp/30.dbf (0)
 Block: 1                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 03004400 bffd8a1d 0000000c acba0000 008f4500 00003455 fc020000 
 02040000 00000000 00008001 04000000 00000000 00000000 949400b4 94514005 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          2

BBED> map
 File: /tmp/30.dbf (0)
 Block: 2                                     Dba:0x00000000
------------------------------------------------------------
BBED-00400: invalid blocktype (27)


BBED> d
 File: /tmp/30.dbf (0)
 Block: 2                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 04004400 bffd8a1d 0000000c a6e00000 008f4500 00003455 fc020000 
 0204e81f 00000000 0000241e 05000000 00000000 00000000 11fc297f b426fe2b 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          3

BBED> d
 File: /tmp/30.dbf (0)
 Block: 3                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 05004400 bffd8a1d 0000000c 780a0000 008f4500 00003455 fc020000 
 0204e81f 00000000 0000c001 06000000 00000000 00000000 2969a0d2 d30168a2 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          4

BBED> d
 File: /tmp/30.dbf (0)
 Block: 4                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 06004400 bffd8a1d 0000000c 6c5a0000 008f4500 00003455 fc020000 
 0204e81f 00000000 0000f81d 07000000 00000000 00000000 7b51d409 6dc7ca4d 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          5

BBED> d
 File: /tmp/30.dbf (0)
 Block: 5                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 07004400 bffd8a1d 0000000c c5600000 008f4500 00003455 fc020000 
 02040000 00000000 0000c001 08000000 00000000 00000000 14514005 25145200 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          6

BBED> d
 File: /tmp/30.dbf (0)
 Block: 6                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 08004400 bffd8a1d 0000000c 60480000 008f4500 00003455 fc020000 
 0204e81f 00000000 0000c301 09000000 00000000 00000000 c2a1606a 7615130a 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          7

BBED> d
 File: /tmp/30.dbf (0)
 Block: 7                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 09004400 bffd8a1d 0000000c e3430000 008f4500 00003455 fc020000 
 0204e81f 00000000 00000002 0a000000 00000000 00000000 00a28a28 00a28a28 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          8

BBED> d
 File: /tmp/30.dbf (0)
 Block: 8                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 0a004400 07fe8a1d 0000000c fc000000 008f4500 00003455 fc020000 
 0205e81f 00000000 0000f41d 00000000 00000000 00000000 ffd8ffe0 00104a46 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          9

BBED> d
 File: /tmp/30.dbf (0)
 Block: 9                Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 0b004400 07fe8a1d 0000000c 48da0000 008f4500 00003455 fc020000 
 0205e81f 00000000 0000c601 01000000 00000000 00000000 b47d69d3 7fa96a6f 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          10

BBED> d
 File: /tmp/30.dbf (0)
 Block: 10               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 0c004400 07fe8a1d 0000000c be0f0000 008f4500 00003455 fc020000 
 0205e81f 00000000 0000181d 02000000 00000000 00000000 9de3e868 4782d83a 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          11

BBED> d
 File: /tmp/30.dbf (0)
 Block: 11               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 0d004400 07fe8a1d 0000000c 9cd00000 008f4500 00003455 fc020000 
 0205e81f 00000000 0000241e 03000000 00000000 00000000 dead1259 5919e385 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          12

BBED> d
 File: /tmp/30.dbf (0)
 Block: 12               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 0e004400 07fe8a1d 0000000c df450000 008f4500 00003455 fc020000 
 0205e81f 00000000 00004001 04000000 00000000 00000000 31d9a292 9698828a 

 <32 bytes per line>

BBED> set block +1
        BLOCK#          13

BBED> d
 File: /tmp/30.dbf (0)
 Block: 13               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 1ba20000 0f004400 07fe8a1d 0000000c 18790000 008f4500 00003455 fc020000 
 02050000 00000000 00000002 05000000 00000000 00000000 b93f8235 5ea063b7 

 <32 bytes per line>

拿block 1的rdba(04004400–倒序存储)分析[win文件拷贝到linux后使用bbed查看相差1 block]可以的出来block信息为file=1, block=262148,明显错误.

通过dul分析文件头损坏情况

Data UnLoader: 10.2.0.6.9 - Internal Only - on Tue Sep 29 22:15:22 2015
with 64-bit io functions

Copyright (c) 1994 2015 Bernard van Duijnen All rights reserved.

 Strictly Oracle Internal Use Only


DUL: Warning: Recreating file "dul.log"
Reading SCANNEDLOBPAGE.dat 1204 entries loaded and sorted 1204 entries
Reading SEG.dat 0 entries loaded
Reading EXT.dat 44 entries loaded and sorted 44 entries
Reading COMPATSEG.dat 0 entries loaded

DUL: Warning: Wrong DBA  0X00440004 (file=1, block=262148) (Ignored)
DUL: Error: While processing file# 30 block# 1
DUL: Warning: Found mismatch while checking file E:\TEMP\shebao\30.dbf
DUL: Warning: DUL osd_parameter or control.dul configuration error
DUL: Warning: Given file number(30) in control file does not match file# in dba(1)
DUL: Warning: Wrong DBA  0X00440004 (file=1, block=262148) (Ignored)
DUL: Error: While processing file# 30 block# 1
DUL>

通过bbed和dul证明文件头大量损坏,而且尚未有任何该文件的物理备份,因此恢复起来难道较大。

分析Oracle Database Recovery Check Result
通过对Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)的分析结果,我们意外的发现,人品不错,发现异常的文件创建时间为2015-09-26 19:39:33,进一步和客户沟通,这个文件存储为图片,少量丢失可以允许,优先恢复业务
Oracle-Database-Recovery-Check


有了这个结论,那处理起来就easy了,直接offline异常文件,然后分析丢失的表
从而确定时lob字典的少量extent数据分配到了file 30上
lob


为了避免查询对应lob之时出现错误,通过update 对应lob为空规避该问题

create table corrupt_lobs (corrupt_rowid rowid,table_name varchar2(100)); 
declare 
  n number; 
begin 
  for cursor_lob in (select rowid r, xff_lob from xff.t_xifenfei) loop 
  begin 
    n:=dbms_lob.instr(cursor_lob.xff_lob,hextoraw('889911')); 
  exception 
    when others then 
      insert into corrupt_lobs values (cursor_lob.r,'xff.t_xifenfei');
      commit; 
    end; 
  end loop; 
end; 
/ 

update xff.t_xifenfei 
     set xff_lob = empty_blob() 
     where rowid in (select corrupted_rowid from corrupt_lobs);

本次恢复是由于运气好,遇到异常文件刚好是最近加入,而且都是图片,客户允许少量丢失,如果是不允许丢失的数据文件,可能需要通过找历史的该文件的备份(Oracle 12C的第一次异常恢复—文件头坏块),在某些情况下,如果也没有此类备份,只能通过bbed重构block 1(如果有其他异常块一次处理,如果太多无法处理,最少也需要重构block 1),然后尝试open数据库或者使用dul之类工具处理(因为文件头损坏,工具可能不能识别文件无法恢复)