10.2.0.4 asm磁盘头损坏数据0丢失恢复

接到朋友反馈说他们公司的10.2.0.4(无磁盘头备份)asm 磁盘头都损坏了,确定是被人恶意dd掉了磁盘头的1k,他通过查询发过来结果如下
asm_disk_header


分析alert日志,确定磁盘组和磁盘信息
asm mount data磁盘组报错

Sun Apr 16 21:39:31 2017
NOTE: cache dismounting group 2/0x3F94036B (DATA) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Sun Apr 16 21:39:31 2017
ERROR: no PST quorum in group 3: required 2, found 0

data磁盘组和磁盘信息

Mon Mar 20 16:21:59 2017
NOTE: Hbeat: instance not first (grp 3)
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/raw/raw21
Mon Mar 20 16:21:59 2017
NOTE: F1X0 found on disk 2 fcn 0.47624333
NOTE: cache opening disk 3 of grp 2: DATA_0003 path:/dev/raw/raw22
NOTE: cache opening disk 4 of grp 2: DATA_0004 path:/dev/raw/raw23
NOTE: cache opening disk 5 of grp 2: DATA_0005 path:/dev/raw/raw24
NOTE: F1X0 found on disk 5 fcn 0.47624333
NOTE: cache opening disk 6 of grp 2: DATA_0006 path:/dev/raw/raw26
NOTE: cache opening disk 7 of grp 2: DATA_0007 path:/dev/raw/raw25
NOTE: F1X0 found on disk 7 fcn 0.47624333
NOTE: cache mounting (not first) group 2/0x01B869DC (DATA)
Mon Mar 20 16:21:59 2017
kjbdomatt send to node 1
Mon Mar 20 16:21:59 2017
NOTE: attached to recovery domain 2
Mon Mar 20 16:21:59 2017
NOTE: opening chunk 2 at fcn 0.201560874 ABA 
NOTE: seq=614 blk=4144 
Mon Mar 20 16:21:59 2017
NOTE: cache mounting group 2/0x01B869DC (DATA) succeeded
SUCCESS: diskgroup DATA was mounted

最后一次正常mount是使用了raw21-raw26的裸设备为data磁盘组,但是这里从DATA_002开始,表明很可能最初了两个asm disk被删除,继续分析alert日志

Mon Oct 15 01:53:16 2012
SQL> CREATE DISKGROUP DATA Normal REDUNDANCY  DISK '/dev/raw/raw6' SIZE 1144409M ,
'/dev/raw/raw7' SIZE 1144409M   

Sat Dec 27 22:41:39 2014
SQL> alter diskgroup data add disk '/dev/raw/raw21' 

Sat Dec 27 22:41:54 2014
SQL> alter diskgroup data add disk '/dev/raw/raw22' 

Sat Dec 27 22:42:14 2014
SQL> alter diskgroup data add disk '/dev/raw/raw23' 

Sat Dec 27 22:42:31 2014
SQL> alter diskgroup data add disk '/dev/raw/raw24' 


Sat Dec 27 22:42:51 2014
SQL> alter diskgroup data add disk '/dev/raw/raw26' 

Sat Dec 27 22:43:10 2014
SQL> alter diskgroup data add disk '/dev/raw/raw25' 

Mon Dec 29 17:55:07 2014
SQL> alter diskgroup data drop disk 'DATA_0000' 

Tue Dec 30 03:14:42 2014
SQL> alter diskgroup data drop disk 'DATA_0001'  

kfed确认磁盘头损坏情况
通过kfed分析dd出来的磁盘头发现每个磁盘头都一样,第一个block损坏

[oracle@rac1 xifenfei]$ kfed read raw22
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F21AF427400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[oracle@rac1 xifenfei]$ kfed read raw22 blkn=2|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483651 ; 0x008: disk=3
kfbh.check:                  2184525105 ; 0x00c: 0x82353531
kfbh.fcn.base:                 47625389 ; 0x010: 0x02d6b4ad
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                    448 ; 0x004: 0x01c0
kfdatb10.ub2pad:                      0 ; 0x006: 0x0000
kfdatb10.auinfo[0].link.next:         8 ; 0x008: 0x0008
kfdatb10.auinfo[0].link.prev:         8 ; 0x00a: 0x0008
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:           448 ; 0x00e: 0x01c0
kfdatb10.auinfo[1].link.next:        16 ; 0x010: 0x0010
kfdatb10.auinfo[1].link.prev:        16 ; 0x012: 0x0010
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:        24 ; 0x018: 0x0018
kfdatb10.auinfo[2].link.prev:        24 ; 0x01a: 0x0018
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:        32 ; 0x020: 0x0020
kfdatb10.auinfo[3].link.prev:        32 ; 0x022: 0x0020

恢复思路
确定磁盘是只被干掉了第一个block,但是由于asm 是10.2.0.4的,没有asm disk header的备份,因此也只能自己去人工kfed修复.但是考虑到该case中所有的asm disk header 全部丢失,无任何参考,完全修复比较麻烦,另外这个库也比较小,考虑修复asm disk header 关键部位,然后通过工具直接拷贝出来数据文件,在文件系统中open库的思路.主要需要恢复磁盘头基本信息(diskname,disksize,disknum,ausize,blocksize,file directory等)

通过kfed找出来file directory

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  2363360058 ; 0x00c: 0x8cde033a
kfbh.fcn.base:                 48245591 ; 0x010: 0x02e02b57
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:                   1 ; 0x000: A=1 NUMM=0x0
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:                 1048576 ; 0x010: 0x00100000
kfffdb.xtntcnt:                       3 ; 0x014: 0x00000003
kfffdb.xtnteof:                       3 ; 0x018: 0x00000003
kfffdb.blkSize:                    4096 ; 0x01c: 0x00001000
kfffdb.flags:                        65 ; 0x020: O=1 S=0 S=0 D=0 C=0 I=0 R=1 A=0
kfffdb.fileType:                     15 ; 0x021: 0x0f
kfffdb.dXrs:                         19 ; 0x022: SCHE=0x1 NUMB=0x3

通过kfed找出来disk directory

kfffde[0].xptr.au:                    4 ; 0x4a0: 0x00000004
kfffde[0].xptr.disk:                  7 ; 0x4a4: 0x0007
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk:                  41 ; 0x4a7: 0x29
kfffde[1].xptr.au:                17405 ; 0x4a8: 0x000043fd
kfffde[1].xptr.disk:                  6 ; 0x4ac: 0x0006
kfffde[1].xptr.flags:                 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk:                 146 ; 0x4af: 0x92
kfffde[2].xptr.au:               330031 ; 0x4b0: 0x0005092f
kfffde[2].xptr.disk:                  4 ; 0x4b4: 0x0004
kfffde[2].xptr.flags:                 0 ; 0x4b6: L=0 E=0 D=0 S=0
kfffde[2].xptr.chk:                  13 ; 0x4b7: 0x0d


kfddde[2].entry.incarn:               1 ; 0x3a4: A=1 NUMM=0x0
kfddde[2].entry.hash:                 2 ; 0x3a8: 0x00000002
kfddde[2].entry.refer.number:4294967295 ; 0x3ac: 0xffffffff
kfddde[2].entry.refer.incarn:         0 ; 0x3b0: A=0 NUMM=0x0
kfddde[2].dsknum:                     2 ; 0x3b4: 0x0002
kfddde[2].state:                      2 ; 0x3b6: KFDSTA_NORMAL
kfddde[2].ddchgfl:                    0 ; 0x3b7: 0x00
kfddde[2].dskname:            DATA_0002 ; 0x3b8: length=9
kfddde[2].fgname:             DATA_0002 ; 0x3d8: length=9
kfddde[2].crestmp.hi:          33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[2].crestmp.lo:        2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfddde[2].failstmp.hi:                0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[2].failstmp.lo:                0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[2].timer:                      0 ; 0x408: 0x00000000
kfddde[2].size:                 1258291 ; 0x40c: 0x00133333
kfddde[2].srRloc.super.hiStart:       0 ; 0x410: 0x00000000
kfddde[2].srRloc.super.loStart:       0 ; 0x414: 0x00000000
kfddde[2].srRloc.super.length:        0 ; 0x418: 0x00000000
kfddde[2].srRloc.incarn:              0 ; 0x41c: 0x00000000
kfddde[2].dskrprtm:                   0 ; 0x420: 0x00000000
kfddde[2].start0:                     0 ; 0x424: 0x00000000
kfddde[2].size0:                1258291 ; 0x428: 0x00133333
kfddde[2].used0:                1258229 ; 0x42c: 0x001332f5
kfddde[2].slot:                       0 ; 0x430: 0x00000000


kfddde[3].entry.incarn:               1 ; 0x564: A=1 NUMM=0x0
kfddde[3].entry.hash:                 3 ; 0x568: 0x00000003
kfddde[3].entry.refer.number:4294967295 ; 0x56c: 0xffffffff
kfddde[3].entry.refer.incarn:         0 ; 0x570: A=0 NUMM=0x0
kfddde[3].dsknum:                     3 ; 0x574: 0x0003
kfddde[3].state:                      2 ; 0x576: KFDSTA_NORMAL
kfddde[3].ddchgfl:                    0 ; 0x577: 0x00
kfddde[3].dskname:            DATA_0003 ; 0x578: length=9
kfddde[3].fgname:             DATA_0003 ; 0x598: length=9
kfddde[3].crestmp.hi:          33010550 ; 0x5b8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[3].crestmp.lo:        2811397120 ; 0x5bc: USEC=0x0 MSEC=0xa1 SECS=0x39 MINS=0x29
kfddde[3].failstmp.hi:                0 ; 0x5c0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[3].failstmp.lo:                0 ; 0x5c4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[3].timer:                      0 ; 0x5c8: 0x00000000
kfddde[3].size:                 1258291 ; 0x5cc: 0x00133333
kfddde[3].srRloc.super.hiStart:       0 ; 0x5d0: 0x00000000
kfddde[3].srRloc.super.loStart:       0 ; 0x5d4: 0x00000000
kfddde[3].srRloc.super.length:        0 ; 0x5d8: 0x00000000
kfddde[3].srRloc.incarn:              0 ; 0x5dc: 0x00000000
kfddde[3].dskrprtm:                   0 ; 0x5e0: 0x00000000
kfddde[3].start0:                     0 ; 0x5e4: 0x00000000
kfddde[3].size0:                1258291 ; 0x5e8: 0x00133333
kfddde[3].used0:                1258128 ; 0x5ec: 0x00133290
kfddde[3].slot:                       0 ; 0x5f0: 0x00000000

kfddde[4].entry.incarn:               1 ; 0x724: A=1 NUMM=0x0
kfddde[4].entry.hash:                 4 ; 0x728: 0x00000004
kfddde[4].entry.refer.number:4294967295 ; 0x72c: 0xffffffff
kfddde[4].entry.refer.incarn:         0 ; 0x730: A=0 NUMM=0x0
kfddde[4].dsknum:                     4 ; 0x734: 0x0004
kfddde[4].state:                      2 ; 0x736: KFDSTA_NORMAL
kfddde[4].ddchgfl:                    0 ; 0x737: 0x00
kfddde[4].dskname:            DATA_0004 ; 0x738: length=9
kfddde[4].fgname:             DATA_0004 ; 0x758: length=9
kfddde[4].crestmp.hi:          33010550 ; 0x778: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[4].crestmp.lo:        2834565120 ; 0x77c: USEC=0x0 MSEC=0x102 SECS=0xf MINS=0x2a
kfddde[4].failstmp.hi:                0 ; 0x780: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[4].failstmp.lo:                0 ; 0x784: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[4].timer:                      0 ; 0x788: 0x00000000
kfddde[4].size:                 1258291 ; 0x78c: 0x00133333
kfddde[4].srRloc.super.hiStart:       0 ; 0x790: 0x00000000
kfddde[4].srRloc.super.loStart:       0 ; 0x794: 0x00000000
kfddde[4].srRloc.super.length:        0 ; 0x798: 0x00000000
kfddde[4].srRloc.incarn:              0 ; 0x79c: 0x00000000
kfddde[4].dskrprtm:                   0 ; 0x7a0: 0x00000000
kfddde[4].start0:                     0 ; 0x7a4: 0x00000000
kfddde[4].size0:                1258291 ; 0x7a8: 0x00133333
kfddde[4].used0:                1258291 ; 0x7ac: 0x00133333
kfddde[4].slot:                       0 ; 0x7b0: 0x00000000


kfddde[5].entry.incarn:               1 ; 0x8e4: A=1 NUMM=0x0
kfddde[5].entry.hash:                 5 ; 0x8e8: 0x00000005
kfddde[5].entry.refer.number:4294967295 ; 0x8ec: 0xffffffff
kfddde[5].entry.refer.incarn:         0 ; 0x8f0: A=0 NUMM=0x0
kfddde[5].dsknum:                     5 ; 0x8f4: 0x0005
kfddde[5].state:                      2 ; 0x8f6: KFDSTA_NORMAL
kfddde[5].ddchgfl:                    0 ; 0x8f7: 0x00
kfddde[5].dskname:            DATA_0005 ; 0x8f8: length=9
kfddde[5].fgname:             DATA_0005 ; 0x918: length=9
kfddde[5].crestmp.hi:          33010550 ; 0x938: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[5].crestmp.lo:        2853560320 ; 0x93c: USEC=0x0 MSEC=0x178 SECS=0x21 MINS=0x2a
kfddde[5].failstmp.hi:                0 ; 0x940: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[5].failstmp.lo:                0 ; 0x944: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[5].timer:                      0 ; 0x948: 0x00000000
kfddde[5].size:                 1258291 ; 0x94c: 0x00133333
kfddde[5].srRloc.super.hiStart:       0 ; 0x950: 0x00000000
kfddde[5].srRloc.super.loStart:       0 ; 0x954: 0x00000000
kfddde[5].srRloc.super.length:        0 ; 0x958: 0x00000000
kfddde[5].srRloc.incarn:              0 ; 0x95c: 0x00000000
kfddde[5].dskrprtm:                   0 ; 0x960: 0x00000000
kfddde[5].start0:                     0 ; 0x964: 0x00000000
kfddde[5].size0:                1258291 ; 0x968: 0x00133333
kfddde[5].used0:                1258255 ; 0x96c: 0x0013330f
kfddde[5].slot:                       0 ; 0x970: 0x00000000


kfddde[6].entry.incarn:               1 ; 0xaa4: A=1 NUMM=0x0
kfddde[6].entry.hash:                 6 ; 0xaa8: 0x00000006
kfddde[6].entry.refer.number:4294967295 ; 0xaac: 0xffffffff
kfddde[6].entry.refer.incarn:         0 ; 0xab0: A=0 NUMM=0x0
kfddde[6].dsknum:                     6 ; 0xab4: 0x0006
kfddde[6].state:                      2 ; 0xab6: KFDSTA_NORMAL
kfddde[6].ddchgfl:                    0 ; 0xab7: 0x00
kfddde[6].dskname:            DATA_0006 ; 0xab8: length=9
kfddde[6].fgname:             DATA_0006 ; 0xad8: length=9
kfddde[6].crestmp.hi:          33010550 ; 0xaf8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[6].crestmp.lo:        2875645952 ; 0xafc: USEC=0x0 MSEC=0x1b8 SECS=0x36 MINS=0x2a
kfddde[6].failstmp.hi:                0 ; 0xb00: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[6].failstmp.lo:                0 ; 0xb04: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[6].timer:                      0 ; 0xb08: 0x00000000
kfddde[6].size:                 1258291 ; 0xb0c: 0x00133333
kfddde[6].srRloc.super.hiStart:       0 ; 0xb10: 0x00000000
kfddde[6].srRloc.super.loStart:       0 ; 0xb14: 0x00000000
kfddde[6].srRloc.super.length:        0 ; 0xb18: 0x00000000
kfddde[6].srRloc.incarn:              0 ; 0xb1c: 0x00000000
kfddde[6].dskrprtm:                   0 ; 0xb20: 0x00000000
kfddde[6].start0:                     0 ; 0xb24: 0x00000000
kfddde[6].size0:                1258291 ; 0xb28: 0x00133333
kfddde[6].used0:                1258247 ; 0xb2c: 0x00133307
kfddde[6].slot:                       0 ; 0xb30: 0x00000000

kfddde[7].entry.incarn:               1 ; 0xc64: A=1 NUMM=0x0
kfddde[7].entry.hash:                 7 ; 0xc68: 0x00000007
kfddde[7].entry.refer.number:4294967295 ; 0xc6c: 0xffffffff
kfddde[7].entry.refer.incarn:         0 ; 0xc70: A=0 NUMM=0x0
kfddde[7].dsknum:                     7 ; 0xc74: 0x0007
kfddde[7].state:                      2 ; 0xc76: KFDSTA_NORMAL
kfddde[7].ddchgfl:                    0 ; 0xc77: 0x00
kfddde[7].dskname:            DATA_0007 ; 0xc78: length=9
kfddde[7].fgname:             DATA_0007 ; 0xc98: length=9
kfddde[7].crestmp.hi:          33010550 ; 0xcb8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfddde[7].crestmp.lo:        2898849792 ; 0xcbc: USEC=0x0 MSEC=0x23c SECS=0xc MINS=0x2b
kfddde[7].failstmp.hi:                0 ; 0xcc0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[7].failstmp.lo:                0 ; 0xcc4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[7].timer:                      0 ; 0xcc8: 0x00000000
kfddde[7].size:                 1258291 ; 0xccc: 0x00133333
kfddde[7].srRloc.super.hiStart:       0 ; 0xcd0: 0x00000000
kfddde[7].srRloc.super.loStart:       0 ; 0xcd4: 0x00000000
kfddde[7].srRloc.super.length:        0 ; 0xcd8: 0x00000000
kfddde[7].srRloc.incarn:              0 ; 0xcdc: 0x00000000
kfddde[7].dskrprtm:                   0 ; 0xce0: 0x00000000
kfddde[7].start0:                     0 ; 0xce4: 0x00000000
kfddde[7].size0:                1258291 ; 0xce8: 0x00133333
kfddde[7].used0:                1258209 ; 0xcec: 0x001332e1
kfddde[7].slot:                       0 ; 0xcf0: 0x00000000

结合上述信息构造类似磁盘头文件

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3123334821 ; 0x00c: 0xba2a4ea5
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:     ORCLDISKVOL2 ; 0x000: length=12
kfdhdb.driver.reserved[0]:    827084630 ; 0x008: 0x314c4f56
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        2 ; 0x024: 0x0002
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0002 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0002 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33010550 ; 0x3f8: HOUR=0x16 DAYS=0x1b MNTH=0xc YEAR=0x7de
kfdhdb.crestmp.lo:           2793310208 ; 0x3fc: USEC=0x0 MSEC=0x3a2 SECS=0x27 MINS=0x29
kfdhdb.mntstmp.hi:             33049840 ; 0x0b0: HOUR=0x10 DAYS=0x7 MNTH=0x3 YEAR=0x7e1
kfdhdb.mntstmp.lo:           1588567040 ; 0x0b4: USEC=0x0 MSEC=0x3e7 SECS=0x2a MINS=0x17
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                 1258291 ; 0x40c: 0x00133333
kfdhdb.pmcnt:                        19 ; 0x0c8: 0x00000013
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

然后通过kfed merge分别对所有磁盘头进行重构,然后通过dul去识别拷贝文件

+DATA/XFF/spfileXFF.ora.265.869858859
+DATA/XFF/CONTROLFILE/Current.256.796701475
+DATA/XFF/ONLINELOG/group_1.257.796701475
+DATA/XFF/ONLINELOG/group_2.258.796701485
+DATA/XFF/ONLINELOG/group_3.266.796704261
+DATA/XFF/ONLINELOG/group_4.267.796704277
+DATA/XFF/ONLINELOG/group_5.1235.824131117
+DATA/XFF/ONLINELOG/group_6.1115.824200515
+DATA/XFF/ONLINELOG/group_7.1113.824200587
+DATA/XFF/ONLINELOG/group_8.1112.824200627
+DATA/XFF/ONLINELOG/group_9.1066.824201189
+DATA/XFF/ONLINELOG/group_10.1063.824201207
+DATA/XFF/ONLINELOG/group_11.1062.824201287
+DATA/XFF/ONLINELOG/group_12.1061.824201301


File 259 datafile size 2147491840, block size 8192
File 260 datafile size 32186048512, block size 8192
File 261 datafile size 6897541120, block size 8192
File 263 datafile size 27409784832, block size 8192
File 264 datafile size 34359730176, block size 8192
File 280 datafile size 31457288192, block size 8192
File 281 datafile size 31457288192, block size 8192
File 330 datafile size 5242888192, block size 8192
File 334 datafile size 20971528192, block size 8192
File 382 datafile size 20971528192, block size 8192
File 383 datafile size 20971528192, block size 8192
File 384 datafile size 31457288192, block size 8192
File 385 datafile size 31457288192, block size 8192
File 386 datafile size 31457288192, block size 8192
File 387 datafile size 4294975488, block size 8192
File 388 datafile size 4294975488, block size 8192
File 389 datafile size 4294975488, block size 8192
File 390 datafile size 4294975488, block size 8192
File 391 datafile size 4294975488, block size 8192
File 392 datafile size 4294975488, block size 8192
File 394 datafile size 31457288192, block size 8192
File 491 datafile size 20971528192, block size 8192
File 494 datafile size 20971528192, block size 8192
File 578 datafile size 31457288192, block size 8192
File 597 datafile size 20971528192, block size 8192
File 613 datafile size 4294975488, block size 8192
File 638 datafile size 31457288192, block size 8192
File 668 datafile size 16988184576, block size 8192
File 688 datafile size 20971528192, block size 8192
File 740 datafile size 31457288192, block size 8192
File 787 datafile size 31457288192, block size 8192
File 798 datafile size 31457288192, block size 8192
File 806 datafile size 31457288192, block size 8192
File 810 datafile size 31457288192, block size 8192
File 845 datafile size 31457288192, block size 8192
File 886 datafile size 31457288192, block size 8192
File 887 datafile size 31457288192, block size 8192
File 889 datafile size 31457288192, block size 8192
File 892 datafile size 31457288192, block size 8192
File 903 datafile size 31457288192, block size 8192
File 932 datafile size 31457288192, block size 8192
File 933 datafile size 3145736192, block size 8192
File 951 datafile size 20971528192, block size 8192
File 953 datafile size 31457288192, block size 8192
File 955 datafile size 31457288192, block size 8192
File 963 datafile size 31457288192, block size 8192
File 1000 datafile size 31457288192, block size 8192
File 1001 datafile size 12035563520, block size 8192
File 1031 datafile size 31457288192, block size 8192
File 1033 datafile size 31457288192, block size 8192
File 1035 datafile size 31457288192, block size 8192
File 1037 datafile size 31457288192, block size 8192
File 1045 datafile size 31457288192, block size 8192
File 1073 datafile size 4294975488, block size 8192
File 1074 datafile size 4294975488, block size 8192
File 1075 datafile size 4294975488, block size 8192
File 1076 datafile size 8589942784, block size 8192
File 1077 datafile size 31457288192, block size 8192
File 1078 datafile size 8589942784, block size 8192
File 1079 datafile size 8589942784, block size 8192
File 1080 datafile size 4294975488, block size 8192
File 1081 datafile size 8589942784, block size 8192
File 1082 datafile size 8589942784, block size 8192
File 1083 datafile size 8589942784, block size 8192
File 1084 datafile size 8589942784, block size 8192
File 1085 datafile size 32365355008, block size 8192
File 1086 datafile size 9071239168, block size 8192
File 1116 datafile size 8589942784, block size 8192
File 1133 datafile size 8589942784, block size 8192
File 1219 datafile size 31457288192, block size 8192
File 1245 datafile size 31457288192, block size 8192
File 1249 datafile size 31457288192, block size 8192
File 1251 datafile size 31457288192, block size 8192
File 1322 datafile size 4294975488, block size 8192
File 1442 datafile size 31457288192, block size 8192
File 1468 datafile size 1048584192, block size 8192
File 1508 datafile size 31457288192, block size 8192
File 1554 datafile size 4294975488, block size 8192
File 1570 datafile size 31457288192, block size 8192
File 2004 datafile size 31457288192, block size 8192
File 2005 datafile size 31457288192, block size 8192
File 2344 datafile size 31457288192, block size 8192
File 2345 datafile size 31457288192, block size 8192
File 2348 datafile size 31457288192, block size 8192
File 2617 datafile size 10737426432, block size 8192
File 2618 datafile size 21474844672, block size 8192
File 2766 datafile size 33554440192, block size 8192
File 2782 datafile size 31457288192, block size 8192
File 2784 datafile size 31457288192, block size 8192
File 2893 datafile size 31457288192, block size 8192
File 2924 datafile size 31457288192, block size 8192
File 2925 datafile size 31457288192, block size 8192
File 2926 datafile size 31457288192, block size 8192
File 2983 datafile size 31457288192, block size 8192
File 2984 datafile size 31457288192, block size 8192
File 3634 datafile size 31457288192, block size 8192
File 3909 datafile size 31457288192, block size 8192
File 3917 datafile size 31457288192, block size 8192
File 3920 datafile size 31457288192, block size 8192
File 3922 datafile size 31457288192, block size 8192

剩下的事情就比较简单了,通过把spfile,controlfile,datafile文件拷贝出来,本地启动数据库,恢复成功
asm_open_db


open_alert


asm 加磁盘导致磁盘组损坏恢复

接到客户恢复case请求,希望我们接入恢复数据。大概过程是这样的,16年9月份由于硬件问题,导致normal磁盘组(只有2个磁盘)中的一个磁盘丢失,然后在17年3月6日,运维方尝试增加该磁盘进入磁盘组,结果通过force命令加入成功之后,磁盘组dismount,然后再也无法mount成功。
磁盘组创建信息

Fri Jun 24 19:31:38 2016
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATADG was mounted
SUCCESS: CREATE DISKGROUP DATADG NORMAL REDUNDANCY  DISK '/dev/asm-diskdata01' SIZE 1048576M ,
'/dev/asm-diskdata02' SIZE 1048576M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='4M' /* ASMCA */

这里可以看出来datadg是一个normal的au为4M的一个磁盘组

自动drop异常asm disk

Mon Sep 12 11:41:54 2016
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.
Mon Sep 12 11:41:55 2016
NOTE: process _b000_+asm1 (19491) initiating offline of disk 1.3915923833 (DATADG_0001) with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 9 for pid 29, osid 19491
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: checking PST for grp 1 done.
NOTE: sending set offline flag message 2870990318 to 1 disk(s) in group 1
WARNING: Disk DATADG_0001 in mode 0x7f is now being offlined
NOTE: initiating PST update: grp = 1, dsk = 1/0xe9684179, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 10 for pid 29, osid 19491
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: PST update grp = 1 completed successfully 
NOTE: initiating PST update: grp = 1, dsk = 1/0xe9684179, mask = 0x7e, op = clear
GMON updating disk modes for group 1 at 11 for pid 29, osid 19491
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: cache closing disk 1 of grp 1: DATADG_0001
NOTE: PST update grp = 1 completed successfully 
Mon Sep 12 11:42:55 2016
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
Mon Sep 12 11:44:58 2016
WARNING: PST-initiated drop of 1 disk(s) in group 1(.1137226115))
SQL> alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Mon Sep 12 11:44:59 2016
GMON updating for reconfiguration, group 1 at 12 for pid 29, osid 19491
NOTE: cache closing disk 1 of grp 1: (not open) DATADG_0001
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group 1 PST updated.
Mon Sep 12 11:44:59 2016
NOTE: membership refresh pending for group 1/0x43c8b183 (DATADG)
Mon Sep 12 11:45:02 2016
NOTE: successfully read ACD block gn=1 blk=0 via retry read
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3526.trc:
ORA-15062: ASM disk is globally closed
GMON querying group 1 at 13 for pid 18, osid 3532
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x43c8b183 (DATADG)
SUCCESS: alter diskgroup DATADG drop disk DATADG_0001 force /* ASM SERVER */
NOTE: starting rebalance of group 1/0x43c8b183 (DATADG) at power 1
SUCCESS: PST-initiated drop disk in group 1(1137226115))
Starting background process ARB0
Mon Sep 12 11:45:03 2016
ARB0 started with pid=35, OS id=19945 
NOTE: assigning ARB0 to group 1/0x43c8b183 (DATADG) with 1 parallel I/O
cellip.ora not found.
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATADG
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Sep 12 11:46:21 2016
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Mon Sep 12 11:46:24 2016
GMON updating for reconfiguration, group 1 at 14 for pid 36, osid 20110
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
NOTE: group 1 PST updated.
WARNING: offline disk number 1 has references (54679 AUs)
Mon Sep 12 11:46:24 2016
NOTE: membership refresh pending for group 1/0x43c8b183 (DATADG)
GMON querying group 1 at 15 for pid 18, osid 3532
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x43c8b183 (DATADG)
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x43c8b183 (DATADG)

这里我们可以看出来磁盘组在2016年9月12日由于disk 1 无法响应,直接被asm 踢出了磁盘组

把被强制删除的磁盘重新加回去

Mon Mar 06 15:36:54 2017
SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000 
NOTE: GroupBlock outside rolling migration privileged region
ORA-15032: not all alterations performed
ORA-15029: disk '/dev/asm-diskdata01' is already mounted by this instance
ERROR: alter diskgroup DATADG add disk '/dev/asm-diskdata01' name DATADG_0000
Mon Mar 06 15:38:27 2017
SQL>  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk DATADG_0001 in mode 0x7f marked for de-assignment
ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
Mon Mar 06 15:38:28 2017
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
Mon Mar 06 15:38:31 2017
GMON querying group 1 at 7 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
GMON querying group 1 at 8 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
ERROR:  alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Mar 06 16:04:14 2017
SQL> alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk DATADG_0001 in mode 0x7f marked for de-assignment
ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
Mon Mar 06 16:04:15 2017
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
Mon Mar 06 16:04:18 2017
GMON querying group 1 at 9 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
GMON querying group 1 at 10 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
ERROR: alter diskgroup DATADG add disk '/dev/asm-diskdata02' name DATADG_0001
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Mar 06 16:23:28 2017
SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/adm-diskdata02' name DATA_0001 
NOTE: GroupBlock outside rolling migration privileged region
ORA-15032: not all alterations performed
ORA-15031: disk specification '/dev/adm-diskdata02' matches no disks
ERROR: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/adm-diskdata02' name DATA_0001
Mon Mar 06 16:24:48 2017
SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
ERROR: ORA-15033 signalled during reconfiguration of diskgroup DATADG
Mon Mar 06 16:24:49 2017
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
Mon Mar 06 16:24:52 2017
GMON querying group 1 at 11 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
GMON querying group 1 at 12 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/asm-diskdata02' belongs to diskgroup "DATADG"
ERROR: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
Mon Mar 06 16:26:07 2017
SQL> alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 force  
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DATA_0001
NOTE: requesting all-instance disk validation for group=1
Mon Mar 06 16:26:10 2017
NOTE: skipping rediscovery for group 1/0x31584f6b (DATADG) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0x31584f6b (DATADG) on local instance.
Mon Mar 06 16:26:15 2017
GMON updating for reconfiguration, group 1 at 13 for pid 28, osid 12861
NOTE: group 1 PST updated.
NOTE: initiating PST update: grp = 1
GMON updating group 1 at 14 for pid 28, osid 12861
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
NOTE: group DATADG: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATADG: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully 
NOTE: membership refresh pending for group 1/0x31584f6b (DATADG)
GMON querying group 1 at 15 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
NOTE: Attempting voting file refresh on diskgroup DATADG
NOTE: Refresh completed on diskgroup DATADG. No voting file found.
GMON querying group 1 at 16 for pid 18, osid 3468
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
SUCCESS: refreshed membership for 1/0x31584f6b (DATADG)
Mon Mar 06 16:26:19 2017
SUCCESS: alter diskgroup DATADG add FAILGROUP DATA_0001 disk '/dev/asm-diskdata02' name DATA_0001 force 
NOTE: starting rebalance of group 1/0x31584f6b (DATADG) at power 1
Mon Mar 06 16:26:20 2017
Starting background process ARB0
Mon Mar 06 16:26:20 2017
ARB0 started with pid=32, OS id=25833 
NOTE: assigning ARB0 to group 1/0x31584f6b (DATADG) with 1 parallel I/O
cellip.ora not found.
WARNING:cache read  a corrupt block: group=1(DATADG) dsk=0 blk=0 disk=0 
        (DATADG_0000)incarn=3915956130 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
NOTE:a corrupted block from group DATADG was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc
WARNING:cache read (retry)a corrupt block:group=1(DATADG) 
         dsk=0 blk=0 disk=0(DATADG_0000)incarn=3915956130 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ERROR: cache failed to read group=1(DATADG) dsk=0 blk=0 from disk(s): 0(DATADG_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
NOTE: cache initiating offline of disk 0 group DATADG
NOTE:process _arb0_+asm1 (25833) initiating offline of disk 0.3915956130(DATADG_0000)with mask 0x7e in group 1
NOTE: checking PST: grp = 1
GMON checking disk modes for group 1 at 17 for pid 32, osid 25833
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: too many offline disks in PST (grp 1)
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp = 1, dsk = 0/0xe968bfa2, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 18 for pid 32, osid 25833
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: Disk 0 cannot be offlined, since all the disks [0, 1] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
Mon Mar 06 16:26:23 2017
NOTE: cache dismounting (not clean) group 1/0x31584F6B (DATADG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 25889, image: oracle@DBN01 (B000)
Mon Mar 06 16:26:23 2017
NOTE: halting all I/Os to diskgroup 1 (DATADG)
Mon Mar 06 16:26:23 2017
NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
NOTE: LGWR sync ABA=19.2851 last written ABA 19.2851
WARNING: Offline for disk DATADG_0000 in mode 0x7f failed.
Mon Mar 06 16:26:23 2017
kjbdomdet send to inst 2
detach from dom 1, sending detach message to inst 2
Mon Mar 06 16:26:23 2017
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 8)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25833.trc  (incident=65537):
ORA-15335: ASM metadata corruption detected in disk group 'DATADG'
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0000" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483648] [0] [0 != 1]
Incident details in:/u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_65537/+ASM1_arb0_25833_i65537.trc
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE 
 3189 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x31584f6b (DATADG)
ERROR: ORA-15130 thrown in RBAL for group number 1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_3468.trc:
ORA-15130: diskgroup "DATADG" is being dismounted
Mon Mar 06 16:26:23 2017
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0x31584F6B (DATADG) 

---后续mount报错
SQL> ALTER DISKGROUP DATADG MOUNT  /* asm agent *//* {1:18003:2} */ 
NOTE: cache registered group DATADG number=1 incarn=0xb368408f
NOTE: cache began mount (first) of group DATADG number=1 incarn=0xb368408f
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
WARNING:GMON has insufficient disks to maintain consensus.
Minimum required is 2:updating 1 PST copies from a total of 2.
ERROR: GMON failed to obtain a quorum ofsupporting disks in group 1
NOTE: cache dismounting (clean) group 1/0xB368408F (DATADG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 27651, image: oracle@DBN01 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xB368408F (DATADG) 
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xb368408f
NOTE: cache deleting context for group DATADG 1/0xb368408f
GMON dismounting group 1 at 12 for pid 30, osid 27651
NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATADG" cannot be mounted
ORA-15315: Write errors in disk group DATADG could lead to inconsistent ASM metadata.
ERROR: ALTER DISKGROUP DATADG MOUNT  /* asm agent *//* {1:18003:2} */

从这里我们可以看出来,前几次加asm disk 由于各种原因都失败了,最后一次通过加force关键字,使得被自动drop的disk重新强制加到datadg里面.可悲的是在加入成功之后,开始做rebalance的时候,发现disk 0出现坏块,从而引起ORA-15196的错误,使得rebalance无法进行下去,进而整个asm 磁盘组datadg自动dismount.后面再次尝试mount datadg的时候,直接提示元数据库不一致,因为disk 0 的磁盘头已经异常.

通过kfed分析disk 0信息
这里是通过dd命令备份的磁盘头到win进行分析的,以前正常的disk 0的磁盘头损坏(全0)
asm-kfed


对于这个故障已经比较清楚,恢复思路也基本上确定:依次递进
方案1:通过kfed修改文件头,然后尝试mount磁盘头手工修复ASM DISK HEADER 异常
方案2:直接通过amdu,dul之类的工具拷贝出来数据文件找回ASM中数据文件
方案3:通过底层au重组出来数据文件asm disk header 彻底损坏恢复
在我们的实际恢复中运气比较好,通过方案1就完成了恢复工作,通过kfed修复磁盘头之后,然后报错如下

SQL> alter diskgroup DATADG mount 
NOTE: cache registered group DATADG number=1 incarn=0x5134d0d4
NOTE: cache began mount (first) of group DATADG number=1 incarn=0x5134d0d4
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: Assigning number (1,0) to disk (/dev/asm-diskdata01)
Tue Mar 07 19:03:40 2017
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 27 for pid 28, osid 13837
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 28 for pid 28, osid 13837
NOTE: cache closing disk 1 of grp 1: (not open) 
NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diskdata01
NOTE: F1X0 found on disk 0 au 2 fcn 0.178802
NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
NOTE: cache mounting (first) normal redundancy group 1/0x5134D0D4 (DATADG)
Tue Mar 07 19:03:40 2017
* allocate domain 1, invalid = TRUE 
kjbdomatt send to inst 2
Tue Mar 07 19:03:40 2017
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=19.2851 group=1 (DATADG)
NOTE: starting recovery of thread=2 ckpt=13.5327 group=1 (DATADG)
NOTE: advancing ckpt for group 1 (DATADG) thread=2 ckpt=13.5327
NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=19.2852
NOTE: cache recovered group 1 to fcn 0.365868
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Tue Mar 07 19:03:40 2017
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 19.2851
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.365868 ABA 20.2852
NOTE: cache mounting group 1/0x5134D0D4 (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x5134d0d4
Tue Mar 07 19:03:40 2017
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATADG was mounted
SUCCESS: alter diskgroup DATADG mount
Tue Mar 07 19:03:40 2017
NOTE: diskgroup resource ora.DATADG.dg is online
Tue Mar 07 19:03:41 2017
ASM Health Checker found 1 new failures
NOTE: ASM did background COD recovery for group 1/0x5134d0d4 (DATADG)
NOTE: starting rebalance of group 1/0x5134d0d4 (DATADG) at power 1
Starting background process ARB0
Tue Mar 07 19:03:42 2017
ARB0 started with pid=30, OS id=13905 
NOTE: assigning ARB0 to group 1/0x5134d0d4 (DATADG) with 1 parallel I/O
cellip.ora not found.
WARNING: cache read  a corrupt block: group=1(DATADG) dsk=0 blk=0 disk=0 
         (DATADG_0000) incarn=2202280062 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
NOTE: a corrupted block from group DATADG was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc
WARNING:cache read (retry)a corrupt block:group=1(DATADG) dsk=0 blk=0 disk=0
        (DATADG_0000)incarn=2202280062 au=0 blk=0 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ERROR: cache failed to read group=1(DATADG) dsk=0 blk=0 from disk(s): 0(DATADG_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
Tue Mar 07 19:03:52 2017
NOTE: client oradb1:oradb registered, osid 13989, mbr 0x1
NOTE: cache initiating offline of disk 0 group DATADG
NOTE:process _arb0_+asm1 (13905) initiating offline of disk 0.2202280062(DATADG_0000)with mask 0x7e in group 1
NOTE: checking PST: grp = 1
Tue Mar 07 19:03:52 2017
GMON checking disk modes for group 1 at 30 for pid 30, osid 13905
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: too many offline disks in PST (grp 1)
NOTE: checking PST for grp 1 done.
NOTE: initiating PST update: grp = 1, dsk = 0/0x8344207e, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 31 for pid 30, osid 13905
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATADG
ERROR: Disk 0 cannot be offlined, since all the disks [0, 1] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
Tue Mar 07 19:03:52 2017
NOTE: cache dismounting (not clean) group 1/0x5134D0D4 (DATADG) 
WARNING: Offline for disk DATADG_0000 in mode 0x7f failed.
Tue Mar 07 19:03:52 2017
NOTE: halting all I/Os to diskgroup 1 (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 14002, image: oracle@DBN01 (B000)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_13905.trc  (incident=76402):
ORA-15335: ASM metadata corruption detected in disk group 'DATADG'
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0000" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [blk_kfbl] [2147483648] [0] [1022 != 0]
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_76402/+ASM1_arb0_13905_i76402.trc
Tue Mar 07 19:03:52 2017
NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
NOTE: LGWR sync ABA=20.2857 last written ABA 20.2857

这里比较比较幸运,datadg已经mount成功了,但是由于rab依旧读取到disk header异常信息(没有完全修复成功,而且在日志中不光这个block异常,还有其他block异常,因此不考虑进一步修复),因此直接通过屏蔽asm的acd和cod实现该磁盘组mount,而且不会dismount。

SQL> alter diskgroup DATADG mount 
NOTE: cache registered group DATADG number=1 incarn=0x9c94d0eb
NOTE: cache began mount (first) of group DATADG number=1 incarn=0x9c94d0eb
NOTE: Assigning number (1,2) to disk (/dev/asm-diskdata02)
NOTE: Assigning number (1,0) to disk (/dev/asm-diskdata01)
NOTE: skip COD recovery as part of test at kfrc.c:1639 
NOTE: skip COD recovery as part of test at kfrc.c:1639 
Tue Mar 07 19:12:45 2017
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 75 for pid 28, osid 15615
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 76 for pid 28, osid 15615
NOTE: cache closing disk 1 of grp 1: (not open) 
NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diskdata01
NOTE: F1X0 found on disk 0 au 2 fcn 0.178802
NOTE: cache opening disk 2 of grp 1: DATA_0001 path:/dev/asm-diskdata02
NOTE: cache mounting (first) normal redundancy group 1/0x9C94D0EB (DATADG)
Tue Mar 07 19:12:45 2017
* allocate domain 1, invalid = TRUE 
kjbdomatt send to inst 2
Tue Mar 07 19:12:45 2017
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=25.2870 group=1 (DATADG)
NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=25.2873
NOTE: cache recovered group 1 to fcn 0.365897
NOTE: redo buffer size is 512 blocks (2101760 bytes)
Tue Mar 07 19:12:45 2017
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 25.2872
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.365897 ABA 26.2873
NOTE: cache mounting group 1/0x9C94D0EB (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x9c94d0eb
Tue Mar 07 19:12:45 2017
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATADG was mounted
SUCCESS: alter diskgroup DATADG mount
Tue Mar 07 19:12:45 2017
NOTE: diskgroup resource ora.DATADG.dg is online
NOTE: skip COD recovery as part of test at kfrc.c:1639 
NOTE: skip COD recovery as part of test at kfrc.c:1639 
NOTE: skip COD recovery as part of test at kfrc.c:1639 
NOTE: skip COD recovery as part of test at kfrc.c:1639 

asm的问题解决后,然后登录数据库,发现运气比较好,两个数据库正常open成功,而且alert日志无任何报错,直接通过rman备份出来数据,重建asm磁盘组,还原数据,恢复完成,而且实现数据0丢失。

KFED-00322 [kfbtTraverseBlock][Invalid OSM block type]

在oracle asm的使用过程中由于操作系统层面的错误操作导致asm disk 被破坏,这里列举了几种破坏之后的kfed报错现象(KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type])
asm mount 磁盘组报错(ORA-15040 ORA-15042)

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "2" is missing from group number "2"

asm alert日志报错(ORA-15335 ORA-15066 ORA-15196等)

ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0002" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483651] [48] [0 != 1]

kfed查看磁盘头报错
文件文件头(不光是disk header的4k,可能是连续的几个au,甚至更多)可能彻底损坏,一般kfed 读取都会看到KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type]之类错误

[oracle@fcomtaep2 disks]$ kfed read ASMRECO03
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FC18D899400 00000000 00000000 00000000 00000000 [................]
  Repeat 27 times
7FC18D8995C0 FEEE0001 0001FFFF FFFF0000 00000FFF [................]
7FC18D8995D0 00000000 00000000 00000000 00000000 [................]
  Repeat 1 times
7FC18D8995F0 00000000 00000000 00000000 AA550000 [..............U.]
7FC18D899600 20494645 54524150 00010000 0000005C [EFI PART....\...] <==== **** Here ******
7FC18D899610 BD82BBB3 00000000 00000001 00000000 [................]
7FC18D899620 0FFFFFFF 00000000 00000022 00000000 [........".......]
7FC18D899630 0FFFFFDE 00000000 FD8857E5 42D7B49B [.........W.....B]
7FC18D899640 0901FA87 6B3DB5AA 00000002 00000000 [......=k........]
7FC18D899650 00000080 00000080 FE48EB77 00000000 [........w.H.....]
7FC18D899660 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
7FC18D899800 EBD0A0A2 4433B9E5 B668C087 C79926B7 [......3D..h..&..]
7FC18D899810 5381F6DF 4626F988 0E4F468D D78D3B28 [...S..&F.FO.(;..]
7FC18D899820 000007A1 00000000 0FFFF85F 00000000 [........_.......]
7FC18D899830 00000000 00000000 00720070 006D0069 [........p.r.i.m.]
7FC18D899840 00720061 00000079 00000000 00000000 [a.r.y...........]
7FC18D899850 00000000 00000000 00000000 00000000 [................]
 Repeat 186 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“EFI PART”是分区的元数据,一般是被分区导致asm disk损坏.

[ebernal@dbaasm new2]$ kfed read emcpowerl | head -25
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
2ABD671E9400 00000000 00000000 00000000 00000000 [................]
  Repeat 31 times
2ABD671E9600 4542414C 454E4F4C 00000001 00000000 [LABELONE........]
2ABD671E9610 E4E1DDB1 00000020 324D564C 31303020 [.... ...LVM2 001] <==== **** Here ******
2ABD671E9620 50365A77 71327874 34303156 4B4E6136 [wZ6Ptx2qV1046aNK]
2ABD671E9630 35395159 5147634C 487A5A38 63575A37 [YQ95LcGQ8ZzH7ZWc]
2ABD671E9640 00000000 00000019 00030000 00000000 [................]
2ABD671E9650 00000000 00000000 00000000 00000000 [................]
2ABD671E9660 00000000 00000000 00001000 00000000 [................]
2ABD671E9670 0002F000 00000000 00000000 00000000 [................]
2ABD671E9680 00000000 00000000 00000000 00000000 [................]
  Repeat 215 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

“LVM2 001” 是逻辑卷的名字,该asm disk很可能被做为lvm管理而被破坏

[ebernal@dbaasm tars]$ kfed read rhdisk16
kfbh.endian:                         65 ; 0x000: 0x41
kfbh.hard:                           73 ; 0x001: 0x49
kfbh.type:                           88 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         32 ; 0x003: 0x20
kfbh.block.blk:              1111709260 ; 0x004: blk=1111709260
kfbh.block.obj:              1634861056 ; 0x008: file=131072
kfbh.check:                         119 ; 0x00c: 0x00000077
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B6FE2AC1400 20584941 4243564C 61720000 00000077  [AIX LVCB..raw...] <==== **** Here ******
2B6FE2AC1410 00000000 00000000 00000000 00000000  [................]
2B6FE2AC1420 00000000 00000000 30300000 38306430  [..........000d08]
2B6FE2AC1430 30306131 34643030 30303030 31303030  [1a0000d400000001]
2B6FE2AC1440 61006533 766C6D73 7461645F 00003161  [3e.asmlv_data1..]
2B6FE2AC1450 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
2B6FE2AC1480 54000000 4D206575 20207961 31312037  [...Tue May  7 11]
2B6FE2AC1490 3A33343A 32203633 0A333130 00000000  [:43:36 2013.....]
2B6FE2AC14A0 65755400 79614D20 20372020 343A3131  [.Tue May  7 11:4]
2B6FE2AC14B0 34323A38 31303220 00000A33 44000000  [8:24 2013......D]
2B6FE2AC14C0 41313830 30303444 6D6D7900 02007900  [081AD400.ymm.y..]
2B6FE2AC14D0 0100E40C 656E6F4E 00000000 00000000  [....None........]
2B6FE2AC14E0 00000000 00000000 00000000 00000000  [................]
        Repeat 14 times
2B6FE2AC15D0 00000000 00000000 65310000 61653934  [..........1e49ea]
2B6FE2AC15E0 342E3862 00000000 00000000 00000000  [b8.4............]
2B6FE2AC15F0 00000000 00000000 00000000 00000000  [................]
  Repeat 224 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的“AIX LVCB..raw” 是AIX OS volume 的元数据库,也就是说,asm disk 被作为了aix os层面破坏

[oracle@dbep2 disks]$ kfed read asm-disk3
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
06000000 00000000 00000000 00000000 00000000 [................]
  Repeat 25 times
0602100 51e2b7f6 00ed4e00 00000000 00000001  [...Q.N..........]
0602120 00000000 0000000b 00000100 0000003c [............<...]
0602140 00000242 0000007b 5d8468e7 6147782a [B...{....h.]*xGa]
0602160 d17851a2 327552e2 00000000 00000000 [.Qx..Ru2........]
0602200 00000000 00000000 3130752f 91a4f000 [......../u01....] <==== **** Here ******
0602220 ff8808e4 d5104cff 000000ac 00000100 [.....L..........]
0602240 00000000 00000000 00000000 09d18000 [................]
  Repeat 254 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][88]

这里的/u01很可能表明该asm disk被文件系统覆盖

对于asm disk的各种破坏情况,如果是normal/high冗余,那么asm dg没有问题,可以考虑通过删除异常盘,然后重新加入;如果是外部冗余遭遇到asm disk 被破坏,一般asm disk 会dismount,而且无法正常mount,如果有备份的磁盘头,可以尝试还原磁盘头,mount 磁盘组,然后只读方式迁移数据;如果没有备份磁盘头或者还原之后也无法mount,可能需要通过一些额外的方式处理比如通过工具在asm dismount状态下恢复数据文件,甚至通过对asm block/oracle block碎片重组的方式恢复数据.参考相关文章:
oracle asm系列文章汇总
pvid=yes导致asm无法mount
asm disk header 彻底损坏恢复
分区无法识别导致asm diskgroup无法mount
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
asm disk误设置pvid导致asm diskgroup无法mount恢复
分享oracleasm createdisk重新创建asm disk后数据0丢失恢复案例
ORA-15042: ASM disk “N” is missing from group number “M” 故障恢复

Oracle比特币病毒—由于系统感染病毒

最近Oracle数据库被黑勒索比特币的事件一起又一起,相比是最近黑客们是不是都想着弄点钱回家好过春节.前段时间分析过关于由于绿色版plsql dev的afterconnect.sql被人注入了恶意的脚本的事件具体见:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案,对该种故障进行了分析,也提供了预防和解决方案.今天对于另一起比较常见比较常见的黑文件系统,勒索比特币的案例进行了分析,并且成功恢复出来数据.
被黑后现象
How to restore files


数据文件被加密
20161126191331
bbed分析文件

C:\Users\XIFENFEI>bbed password=blockedit

BBED: Release 2.0.0.0.0 - Limited Production on Sat Nov 26 19:15:00 2016

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> set filename 'D:\TEMP\BAK\GCwqPFVL2zzGGBCo.orgasm'
        FILENAME        D:\TEMP\BAK\GCwqPFVL2zzGGBCo.orgasm

BBED> set blocksize 8192
        BLOCKSIZE       8192


BBED> d
 File: D:\TEMP\BAK\GCwqPFVL2zzGGBCo.orgasm (0)
 Block: 1                Offsets:    0 to  511           Dba:0x00000000
------------------------------------------------------------------------
 f993499a db22faf6 c4ee1bde efbedec0 d7c0e2a0 5408896f e3f88e2f c3179373
 0de737db 595477d5 a12c64ae 08341d67 ff93dd1f ef04568b 62f46c03 315605db
 25affd5c 9be11af8 4eff4028 26081235 8fa1f832 6e02ab4a 6295a24a 4021a418
 a4a5c934 d4ecea4d f614f413 04869513 dec22a31 43fd7a5b 51a8b29b 656e1963
 955f540a 892dff97 6a681024 c5a57176 c6168175 28e2fea6 2439fd0c cf3c8c99
 487e909d 0951aded 8ba53a61 542b1f9a cb3c5d8b 21d25c82 fbb31554 20977f8b
 f46ba6d0 5e47328a d8c4d634 97bfeefa 4b632c38 a35a2f19 06fa9be2 b1984eab
 d9a30fe2 67513bc1 657897b2 f12a2e90 acee0bd2 8bf8af82 cfc79417 f8afd2c6
 6bab98cd 16e9ddc5 40e1705f b5763002 505b8964 805c6361 a7a84980 b7826019
 a4dafb89 0b0f27c0 fa19924e c3956a43 7815f01c e9eae471 c23e52bd 4b76556c
 40895e77 c1fd70c0 7dff4132 4189788e 895590b5 57469886 90ebc360 796e426c
 913026ed b4ea7026 41f18de1 e174a0c8 7a325eb9 edb1b296 dbfc1948 25f7af82
 df999d8b c8106e84 134531ea 5f5c6461 8fd4fc3e b04be820 4c838a2f 045d818d
 8eeace89 ac1ee884 b8d56032 9fc580c4 e288fd13 6f80f6d3 f176df8b 47645955
 738a9bbd 1c0ce841 395d3d84 a071937a 74991167 41a5bdf3 970a283b 4c6ecd25
 c1f0b4da 9115574b 7cdf44c6 4f799113 a6edca86 f62a20fb 354e433b 4f2250f5

 <32 bytes per line>

BBED> set block 2
        BLOCK#          2

BBED> d
 File: D:\TEMP\BAK\GCwqPFVL2zzGGBCo.orgasm (0)
 Block: 2                Offsets:    0 to  511           Dba:0x00000000
------------------------------------------------------------------------
 f19a5ede 35f9e2ed d51d9b35 ef28cfe0 9deb6021 e018bf6f 2d84be28 779ee63a
 cf3172ff 49f3b959 ef92728b c0a7129a 7335afb8 a1e0fe5c 7fc5b213 8a2afc78
 78ddd22e 9ff63c2e 6432a073 adad2138 b18ee56d fab16ba3 688968f5 b53d03ca
 1f1d80d3 a87bb038 38c84b6a 74f253bc 55efc8f9 e2c1d194 26803ccc 575300b2
 025eff5c 824bca6b 440e3cc3 2f48a704 b3db6db8 bf48903e 04bf7efa 019f0986
 264806ab a8a93048 1f2d7b4e b29a923b 61a701d0 d6783f69 027f06db f4d16fba
 4b9bbd68 3a32fa66 e9e18a4c 7332a908 7e9fab7e cc8810dc 438d7157 397467b1
 d8d0e972 4b892411 bfc1bab2 6e247b4c ad2b05a8 be799d07 d1226408 0ff00bc1
 3943c5aa 63182479 6c84e5db ab213221 736af62b bb9dc047 d4a28e40 8451119c
 6db794cc 5df39d30 592d7656 0a76048d 9b5d3b3d 7d85ccb8 796c5809 ae1122e1
 73006061 d22d0dfe 9a7b839a a5c32d6a cd8ad956 5e2a8013 280fc444 9807b477
 3eda5bca 0f6a2958 e0334dfc 52c23a95 fa2cfaff a86b1456 c74a0cd9 eec77fe6
 96261513 0044e3ef ef843e13 004a9ef2 ef01d670 6c988cb7 df0dea99 58ff78ac
 aa5783e8 b6e2b89d da953d7b 3b4ea7fa 8352d388 eb6a8d76 9b9525a0 f34d444c
 83d60651 6ff7f287 bd9f8bcf 5dae9592 32db539d d0c14939 0ab4e403 ff537cfa
 9657a1be 3e5aa43d 6fdf56fd 90dbd567 b7fe4aeb d3226a29 075da375 7c3d7581

 <32 bytes per line>

通过我们的一系列分析,对于这种数据文件里面的数据已经被加密,直接使用他们肯定是无法恢复出来其中的数据,我们通过一系列的分析,已经成功找出来他们其中的规律,对这种故障实现完美恢复,如果有这个方面的恢复请求,可以随时联系我们
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

CON$ ORA-600 kdsgrp1

数据库报ORA 600 kdsgrp1错误
数据库报ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []错

Thread 1 advanced to log sequence 23861 (LGWR switch)
  Current log# 7 seq# 23861 mem# 0: /oradata/easdb/redo07.log
Tue Nov 15 10:00:42 2016
Errors in file /u01/oracle/diag/rdbms/easdb/easdb/trace/easdb_dw00_3165.trc  (incident=908262):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/easdb/easdb/incident/incdir_908262/easdb_dw00_3165_i908262.trc
Tue Nov 15 10:00:55 2016
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Nov 15 10:00:56 2016
Errors in file /u01/oracle/diag/rdbms/easdb/easdb/trace/easdb_dw00_3165.trc  (incident=908263):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1751
ORA-06512: at line 2
Incident details in: /u01/oracle/diag/rdbms/easdb/easdb/incident/incdir_908263/easdb_dw00_3165_i908263.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
DW00 terminating with fatal err=600, pid=40, wid=1, job SYSTEM.
Tue Nov 15 10:01:01 2016
Thread 1 advanced to log sequence 23862 (LGWR switch)
  Current log# 2 seq# 23862 mem# 0: /oradata/easdb/redo02.log
Tue Nov 15 10:01:23 2016
Errors in file /u01/oracle/diag/rdbms/easdb/easdb/trace/easdb_dm00_3163.trc  (incident=908254):
ORA-31671: Worker process DW00 had an unhandled exception.
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: at "SYS.KUPW$WORKER", line 1751
ORA-06512: at line 2
Incident details in: /u01/oracle/diag/rdbms/easdb/easdb/incident/incdir_908254/easdb_dm00_3163_i908254.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Nov 15 10:01:26 2016
Tue Nov 15 10:01:28 2016
Thread 1 advanced to log sequence 23863 (LGWR switch)
  Current log# 4 seq# 23863 mem# 0: /oradata/easdb/redo04.log

trace文件中信息

*** 2016-11-15 10:00:35.977
* kdsgrp1-1: *************************************************
            row 0x004459e6.26 continuation at
            0x004459e6.26 file# 1 block# 285158 slot 38 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 38 ..... nrows: 208
kdsgrp - dump CR block dba=0x004459e6
Block header dump:  0x004459e6
 Object id on Block? Y
 seg/obj: 0x1c  csc: 0x01.c712f743  itc: 3  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000b.015.0036d715  0x00c01bba.0fbd.02  C---    0  scn 0x0001.c6b4cb1a
0x02   0x000c.004.00044d36  0x04c0dd93.3eec.33  C---    0  scn 0x0001.c6d2c65b
0x03   0x000d.008.00008eb9  0x04c0777a.10e3.02  --U-    2  fsc 0x0056.c7346f21

确定报错对象和确认异常

SQL> select object_name from dba_objects where object_id=28;

OBJECT_NAME
---------------------------------------------------------
CON$

SQL> ANALYZE TABLE sys.CON$ VALIDATE STRUCTURE CASCADE online;
ANALYZE TABLE sys.CON$ VALIDATE STRUCTURE CASCADE online
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file


SQL> SET LINES 122
SQL> COL INDEX_OWNER FOR A20
SQL> COL INDEX_NAME FOR A30
SQL> COL TABLE_OWNER FOR A20
SQL> COL COLUMN_NAME FOR A25
SQL> SELECT TABLE_OWNER,INDEX_NAME,COLUMN_NAME,COLUMN_POSITION
2  FROM Dba_Ind_Columns
3  WHERE table_name = upper('&TABLE_NAME') order by TABLE_OWNER,INDEX_OWNER,INDEX_NAME,COLUMN_POSITION;
Enter value for table_name: CON$
old   3:  WHERE table_name = upper('&TABLE_NAME') order by TABLE_OWNER,INDEX_OWNER,INDEX_NAME,COLUMN_POSITION
new   3:  WHERE table_name = upper('CON$') order by TABLE_OWNER,INDEX_OWNER,INDEX_NAME,COLUMN_POSITION

TABLE_OWNER	     INDEX_NAME 		    COLUMN_NAME 	      COLUMN_POSITION
-------------------- ------------------------------ ------------------------- ---------------
SYS		     I_CON1			    OWNER#				    1
SYS		     I_CON1			    NAME				    2
SYS		     I_CON2			    CON#				    1


SQL> select owner#,name from con$
2    minus
3   select /*+ full(t) */owner#,name from con$ t;

no rows selected

SQL> select /*+ full(t) */owner#,name from con$ t
2    minus
3   select owner#,name from con$  ;

no rows selected

SQL> select /*+ full(t) */ con# from con$ t
2    minus
3   select con# from con$ ;

no rows selected

SQL> select con# from con$
2    minus
3   select /*+ full(t) */ con# from con$ t   ;

      CON#
----------
   1037224
   1037225
   1037386
   1037387
   1037388
   ……
   1037846

62 rows selected.

通过上述分析,可以确定是由于CON$和I_CON2数据不一致,而且是index的数据比表中多了62条.针对这样情况,考虑通过重建index来解决.

尝试rebuild index

SQL> alter index I_CON2 rebuild online;
alter index I_CON2 rebuild online
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered


SQL>   
SQL> 
SQL> 
SQL> 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup upgrade
ORACLE instance started.

Total System Global Area 2421825536 bytes
Fixed Size                  2215744 bytes
Variable Size            1828716736 bytes
Database Buffers          570425344 bytes
Redo Buffers               20467712 bytes
Database mounted.
Database opened.
SQL> alter index I_CON2 rebuild;                  
alter index I_CON2 rebuild
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered

因为是数据库核心index,无法直接rebuild解决,只能通过bootstrap$核心index(I_OBJ1,I_USER1,I_FILE#_BLOCK#,I_IND1,I_TS#,I_CDEF1等)异常恢复—ORA-00701错误解决 方式解决