ORA-600 kffmLoad_1 kffmVerify_4

Contact 惜分飞: mobile/WeChat +86 13429648788, QQ 107644445

Title: ORA-600 kffmLoad_1 kffmVerify_4

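Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal action.]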

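A friend's ASM instance would start throwing errors after running for a while, which in turn caused the database instance to fail.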

Wed Dec 23 08:31:55 2020
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_6729.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [4365], [1], [], [], [], [], []
Wed Dec 23 08:31:55 2020
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_6729.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [4365], [1], [], [], [], [], []

Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_29743.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [670], [1], [], [], [], [], []
Wed Dec 23 09:10:22 2020
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_asmb_29743.trc:
ORA-00600: internal error code, arguments: [kffmLoad_1], [670], [1], [], [], [], [], []
Wed Dec 23 09:10:22 2020

Wed Dec 23 10:18:33 2020
Errors in file /u01/app/oracle/admin/+ASM/udump/+asm1_ora_25890.trc:
ORA-00600: internal error code, arguments: [kffmVerify_4], [0], [0], [887], [1005986561], [1352], [1], [0]

The corresponding trace file:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db
System name:	Linux
Node name:	shb01
Release:	2.6.18-348.el5
Version:	#1 SMP Wed Nov 28 21:22:00 EST 2012
Machine:	x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 29
Unix process pid: 26337, image: oracle@xff01 (TNS V1-V3)

*** ACTION NAME:() 2020-12-22 19:03:41.272
*** MODULE NAME:(sp_ocap@xff01 (TNS V1-V3)) 2020-12-22 19:03:41.272
*** SERVICE NAME:() 2020-12-22 19:03:41.272
*** SESSION ID:(143.1) 2020-12-22 19:03:41.272
*** 2020-12-22 19:03:41.272
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kffmVerify_4], [0], [0], [1657], [1005987045], [152], [1], [0]
Current SQL statement for this session:
DECLARE 
fileType varchar2(16); 
fileName varchar2(1024); 
blkSz number; 
fileSz number; 
hdl number; 
plksz number;
BEGIN
fileName := '+DATA4/xifenfei/onlinelog/group_6.1657.1005987045'; 
BEGIN
dbms_diskgroup.getfileattr(fileName,fileType,fileSz, blkSz); 
dbms_diskgroup.open(fileName,'r',fileType,blkSz,hdl,plkSz,fileSz); 
EXCEPTION
WHEN OTHERS then
  :rc := SQLCODE;
  :err_msg := SQLERRM;
  return;
END;
:handle := hdl; 
:bsz := blkSz; 
:bcnt := fileSz; 
:rc := 0;
END;
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x15ce59360        96  package body SYS.X$DBMS_DISKGROUP
0x15cd88568        12  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFFBDFC3450 ? 7FFFBDFC34B0 ?
                                                   7FFFBDFC33F0 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0068996E0 ? 009AA2670 ?
                                                   7FFFBDFC34B0 ? 7FFFBDFC33F0 ?
                                                   000000000 ? 000000000 ?
kffmVerify()+379     call     kgeasnmierr()        0068996E0 ? 009AA2670 ?
                                                   7FFFBDFC34B0 ? 7FFFBDFC33F0 ?
                                                   000000000 ? 000000000 ?
kfioIdentify()+1276  call     kffmVerify()         000000000 ? 00000000D ?
                                                   000000001 ?
                                                   927B814400000004 ?
                                                   3BF624E500000679 ?
                                                   000000000 ?
ksfd_osmopn()+1138   call     kfioIdentify()       7FFFBDFC4820 ? 15DB873F4 ?
                                                   15DB87556 ? 000000200 ?
                                                   7FFF00000003 ? 15DB873C8 ?
ksfdopn()+1014       call     ksfd_osmopn()        7FFFBDFC4820 ? 00000002D ?
                                                   000000200 ? 000000003 ?
                                                   2B3800020000 ? 15F3031F0 ?
kfpkgDGOpenFile()+2  call     ksfdopn()            7FFFBDFC4820 ? 00000002D ?
301                                                000000200 ? 000000003 ?
                                                   000020000 ? 15F3031F0 ?
pevm_icd_call_commo  call     kfpkgDGOpenFile()    2B383F459FA8 ? 00000002D ?
n()+1003                                           2B383F439070 ? 000000003 ?
                                                   000020000 ? 15F3031F0 ?
pfrinstr_ICAL()+228  call     pevm_icd_call_commo  7FFFBDFC5700 ? 000000000 ?
                              n()                  000000001 ? 000000001 ?
                                                   000000007 ? 7FFF00000000 ?
pfrrun_no_tool()+65  call     pfrinstr_ICAL()      2B383F459FA8 ? 005DBD8AA ?
                                                   2B383F45A010 ? 000000001 ?
                                                   000000007 ? 7FFF00000000 ?
pfrrun()+906         call     pfrrun_no_tool()     2B383F459FA8 ? 005DBD8AA ?
                                                   2B383F45A010 ? 000000001 ?
                                                   000000007 ? 7FFF00000000 ?
plsql_run()+841      call     pfrrun()             2B383F459FA8 ? 000000000 ?
                                                   2B383F45A010 ? 7FFFBDFC5700 ?
                                                   000000007 ? 15CD77BD6 ?
peicnt()+298         call     plsql_run()          2B383F459FA8 ? 000000001 ?
                                                   000000000 ? 7FFFBDFC5700 ?
                                                   000000007 ? 900000000 ?
kkxexe()+503         call     peicnt()             7FFFBDFC5700 ? 2B383F459FA8 ?
                                                   2B383F438830 ? 7FFFBDFC5700 ?
                                                   2B383F4367D8 ? 900000000 ?
opiexe()+4691        call     kkxexe()             2B383F4561D8 ? 2B383F459FA8 ?
                                                   2B383F438830 ? 15C160BD8 ?
                                                   0040D677F ? 900000000 ?
kpoal8()+2273        call     opiexe()             000000049 ? 000000003 ?
                                                   7FFFBDFC6950 ? 000000001 ?
                                                   0040D677F ? 900000000 ?
opiodr()+984         call     kpoal8()             00000005E ? 000000017 ?
                                                   7FFFBDFC9830 ? 000000001 ?
                                                   000000001 ? 900000000 ?
ttcpip()+1012        call     opiodr()             00000005E ? 000000017 ?
                                                   7FFFBDFC9830 ? 000000000 ?
                                                   0059C35D0 ? 900000000 ?
opitsk()+1322        call     ttcpip()             0068A13B0 ? 7FFFBDFC75A0 ?
                                                   7FFFBDFC9830 ? 000000000 ?
                                                   7FFFBDFC9328 ? 7FFFBDFC9998 ?
opiino()+1026        call     opitsk()             000000003 ? 000000000 ?
                                                   7FFFBDFC9830 ? 000000001 ?
                                                   000000000 ? 4E6111C00000001 ?
opiodr()+984         call     opiino()             00000003C ? 000000004 ?
                                                   7FFFBDFCA9F8 ? 000000001 ?
                                                   000000000 ? 4E6111C00000001 ?
opidrv()+547         call     opiodr()             00000003C ? 000000004 ?
                                                   7FFFBDFCA9F8 ? 000000000 ?
                                                   0059C3080 ? 4E6111C00000001 ?
sou2o()+114          call     opidrv()             00000003C ? 000000004 ?
                                                   7FFFBDFCA9F8 ? 000000000 ?
                                                   0059C3080 ? 4E6111C00000001 ?
opimai_real()+163    call     sou2o()              7FFFBDFCA9D0 ? 00000003C ?
                                                   000000004 ? 7FFFBDFCA9F8 ?
                                                   0059C3080 ? 4E6111C00000001 ?
main()+116           call     opimai_real()        000000002 ? 7FFFBDFCAA60 ?
                                                   000000004 ? 7FFFBDFCA9F8 ?
                                                   0059C3080 ? 4E6111C00000001 ?
__libc_start_main()  call     main()               000000002 ? 7FFFBDFCAA60 ?
+244                                               000000004 ? 7FFFBDFCA9F8 ?
                                                   0059C3080 ? 4E6111C00000001 ?
_start()+41          call     __libc_start_main()  0007230B8 ? 000000002 ?
                                                   7FFFBDFCABB8 ? 000000000 ?
                                                   0059C3080 ? 000000002 ?
 
--------------------- Binary Stack Dump ---------------------

According to the MOS note ORA-600[KFFMVERIFY_4] OR ORA-600 [kffmLoad_1], [131635] REPORTED ON THE ASMINSTANCE (Doc ID 794103.1), the following bugs can be triggered when multiple processes/threads use dbms_diskgroup to access different disk groups:
BUG:6377738 – ASMB ORA-00600 [KFFMVERIFY_4]
BUG:8328467 – ASM CRASHED WITH ORA-600[KFFMVERIFY_4] OR [KFFMVERIFY_4] AND [KFFMLOAD_1]
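These bugs crash the ASM instance, which in turn brings down the database. In this customer's environment we confirmed that several SharePlex processes were replicating data and that the redo logs were spread across multiple disk groups, which is what triggered the problem. The workaround is to put all redo logs and archived logs in a single disk group, so that the SharePlex processes calling dbms_diskgroup to read redo/archived logs no longer hit the bug.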
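In the trace, the ORA-600 [kffmVerify_4] arguments [1657] and [1005987045] match the file number and incarnation in the name being opened, +DATA4/xifenfei/onlinelog/group_6.1657.1005987045. To see whether a system has the layout that the workaround removes, one can parse the redo and archived-log names and count the disk groups they span. The Python sketch below assumes the usual ASM fully qualified name format (+group/db/type/tag.file.incarnation); the helper names are hypothetical, and in practice the name list would come from something like v$logfile.member and v$archived_log.name.

```python
import re

# Fully qualified ASM file names have the form
#   +diskgroup/db_name/file_type/tag.file_number.incarnation
# (archived logs add a date directory between file_type and the tag).
ASM_FQN = re.compile(
    r"^\+(?P<group>[^/]+)/(?P<db>[^/]+)/(?P<ftype>[^/]+)/"
    r"(?P<tag>.+)\.(?P<fnum>\d+)\.(?P<incarn>\d+)$"
)


def parse_asm_name(path):
    """Return (disk group, file type, file number, incarnation), or None."""
    m = ASM_FQN.match(path)
    if m is None:
        return None
    return (m["group"].upper(), m["ftype"].lower(),
            int(m["fnum"]), int(m["incarn"]))


def redo_arch_groups(paths):
    """Return the set of disk groups holding online redo or archived logs."""
    groups = set()
    for path in paths:
        parsed = parse_asm_name(path)
        if parsed is not None and parsed[1] in ("onlinelog", "archivelog"):
            groups.add(parsed[0])
    return groups


if __name__ == "__main__":
    traced = "+DATA4/xifenfei/onlinelog/group_6.1657.1005987045"
    print(parse_asm_name(traced))  # ('DATA4', 'onlinelog', 1657, 1005987045)
    # Hypothetical list: only the first name comes from the trace above.
    names = [
        traced,
        "+DATA5/xifenfei/onlinelog/group_7.1660.1005987051",
        "+ARCH/xifenfei/archivelog/2020_12_23/thread_1_seq_100.300.1060000000",
    ]
    groups = redo_arch_groups(names)
    if len(groups) > 1:
        print("redo/archived logs span several disk groups:", sorted(groups))
```

A result with more than one disk group is the situation the workaround consolidates; a single group means the SharePlex readers all go through the same disk group.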

ORA-00600 kfrHtAdd01

Title: ORA-00600 kfrHtAdd01

After a storage power failure, the disk group could not be mounted and reported ORA-15096: lost disk write detected.

Sun Dec 20 16:56:51 2020
SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=1 incarn=0x0c1a7a4e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x0c1a7a4e
NOTE: Assigning number (1,2) to disk (/dev/mapper/multipath12)
NOTE: Assigning number (1,5) to disk (/dev/mapper/multipath15)
NOTE: Assigning number (1,3) to disk (/dev/mapper/multipath13)
NOTE: Assigning number (1,7) to disk (/dev/mapper/multipath17)
NOTE: Assigning number (1,1) to disk (/dev/mapper/multipath11)
NOTE: Assigning number (1,6) to disk (/dev/mapper/multipath16)
NOTE: Assigning number (1,0) to disk (/dev/mapper/multipath10)
NOTE: Assigning number (1,4) to disk (/dev/mapper/multipath14)
Sun Dec 20 16:56:57 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 19 for pid 32, osid 130347
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x0C1A7A4E (DATA)
Sun Dec 20 16:56:57 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 16:56:58 2020
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply)
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_130347.trc:
ORA-15096: lost disk write detected
Abort recovery for domain 1
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x0C1A7A4E (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 130347, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
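
The two "starting recovery" lines give, per thread, the checkpoint from which ASM crash recovery begins replaying, printed as sequence.block. Thread 2's checkpoint 542.6409 is the same address as kfracdb.aba.seq/aba.blk in the log block that kfrPass2 dumps for error 15096 in the trace further down. A small Python sketch (the helper name is hypothetical) that pulls these checkpoints out of alert log text, for example to compare them across mount attempts:

```python
import re

# Matches e.g. "NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)"
RECOVERY_START = re.compile(r"starting recovery of thread=(\d+) ckpt=(\d+)\.(\d+)")


def recovery_checkpoints(lines):
    """Map each thread to its (sequence, block) recovery checkpoint."""
    checkpoints = {}
    for line in lines:
        m = RECOVERY_START.search(line)
        if m is not None:
            thread, seq, blk = (int(g) for g in m.groups())
            checkpoints[thread] = (seq, blk)
    return checkpoints


if __name__ == "__main__":
    alert = [
        "NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)",
        "NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)",
        "lost disk write detected during recovery (apply)",
    ]
    print(recovery_checkpoints(alert))  # {1: (233, 4189), 2: (542, 6409)}
```

Later lines for the same thread overwrite earlier ones, so feeding a whole alert log yields the checkpoints of the most recent attempt.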

After a series of repairs, the mount failed with the following error:

Sun Dec 20 20:12:35 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 23 for pid 26, osid 67538
Sun Dec 20 20:12:35 2020
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x64848829 (DATA)
Sun Dec 20 20:12:36 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 20:12:36 2020
NOTE: attached to recovery domain 1
NOTE: Fallback recovery: thread 2 read 10751 blocks oldest redo found in ABA 540.6429
NOTE: Fallback recovery: thread 1 read 10751 blocks oldest redo found in ABA 232.4218
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc  (incident=1692689):
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
Incident details in: /grid/app/grid/diag/asm/+asm/+ASM1/incident/incdir_1692689/+ASM1_ora_67538_i1692689.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Dec 20 20:12:39 2020
Sweep [inc][1692689]: completed
Sweep [inc2][1692689]: completed
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc:
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x64848829 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 67538, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x64848829 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x64848829
NOTE: cache deleting context for group DATA 1/0x64848829
GMON dismounting group 1 at 24 for pid 26, osid 67538
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0],
 [38660545], [0], [38687990], [1], [2], [6429], [], []
ERROR: alter diskgroup data mount

Analyzing the trace file:

*** 2020-12-20 20:11:54.956
kfdp_query(DATA): 19 
----- Abridged Call Stack Trace -----
ksedsts()+465<-kfdp_query()+530<-kfdPstSyncPriv()+585<-kfgFinalizeMount()+1630<-kfgscFinalize()+1433<
-kfgForEachKfgsc()+285<-kfgsoFinalize()+135<-kfgFinalize()+398<-kfxdrvMount()+5558<-kfxdrvEntry()
+2207<-opiexe()+20624<-opiosq0()+3932<-kpooprx()+274<-kpoal8()+842<-opiodr()+917<-ttcpip()
+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()
+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253 
----- End of Abridged Call Stack Trace -----
2020-12-20 20:11:55.393106 : Start recovery for domain=1, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply):
last written kfcn: 0.38747593 aba=233.4208 thd=1
kfcn_kfrbcd=0.38747593 flags_kfrbcd=0x001c aba=542.6410 thd=2
CE: (0x0x66edc798) group=1 (DATA) fn=4 blk=1
    hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
    mirror=0
    flags_kfcpba=0x38 copies=3 blockIndex=1 AUindex=0 AUcount=0 loctr fcn=0.0
    copy #0:  disk=6  au=35 flags=01
    copy #1:  disk=0  au=34 flags=01
    copy #2:  disk=4  au=52 flags=01
BH: (0x0x66e10d00) bnum=33 type=COD_RBO state=rcv chgSt=not modifying pageIn=rcvRead
    flags=0x00000000 pinmode=excl lockmode=null bf=0x66020000
    kfbh_kfcbh.fcn_kfbh = 0.38747538 lowAba=0.0 highAba=0.0
    modTime=0
    last kfcbInitSlot return code=null chgCount=0 cpkt lnk is null ralFlags=0x00000000
    PINS:
    (kfcbps) pin=91 get by kfr.c line 7879 mode=excl
             fn=4 blk=1 status=pinned
             flags=0x88000000 flags2=0x00000000
             class=0 type=INVALID stateWanted=rcvRead
             bastCount=1 waitStatus=0x00000000 relocCount=0
             scanBastCount=0 scanBxid=0 scanSkipCode=0
             last released by kfc.c 21183
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
last new 0.0
kfrPass2: dump of current log buffer for error 15096 follows
=======================
OSM metadata block dump:
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            8 ; 0x002: KFBTYP_CHNGDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                   17162 ; 0x004: blk=17162
kfbh.block.obj:                       3 ; 0x008: file=3
kfbh.check:                  4226524538 ; 0x00c: 0xfbeba57a
kfbh.fcn.base:                 38747431 ; 0x010: 0x024f3d27
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfracdb.aba.seq:                    542 ; 0x000: 0x0000021e
kfracdb.aba.blk:                   6409 ; 0x004: 0x00001909
kfracdb.ents:                         1 ; 0x008: 0x0001
kfracdb.ub2spare:                     0 ; 0x00a: 0x0000
kfracdb.lge[0].valid:                 1 ; 0x00c: V=1 B=0 M=0
kfracdb.lge[0].chgCount:              1 ; 0x00d: 0x01
kfracdb.lge[0].len:                  68 ; 0x00e: 0x0044
kfracdb.lge[0].kfcn.base:      38747432 ; 0x010: 0x024f3d28
kfracdb.lge[0].kfcn.wrap:             0 ; 0x014: 0x00000000
kfracdb.lge[0].bcd[0].kfbl.blk:    1292 ; 0x018: blk=1292
kfracdb.lge[0].bcd[0].kfbl.obj:       1 ; 0x01c: file=1
kfracdb.lge[0].bcd[0].kfcn.base:38743102 ; 0x020: 0x024f2c3e
kfracdb.lge[0].bcd[0].kfcn.wrap:      0 ; 0x024: 0x00000000
kfracdb.lge[0].bcd[0].oplen:          8 ; 0x028: 0x0008
kfracdb.lge[0].bcd[0].blkIndex:      12 ; 0x02a: 0x000c
kfracdb.lge[0].bcd[0].flags:         28 ; 0x02c: F=0 N=0 F=1 L=1 V=1 A=0 C=0
kfracdb.lge[0].bcd[0].opcode:       135 ; 0x02e: 0x0087
kfracdb.lge[0].bcd[0].kfbtyp:         4 ; 0x030: KFBTYP_FILEDIR
kfracdb.lge[0].bcd[0].redund:        19 ; 0x031: SCHE=0x1 NUMB=0x3
kfracdb.lge[0].bcd[0].pad:        63903 ; 0x032: 0xf99f
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.hi:33108586 ; 0x034: HOUR=0xa DAYS=0x13 MNTH=0xc YEAR=0x7e4
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.lo:0 ; 0x038: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfracdb.lge[0].bcd[0].au[0]:     292415 ; 0x03c: 0x0004763f
kfracdb.lge[0].bcd[0].au[1]:     292452 ; 0x040: 0x00047664
kfracdb.lge[0].bcd[0].au[2]:     292474 ; 0x044: 0x0004767a
kfracdb.lge[0].bcd[0].disks[0]:       2 ; 0x048: 0x0002
kfracdb.lge[0].bcd[0].disks[1]:       1 ; 0x04a: 0x0001
kfracdb.lge[0].bcd[0].disks[2]:       0 ; 0x04c: 0x0000
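
Several fields in this dump are packed values, and ASM's own annotation shows how they break down. The Python sketch below reproduces those decodings; the function names are hypothetical, and the bit layout of modts.hi is inferred only from the HOUR/DAYS/MNTH/YEAR annotation in this dump. It also compares the two FCNs printed with the lost-write message: the block read during recovery (fn=4 blk=1) carries fcn 0.38747538, while the redo records a last write at 0.38747593. An on-disk copy older than the last recorded write is consistent with a write that never reached the disk, which is what ORA-15096 reports.

```python
def decode_modts_hi(hi):
    """Split a modts.hi word into (year, month, day, hour).

    Layout inferred from the dump: hour in bits 0-4, day in bits 5-9,
    month in bits 10-13, year from bit 14 upward.
    """
    hour = hi & 0x1F
    day = (hi >> 5) & 0x1F
    month = (hi >> 10) & 0x0F
    year = hi >> 14
    return year, month, day, hour


def decode_redund(value):
    """Split the redund byte into its SCHE (high) and NUMB (low) nibbles."""
    return value >> 4, value & 0x0F


def parse_fcn(text):
    """Parse an FCN printed as 'wrap.base' into a comparable (wrap, base) tuple."""
    wrap, base = text.split(".")
    return int(wrap), int(base)


if __name__ == "__main__":
    print(decode_modts_hi(33108586))   # (2020, 12, 19, 10)
    print(decode_redund(19))           # (1, 3)
    print(hex(38747431))               # 0x24f3d27, the kfbh.fcn.base value
    on_disk = parse_fcn("0.38747538")  # kfbh_kfcbh.fcn_kfbh of fn=4 blk=1
    redo = parse_fcn("0.38747593")     # "last written kfcn" from the redo
    print(on_disk < redo)              # True: the block is older than the redo says
```

NUMB=3 agrees with copies=3 on the CE line and with the three au/disks entries in the change record, which fits reading it as the mirror count for this normal-redundancy file directory entry.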

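The fix was to bypass ASM instance (crash) recovery completely, mount the disk group, and then try to start the database and recover it. If you run into a similar ASM disk group that cannot be mounted and cannot resolve it yourself, please contact us:
Phone/WeChat: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com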

Handling ASM disks named like _DROPPED_0001_DATA

Title: Handling ASM disks named like _DROPPED_0001_DATA

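We found that disks had dropped out of the ASM disk groups of a customer's database (log analysis showed they had been offline since 2016, and no rebalance was running).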

Further checks:

SQL> /

NAME			       PATH		  GROUP_NUMBER DISK_NUMBER MOUNT_STATUS   HEADER_STATUS
------------------------------ --------------------- ------------ ----------- -------------- ------------------------
MODE_STATUS    STATE		FAILGROUP
-------------- ---------------- --------------------
			       ORCL:DATA2	  0		 0 CLOSED	  MEMBER
ONLINE	       NORMAL

			       ORCL:FLASH1	  0		 1 CLOSED	  MEMBER
ONLINE	       NORMAL

			       ORCL:GRID3	  0		 2 CLOSED	  MEMBER
ONLINE	       NORMAL

_DROPPED_0000_FLASH				  2		 0 MISSING	  UNKNOWN
OFFLINE        FORCING		FLASH1

_DROPPED_0001_DATA				  1		 1 MISSING	  UNKNOWN
OFFLINE        FORCING		DATA2

DATA1			       ORCL:DATA1	  1		 0 CACHED	  MEMBER
ONLINE	       NORMAL		DATA1

FLASH2			       ORCL:FLASH2	  2		 1 CACHED	  MEMBER
ONLINE	       NORMAL		FLASH2

GRID1			       ORCL:GRID1	  3		 0 CACHED	  MEMBER
ONLINE	       NORMAL		GRID1

GRID2			       ORCL:GRID2	  3		 1 CACHED	  MEMBER
ONLINE	       NORMAL		GRID2

GRID4			       ORCL:GRID4	  3		 3 CACHED	  MEMBER
ONLINE	       NORMAL		GRID4

GRID5			       ORCL:GRID5	  3		 4 CACHED	  MEMBER
ONLINE	       NORMAL		GRID5

GRID6			       ORCL:GRID6	  3		 5 CACHED	  MEMBER
ONLINE	       NORMAL		GRID6


12 rows selected.


SQL> select NAME,STATE,TYPE,OFFLINE_DISKS from v$asm_diskgroup;

NAME
------------------------------------------------------------
STATE		       TYPE	    OFFLINE_DISKS
---------------------- ------------ -------------
DATA
MOUNTED 	       NORMAL			1

FLASH
MOUNTED 	       NORMAL			1

GRID
MOUNTED 	       NORMAL			0

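The root cause is that disks ORCL:FLASH1 and ORCL:DATA2 had dropped out, leaving the entries _DROPPED_0000_FLASH and _DROPPED_0001_DATA. After checking the underlying storage and confirming that these disks are now healthy, we used the FORCE option to add the dropped disks back into their disk groups: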

SQL> alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force;

Diskgroup altered.

SQL> alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force;

Diskgroup altered.
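The pairing behind these two commands, each FORCING _DROPPED_ entry matched to the CLOSED/MEMBER disk whose ASMLib label equals its failgroup, can be sketched in Python using rows transcribed by hand from the v$asm_disk output above (abbreviated). This is an illustration, not an Oracle tool: the AsmDisk type is hypothetical, and the "label equals failgroup" rule assumes the failgroups were originally left at their default (the disk name).

```python
from typing import NamedTuple


class AsmDisk(NamedTuple):
    # Subset of v$asm_disk columns, copied from the listing above.
    name: str
    path: str
    group_number: int
    mount_status: str
    header_status: str
    state: str
    failgroup: str


ROWS = [
    AsmDisk("", "ORCL:DATA2", 0, "CLOSED", "MEMBER", "NORMAL", ""),
    AsmDisk("", "ORCL:FLASH1", 0, "CLOSED", "MEMBER", "NORMAL", ""),
    AsmDisk("", "ORCL:GRID3", 0, "CLOSED", "MEMBER", "NORMAL", ""),
    AsmDisk("_DROPPED_0000_FLASH", "", 2, "MISSING", "UNKNOWN", "FORCING", "FLASH1"),
    AsmDisk("_DROPPED_0001_DATA", "", 1, "MISSING", "UNKNOWN", "FORCING", "DATA2"),
    AsmDisk("DATA1", "ORCL:DATA1", 1, "CACHED", "MEMBER", "NORMAL", "DATA1"),
    AsmDisk("FLASH2", "ORCL:FLASH2", 2, "CACHED", "MEMBER", "NORMAL", "FLASH2"),
    AsmDisk("GRID1", "ORCL:GRID1", 3, "CACHED", "MEMBER", "NORMAL", "GRID1"),
]


def pair_dropped_disks(disks):
    """Pair each FORCING _DROPPED_ entry with a free disk labelled like its failgroup.

    Assumes ASMLib paths of the form ORCL:<label> and that the original
    failgroup name equals the disk label.
    """
    free = {
        d.path: d
        for d in disks
        if d.group_number == 0 and d.mount_status == "CLOSED" and d.header_status == "MEMBER"
    }
    pairs = []
    for d in disks:
        if d.name.startswith("_DROPPED_") and d.state == "FORCING":
            candidate = free.get("ORCL:" + d.failgroup)
            if candidate is not None:
                pairs.append((d.name, d.group_number, candidate.path))
    return pairs


if __name__ == "__main__":
    for dropped, group, path in pair_dropped_disks(ROWS):
        print(f"{dropped} (group {group}) -> candidate disk {path}")
```

ORCL:GRID3 is also CLOSED/MEMBER, but group 3 (GRID) has no _DROPPED_ entry and reports 0 offline disks, so the sketch leaves it unpaired.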

Monitor the ASM alert log and wait for the rebalance to complete:

Sat Dec 05 16:48:10 2020
SQL> alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (2,2) to disk (ORCL:FLASH1)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk FLASH1
NOTE: requesting all-instance disk validation for group=2
Sat Dec 05 16:48:13 2020
NOTE: skipping rediscovery for group 2/0x58e713e7 (FLASH) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0x58e713e7 (FLASH) on local instance.
Sat Dec 05 16:48:19 2020
GMON updating for reconfiguration, group 2 at 14 for pid 34, osid 12203
NOTE: group 2 PST updated.
NOTE: initiating PST update: grp = 2
GMON updating group 2 at 15 for pid 34, osid 12203
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 2 completed successfully 
NOTE: membership refresh pending for group 2/0x58e713e7 (FLASH)
GMON querying group 2 at 16 for pid 18, osid 41180
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: cache opening disk 2 of grp 2: FLASH1 label:FLASH1
NOTE: Attempting voting file refresh on diskgroup FLASH
NOTE: Refresh completed on diskgroup FLASH. No voting file found.
GMON querying group 2 at 17 for pid 18, osid 41180
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
Sat Dec 05 16:48:25 2020
SUCCESS: refreshed membership for 2/0x58e713e7 (FLASH)
Sat Dec 05 16:48:25 2020
SUCCESS: alter diskgroup FLASH add failgroup flg1 disk 'ORCL:FLASH1'  force
NOTE: starting rebalance of group 2/0x58e713e7 (FLASH) at power 1
Starting background process ARB0
Sat Dec 05 16:48:26 2020
ARB0 started with pid=36, OS id=12451 
NOTE: assigning ARB0 to group 2/0x58e713e7 (FLASH) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 2 relocating from 0:2 to 2:2 for diskgroup 2 (FLASH)
NOTE: Attempting voting file refresh on diskgroup FLASH
NOTE: Refresh completed on diskgroup FLASH. No voting file found.
Sat Dec 05 16:48:45 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group FLASH
Sat Dec 05 16:49:06 2020
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x58e713e7 (FLASH)
Sat Dec 05 16:49:08 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
Sat Dec 05 16:49:11 2020
GMON updating for reconfiguration, group 2 at 18 for pid 36, osid 12681
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: group 2 PST updated.
SUCCESS: grp 2 disk _DROPPED_0000_FLASH going offline 
GMON updating for reconfiguration, group 2 at 19 for pid 36, osid 12681
NOTE: cache closing disk 0 of grp 2: (not open) _DROPPED_0000_FLASH
NOTE: group FLASH: updated PST location: disk 0001 (PST copy 0)
NOTE: group FLASH: updated PST location: disk 0002 (PST copy 1)
NOTE: group 2 PST updated.
NOTE: membership refresh pending for group 2/0x58e713e7 (FLASH)
GMON querying group 2 at 20 for pid 18, osid 41180
GMON querying group 2 at 21 for pid 18, osid 41180
NOTE: Disk _DROPPED_0000_FLASH in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 2/0x58e713e7 (FLASH)
Sat Dec 05 16:51:56 2020
SQL> alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,2) to disk (ORCL:DATA2)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DATA2
NOTE: requesting all-instance disk validation for group=1
Sat Dec 05 16:51:57 2020
NOTE: skipping rediscovery for group 1/0x58d713e6 (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0x58d713e6 (DATA) on local instance.
Sat Dec 05 16:52:02 2020
GMON updating for reconfiguration, group 1 at 22 for pid 34, osid 12203
NOTE: group 1 PST updated.
NOTE: initiating PST update: grp = 1
GMON updating group 1 at 23 for pid 34, osid 12203
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 1 completed successfully 
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 24 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: cache opening disk 2 of grp 1: DATA2 label:DATA2
Sat Dec 05 16:52:08 2020
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
GMON querying group 1 at 25 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
Sat Dec 05 16:52:08 2020
SUCCESS: alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2'  force
NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 1
Starting background process ARB0
Sat Dec 05 16:52:08 2020
ARB0 started with pid=37, OS id=13463 
NOTE: assigning ARB0 to group 1/0x58d713e6 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 16:52:44 2020
cellip.ora not found.
NOTE: F1X0 copy 2 relocating from 1:2 to 2:2 for diskgroup 1 (DATA)
Sat Dec 05 16:53:22 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 27 for pid 18, osid 41180
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
SUCCESS: alter diskgroup data rebalance power 11
NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 11
Starting background process ARB0
Sat Dec 05 17:27:52 2020
ARB0 started with pid=35, OS id=23318 
NOTE: assigning ARB0 to group 1/0x58d713e6 (DATA) with 11 parallel I/Os
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 17:28:29 2020
cellip.ora not found.
Sat Dec 05 17:28:45 2020
NOTE: Rebalance has restored redundancy for any existing control file or redo log in disk group DATA
Sat Dec 05 18:48:10 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Sat Dec 05 18:48:32 2020
GMON updating for reconfiguration, group 1 at 28 for pid 36, osid 47454
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
Sat Dec 05 18:48:32 2020
NOTE: group 1 PST updated.
SUCCESS: grp 1 disk _DROPPED_0001_DATA going offline 
GMON updating for reconfiguration, group 1 at 29 for pid 36, osid 47454
NOTE: cache closing disk 1 of grp 1: (not open) _DROPPED_0001_DATA
NOTE: group DATA: updated PST location: disk 0000 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0002 (PST copy 1)
NOTE: group 1 PST updated.
Sat Dec 05 18:48:32 2020
NOTE: membership refresh pending for group 1/0x58d713e6 (DATA)
GMON querying group 1 at 30 for pid 18, osid 41180
GMON querying group 1 at 31 for pid 18, osid 41180
NOTE: Disk _DROPPED_0001_DATA in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 1/0x58d713e6 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Sat Dec 05 18:52:24 2020
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x58d713e6 (DATA)
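
Waiting for the rebalance amounts to watching for the "starting rebalance" and "SUCCESS: rebalance completed" lines shown above. A Python sketch that tracks this per disk group from alert log text; the helper name is hypothetical and the regular expressions are written only against the lines in this log, so other versions may need adjustments. On a live instance, querying v$asm_operation is the more direct check (no rows means no rebalance is running).

```python
import re

START = re.compile(r"NOTE: starting rebalance of group \d+/0x[0-9a-fA-F]+ \((\w+)\) at power \d+")
DONE = re.compile(r"SUCCESS: rebalance completed for group \d+/0x[0-9a-fA-F]+ \((\w+)\)")


def rebalance_status(lines):
    """Return {disk group: 'running' or 'completed'} based on the latest matching line."""
    status = {}
    for line in lines:
        if m := START.search(line):
            status[m.group(1)] = "running"
        elif m := DONE.search(line):
            status[m.group(1)] = "completed"
    return status


if __name__ == "__main__":
    sample = [
        "NOTE: starting rebalance of group 2/0x58e713e7 (FLASH) at power 1",
        "SUCCESS: rebalance completed for group 2/0x58e713e7 (FLASH)",
        "NOTE: starting rebalance of group 1/0x58d713e6 (DATA) at power 11",
    ]
    print(rebalance_status(sample))  # {'FLASH': 'completed', 'DATA': 'running'}
```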

Querying the disk status shows that the dropped disks have been added back and the ASM disk groups are healthy again.
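Summary: when a disk drops out of a NORMAL redundancy disk group for some reason, so that v$asm_disk.name looks like _DROPPED_0001_DATA and v$asm_disk.state is FORCING, you can force the dropped disk back into the disk group with a command such as alter diskgroup data add failgroup dg2 disk 'ORCL:DATA2' force; and then wait for the rebalance to complete, which resolves the problem.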

ORA-15096: lost disk write detected

Title: ORA-15096: lost disk write detected

Another recovery request: after a storage power failure, an ASM disk group could not be mounted because of ORA-15096: lost disk write detected.

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */
NOTE: cache registered group DATA number=2 incarn=0x73886b6a
NOTE: cache began mount (first) of group DATA number=2 incarn=0x73886b6a
NOTE: Assigning number (2,2) to disk (/dev/asm-data3)
NOTE: Assigning number (2,1) to disk (/dev/asm-data2)
NOTE: Assigning number (2,0) to disk (/dev/asm-data1)
Fri Nov 06 19:06:56 2020
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 94 for pid 30, osid 11596
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asm-data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asm-data2
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asm-data3
NOTE: cache mounting (first) external redundancy group 2/0x73886B6A (DATA)
Fri Nov 06 19:06:57 2020
* allocate domain 2, invalid = TRUE
kjbdomatt send to inst 2
Fri Nov 06 19:06:57 2020
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=25.7986 group=2 (DATA)
NOTE: starting recovery of thread=2 ckpt=33.364 group=2 (DATA)
NOTE: BWR validation signaled ORA-15096
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11596.trc:
ORA-15096: lost disk write detected
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 2/0x73886B6A (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11596, image: oracle@db1 (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
freeing rdom 2
NOTE: detached from domain 2
NOTE: cache dismounted group 2/0x73886B6A (DATA)
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x73886b6a
NOTE: cache deleting context for group DATA 2/0x73886b6a
GMON dismounting group 2 at 95 for pid 30, osid 11596
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15096: lost disk write detected
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */

After analysis and a series of repair steps, the database could be mounted, but media recovery then failed with ORA-600 [2130]:

Fri Nov 06 17:03:27 2020
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 40 slaves
Fri Nov 06 17:03:29 2020
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_pr00_7393.trc  (incident=195869):
ORA-00600: internal error code, arguments: [2130], [2], [1], [2], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_195869/ynhis1_pr00_7393_i195869.trc
Fri Nov 06 17:03:30 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

Concluding that the redo was damaged, we opened the database with RESETLOGS, which failed with ORA-00600 [2662]:

Fri Nov 06 18:21:32 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 8670753264
Resetting resetlogs activation ID 306909514 (0x124b114a)
Redo thread 2 enabled by open resetlogs or standby activation
Fri Nov 06 18:21:39 2020
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 8670753267, threshold SCN value is 0
Fri Nov 06 18:21:39 2020
Assigning activation ID 408224320 (0x18550240)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /orabak/data/group_1.289.954514319
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 06 18:21:40 2020
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc  (incident=231847):
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365], [4194545], [], [], [], [], [],[]
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_231847/ynhis1_ora_24310_i231847.trc
Fri Nov 06 18:21:42 2020
Dumping diagnostic data in directory=[cdmp_20201106182142],requested by(instance=1,osid=24310),summary=[incident=231847]
Fri Nov 06 18:21:43 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Error 704 happened during db open, shutting down database
USER (ospid: 24310): terminating the instance due to error 704
Instance terminated by USER, pid = 24310
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (24310) as a result of ORA-1092
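The ORA-600 [2662] arguments can be turned into plain SCN values. The sketch below assumes the commonly documented argument layout for this error (current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then the data block address the dependent SCN came from); that layout is not stated in this post, and the function names are hypothetical.

```python
from typing import Dict, List


def scn_from_wrap_base(wrap: int, base: int) -> int:
    """Combine an SCN wrap (high 16 bits) and base (low 32 bits) into one value."""
    return (wrap << 32) | base


def decode_ora600_2662(args: List[int]) -> Dict[str, int]:
    # Assumed argument layout (not taken from the post): current SCN wrap,
    # current SCN base, dependent SCN wrap, dependent SCN base, and the
    # data block address (DBA) the dependent SCN was read from.
    cur_wrap, cur_base, dep_wrap, dep_base, dba = args
    current = scn_from_wrap_base(cur_wrap, cur_base)
    dependent = scn_from_wrap_base(dep_wrap, dep_base)
    return {
        "current_scn": current,
        "dependent_scn": dependent,
        "gap": dependent - current,
        # A smallfile DBA stores the relative file number above a 22-bit block number.
        "file": dba >> 22,
        "block": dba & ((1 << 22) - 1),
    }


print(decode_ora600_2662([2, 80818679, 2, 93545365, 4194545]))
```

Under that assumption, the error above decodes to a current SCN of 8670753271 (in line with the "Current SCN value is 8670753267" message earlier in the same log), a dependent SCN of 8683479957, a gap of 12726686, and a block address of file 1, block 241.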

After this error was handled and the database was opened with RESETLOGS, the open succeeded, but SMON raised ORA-00600 [4137]:

Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc  (incident=255799):
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_255799/ynhis1_smon_26195_i255799.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Fri Nov 06 18:30:46 2020
replication_dependency_tracking turned off (no async multimaster replication found)
ORACLE Instance ynhis1 (pid = 23) - Error 600 encountered while recovering transaction (25, 33).
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc:
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []
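The first ORA-600 [4137] argument, 25.33.122556, matches transaction (25, 33), which SMON reports it could not recover, so it reads as undo segment number, slot and sequence (the XIDUSN, XIDSLOT and XIDSQN columns of v$transaction). A minimal Python sketch that splits such a string; Xid and parse_xid are hypothetical names:

```python
from typing import NamedTuple


class Xid(NamedTuple):
    usn: int   # undo segment number (XIDUSN)
    slot: int  # transaction table slot in that undo segment (XIDSLOT)
    sqn: int   # sequence number of the slot (XIDSQN)


def parse_xid(text: str) -> Xid:
    """Split a 'usn.slot.sqn' string such as the ORA-600 [4137] argument."""
    usn, slot, sqn = (int(part) for part in text.split("."))
    return Xid(usn, slot, sqn)


xid = parse_xid("25.33.122556")
print(xid)
```

Here the undo segment involved is number 25, which is where the undo handling described next would start.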

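After the faulty undo was dealt with, the database could be started and shut down normally. The data was then exported and imported into a new database, completing the recovery.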

Recovering from corrupted ASM disk headers on Windows

Title: Recovering from corrupted ASM disk headers on Windows

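Author: 惜分飞 © All rights reserved [no reproduction in any form without the author's consent; the author reserves the right to pursue legal action otherwise.]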

A customer reported that a RAC cluster on Windows had failed and ASM could not mount the disk group. The alert log showed:

Fri Jul 03 03:55:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15081: failed to submit an I/O operation to a disk
Fri Jul 03 03:59:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15081: failed to submit an I/O operation to a disk

The errors clearly show that the disk \\.\ORCLDISKDATA1 could not be found. We listed the disks with asmtool:

C:\app\11.2.0\grid>asmtool -list
NTFS                             \Device\Harddisk0\Partition3            81920M
NTFS                             \Device\Harddisk0\Partition4           200000M
NTFS                             \Device\Harddisk0\Partition5          4293849M
                                 \Device\Harddisk1\Partition2             4062M
                                 \Device\Harddisk2\Partition2          2097022M
ORCLDISKFRA0                     \Device\Harddisk3\Partition2           511870M
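The check done by eye on this listing can be scripted. The sketch below assumes the column layout shown above (an optional label or filesystem name, the device path, and a size ending in M) and uses hypothetical function names; it reports which expected ASM labels are absent and which partitions carry no label at all.

```python
from typing import List, NamedTuple, Optional


class AsmtoolEntry(NamedTuple):
    label: Optional[str]  # None when asmtool shows neither a label nor a filesystem
    device: str           # e.g. \Device\Harddisk3\Partition2
    size_mb: int          # size column with the trailing "M" stripped


def parse_asmtool_list(output: str) -> List[AsmtoolEntry]:
    """Parse `asmtool -list` output in the layout shown above (assumed)."""
    entries = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) == 2 and tokens[0].startswith("\\Device\\"):
            label, device, size = None, tokens[0], tokens[1]
        elif len(tokens) == 3:
            label, device, size = tokens
        else:
            continue  # prompt lines, blank lines, anything unexpected
        entries.append(AsmtoolEntry(label, device, int(size.rstrip("M"))))
    return entries


def missing_labels(output: str, expected: List[str]) -> List[str]:
    """Expected ASM labels (e.g. ORCLDISKDATA1) that asmtool no longer shows."""
    present = {e.label for e in parse_asmtool_list(output) if e.label}
    return sorted(set(expected) - present)


def unlabeled_partitions(output: str) -> List[str]:
    """Partitions with neither an ASM label nor a filesystem name."""
    return [e.device for e in parse_asmtool_list(output) if e.label is None]
```

Fed the listing above together with the two labels named in the logs (ORCLDISKDATA1 and ORCLDISKFRA0), missing_labels returns only ORCLDISKDATA1, and unlabeled_partitions returns the Harddisk1 and Harddisk2 partitions, which would be the places to look for the lost headers.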

The ORCLDISKDATA1 label is clearly missing. After copying the disk locally with dd and analyzing the image, we found that the ASM disk header was corrupted:

C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006B38C00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]


C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd blkn=2
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2349305287 ; 0x00c: 0x8c078dc7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                         0 ; 0x000: 0x00000000
kfdatb.shrink:                      448 ; 0x004: 0x01c0
kfdatb.ub2pad:                        0 ; 0x006: 0x0000
kfdatb.auinfo[0].link.next:           8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev:           8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next:          12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev:          12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next:         456 ; 0x010: 0x01c8
kfdatb.auinfo[2].link.prev:         456 ; 0x012: 0x01c8
kfdatb.auinfo[3].link.next:         488 ; 0x014: 0x01e8
kfdatb.auinfo[3].link.prev:         488 ; 0x016: 0x01e8
kfdatb.auinfo[4].link.next:          24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev:          24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next:          28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev:          28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next:         552 ; 0x020: 0x0228
kfdatb.auinfo[6].link.prev:        3112 ; 0x022: 0x0c28
kfdatb.spare:                         0 ; 0x024: 0x00000000
kfdate[0].discriminator:              1 ; 0x028: 0x00000001
kfdate[0].allo.lo:                    0 ; 0x028: XNUM=0x0
kfdate[0].allo.hi:              8388608 ; 0x02c: V=1 I=0 H=0 FNUM=0x0
kfdate[1].discriminator:              1 ; 0x030: 0x00000001
kfdate[1].allo.lo:                    0 ; 0x030: XNUM=0x0
kfdate[1].allo.hi:              8388608 ; 0x034: V=1 I=0 H=0 FNUM=0x0
kfdate[2].discriminator:              1 ; 0x038: 0x00000001
kfdate[2].allo.lo:                    0 ; 0x038: XNUM=0x0
kfdate[2].allo.hi:              8388609 ; 0x03c: V=1 I=0 H=0 FNUM=0x1
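What kfed shows here can also be checked directly against the dd image. The zero dump for block 0 is one 16-byte line plus "Repeat 255 times", i.e. 256 × 16 = 4096 bytes, consistent with the 4 KB ASM metadata block size assumed below, and kfed prints kfbh.type at offset 0x002. A minimal Python sketch with hypothetical names, meant as a quick triage aid and not a replacement for kfed:

```python
from typing import Dict, List

ASM_BLOCK_SIZE = 4096  # assumed ASM metadata block size (4 KB)

# kfbh.type codes: 3 (KFBTYP_ALLOCTBL) appears in the kfed output above;
# 1 (KFBTYP_DISKHEAD) is assumed from typical kfed output. Others are UNKNOWN.
KFBTYP_NAMES: Dict[int, str] = {1: "KFBTYP_DISKHEAD", 3: "KFBTYP_ALLOCTBL"}


def classify_block(image: bytes, blkn: int, blksz: int = ASM_BLOCK_SIZE) -> str:
    """Roughly classify ASM metadata block `blkn` of a raw disk image."""
    block = image[blkn * blksz:(blkn + 1) * blksz]
    if len(block) < blksz:
        raise ValueError(f"image too short for block {blkn}")
    if not any(block):
        return "ZEROED"  # every byte is 0x00, like block 0 above
    btype = block[2]  # kfbh.type is the byte at offset 0x002
    return KFBTYP_NAMES.get(btype, f"UNKNOWN({btype})")


def classify_file(path: str, nblocks: int = 3) -> List[str]:
    """Read the first `nblocks` blocks of a dd copy and classify each one."""
    with open(path, "rb") as f:
        image = f.read(nblocks * ASM_BLOCK_SIZE)
    return [classify_block(image, n) for n in range(nblocks)]
```

Judging by the kfed output above, classify_file on F:\temp\disk3\1\disk2.dd would be expected to report block 0 as ZEROED and block 2 as KFBTYP_ALLOCTBL: the header is gone while the allocation table behind it survives.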

On the FRA disk the ASM label was still present, but the rest of the header was corrupted; again, only the disk header itself was damaged.

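On-site analysis established that, for a reason that remains unknown, the headers of all the ASM disks on this Windows system were damaged: two headers had been zeroed out and the third was mostly corrupted.
Given the situation on site, the RMAN backup the customer had from the previous day, and their requirement to preserve the failed environment for further root-cause analysis, we did not repair anything in place. Instead, without modifying the original environment at all, we extracted the redo and archived logs directly from the FRA and combined them with the backup to restore the database on another host. This achieved zero data loss while leaving the original environment untouched.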

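Similar ASM disk header cases I have handled before on other operating system platforms:
Recovering from a lost ASM disk partition
pvid=yes prevents ASM from mounting
Zero-data-loss recovery with all ASM disk headers corrupted
Unrecognized partition prevents an ASM disk group from mounting
Recovering an ASM disk group that would not mount after a PVID was set on an ASM disk by mistake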