ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户对asm进行扩容,由于配置不恰当,在使用asmca增加asm disk的时候直接选中了已经被用作文件系统的vg中的磁盘

Tue Nov 19 09:48:48 2019
Non critical error ORA-48180 cFri Nov 22 12:47:48 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (4,15) to disk (/dev/rhdisk29)
NOTE: Assigning number (4,16) to disk (/dev/rhdisk30)
NOTE: Assigning number (4,17) to disk (/dev/rhdisk31)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0015
NOTE: initializing header on grp 4 disk XIFENFEI_0016
NOTE: initializing header on grp 4 disk XIFENFEI_0017
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 12:47:51 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:47:59 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:47:59 2019
GMON updating group 4 at 12 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 13 for pid 18, osid 39912680
Fri Nov 22 12:48:01 2019
NOTE: cache opening disk 15 of grp 4: XIFENFEI_0015 path:/dev/rhdisk29
NOTE: cache opening disk 16 of grp 4: XIFENFEI_0016 path:/dev/rhdisk30
NOTE: cache opening disk 17 of grp 4: XIFENFEI_0017 path:/dev/rhdisk31
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 14 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */

发现增加错磁盘之后,从vg里面强制踢掉被asm使用的磁盘,并且尝试在asm中删除这些磁盘,并加入新磁盘

Fri Nov 22 12:52:03 2019
SQL> ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:52:03 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: requesting all-instance membership refresh for group=4
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
Fri Nov 22 12:52:12 2019
GMON querying group 4 at 15 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0
…………
Fri Nov 22 12:58:26 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:58:26 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,18) to disk (/dev/rhdisk7)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0018
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:58:41 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:58:41 2019
GMON updating group 4 at 16 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
Fri Nov 22 12:58:41 2019
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 17 for pid 18, osid 39912680
NOTE: cache opening disk 18 of grp 4: XIFENFEI_0018 path:/dev/rhdisk7
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 18 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */
Starting background process ARB0
Fri Nov 22 12:58:46 2019
ARB0 started with pid=44, OS id=54460432 
…………
Fri Nov 22 12:59:57 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:59:57 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,19) to disk (/dev/rhdisk10)
NOTE: Assigning number (4,20) to disk (/dev/rhdisk11)
NOTE: Assigning number (4,21) to disk (/dev/rhdisk8)
NOTE: Assigning number (4,22) to disk (/dev/rhdisk9)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0019
NOTE: initializing header on grp 4 disk XIFENFEI_0020
NOTE: initializing header on grp 4 disk XIFENFEI_0021
NOTE: initializing header on grp 4 disk XIFENFEI_0022
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 13:00:08 2019
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 13:00:08 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: initiating PST update: grp = 4
Fri Nov 22 13:00:13 2019
GMON updating group 4 at 19 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 20 for pid 18, osid 39912680
NOTE: cache opening disk 19 of grp 4: XIFENFEI_0019 path:/dev/rhdisk10
NOTE: cache opening disk 20 of grp 4: XIFENFEI_0020 path:/dev/rhdisk11
NOTE: cache opening disk 21 of grp 4: XIFENFEI_0021 path:/dev/rhdisk8
NOTE: cache opening disk 22 of grp 4: XIFENFEI_0022 path:/dev/rhdisk9
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 21 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0

asm在做着reblance的过程中遭遇到坏块,直接导致磁盘组dismount

Sun Nov 24 04:42:27 2019
NOTE: group 4 PST updated.
WARNING: cache read  a corrupt block: group=4(XIFENFEI) dsk=15 blk=258 disk=15
 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=254
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: a corrupted block from group XIFENFEI was dumped to
   /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc
WARNING: cache read (retry) a corrupt block: group=4(XIFENFEI) 
 dsk=15 blk=258 disk=15 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ERROR: cache failed to read group=4(XIFENFEI) dsk=15 blk=258 from disk(s): 15(XIFENFEI_0015)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: cache initiating offline of disk 15 group XIFENFEI
NOTE: process _x000_+asm2 (28639240) initiating offline of disk 
    15.1717056824 (XIFENFEI_0015) with mask 0x7e in group 4
NOTE: initiating PST update: grp = 4, dsk = 15/0x66583538, mask = 0x6a, op = clear
GMON updating disk modes for group 4 at 23 for pid 28, osid 28639240
ERROR: Disk 15 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 4)
Sun Nov 24 04:42:27 2019
NOTE: cache dismounting (not clean) group 4/0x0B08C40B (XIFENFEI) 
WARNING: Offline for disk XIFENFEI_0015 in mode 0x7f failed.
Sun Nov 24 04:42:27 2019
NOTE: halting all I/Os to diskgroup 4 (XIFENFEI)
NOTE: messaging CKPT to quiesce pins Unix process pid: 59441780, image: oracle@xifenfei2 (B000)
Sun Nov 24 04:42:27 2019
ERROR: ORA-15130 thrown in ARB0 for group number 4
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_arb0_50856926.trc:
ORA-15130: diskgroup "XIFENFEI" is being dismounted

至此两个节点的该磁盘组就陷入了不停的mount,然后dismount的轮流循环中.这里我们可以大概的分析出来,由于vg的磁盘组被写入了数据或者强制剔除的时候导致asm写入该文件的数据被破坏,导致后续的asm reblance遭遇坏块,然后直接dismount.对于该问题的解决方案,通过对对该磁盘组的acd和cod进行patch,让其不进行reblance,保持该磁盘组现在,稳定的mount状态,然后对其数据进行备份和重建该磁盘组.这个客户运气不错,vg中的asm disk磁盘写入较少,数据库运行正常.
对于这种情况,如果发生极端损坏,比如asm磁盘组无法mount,可以参考:找回ASM中数据文件
如果是asm的元数据大量损坏,无法通过asm字典级别恢复,可以通过参考:asm disk header 彻底损坏恢复

通过sql获取asm别名实际文件名

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:通过sql获取asm别名实际文件名

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

asm的别名机制带来了一些便利的同时有些时候也引入了一些麻烦,比如某些工具无法很好的识别别名,闲着写了sql直接获取别名和实际数据文件对应关系
直接查询数据文件
1-3


通过asmcmd中的ls命令查看

ASMCMD> ls -l +MGMT/ora18c/test01.dbf
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   SEP 06 00:00:00  N    test01.dbf => +MGMT/ORA18C/DATAFILE/TEST16K.274.1016183943

但是如果太多,这样一个个替换效率太低,通过sql语句实现

获取实际数据文件
1-2


通过sql语句快速实现把数据文件路径中的别名转换为实际存储路径(omf方式存储)

ERROR: diskgroup XXXX was not mounted

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ERROR: diskgroup XXXX was not mounted

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

aix平台10.2.0.5 2节点RAC,由于节点2系统盘故障,通过节点1镜像系统,复制到节点2,结果由于节点2磁盘顺序和节点1不匹配,aix工程师进行了相关操作之后,节点1重启之后datadg磁盘组无法mount

SQL> alter diskgroup datadg mount 
Mon Jun 10 23:23:46 CST 2019
NOTE: cache registered group DATADG number=1 incarn=0x8cf61164
Mon Jun 10 23:23:46 CST 2019
NOTE: Hbeat: instance first (grp 1)
Mon Jun 10 23:23:50 CST 2019
NOTE: start heartbeating (grp 1)
Mon Jun 10 23:23:50 CST 2019
NOTE: cache dismounting group 1/0x8CF61164 (DATADG) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATADG was not mounted

检查datadg磁盘组相关信息

Tue Jan 29 19:21:45 CST 2019
NOTE: start heartbeating (grp 2)
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/rhdisk6
Tue Jan 29 19:21:45 CST 2019
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATADG_0001 path:/dev/rhdisk7
NOTE: cache opening disk 2 of grp 2: DATADG_0002 path:/dev/rhdisk8
NOTE: cache opening disk 3 of grp 2: DATADG_0003 path:/dev/rhdisk9
NOTE: cache mounting (first) group 2/0x60E59155 (DATADG)
* allocate domain 2, invalid = TRUE 
Tue Jan 29 19:21:45 CST 2019
NOTE: attached to recovery domain 2
Tue Jan 29 19:21:45 CST 2019
NOTE: cache recovered group 2 to fcn 0.849668
Tue Jan 29 19:21:45 CST 2019
NOTE: LGWR attempting to mount thread 1 for disk group 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.849668 ABA 
NOTE: seq=21 blk=5394 
Tue Jan 29 19:21:46 CST 2019
NOTE: cache mounting group 2/0x60E59155 (DATADG) succeeded
SUCCESS: diskgroup DATADG was mounted

通过这里可以看出来datadg磁盘组是由rhdisk6-9 四块磁盘组成,查询相关磁盘信息发现
asm-foreign


这里确定rhdisk7磁盘异常,通过kfed分析磁盘情况

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                           34 ; 0x001: 0x22
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                   49407 ; 0x004: blk=49407
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                    58396 ; 0x010: 0x0000e41c
kfbh.fcn.wrap:                   131072 ; 0x014: 0x00020000
kfbh.spare1:                 4294967064 ; 0x018: 0xffffff18
kfbh.spare2:                 2105310074 ; 0x01c: 0x7d7c7b7a
005918A00 00002200 0000C0FF 00000000 00000000  [."..............]
005918A10 0000E41C 00020000 FFFFFF18 7D7C7B7A  [............z{|}]
005918A20 00000000 00000000 00000000 00000000  [................]
  Repeat 253 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006EF8A00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=2|more
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                33554432 ; 0x004: blk=33554432
kfbh.block.obj:                16777344 ; 0x008: file=128
kfbh.check:                  3844041089 ; 0x00c: 0xe51f6981
kfbh.fcn.base:               1297484544 ; 0x010: 0x4d560b00
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                  49153 ; 0x004: 0xc001
kfdatb10.ub2pad:                  20555 ; 0x006: 0x504b
kfdatb10.auinfo[0].link.next:      2048 ; 0x008: 0x0800
kfdatb10.auinfo[0].link.prev:      2048 ; 0x00a: 0x0800
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:         49153 ; 0x00e: 0xc001
kfdatb10.auinfo[1].link.next:      4096 ; 0x010: 0x1000
kfdatb10.auinfo[1].link.prev:      4096 ; 0x012: 0x1000
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:      6144 ; 0x018: 0x1800
kfdatb10.auinfo[2].link.prev:      6144 ; 0x01a: 0x1800
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:      8192 ; 0x020: 0x2000
kfdatb10.auinfo[3].link.prev:      8192 ; 0x022: 0x2000
kfdatb10.auinfo[3].free:              0 ; 0x024: 0x0000

对比磁盘可能的损坏情况,由于在aix 平台asm disk的block有一个特征一般0082开头,通过工具打开磁盘,检索该标记对比
正常磁盘
0082-good


异常磁盘
0082-had

通过上述分析,大概评估rhdisk7 元数据部分损坏的不光是block 0和1,人工修复继续使用的可能性不太大,而且基于客户的数据库不大,采取方案是直接拷贝数据文件、redo、控制文件到文件系统,然后在本地文件系统open库
open_db

运气不错,实现完美恢复数据0丢失

WARNING: Read Failed.导致asm磁盘组异常

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:WARNING: Read Failed.导致asm磁盘组异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户对asm dg进行扩容,一段时间之后,asm data 磁盘组直接dismount

Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0027' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0026' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638 
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 
(DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0

后续继续mount data 磁盘组成功,但是立马又dismount

Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0027' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0026' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638 
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 
(DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0

对于上述的故障现象,本质原因是由于asm 磁盘组增加新磁盘之后,开始做rebalance,但是由于遭遇到 27号盘上有IO读错误,使得asm磁盘组无法正常完成rebalance,因而data磁盘组无法稳定的mount。解决该问题思路,通过patch asm磁盘组,禁止rebalance,从而使得data磁盘组不再dismount,再进行后续恢复

ORA-600 kfrValAcd30 恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kfrValAcd30 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于存储控制器损坏,修复控制器之后,asm无法正常mount,报ORA-600 kfrValAcd30错误,让我们提供技术支持
kfrValAcd30


asm alert日志报错

Wed Apr 03 16:50:57 2019
SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=1 incarn=0x14248741
NOTE: cache began mount (first) of group DATA number=1 incarn=0x14248741
NOTE: Assigning number (1,0) to disk (ORCL:DATAVOL1)
Wed Apr 03 16:51:03 2019
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 15 
kfdp_queryBg(): 15 
NOTE: cache opening disk 0 of grp 1: DATAVOL1 label:DATAVOL1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x14248741 (DATA)
Wed Apr 03 16:51:04 2019
* allocate domain 1, invalid = TRUE 
Wed Apr 03 16:51:04 2019
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=27.2697 group=1 (DATA)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_15951.trc  (incident=23394):
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_23394/+ASM2_ora_15951_i23394.trc
Abort recovery for domain 1
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], []
ERROR: alter diskgroup data mount
NOTE: cache dismounting (clean) group 1/0x14248741 (DATA) 
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
Wed Apr 03 16:51:05 2019
Sweep [inc][23394]: completed
Sweep [inc2][23394]: completed
Wed Apr 03 16:51:05 2019
Trace dumping is performing id=[cdmp_20190403165105]
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x14248741 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x14248741
kfdp_dismount(): 16 
kfdp_dismountBg(): 16 
NOTE: De-assigning number (1,0) from disk (ORCL:DATAVOL1)
ERROR: diskgroup DATA was not mounted
NOTE: cache deleting context for group DATA 1/337938241

mos相关记录
参考:ORA-600 [KFRVALACD30] in ASM (Doc ID 2123013.1)
kfrValAcd30-mos


ORA-00600: internal error code, arguments: [kfrValAcd30]可能是bug或者硬件故障导致.基于客户的情况,最大可能就是由于硬件故障导致asm 磁盘组的acd无法正常进行,从而无法mount成功.如果运气好,通过kfed相关修复可以正常mount成功,运气不好可以通过底层进行恢复数据文件,从而最大限度恢复数据.