ORA-15335 ORA-15130 ORA-15066 ORA-15196

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15335 ORA-15130 ORA-15066 ORA-15196

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户反馈,数据库无法正常启动,通过分析asm的alert日志发现,data磁盘组mount成功之后,没有一会儿自动dismount掉

Mon Sep 26 16:40:14 2022
SQL> /* ASMCMD */ALTER DISKGROUP data MOUNT  
NOTE: cache registered group DATA number=2 incarn=0x9dfa705f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x9dfa705f
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
Mon Sep 26 16:40:20 2022
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 68 for pid 25, osid 14650
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache mounting (first) external redundancy group 2/0x9DFA705F (DATA)
Mon Sep 26 16:40:20 2022
* allocate domain 2, invalid = TRUE 
kjbdomatt send to inst 2
Mon Sep 26 16:40:20 2022
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.321845
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Sep 26 16:40:20 2022
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 20.3546
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.321845 ABA 21.3547
NOTE: cache mounting group 2/0x9DFA705F (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0x9dfa705f
Mon Sep 26 16:40:20 2022
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: /* ASMCMD */ALTER DISKGROUP data MOUNT 
Mon Sep 26 16:40:22 2022
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Mon Sep 26 16:40:47 2022
NOTE: client xff1:xff registered, osid 14742, mbr 0x0
Mon Sep 26 16:40:57 2022
WARNING: cache read  a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) 
incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=1 blk=257 
disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=1 blk=257 from disk(s): 1(DATA_0001)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process _user14778_+asm1 (14778) initiating offline of 
disk 1.3916071178 (DATA_0001) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 1/0xe96a810a, mask = 0x6a, op = clear
Mon Sep 26 16:40:58 2022
GMON updating disk modes for group 2 at 70 for pid 28, osid 14778
ERROR: Disk 1 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Mon Sep 26 16:40:58 2022
NOTE: cache dismounting (not clean) group 2/0x9DFA705F (DATA) 
WARNING: Offline for disk DATA_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 14782, image: oracle@oracle11grac1 (B000)
Mon Sep 26 16:40:58 2022
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc  (incident=144548):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
Incident details in: /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
Sweep [inc][144548]: completed
System State dumped to trace file /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
NOTE: AMDU dump of disk group DATA created at /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548
Mon Sep 26 16:41:00 2022
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=21.3550 last written ABA 21.3550
Mon Sep 26 16:41:00 2022
Sweep [inc2][144548]: completed
Mon Sep 26 16:41:00 2022
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x9dfa705f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5162.trc:
ORA-15130: diskgroup "DATA" is being dismounted

这里看主要是由于asm 磁盘组需要做COD recovery导致无法正常稳定的mount,主要原因是遭遇到asm disk的逻辑坏块(存储物理上看是ok的,但是实际数据在asm中看是异常的)

数据库alert日志报错

Mon Sep 26 16:40:52 2022
Successful mount of redo thread 1, with mount id 1097279951
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: alter database mount
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
LGWR: STARTING ARCH PROCESSES
Mon Sep 26 16:40:56 2022
ARC0 started with pid=40, OS id=14761 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Sep 26 16:40:57 2022
ARC1 started with pid=41, OS id=14764 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 1 (???? 1) ???
Mon Sep 26 16:40:57 2022
ARC2 started with pid=42, OS id=14766 
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 2 (???? 1) ???
Mon Sep 26 16:40:57 2022
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Mon Sep 26 16:40:57 2022
ARC3 started with pid=44, OS id=14770 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc2_14766.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc1_14764.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc  (incident=180281):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Unable to create archive log file '+DATA'
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
*************************************************************
WARNING: A file of type ARCHIVED LOG may exist in
db_recovery_file_dest that is not known to the database.
Use the RMAN command CATALOG RECOVERY AREA to re-catalog
any such files. If files cannot be cataloged, then manually
delete them using OS command. This is most likely the
result of a crash during file creation.
*************************************************************
ARCH: Error 19504 Creating archive log file to '+DATA'
NOTE: Deferred communication with ASM instance
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 23
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-16038: log 1 sequence# 14235 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-00312: online log 1 thread 1: '+ARCH/xff/onlinelog/group_1.279.1025610217'
Mon Sep 26 16:40:58 2022
Sweep [inc][180281]: completed
Sweep [inc2][180281]: completed
USER (ospid: 14732): terminating the instance due to error 16038
Mon Sep 26 16:40:59 2022
System state dump requested by (instance=1, osid=14732), summary=[abnormal instance termination].
Instance terminated by USER, pid = 14732

对于这类故障处理相对比较容易,通过patch asm,让data磁盘组稳定mount,然后open库,迁移数据,实现数据0丢失,完美恢复

ORA-15130: diskgroup “ORADATA” is being dismounted

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15130: diskgroup “ORADATA” is being dismounted

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

磁盘组mount之后,立马又dismount

Sat Dec 25 17:48:45 2021
SQL> alter diskgroup ORADATA mount 
NOTE: cache registered group ORADATA number=5 incarn=0xd4b7ac6a
NOTE: cache began mount (first) of group ORADATA number=5 incarn=0xd4b7ac6a
NOTE: Assigning number (5,24) to disk (/dev/mapper/data31)
NOTE: Assigning number (5,26) to disk (/dev/mapper/data33)
NOTE: Assigning number (5,21) to disk (/dev/mapper/data29)
NOTE: Assigning number (5,23) to disk (/dev/mapper/data30)
NOTE: Assigning number (5,25) to disk (/dev/mapper/data32)
NOTE: Assigning number (5,19) to disk (/dev/mapper/data27)
NOTE: Assigning number (5,20) to disk (/dev/mapper/data28)
NOTE: Assigning number (5,18) to disk (/dev/mapper/data26)
NOTE: Assigning number (5,14) to disk (/dev/mapper/data22)
NOTE: Assigning number (5,17) to disk (/dev/mapper/data25)
NOTE: Assigning number (5,16) to disk (/dev/mapper/data24)
NOTE: Assigning number (5,15) to disk (/dev/mapper/data23)
NOTE: Assigning number (5,13) to disk (/dev/mapper/data21)
NOTE: Assigning number (5,12) to disk (/dev/mapper/data20)
NOTE: Assigning number (5,10) to disk (/dev/mapper/data19)
NOTE: Assigning number (5,9) to disk (/dev/mapper/data18)
NOTE: Assigning number (5,8) to disk (/dev/mapper/data17)
NOTE: Assigning number (5,3) to disk (/dev/mapper/data12)
NOTE: Assigning number (5,22) to disk (/dev/mapper/data3)
NOTE: Assigning number (5,2) to disk (/dev/mapper/data11)
NOTE: Assigning number (5,7) to disk (/dev/mapper/data16)
NOTE: Assigning number (5,28) to disk (/dev/mapper/data5)
NOTE: Assigning number (5,32) to disk (/dev/mapper/data9)
NOTE: Assigning number (5,6) to disk (/dev/mapper/data15)
NOTE: Assigning number (5,5) to disk (/dev/mapper/data14)
NOTE: Assigning number (5,4) to disk (/dev/mapper/data13)
NOTE: Assigning number (5,1) to disk (/dev/mapper/data10)
NOTE: Assigning number (5,30) to disk (/dev/mapper/data7)
NOTE: Assigning number (5,29) to disk (/dev/mapper/data6)
NOTE: Assigning number (5,31) to disk (/dev/mapper/data8)
NOTE: Assigning number (5,11) to disk (/dev/mapper/data2)
NOTE: Assigning number (5,27) to disk (/dev/mapper/data4)
NOTE: Assigning number (5,0) to disk (/dev/mapper/data1)
Sat Dec 25 17:48:52 2021
NOTE: GMON heartbeating for grp 5
GMON querying group 5 at 153 for pid 32, osid 68608
NOTE: cache opening disk 0 of grp 5: ORADATA_0000 path:/dev/mapper/data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 5: ORADATA_0001 path:/dev/mapper/data10
NOTE: cache opening disk 2 of grp 5: ORADATA_0002 path:/dev/mapper/data11
NOTE: cache opening disk 3 of grp 5: ORADATA_0003 path:/dev/mapper/data12
NOTE: cache opening disk 4 of grp 5: ORADATA_0004 path:/dev/mapper/data13
NOTE: cache opening disk 5 of grp 5: ORADATA_0005 path:/dev/mapper/data14
NOTE: cache opening disk 6 of grp 5: ORADATA_0006 path:/dev/mapper/data15
NOTE: cache opening disk 7 of grp 5: ORADATA_0007 path:/dev/mapper/data16
NOTE: cache opening disk 8 of grp 5: ORADATA_0008 path:/dev/mapper/data17
NOTE: cache opening disk 9 of grp 5: ORADATA_0009 path:/dev/mapper/data18
NOTE: cache opening disk 10 of grp 5: ORADATA_0010 path:/dev/mapper/data19
NOTE: cache opening disk 11 of grp 5: ORADATA_0011 path:/dev/mapper/data2
NOTE: cache opening disk 12 of grp 5: ORADATA_0012 path:/dev/mapper/data20
NOTE: cache opening disk 13 of grp 5: ORADATA_0013 path:/dev/mapper/data21
NOTE: cache opening disk 14 of grp 5: ORADATA_0014 path:/dev/mapper/data22
NOTE: cache opening disk 15 of grp 5: ORADATA_0015 path:/dev/mapper/data23
NOTE: cache opening disk 16 of grp 5: ORADATA_0016 path:/dev/mapper/data24
NOTE: cache opening disk 17 of grp 5: ORADATA_0017 path:/dev/mapper/data25
NOTE: cache opening disk 18 of grp 5: ORADATA_0018 path:/dev/mapper/data26
NOTE: cache opening disk 19 of grp 5: ORADATA_0019 path:/dev/mapper/data27
NOTE: cache opening disk 20 of grp 5: ORADATA_0020 path:/dev/mapper/data28
NOTE: cache opening disk 21 of grp 5: ORADATA_0021 path:/dev/mapper/data29
NOTE: cache opening disk 22 of grp 5: ORADATA_0022 path:/dev/mapper/data3
NOTE: cache opening disk 23 of grp 5: ORADATA_0023 path:/dev/mapper/data30
NOTE: cache opening disk 24 of grp 5: ORADATA_0024 path:/dev/mapper/data31
NOTE: cache opening disk 25 of grp 5: ORADATA_0025 path:/dev/mapper/data32
NOTE: cache opening disk 26 of grp 5: ORADATA_0026 path:/dev/mapper/data33
NOTE: cache opening disk 27 of grp 5: ORADATA_0027 path:/dev/mapper/data4
NOTE: cache opening disk 28 of grp 5: ORADATA_0028 path:/dev/mapper/data5
NOTE: cache opening disk 29 of grp 5: ORADATA_0029 path:/dev/mapper/data6
NOTE: cache opening disk 30 of grp 5: ORADATA_0030 path:/dev/mapper/data7
NOTE: cache opening disk 31 of grp 5: ORADATA_0031 path:/dev/mapper/data8
NOTE: cache opening disk 32 of grp 5: ORADATA_0032 path:/dev/mapper/data9
NOTE: cache mounting (first) external redundancy group 5/0xD4B7AC6A (ORADATA)
Sat Dec 25 17:48:52 2021
* allocate domain 5, invalid = TRUE 
kjbdomatt send to inst 2
Sat Dec 25 17:48:52 2021
NOTE: attached to recovery domain 5
NOTE: starting recovery of thread=1 ckpt=92.6417 group=5 (ORADATA)
NOTE: advancing ckpt for group 5 (ORADATA) thread=1 ckpt=92.6418
NOTE: cache recovered group 5 to fcn 0.9502919
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Sat Dec 25 17:48:52 2021
NOTE: LGWR attempting to mount thread 1 for diskgroup 5 (ORADATA)
NOTE: LGWR found thread 1 closed at ABA 92.6417
NOTE: LGWR mounted thread 1 for diskgroup 5 (ORADATA)
NOTE: LGWR opening thread 1 at fcn 0.9502919 ABA 93.6418
NOTE: cache mounting group 5/0xD4B7AC6A (ORADATA) succeeded
NOTE: cache ending mount (success) of group ORADATA number=5 incarn=0xd4b7ac6a
Sat Dec 25 17:48:53 2021
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 5
SUCCESS: diskgroup ORADATA was mounted
SUCCESS: alter diskgroup ORADATA mount
Sat Dec 25 17:48:53 2021
NOTE: diskgroup resource ora.ORADATA.dg is online
WARNING:cache read  a corrupt block: group=5(ORADATA)dsk=5 blk=2 disk=5(ORADATA_0005)incarn=2406 au=0 blk=2 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_48956.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
NOTE: a corrupted block from group ORADATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_48956.trc
WARNING:cache read(retry)a corrupt block:group=5(ORADATA)dsk=5 blk=2 disk=5(ORADATA_0005)incarn=2406 au=0 blk=2 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_48956.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
ERROR: cache failed to read group=5(ORADATA) dsk=5 blk=2 from disk(s): 5(ORADATA_0005)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
NOTE: cache initiating offline of disk 5 group ORADATA
NOTE: process _rbal_+asm1 (48956) initiating offline of disk 5.240607694 (ORADATA_0005) with mask 0x7e in group 5
NOTE: initiating PST update: grp = 5, dsk = 5/0xe5761ce, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 155 for pid 18, osid 48956
ERROR: Disk 5 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 5)
Sat Dec 25 17:48:55 2021
NOTE: cache dismounting (not clean) group 5/0xD4B7AC6A (ORADATA) 
WARNING: Offline for disk ORADATA_0005 in mode 0x7f failed.
Sat Dec 25 17:48:55 2021
NOTE: halting all I/Os to diskgroup 5 (ORADATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22744, image: oracle@wxzldb1 (B000)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_48956.trc  (incident=1289754):
ORA-15335: ASM metadata corruption detected in disk group 'ORADATA'
ORA-15130: diskgroup "ORADATA" is being dismounted
ORA-15066: offlining disk "ORADATA_0005" in group "ORADATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483653] [2] [0 != 1]
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_1289754/+ASM1_rbal_48956_i1289754.trc
NOTE: LGWR doing non-clean dismount of group 5 (ORADATA)
NOTE: LGWR sync ABA=93.6418 last written ABA 93.6418
kjbdomdet send to inst 2
detach from dom 5, sending detach message to inst 2
Sat Dec 25 17:48:56 2021
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
Sat Dec 25 17:48:56 2021
Sweep [inc][1289754]: completed
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 41 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
freeing rdom 5
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_1289754/+ASM1_rbal_48956_i1289754.trc
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0xD4B7AC6A (ORADATA) 

问题比较明显是由于disk=5 au=0 blk=2有问题导致磁盘组mount之后立马异常.通过kfed分析对应block情况

C:\Users\XFF>kfed read h:\temp\asmdisk\data14.dd|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483653 ; 0x008: disk=5
kfbh.check:                   314993330 ; 0x00c: 0x12c66ab2
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        5 ; 0x024: 0x0005
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:            ORADATA_0005 ; 0x028: length=12
kfdhdb.grpname:                 ORADATA ; 0x048: length=7
kfdhdb.fgname:             ORADATA_0005 ; 0x068: length=12

C:\Users\XFF>kfed read h:\temp\asmdisk\data14.dd aun=0 blkn=2|more
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0066D8200 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

通过kfed分析,该block确实异常,该block主要记录au的分配信息,如果asm 磁盘组的空间不变化,不执行rebalance,一般不会主动访问该block,不访问该block磁盘组也就不会dismount,按照这个解决思路,通过patch解决,让oradata磁盘组不再执行rebalance和分配/回收空间即可一直稳定的mount
20211227210314


数据库直接open成功,实现数据0丢失
20211227210411

ORA-15335: ASM metadata corruption detected in disk group ‘DATA’

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15335: ASM metadata corruption detected in disk group ‘DATA’

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

asm磁盘组增加磁盘进行扩容之后报ORA-15335: ASM metadata corruption detected in disk group ‘DATA’和ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479],磁盘组dismount,然后mount之后立马dismount掉.

Tue Jun 29 09:19:09 2021
SQL> ALTER DISKGROUP DATA ADD  DISK '/dev/raw/raw5' SIZE 102400M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (2,1) to disk (/dev/raw/raw5)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DATA_0001
NOTE: requesting all-instance disk validation for group=2
Tue Jun 29 09:19:11 2021
NOTE: skipping rediscovery for group 2/0xb0c845ce (DATA) on local instance.
NOTE: requesting all-instance disk validation for group=2
NOTE: skipping rediscovery for group 2/0xb0c845ce (DATA) on local instance.
NOTE: initiating PST update: grp = 2
Tue Jun 29 09:19:16 2021
GMON updating group 2 at 7 for pid 27, osid 25020
NOTE: PST update grp = 2 completed successfully 
NOTE: membership refresh pending for group 2/0xb0c845ce (DATA)
GMON querying group 2 at 8 for pid 18, osid 3852
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/raw/raw5
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
GMON querying group 2 at 9 for pid 18, osid 3852
SUCCESS: refreshed membership for 2/0xb0c845ce (DATA)
Tue Jun 29 09:19:20 2021
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/raw/raw5' SIZE 102400M /* ASMCA */
NOTE: starting rebalance of group 2/0xb0c845ce (DATA) at power 1
Starting background process ARB0
Tue Jun 29 09:19:21 2021
ARB0 started with pid=33, OS id=25176 
NOTE: assigning ARB0 to group 2/0xb0c845ce (DATA) with 1 parallel I/O
cellip.ora not found.
Tue Jun 29 09:19:24 2021
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Tue Jun 29 09:19:46 2021
WARNING: cache read  a corrupt block: group=2(DATA) dsk=0 blk=7 disk=0 (DATA_0000) incarn=3915953476 au=0 blk=7 count=1
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25176.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
NOTE: a corrupted block from group DATA was dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25176.trc
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25176.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
ERROR: cache failed to read group=2(DATA) dsk=0 blk=7 from disk(s): 0(DATA_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _arb0_+asm1 (25176) initiating offline of disk 0.3915953476 (DATA_0000) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 0/0xe968b544, mask = 0x6a, op = clear
Tue Jun 29 09:19:46 2021
GMON updating disk modes for group 2 at 10 for pid 33, osid 25176
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Tue Jun 29 09:19:46 2021
NOTE: cache dismounting (not clean) group 2/0xB0C845CE (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 25395, image: oracle@frsrac1 (B000)
Tue Jun 29 09:19:46 2021
NOTE: halting all I/Os to diskgroup 2 (DATA)
Tue Jun 29 09:19:46 2021
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=11.10715 last written ABA 11.10715
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25176.trc  (incident=54665):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0000" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_54665/+ASM1_arb0_25176_i54665.trc
Tue Jun 29 09:19:46 2021
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
Tue Jun 29 09:19:46 2021
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 24)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 2 invalid = TRUE 
 796 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Tue Jun 29 09:19:46 2021
WARNING: dirty detached from domain 2
NOTE: cache dismounted group 2/0xB0C845CE (DATA) 
SQL> alter diskgroup DATA dismount force /* ASM SERVER:2965915086 */ 
Tue Jun 29 09:19:47 2021
ERROR: ORA-15130 thrown in ARB0 for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_25176.trc:
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0000" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483648] [7] [2183628676 != 686982479]
Tue Jun 29 09:19:47 2021
NOTE: stopping process ARB0
Tue Jun 29 09:19:47 2021
Sweep [inc][54665]: completed
Tue Jun 29 09:19:47 2021
Sweep [inc2][54665]: completed
NOTE: cache deleting context for group DATA 2/0xb0c845ce
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_3852.trc:
ORA-15130: diskgroup "DATA" is being dismounted
GMON dismounting group 2 at 11 for pid 27, osid 25395
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
SUCCESS: diskgroup DATA was dismounted
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:2965915086 */

通过kfed分析报错block,确认错误

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       7 ; 0x004: blk=7
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2183628676 ; 0x00c: 0x82278784 <<======该值错误,应该为:686982479
kfbh.fcn.base:                     3430 ; 0x010: 0x00000d66
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                      2240 ; 0x000: 0x000008c0
kfdatb.shrink:                      448 ; 0x004: 0x01c0
kfdatb.ub2pad:                        0 ; 0x006: 0x0000

通过修复该错误,并且禁止reblance操作[增加磁盘数据需要重新分布],mount磁盘组,然后open库,发现redo已经被覆盖(非归档),强制打开库报错

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [2691201882], [0],
[2691227745], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [0], [2691201881], [0],
[2691227745], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [2691201879], [0],
[2691227745], [12583040], [], [], [], [], [], []
Process ID: 25110
Session ID: 287 Serial number: 3

通过对scn进行处理,数据库顺利open

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 5044088832 bytes
Fixed Size		    2261928 bytes
Variable Size		 1442843736 bytes
Database Buffers	 3590324224 bytes
Redo Buffers		    8658944 bytes
Database mounted.
SQL> alter database open;

Database altered.

ORA-15196: invalid ASM block header [kfc.c:26368]故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15196: invalid ASM block header [kfc.c:26368]故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户对asm的data磁盘组增加磁盘进行扩容,在做reblance的过程中重启了主机,结果导致data磁盘组mount之后自动dismount

Fri Oct 09 20:48:06 2020
NOTE: PST enabling heartbeating (grp 1)
Fri Oct 09 20:48:06 2020
NOTE: ASM did background COD recovery for group 1/0x739536c (DATA)
NOTE: starting rebalance of group 1/0x739536c (DATA) at power 10
Starting background process ARB0
Fri Oct 09 20:48:07 2020
ARB0 started with pid=28, OS id=39278 
NOTE: assigning ARB0 to group 1/0x739536c (DATA) with 10 parallel I/Os
cellip.ora not found.
WARNING:cache read a corrupt block:group=1(DATA) dsk=8 blk=7 disk=8(DATA_0008)incarn=3916014506 au=0 blk=7 count=1
Errors in file /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_39278.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
NOTE: a corrupted block from group DATA was dumped to /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_39278.trc
WARNING:cache read(retry)a corrupt block:group=1(DATA) dsk=8 blk=7 disk=8(DATA_0008)incarn=3916014506 au=0 blk=7 count=1
Errors in file /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_39278.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
ERROR: cache failed to read group=1(DATA) dsk=8 blk=7 from disk(s): 8(DATA_0008)
Fri Oct 09 20:48:13 2020
NOTE: GroupBlock outside rolling migration privileged region
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
NOTE: requesting all-instance membership refresh for group=1
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _arb0_+asm1 (39278) initiating offline of disk 8.3916014506 (DATA_0008) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 8/0xe969a3aa, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 7 for pid 28, osid 39278
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Fri Oct 09 20:48:13 2020
NOTE: cache dismounting (not clean) group 1/0x0739536C (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 39346, image: oracle@rac1 (B000)
Fri Oct 09 20:48:13 2020
NOTE: halting all I/Os to diskgroup 1 (DATA)
Fri Oct 09 20:48:13 2020
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=32.4749 last written ABA 32.4749
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
Fri Oct 09 20:48:13 2020
kjbdomdet send to inst 2
detach from dom 1, sending detach message to inst 2
Fri Oct 09 20:48:13 2020
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 4)
Errors in file /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_39278.trc  (incident=337185):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
ORA-15196: invalid ASM block header [kfc.c:26368] [check_kfbh] [2147483656] [7] [2182009786 != 2190395015]
Incident details in: /u01/grid/diag/asm/+asm/+ASM1/incident/incdir_337185/+ASM1_arb0_39278_i337185.trc
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE 
 2341 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
freeing rdom 1
Fri Oct 09 20:48:13 2020
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0x0739536C (DATA) 

错误信息比较明显dsk=8 blk=7 au=0 blk=7 的check值不对,本来应该是2190395015现在变为了2182009786,通过kfed分析确实如此

C:\Users\Administrator>kfed read f:/temp/xff/2.dd blkn=7|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       7 ; 0x004: blk=7
kfbh.block.obj:              2147483656 ; 0x008: disk=8
kfbh.check:                  2182009786 ; 0x00c: 0x820ed3ba
kfbh.fcn.base:                  2711248 ; 0x010: 0x00295ed0
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                      2240 ; 0x000: 0x000008c0
kfdatb.shrink:                      448 ; 0x004: 0x01c0
kfdatb.ub2pad:                        0 ; 0x006: 0x0000
kfdatb.auinfo[0].link.next:           8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev:           8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next:          12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev:          12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next:          16 ; 0x010: 0x0010
kfdatb.auinfo[2].link.prev:          16 ; 0x012: 0x0010
kfdatb.auinfo[3].link.next:          20 ; 0x014: 0x0014
kfdatb.auinfo[3].link.prev:          20 ; 0x016: 0x0014
kfdatb.auinfo[4].link.next:          24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev:          24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next:          28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev:          28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next:          32 ; 0x020: 0x0020
kfdatb.auinfo[6].link.prev:          32 ; 0x022: 0x0020
kfdatb.spare:                         0 ; 0x024: 0x00000000

修改该值之后,再次mount data磁盘组,报错如下

Sat Oct 10 13:49:22 2020
ARB0 started with pid=28, OS id=10329 
NOTE: assigning ARB0 to group 1/0x3759521c (DATA) with 10 parallel I/Os
cellip.ora not found.
Sat Oct 10 13:49:26 2020
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
WARNING: cache read  a corrupt block: group=1(DATA) dsk=8 blk=8 disk=8 (DATA_0008) incarn=3916014011 au=0 blk=8 count=1
Errors in file /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_10329.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_10329.trc
WARNING:cache read(retry)a corrupt block: group=1(DATA)dsk=8 blk=8 disk=8(DATA_0008)incarn=3916014011 au=0 blk=8 count=1
Errors in file /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_10329.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
ERROR: cache failed to read group=1(DATA) dsk=8 blk=8 from disk(s): 8(DATA_0008)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
NOTE: cache initiating offline of disk 8 group DATA
NOTE: process _arb0_+asm1 (10329) initiating offline of disk 8.3916014011 (DATA_0008) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 8/0xe969a1bb, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 64 for pid 28, osid 10329
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Sat Oct 10 13:49:28 2020
NOTE: cache dismounting (not clean) group 1/0x3759521C (DATA) 
WARNING: Offline for disk DATA_0008 in mode 0x7f failed.
Sat Oct 10 13:49:28 2020
NOTE: halting all I/Os to diskgroup 1 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 10346, image: oracle@rac1 (B000)
Errors in file /u01/grid/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_10329.trc  (incident=363107):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0008" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483656] [8] [0 != 1]
Incident details in: /u01/grid/diag/asm/+asm/+ASM1/incident/incdir_363107/+ASM1_arb0_10329_i363107.trc

该报错为:dsk=8 blk=7 au=0 blk=8异常,通过kfed查看发现

C:\Users\Administrator>kfed read f:/temp/xff/2.dd blkn=8
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006BE8C00 00000000 00000000 00000000 00000000  [................]
        Repeat 31 times
006BE8E00 012C0000 04AFFC07 003BFFCD 03F15BD0  [..,.......;..[..]
006BE8E10 012BFC30 00000000 00000002 00000002  [0.+.............]
006BE8E20 00008000 00008000 00002000 564F22AF  [......... ..."OV]
006BE8E30 5F805293 FFFF0002 0001EF53 00000001  [.R._....S.......]
006BE8E40 545AE384 00000000 00000000 00000001  [..ZT............]
006BE8E50 00000000 0000000B 00000100 0000003C  [............<...]
006BE8E60 00000242 0000007B 52438BA0 C44FFA90  [B...{.....CR..O.]
006BE8E70 33B6F381 919E2DBA 00000000 00000000  [...3.-..........]
006BE8E80 00000000 00000000 6361622F 0070756B  [......../backup.]
006BE8E90 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
006BE8EC0 00000000 00000000 00000000 03ED0000  [................]
006BE8ED0 00000000 00000000 00000000 00000000  [................]
006BE8EE0 00000008 00000000 00000000 AC3C87D6  [..............<.]
006BE8EF0 F1401174 F4F036BD 274FB92F 00000101  [t.@..6../.O'....]
006BE8F00 0000000C 00000000 545AE384 0002F30A  [..........ZT....]
006BE8F10 00000004 00000000 00000000 00007FFF  [................]
006BE8F20 02508000 00007FFF 00000001 0250FFFF  [..P...........P.]
006BE8F30 00000000 00000000 00000000 00000000  [................]
006BE8F40 00000000 00000000 00000000 08000000  [................]
006BE8F50 00000000 00000000 00000000 001C001C  [................]
006BE8F60 00000001 00000000 00000000 00000000  [................]
006BE8F70 00000000 00000004 A9AF72B9 0000003B  [.........r..;...]
006BE8F80 00000000 00000000 00000000 00000000  [................]
        Repeat 167 times
006BE9A00 00001CC4 00800101 00001CC9 00800101  [................]
006BE9A10 00001CCD 00800101 00001CD2 00800101  [................]
006BE9A20 00001CD7 00800101 00001CDE 00800101  [................]
006BE9A30 00001CE3 00800101 00001CE8 00800101  [................]
006BE9A40 00001CEC 00800101 00000000 00000000  [................]
006BE9A50 00000000 00000000 00000000 00000000  [................]
  Repeat 26 times
KFED-00322:Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

该block完全损坏,基本上无直接修复的可能,通过对data 磁盘组进行patch操作,让其mount之后不再dismount

NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 76 for pid 27, osid 14466
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/emcpowere
NOTE: F1X0 found on disk 0 au 2 fcn 0.2708382
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/emcpowerf
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/emcpowerg
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/emcpowerh
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/emcpoweri
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/emcpowerj
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/emcpowerk
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/emcpowerl
NOTE: cache opening disk 8 of grp 1: DATA_0008 path:/dev/emcpowerc
NOTE: cache mounting (first) external redundancy group 1/0x47495222 (DATA)
Sat Oct 10 13:59:38 2020
* allocate domain 1, invalid = TRUE 
Sat Oct 10 13:59:38 2020
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=53.6778 group=1 (DATA)
NOTE: advancing ckpt for group 1 (DATA) thread=1 ckpt=53.6778
NOTE: cache recovered group 1 to fcn 0.2961429
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Sat Oct 10 13:59:38 2020
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
NOTE: LGWR found thread 1 closed at ABA 53.6777
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.2961429 ABA 54.6778
NOTE: cache mounting group 1/0x47495222 (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x47495222
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA was mounted
SUCCESS: alter diskgroup data mount

然后通过rman备份数据库,删除老磁盘组,创建新磁盘组,恢复数据,实现数据库完美恢复,数据0丢失.

ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh]故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户对asm进行扩容,由于配置不恰当,在使用asmca增加asm disk的时候直接选中了已经被用作文件系统的vg中的磁盘

Tue Nov 19 09:48:48 2019
Non critical error ORA-48180 cFri Nov 22 12:47:48 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (4,15) to disk (/dev/rhdisk29)
NOTE: Assigning number (4,16) to disk (/dev/rhdisk30)
NOTE: Assigning number (4,17) to disk (/dev/rhdisk31)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0015
NOTE: initializing header on grp 4 disk XIFENFEI_0016
NOTE: initializing header on grp 4 disk XIFENFEI_0017
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 12:47:51 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:47:59 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:47:59 2019
GMON updating group 4 at 12 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 13 for pid 18, osid 39912680
Fri Nov 22 12:48:01 2019
NOTE: cache opening disk 15 of grp 4: XIFENFEI_0015 path:/dev/rhdisk29
NOTE: cache opening disk 16 of grp 4: XIFENFEI_0016 path:/dev/rhdisk30
NOTE: cache opening disk 17 of grp 4: XIFENFEI_0017 path:/dev/rhdisk31
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 14 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk29' SIZE 491520M ,
'/dev/rhdisk30' SIZE 491520M ,
'/dev/rhdisk31' SIZE 491520M /* ASMCA */

发现增加错磁盘之后,从vg里面强制踢掉被asm使用的磁盘,并且尝试在asm中删除这些磁盘,并加入新磁盘

Fri Nov 22 12:52:03 2019
SQL> ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:52:03 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: requesting all-instance membership refresh for group=4
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
Fri Nov 22 12:52:12 2019
GMON querying group 4 at 15 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI DROP  DISK 'XIFENFEI_0015','XIFENFEI_0016','XIFENFEI_0017' /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0
…………
Fri Nov 22 12:58:26 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:58:26 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,18) to disk (/dev/rhdisk7)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0018
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 12:58:41 2019
NOTE: initiating PST update: grp = 4
Fri Nov 22 12:58:41 2019
GMON updating group 4 at 16 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
Fri Nov 22 12:58:41 2019
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 17 for pid 18, osid 39912680
NOTE: cache opening disk 18 of grp 4: XIFENFEI_0018 path:/dev/rhdisk7
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 18 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk7' SIZE 491520M /* ASMCA */
Starting background process ARB0
Fri Nov 22 12:58:46 2019
ARB0 started with pid=44, OS id=54460432 
…………
Fri Nov 22 12:59:57 2019
SQL> ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */ 
NOTE: GroupBlock outside rolling migration privileged region
Fri Nov 22 12:59:57 2019
NOTE: stopping process ARB0
NOTE: rebalance interrupted for group 4/0xb08c40b (XIFENFEI)
NOTE: ASM did background COD recovery for group 4/0xb08c40b (XIFENFEI)
NOTE: Assigning number (4,19) to disk (/dev/rhdisk10)
NOTE: Assigning number (4,20) to disk (/dev/rhdisk11)
NOTE: Assigning number (4,21) to disk (/dev/rhdisk8)
NOTE: Assigning number (4,22) to disk (/dev/rhdisk9)
NOTE: requesting all-instance membership refresh for group=4
NOTE: initializing header on grp 4 disk XIFENFEI_0019
NOTE: initializing header on grp 4 disk XIFENFEI_0020
NOTE: initializing header on grp 4 disk XIFENFEI_0021
NOTE: initializing header on grp 4 disk XIFENFEI_0022
NOTE: requesting all-instance disk validation for group=4
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
Fri Nov 22 13:00:08 2019
NOTE: requesting all-instance disk validation for group=4
Fri Nov 22 13:00:08 2019
NOTE: skipping rediscovery for group 4/0xb08c40b (XIFENFEI) on local instance.
NOTE: initiating PST update: grp = 4
Fri Nov 22 13:00:13 2019
GMON updating group 4 at 19 for pid 27, osid 12649908
NOTE: PST update grp = 4 completed successfully 
NOTE: membership refresh pending for group 4/0xb08c40b (XIFENFEI)
GMON querying group 4 at 20 for pid 18, osid 39912680
NOTE: cache opening disk 19 of grp 4: XIFENFEI_0019 path:/dev/rhdisk10
NOTE: cache opening disk 20 of grp 4: XIFENFEI_0020 path:/dev/rhdisk11
NOTE: cache opening disk 21 of grp 4: XIFENFEI_0021 path:/dev/rhdisk8
NOTE: cache opening disk 22 of grp 4: XIFENFEI_0022 path:/dev/rhdisk9
NOTE: Attempting voting file refresh on diskgroup XIFENFEI
NOTE: Refresh completed on diskgroup XIFENFEI. No voting file found.
GMON querying group 4 at 21 for pid 18, osid 39912680
SUCCESS: refreshed membership for 4/0xb08c40b (XIFENFEI)
SUCCESS: ALTER DISKGROUP XIFENFEI ADD  DISK '/dev/rhdisk10' SIZE 491520M ,
'/dev/rhdisk11' SIZE 491520M ,
'/dev/rhdisk8' SIZE 491520M ,
'/dev/rhdisk9' SIZE 491520M /* ASMCA */
NOTE: starting rebalance of group 4/0xb08c40b (XIFENFEI) at power 1
Starting background process ARB0

asm在做着reblance的过程中遭遇到坏块,直接导致磁盘组dismount

Sun Nov 24 04:42:27 2019
NOTE: group 4 PST updated.
WARNING: cache read  a corrupt block: group=4(XIFENFEI) dsk=15 blk=258 disk=15
 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=254
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: a corrupted block from group XIFENFEI was dumped to
   /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc
WARNING: cache read (retry) a corrupt block: group=4(XIFENFEI) 
 dsk=15 blk=258 disk=15 (XIFENFEI_0015) incarn=1717056824 au=113792 blk=2 count=1
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_x000_28639240.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ERROR: cache failed to read group=4(XIFENFEI) dsk=15 blk=258 from disk(s): 15(XIFENFEI_0015)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483663] [258] [56 != 0]
NOTE: cache initiating offline of disk 15 group XIFENFEI
NOTE: process _x000_+asm2 (28639240) initiating offline of disk 
    15.1717056824 (XIFENFEI_0015) with mask 0x7e in group 4
NOTE: initiating PST update: grp = 4, dsk = 15/0x66583538, mask = 0x6a, op = clear
GMON updating disk modes for group 4 at 23 for pid 28, osid 28639240
ERROR: Disk 15 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 4)
Sun Nov 24 04:42:27 2019
NOTE: cache dismounting (not clean) group 4/0x0B08C40B (XIFENFEI) 
WARNING: Offline for disk XIFENFEI_0015 in mode 0x7f failed.
Sun Nov 24 04:42:27 2019
NOTE: halting all I/Os to diskgroup 4 (XIFENFEI)
NOTE: messaging CKPT to quiesce pins Unix process pid: 59441780, image: oracle@xifenfei2 (B000)
Sun Nov 24 04:42:27 2019
ERROR: ORA-15130 thrown in ARB0 for group number 4
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_arb0_50856926.trc:
ORA-15130: diskgroup "XIFENFEI" is being dismounted

至此两个节点的该磁盘组就陷入了不停的mount,然后dismount的轮流循环中.这里我们可以大概的分析出来,由于vg的磁盘组被写入了数据或者强制剔除的时候导致asm写入该文件的数据被破坏,导致后续的asm reblance遭遇坏块,然后直接dismount.对于该问题的解决方案,通过对对该磁盘组的acd和cod进行patch,让其不进行reblance,保持该磁盘组现在,稳定的mount状态,然后对其数据进行备份和重建该磁盘组.这个客户运气不错,vg中的asm disk磁盘写入较少,数据库运行正常.
对于这种情况,如果发生极端损坏,比如asm磁盘组无法mount,可以参考:找回ASM中数据文件
如果是asm的元数据大量损坏,无法通过asm字典级别恢复,可以通过参考:asm disk header 彻底损坏恢复