Another case of TRIM causing data loss on an ASM disk

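Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]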

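I previously handled a case in which storage was attached directly to a virtual machine and a mistaken disk operation triggered TRIM, wiping the data (see "SSD TRIM makes a disk unrecoverable after it is formatted with fdisk"). I recently ran into a similar case: the customer mistakenly formatted an ASM disk with mkfs.
[Screenshot: mkfs run against the ASM disk]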


The formatted disk was one of six disks that made up the DATA disk group.
[Screenshot: members of the DATA disk group]

As soon as the disk was formatted, the DATA disk group dismounted. The ASM alert log recorded the following:

Tue Apr 07 18:22:31 2026
WARNING: cache read  a corrupt block: group=2(DATA) fn=261 indblk=0 disk=0 (DATA_0000) incarn=3958745085 au=605 blk=0 count=1
Errors in file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) fn=261 indblk=0 disk=0 (DATA_0000) incarn=3958745085 au=605 blk=0 count=1
Errors in file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ERROR: cache failed to read group=2(DATA) fn=261 indblk=0 from disk(s): 0(DATA_0000)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _user639087_+asm1 (639087) initiating offline of disk 0.3958745085 (DATA_0000) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 0/0xebf5a7fd, mask = 0x6a, op = clear
Tue Apr 07 18:22:31 2026
GMON updating disk modes for group 2 at 10 for pid 28, osid 639087
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Tue Apr 07 18:22:31 2026
NOTE: cache dismounting (not clean) group 2/0xE9E5571F (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 115720, image: oracle@ajjorcl1 (B000)
Tue Apr 07 18:22:31 2026
NOTE: halting all I/Os to diskgroup 2 (DATA)
WARNING: Offline for disk DATA_0000 in mode 0x7f failed.
Tue Apr 07 18:22:31 2026
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=15.1625 last written ABA 15.1625
Errors in file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_639087.trc  (incident=309345):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0000" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [261] [2147483648] [0 != 1]
Incident details in: /home/app/grid/diag/asm/+asm/+ASM1/incident/incdir_309345/+ASM1_ora_639087_i309345.trc
Tue Apr 07 18:22:31 2026
List of instances:
 1
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 30)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 2 invalid = TRUE 
 26 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Tue Apr 07 18:22:31 2026
freeing rdom 2
Tue Apr 07 18:22:31 2026
WARNING: dirty detached from domain 2
NOTE: cache dismounted group 2/0xE9E5571F (DATA) 
SQL> alter diskgroup DATA dismount force /* ASM SERVER:3924121375 */ 
Tue Apr 07 18:22:32 2026
Sweep [inc][309345]: completed
System State dumped to trace file /home/app/grid/diag/asm/+asm/+ASM1/incident/incdir_309345/+ASM1_ora_639087_i309345.trc
Tue Apr 07 18:22:32 2026
Dumping diagnostic data in directory=[cdmp_20260407182232], requested by (instance=1, osid=639087), summary=[incident=309345].
Tue Apr 07 18:22:32 2026
NOTE: cache deleting context for group DATA 2/0xe9e5571f
GMON dismounting group 2 at 11 for pid 32, osid 115720
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 2
Tue Apr 07 18:22:34 2026
Sweep [inc2][309345]: completed
NOTE: AMDU dump of disk group DATA created at /home/app/grid/diag/asm/+asm/+ASM1/incident/incdir_309345
Tue Apr 07 18:22:37 2026
NOTE: ASM client orcl1:orcl disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /home/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_504268.trc
Tue Apr 07 18:23:02 2026
SUCCESS: diskgroup DATA was dismounted
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:3924121375 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA
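
The "cache read a corrupt block" warnings in the log above already name the disk, allocation unit (AU), and block that ASM could not read, which is a natural starting point for a kfed inspection. A minimal sketch for pulling those locations out of an alert log follows. The function name `corrupt_blocks` and the regular expression are illustrative assumptions modeled only on the log lines shown here, not an Oracle-provided parser.

```python
import re

# Assumed pattern, modeled on the warning lines in the alert log above.
CORRUPT_RE = re.compile(
    r"cache read\s+(?:\(retry\)\s+)?a corrupt block: "
    r"group=(?P<group>\d+)\((?P<group_name>\w+)\) "
    r"fn=(?P<fn>\d+) indblk=(?P<indblk>\d+) "
    r"disk=(?P<disk>\d+) \((?P<disk_name>\w+)\) "
    r"incarn=\d+ au=(?P<au>\d+) blk=(?P<blk>\d+)"
)


def corrupt_blocks(alert_lines):
    """Return the distinct (disk_name, au, blk) locations reported as corrupt."""
    seen = []
    for line in alert_lines:
        m = CORRUPT_RE.search(line)
        if m:
            loc = (m["disk_name"], int(m["au"]), int(m["blk"]))
            if loc not in seen:  # the retry warning repeats the same location
                seen.append(loc)
    return seen
```

Fed the warning lines from this incident, the helper should report a single location: disk DATA_0000, AU 605, block 0.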

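I analyzed the formatted disk with kfed, picking a number of AUs at random, and found that every one of them had been zeroed.
[Screenshot: kfed output]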
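
The same kind of spot check can be approximated without kfed by reading random allocation units from the device (or, preferably, from an image of it) and testing whether they contain only zero bytes. The sketch below is a coarse complement to kfed, not a replacement for it. The function name `sample_empty_aus` is hypothetical, and the 1 MiB AU size is an assumption that should be replaced with the disk group's actual AU size.

```python
import os
import random

AU_SIZE = 1024 * 1024  # assumed 1 MiB AU; use the disk group's real AU size


def sample_empty_aus(path, samples=20, au_size=AU_SIZE, seed=None):
    """Read random AUs from `path`; return a list of (au_number, is_all_zero)."""
    rng = random.Random(seed)
    results = []
    with open(path, "rb") as f:  # read-only: never write to the damaged disk
        total_aus = f.seek(0, os.SEEK_END) // au_size
        if total_aus == 0:
            return results
        picks = sorted(rng.sample(range(total_aus), min(samples, total_aus)))
        for au in picks:
            f.seek(au * au_size)
            data = f.read(au_size)
            results.append((au, data.count(0) == len(data)))
    return results
```

If nearly every sampled AU comes back all-zero on a disk that previously held ASM data, that is consistent with the data having been discarded rather than merely overwritten in a few places.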


I then used lsblk to check whether the disk has the TRIM (discard) feature enabled.
[Screenshot: lsblk TRIM check]
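
For context, `lsblk --discard` (short form `-D`) reports the DISC-GRAN and DISC-MAX columns, which come from the kernel's `discard_granularity` and `discard_max_bytes` queue attributes in sysfs. Non-zero values mean the device accepts discard requests. This matters because mkfs.ext4 and mkfs.xfs discard the whole device by default unless told not to (`-E nodiscard` and `-K` respectively). Below is a minimal sketch of the same check done by reading sysfs directly. The function names and the `sysfs_root` parameter are illustrative assumptions, and the device name passed in is whatever whole-disk name applies on the host.

```python
from pathlib import Path


def discard_info(device, sysfs_root="/sys/block"):
    """Return (discard_granularity, discard_max_bytes) for a whole-disk device."""
    queue = Path(sysfs_root) / device / "queue"
    granularity = int((queue / "discard_granularity").read_text().strip())
    max_bytes = int((queue / "discard_max_bytes").read_text().strip())
    return granularity, max_bytes


def supports_discard(device, sysfs_root="/sys/block"):
    """True when the kernel reports that the device accepts discard (TRIM)."""
    granularity, max_bytes = discard_info(device, sysfs_root)
    return granularity > 0 and max_bytes > 0
```

A call such as `supports_discard("sdb")` expects a whole-disk name, because partitions do not carry their own `queue` directory under `/sys/block`.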

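Based on these findings, it is very likely that formatting triggered TRIM on this disk and that its data has been zeroed. Finally, I examined an image of the disk with WinHex and confirmed that, apart from the basic partition and file system information, the rest of the disk is empty.
[Screenshot: WinHex view of the disk image, empty]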
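Given this, the best achievable outcome is to recover the data on the other five of the six disks in the disk group. That still means losing at least one sixth of the data, but with no better option available it keeps the loss as small as possible.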