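Oracle Recovery Tools in practice: batch repair of corrupt blocks

Contact: phone/WeChat (+86 17813235971), QQ (107644445); consult 惜分飞 via QQ

Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal action.]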

A customer's database could not start normally and failed with ORA-600 [6711]:

SQL> startup mount pfile='d:/pfile.txt'
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898],
[0], [], [], [], [], [], [], []
Process ID: 22708
Session ID: 978 Serial number: 56675

The alert log shows the following errors:

2022-06-26T12:34:41.855326+08:00
alter database open
2022-06-26T12:34:41.984974+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
Undo initialization finished serial:0 start:313418906 end:313418906 diff:0 ms (0.0 seconds)
Database Characterset is ZHS16GBK
No Resource Manager plan active
2022-06-26T12:34:43.302315+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38629):
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38629\ora19c_ora_22708_i38629.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-06-26T12:34:44.232115+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
2022-06-26T12:34:44.315431+08:00
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc:
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
Errors in file C:\APP\XFF\diag\rdbms\orcl\ora19c\trace\ora19c_ora_22708.trc  (incident=38630):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [6711], [4313028], [1], [4309898], [0], [], [], [], [], [], [], []
Incident details in: C:\APP\XFF\diag\rdbms\orcl\ora19c\incident\incdir_38630\ora19c_ora_22708_i38630.trc
2022-06-26T12:34:45.266678+08:00
opiodr aborting process unknown ospid (22708) as a result of ORA-603
2022-06-26T12:34:45.274688+08:00
ORA-603 : opitsk aborting process
License high water mark = 1
USER (ospid: (prelim)): terminating the instance due to ORA error 

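Analysis of the trace file confirmed that the failure was caused by an anomaly in the histgrm$ table. After some special handling to bypass the SQL that touches this table, the database was opened and a data export was attempted.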
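
The two large arguments in the ORA-600 [6711] message can be decoded for a first hint of where the damage sits. The sketch below assumes they are relative data block addresses (upper 10 bits relative file number, lower 22 bits block number); the article does not state this, so treat the result as a working hypothesis rather than a diagnosis.

```python
from typing import Tuple

# Assumption: arguments 1 and 3 of ORA-600 [6711] are relative data block
# addresses (RDBAs). The source article does not confirm this.
FILE_SHIFT = 22                      # upper 10 bits: relative file number
BLOCK_MASK = (1 << FILE_SHIFT) - 1   # lower 22 bits: block number


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit relative DBA into (relative file#, block#)."""
    return rdba >> FILE_SHIFT, rdba & BLOCK_MASK


for arg in (4313028, 4309898):
    file_no, block_no = decode_rdba(arg)
    print(f"{arg}: file {file_no}, block {block_no}")
```

Under that assumption both values point into relative file 1, which is normally a SYSTEM datafile and would be consistent with the SYSTEM corruption found later.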

SQL> startup mount pfile='d:/pfile.txt';
ORACLE instance started.

Total System Global Area 4294964032 bytes
Fixed Size                  9036608 bytes
Variable Size             889192448 bytes
Database Buffers         3388997632 bytes
Redo Buffers                7737344 bytes
Database mounted.
SQL>
SQL>
SQL> alter database open;

Database altered.

Exporting the data with expdp failed with ORA-01578.
[Screenshot: expdp-ora-1578]


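Analysis showed that this was caused by corrupt blocks in SYSTEM, so the datafile was checked with dbv.
[Screenshot: dbv corrupt-block check]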

The corrupt blocks were repaired with the batch corrupt-block repair feature of Oracle Recovery Tools.

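The tool repaired the bulk of the corrupt blocks. A few internal logical errors remained (the tool will be improved to handle these). A second logical export then completed without any errors, so the data was recovered almost perfectly.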

Recovering from ORA-15063: ASM discovered an insufficient number of disks for diskgroup

The customer reported that three disk groups could not be mounted, with errors such as ORA-15032, ORA-15017 and ORA-15063:

SQL> ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */ 
NOTE: cache registered group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache began mount (first) of group ASM_DATA number=1 incarn=0xffa85ccd
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 1/0xFFA85CCD (ASM_DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 5709, image: oracle@XFF (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xFFA85CCD (ASM_DATA) 
NOTE: cache ending mount (fail) of group ASM_DATA number=1 incarn=0xffa85ccd
NOTE: cache deleting context for group ASM_DATA 1/0xffa85ccd
Tue Jun 21 12:24:38 2022
NOTE: No asm libraries found in the system
ASM Health Checker found 1 new failures
GMON dismounting group 1 at 16 for pid 19, osid 5709
ERROR: diskgroup ASM_DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ASM_DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ASM_DATA"
ERROR: ALTER DISKGROUP ASM_DATA MOUNT  /* asm agent *//* {0:0:2} */

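The initial assessment was a problem with the ASM disks themselves (for example, disks that cannot be discovered, missing disks, or damaged disk headers), so the customer's udev rules for the ASM disks were examined: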

KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"          
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"    
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"     
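
The three rules above can be read mechanically to see which kernel devices back which ASM aliases. This is a minimal sketch that only handles the simple `KEY=="value"` and `KEY="value"` pairs shown here; the `/dev/` prefix assumes the older udev behaviour in which `NAME=` renames the device node.

```python
import re

RULES = '''
KERNEL=="sdd1", NAME="asm_grid", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sde1", NAME="asm_system", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sdf1", NAME="asm_data", OWNER="grid", GROUP="asmadmin", MODE="0660"
'''

# Matches both match keys (==) and assignment keys (=).
PAIR = re.compile(r'(\w+)(==|=)"([^"]*)"')


def parse_rule(line):
    """Return {key: value} for one udev rule line."""
    return {key: value for key, _op, value in PAIR.findall(line)}


def kernel_to_alias(text):
    """Map each kernel device name to the alias ASM would see (assumed /dev/NAME)."""
    mapping = {}
    for line in text.splitlines():
        rule = parse_rule(line)
        if "KERNEL" in rule and "NAME" in rule:
            mapping[rule["KERNEL"]] = "/dev/" + rule["NAME"]
    return mapping


for kernel, alias in kernel_to_alias(RULES).items():
    # A trailing digit means the rule targets a partition, not the whole disk.
    kind = "partition" if kernel[-1].isdigit() else "whole disk"
    print(f"{kernel} ({kind}) -> {alias}")
```

All three rules target a first partition (sdd1, sde1, sdf1), so the ASM disk header is expected at the start of the partition rather than at the start of the raw disk.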

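The udev rules show that the customer had partitioned the three disks and then used udev to map aliases for ASM to use. One of the disks was then examined:
[Screenshots: WinHex view of the partition]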


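The WinHex view confirms that the header of this partition is abnormal: its content is what a freshly partitioned disk contains, not an ASM disk header. This matches what kfed reports (the header is certainly damaged; the state of the rest of the disk is not yet known):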

H:\TEMP\dd>kfed read sdf_sdf1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
0064D8400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0064D85B0 00000000 00000000 00000000 02000000  [................]
0064D85C0 FE8E0001 003FFFFF DFFC0000 0000257F  [......?......%..]
0064D85D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0064D85F0 00000000 00000000 00000000 AA550000  [..............U.]
0064D8600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

Blocks at other locations were then checked and, at first glance, look largely intact (a stroke of luck):

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

H:\TEMP\dd>kfed read sdf_sdf1.dd blkn=1 aun=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

Searching the partial disk image that had been backed up located a block containing the ORCLDISK information (the ASM disk header).
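
A search of this kind can be sketched as a block-by-block scan of the image. The code below is an illustration, not the author's tool: it assumes 4096-byte metadata blocks and relies on the layout visible in kfed output elsewhere in this article (`kfbh.type` at byte 2, with 1 meaning KFBTYP_DISKHEAD, and the `ORCLDISK` provider string right after the 32-byte block header). The demo image is synthetic.

```python
import os
import tempfile

PROVSTR = b"ORCLDISK"
PROVSTR_OFFSET = 32      # provstr follows the 32-byte kfbh block header
BLOCK_SIZE = 4096        # assumed ASM metadata block size


def find_disk_headers(path, block_size=BLOCK_SIZE):
    """Yield byte offsets of blocks that look like ASM disk headers."""
    with open(path, "rb") as image:
        offset = 0
        while True:
            block = image.read(block_size)
            if len(block) < PROVSTR_OFFSET + len(PROVSTR):
                break
            # kfbh.type is the byte at offset 2; 1 means KFBTYP_DISKHEAD.
            if block[2] == 1 and block[PROVSTR_OFFSET:PROVSTR_OFFSET + 8] == PROVSTR:
                yield offset
            offset += block_size


# Synthetic four-block image: only the last block looks like a disk header.
demo = bytearray(BLOCK_SIZE * 4)
demo[3 * BLOCK_SIZE + 2] = 1
demo[3 * BLOCK_SIZE + 32:3 * BLOCK_SIZE + 40] = PROVSTR
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(demo)
print(list(find_disk_headers(tmp.name)))   # [12288]
os.remove(tmp.name)
```

Any offset the scan reports still needs to be inspected with kfed before it is trusted as a usable header copy.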


That copy was used to repair the damaged disk header, which was then written back to the production disk with dd. The disk group mounted and the database opened successfully.


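The database was fortunate: there was little further damage, so this counts as a complete recovery. A logical export and an RMAN backup were both run and completed normally. For safety going forward, migrating the database is recommended.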

ORA-01110 ORA-17070 OSD-04006 failure recovery

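A contact reported that both application access and data exports were failing with errors such as ORA-01110, ORA-17070 and OSD-04006. The database opened normally, but business queries against key data and exports raised errors.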


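Based on past recovery experience, this error is most likely caused by a hardware problem (such as bad sectors or a hardware fault) or a file system problem. The customer was asked to copy the file, which confirmed that the file could not be copied either.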

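In this situation, abandoning the file and recovering only the other files would lose too much data. Instead, a dedicated recovery tool was used to copy the damaged file and salvage as much of its current content as possible; sectors found to be damaged were skipped and the copy continued.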
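
The idea of skipping damaged sectors while copying can be sketched as follows. This is not the recovery tool used in this case, only an illustration of the technique: unreadable chunks are replaced with zeros so that every later block keeps its original offset. `salvage_copy` and `FlakySource` are hypothetical names, and the 512-byte sector size is an assumption.

```python
import io

SECTOR = 512   # assumed sector size


def salvage_copy(src, dst, total_size, chunk=SECTOR):
    """Copy total_size bytes from src to dst, zero-filling unreadable chunks.

    Returns the list of byte offsets that could not be read in full.
    """
    bad = []
    offset = 0
    while offset < total_size:
        size = min(chunk, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(size)
        except OSError:
            data = b""
        if len(data) < size:
            # Unreadable or short read: record it and pad with zeros so
            # the following data stays at its original offset.
            bad.append(offset)
            data = data + bytes(size - len(data))
        dst.write(data)
        offset += size
    return bad


class FlakySource(io.BytesIO):
    """Test double: a read that starts at offset 1024 fails like a bad sector."""

    def read(self, size=-1):
        if self.tell() == 1024:
            raise OSError("simulated bad sector")
        return super().read(size)


source = FlakySource(b"\x01" * 2048)
target = io.BytesIO()
print(salvage_copy(source, target, 2048))   # [1024]
```

The zero-filled ranges then show up as corrupt blocks in a later datafile check, which is the expected trade-off of this approach.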

A corrupt-block check confirmed that 76 blocks in the file were damaged. For a 32 GB datafile, losing roughly 1 MB of data is a good result.
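
The "roughly 1 MB" figure depends on the database block size, which the article does not give. A quick calculation for two common block sizes (both assumed here) shows the order of magnitude:

```python
CORRUPT_BLOCKS = 76           # from the block check above
FILE_SIZE = 32 * 1024**3      # the "32G" datafile, read as 32 GiB


def loss(block_size):
    """Return (lost bytes, lost fraction of the file) for a given block size."""
    lost = CORRUPT_BLOCKS * block_size
    return lost, lost / FILE_SIZE


# Assumed candidates; the source does not state the block size.
for block_size in (8192, 16384):
    lost, fraction = loss(block_size)
    print(f"{block_size}: {lost / 1024**2:.2f} MiB lost, {fraction:.6%} of the file")
```

Either way the loss is a tiny fraction of the file, which is why copying around the bad sectors was preferable to abandoning the datafile.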

The corrupt blocks were then handled and the data exported with expdp, salvaging as much data as possible.

Database recovery after fdisk partitioning damaged an ASM disk

Attempting to mount the data disk group:

SQL> alter diskgroup DATADG mount 
NOTE: cache registered group DATADG number=1 incarn=0xbc43fafd
NOTE: cache began mount (first) of group DATADG number=1 incarn=0xbc43fafd
NOTE: Assigning number (1,0) to disk (/dev/raw/raw2)
Thu Jun 02 10:14:33 2022
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 27 for pid 27, osid 3853
NOTE: Assigning number (1,1) to disk ()
GMON querying group 1 at 28 for pid 27, osid 3853
NOTE: cache dismounting (clean) group 1/0xBC43FAFD (DATADG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3853, image: oracle@node1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xBC43FAFD (DATADG) 
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xbc43fafd
NOTE: cache deleting context for group DATADG 1/0xbc43fafd
GMON dismounting group 1 at 29 for pid 27, osid 3853
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1" 
ERROR: alter diskgroup DATADG mount
Thu Jun 02 10:14:33 2022
ASM Health Checker found 1 new failures

The error is fairly clear: the disk with disk number 1 in DATADG is missing. The disks were checked with fdisk:

Disk /dev/sdb: 42.9 GB, 42949672960 bytes
64 heads, 32 sectors/track, 40960 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006c2be

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sda: 53.7 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00061443

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           2        2049     2097152   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2            2050       10241     8388608   82  Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3           10242       12289     2097152   83  Linux
Partition 3 does not end on cylinder boundary.
/dev/sda4           12290       51200    39844864    5  Extended
Partition 4 does not end on cylinder boundary.
/dev/sda5           12291       14338     2097152   83  Linux
/dev/sda6           14340       50178    36699136   83  Linux
/dev/sda7           50180       51200     1045504   83  Linux

Disk /dev/sdc: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1b3fba6b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1045     8393931   83  Linux
/dev/sdc2            1046       26108   201318547+  83  Linux

Disk /dev/sdd: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x4c63ecad

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       65270   524281243+  83  Linux

Disk /dev/sde: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000


Disk /dev/sdf: 536.9 GB, 536870912000 bytes
255 heads, 63 sectors/track, 65270 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
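
To narrow down which devices could hold an ASM disk of a given size, the `Disk ...` header lines of the fdisk listing above can be filtered by byte count. This is a small sketch; the 1% tolerance is an arbitrary assumption.

```python
import re

# Header lines copied from the fdisk output above.
FDISK_HEADERS = """\
Disk /dev/sdb: 42.9 GB, 42949672960 bytes
Disk /dev/sda: 53.7 GB, 53687091200 bytes
Disk /dev/sdc: 214.7 GB, 214748364800 bytes
Disk /dev/sdd: 536.9 GB, 536870912000 bytes
Disk /dev/sde: 536.9 GB, 536870912000 bytes
Disk /dev/sdf: 536.9 GB, 536870912000 bytes
"""

DISK_LINE = re.compile(r"^Disk (/dev/\w+): .*?, (\d+) bytes$", re.M)


def disks_of_size(text, size_gib, tolerance=0.01):
    """Return devices whose size is within tolerance of size_gib GiB."""
    wanted = size_gib * 1024**3
    return [dev for dev, size in DISK_LINE.findall(text)
            if abs(int(size) - wanted) <= wanted * tolerance]


print(disks_of_size(FDISK_HEADERS, 500))   # ['/dev/sdd', '/dev/sde', '/dev/sdf']
```

Three devices are exactly 500 GiB, so size alone cannot identify the missing disk; the header content has to be inspected as well.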

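According to the customer, the faulty disk should be a 500 GB one. sdb has no partitions. kfed confirmed that sdc1 is the OCR disk and sdc2 is one of the DATADG disks, so the other DATADG disk must be one of sdd, sde or sdf. kfed showed that neither sde nor sdf can be an ASM disk (one holds a file system, the other is a completely unused blank disk). If the DATADG disk has not been physically lost, it must therefore be sdd. The first 100 MB of the disk was copied out with dd and analyzed with kfed to confirm: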

E:\TEMP\xff>kfed read sdd.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006648400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0066485B0 00000000 00000000 4C63ECAD 01000000  [..........cL....]
0066485C0 FE830001 003FFFFF CB370000 00003E7F  [......?...7..>..]
0066485D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0066485F0 00000000 00000000 00000000 AA550000  [..............U.]
006648600 00000000 00000000 00000000 00000000  [................]
  Repeat 223 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
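
The non-zero words in the dump above sit exactly where a classic MBR keeps its disk identifier (offset 0x1B8), first partition entry (0x1BE) and boot signature (0x1FE). The sketch below rebuilds those bytes from the dump and decodes them, assuming kfed prints each 32-bit word as a little-endian integer.

```python
import struct

# 32-bit words copied from the kfed dump of sdd.dd, keyed by offset in block 0.
# Assumption: kfed prints each word as a little-endian integer.
WORDS = {0x1B8: 0x4C63ECAD, 0x1BC: 0x01000000, 0x1C0: 0xFE830001,
         0x1C4: 0x003FFFFF, 0x1C8: 0xCB370000, 0x1CC: 0x00003E7F,
         0x1FC: 0xAA550000}

sector = bytearray(512)
for offset, word in WORDS.items():
    struct.pack_into("<I", sector, offset, word)


def parse_mbr(sector):
    """Decode the disk identifier and first partition entry of a classic MBR."""
    if bytes(sector[510:512]) != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    (disk_id,) = struct.unpack_from("<I", sector, 440)
    # Entry layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
    # first LBA (4 bytes), sector count (4 bytes).
    _status, ptype, first_lba, sectors = struct.unpack_from("<B3xB3xII", sector, 446)
    return {"disk_id": disk_id, "type": ptype,
            "first_lba": first_lba, "sectors": sectors}


mbr = parse_mbr(sector)
print(f"disk identifier   0x{mbr['disk_id']:08x}")          # 0x4c63ecad
print(f"partition type    0x{mbr['type']:02x}")             # 0x83
print(f"first LBA         {mbr['first_lba']}")              # 63
print(f"size in 1K blocks {mbr['sectors'] * 512 / 1024}")   # 524281243.5
```

The decoded values agree with the fdisk listing for /dev/sdd (disk identifier 0x4c63ecad, one type 83 partition of 524281243+ blocks), which suggests that block 0 of the disk now holds a partition table where an ASM disk header would otherwise be.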

E:\TEMP\xff>kfed read sdd1.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006768400 00000000 00000000 00000000 00000000  [................]
        Repeat 26 times
0067685B0 00000000 00000000 70D364B4 FE000000  [.........d.p....]
0067685C0 FE83FFFF D13FFFFF BB7603EB 00003A93  [......?...v..:..]
0067685D0 00000000 00000000 00000000 00000000  [................]
        Repeat 1 times
0067685F0 00000000 00000000 00000000 AA550000  [..............U.]
006768600 02038201 00000008 80000001 826037C1  [.............7`.]
006768EA0 00000079 00800105 0000007A 00800105  [y.......z.......]
006768EB0 0000007C 00800105 0000007D 00800105  [|.......}.......]
0067693C0 0000015C 00800105 0000015D 00800105  [\.......].......]
0067693D0 0000015F 00800105 00000160 00800105  [_.......`.......]
0067693E0 00000161 00800105 00000163 00800105  [a.......c.......]
0067693F0 00000164 00800105 00000166 00800105  [d.......f.......]
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

E:\TEMP\xff>kfed read sdd.dd blkn=1|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                  2197087544 ; 0x00c: 0x82f4e538
kfbh.fcn.base:                   616391 ; 0x010: 0x000967c7
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                         254 ; 0x004: 0x00fe
kfdfsb.cnt:                         254 ; 0x006: 0x00fe
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000
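
The `kfbh` lines that kfed prints map directly onto the first bytes of each metadata block, so the same check can be scripted. This is a minimal sketch that assumes a little-endian image, 1 MB allocation units and 4096-byte blocks; only the block types that appear in this article are named, and `read_kfbh` is a hypothetical helper, not part of kfed.

```python
import struct

# Only the block types that appear in the kfed output of this article.
BLOCK_TYPES = {0: "KFBTYP_INVALID", 1: "KFBTYP_DISKHEAD",
               2: "KFBTYP_FREESPC", 3: "KFBTYP_ALLOCTBL"}

# endian, hard, type, datfmt, block.blk, block.obj, check, fcn.base, fcn.wrap
KFBH = struct.Struct("<BBBBIIIII")


def block_offset(aun, blkn, au_size=1048576, blk_size=4096):
    """Byte offset of metadata block blkn inside allocation unit aun."""
    return aun * au_size + blkn * blk_size


def parse_kfbh(block):
    """Decode the first 24 bytes of an ASM metadata block header."""
    endian, hard, typ, datfmt, blk, obj, check, base, wrap = KFBH.unpack_from(block)
    return {"endian": endian, "hard": hard,
            "type": BLOCK_TYPES.get(typ, f"unknown({typ})"),
            "datfmt": datfmt, "blk": blk, "obj": obj,
            "check": check, "fcn.base": base, "fcn.wrap": wrap}


def read_kfbh(path, aun=0, blkn=0):
    """Read one block header from a disk image at the given aun/blkn."""
    with open(path, "rb") as image:
        image.seek(block_offset(aun, blkn))
        return parse_kfbh(image.read(KFBH.size))


# Rebuild the header kfed printed for 'sdd.dd blkn=1' and decode it again.
sample = KFBH.pack(1, 0x82, 2, 2, 1, 2147483649, 0x82F4E538, 616391, 0)
print(parse_kfbh(sample)["type"])      # KFBTYP_FREESPC
print(block_offset(aun=0, blkn=1))     # 4096
```

A valid KFBTYP_FREESPC block at block 1 next to an invalid block 0 is the pattern seen above: only the very start of the disk was overwritten.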

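This information essentially confirms that sdd used to be an ASM disk and was later partitioned with fdisk. On that basis the disk group was repaired; kfed now reads a valid disk header from the repaired image: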

E:\TEMP\xff>kfed read sdd.ok
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483649 ; 0x008: disk=1
kfbh.check:                   424926402 ; 0x00c: 0x1953dcc2
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        1 ; 0x024: 0x0001
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:             DATADG_0001 ; 0x028: length=11
kfdhdb.grpname:                  DATADG ; 0x048: length=6
kfdhdb.fgname:              DATADG_0001 ; 0x068: length=11
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33074858 ; 0x0a8: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2
kfdhdb.crestmp.lo:           2375520256 ; 0x0ac: USEC=0x0 MSEC=0x1e4 SECS=0x19 MINS=0x23
kfdhdb.mntstmp.hi:             33074858 ; 0x0b0: HOUR=0xa DAYS=0x15 MNTH=0xb YEAR=0x7e2
kfdhdb.mntstmp.lo:           2375522304 ; 0x0b4: USEC=0x0 MSEC=0x1e6 SECS=0x19 MINS=0x23
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt:                         6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn:                      0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000
kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi:             33072461 ; 0x0e4: HOUR=0xd DAYS=0xa MNTH=0x9 YEAR=0x7e2
kfdhdb.grpstmp.lo:           3452534784 ; 0x0e8: USEC=0x0 MSEC=0x260 SECS=0x1c MINS=0x33
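
Two fields of the repaired header are easy to cross-check by hand. The sketch below unpacks the `hi`/`lo` timestamp pairs using a bit layout inferred from the decoded values kfed prints next to them (an assumption, not documented here), and multiplies `dsksize` by `ausize` for comparison with the fdisk listing.

```python
def decode_stamp(hi, lo):
    """Unpack a kfed hi/lo timestamp pair into its calendar fields."""
    return {
        "year": hi >> 14,
        "month": (hi >> 10) & 0xF,
        "day": (hi >> 5) & 0x1F,
        "hour": hi & 0x1F,
        "minute": lo >> 26,
        "second": (lo >> 20) & 0x3F,
        "msec": (lo >> 10) & 0x3FF,
        "usec": lo & 0x3FF,
    }


# kfdhdb.crestmp.hi / kfdhdb.crestmp.lo from the header above.
created = decode_stamp(33074858, 2375520256)
print("{year}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}".format(**created))
# 2018-11-21 10:35:25

# kfdhdb.dsksize is counted in allocation units of kfdhdb.ausize bytes.
disk_bytes = 512000 * 1048576
print(disk_bytes)   # 536870912000
```

The product equals the size fdisk reports for the whole /dev/sdd device, which fits the conclusion that the ASM disk was the raw disk rather than a partition on it.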

The disk group mounted and the database opened successfully, with zero data loss.


The database was then backed up with RMAN and the disk group was rebuilt, completing the recovery with zero data loss.

Handling an ORA-600 [kcvent_internal_02] failure

On startup the database raised ORA-00600: internal error code, arguments: [kcvent_internal_02] and could not be opened:

Reconfiguration complete
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 22775 KB redo, 5055 data blocks need recovery
Started redo application at
 Thread 2: logseq 166395, block 88
Recovery of Online Redo Log: Thread 2 Group 3 Seq 166395 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_3.283.1036687245
  Mem# 1: +FLASH/orcl/onlinelog/group_3.264.1036687257
Recovery of Online Redo Log: Thread 2 Group 4 Seq 166396 Reading mem 0
  Mem# 0: +DATA/orcl/onlinelog/group_4.284.1036687257
  Mem# 1: +FLASH/orcl/onlinelog/group_4.265.1036687257
Completed redo application of 15.97MB
Completed instance recovery at
 Thread 2: logseq 166396, block 15854, scn 27533037896
 5055 data blocks read, 5055 data blocks written, 22775 redo k-bytes read
Thread 2 advanced to log sequence 166397 (thread recovery)
Redo thread 2 internally disabled at seq 166397 
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_35652472.trc  (incident=195549):
ORA-00600: internal error code, arguments: [kcvent_internal_02], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_195549/orcl1_ora_35652472_i195549.trc

The corresponding trace file contents:

Dump continued from file: /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_35652472.trc
ORA-00600: internal error code, arguments: [kcvent_internal_02], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 195549 (ORA 600 [kcvent_internal_02]) ========

*** 2022-06-06 22:17:48.743
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=5fmpzya54p4hf) -----
ALTER DATABASE OPEN /* db agent *//* {1:38339:2} */

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+40        bl       0000000109B1E77C     000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
ksedst1()+112        call     skdstdst()           16F60DC8B26FAB02 ?
                                                   4846284100000000 ?
                                                   FFFFFFFFFFE46D0 ?
                                                   283C6E7C6A9A6 ? 10A6B923C ?
                                                   000000000 ? 110737880 ?
                                                   2050033FFFE46D8 ?
ksedst()+40          call     ksedst1()            000000000 ? 00000000A ?
                                                   07FFFFFFF ? 700000000003670 ?
                                                   000000000 ? 000000000 ?
                                                   000002004 ? 000000001 ?
dbkedDefDump()+1516  call     ksedst()             000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 300000003 ?
ksedmp()+72          call     dbkedDefDump()       310737880 ? 110000D40 ?
                                                   FFFFFFFFFFE4EE0 ? 1106AB740 ?
                                                   100124BB8 ? 000000000 ?
                                                   700011D7387FF08 ? 1106AB740 ?
ksfdmp()+100         call     ksedmp()             000000002 ? 000000000 ?
                                                   000000002 ? 10AF01CA8 ?
                                                   10A041C38 ? 000000000 ?
                                                   11073C760 ? 110737880 ?
dbgexPhaseII()+1904  call     ksfdmp()             000000000 ? 00000000A ?
                                                   000000002 ? 000000000 ?
                                                   000000002 ? 10A041C30 ?
                                                   000000000 ? 001050005 ?
dbgexProcessError()  call     dbgexPhaseII()       110737880 ? 11073A970 ?
+1556                                              00002FBDD ? 200000000 ?
                                                   FFFFFFFFFFE5DF8 ? 00000006C ?
                                                   200000000 ? 1000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  110737880 ? 11073C760 ?
()+72                                              100000703 ? 000004000 ?
                                                   000000000 ? FFFFFFFFFFE9608 ?
                                                   000000001 ? 11073E4A8 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  FFFFFFFFFFE92B0 ?
2044                          ()                   700011D61558BB8 ? 102878B5C ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFE9608 ? 000000000 ?
                                                   000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   07FFFFFFF ? 700000000003670 ?
68                                                 25800000001 ? 109E4A618 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFEA0B0 ? 1109C0040 ?
kgeadse()+380        call     dbkePostKGE_kgsf()   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 069186EAB ?
kgerinv_internal()+  call     kgeadse()            000000002 ? 000000002 ?
48                                                 000000001 ? FFFFFFFFFFEAB58 ?
                                                   10A4E02F0 ? 000000002 ?
                                                   FFFFFFFFFFE9FE0 ? 000000000 ?
kgerinv()+48         call     kgerinv_internal()   200000002 ? 000000002 ?
                                                   FFFFFFFFFFEA060 ? 000000000 ?
                                                   102860EB0 ? FFFFFFFFFFEA458 ?
                                                   10285CE74 ? FFFFFFFFFFEA358 ?
kgeasnmierr()+72     call     kgerinv()            38400000001 ? 000000000 ?
                                                   10A4E0D20 ? 497F0A29CAE0 ?
                                                   000000001 ? FFFFFFFFFFEA1C0 ?
                                                   10A4E0D20 ? 110000D78 ?
kcvent_internal()+1  call     kgeasnmierr()        FFFFFFFFFFEA1C0 ? 200000002 ?
532                                                1F0410001F041 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000004 ?
kctenb_internal()+2  call     kcvent_internal()    FFFFFFFFFFEB378 ? 200000002 ?
772                                                FFFFFFFFFFEB448 ?
                                                   FFFFFFFFFFEB2E8 ?
                                                   41F6C57900000000 ?
                                                   000000000 ? FFFFFFFFFFEB330 ?
                                                   1106AB740 ?
kcfopd()+1508        call     kctenb_internal()    07FFFFFFF ? 000000000 ?
                                                   000000018 ? FFFFFFFFFFEC380 ?
                                                   000000000 ? 110A39050 ?
                                                   FFFFFFFFFFEC390 ? 000000000 ?
adbdrv()+8028        call     kcfopd()             081F0AD00 ? 00000000F ?
                                                   0FFFED4C0 ? 000000000 ?
                                                   FFFFFFFFFFED548 ? 100000000 ?
                                                   000000000 ? 1000100000000 ?
opiexe()+16048       call     adbdrv()             2300000023 ? 100000001 ?
                                                   000000000 ? FFFFFFFFFFF6960 ?
                                                   000000000 ? FFFFFFFFFFF6B60 ?
                                                   FFFFFFFFFFF6A98 ? 200000002 ?
opiosq0()+3984       call     opiexe()             700011E117B3B20 ? 000000000 ?
                                                   FFFFFFFFFFF7ED8 ? 110000D78 ?
                                                   000000001 ? 1109FA438 ?
                                                   FFFFFFFFFFF7E70 ?
                                                   2216414400000001 ?
kpooprx()+316        call     opiosq0()            300000000 ? 000000000 ?
                                                   000000000 ? A4000000000000 ?
                                                   000000000 ? FFFFFFFFFFF87F0 ?
                                                   28104221FFFF86F0 ?
                                                   1109FAB08 ?
kpoal8()+872         call     kpooprx()            1000CE68C ? 000000001 ?
                                                   FFFFFFFFFFFAD14 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   109EB6D00 ? 000000000 ?
opiodr()+908         call     kpoal8()             100000000 ? 9001000A0091108 ?
                                                   000000FFF ? 07FFFFFF8 ?
                                                   FFFFFFFFFFF8F10 ? 000000018 ?
                                                   000000000 ? 000072FFF ?
ttcpip()+1028        call     opiodr()             5EFFFFA480 ? 1C00200048 ?
                                                   FFFFFFFFFFFA9F8 ? 000530058 ?
                                                   1108BEE30 ? 000000028 ?
                                                   FFFFFFFFFFFA3A0 ? 1108BEC70 ?
opitsk()+1612        call     ttcpip()             110135440 ? 000002078 ?
                                                   000000000 ? 110000D78 ?
                                                   110005210 ? 000000000 ?
                                                   FFFFFFFFFFFAA20 ?
                                                   2222208009EF13C0 ?
opiino()+940         call     opitsk()             110024C58 ? 000000000 ?
                                                   11079B550 ? 1107A0850 ?
                                                   110737880 ? FFFFFFFFFFFCAE0 ?
                                                   FFFFFFFFFFFEB3C ? 000000101 ?
opiodr()+908         call     opiino()             3C006C787C ?
                                                   BFF0000000000000 ?
                                                   FFFFFFFFFFFEF60 ?
                                                   FFFFFFFFFFFD5E9 ?
                                                   FFFFFFFFFFFD630 ? 1106AB740 ?
                                                   FFFFFFFFFFFD650 ?
                                                   9FFFFFFF000E608 ?
opidrv()+1132        call     opiodr()             3C0AFBC600 ? 410134340 ?
                                                   FFFFFFFFFFFEF60 ? 07530312F ?
                                                   108820CE4 ? 1106AB740 ?
                                                   7264626D732F6F72 ?
                                                   1106AB740 ?
sou2o()+136          call     opidrv()             3C0882A9D0 ? 41170031F ?
                                                   FFFFFFFFFFFEF60 ?
                                                   110017002A0000 ? 0E0DDF00D ?
                                                   1106AB740 ?
                                                   BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
opimai_real()+560    call     sou2o()              FFFFFFFFFFFEFD0 ?
                                                   BADC0FFEE0DDF00D ?
                                                   90000000008BE3C ?
                                                   BADC0FFEE0DDF00D ?
                                                   000000002 ? 9001000A0091108 ?
                                                   A0000000A000000 ? 10B671248 ?
ssthrdmain()+276     call     opimai_real()        10B6B1D74 ? 9001000A0095260 ?
                                                   FFFFFFFFFFFF0B0 ? 10B6B1598 ?
                                                   FFFFFFFFFFFF0D0 ?
                                                   FFFFFFFFFFFF428 ?
                                                   900000000100968 ?
                                                   9001000A0091108 ?
main()+204           call     ssthrdmain()         240000000 ? FFFFFFFFFFFF418 ?
                                                   8FFFFFFF0000090 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   BADC0FFEE0DDF00D ?
                                                   BADC0FFEE0DDF00D ?
__start()+112        call     main()               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
 

--------------------- Binary Stack Dump ---------------------

Neither MOS nor the wider internet has any information on this error, but a similar message turned up in the alert log:

Mon Jun 06 23:03:58 2022
Error: Controlfile sequence number in file header is different from the one in memory
       Please check that the correct mount options are used if controlfile is located on NFS
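
A warning like this is easy to overlook in a long alert log. As a rough sketch only (the function name find_controlfile_warnings is hypothetical, and it assumes the older alert log layout shown above, where a standalone timestamp line such as "Mon Jun 06 23:03:58 2022" precedes each entry), a few lines of Python can list when the message appeared:

```python
import re

# Standalone timestamp lines in the older alert log layout,
# e.g. "Mon Jun 06 23:03:58 2022" (assumed format, see lead-in).
TS_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)
MARKER = "Controlfile sequence number in file header is different"


def find_controlfile_warnings(alert_text):
    """Return the timestamp line preceding each entry that contains MARKER."""
    hits = []
    current_ts = None  # most recent timestamp line seen so far
    for line in alert_text.splitlines():
        stripped = line.strip()
        if TS_RE.match(stripped):
            current_ts = stripped
        elif MARKER in stripped:
            hits.append(current_ts)
    return hits
```

Called on the text of the alert log, it returns one timestamp string per occurrence (or None for an occurrence that has no timestamp line before it), which makes it easier to line the warning up against the time of the failure.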

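My initial judgment was that the failure might be related to this error. After resolving that issue, I tried to open the database: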

SQL> recover database;

ORA-00279: change 27533037896 generated at 06/06/2022 22:17:46 needed for
thread 2
ORA-00289: suggestion :
+FLASH/orcl/archivelog/2022_06_06/thread_2_seq_166396.6532.1106691471
ORA-00280: change 27533037896 for thread 2 is in sequence #166396


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Log applied.
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01216: thread 2 is expected to be disabled after CREATE CONTROLFILE

SQL> !oerr ora 01216
01216, 00000, "thread %s is expected to be disabled after CREATE CONTROLFILE"
// *Cause:  A thread that was given during CREATE CONTROLFILE is enabled, but
//          the datafiles indicate that it should be disabled.  This is
//          probably because the logs supplied to the CREATE CONTROLFILE
//          are old (from before the disabling of the thread).
// *Action: This thread is not required to run the database.  The CREATE
//          CONTROLFILE statement can be reissued without the problem thread,
//          and, if desired, the thread can be recreated after the database
//          is open.

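ORA-01216 is also a fairly rare error, but it is clearly related to redo threads: according to the oerr text above, the thread is enabled in the control file while the datafiles indicate it should be disabled. Checking the thread status: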

SQL> select thread#,STATUS FROM V$THREAD;

   THREAD# STATUS
---------- ------------------
         1 CLOSED
         2 CLOSED

After manually forcing the thread to OPEN (thread 1 in the output below), the database opened successfully:

SQL> select thread#,status from v$thread;

   THREAD# STATUS
---------- ------------------
         1 OPEN
         2 CLOSED

SQL> alter database open;

Database altered.
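
When comparing the thread state before and after a change like this, it can help to turn the SQL*Plus output into something a script can check. The following is a minimal sketch, not part of the original procedure: the helper name parse_thread_status is hypothetical, and it assumes the two-column "THREAD# STATUS" output shown above has been captured as text.

```python
def parse_thread_status(sqlplus_output):
    """Map thread number -> status from captured
    'select thread#,status from v$thread' output."""
    statuses = {}
    for line in sqlplus_output.splitlines():
        parts = line.split()
        # Data rows look like "         1 OPEN": an integer, then a status word.
        # Header and separator lines fail the isdigit() check and are skipped.
        if len(parts) == 2 and parts[0].isdigit():
            statuses[int(parts[0])] = parts[1]
    return statuses


def closed_threads(statuses):
    """Return the sorted thread numbers whose status is not OPEN."""
    return sorted(t for t, s in statuses.items() if s != "OPEN")
```

For the second query output above, closed_threads would report only thread 2, which matches the next step of bringing up the second node.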

Next, bring up thread 2 and open the second node:

-- On the node that still needs to be opened
SQL> startup
ORACLE instance started.

Total System Global Area 1.2961E+11 bytes
Fixed Size                  2262400 bytes
Variable Size            3.3018E+10 bytes
Database Buffers         9.6368E+10 bytes
Redo Buffers              221818880 bytes
ORA-01618: redo thread 2 is not enabled - cannot mount


-- On the node that is already open
SQL> ALTER DATABASE ENABLE THREAD 2;

Database altered.

-- On the node that still needs to be opened
SQL> ALTER DATABASE MOUNT;

Database altered.

SQL> ALTER DATABASE OPEN;

Database altered.

xifenfei1:/home/grid$crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.FLASH.dg
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.LISTENER.lsnr
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.OCR.dg
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.asm
               ONLINE  ONLINE       xifenfei1                  Started             
               ONLINE  ONLINE       xifenfei2                  Started             
ora.gsd
               OFFLINE OFFLINE      xifenfei1                                      
               OFFLINE OFFLINE      xifenfei2                                      
ora.net1.network
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.ons
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
ora.registry.acfs
               ONLINE  ONLINE       xifenfei1                                      
               ONLINE  ONLINE       xifenfei2                                      
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       xifenfei1                                      
ora.cvu
      1        OFFLINE OFFLINE                                                   
ora.xifenfei1.vip
      1        ONLINE  ONLINE       xifenfei1                                      
ora.xifenfei2.vip
      1        ONLINE  ONLINE       xifenfei2                                      
ora.oc4j
      1        ONLINE  ONLINE       xifenfei2                                      
ora.orcl.db
      1        ONLINE  ONLINE       xifenfei1                  Open                
      2        ONLINE  ONLINE       xifenfei2                  Open                
ora.scan1.vip
      1        ONLINE  ONLINE       xifenfei1