win asm disk header 异常恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:win asm disk header 异常恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友反馈win环境下rac异常,asm无法正常mount,检查日志发现

Fri Jul 03 03:55:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7004.trc:
ORA-15081: failed to submit an I/O operation to a disk
Fri Jul 03 03:59:46 2020
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15025: could not open disk "\\.\ORCLDISKDATA1"
ORA-27041: unable to open file
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 267 in group [2.2254399778] 
from disk DATA_0000  allocation unit 3502 reason error; if possible, will try another mirror side
Errors in file C:\APP\ADMINISTRATOR\diag\asm\+asm\+asm2\trace\+asm2_ora_7328.trc:
ORA-15081: failed to submit an I/O operation to a disk

报错信息比较明显是由于无法找到\\.\ORCLDISKDATA1磁盘,因此异常,通过asmtool查看磁盘信息

C:\app\11.2.0\grid>asmtool -list
NTFS                             \Device\Harddisk0\Partition3            81920M
NTFS                             \Device\Harddisk0\Partition4           200000M
NTFS                             \Device\Harddisk0\Partition5          4293849M
                                 \Device\Harddisk1\Partition2             4062M
                                 \Device\Harddisk2\Partition2          2097022M
ORCLDISKFRA0                     \Device\Harddisk3\Partition2           511870M

明显的发现ORCLDISKDATA1磁盘丢失,通过对磁盘dd到本地然后进行分析发现,asm disk header损坏

C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006B38C00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]


C:\Users\Administrator>kfed read F:\temp\disk3\1\disk2.dd blkn=2
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       2 ; 0x004: blk=2
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2349305287 ; 0x00c: 0x8c078dc7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb.aunum:                         0 ; 0x000: 0x00000000
kfdatb.shrink:                      448 ; 0x004: 0x01c0
kfdatb.ub2pad:                        0 ; 0x006: 0x0000
kfdatb.auinfo[0].link.next:           8 ; 0x008: 0x0008
kfdatb.auinfo[0].link.prev:           8 ; 0x00a: 0x0008
kfdatb.auinfo[1].link.next:          12 ; 0x00c: 0x000c
kfdatb.auinfo[1].link.prev:          12 ; 0x00e: 0x000c
kfdatb.auinfo[2].link.next:         456 ; 0x010: 0x01c8
kfdatb.auinfo[2].link.prev:         456 ; 0x012: 0x01c8
kfdatb.auinfo[3].link.next:         488 ; 0x014: 0x01e8
kfdatb.auinfo[3].link.prev:         488 ; 0x016: 0x01e8
kfdatb.auinfo[4].link.next:          24 ; 0x018: 0x0018
kfdatb.auinfo[4].link.prev:          24 ; 0x01a: 0x0018
kfdatb.auinfo[5].link.next:          28 ; 0x01c: 0x001c
kfdatb.auinfo[5].link.prev:          28 ; 0x01e: 0x001c
kfdatb.auinfo[6].link.next:         552 ; 0x020: 0x0228
kfdatb.auinfo[6].link.prev:        3112 ; 0x022: 0x0c28
kfdatb.spare:                         0 ; 0x024: 0x00000000
kfdate[0].discriminator:              1 ; 0x028: 0x00000001
kfdate[0].allo.lo:                    0 ; 0x028: XNUM=0x0
kfdate[0].allo.hi:              8388608 ; 0x02c: V=1 I=0 H=0 FNUM=0x0
kfdate[1].discriminator:              1 ; 0x030: 0x00000001
kfdate[1].allo.lo:                    0 ; 0x030: XNUM=0x0
kfdate[1].allo.hi:              8388608 ; 0x034: V=1 I=0 H=0 FNUM=0x0
kfdate[2].discriminator:              1 ; 0x038: 0x00000001
kfdate[2].allo.lo:                    0 ; 0x038: XNUM=0x0
kfdate[2].allo.hi:              8388609 ; 0x03c: V=1 I=0 H=0 FNUM=0x1

fra磁盘虽然磁盘asm label信息存在,但是其他信息依旧损坏,但是也只是磁盘头信息损坏
20200705203336


通过现场分析,基本上可以确定是由于某种原因导致win asm 的磁盘的所有磁盘头都损坏(两个磁盘头被置空,另外一个磁盘头基本上损坏),基于原因未知
基于客户现场的情况,以及他们有前一天的rman备份,而且客户有保障现场(进一步故障原因分析)的需求,未在现场环境进行恢复,而是在不对现场环境做任何修改的情况下,直接恢复fra里面的redo和归档日志,进而结合备份异地实现数据库恢复,实现数据0丢失,又不破坏现场的效果
20200705203742

以前遇到过类似我其他操作系统平台中asm disk header异常的case:
asm磁盘分区丢失恢复
pvid=yes导致asm无法mount
asm磁盘头全部损坏数据0丢失恢复
分区无法识别导致asm diskgroup无法mount
asm disk误设置pvid导致asm diskgroup无法mount恢复

系统故障oracle数据库恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:系统故障oracle数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于系统故障,导致操作系统进入,客户通过其他方式进入系统拷贝出来数据文件,redo,ctl等文件,安装版本相同的数据库,修改相关路径,启动数据库,但是启动报错,让我们给予技术支持.数据库open报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [2], [2313731576], [2],
[2313735660], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [2], [2313731575], [2],
[2313735660], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [2], [2313731573], [2],
[2313735660], [12583040], [], [], [], [], [], []
Process ID: 22446
Session ID: 577 Serial number: 1

alert日志报错

Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22446.trc:
ORA-00600: internal error code, arguments:[2662],[2],[2313731573],[2],[2313735660],[12583040],[],[],[],[],[],[]
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22446.trc:
ORA-00600: internal error code, arguments:[2662],[2],[2313731573],[2],[2313735660],[12583040],[],[],[],[],[],[]
Error 600 happened during db open, shutting down database
USER (ospid: 22446): terminating the instance due to error 600

这个错误比较常见,特别是使用了_allow_resetlogs_corruption屏蔽一致性强制拉库的时候.解决该问题比较简单,修改数据库scn,然后open数据库成功,参考部分案例_allow_resetlogs_corruption

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jun 20 08:31:55 2020

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size                  2235208 bytes
Variable Size            2902459576 bytes
Database Buffers         1325400064 bytes
Redo Buffers               45686784 bytes
Database mounted.
SQL> alter database open;

Database altered.

尝试导出数据

[oracle@localhost ~]$ tail -f nohup.out 
. exporting foreign function library names for user XIFENFEI 
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user XIFENFEI 
About to export LIOVBJ2017's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export LIOVBJ2017's tables via Conventional Path ...
. . exporting table                           ABCD
                                                            1 rows exported
. . exporting table               TB_D_RECORD
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table                      TB_DRIVER
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table               TB_XFF
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table              TB_XFF_TM_REL
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table                    TB_LOCATION

由于file # 1, block # 290344坏块导致数据无法导出,通过dbv检查数据文件

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/system01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Sat Jun 20 08:43:49 2020

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
Page 290344 is influx - most likely media corrupt
Corrupt block relative dba: 0x00446e28 (file 1, block 290344)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x00446e28
 last change scn: 0x0002.89e2b718 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x61980601
 check value in block header: 0xe118
 computed block checksum: 0xd680



DBVERIFY - Verification complete

Total Pages Examined         : 298240
Total Pages Processed (Data) : 257035
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 13457
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3598
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 24149
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 3221247831 (2.3221247831)

确认只有一个坏块,尝试通过bbed进行坏块修复

BBED> set blocksize 8192
        BLOCKSIZE       8192

BBED> set block 290344
        BLOCK#          290344

BBED> map
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344                                Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Index Leaf)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 72 bytes                     @20      

 struct kdxle, 32 bytes                     @92      

 sb2 kd_off[231]                            @124     

 ub1 freespace[3026]                        @586     

 ub1 rowdata[4508]                          @3612    

 ub4 tailchk                                @8188    


BBED> verify
DBVERIFY - Verification starting
FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
BLOCK = 290344

Block 290344 is corrupt
Corrupt block relative dba: 0x00446e28 (file 0, block 290344)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x00446e28
 last change scn: 0x0002.89e2b718 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x61980601
 check value in block header: 0xe118
 computed block checksum: 0xd680


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED


BBED> set mode edit
        MODE            Edit

BBED> p kcbh
struct kcbh, 20 bytes                       @0       
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x00446e28
   ub4 bas_kcbh                             @8        0x89e2b718
   ub2 wrp_kcbh                             @12       0x0002
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe118
   ub2 spare3_kcbh                          @18       0x0000

BBED> p tailchk
ub4 tailchk                                 @8188     0x61980601

BBED> d /v
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344  Offsets: 8188 to 8191  Dba:0x00000000
-------------------------------------------------------
 01069861                            l ...a

 <16 bytes per line>

BBED> m /x 010618b7
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344           Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 010618b7 

 <32 bytes per line>

BBED> sum apply
Check value for File 0, Block 290344:
current = 0xe118, required = 0xe118

BBED> verify 
DBVERIFY - Verification starting
FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
BLOCK = 290344


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED

继续尝试导出数据,遭遇ORA-08103,参考相关文章:
模拟普通ORA-08103并解决
模拟极端ORA-08103并解决
数据库启动ORA-08103故障恢复

EXP-00056: ORACLE error 8103 encountered
ORA-08103: object no longer exists

通过对其进行处理,恢复该记录之外的所有记录,客户创建新库导入数据,数据库恢复基本完成

asm磁盘dd破坏恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:asm磁盘dd破坏恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户和我们反馈,由于运维人员操作错误,对一个磁盘组的asm disk进行了dd操作,导致部分数据丢失(客户数据文件存放在两个磁盘组中,其中一个被dd掉[误以为只是存放归档,其实由于第一个磁盘组空间不足,把部分数据文件放进该磁盘组])
20200603221148


通过对asm 日志进行分析发现被dd的磁盘是一个磁盘组,以前恢复过类似的asm 磁盘被dd的案例(asm磁盘头全部损坏数据0丢失恢复,上次因为dd破坏较少,所以可以通过修复磁盘组直接恢复出来里面数据,但是这次被dd了50M,直接修复磁盘头恢复数据基本上不太可能.通过工具对其进行磁盘扫描,参考:asm disk header 彻底损坏恢复,对扫描结果进行分析,发现不少数据块是重复,无法较好的实现重组效果.
20200612002025

类似出现这样的情况一般是由于该asm磁盘组中有同一个文件号的数据多份(比如一个磁盘组中有两个库,同一个数据文件存储多份),通过方面分析,该库没有文件多份存储而且该磁盘组中只有一个数据库.进一步分析仅有的asm alert日志(大部分日志被清除),发现类似信息

Sun Mar 14 05:25:40 CST 2020
NOTE: F1X0 found on disk 0 fcn 0.60289025
NOTE: cache opening disk 0 of grp 2: HIS_FLASH00 label:HIS_FLASH00
NOTE: cache opening disk 1 of grp 2: HIS_FLASH01 label:HIS_FLASH01
NOTE: cache opening disk 2 of grp 2: HIS_FLASH02 label:HIS_FLASH02
NOTE: cache opening disk 3 of grp 2: HIS_DATA03 label:HIS_DATA03
NOTE: cache mounting (first) group 2/0xCCD84BCB (HIS_FLASH)
* allocate domain 2, invalid = TRUE 
kjbdomatt send to node 0
Sun Mar 14 05:25:40 CST 2020
NOTE: attached to recovery domain 2
Sun Mar 14 05:25:40 CST 2020
NOTE: starting recovery of thread=1 ckpt=405.816 group=2
NOTE: advancing ckpt for thread=1 ckpt=405.819
NOTE: cache recovered group 2 to fcn 0.65493064
Sun Mar 14 05:25:40 CST 2020
NOTE: LGWR attempting to mount thread 1 for disk group 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.65493064 ABA 
NOTE: seq=406 blk=820 
Sun Mar 14 05:25:40 CST 2020
NOTE: cache mounting group 2/0xCCD84BCB (HIS_FLASH) succeeded
SUCCESS: diskgroup HIS_FLASH was mounted
Sun Mar 14 05:25:42 CST 2020
NOTE: recovering COD for group 2/0xccd84bcb (HIS_FLASH)
SUCCESS: completed COD recovery for group 2/0xccd84bcb (HIS_FLASH)
Sun Mar 14 05:25:47 CST 2020
Starting background process ASMB
ASMB started with pid=17, OS id=14599

初步可以定位,很可能是由于在3月份对该磁盘组进行了扩容,从而发生了数据文件的rebalance操作,从而出现了某些block有重复现象,对于这类情况,通过结合asm字典信息进行分析可以完全规避该问题,对数据文件进行恢复,dbv进行检查,一切正常
20200612000805


对所有文件类似处理,结合正常磁盘组中数据文件,对数据库进行直接open,实现完美恢复.
如果您遇到此类情况,无法解决请联系我们,提供专业ORACLE数据库恢复技术支持
Phone:13429648788    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com
这次数据能够完美恢复属于侥幸,因为asm disk被dd了50M(正常情况下4个磁盘的磁盘组每个磁盘dd 50M之后,很可能有部分数据文件被覆盖,该客户该磁盘组最初是存储归档日志,因此数据文件写入位置相对比较靠后,从而没有被dd破坏)

12.2数据库启动报ORA-007445 lmebucp错误

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:12.2数据库启动报ORA-007445 lmebucp错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个客户找到我们,说他们是数据库启动之时报的错误和数据库不能open 报ORA-7445 lmebucp错类似,让我们对其进行恢复支持,通过分析确定客户数据库版本为12.2.0.1

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Productio
PL/SQL Release 12.2.0.1.0 - Production	
CORE 12.2.0.1.0 Production	
TNS for Linux: Version 12.2.0.1.0 - Production	
NLSRTL Version 12.2.0.1.0 - Production	

alert日志报错

--最初报错
2020-05-11T03:43:06.787164+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639048):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639048/orcl_m004_3253_i639048.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T03:43:14.250993+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639049):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639049/orcl_m004_3253_i639049.trc
2020-05-11T03:43:21.286310+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639050):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 line 2
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639050/orcl_m004_3253_i639050.trc
2020-05-11T03:43:28.059048+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T03:43:28.074681+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc:
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 line 2
2020-05-11T08:31:22.416087+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_16511.trc:
ORA-12012: 自动执行作业 141 出错
ORA-30732: 表中不包含用户可见的列
ORA-06512: 在 line 1


---关闭数据库之后重启报错
2020-05-11T09:42:43.234769+08:00
ALTER DATABASE OPEN
2020-05-11T09:42:44.353085+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2020-05-11T09:42:44.660388+08:00
TT00: Gap Manager starting (PID:31134)
2020-05-11T09:42:45.180876+08:00
Thread 1 opened at log sequence 78596
  Current log# 2 seq# 78596 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Successful open of redo thread 1
2020-05-11T09:42:45.181357+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x0][PC:0x10F4C112, lmebucp()+34][flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_31125.trc  (incident=646445):
ORA-07445: exception encountered:core dump[lmebucp()+34][SIGSEGV][ADDR:0x0][PC:0x10F4C112][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_646445/orcl_ora_31125_i646445.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T09:42:46.070049+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2020-05-11T09:42:53.344955+08:00
Instance Critical Process (pid: 41, ospid: 31125) died unexpectedly
PMON (ospid: 31026): terminating the instance due to error 12752
2020-05-11T09:42:53.377209+08:00
System state dump requested by (instance=1, osid=31026 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_31046_20200511094253.trc
2020-05-11T09:42:56.690224+08:00
Instance terminated by PMON, pid = 31026

对启动过程做10046跟踪

PARSING IN CURSOR #139821999713040 len=65 dep=1 uid=0 oct=3 lid=0 
tim=2200910467 hv=1762642493 ad='1bfd79130'sqlid='aps3qh1nhzkjx'
select line#, sql_text from bootstrap$ where obj# not in (:1, :2)
END OF STMT
PARSE #139821999713040:c=378,e=378,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=2200910467
BINDS #139821999713040:

 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2ad89fe6c8  bln=22  avl=02  flg=05
  value=59
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2ad89fe698  bln=24  avl=06  flg=05
  value=4294967295
EXEC #139821999713040:c=219,e=658,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=2200911218
WAIT #139821999713040: nam='db file sequential read' ela= 9 file#=1 block#=520 blocks=1 obj#=59 tim=2200911300
WAIT #139821999713040: nam='db file scattered read' ela= 19 file#=1 block#=521 blocks=3 obj#=59 tim=2200911559
FETCH #139821999713040:c=404,e=404,p=4,cr=5,cu=0,mis=0,r=0,dep=1,og=4,plh=867914364,tim=2200911646
STAT #139821999713040 id=1 cnt=0 pid=0 pos=1 obj=59 op='TABLE ACCESS FULL BOOTSTRAP$ (cr=5 pr=4 pw=0 str=1 time=406 us)'

*** 2020-05-11T12:25:30.977832+08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x10F4C112, lmebucp()+34] [flags: 0x0, count: 1]
2020-05-11T12:25:31.026914+08:00
Incident 718445 created, dump file:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_718445/orcl_ora_14324_i718445.trc
ORA-07445: exception encountered:core dump [lmebucp()+34][SIGSEGV][ADDR:0x0][PC:0x10F4C112][Address not mapped to object]

根据该报错,可以大概定位数据库重启之后报ORA-07445 lmebucp()+34错误不能正常启动是由于bootstrap$表异常导致.

BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x00400208

BBED> set block 523
        BLOCK#          523

BBED> map
 File: F:\temp\12.2\1\system01.dbf (0)
 Block: 523                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0

 struct ktbbh, 48 bytes                     @20

 struct kdbh, 14 bytes                      @68

 struct kdbt[1], 4 bytes                    @82

 sb2 kdbr[20]                               @86

 ub1 freespace[1015]                        @126

 ub1 rowdata[7047]                          @1141

 ub4 tailchk                                @8188

BBED> p *kdbr[1]
rowdata[6431]
-------------
ub1 rowdata[6431]                           @7572     0x3c

BBED> x /rnnc
rowdata[6431]                               @7572
-------------
flag@7572: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@7573: 0x01
cols@7574:    0

通过bbed定位rootdba,然后dump相关block,随机找一条记录,确认bootstrap表无后效记录.但是该数据库在重启之前已经报了ORA-600 kkpo_rcinfo_defstg:delseg和ORA-30732错误,很可能还有其他的基表异常.通过先修复bootstrap$记录,然后根据该表中记录分析其他相关表记录,最终确定tab$中记录也异常,通过bbed 批量循环修复方法,对其进行恢复,open数据库,可以验证数据没有问题.至此该问题解决,但是没有找出来故障原因(是人为破坏【直接人工删除】,还是某种工具带入恶意脚本导致,类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案,亦或者是数据库安装介质带有恶意程序,类似:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案)

ORA-21779错误处理

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-21779错误处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户win 10.2.0.4的rac,查看alert日志发现如下错误

Mon May 04 17:25:18 2020
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_smon_2548.trc:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1

Mon May 04 17:25:28 2020
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_smon_2548.trc:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1

Mon May 04 17:25:38 2020
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_smon_2548.trc:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1

Mon May 04 17:25:39 2020
Errors in file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_smon_2548.trc:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1

查看对应trace

Dump file d:\oracle\product\10.2.0\admin\orcl\bdump\orcl2_smon_2548.trc
Sat Aug 31 15:02:39 2019
ORACLE V10.2.0.4.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 32 - type 8664, 2 Physical Cores
Process Affinity    : 0x0000000000000000
Memory (Avail/Total): Ph:17543M/32742M, Ph+PgF:19550M/33833M

…………

*** 2020-05-04 18:40:35.687
SMON: following errors trapped and ignored:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1
*** 2020-05-04 18:40:45.734
	 Drop transient type:   SYSTPzpEA6olJSImGURLObkiE6w==
*** 2020-05-04 18:40:45.734
SMON: following errors trapped and ignored:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1
*** 2020-05-04 18:40:55.812
	 Drop transient type:   SYSTPzpEA6olJSImGURLObkiE6w==
*** 2020-05-04 18:40:55.812
SMON: following errors trapped and ignored:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1
*** 2020-05-04 18:41:05.875
	 Drop transient type:   SYSTPzpEA6olJSImGURLObkiE6w==
*** 2020-05-04 18:41:05.875
SMON: following errors trapped and ignored:
ORA-21779: 持续时间不活动
ORA-06512: 在 line 1

出现该问题是由于oracle的smon进程无法清理掉 transient types,从而出现该问题,根据官方的说法,这个错误是不会影响数据库正常使用,但是可以通过以下方法暂时规避这种错误:
1)通过设置alter system set events ’22834 trace name context forever, level 1′禁止smon清理transient types,从而来规避该错误,但是可能会引起transient types对象越来越多,当然你可以通过以下sql查询出来

select o.* from obj$ o, type$ t
where o.oid$ = t.tvoid and
bitand(t.properties,8388608) = 8388608 and (sysdate-o.ctime) > 0.0007;

然后删除掉相关记录DROP TYPE “SYSTPf/r2wN4keX7gQKjA3AFMSw==” FORCE;【这个删除不是必须的】
2) flush shared_pool也可以临时规避这个问题
3) 重启数据库,可以暂时规避

具体参考:
SMON: Following Errors Trapped And Ignored ORA-21779 (Doc ID 988663.1)
Receiving ORA-21780 Continuously in the Alert Log and SMON Trace Reports “Drop transient type”. (Doc ID 1081950.1)