ORA-00600: internal error code, arguments: [16513], [1403] 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600: internal error code, arguments: [16513], [1403] 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到客户请求,存储异常断电之后,一个40多T的经分数据库无法正常启动,通过各方一系列操作之后,数据库依旧无法open,报错信息为:ORA-00600: internal error code, arguments: [16513], [1403], [28]
20210214193332


Sun Feb 14 00:20:09 BEIST 2021
SMON: enabling cache recovery
Sun Feb 14 00:20:09 BEIST 2021
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0f27.13cd4fc3):
Sun Feb 14 00:20:09 BEIST 2021
select ctime, mtime, stime from obj$ where obj# = :1
Sun Feb 14 00:20:09 BEIST 2021
Errors in file /oracle10g/db/admin/xifenfei/udump/xifenfei1_ora_177254.trc:
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Sun Feb 14 00:20:10 BEIST 2021
Errors in file /oracle10g/db/admin/xifenfei/udump/xifenfei1_ora_177254.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 177254
ORA-1092 signalled during: ALTER DATABASE OPEN...

通过对启动过程进行跟踪

=====================
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=194171381576991 hv=429618617 ad='afcee60'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=257,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=194171381576990
BINDS #5:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1104b8ed0  bln=22  avl=02  flg=05
  value=28
EXEC #5:c=0,e=422,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=194171381577472
WAIT #5: nam='db file sequential read' ela= 274 file#=1 block#=110 blocks=1 obj#=36 tim=194171381577809
WAIT #5: nam='db file sequential read' ela= 257 file#=1 block#=78943 blocks=1 obj#=36 tim=194171381578123
WAIT #5: nam='db file sequential read' ela= 253 file#=1 block#=111 blocks=1 obj#=36 tim=194171381578416
WAIT #5: nam='db file sequential read' ela= 226 file#=1 block#=62 blocks=1 obj#=18 tim=194171381578692
=====================
PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=194171381579134 hv=361892850 ad='df87eb0'
select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,
DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1
END OF STMT
PARSE #6:c=0,e=368,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=194171381579133
BINDS #6:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=1105af9a8  bln=22  avl=02  flg=05
  value=92
EXEC #6:c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,tim=194171381579736
WAIT #6: nam='db file sequential read' ela= 232 file#=1 block#=102 blocks=1 obj#=34 tim=194171381580011
WAIT #6: nam='db file sequential read' ela= 251 file#=1 block#=54 blocks=1 obj#=15 tim=194171381580307
FETCH #6:c=0,e=615,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,tim=194171381580369
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=584 us)'
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=282 us)'
WAIT #5: nam='db file sequential read' ela= 260 file#=4290 block#=365090 blocks=1 obj#=0 tim=194171381580721
FETCH #5:c=10000,e=3327,p=7,cr=7,cu=0,mis=0,r=0,dep=1,og=4,tim=194171381580817
*** 2021-02-14 02:32:40.430
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [16513], [1403], [28], [], [], [], [], []
Current SQL statement for this session:
alter database open
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              70000090FC00AE4 ? 000000000 ?
ksedmp+0290          bl       ksedst               104C1CA58 ?
ksfdmp+02d8          bl       03F4BD0C             
kgeriv+0108          bl       _ptrgl               
kgesiv+0080          bl       kgeriv               7000008F6DB5890 ? 00000000C ?
                                                   7000008F6DB5890 ? 0FFFFFDC3 ?
                                                   0FFFFFFBF ?
ksesic2+0060         bl       kgesiv               5FFFEC580 ? 500000000 ?
                                                   FFFFFFFFFFEC590 ? 0F5FAF808 ?
                                                   7000008F5FAF7E8 ?
kqdpts+0158          bl       ksesic2              408100004081 ? 000000000 ?
                                                   00000057B ? 000000000 ?
                                                   00000001C ?
                                                   A006150571B05EF0 ?
                                                   1100CF0A8 ? 000000001 ?
kqrlfc+0274          bl       kqdpts               000000000 ?
kqlbplc+00b4         bl       03F4E6A8             
kqlblfc+0230         bl       kqlbplc              00000009D ?
adbdrv+1a9c          bl       03F4B66C             
opiexe+2db4          bl       adbdrv               
opiosq0+1ac8         bl       opiexe               1103B418C ? 000000000 ?
                                                   FFFFFFFFFFF9008 ?
kpooprx+016c         bl       opiosq0              300F1E1D4 ? 000000000 ?
                                                   000000000 ? A4FFFFFFFF9798 ?
                                                   000000000 ?
kpoal8+03cc          bl       kpooprx              FFFFFFFFFFFB814 ?
                                                   FFFFFFFFFFFB5C0 ?
                                                   1300000013 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103B4378 ?
opiodr+0b2c          bl       _ptrgl               
ttcpip+1020          bl       _ptrgl               
opitsk+117c          bl       01FBD04C             
opiino+09d0          bl       opitsk               1EFFFFD7E0 ? 000000000 ?
opiodr+0b2c          bl       _ptrgl               
opidrv+04a4          bl       opiodr               3C102B1398 ? 404C6FF30 ?
                                                   FFFFFFFFFFFF7A0 ? 0102B1390 ?
sou2o+0090           bl       opidrv               3C02705B3C ? 440663000 ?
                                                   FFFFFFFFFFFF7A0 ?
opimai_real+01bc     bl       01FB9BF4             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?
 
--------------------- Binary Stack Dump ---------------------

报错比较明显,数据库在查询obj$取obj#=28的数据的时候无法正常获取到该数据,从而报错ORA-600 16513错误.通过对相关block进行分析,发现有事务异常

BBED>  p ktbbh.ktbbhitl[1]
struct ktbbhitl[1], 24 bytes                @44
   struct ktbitxid, 8 bytes                 @44
      ub2 kxidusn                           @44       0x5c00
      ub2 kxidslt                           @46       0x0a00
      ub4 kxidsqn                           @48       0x48b4fc01
   struct ktbituba, 8 bytes                 @52
      ub4 kubadba                           @52       0x22928531
      ub2 kubaseq                           @56       0xe383
      ub1 kubarec                           @58       0x01
   ub2 ktbitflg                             @60       0x0120 (NONE)
   union _ktbitun, 2 bytes                  @62
      b2 _ktbitfsc                          @62       0
      ub2 _ktbitwrp                         @62       0x0000
   ub4 ktbitbas                             @64       0x95c2ff13

通过一些技巧处理规避掉该事务,然后启动库报熟悉的ORA-01555错误
20210214161333


相对比较简单参考(在数据库open过程中常遇到ORA-01555汇总),数据库顺利open成功,完成春节后第一个大库的恢复
20210214193344

ORA-600 3020错误引起ORA-600 2663

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 3020错误引起ORA-600 2663

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库recover异常ORA-600 3020

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 5693717234723 generated at 01/19/2021 10:44:52 needed for
thread 1
ORA-00289: suggestion : +RECOVER/arch/1_294845_938895110.dbf
ORA-00280: change 5693717234723 for thread 1 is in sequence #294845


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+BACKUP/xifenfei/onlinelog/group_5.258.973180257
ORA-00279: change 5693717234723 generated at 01/15/2021 11:41:15 needed for
thread 2
ORA-00289: suggestion : +RECOVER/arch/2_336576_938895110.dbf
ORA-00280: change 5693717234723 for thread 2 is in sequence #336576


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA1/xifenfei/onlinelog/group_8.298.962885887
ORA-00600: internal error code, arguments: [3020], [128], [248606],
[537119518], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 128, block# 248606, file
offset is 2036580352 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: data file 128: '+DATA1/xifenfei/datafile/undotbs1_02.dbf'
ORA-10560: block type 'KTU UNDO BLOCK'


ORA-01112: media recovery not started

这个错误比较简单,一般是允许坏块继续恢复

SQL> recover database using backup controlfile allow 1 corruption;
ORA-00279: change 5693717234839 generated at 01/19/2021 10:44:52 needed for
thread 1
ORA-00289: suggestion : +RECOVER/arch/1_294845_938895110.dbf
ORA-00280: change 5693717234839 for thread 1 is in sequence #294845


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+BACKUP/xifenfei/onlinelog/group_5.258.973180257
ORA-00279: change 5693717234839 generated at 01/15/2021 11:41:15 needed for
thread 2
ORA-00289: suggestion : +RECOVER/arch/2_336576_938895110.dbf
ORA-00280: change 5693717234839 for thread 2 is in sequence #336576


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA1/xifenfei/onlinelog/group_8.298.962885887
ORA-00279: change 5693717637654 generated at 01/19/2021 10:47:25 needed for
thread 1
ORA-00289: suggestion : +RECOVER/arch/1_294846_938895110.dbf
ORA-00280: change 5693717637654 for thread 1 is in sequence #294846
ORA-00278: log file '+BACKUP/xifenfei/onlinelog/group_5.258.973180257' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+RECOVER/xifenfei/onlinelog/group_3.258.973180321
ORA-00279: change 5693717705759 generated at 01/19/2021 10:48:07 needed for
thread 1
ORA-00289: suggestion : +RECOVER/arch/1_294847_938895110.dbf
ORA-00280: change 5693717705759 for thread 1 is in sequence #294847
ORA-00278: log file '+RECOVER/xifenfei/onlinelog/group_3.258.973180321' no
longer needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+BACKUP/xifenfei/onlinelog/group_7.265.973181365
Log applied.
Media recovery complete.

后续重建ctl,尝试recover库,报ORA-10877错误

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1.0088E+10 bytes
Fixed Size		    2261928 bytes
Variable Size		 2181041240 bytes
Database Buffers	 7851737088 bytes
Redo Buffers		   53149696 bytes
Database mounted.
SQL> recover database;
ORA-10877: error signaled in parallel recovery slave


--对应的alert日志
Wed Jan 20 13:34:04 2021
ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 64 slaves
Wed Jan 20 13:34:06 2021
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_pr00_50593.trc:
ORA-00313: open failed for members of log group 7 of thread 1
Media Recovery failed with error 313
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_pr00_50593.trc:
ORA-00283: recovery session canceled due to errors
ORA-00313: open failed for members of log group 7 of thread 1
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

resetlogs失败open数据库失败,ORA-600 2663

Wed Jan 20 13:42:34 2021
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 5693718057561, threshold SCN value is 0
If you have not previously reported this warning on this database, please notify Oracle Support so that additional diagnosis can be performed.
Wed Jan 20 13:42:35 2021
Assigning activation ID 3801294256 (0xe29325b0)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +RECOVER/xifenfei/onlinelog/group_1.260.973179783
  Current log# 1 seq# 1 mem# 1: +BACKUP/xifenfei/onlinelog/group_1.260.973179787
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jan 20 13:42:35 2021
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_52800.trc  (incident=189187):
ORA-00600: internal error code, arguments: [2663], [1325], [2886390384], [1325], [2886403118], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/incident/incdir_189187/xifenfei1_ora_52800_i189187.trc
Wed Jan 20 13:42:38 2021
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_52800.trc:
ORA-00600: internal error code, arguments: [2663], [1325], [2886390384], [1325], [2886403118], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/xifenfei/xifenfei1/trace/xifenfei1_ora_52800.trc:
ORA-00600: internal error code, arguments: [2663], [1325], [2886390384], [1325], [2886403118], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 52800): terminating the instance due to error 600

这个错误比较明显,由于scn的异常导致,通过调整scn,数据库正常open成功,然后使用hcheck检查数据库字典一致(运气不错),没有太大问题,后续建议客户进行逻辑迁移
20210121234330


ORA-00600 kfrHtAdd01

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600 kfrHtAdd01

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于存储掉电,报ORA-15096: lost disk write detected错误,无法mount磁盘组.

Sun Dec 20 16:56:51 2020
SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=1 incarn=0x0c1a7a4e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x0c1a7a4e
NOTE: Assigning number (1,2) to disk (/dev/mapper/multipath12)
NOTE: Assigning number (1,5) to disk (/dev/mapper/multipath15)
NOTE: Assigning number (1,3) to disk (/dev/mapper/multipath13)
NOTE: Assigning number (1,7) to disk (/dev/mapper/multipath17)
NOTE: Assigning number (1,1) to disk (/dev/mapper/multipath11)
NOTE: Assigning number (1,6) to disk (/dev/mapper/multipath16)
NOTE: Assigning number (1,0) to disk (/dev/mapper/multipath10)
NOTE: Assigning number (1,4) to disk (/dev/mapper/multipath14)
Sun Dec 20 16:56:57 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 19 for pid 32, osid 130347
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x0C1A7A4E (DATA)
Sun Dec 20 16:56:57 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 16:56:58 2020
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply)
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_130347.trc:
ORA-15096: lost disk write detected
Abort recovery for domain 1
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x0C1A7A4E (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 130347, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount

通过一系列修复之后报错如下

Sun Dec 20 20:12:35 2020
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 23 for pid 26, osid 67538
Sun Dec 20 20:12:35 2020
NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/dev/mapper/multipath10
NOTE: F1X0 found on disk 0 au 2 fcn 0.14159360
NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/dev/mapper/multipath11
NOTE: F1X0 found on disk 1 au 2 fcn 0.14159360
NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/dev/mapper/multipath12
NOTE: F1X0 found on disk 2 au 2 fcn 0.14159360
NOTE: cache opening disk 3 of grp 1: DATA_0003 path:/dev/mapper/multipath13
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/mapper/multipath14
NOTE: cache opening disk 5 of grp 1: DATA_0005 path:/dev/mapper/multipath15
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/mapper/multipath16
NOTE: cache opening disk 7 of grp 1: DATA_0007 path:/dev/mapper/multipath17
NOTE: cache mounting (first) normal redundancy group 1/0x64848829 (DATA)
Sun Dec 20 20:12:36 2020
* allocate domain 1, invalid = TRUE 
Sun Dec 20 20:12:36 2020
NOTE: attached to recovery domain 1
NOTE: Fallback recovery: thread 2 read 10751 blocks oldest redo found in ABA 540.6429
NOTE: Fallback recovery: thread 1 read 10751 blocks oldest redo found in ABA 232.4218
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc  (incident=1692689):
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
Incident details in: /grid/app/grid/diag/asm/+asm/+ASM1/incident/incdir_1692689/+ASM1_ora_67538_i1692689.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Dec 20 20:12:39 2020
Sweep [inc][1692689]: completed
Sweep [inc2][1692689]: completed
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_67538.trc:
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0], [38660545], [0],
 [38687990], [1], [2], [6429], [], []
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 1/0x64848829 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 67538, image: oracle@db1.rac.com (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x64848829 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x64848829
NOTE: cache deleting context for group DATA 1/0x64848829
GMON dismounting group 1 at 24 for pid 26, osid 67538
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-00600: internal error code, arguments: [kfrHtAdd01], [2147483651], [1025], [0],
 [38660545], [0], [38687990], [1], [2], [6429], [], []
ERROR: alter diskgroup data mount

分析trace文件

*** 2020-12-20 20:11:54.956
kfdp_query(DATA): 19 
----- Abridged Call Stack Trace -----
ksedsts()+465<-kfdp_query()+530<-kfdPstSyncPriv()+585<-kfgFinalizeMount()+1630<-kfgscFinalize()+1433<
-kfgForEachKfgsc()+285<-kfgsoFinalize()+135<-kfgFinalize()+398<-kfxdrvMount()+5558<-kfxdrvEntry()
+2207<-opiexe()+20624<-opiosq0()+3932<-kpooprx()+274<-kpoal8()+842<-opiodr()+917<-ttcpip()
+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()
+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253 
----- End of Abridged Call Stack Trace -----
2020-12-20 20:11:55.393106 : Start recovery for domain=1, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=233.4189 group=1 (DATA)
NOTE: starting recovery of thread=2 ckpt=542.6409 group=1 (DATA)
lost disk write detected during recovery (apply):
last written kfcn: 0.38747593 aba=233.4208 thd=1
kfcn_kfrbcd=0.38747593 flags_kfrbcd=0x001c aba=542.6410 thd=2
CE: (0x0x66edc798) group=1 (DATA) fn=4 blk=1
    hashFlags=0x0000 lid=0x0002 lruFlags=0x0000 bastCount=1
    mirror=0
    flags_kfcpba=0x38 copies=3 blockIndex=1 AUindex=0 AUcount=0 loctr fcn=0.0
    copy #0:  disk=6  au=35 flags=01
    copy #1:  disk=0  au=34 flags=01
    copy #2:  disk=4  au=52 flags=01
BH: (0x0x66e10d00) bnum=33 type=COD_RBO state=rcv chgSt=not modifying pageIn=rcvRead
    flags=0x00000000 pinmode=excl lockmode=null bf=0x66020000
    kfbh_kfcbh.fcn_kfbh = 0.38747538 lowAba=0.0 highAba=0.0
    modTime=0
    last kfcbInitSlot return code=null chgCount=0 cpkt lnk is null ralFlags=0x00000000
    PINS:
    (kfcbps) pin=91 get by kfr.c line 7879 mode=excl
             fn=4 blk=1 status=pinned
             flags=0x88000000 flags2=0x00000000
             class=0 type=INVALID stateWanted=rcvRead
             bastCount=1 waitStatus=0x00000000 relocCount=0
             scanBastCount=0 scanBxid=0 scanSkipCode=0
             last released by kfc.c 21183
NOTE: recovery (pass 2) of diskgroup 1 (DATA) caught error ORA-15096
last new 0.0
kfrPass2: dump of current log buffer for error 15096 follows
=======================
OSM metadata block dump:
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            8 ; 0x002: KFBTYP_CHNGDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                   17162 ; 0x004: blk=17162
kfbh.block.obj:                       3 ; 0x008: file=3
kfbh.check:                  4226524538 ; 0x00c: 0xfbeba57a
kfbh.fcn.base:                 38747431 ; 0x010: 0x024f3d27
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfracdb.aba.seq:                    542 ; 0x000: 0x0000021e
kfracdb.aba.blk:                   6409 ; 0x004: 0x00001909
kfracdb.ents:                         1 ; 0x008: 0x0001
kfracdb.ub2spare:                     0 ; 0x00a: 0x0000
kfracdb.lge[0].valid:                 1 ; 0x00c: V=1 B=0 M=0
kfracdb.lge[0].chgCount:              1 ; 0x00d: 0x01
kfracdb.lge[0].len:                  68 ; 0x00e: 0x0044
kfracdb.lge[0].kfcn.base:      38747432 ; 0x010: 0x024f3d28
kfracdb.lge[0].kfcn.wrap:             0 ; 0x014: 0x00000000
kfracdb.lge[0].bcd[0].kfbl.blk:    1292 ; 0x018: blk=1292
kfracdb.lge[0].bcd[0].kfbl.obj:       1 ; 0x01c: file=1
kfracdb.lge[0].bcd[0].kfcn.base:38743102 ; 0x020: 0x024f2c3e
kfracdb.lge[0].bcd[0].kfcn.wrap:      0 ; 0x024: 0x00000000
kfracdb.lge[0].bcd[0].oplen:          8 ; 0x028: 0x0008
kfracdb.lge[0].bcd[0].blkIndex:      12 ; 0x02a: 0x000c
kfracdb.lge[0].bcd[0].flags:         28 ; 0x02c: F=0 N=0 F=1 L=1 V=1 A=0 C=0
kfracdb.lge[0].bcd[0].opcode:       135 ; 0x02e: 0x0087
kfracdb.lge[0].bcd[0].kfbtyp:         4 ; 0x030: KFBTYP_FILEDIR
kfracdb.lge[0].bcd[0].redund:        19 ; 0x031: SCHE=0x1 NUMB=0x3
kfracdb.lge[0].bcd[0].pad:        63903 ; 0x032: 0xf99f
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.hi:33108586 ; 0x034: HOUR=0xa DAYS=0x13 MNTH=0xc YEAR=0x7e4
kfracdb.lge[0].bcd[0].KFFFD_COMMIT.modts.lo:0 ; 0x038: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfracdb.lge[0].bcd[0].au[0]:     292415 ; 0x03c: 0x0004763f
kfracdb.lge[0].bcd[0].au[1]:     292452 ; 0x040: 0x00047664
kfracdb.lge[0].bcd[0].au[2]:     292474 ; 0x044: 0x0004767a
kfracdb.lge[0].bcd[0].disks[0]:       2 ; 0x048: 0x0002
kfracdb.lge[0].bcd[0].disks[1]:       1 ; 0x04a: 0x0001
kfracdb.lge[0].bcd[0].disks[2]:       0 ; 0x04c: 0x0000

彻底屏蔽asm的实例恢复,mount磁盘组,尝试启动库进行数据库恢复.如果如果此类asm无法mount问题,无法自行解决请联系我们
电话/微信:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

12C数据库报ORA-600 kcbzib_kcrsds_1故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:12C数据库报ORA-600 kcbzib_kcrsds_1故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友由于主机断电,导致数据库无法启动,需要我们提供恢复支持.通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检查分析,发现存放在一个磁盘的数据文件scn过小,数据库无法正常open,需要历史redo(已经被覆盖,而且数据库未归档).
20201130202509
20201130202545


对于这种情况,选择强制拉库,然后数据库报ORA-600 kcbzib_kcrsds_1错误
20201130203023
20201130203204

对于此类故障参考以前处理过类似案例:ORA-600 kcbzib_kcrsds_1报错,对数据库scn进行修改,数据库顺利open
20201130203529

记录一次pdb恢复过程中遇到的大量bug

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次pdb恢复过程中遇到的大量bug

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

12C版本使用pdb的Oracle数据库,由于在创建index的过程中强制终止,导致业务大量阻塞,然后重启数据库几次之后直接crash,最后直接无法open成功,报ORA-00600 6856

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 22, block # 993741)
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10564: tablespace CWBASEMS01
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 76286
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [],[], [], [], [], []
2020-11-14T15:56:50.736722+08:00
Started redo scan
2020-11-14T15:56:50.977023+08:00
Completed redo scan
 read 82825 KB redo, 10769 data blocks need recovery
2020-11-14T15:56:51.147256+08:00
Started redo application at
 Thread 1: logseq 120309, block 48, offset 0
 Thread 2: logseq 74989, block 2, offset 16, scn 0x00000000f69e1f8d
2020-11-14T15:56:51.151007+08:00
Recovery of Online Redo Log: Thread 1 Group 1 Seq 120309 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_1.262.1023806467
2020-11-14T15:56:51.153989+08:00
Recovery of Online Redo Log: Thread 2 Group 7 Seq 74989 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_7.274.1023806785
Errors in file /u01/app/oracle/……/xifenfei1/trace/xifenfei1_p00d_469777.trc(incident=10079552)(PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [], [], [], [], [], []
2020-11-14T15:56:52.089726+08:00
(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

这个错误比较明显是由于ORA-600 6856错误导致数据库在启动的时候无法进行实例恢复,出现这个错误原因是由于客户创建index的过程中强制终止引起Bug 17437634 – ORA-1578 or ORA-600 [6856] transient in-memory corruption on TEMP segment during transaction recovery / ROLLBACK (eg: after Ctrl-C) – superseded (Doc ID 17437634.8),屏蔽该文件实例恢复,cdb启动成功,但是pdb无法正常open

SQL> alter session set container=pdb1;
 
Session altered.

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check:fno_system], [3], [], []
Process ID: 224476
Session ID: 13311 Serial number: 59525

这个错误比较明显是由于ORA-600 kcffo_online_pdb_check:fno_system,数据库未正常检测到pdb的system文件导致该问题,通过对pdb的system文件进行操作,让数据库识别到该文件,然后继续open库

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt],[3],[4138003978],[4289274940], [2]
Process ID: 224476
Session ID: 13311 Serial number: 59525

该错误是由于数据库在恢复的过程中推了scn,触发了oracle 某种bug导致该问题,通过一些操作之后,数据库可以open,尝试temp表空间增加临时数据文件报ORA-00600 [kcffo_add_tmpf-1] 错误(Bug 29379978 – ORA-00600 [kcffo_add_tmpf-1] when trying to add temp file (Doc ID 29379978.8)).由于该文件无法加入,数据库无法导出
20201115115951


最后没有办法换了思路直接bbed修改文件头,open cdb库,然后open pdb,顺利导出数据.这次的恢复中,深刻的体验到pdb在open过程中的各种bug,实在比较厌烦.