存储断电导致ORA-00325故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储断电导致ORA-00325故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

由于存储断电,数据库启动最初报错

Reconfiguration complete
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 22887 KB redo, 2025 data blocks need recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl11/trace/orcl11_ora_25449.trc  (incident=25522):
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [203596], [1537], [3046], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl11/incident/incdir_25522/orcl11_ora_25449_i25522.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Abort recovery for domain 0
Aborting crash recovery due to error 600
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl11/trace/orcl11_ora_25449.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [203596], [1537], [3046], []
Abort recovery for domain 0
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl11/trace/orcl11_ora_25449.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [203596], [1537], [3046], []
ORA-600 signalled during: ALTER DATABASE OPEN...

尝试重建ctl之后,恢复数据库报错ORA-00325错误
ORA-00325


[oracle@xifenfei ~]$ oerr ora 325
00325, 00000, "archived log for thread %s, wrong thread # %s in header"
// *Cause:  The archived log is corrupted or for another thread. Can not
//          use the log for applying redo.
// *Action: Find correct archived log.

归档日志文件的头部中的线程编号和实际使用的线程编号不匹配的时候报出的错误。该错误的常见情形是由于线程 ID (thread ID) 没有被正确分配或更新造成的,如从主控制档(control file)恢复用户表空间后更改线程 ID 但系统没有知道,导致系统在访问归档日志文件时,由于归档日志文件的头部中的线程编号与实际使用的线程编号不匹配,而出现该错误。

尝试强制拉库,报ORA-600 2662错误
ORA-600-2662


尝试修改scn继续启动库报ORA-600 kclchkblk_4
ORA-600-kclchkblk_4

继续调整scn,数据库open成功.后续就是各种收尾工作(比如undo回滚段,tempfile,各种ORA-600错误等)

存储重启,oracle无法启动故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储重启,oracle无法启动故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于机房要停电,正常关闭两个节点数据库,通过数据库alert日志均可看到类似如下记录,证明数据库确实是正常shutdown immediate
1
2
然后关闭存储,启动存储之后发现数据库无法正常启动(数据scn不一致).相关信息如下:
20220707175525


最初报ORA-214错

that ORACLE_BASE be set in the environment
Wed Jul 06 00:50:02 2022
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))' SCOPE=MEMORY SID='xffdb2';
ALTER DATABASE MOUNT /* db agent *//* {1:42392:203} */
This instance was first to mount
NOTE: Loaded library: System 
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup DATA2 was mounted
ORA-214 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:42392:203} */...
NOTE: dependency between database xffdb and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database xffdb and diskgroup resource ora.DATA2.dg is established

提示ctl不存在,通过处理之后报ORA-600 2131错误

Wed Jul 06 01:55:45 2022
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))' SCOPE=MEMORY SID='xffdb2';
ALTER DATABASE MOUNT /* db agent *//* {1:42392:663} */
This instance was first to mount
NOTE: Loaded library: System 
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup DATA2 was mounted
NOTE: dependency between database xffdb and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database xffdb and diskgroup resource ora.DATA2.dg is established
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb2/trace/xffdb2_ora_47746.trc  (incident=576488):
ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xffdb/xffdb2/incident/incdir_576488/xffdb2_ora_47746_i576488.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:42392:663} */...

重建控制文件后恢复报错

Parallel Media Recovery started with 127 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using BACKUP CONTROLFILE  ...
Wed Jul 06 02:41:04 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215'  
Media Recovery Log +DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215
Wed Jul 06 02:41:04 2022
Errors with log +DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_96503.trc:
ORA-00325: archived log for thread 1, wrong thread # 2 in header
ORA-00334: archived log: '+DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215'
ORA-325 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA3/thread_2_seq_40889.18030.1109269215'  ...
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 
…………
Wed Jul 06 02:22:25 2022
ALTER DATABASE RECOVER  DATABASE  
Media Recovery Start
 started logmerger process
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
Wed Jul 06 02:22:28 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_77044.trc:
ORA-00313: open failed for members of log group 7 of thread 1
Media Recovery failed with error 313
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_77044.trc:
ORA-00283: recovery session canceled due to errors
ORA-00313: open failed for members of log group 7 of thread 1
Wed Jul 06 02:22:28 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00322: log 4 of thread 2 is not current copy
ORA-00312: online log 4 thread 2: '+DATA3/xffdb/onlinelog/group_4.16148.1107795635'
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00322: log 7 of thread 1 is not current copy
ORA-00312: online log 7 thread 1: '+DATA3/xffdb/onlinelog/group_7.18959.1107796013'
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00314: log 9 of thread 1, expected sequence# 133495 doesn't match 133490
ORA-00312: online log 9 thread 1: '+DATA3/xffdb/onlinelog/group_9.3142.1107796071'
Checker run found 208 new persistent data failures
ORA-10877 signalled during: ALTER DATABASE RECOVER  DATABASE  ...
…………
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile  ...
Wed Jul 06 06:15:26 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_4.16442.1107795653'  
Media Recovery Log +DATA3/xffdb/onlinelog/group_4.16442.1107795653
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_4.16442.1107795653'  ...
Wed Jul 06 06:15:43 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_7.18959.1107796013'  
Media Recovery Log +DATA3/xffdb/onlinelog/group_7.18959.1107796013
Wed Jul 06 06:15:50 2022
Errors with log +DATA3/xffdb/onlinelog/group_7.18959.1107796013
Wed Jul 06 06:15:50 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr29_306479.trc  (incident=961030):
ORA-00600: internal error code, arguments: [6102], [13], [17], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jul 06 06:15:54 2022
Sweep [inc][961030]: completed
Sweep [inc2][961030]: completed
Slave exiting with ORA-10562 exception
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr29_306479.trc:
ORA-10562: Error occurred while applying redo to data block (file# 159, block# 3591756)
ORA-10564: tablespace LIS
ORA-01110: data file 159: '+DATA1/xffdb/datafile/lis.379.1080445903'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 138875
ORA-00600: internal error code, arguments: [6102], [13], [17], [], [], [], [], [], [], [], [], []
Wed Jul 06 06:15:59 2022
Recovery Slave PR29 previously exited with exception 10562

基于上述情况,很可能是由于存储重启之后,cache或者某些数据没有写入到数据文件和redo中,数据库重启之后redo不是最新的[ORA-00322错误可以证明,],数据文件也需要进行恢复(不是数据库正常关闭之后该有的情况),而且redo和数据文件还不一致[ORA-00600 6102可以证明],对于类似这样的情况,只能尝试强制打开数据库,报ORA-600 2663

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578],
[1393], [4160374753], [], [], [], [], [], [], []
Process ID: 357910
Session ID: 1585 Serial number: 7
Wed Jul 06 06:57:25 2022
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc  (incident=1056360):
ORA-00600: internal error code, arguments: [2663], [1393],[4159455578],[1393],[4160374753],[], [], [], []
Redo thread 2 internally disabled at seq 1 (CKPT)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_arc3_360348.trc:
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR],[600], 
  [ORA-00600: internal error code, arguments: [kffbAddBlk04]
Unable to create archive log file '+DATA3'
ARC3: Error 19504 Creating archive log file to '+DATA3'
ARCH: Archival error occurred on a closed thread. Archiver continuing
ORACLE Instance xffdb1 - Archival Error. Archiver continuing.
ARCH: Archival error occurred on a closed thread. Archiver continuing
ORACLE Instance xffdb1 - Archival Error. Archiver continuing.
Wed Jul 06 06:57:34 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/incident/incdir_1056360/xffdb1_ora_357910_i1056360.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '+DATA1/xffdb/onlinelog/group_4.424.1109314453'
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [],
Wed Jul 06 06:57:34 2022
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc:
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc:
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 357910): terminating the instance due to error 600
Instance terminated by USER, pid = 357910
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (357910) as a result of ORA-1092
Wed Jul 06 06:57:35 2022
ORA-1092 : opitsk aborting process

该错误比较常见,参考:ORA-600 2663,也可以利用我的Patch_SCN小工具快速解决,后续数据库报ORA-03113错

SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 369324
Session ID: 1585 Serial number: 1

查看alert日志,确认具体报错为kgegpa

Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jul 06 07:17:08 2022
SMON: enabling cache recovery
ARC1: Archiving disabled thread 2 sequence 1
Archived Log entry 1 added for thread 1 sequence 1 ID 0x36317f52 dest 1:
Archived Log entry 2 added for thread 1 sequence 2 ID 0x36317f52 dest 1:
Archived Log entry 3 added for thread 2 sequence 1 ID 0x0 dest 1:
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983CDD6,kgegpa()+40][flags: 0x0,count:1]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983B84A, kgebse()+776][flags: 0x2,count:2]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983B84A, kgebse()+776][flags: 0x2,count:2]
Wed Jul 06 07:17:11 2022
PMON (ospid: 377647): terminating the instance due to error 397

该问题有过类似的案例通过处理数据库open成功:
在数据库恢复遭遇ORA-07445 kgegpa错误
Exception [type: SIGSEGV, Address not mapped to object] [] [ kgegpa()+36]