从ORA-00283 ORA-16433报错开始恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:从ORA-00283 ORA-16433报错开始恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个客户无法正常启动的故障数据库,尝试recover 报ORA-00283 ORA-16433错误

[oracle@xff trace]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Sat Jan 27 04:46:23 2024

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


???:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> show pdbs;
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL>
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-16433: The database must be opened in read/write mode

通过对控制文件进行处理,再次尝试recover库

SQL> recover database;
ORA-00399: corrupt change description in redo log
ORA-00353: log corruption near block 134877 change 3249721295 time 01/27/2024 00:21:05
ORA-00312: online log 1 thread 1:'/u01/app/oracle/oradata/xff/redo01.log'

由于redo和数据文件不匹配,无法正常recover库,尝试强制打开库报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [0], [3249721308], [0],[3249730440], [16777344],[],[],[],[],[],[]
ORA-00600: internal error code, arguments: [2662], [0], [3249721307], [0],[3249730440], [16777344],[],[],[],[],[],[]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [0], [3249721303], [0],[3249730440], [16777344],[],[],[],[],[],[]
Process ID: 117336
Session ID: 1146 Serial number: 11764

基于这种错误,尝试oradebug修改scn

SQL> oradebug setmypid
oradebug DUMPvar SGA kcsgscn_
Statement processed.
SQL> kcslf kcsgscn_ [06001FBB0, 06001FBE0) = 00000000 00000000 00000000 00000000 00000000 
SQL> oradebug poke 0x06001FBB0 4 0x10000000
oradebug DUMPvar SGA kcsgscn_
ORA-32521: error parsing ORADEBUG command:

发现报ORA-32521错误,证明常规的oradebug方法无法修改scn,参考相关文章:
oradebug poke ORA-32521/ORA-32519故障解决
第一次通过其他方法处理,由于计算失误导致数据库启动报ORA-600 2252错误

SQL> ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE OPEN RESETLOGS
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [2252], [45264], [0], [11641],[3340959744], [],[],[],[],[],[]

该错误是相关文章参考:
记录一次ORA-00600[2252]故障解决
ORA-00600: internal error code, arguments: [2252], [3987]
主机断电系统回到N年前数据库报ORA-600 kcm_headroom_warn_1错误
处理正确的scn值之后,数据库open成功,然后逻辑方式导出数据,恢复工作完成

SQL> alter database open ;

Database altered.

ORA-00600: internal error code, arguments: [2252]

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600: internal error code, arguments: [2252]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库版本10.2.0.4,启动成功之后立马crash,让我们协助解决
version


Thu Jul  4 13:03:10 2019
Completed: ALTER DATABASE OPEN
Thu Jul  4 13:03:10 2019
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
RECO: terminating instance due to error 476
Instance terminated by RECO, pid = 22268

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检查scn相关信息
scn


从ORA-600 2252错误信息看,由于scn可能超过该数据库的天花板理论上而导致该问题,而reco进程主要是由于分布式事务引起,通过和客户确认,该库有通过dblink去访问11204版本oracle,而从2019年6月23日之后scn的算法发生了一些改变(SCN Compatibility问题汇总-2019年6月23日),导致数据库可以支持更大的scn,从而当低版本需要进行分布式事务操作之时,可能导致数据库异常.

处理方案:通过临时屏蔽分布式事务,让数据库临时正常工作;长期解决方案需要把数据库版本升级,避免scn引起相关问题