Handling ORA-600 13304 after a botched tab$ repair

Contact: phone/WeChat (+86 17813235971), QQ (107644445)


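Author: 惜分飞 © All rights reserved [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]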

Another case of a database failing at startup with ORA-600 [16703], [1403], [20]:

Sun Jun 13 14:00:56 2021
NOTE: dependency between database xff and diskgroup resource ora.DG_ARCH_xff.dg is established
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_56340.trc  (incident=348265):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xff/xff1/incident/incdir_348265/xff1_ora_56340_i348265.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_56340.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_56340.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 56340): terminating the instance due to error 704
Instance terminated by USER, pid = 56340

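This failure is fairly clear-cut. Based on our earlier analysis (see "Warning: Oracle installation media on the Internet injected with malicious code causes ORA-600 16703"), tab$ had most likely been maliciously damaged, and examining the installation media confirmed it. The customer followed related articles found on the Internet and attempted a repair with dd. The database then reported ORA-600 13304 and could go no further, so the customer asked us for technical support.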

SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_83843.trc  (incident=396265):
ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xff/xff1/incident/incdir_396265/xff1_ora_83843_i396265.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_83843.trc:
ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_83843.trc:
ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 83843): terminating the instance due to error 600
Instance terminated by USER, pid = 83843
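
When comparing alert log excerpts like the two above, it helps to reduce each ORA-00600 line to its non-empty arguments so that the signature change (from 16703/1403/20 to 13304) stands out. The following is a minimal sketch, not part of the original recovery; the function name `ora600_signatures` and the sample list are illustrative assumptions.

```python
import re
from collections import Counter

# Matches lines such as:
# ORA-00600: internal error code, arguments: [16703], [1403], [20], [], ...
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")
ARG_RE = re.compile(r"\[([^\]]*)\]")


def ora600_signatures(lines):
    """Count each distinct ORA-600 signature (its non-empty arguments)."""
    counts = Counter()
    for line in lines:
        match = ORA600_RE.search(line)
        if match:
            # Drop the empty [] placeholders, keep the meaningful arguments.
            args = tuple(a for a in ARG_RE.findall(match.group(1)) if a)
            counts[args] += 1
    return counts


# Sample lines copied from the alert log excerpts in this post.
sample = [
    "ORA-00704: bootstrap process failure",
    "ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []",
]
signatures = ora600_signatures(sample)
for args, count in signatures.items():
    print(count, "x ORA-600", list(args))
```

On a real system the same function could be fed the lines of the alert log file instead of the sample list.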

After we applied a series of recovery steps to the database, the open still failed:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "NAME": invalid identifier
Process ID: 23346
Session ID: 680 Serial number: 51933

Tracing the startup process showed:

PARSE ERROR #140574232044112:len=45 dep=1 uid=0 oct=3 lid=0 tim=1623621695884944 err=904
select value$ from sys.props$ where name = :1
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "NAME": invalid identifier
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "NAME": invalid identifier
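
Locating the failing recursive statement in a startup trace comes down to finding the PARSE ERROR entries and reading the SQL text that follows them. Below is a rough sketch of that scan, assuming the statement sits on the single line right after the PARSE ERROR header (as in the excerpt above; longer statements span several lines and would need the `len` field). The function name `parse_errors` is an assumption for illustration.

```python
import re

PARSE_ERROR_RE = re.compile(r"^PARSE ERROR #\d+:(?P<fields>.*)$")


def parse_errors(trace_lines):
    """Return (err, dep, sql) for each PARSE ERROR entry in a SQL trace."""
    lines = [line.strip() for line in trace_lines]
    results = []
    for i, line in enumerate(lines):
        match = PARSE_ERROR_RE.match(line)
        if not match:
            continue
        # Header fields look like: len=45 dep=1 uid=0 oct=3 lid=0 tim=... err=904
        fields = dict(kv.split("=", 1) for kv in match.group("fields").split())
        # Assumption: the SQL text is the single line after the header.
        sql = lines[i + 1] if i + 1 < len(lines) else ""
        results.append((int(fields["err"]), int(fields["dep"]), sql))
    return results


# Lines copied from the trace excerpt in this post.
trace = [
    "PARSE ERROR #140574232044112:len=45 dep=1 uid=0 oct=3 lid=0 tim=1623621695884944 err=904",
    "select value$ from sys.props$ where name = :1",
    "ORA-00604: error occurred at recursive SQL level 1",
]
errors = parse_errors(trace)
for err, dep, sql in errors:
    print(f"ORA-{err:05d} at recursive depth {dep}: {sql}")
```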

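This largely confirms that the customer's own recovery attempt left the props$ table in an abnormal state. Further analysis confirmed that the cause was improper handling of tab$. After we repaired tab$ further, the database returned to normal with zero data loss.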

Recovery from a failure caused by moving data files when disk space ran out

A customer ran out of disk space and moved Oracle data files to another location while the database was online:

Tue Jun 01 11:44:32 2021
Thread 1 advanced to log sequence 28754 (LGWR switch)
  Current log# 2 seq# 28754 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Tue Jun 01 11:59:54 2021
Non critical error ORA-48113 caught while writing to trace file
      "/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_mmon_23341.trc"
Error message: 
Writing to the above trace file is disabled for now on...
Tue Jun 01 12:00:00 2021
Non critical error ORA-48181 caught while writing to trace file
       "/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29692.trc"
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on...
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_29692.trc:
ORA-12012: error on auto execute of job "XIFENFEI"."STATISTICS_1_JOBS"
ORA-06575: Package or function PKG_STAT_1_2018 is in an invalid state
Tue Jun 01 12:12:26 2021

After the data files were moved away, the database reported errors and was then forcibly shut down:

ORA-01116: error in opening database file 30
ORA-01110: data file 30: '/u02/orcdate/AAAA.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m001_29106.trc:
ORA-01116: error in opening database file 31
ORA-01110: data file 31: '/u02/orcdate/CBD.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Mon Jun 07 10:25:03 2021
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_9817.trc:
ORA-01116: error in opening database file 24
ORA-01110: data file 24: '/u02/orcdate/ABC.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Mon Jun 07 10:25:10 2021
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Read of datafile '/u02/orcdate/XXXXXXX.dbf' (fno 21) header failed with ORA-01208
Rereading datafile 21 header failed with ORA-01208
Mon Jun 07 10:25:36 2021
Adjusting the default value of parameter parallel_max_servers
from 640 to 485 due to the value of parameter processes (500)
Starting ORACLE instance (normal)
Mon Jun 07 10:28:20 2021
Shutting down instance (abort)
License high water mark = 152
USER (ospid: 7987): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Mon Jun 07 10:28:30 2021
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 7987
Mon Jun 07 10:28:31 2021
Instance shutdown complete

The files were then moved back and a series of recovery operations was attempted. By the time we took over, several data files were offline and one file reported WRONG FILE NUMBER. We ran the Oracle Database Recovery Check script and analyzed v$datafile, v$datafile_header, and v$tablespace together.
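
The kind of cross-check involved can be sketched as follows. This is not the Oracle Database Recovery Check script itself, only a small illustration of comparing the control file view (v$datafile) with the file header view (v$datafile_header) per file number. The function name and all sample rows are invented for the example; only the column names FILE#, STATUS, ERROR, and TABLESPACE_NAME come from the real views.

```python
def datafile_findings(datafiles, headers):
    """Cross-check rows shaped like v$datafile and v$datafile_header.

    Both arguments are lists of dicts keyed by column name, as a query
    result might be loaded into Python.
    """
    header_by_file = {h["FILE#"]: h for h in headers}
    findings = []
    for df in datafiles:
        fno = df["FILE#"]
        hdr = header_by_file.get(fno)
        # The control file marks files that are offline or need recovery.
        if df["STATUS"] in ("OFFLINE", "RECOVER"):
            findings.append((fno, "control file status " + df["STATUS"]))
        if hdr is None:
            findings.append((fno, "no header row"))
        elif hdr.get("ERROR"):
            # A non-null ERROR means the header could not be validated.
            findings.append((fno, "header error " + hdr["ERROR"]))
    return findings


# Invented sample rows, for illustration only.
datafiles = [
    {"FILE#": 5, "STATUS": "ONLINE"},
    {"FILE#": 6, "STATUS": "OFFLINE"},
    {"FILE#": 7, "STATUS": "ONLINE"},
]
headers = [
    {"FILE#": 5, "ERROR": None, "TABLESPACE_NAME": "USERS"},
    {"FILE#": 6, "ERROR": None, "TABLESPACE_NAME": "DATA_A"},
    {"FILE#": 7, "ERROR": "WRONG FILE NUMBER", "TABLESPACE_NAME": None},
]
findings = datafile_findings(datafiles, headers)
for fno, note in findings:
    print(f"file {fno}: {note}")
```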


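We confirmed that a data file of the WXD_YPT tablespace had been copied directly over a data file of the WXD tablespace. The customer confirmed that the WXD data was not important and could be ignored for now.
After a series of steps we tried to open the database, which failed with ORA-600 2662: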

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [3786], [2612118101], [3786], [2612128448], [12583040]
ORA-00600: internal error code, arguments: [2662], [3786], [2612118100], [3786], [2612128448], [12583040]
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [3786], [2612118098], [3786], [2612128448], [12583040]
Process ID: 14888
Session ID: 198 Serial number: 3
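
The arguments of ORA-600 [2662] are commonly described as: current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, and the RDBA of the block carrying the dependent SCN. Under that assumed layout (and the classic 10-bit file / 22-bit block RDBA encoding), the first error line above can be decoded with a small sketch like this one; the function name is illustrative.

```python
def decode_ora600_2662(args):
    """Decode ORA-600 [2662] arguments under the commonly cited layout.

    Assumed layout: [2662], [current SCN wrap], [current SCN base],
    [dependent SCN wrap], [dependent SCN base], [RDBA of the block].
    """
    _, cur_wrap, cur_base, dep_wrap, dep_base, rdba = args
    # A full SCN is wrap * 2**32 + base.
    current_scn = (cur_wrap << 32) + cur_base
    dependent_scn = (dep_wrap << 32) + dep_base
    return {
        "current_scn": current_scn,
        "dependent_scn": dependent_scn,
        # How far the database SCN lags behind the SCN found in the block.
        "gap": dependent_scn - current_scn,
        # Classic RDBA: upper 10 bits file number, lower 22 bits block number.
        "file_no": rdba >> 22,
        "block_no": rdba & 0x3FFFFF,
    }


# Arguments copied from the first ORA-00600 line above.
info = decode_ora600_2662([2662, 3786, 2612118101, 3786, 2612128448, 12583040])
for key, value in info.items():
    print(key, value)
```

Read this way, the database SCN is slightly behind the SCN recorded in a block, which is why the SCN has to be pushed forward before the database can open.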

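After we adjusted the database SCN (see the related post on this blog: ORA-600 2662), the database opened successfully. We then helped the customer export the data and import it into a new database, which completed the recovery.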
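Luck was on our side this time: only a small amount of data was lost and no major incident resulted. A reminder: if you are not very familiar with Oracle, operate on the database with care, and do not move data files directly while the database is online. For a better and faster recovery, also tell the recovery engineer about every operation performed after the failure.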

Oracle recovery after a file system was repartitioned

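A recovery I handled recently ranks among the strangest of the past few years.
1. In an Oracle DG (Data Guard) setup, the RAID arrays of both the primary and the standby failed at the same time. A hardware recovery vendor was brought in to reassemble the RAID in software and judged that all disks were healthy.
2. The operating system of the primary had been reinstalled and its file system repartitioned. The standby was in the middle of being rebuilt with duplicate (the alert log shows the previous DG was healthy; all files were deleted with rm and duplicate was then started), so only some of the files had been copied to the standby.
3. Backups were kept on a separate storage device, but when we looked, it was empty and held no data at all. Analysis of the control file confirmed that only backup records from a month earlier pointed to this storage; the backups had presumably been deleted or the storage repartitioned, and later analysis pointed to repartitioning.
The customer told us none of this, only that both RAID arrays had suddenly failed and that a hardware vendor was working on the recovery. The vendor initially thought the job would be simple: they rebuilt the LUN through RAID emulation and used software to recover some data files (I was told only that the redo logs were missing and that our help was needed). A closer look showed that a large number of data files were missing, so a recovery on that basis was essentially pointless. We then recovered the primary's RAID through emulation and copied out the data files, only to find that the sizes of the recovered files did not match the sizes recorded in the file headers.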
In one case the file header records a size of 30G, but the file actually copied out is only 26G.

Further low-level analysis showed that every file larger than 4G was damaged in alternating 4G units (4G good, 4G damaged, 4G good, and so on).
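
To make the damage pattern concrete, the following sketch lists which byte ranges of a file would be good or damaged under that alternation. It assumes that "G" means GiB, that the pattern is aligned to offset 0 of the file, and that the first unit is good; none of these details is stated in the original analysis, and the function name is illustrative.

```python
GIB = 1024 ** 3


def alternating_extents(file_size, unit=4 * GIB):
    """Split [0, file_size) into unit-sized extents, alternately good and damaged.

    Assumes the pattern is aligned to offset 0 and starts with a good extent;
    on a real system both must be verified against the actual blocks.
    """
    extents = []
    offset = 0
    good = True
    while offset < file_size:
        end = min(offset + unit, file_size)
        extents.append((offset, end, good))
        offset = end
        good = not good
    return extents


# A 30G data file, the size mentioned in this post.
extents = alternating_extents(30 * GIB)
damaged = sum(end - start for start, end, good in extents if not good)
for start, end, good in extents:
    state = "good" if good else "damaged"
    print(f"{start // GIB:>3}G - {end // GIB:>3}G  {state}")
print("damaged total:", damaged // GIB, "G")
```

Under these assumptions close to half of a large file falls into damaged ranges, which is why an ordinary file-level copy is of little use here.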

Low-level analysis indicated that the customer had repartitioned the disks, which caused the underlying damage.

In this situation there were few good options, so we recovered the data directly with low-level fragment techniques.

Luck was with us and the database opened successfully.

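This recovery took many detours, mainly because the customer, for reasons unknown to us, repeatedly concealed the cause of the failure and did not tell us truthfully what had happened. We had to feel our way forward step by step, which cost a lot of time. If at all possible, please tell us exactly what happened so that we can assess the situation accurately and recover quickly and efficiently.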
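We have carried out quite a few Oracle recoveries at the fragment level, for example:
Recovery after a database was deleted with dbca or with rm
Recovery of abnormal data files caused by file system corruption
Recovery of Oracle data files that are 0 KB in size or missing
Recovery of data files lost because of alter database create datafile
Recovering data files deleted with rm -rf: file system undelete plus Oracle fragment reassembly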