tab$异常被处理之后报ORA-600 13304故障处理

Posted on 2021 年 06 月 15 日 by 惜分飞

又一例数据库启动报ORA-600 16703 1403 20错误故障

Sun Jun 13 14:00:56 2021
NOTE: dependency between database xff and diskgroup resource ora.DG_ARCH_xff.dg is established
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_56340.trc  (incident=348265):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xff/xff1/incident/incdir_348265/xff1_ora_56340_i348265.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_56340.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_56340.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 56340): terminating the instance due to error 704
Instance terminated by USER, pid = 56340

这个故障比较明显,根据我们之前的分析经验(警告：互联网中有oracle介质被注入恶意程序导致—ORA-600 16703),应该是tab$被恶意破坏导致,通过分析安装程序,确认是该问题,客户通过互联网上的相关文章,dd方式进行处理,结果数据库报ORA-600 13304错误,无法继续,让我们提供技术支持

SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_83843.trc  (incident=396265):
ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/xff/xff1/incident/incdir_396265/xff1_ora_83843_i396265.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_83843.trc:
ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_83843.trc:
ORA-00600: internal error code, arguments: [13304], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 83843): terminating the instance due to error 600
Instance terminated by USER, pid = 83843

通过我们的技术对数据库进行一系列恢复之后,open过程报错

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "NAME": invalid identifier
Process ID: 23346
Session ID: 680 Serial number: 51933

通过跟踪启动过程分析

PARSE ERROR #140574232044112:len=45 dep=1 uid=0 oct=3 lid=0 tim=1623621695884944 err=904
select value$ from sys.props$ where name = :1
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "NAME": invalid identifier
ORA-00604: error occurred at recursive SQL level 1
ORA-00904: "NAME": invalid identifier

基本上可以确定是由于客户自行恢复导致props$表异常.通过进一步分析,确认是由于在对tab$处理不合适导致,进一步对tab$进行处理,数据库恢复正常,实现数据0丢失

文件系统重新分区oracle恢复

Posted on 2021 年 06 月 07 日 by 惜分飞

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：文件系统重新分区oracle恢复

最近处理的一个恢复,算是这几年中的一个奇葩.
1. oracle dg 主备库raid同时损坏,找硬件恢复厂商软件重组raid,恢复厂商判断所有磁盘全部都是好的
2. 主库系统被重装,文件系统重新分区.备库在使用duplicate搭建dg的过程中(通过alert日志分析以前的dg是正常的,直接rm掉了所有文件,然后使用duplicate搭建),只是部分文件拷贝到了备库
3. 备份放在一台单独的存储上,但是当上去看是发现存储上面空空的,没有任何数据(通过对ctl的分析,确认存储上面只有一个月之前的备份记录,估计也被删除或者重新分区了(通过后续分析,判断应该是被重新分区了)
客户没有和我们说任何信息,就是说突然两个raid都损坏了,找硬件厂商进行恢复,硬件厂商开始也觉得这个会比较简单,直接通过raid模拟恢复出来lun,然后通过软件恢复出来一些数据文件(反馈给我的信息是少了redo,需要我们协助恢复),通过深入分析,发现少了大量数据文件,基于现在的恢复基本上没意义.然后通过低主库的raid模拟恢复,拷贝出来数据文件,结果发现恢复出来的文件大小,和文件头记录不匹配

这里显示文件大小应该是30G,但是实际拷贝的文件只有26G大小

通过底层进一步分析,发现任何大于4G的文件,按照4G为单位间隔损坏(4G好,4G损坏,4G好……)

出现这类情况,通过底层分析,判断是客户对磁盘进行了重新分区,引起底层问题导致

基于这样的情况,没有太多好的方法处理,直接使用底层碎片技术进行恢复

运气不错,顺利open数据库

本次恢复走了很多弯路,主要是客户不清楚客户那边处于什么原因,多次隐秘故障原因,没有如实的告知我们故障情况,一步步尝试,走了很多弯路,耽误了不少时间.如果可能请尽量告诉我们准确情况,便于我们准确做出判断,快速高效的恢复.
类似oracle 碎片层面恢复,我们进行了挺多的,类似：
dbca删除库和rm删库恢复
文件系统损坏导致数据文件异常恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
alter database create datafile 导致数据文件丢失恢复
rm -rf 删除数据文件恢复方法—文件系统反删除+oracle碎片重组

ORA-00600: internal error code, arguments: [16703], [1403], [4] 故障处理

Posted on 2021 年 05 月 15 日 by 惜分飞

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-00600: internal error code, arguments: [16703], [1403], [4] 故障处理

有一个客户数据库遭遇ORA-600 16703错误,故障原因见:警告：互联网中有oracle介质被注入恶意程序导致—ORA-600 16703,这种故障已经恢复比较多,在这次的恢复中遇到一个新错误,进行记录
接手给客户报错情况ORA-00600: internal error code, arguments: [16703], [1403], [20]

Thu May 13 22:36:11 2021
SMON: enabling cache recovery
Thu May 13 22:36:11 2021
NSA2 started with pid=61, OS id=6261 
Archived Log entry 90224 added for thread 1 sequence 50454 ID 0x19ae1c6c dest 1:
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_5931.trc  (incident=741052):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/xff/xff/incident/incdir_741052/xff_ora_5931_i741052.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_5931.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /oracle/diag/rdbms/xff/xff/trace/xff_ora_5931.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 5931): terminating the instance due to error 704
Instance terminated by USER, pid = 5931
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (5931) as a result of ORA-1092
Thu May 13 22:36:13 2021
ORA-1092 : opitsk aborting process

这种故障,由于是恶意脚本在数据库启动的时候清空tab$所致,使用bbed对tab$进行反向删除操作,实现数据库打开.
在这次的恢复过程中遇到ORA-600 16703 1403 4的错误

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 7.0818E+10 bytes
Fixed Size                  2260448 bytes
Variable Size            1.3422E+10 bytes
Database Buffers         5.7177E+10 bytes
Redo Buffers              217030656 bytes
Database mounted.
SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [4], [], [], [],
[], [], [], [], [], []
Process ID: 51886
Session ID: 3497 Serial number: 3

根据ora-600 16703 1403 4基本上可以判断是由于tab$这个表中缺少obj#=4的对象导致,通过查询正常库,确认是该对象为tab$,也就是说由于tab$对象中少了tab$记录.通过bbed分析确认

SQL> SELECT a.OBJ#
  2        ,TAB#
  3        ,a.DATAOBJ#
  4        ,BOBJ#
  5        ,NAME
  6        ,DBMS_ROWID.ROWID_RELATIVE_FNO (a.ROWID) FILE_ID
  7        ,DBMS_ROWID.ROWID_BLOCK_NUMBER (a.ROWID) BLOCK_ID
  8    FROM TAB$ a, obj$ b
  9   WHERE     a.obj# = b.obj#
 10         AND A.OBJ# IN (4);

      OBJ#       TAB#   DATAOBJ#      BOBJ# NAME
---------- ---------- ---------- ---------- ------------------------------
   FILE_ID   BLOCK_ID
---------- ----------
         4          1          2          2 TAB$
         1        147

BBED> set dba 1,147
        DBA             0x00400093 (4194451 1,147)

BBED> x /rnnnnnnnnnnnnncnnnnnnnntnnnnnnnnnncct  *kdbr[14]
rowdata[6848]                               @7349
-------------
flag@7349: 0x20 (KDRHFH)
lock@7350: 0x02
cols@7351:    0
nrid@7352:0x00407b09.1
BBED> set dba 0x00407b09
        DBA             0x00407b09 (4225801 1,31497)

BBED> p kdbt[1]
struct kdbt[1], 4 bytes                     @110
   sb2 kdbtoffs                             @110      10
   sb2 kdbtnrow                             @112      2

BBED> x /rnnnnnnnnnnnnncnnnnnnnntnnnnnnnnnncct  *kdbr[11]
rowdata[815]                                @4436
------------
flag@4436: 0x5c (KDRHFL, KDRHFF, KDRHFD, KDRHFC)
lock@4437: 0x02
cols@4438:    0
ckix@4439:    8

BBED> x /rn  *kdbr[8]
rowdata[950]                                @4571
------------
flag@4571: 0xac (KDRHFL, KDRHFF, KDRHFH, KDRHFK)
lock@4572: 0x00
cols@4573:    1
kref@4574:    1
hrid@4576:0x00400093.8
nrid@4582:0x00400094.0
col    0[2] @4590: 4

确认该记录发生了行迁移导致该问题,对其对应的block进行修复,数据库正常打开.

记录一次oracle现场故障处理经过

Posted on 2021 年 04 月 03 日 by 惜分飞

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：记录一次oracle现场故障处理经过

近期到现场进行了一个数据库恢复,我在恢复之前该库先由于硬件进行恢复,然后由其他人对其进行了一系列数据库恢复,但是未恢复成功,客户希望我们到现场进行处理(因为网络原因无法远程).接手库之后,处理第一个问题,是客户在进行现场备份的时候(把linux数据拷贝到win的过程中)发现有几个文件拷贝异常,这个错误很可能是由于当初的硬件故障修复之后留下的后遗症(由于io设备错误,无法运行此项请求),通过工具进行拷贝,恢复出来

DUL> copy file from  /oradata2/xifenfeidata.dbf to /oradata2/xifenfeidata.dbf

starting copy datafile '/oradata1/xifenfeidata.dbf' to '/oradata2/xifenfeidata.dbf'
read data error from file '/oradata1/xifenfeidata.dbf'.error message:Input/output error
read block# error: 560171
read data error from file '/oradata1/xifenfeidata.dbf'.error message:Input/output error
read block# error: 560179
datafile copy completed with 2 block error.

[oracle@localhost ~]$ dbv file=/oradata2/xifenfeidata.dbf blocksize=16384

DBVERIFY: Release 11.2.0.3.0 - Production on Mon Mar 29 17:28:17 2021

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oradata2/xifenfeidata.dbf
Page 560171 is marked corrupt
Corrupt block relative dba: 0x3bc88c2b (file 239, block 560171)
Completely zero block found during dbv: 

Page 560179 is marked corrupt
Corrupt block relative dba: 0x3bc88c33 (file 239, block 560179)
Completely zero block found during dbv: 



DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2230726
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1936953
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 26618
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 3
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 304929867 (106.304929867)

修复完相关无法拷贝文件之后,启动数据库报控制文件异常

Mon Mar 29 15:03:38 2021
alter database mount
USER (ospid: 29044): terminating the instance
Mon Mar 29 15:03:42 2021
System state dump requested by (instance=1, osid=29044), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff/trace/xff_diag_28961.trc
Instance terminated by USER, pid = 29044

尝试重建ctl

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 29 17:40:17 2021

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1.7704E+10 bytes
Fixed Size                  2235568 bytes
Variable Size            2348811088 bytes
Database Buffers         1.5301E+10 bytes
Redo Buffers               52580352 bytes
SQL> @/tmp/ctl.sql
CREATE CONTROLFILE REUSE DATABASE xff NORESETLOGS  NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01189: file is from a different RESETLOGS than previous files
ORA-01110: data file 249: '/oradata/xff/system03.dbf'

初步判断是由于对方之前恢复导致部分文件resetlogs scn异常,通过bbed进行判断确认

BBED> set file 1
        FILE#           1

BBED> p kcvfhrls
struct kcvfhrls, 8 bytes                    @116     
   ub4 kscnbas                              @116      0x00000001
   ub2 kscnwrp                              @120      0x0000

BBED> set file 249
        FILE#           249

BBED> p kcvfhrls
struct kcvfhrls, 8 bytes                    @116     
   ub4 kscnbas                              @116      0x00000001
   ub2 kscnwrp                              @120      0x0000

通过bbed修改相关值,然后重建控制文件成功,尝试resetlogs库，报ORA-01248错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 234 was created in the future of incomplete recovery
ORA-01110: data file 234: '/oradata1/xifenfeidata5.DBF'

关于ORA-01248的错误解释

01248, 00000, "file %s was created in the future of incomplete recovery"
// *Cause:  Attempting to do a RESETLOGS open with a file entry in the
//          control file that was originally created after the UNTIL time 
//          of the incomplete recovery.
//          Allowing such an entry may hide the version of the file that 
//          is needed at this time.  The file number may be in use for 
//          a different file which would be lost if the RESETLOGS was allowed.
// *Action: If more recovery is desired then apply redo until the creation
//          time of the file is reached. If the file is not wanted and the
//          same file number is not in use at the stop time of the recovery,
//          then the file can be taken offline with the FOR DROP option.
//          Otherwise a different control file is needed to allow the RESETLOGS.
//          Another backup can be restored and recovered, or a control file can
//          be created via CREATE CONTROLFILE.

大概的意思是文件的创建时间大于文件当前的scn,通过查询确实如此

SQL> select file#,CREATION_CHANGE#,CREATION_TIME from v$datafile_header where file#=234;

           FILE# CREATION_CHANGE# CREATION_
---------------- ---------------- ---------
             234     419298664864 02-AUG-19

SQL> SELECT status,  
  2  to_char(checkpoint_change#,'9999999999999999') "SCN",
  3  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,
  4  count(*) ROW_NUM
  5  FROM v$datafile_header
  6  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS  SCN               CHECKPOINT_TIME     FUZ          ROW_NUM
------- ----------------- ------------------- --- ----------------
ONLINE       417750848223 2021-02-23 23:50:46 YES                7
ONLINE       417750848223 2021-03-21 11:44:25 NO               396

通过对部分scn进行修改（比如减小创建时间的scn)，然后尝试resetlogs库

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 5 with name
"_SYSSMU5_2708889888$" too small
Process ID: 3182
Session ID: 1 Serial number: 3

这个错误比较简单,参考以前的部分文章:在数据库open过程中常遇到ORA-01555汇总数据库open过程遭遇ORA-1555对应sql语句补充,处理之后,数据库open成功

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1.7704E+10 bytes
Fixed Size                  2235568 bytes
Variable Size            2348811088 bytes
Database Buffers         1.5301E+10 bytes
Redo Buffers               52580352 bytes
Database mounted.
SQL> alter database open;

Database altered.

本次数据库恢复基本上完成,已经最大限度恢复数据,导出数据到新库,完成恢复任务

ORA-600 16703直接把orachk备份表插入到tab$恢复

Posted on 2021 年 03 月 24 日 by 惜分飞

联系：手机/微信(+86 17813235971) QQ(107644445)

标题：ORA-600 16703直接把orachk备份表插入到tab$恢复

有一个朋友和我说，他们数据库出现了以下错误ORA-600 16703 错误

他们是在虚拟化环境中,可以恢复到上一个快照点,但是主机启动之后,数据库依旧异常,让我们进行处理

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Wed Mar 24 17:04:01 2021

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> select count(1) from tab$;

  COUNT(1)
----------
         0

很明显tab$已经被清空,数据库无法正常使用.因为库没有crash,尝试把备份的orachk表插入进来

SQL> insert into tab$ select * from ORACHKB514061BDCB10EBA9CF58F3;

6318 rows created.

SQL> commit;

Commit complete.

SQL> select 'DROP TRIGGER '||owner||'."'||TRIGGER_NAME||'";' from dba_triggers w
here TRIGGER_NAME like 'DBMS_%_INTERNAL% '
  2  union all
  3  select 'DROP PROCEDURE '||owner||'."'||a.object_name||'";' from dba_procedu
res a where a.object_name like 'DBMS_%_INTERNAL% '
  4  union all
  5  select 'drop '||object_type||' '||owner||'.'||object_name||';' from dba_obj
ects where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP');

'DROPTRIGGER'||OWNER||'."'||TRIGGER_NAME||'";'
--------------------------------------------------------------------------------

drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP;
drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR;

SQL> drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP;

Procedure dropped.

SQL> drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR;

Trigger dropped.

SQL> commit;

Commit complete.

SQL>

重启数据库,该故障恢复完成,数据完美恢复0丢失.