硬件故障恢复出文件之后数据库故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:硬件故障恢复出文件之后数据库故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户那边硬件故障(raid损坏磁盘超过了极限,导致raid offline),通过硬件恢复出来数据文件,然后尝试自行恢复,我接手的时候大量数据文件resetlogs scn异常.
wrong_resetlogs


重建控制文件报错

WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_5949.trc:
ORA-01189: file is from a different RESETLOGS than previous files
ORA-01110: data file 153: '/home/oracle/oracledata/orcl/sysaux02.dbf'
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  ARCHIVELOG

通过修改文件头然后重建控制文件,可以通过bbed,或者我的小工具Oracle Recovery Tools
bbed解决ORA-01190
Oracle Recovery Tools 解决ORA-01190 ORA-01248等故障
重建control遗漏数据文件,reseltogs报ORA-1555错误处理
然后继续重建ctl发现以下错误

WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Errors in file /home/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_34075.trc:
ORA-01200: actual file size of 2015415 is smaller than correct size of 2944000 blocks
ORA-01110: data file 178: '/home/oracle/oracledata/orcl/xifenfei20_10.dbf'
ORA-1503 signalled during: CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG

通过对比发现是由于客户上传恢复文件异常导致
20230713002257


重新上传文件,然后修改文件头,该问题解决,重建ctl成功,提个醒:对于这种硬件恢复之后文件上次到服务器上进行恢复的,一定要确认上传文件和原文件一致,不然做无用功或者恢复效果差很多
尝试open数据库报ORA-600 2662错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [5], [1653389530], [5],
[1653496702], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [5], [1653389529], [5],
[1653496702], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [5], [1653389527], [5],
[1653496702], [12583040], [], [], [], [], [], []
Process ID: 4710
Session ID: 1847 Serial number: 3

这个错误比较简单,一般是scn问题,有过大量的处理经验案例:
使用bbed解决ORA-00600[2662]
硬件故障导致ORA-600 2662错误处理
Patch SCN工具快速解决ORA-600 2662问题
解决好该问题之后,数据库open成功,实现了最大限度抢救数据.

记录一次oracle现场故障处理经过

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次oracle现场故障处理经过

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

近期到现场进行了一个数据库恢复,我在恢复之前该库先由于硬件进行恢复,然后由其他人对其进行了一系列数据库恢复,但是未恢复成功,客户希望我们到现场进行处理(因为网络原因无法远程).接手库之后,处理第一个问题,是客户在进行现场备份的时候(把linux数据拷贝到win的过程中)发现有几个文件拷贝异常,这个错误很可能是由于当初的硬件故障修复之后留下的后遗症(由于io设备错误,无法运行此项请求),通过工具进行拷贝,恢复出来
20210403210131


DUL> copy file from  /oradata2/xifenfeidata.dbf to /oradata2/xifenfeidata.dbf

starting copy datafile '/oradata1/xifenfeidata.dbf' to '/oradata2/xifenfeidata.dbf'
read data error from file '/oradata1/xifenfeidata.dbf'.error message:Input/output error
read block# error: 560171
read data error from file '/oradata1/xifenfeidata.dbf'.error message:Input/output error
read block# error: 560179
datafile copy completed with 2 block error.
[oracle@localhost ~]$ dbv file=/oradata2/xifenfeidata.dbf blocksize=16384

DBVERIFY: Release 11.2.0.3.0 - Production on Mon Mar 29 17:28:17 2021

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oradata2/xifenfeidata.dbf
Page 560171 is marked corrupt
Corrupt block relative dba: 0x3bc88c2b (file 239, block 560171)
Completely zero block found during dbv: 

Page 560179 is marked corrupt
Corrupt block relative dba: 0x3bc88c33 (file 239, block 560179)
Completely zero block found during dbv: 



DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2230726
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1936953
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 26618
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 3
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 304929867 (106.304929867)

修复完相关无法拷贝文件之后,启动数据库报控制文件异常

Mon Mar 29 15:03:38 2021
alter database mount
USER (ospid: 29044): terminating the instance
Mon Mar 29 15:03:42 2021
System state dump requested by (instance=1, osid=29044), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff/trace/xff_diag_28961.trc
Instance terminated by USER, pid = 29044

尝试重建ctl

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 29 17:40:17 2021

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1.7704E+10 bytes
Fixed Size                  2235568 bytes
Variable Size            2348811088 bytes
Database Buffers         1.5301E+10 bytes
Redo Buffers               52580352 bytes
SQL> @/tmp/ctl.sql
CREATE CONTROLFILE REUSE DATABASE xff NORESETLOGS  NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01189: file is from a different RESETLOGS than previous files
ORA-01110: data file 249: '/oradata/xff/system03.dbf'

初步判断是由于对方之前恢复导致部分文件resetlogs scn异常,通过bbed进行判断确认

BBED> set file 1
        FILE#           1

BBED> p kcvfhrls
struct kcvfhrls, 8 bytes                    @116     
   ub4 kscnbas                              @116      0x00000001
   ub2 kscnwrp                              @120      0x0000

BBED> set file 249
        FILE#           249

BBED> p kcvfhrls
struct kcvfhrls, 8 bytes                    @116     
   ub4 kscnbas                              @116      0x00000001
   ub2 kscnwrp                              @120      0x0000

通过bbed修改相关值,然后重建控制文件成功,尝试resetlogs库,报ORA-01248错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 234 was created in the future of incomplete recovery
ORA-01110: data file 234: '/oradata1/xifenfeidata5.DBF'

关于ORA-01248的错误解释

01248, 00000, "file %s was created in the future of incomplete recovery"
// *Cause:  Attempting to do a RESETLOGS open with a file entry in the
//          control file that was originally created after the UNTIL time 
//          of the incomplete recovery.
//          Allowing such an entry may hide the version of the file that 
//          is needed at this time.  The file number may be in use for 
//          a different file which would be lost if the RESETLOGS was allowed.
// *Action: If more recovery is desired then apply redo until the creation
//          time of the file is reached. If the file is not wanted and the
//          same file number is not in use at the stop time of the recovery,
//          then the file can be taken offline with the FOR DROP option.
//          Otherwise a different control file is needed to allow the RESETLOGS.
//          Another backup can be restored and recovered, or a control file can
//          be created via CREATE CONTROLFILE.

大概的意思是文件的创建时间大于文件当前的scn,通过查询确实如此

SQL> select file#,CREATION_CHANGE#,CREATION_TIME from v$datafile_header where file#=234;

           FILE# CREATION_CHANGE# CREATION_
---------------- ---------------- ---------
             234     419298664864 02-AUG-19

SQL> SELECT status,  
  2  to_char(checkpoint_change#,'9999999999999999') "SCN",
  3  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,
  4  count(*) ROW_NUM
  5  FROM v$datafile_header
  6  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS  SCN               CHECKPOINT_TIME     FUZ          ROW_NUM
------- ----------------- ------------------- --- ----------------
ONLINE       417750848223 2021-02-23 23:50:46 YES                7
ONLINE       417750848223 2021-03-21 11:44:25 NO               396

通过对部分scn进行修改(比如减小创建时间的scn),然后尝试resetlogs库

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 5 with name
"_SYSSMU5_2708889888$" too small
Process ID: 3182
Session ID: 1 Serial number: 3

这个错误比较简单,参考以前的部分文章:在数据库open过程中常遇到ORA-01555汇总数据库open过程遭遇ORA-1555对应sql语句补充,处理之后,数据库open成功

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1.7704E+10 bytes
Fixed Size                  2235568 bytes
Variable Size            2348811088 bytes
Database Buffers         1.5301E+10 bytes
Redo Buffers               52580352 bytes
Database mounted.
SQL> alter database open;

Database altered.

本次数据库恢复基本上完成,已经最大限度恢复数据,导出数据到新库,完成恢复任务