记录一次oracle现场故障处理经过

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次oracle现场故障处理经过

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

近期到现场进行了一个数据库恢复,我在恢复之前该库先由于硬件进行恢复,然后由其他人对其进行了一系列数据库恢复,但是未恢复成功,客户希望我们到现场进行处理(因为网络原因无法远程).接手库之后,处理第一个问题,是客户在进行现场备份的时候(把linux数据拷贝到win的过程中)发现有几个文件拷贝异常,这个错误很可能是由于当初的硬件故障修复之后留下的后遗症(由于io设备错误,无法运行此项请求),通过工具进行拷贝,恢复出来
20210403210131


DUL> copy file from  /oradata2/xifenfeidata.dbf to /oradata2/xifenfeidata.dbf

starting copy datafile '/oradata1/xifenfeidata.dbf' to '/oradata2/xifenfeidata.dbf'
read data error from file '/oradata1/xifenfeidata.dbf'.error message:Input/output error
read block# error: 560171
read data error from file '/oradata1/xifenfeidata.dbf'.error message:Input/output error
read block# error: 560179
datafile copy completed with 2 block error.
[oracle@localhost ~]$ dbv file=/oradata2/xifenfeidata.dbf blocksize=16384

DBVERIFY: Release 11.2.0.3.0 - Production on Mon Mar 29 17:28:17 2021

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /oradata2/xifenfeidata.dbf
Page 560171 is marked corrupt
Corrupt block relative dba: 0x3bc88c2b (file 239, block 560171)
Completely zero block found during dbv: 

Page 560179 is marked corrupt
Corrupt block relative dba: 0x3bc88c33 (file 239, block 560179)
Completely zero block found during dbv: 



DBVERIFY - Verification complete

Total Pages Examined         : 4194302
Total Pages Processed (Data) : 2230726
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1936953
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 26618
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 3
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 304929867 (106.304929867)

修复完相关无法拷贝文件之后,启动数据库报控制文件异常

Mon Mar 29 15:03:38 2021
alter database mount
USER (ospid: 29044): terminating the instance
Mon Mar 29 15:03:42 2021
System state dump requested by (instance=1, osid=29044), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff/trace/xff_diag_28961.trc
Instance terminated by USER, pid = 29044

尝试重建ctl

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Mar 29 17:40:17 2021

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 1.7704E+10 bytes
Fixed Size                  2235568 bytes
Variable Size            2348811088 bytes
Database Buffers         1.5301E+10 bytes
Redo Buffers               52580352 bytes
SQL> @/tmp/ctl.sql
CREATE CONTROLFILE REUSE DATABASE xff NORESETLOGS  NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01189: file is from a different RESETLOGS than previous files
ORA-01110: data file 249: '/oradata/xff/system03.dbf'

初步判断是由于对方之前恢复导致部分文件resetlogs scn异常,通过bbed进行判断确认

BBED> set file 1
        FILE#           1

BBED> p kcvfhrls
struct kcvfhrls, 8 bytes                    @116     
   ub4 kscnbas                              @116      0x00000001
   ub2 kscnwrp                              @120      0x0000

BBED> set file 249
        FILE#           249

BBED> p kcvfhrls
struct kcvfhrls, 8 bytes                    @116     
   ub4 kscnbas                              @116      0x00000001
   ub2 kscnwrp                              @120      0x0000

通过bbed修改相关值,然后重建控制文件成功,尝试resetlogs库,报ORA-01248错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 234 was created in the future of incomplete recovery
ORA-01110: data file 234: '/oradata1/xifenfeidata5.DBF'

关于ORA-01248的错误解释

01248, 00000, "file %s was created in the future of incomplete recovery"
// *Cause:  Attempting to do a RESETLOGS open with a file entry in the
//          control file that was originally created after the UNTIL time 
//          of the incomplete recovery.
//          Allowing such an entry may hide the version of the file that 
//          is needed at this time.  The file number may be in use for 
//          a different file which would be lost if the RESETLOGS was allowed.
// *Action: If more recovery is desired then apply redo until the creation
//          time of the file is reached. If the file is not wanted and the
//          same file number is not in use at the stop time of the recovery,
//          then the file can be taken offline with the FOR DROP option.
//          Otherwise a different control file is needed to allow the RESETLOGS.
//          Another backup can be restored and recovered, or a control file can
//          be created via CREATE CONTROLFILE.

大概的意思是文件的创建时间大于文件当前的scn,通过查询确实如此

SQL> select file#,CREATION_CHANGE#,CREATION_TIME from v$datafile_header where file#=234;

           FILE# CREATION_CHANGE# CREATION_
---------------- ---------------- ---------
             234     419298664864 02-AUG-19

SQL> SELECT status,  
  2  to_char(checkpoint_change#,'9999999999999999') "SCN",
  3  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,
  4  count(*) ROW_NUM
  5  FROM v$datafile_header
  6  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS  SCN               CHECKPOINT_TIME     FUZ          ROW_NUM
------- ----------------- ------------------- --- ----------------
ONLINE       417750848223 2021-02-23 23:50:46 YES                7
ONLINE       417750848223 2021-03-21 11:44:25 NO               396

通过对部分scn进行修改(比如减小创建时间的scn),然后尝试resetlogs库

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 5 with name
"_SYSSMU5_2708889888$" too small
Process ID: 3182
Session ID: 1 Serial number: 3

这个错误比较简单,参考以前的部分文章:在数据库open过程中常遇到ORA-01555汇总数据库open过程遭遇ORA-1555对应sql语句补充,处理之后,数据库open成功

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1.7704E+10 bytes
Fixed Size                  2235568 bytes
Variable Size            2348811088 bytes
Database Buffers         1.5301E+10 bytes
Redo Buffers               52580352 bytes
Database mounted.
SQL> alter database open;

Database altered.

本次数据库恢复基本上完成,已经最大限度恢复数据,导出数据到新库,完成恢复任务

Input/output error故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Input/output error故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户由于硬件故障,导致数据文件出现io错误

oracle@linux1:~> dd if=/oradata/orcl/system01.dbf of=/oradata/orcl/system01.dbf_bak bs=8192
dd: reading `/oradata/orcl/system01.dbf': Input/output error
83871+0 records in
83871+0 records out
687071232 bytes (687 MB) copied, 1.07177 s, 641 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
83871+0 records in
83871+0 records out
687071232 bytes (687 MB) copied, 1.0731 s, 640 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
83871+0 records in
83871+0 records out
687071232 bytes (687 MB) copied, 1.07431 s, 640 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 4.11649 s, 169 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 5.64775 s, 124 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 7.1791 s, 97.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 8.70247 s, 80.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 10.2258 s, 68.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 10.2272 s, 68.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 10.2284 s, 68.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 10.2296 s, 68.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85158+1 records in
85158+1 records out
697618432 bytes (698 MB) copied, 10.2309 s, 68.2 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85170+1 records in
85170+1 records out
697716736 bytes (698 MB) copied, 11.7563 s, 59.3 MB/s
dd: reading `/oradata/orcl/system01.dbf': Input/output error
85170+1 records in
85170+1 records out
697716736 bytes (698 MB) copied, 13.3038 s, 52.4 MB/s
93431+1 records in
93431+1 records out
765390848 bytes (765 MB) copied, 18.2578 s, 41.9 MB/s

这个明显io错误比较多,无法直接使用以前的dd方法较好的恢复数据,只能通过linux平台的一些io工具修复文件(或者直接把磁盘挂载到win上通过工具处理),然后下载到win机器之后效果不错,只有17个坏块

C:\Users\XIFENFEI>dbv file=f:/11.2.0.1/system01.dbf

DBVERIFY: Release 10.2.0.3.0 - Production on 星期日 2月 24 22:46:59 2019

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = f:/11.2.0.1/system01.dbf
页 83871 标记为损坏
Corrupt block relative dba: 0x0041479f (file 1, block 83871)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 83872 标记为损坏
Corrupt block relative dba: 0x004147a0 (file 1, block 83872)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 83873 标记为损坏
Corrupt block relative dba: 0x004147a1 (file 1, block 83873)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0x759c0601
 check value in block header: 0xe5e5
 computed block checksum: 0xddc6

页 85161 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x00414ca9 (file 1, block 85161)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x00414ca9
 last change scn: 0x0000.0ce20ac2 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0x47f0
 computed block checksum: 0xc3ab

页 85162 标记为损坏
Corrupt block relative dba: 0x00414caa (file 1, block 85162)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85163 标记为损坏
Corrupt block relative dba: 0x00414cab (file 1, block 85163)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85164 标记为损坏
Corrupt block relative dba: 0x00414cac (file 1, block 85164)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85165 标记为损坏
Corrupt block relative dba: 0x00414cad (file 1, block 85165)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85166 标记为损坏
Corrupt block relative dba: 0x00414cae (file 1, block 85166)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85167 标记为损坏
Corrupt block relative dba: 0x00414caf (file 1, block 85167)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85177 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x00414cb9 (file 1, block 85177)
Fractured block found during dbv:
Data in bad block:
 type: 6 format: 2 rdba: 0x00414cb9
 last change scn: 0x0000.0ce55ebf seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0x54c9
 computed block checksum: 0x5ce5

页 85178 标记为损坏
Corrupt block relative dba: 0x00414cba (file 1, block 85178)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85179 标记为损坏
Corrupt block relative dba: 0x00414cbb (file 1, block 85179)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85180 标记为损坏
Corrupt block relative dba: 0x00414cbc (file 1, block 85180)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85181 标记为损坏
Corrupt block relative dba: 0x00414cbd (file 1, block 85181)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85182 标记为损坏
Corrupt block relative dba: 0x00414cbe (file 1, block 85182)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0

页 85183 标记为损坏
Corrupt block relative dba: 0x00414cbf (file 1, block 85183)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0xe5e5e5e5
 check value in block header: 0xe5e5
 computed block checksum: 0x0



DBVERIFY - 验证完成

检查的页总数: 93440
处理的页总数 (数据): 64294
失败的页总数 (数据): 0
处理的页总数 (索引): 12616
失败的页总数 (索引): 0
处理的页总数 (其它): 3111
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 13402
标记为损坏的总页数: 17
流入的页总数: 2
最高块 SCN            : 1073748415 (0.1073748415)

C:\Users\XIFENFEI>

经过一系列恢复,数据库强制打开,数据库后台报ORA-7445 kkogbro

Completed: alter database open resetlogs upgrade
Sun Feb 24 18:06:36 2019
MMON started with pid=15, OS id=9032 
Sun Feb 24 18:07:50 2019
Errors in file d:\app\diag\rdbms\orcl\orcl\trace\orcl_ora_8336.trc:
Sun Feb 24 18:07:52 2019
Trace dumping is performing id=[cdmp_20190224180752]
Sun Feb 24 18:09:42 2019
alter tablespace temp add tempfile 'f:/11.2.0.1/temp01.dbf' size 128m autoextend on
Completed: alter tablespace temp add tempfile 'f:/11.2.0.1/temp01.dbf' size 128m autoextend on
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x166] [PC:0x38E41AD, kkogbro()+497]
ERROR: Unable to normalize symbol name for the following short stack (at offset 199):
dbgexProcessError()+193<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+1726<-dbkePostKGE_kgsf()+
75<-kgeade()+560<-kgerev()+125<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1869<-
sss_xcpt_EvalFilter()+174<-.1.6_8+59<-0000000077207388<-000000007721BF7D<-
00000000771F043A<-000000007721B61E<-kkogbro()+497<-kkogjro()+99<-kkojnp()
+10299<-kkocnp()+78<-kkooqb()+1549<-kkoqbc()+2474<-apakkoqb()+200<-
apaqbdDescendents()+496<-apaqbdList()+79<-apaqbdDescendents()+795<-
apaqbdList()+79<-apaqbd()+17<-apadrv()+818<-opitca()+2518<-kksLoadChild()+9008
<-kxsGetRuntimeLock()+2320<-kksfbc()+15225<-kkspbd0()+669<-kksParseCursor()+741
<-opiosq0()+2538<-opipls()+12841<-opiodr()+1662<-rpidrus()+862<-rpidru()+154
<-rpiswu2()+2757<-rpidrv()+6105<-psddr0()+614<-psdnal()+510<-pevm_EXECC()+365
<-pfrinstr_EXECC()+90<-pfrrun_no_tool()+65<-pfrrun()+1241<-plsql_run()+875
<-peicnt()+329<-kkxexe()+616<-opiexe()+20006
Errors in file d:\app\diag\rdbms\orcl\orcl\trace\orcl_ora_8336.trc  (incident=2540):
ORA-07445: 出现异常错误: 核心转储 [kkogbro()+497] [ACCESS_VIOLATION] [ADDR:0x166] [PC:0x38E41AD] [UNABLE_TO_READ] []
Incident details in: d:\app\diag\rdbms\orcl\orcl\incident\incdir_2540\orcl_ora_8336_i2540.trc

通过分析trace文件,确认是和坏块有关系,对于上述坏块进行处理之后,数据正常导出.