resetlogs强制拉库失败并使用备份system文件还原数据库故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:resetlogs强制拉库失败并使用备份system文件还原数据库故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接手一个库,在open的过程中遭遇到ORA-600 2662错误

Sun May 26 10:15:54 2024
alter database open RESETLOGS
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 84303583
Clearing online redo logfile 1 /data/OracleData/xff/redo01.log
Clearing online log 1 of thread 1 sequence number 8330
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 /data/OracleData/xff/redo02.log
Clearing online log 2 of thread 1 sequence number 8327
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 /data/OracleData/xff/redo03.log
Clearing online log 3 of thread 1 sequence number 8329
Clearing online redo logfile 3 complete
Clearing online redo logfile 4 /data/OracleData/xff/redo04.log
Clearing online log 4 of thread 1 sequence number 8328
Clearing online redo logfile 4 complete
Resetting resetlogs activation ID 1431370398 (0x5550fa9e)
Online log /data/OracleData/xff/redo01.log: Thread 1 Group 1 was previously cleared
Online log /data/OracleData/xff/redo02.log: Thread 1 Group 2 was previously cleared
Online log /data/OracleData/xff/redo03.log: Thread 1 Group 3 was previously cleared
Online log /data/OracleData/xff/redo04.log: Thread 1 Group 4 was previously cleared
Sun May 26 10:15:59 2024
Setting recovery target incarnation to 3
Sun May 26 10:15:59 2024
Read of datafile '/data/OracleData/xff/temp01.dbf' (fno 201) header failed with ORA-01200
Rereading datafile 201 header failed with ORA-01200
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_dbw0_1563.trc:
ORA-01186: file 201 failed verification tests
ORA-01122: database file 201 failed verification check
ORA-01110: data file 201: '/data/OracleData/xff/temp01.dbf'
ORA-01200: actual file size of 3711 is smaller than correct size of 3712 blocks
File 201 not verified due to error ORA-01122
Sun May 26 10:15:59 2024
Assigning activation ID 1509069065 (0x59f29109)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /data/OracleData/xff/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun May 26 10:15:59 2024
SMON: enabling cache recovery
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_1590.trc  (incident=163897):
ORA-00600: internal error code, arguments: [2662], [0], [84303590], [0], [84314659], [12583040] 
Incident details in:/data/u01/app/oracle/diag/rdbms/xff/xff/incident/incdir_163897/xff_ora_1590_i163897.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_1590.trc:
ORA-00600: internal error code, arguments: [2662], [0], [84303590], [0], [84314659], [12583040] 
Errors in file /data/u01/app/oracle/diag/rdbms/xff/xff/trace/xff_ora_1590.trc:
ORA-00600: internal error code, arguments: [2662], [0], [84303590], [0], [84314659], [12583040] 
Error 600 happened during db open, shutting down database
USER (ospid: 1590): terminating the instance due to error 600

然后客户使用备份的system01.dbf文件替换了被resetlogs之后文件,导致数据库后续操作无法继续

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 1 belongs to an orphan incarnation
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'

这个问题比较简单,通过bbed或者Oracle Recovery Tools修改文件头相关信息,然后open数据库成功
重建控制文件丢失数据文件导致悲剧
Oracle Recovery Tools快速恢复ORA-19909

SQL> recover datafile 1;
Media recovery complete.
SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

但是由于system文件有大量坏块导致数据库无法正常登录和导出

[oracle@et-dbserver ~]$ exp "'/ as sysdba'" owner=USERNAME  file=/tmp/2user.dmp log=/tmp/2user.log 

Export: Release 11.2.0.4.0 - Production on Sun May 26 13:00:50 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

EXP-00056: ORACLE error 604 encountered
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 86500)
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'
Username: / as sysdba

EXP-00056: ORACLE error 604 encountered
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 86500)
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'
Username:
Password:

EXP-00056: ORACLE error 604 encountered
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 86500)
ORA-01110: data file 1: '/data/OracleData/xff/system01.dbf'
ORA-01017: invalid username/password; logon denied
EXP-00005: all allowable logon attempts failed
EXP-00000: Export terminated unsuccessfully

通过dbv检查system数据文件

DBVERIFY: Release 11.2.0.4.0 - Production on Sun May 26 12:33:28 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.


DBVERIFY - Verification starting : FILE = /data/OracleData/xff/system01.dbf
Page 1044 is influx - most likely media corrupt
Corrupt block relative dba: 0x00400414 (file 1, block 1044)
Fractured block found during dbv: 
Data in bad block:
 type: 0 format: 2 rdba: 0x00400414
 last change scn: 0x0000.00000000 seq: 0x1 flg: 0x05
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1d7f550b
 check value in block header: 0xa354
 computed block checksum: 0x6830

Page 1103 is marked corrupt
Corrupt block relative dba: 0x0040044f (file 1, block 1103)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 0 rdba: 0x00000000
 last change scn: 0x508f.5f74492e seq: 0x53 flg: 0x0c
 spare1: 0xc spare2: 0xa6 spare3: 0xc757
 consistency value in tail: 0x00000001
 check value in block header: 0x8925
 computed block checksum: 0x5d3b

Page 1143 is marked corrupt
Corrupt block relative dba: 0x00400477 (file 1, block 1143)
Bad header found during dbv: 
Data in bad block:
 type: 0 format: 0 rdba: 0x00000001
 last change scn: 0x65c4.52eb202e seq: 0x28 flg: 0x0e
 spare1: 0xe spare2: 0xe2 spare3: 0xfa46
 consistency value in tail: 0x00000001
 check value in block header: 0x6405
 computed block checksum: 0x28b1

………………

Page 124805 is influx - most likely media corrupt
Corrupt block relative dba: 0x0041e785 (file 1, block 124805)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x0041e785
 last change scn: 0x0000.0434fc6c seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1991255b
 check value in block header: 0x6386
 computed block checksum: 0x1384


DBVERIFY - Verification complete

Total Pages Examined         : 130560
Total Pages Processed (Data) : 95634
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 14949
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1774
Total Pages Processed (Seg)  : 1669
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 16251
Total Pages Marked Corrupt   : 283
Total Pages Influx           : 149
Total Pages Encrypted        : 0
Highest block SCN            : 84314727 (0.84314727)

对于这样问题,通过Oracle Recovery Tools实战批量坏块修复,实现数据库可以完美导出数据

ORA-600 2131故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 2131故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库启动报ORA-600 2131错误,查看alert日志发现是在mount过程报错

Fri May 17 20:58:28 2024
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 16
Number of processor cores in the system is 8
Number of processor sockets in the system is 1
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =249
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Windows NT Version V6.2  
CPU                 : 16 - type 8664, 8 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:93799M/97925M, Ph+PgF:78891M/112261M 
Using parameter settings in server-side spfile E:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEXFF.ORA
System parameters with non-default values:
  processes                = 1500
  sessions                 = 2272
  nls_language             = "SIMPLIFIED CHINESE"
  nls_territory            = "CHINA"
  sga_target               = 29440M
  control_files            = "E:\ORADATA\xff\CONTROL01.CTL"
  db_block_size            = 8192
  compatible               = "11.2.0.4.0"
  log_archive_dest_1       = "LOCATION=e:\app\archivelog\"
  log_archive_format       = "ARC%S_%R.%T"
  undo_tablespace          = "UNDOTBS2"
  sec_case_sensitive_logon = FALSE
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=xffXDB)"
  audit_file_dest          = "E:\APP\ADMINISTRATOR\ADMIN\xff\ADUMP"
  audit_trail              = "NONE"
  db_name                  = "xff"
  open_cursors             = 300
  pga_aggregate_target     = 9792M
  diagnostic_dest          = "E:\APP\ADMINISTRATOR"
Fri May 17 20:58:29 2024
PMON started with pid=2, OS id=6696 
Fri May 17 20:58:29 2024
PSP0 started with pid=3, OS id=2424 
Fri May 17 20:58:30 2024
VKTM started with pid=4, OS id=5472 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Fri May 17 20:58:30 2024
GEN0 started with pid=5, OS id=5764 
Fri May 17 20:58:30 2024
DIAG started with pid=6, OS id=372 
Fri May 17 20:58:30 2024
DBRM started with pid=7, OS id=2992 
Fri May 17 20:58:30 2024
DIA0 started with pid=8, OS id=4960 
Fri May 17 20:58:30 2024
MMAN started with pid=9, OS id=6036 
Fri May 17 20:58:30 2024
DBW0 started with pid=10, OS id=4724 
Fri May 17 20:58:30 2024
DBW1 started with pid=11, OS id=2652 
Fri May 17 20:58:30 2024
LGWR started with pid=12, OS id=5320 
Fri May 17 20:58:30 2024
CKPT started with pid=13, OS id=5732 
Fri May 17 20:58:30 2024
SMON started with pid=14, OS id=936 
Fri May 17 20:58:30 2024
RECO started with pid=15, OS id=2192 
Fri May 17 20:58:30 2024
MMON started with pid=16, OS id=5576 
Fri May 17 20:58:30 2024
MMNL started with pid=17, OS id=5748 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = E:\app\Administrator
Fri May 17 20:58:31 2024
ALTER DATABASE   MOUNT
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5452.trc  (incident=403399):
ORA-00600: ??????, ??: [2131], [9], [8], [], [], [], [], [], [], [], [], []
Incident details in: E:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_403399\xff_ora_5452_i403399.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE   MOUNT...

这个错误是由于controlfile损坏导致,有这个库以前部署过rman备份,解决起来比较简单,使用rman还原控制文件,并尝试recover

RMAN> restore controlfile from 'E:\rmanback\rmanfile\CTL_20240517_A62R067K_1_1.RMAN';

启动 restore 于 17-5月 -24
使用通道 ORA_DISK_1

通道 ORA_DISK_1: 正在还原控制文件
通道 ORA_DISK_1: 还原完成, 用时: 00:00:01
输出文件名=E:\ORADATA\XFF\CONTROL01.CTL
完成 restore 于 17-5月 -24

RMAN>

RMAN>

RMAN> alter database mount;

数据库已装载
释放的通道: ORA_DISK_1

RMAN> recover database;

启动 recover 于 17-5月 -24
分配的通道: ORA_DISK_1
通道 ORA_DISK_1: SID=996 设备类型=DISK

正在开始介质的恢复

线程 1 序列 4100 的归档日志已作为文件 E:\ORADATA\XFF\REDO02.LOG 存在于磁盘上
线程 1 序列 4101 的归档日志已作为文件 E:\ORADATA\XFF\REDO03.LOG 存在于磁盘上
线程 1 序列 4102 的归档日志已作为文件 E:\ORADATA\XFF\REDO01.LOG 存在于磁盘上
归档日志文件名=E:\APP\ARCHIVELOG\ARC0000004025_1165094245.0001 线程=1 序列=4025
归档日志文件名=E:\APP\ARCHIVELOG\ARC0000004026_1165094245.0001 线程=1 序列=4026
…………
归档日志文件名=E:\APP\ARCHIVELOG\ARC0000004099_1165094245.0001 线程=1 序列=4099
归档日志文件名=E:\ORADATA\XFF\REDO02.LOG 线程=1 序列=4100
归档日志文件名=E:\ORADATA\XFF\REDO03.LOG 线程=1 序列=4101
归档日志文件名=E:\ORADATA\XFF\REDO01.LOG 线程=1 序列=4102
介质恢复完成, 用时: 00:00:22
完成 recover 于 17-5月 -24

RMAN> exit


恢复管理器完成。

E:\oradata\XFF>

这种恢复情况下,如果现在要打开库,需要resetlogs方式,考虑通过创建ctl直接打开(不想用resetlogs)

SQL> shutdown immediate;
ORA-01109: 数据库未打开


已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup nomount;
ORACLE 例程已经启动。

Total System Global Area 3.0732E+10 bytes
Fixed Size                  2296264 bytes
Variable Size            3825206840 bytes
Database Buffers         2.6844E+10 bytes
Redo Buffers               61206528 bytes
SQL> CREATE CONTROLFILE REUSE DATABASE "XFF" NORESETLOGS  ARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 876
  7  LOGFILE
  8    GROUP 1 'E:\ORADATA\XFF\REDO01.LOG'  SIZE 50M BLOCKSIZE 512,
  9    GROUP 2 'E:\ORADATA\XFF\REDO02.LOG'  SIZE 50M BLOCKSIZE 512,
 10    GROUP 3 'E:\ORADATA\XFF\REDO03.LOG'  SIZE 50M BLOCKSIZE 512
 11  -- STANDBY LOGFILE
 12  DATAFILE
 13    'E:\ORADATA\XFF\SYSTEM01.DBF',
 14    'E:\ORADATA\XFF\SYSAUX01.DBF',
 15    'E:\ORADATA\XFF\USERS01.DBF',
 16    'E:\ORADATA\XFF\XFF_DATA01.DBF',
 17    'E:\ORADATA\XFF\XFF_INDEX01.DBF',
 18    'E:\ORADATA\XFF\UNDOTBS2.DBF'
 19  CHARACTER SET ZHS16GBK
 20  ;

控制文件已创建。

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

SQL> ALTER TABLESPACE TEMP ADD TEMPFILE 'E:\ORADATA\XFF\TEMP01.DBF' REUSE;

表空间已更改。

至此本次恢复晚上,由于arch,redo和数据文件没有损坏,恢复非常完美,参考以前类似说明:ORA-600 2131故障说明

rm -rf误删Oracle数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:rm -rf误删Oracle数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户把虚拟化环境中装有oracle数据库的linux操作系统,由于操作失误在/下面执行了rm -rf *,导致所有文件被删除,系统无法启动.客户希望要求恢复出其中的Oracle数据库.由于是虚拟化环境,然后客户直接从虚拟化平台下载下来磁盘文件,通过工具加载和分析确认是一个xfs的文件系统
20240516190746


使用工具进一步扫描分析,找到部分数据文件
20240516190505

这里可以获取到两个信息:
1. 尝试恢复oracle的control01.ctl文件,然后通过该文件尝试分析其他数据文件位置,运气不错该文件恢复出来是好的,直接加载到新库查询v$datafile,分析出来所有数据文件信息
2. 这里有一个非常不幸的信息,oracle最核心的system01.dbf文件大小明显异常,进一步分析该文件信息,结论是该文件无法通过反删除方式进行恢复
20240515223607

先把可以os层面可以恢复的数据恢复出来,并且检查坏块情况
20240516191836

对于异常的system文件,有两个处理方法:
1. 通过阅览被删除的文件,发现客户有5月14日1点左右的rman备份,通过恢复软件中完整度提示,大概率应该没有什么问题,但是分析发现部分归档日志损坏无法完整恢复
2. 通过对磁盘做碎片,恢复出来该数据文件,参考以往文章:
dbca删除库和rm删库恢复
Oracle 数据文件大小为0kb或者文件丢失恢复
通过这个方法运气不错,恢复出来该库的system01.dbf文件非常完整0丢失

[oracle@localhost oradata]$ dbv file=system01.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on Thu May 15 23:26:57 2024

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/oradata/system01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 199680
Total Pages Processed (Data) : 113988
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 26869
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 40253
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 18570
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 658228557 (0.658228557)

完成上述恢复工作之后,目前确认只有sysaux01.dbf有8026个block损坏,但是该表空间不涉及业务数据,尝试在新的系统中直接修改路径并open库

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-38760: This database instance failed to turn on flashback database


SQL> alter database flashback off;

Database altered.

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

运气不错,数据库直接open成功,现在处理sysaux01.dbf中的损坏文件:
1. 确认该文件具体坏块开始位置:
20240516193650


2. 由于坏块在文件中比较靠后,分析实际存储数据最后的位置

SQL> select max(block_id+blocks) from dba_extents where file_id=3;

MAX(BLOCK_ID+BLOCKS)
--------------------
             3493120

最后存储数据的位置小于坏块的位置,证明坏块部分是没有存储数据的,直接resize掉坏块部分

SQL> alter database datafile '/u01/oradata/sysaux01.dbf' resize 27290m;

Database altered.

然后dbv该数据文件,确认没有任何问题

[oracle@localhost trace]$ dbv file=/u01/oradata/sysaux01.dbf

DBVERIFY: Release 19.0.0.0.0 - Production on Wed May 15 22:43:00 2024

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u01/oradata/sysaux01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 3493120
Total Pages Processed (Data) : 1516833
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 1868832
Total Pages Failing   (Index): 0
Total Pages Processed (Lob)  : 56577
Total Pages Failing   (Lob)  : 0
Total Pages Processed (Other): 32107
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 18771
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 658223915 (0.658223915)

使用rman检测全库,也确定没有任何问题

[oracle@localhost trace]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed May 15 22:43:58 2024
Version 19.15.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: XIFENFEI (DBID=2912535091)

RMAN> 

RMAN> 

RMAN> backup validate check logical database skip inaccessible;

Starting backup at 15-MAY-24
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=43 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=278 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
………………
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
32   OK     0              6273         6400            370625094 
  File Name: /u01/oradata/xff_com.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0               
  Index      0              0               
  Other      0              127             

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
33   OK     0              163752       832000          627920639 
  File Name: /u01/oradata/XFF_DATA_202312231.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              374296          
  Index      0              285002          
  Other      0              8950            

Finished backup at 15-MAY-24

[oracle@localhost trace]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed May 15 22:47:44 2024
Version 19.15.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0

SQL> select * from v$database_block_corruption ;

no rows selected

SQL> 

至此对于这次rm -rf /*的故障实现了Oracle数据库完美恢复,数据0丢失.

分布式存储故障导致数据库无法启动故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:分布式存储故障导致数据库无法启动故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

国内xx医院使用了国外医疗行业龙头的pacs系统,由于是一个历史库,存放在分布式存储中,由于存储同时多个节点故障,导致数据库多个文件异常,数据库无法启动,三方维护人员尝试通通过rman归档进行应用日志,结果发现日志有损坏报ORA-00354 ORA-00353,无法记录恢复,希望我们给予支持
ORA-00353

Mon Apr 29 13:28:40 2024
Media Recovery failed with error 354
Mon Apr 29 13:28:40 2024
Errors in file F:\XXXXXX_DB\ORACLE\ADMIN\diag\rdbms\xxx\msxxx1\trace\xxx_pr00_4568.trc:
ORA-00283: recovery session canceled due to errors
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 487424 change 8737273868 time 04/01/2024 01:38:25
ORA-00334: archived log: 'F:\XXXXXX_DB\ORADATA\XXX\LOGARC0000052184_0922116268.0001'
ORA-283 signalled during: alter database recover logfile 'F:\XXXXXX_DB\ORADATA\XXX\LOGARC0000052184_0922116268.0001'...

接手故障之后,通过尝试恢复发现除该错误之外,还有ORA-600 4552之类错误
ORA-600-4552


跳过这些异常文件恢复,最终确认异常文件有如下部分
20240511223413

这些文件由于日志无法正常应用(有日志损坏无法应用,有日志和数据文件block不匹配导致无法应用),这样的情况直接通过自研的Oracle Recovery Tools小工具直接修改文件头信息
orarecovery

然后尝试OPEN数据库结果报ORA-1207
ORA-1207

对于这个故障可以通过rectl或者using backup ctl方式处理,然后open数据库成功
20240510224845

由于该系统是历史库,不会有新业务写入,通过对异常表和索引进行处理之后,客户测试业务可以正常访问,完成本次恢复

存储故障后oracle报—ORA-01122/ORA-01207故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:存储故障后oracle报—ORA-01122/ORA-01207故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户存储异常,通过硬件恢复解决存储故障之后,oracle数据库无法正常启动(存储cache丢失),尝试recover数据库报ORA-00283 ORA-01122 ORA-01110 ORA-01207错误
以前处理过比较类似的存储故障case:
又一起存储故障导致ORA-00333 ORA-00312恢复
存储故障,强制拉库报ORA-600 kcbzib_kcrsds_1处理

SQL> recover database until cancel;
ORA-00283: 恢复会话因错误而取消
ORA-01122: 数据库文件 536 验证失败
ORA-01110: 数据文件 536: '+DATA/orcl/dt_img_dat511.ora'
ORA-01207: 文件比控制文件更新 - 旧的控制文件
Sun May 05 00:09:03 2024
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Sun May 05 00:09:10 2024
SUCCESS: diskgroup FRA was mounted
Sun May 05 00:09:10 2024
NOTE: dependency between database orcl and diskgroup resource ora.FRA.dg is established
Sun May 05 00:09:14 2024
WARNING! Recovering data file 1 from a fuzzy backup. It might be an online
backup taken without entering the begin backup command.
Media Recovery failed with error 1122
Slave exiting with ORA-283 exception
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_pr00_8048.trc:
ORA-00283: 恢复会话因错误而取消
ORA-01122: 数据库文件 536 验证失败
ORA-01110: 数据文件 536: '+DATA/orcl/dt_img_dat511.ora'
ORA-01207: 文件比控制文件更新 - 旧的控制文件
Sun May 05 00:09:16 2024
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...

using backup controlfile进行恢复

SQL> recover database using backup controlfile until cancel;
ORA-00279: 更改 18646239951 (在 04/25/2024 17:14:50 生成) 对于线程 1 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003886.199435.1167240505
ORA-00280: 更改 18646239951 (用于线程 1) 在序列 #1003886 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00279: 更改 18646239951 (在 04/25/2024 17:11:40 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677876.199531.1167239807
ORA-00280: 更改 18646239951 (用于线程 2) 在序列 #677876 中


ORA-00279: 更改 18646255791 (在 04/25/2024 17:16:46 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677877.199472.1167240099
ORA-00280: 更改 18646255791 (用于线程 2) 在序列 #677877 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677876.199531.1167239807'


ORA-00279: 更改 18646295647 (在 04/25/2024 17:21:38 生成) 对于线程 2 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677878.199379.1167240623
ORA-00280: 更改 18646295647 (用于线程 2) 在序列 #677878 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_2_seq_677877.199472.1167240099'


ORA-00279: 更改 18646331784 (在 04/25/2024 17:28:25 生成) 对于线程 1 是必需的
ORA-00289: 建议:
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507
ORA-00280: 更改 18646331784 (用于线程 1) 在序列 #1003887 中
ORA-00278: 此恢复不再需要日志文件
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003886.199435.1167240505'


ORA-00308: 无法打开归档日志
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507'
ORA-17503: ksfdopn: 2 未能打开文件
+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507
ORA-15012: ASM file
'+FRA/orcl/archivelog/2024_04_25/thread_1_seq_1003887.199320.1167241507' does not exist


ORA-10879: error signaled in parallel recovery slave
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: '+DATA/orcl/system01.dbf'

通过分析,确认由于cache丢失导致thread_1_seq_1003887这个日志丢失(而且redo已经被覆盖)
20240506-2


20240506-1

数据库无法通过正常recover的思路解决.通过屏蔽一致性,强制打开数据库,alert日志报ORA-600 2662错误

Sat May 04 17:23:00 2024
Redo thread 2 internally disabled at seq 1 (CKPT)
ARC1: Archiving disabled thread 2 sequence 1
Archived Log entry 2 added for thread 2 sequence 1 ID 0x0 dest 1:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc  (incident=47066):
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc:
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_ora_3684.trc:
ORA-00600: ??????, ??: [2662], [4], [1466533588], [4], [1466584862], [12583040], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3684): terminating the instance due to error 600
Instance terminated by USER, pid = 3684
ORA-1092 signalled during: alter database open resetlogs...

通过修改数据库scn,open数据库报ORA-600 4137

Sun May 05 00:12:41 2024
replication_dependency_tracking turned off (no async multimaster replication found)
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Sun May 05 00:12:56 2024
Trace dumping is performing id=[cdmp_20240505001256]
Sun May 05 00:12:56 2024
ORACLE Instance orcl1 (pid = 22) - Error 600 encountered while recovering transaction (28, 21).
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_smon_5896.trc:
ORA-00600: ??????, ??: [4137], [28.21.42965783], [0], [0], [], [], [], [], [], [], [], []

这个错误比较明显,由于28号回滚段异常导致,对异常回滚段进行处理,重建undo,数据库恢复主要工作完成