记录一次200T的数据库恢复经历

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次200T的数据库恢复经历

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个客户恢复请求,6个节点11.2.0.3 RAC,非归档模式,数据量近200T
df_size


由于存储掉电导致数据库6个节点全部宕机,恢复硬件之后,数据库无法正常启动,报错如下:

SQL> recover database;
ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for
thread 4
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0/db_1/dbs/arch4_322810_870181839.dbf
ORA-00280: change 318472018583 for thread 4 is in sequence #322810

Wed Aug 28 11:19:55 2019
ALTER DATABASE RECOVER  DATABAE 
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 14 Seq 552 Reading mem 0
  Mem# 0: +REDO/xff/log2.ora
Recovery of Online Redo Log: Thread 2 Group 15 Seq 126 Reading mem 0
  Mem# 0: +REDO/xff/log3.ora
Recovery of Online Redo Log: Thread 3 Group 18 Seq 122 Reading mem 0
  Mem# 0: +REDO/xff/log6.ora
ORA-279 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Aug 28 11:21:31 2019
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 

数据库恢复需要thread 4 sequence #322810,查询redo信息
redo


redo已经被覆盖,数据库无法通过正常途径恢复实现数据库open,尝试屏蔽一致性强制拉库操作后

Wed Aug 28 12:40:15 2019
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc  (incident=244209):
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244209/xff1_smon_51338_i244209.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Wed Aug 28 12:40:16 2019
ORACLE Instance xff1 (pid = 26) - Error 600 encountered while recovering transaction (44, 47).
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc:
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Wed Aug 28 12:40:20 2019
Exception[type: SIGSEGV,Address not mapped to object][ADDR:0x5122000000C8][PC:0xE1B4D3,ktugru()+87][flags:0x0,count:1]
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p086_54066.trc  (incident=245017):
ORA-07445:exception encountered:core dump [ktugru()+87][SIGSEGV][ADDR:0x5122000000C8][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_245017/xff1_p086_54066_i245017.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Aug 28 12:40:20 2019
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p000_53873.trc  (incident=244305):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244305/xff1_p000_53873_i244305.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

提示undo异常,屏蔽回滚段之后,数据库正常打开没有任何报错信息

Wed Aug 28 12:57:15 2019
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[57676] Successfully onlined Undo Tablespace 22.
Undo initialization finished serial:0 start:2386111306 end:2386112316 diff:1010 (10 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Wed Aug 28 12:57:17 2019
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:57624 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
No Resource Manager plan active
Starting background process GTX0
Wed Aug 28 12:57:18 2019
GTX0 started with pid=45, OS id=57777 
Starting background process RCBG
Wed Aug 28 12:57:18 2019
RCBG started with pid=46, OS id=57779 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Aug 28 12:57:19 2019
QMNC started with pid=47, OS id=57788 
Completed: ALTER DATABASE OPEN

后续涉及创建新undo,删除老undo并处理一些类似,基本上恢复正常
OPEN