记录一次pdb恢复过程中遇到的大量bug

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次pdb恢复过程中遇到的大量bug

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

12C版本使用pdb的Oracle数据库,由于在创建index的过程中强制终止,导致业务大量阻塞,然后重启数据库几次之后直接crash,最后直接无法open成功,报ORA-00600 6856

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 22, block # 993741)
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10564: tablespace CWBASEMS01
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 76286
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [],[], [], [], [], []
2020-11-14T15:56:50.736722+08:00
Started redo scan
2020-11-14T15:56:50.977023+08:00
Completed redo scan
 read 82825 KB redo, 10769 data blocks need recovery
2020-11-14T15:56:51.147256+08:00
Started redo application at
 Thread 1: logseq 120309, block 48, offset 0
 Thread 2: logseq 74989, block 2, offset 16, scn 0x00000000f69e1f8d
2020-11-14T15:56:51.151007+08:00
Recovery of Online Redo Log: Thread 1 Group 1 Seq 120309 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_1.262.1023806467
2020-11-14T15:56:51.153989+08:00
Recovery of Online Redo Log: Thread 2 Group 7 Seq 74989 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_7.274.1023806785
Errors in file /u01/app/oracle/……/xifenfei1/trace/xifenfei1_p00d_469777.trc(incident=10079552)(PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [], [], [], [], [], []
2020-11-14T15:56:52.089726+08:00
(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

这个错误比较明显是由于ORA-600 6856错误导致数据库在启动的时候无法进行实例恢复,出现这个错误原因是由于客户创建index的过程中强制终止引起Bug 17437634 – ORA-1578 or ORA-600 [6856] transient in-memory corruption on TEMP segment during transaction recovery / ROLLBACK (eg: after Ctrl-C) – superseded (Doc ID 17437634.8),屏蔽该文件实例恢复,cdb启动成功,但是pdb无法正常open

SQL> alter session set container=pdb1;
 
Session altered.

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check:fno_system], [3], [], []
Process ID: 224476
Session ID: 13311 Serial number: 59525

这个错误比较明显是由于ORA-600 kcffo_online_pdb_check:fno_system,数据库未正常检测到pdb的system文件导致该问题,通过对pdb的system文件进行操作,让数据库识别到该文件,然后继续open库

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt],[3],[4138003978],[4289274940], [2]
Process ID: 224476
Session ID: 13311 Serial number: 59525

该错误是由于数据库在恢复的过程中推了scn,触发了oracle 某种bug导致该问题,通过一些操作之后,数据库可以open,尝试temp表空间增加临时数据文件报ORA-00600 [kcffo_add_tmpf-1] 错误(Bug 29379978 – ORA-00600 [kcffo_add_tmpf-1] when trying to add temp file (Doc ID 29379978.8)).由于该文件无法加入,数据库无法导出
20201115115951


最后没有办法换了思路直接bbed修改文件头,open cdb库,然后open pdb,顺利导出数据.这次的恢复中,深刻的体验到pdb在open过程中的各种bug,实在比较厌烦.