记录一次pdb恢复过程中遇到的大量bug

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:记录一次pdb恢复过程中遇到的大量bug

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

12C版本使用pdb的Oracle数据库,由于在创建index的过程中强制终止,导致业务大量阻塞,然后重启数据库几次之后直接crash,最后直接无法open成功,报ORA-00600 6856

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 22, block # 993741)
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10564: tablespace CWBASEMS01
ORA-01110: data file 22:
'+DATA/XIFENFEI/96D12F7BA1E2CE57E0532506060A4A2D/DATAFILE/xff01.309.1050943869'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 76286
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [],[], [], [], [], []
2020-11-14T15:56:50.736722+08:00
Started redo scan
2020-11-14T15:56:50.977023+08:00
Completed redo scan
 read 82825 KB redo, 10769 data blocks need recovery
2020-11-14T15:56:51.147256+08:00
Started redo application at
 Thread 1: logseq 120309, block 48, offset 0
 Thread 2: logseq 74989, block 2, offset 16, scn 0x00000000f69e1f8d
2020-11-14T15:56:51.151007+08:00
Recovery of Online Redo Log: Thread 1 Group 1 Seq 120309 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_1.262.1023806467
2020-11-14T15:56:51.153989+08:00
Recovery of Online Redo Log: Thread 2 Group 7 Seq 74989 Reading mem 0
  Mem# 0: +DATA/XIFENFEI/ONLINELOG/group_7.274.1023806785
Errors in file /u01/app/oracle/……/xifenfei1/trace/xifenfei1_p00d_469777.trc(incident=10079552)(PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [6856], [0], [12], [], [], [], [], [], [], [], [], []
2020-11-14T15:56:52.089726+08:00
(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

这个错误比较明显是由于ORA-600 6856错误导致数据库在启动的时候无法进行实例恢复,出现这个错误原因是由于客户创建index的过程中强制终止引起Bug 17437634 – ORA-1578 or ORA-600 [6856] transient in-memory corruption on TEMP segment during transaction recovery / ROLLBACK (eg: after Ctrl-C) – superseded (Doc ID 17437634.8),屏蔽该文件实例恢复,cdb启动成功,但是pdb无法正常open

SQL> alter session set container=pdb1;
 
Session altered.

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check:fno_system], [3], [], []
Process ID: 224476
Session ID: 13311 Serial number: 59525

这个错误比较明显是由于ORA-600 kcffo_online_pdb_check:fno_system,数据库未正常检测到pdb的system文件导致该问题,通过对pdb的system文件进行操作,让数据库识别到该文件,然后继续open库

SQL> alter database open;
alter database open 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt],[3],[4138003978],[4289274940], [2]
Process ID: 224476
Session ID: 13311 Serial number: 59525

该错误是由于数据库在恢复的过程中推了scn,触发了oracle 某种bug导致该问题,通过一些操作之后,数据库可以open,尝试temp表空间增加临时数据文件报ORA-00600 [kcffo_add_tmpf-1] 错误(Bug 29379978 – ORA-00600 [kcffo_add_tmpf-1] when trying to add temp file (Doc ID 29379978.8)).由于该文件无法加入,数据库无法导出
20201115115951


最后没有办法换了思路直接bbed修改文件头,open cdb库,然后open pdb,顺利导出数据.这次的恢复中,深刻的体验到pdb在open过程中的各种bug,实在比较厌烦.

dul 12.2完美支持Oracle 19c恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:dul 12.2完美支持Oracle 19c恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

Oracle dul 12.2完美支持12c+版本(实测19c恢复正常)

[root@ora11g dul]# ./dul

Data UnLoader: 12.2.0.0.0 Beta - Internal Only - on Wed Nov 11 00:00:39 2020
with 64-bit io functions and the decompression option

Copyright (c) 1994 2019 Bernard van Duijnen All rights reserved.

 Strictly Oracle Internal Use Only


Found db_id = 2193655303
Found db_name = UORA19C
DUL> bootstrap;
Probing file = 1, block = 520
. unloading table                BOOTSTRAP$
DUL: Warning: block number is non zero but marked deferred trying to process it anyhow
      60 rows unloaded
Reading BOOTSTRAP.dat 60 entries loaded
Parsing Bootstrap$ contents
DUL: Warning: Recreating file "dict.ddl"
Generating dict.ddl for version 12
 OBJ$: segobjno 18, file 1 block 240
 TAB$: segobjno 2, tabno 1, file 1  block 144
 COL$: segobjno 2, tabno 5, file 1  block 144
 USER$: segobjno 10, tabno 1, file 1  block 208
Running generated file "@dict.ddl" to unload the dictionary tables
. unloading table                      OBJ$   72485 rows unloaded
. unloading table                      TAB$    2243 rows unloaded
. unloading table                      COL$  123798 rows unloaded
. unloading table                     USER$     127 rows unloaded
Reading USER.dat 127 entries loaded
Reading OBJ.dat 72485 entries loaded and sorted 72485 entries
Reading TAB.dat 2243 entries loaded
Reading COL.dat 123798 entries loaded and sorted 123798 entries
Reading BOOTSTRAP.dat 60 entries loaded

DUL: Warning: Recreating file "dict.ddl"
Generating dict.ddl for version 12
 OBJ$: segobjno 18, file 1 block 240
 TAB$: segobjno 2, tabno 1, file 1  block 144
 COL$: segobjno 2, tabno 5, file 1  block 144
 USER$: segobjno 10, tabno 1, file 1  block 208
 TABPART$: segobjno 814, file 1 block 5424
 INDPART$: segobjno 819, file 1 block 5464
 TABCOMPART$: segobjno 836, file 1 block 5600
 INDCOMPART$: segobjno 841, file 1 block 5640
 TABSUBPART$: segobjno 826, file 1 block 5520
 INDSUBPART$: segobjno 831, file 1 block 5560
 IND$: segobjno 2, tabno 3, file 1  block 144
 ICOL$: segobjno 2, tabno 4, file 1  block 144
 LOB$: segobjno 2, tabno 6, file 1  block 144
 COLTYPE$: segobjno 2, tabno 7, file 1  block 144
 TYPE$: segobjno 740, tabno 1, file 1  block 4880
 COLLECTION$: segobjno 740, tabno 2, file 1  block 4880
 ATTRIBUTE$: segobjno 740, tabno 3, file 1  block 4880
 LOBFRAG$: segobjno 847, file 1 block 5688
 LOBCOMPPART$: segobjno 850, file 1 block 5720
 UNDO$: segobjno 15, file 1 block 224
 TS$: segobjno 6, tabno 2, file 1  block 176
 PROPS$: segobjno 127, file 1 block 1320
Running generated file "@dict.ddl" to unload the dictionary tables
. unloading table                      OBJ$
DUL: Warning: Recreating file "OBJ.ctl"
   72485 rows unloaded
. unloading table                      TAB$
DUL: Warning: Recreating file "TAB.ctl"
    2243 rows unloaded
. unloading table                      COL$
DUL: Warning: Recreating file "COL.ctl"
  123798 rows unloaded
. unloading table                     USER$
DUL: Warning: Recreating file "USER.ctl"
     127 rows unloaded
. unloading table                  TABPART$     304 rows unloaded
. unloading table                  INDPART$     204 rows unloaded
. unloading table               TABCOMPART$       1 row  unloaded
. unloading table               INDCOMPART$       0 rows unloaded
. unloading table               TABSUBPART$      32 rows unloaded
. unloading table               INDSUBPART$       0 rows unloaded
. unloading table                      IND$    2922 rows unloaded
. unloading table                     ICOL$    4996 rows unloaded
. unloading table                      LOB$     702 rows unloaded
. unloading table                  COLTYPE$    3035 rows unloaded
. unloading table                     TYPE$    5889 rows unloaded
. unloading table               COLLECTION$    1385 rows unloaded
. unloading table                ATTRIBUTE$   15376 rows unloaded
. unloading table                  LOBFRAG$      15 rows unloaded
. unloading table              LOBCOMPPART$       0 rows unloaded
. unloading table                     UNDO$      21 rows unloaded
. unloading table                       TS$      21 rows unloaded
. unloading table                    PROPS$      42 rows unloaded
Reading USER.dat 127 entries loaded
Reading OBJ.dat 72485 entries loaded and sorted 72485 entries
Reading TAB.dat 2243 entries loaded
Reading COL.dat 123798 entries loaded and sorted 123798 entries
Reading TABPART.dat 304 entries loaded and sorted 304 entries
Reading TABCOMPART.dat 1 entries loaded and sorted 1 entries
Reading TABSUBPART.dat 32 entries loaded and sorted 32 entries
Reading INDPART.dat 204 entries loaded and sorted 204 entries
Reading INDCOMPART.dat 0 entries loaded and sorted 0 entries
Reading INDSUBPART.dat 0 entries loaded and sorted 0 entries
Reading IND.dat 2922 entries loaded
Reading LOB.dat 702 entries loaded
Reading ICOL.dat 4996 entries loaded
Reading COLTYPE.dat 3035 entries loaded
Reading TYPE.dat
DUL: Notice: Increased the size of DC_TYPES from 4096 to 32768 entries
 5889 entries loaded
Reading ATTRIBUTE.dat 15376 entries loaded
Reading COLLECTION.dat
DUL: Notice: Increased the size of DC_COLLECTIONS from 1024 to 8192 entries
 1385 entries loaded
Reading BOOTSTRAP.dat 60 entries loaded
Reading LOBFRAG.dat 15 entries loaded and sorted 15 entries
Reading LOBCOMPPART.dat 0 entries loaded and sorted 0 entries
Reading UNDO.dat 21 entries loaded
Reading TS.dat 21 entries loaded
Reading PROPS.dat 42 entries loaded
Database character set is AL32UTF8
Database national character set is AL16UTF16

实际测试一张表恢复

DUL> unload table sys.props$;
. unloading table                    PROPS$
DUL: Warning: Recreating file "SYS_PROPS.ctl"
      42 rows unloaded

20201111224237


因尝试恢复未破坏数据库实现数据0丢失恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:因尝试恢复未破坏数据库实现数据0丢失恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

早上接手一个被人折腾了一个晚上的库,通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)收集信息发现有8个数据文件WRONG FILE TYPE
20201110220859


分析最初日志发现客户做了多次offline数据文件和resetlogs操作,运气不错resetlogs操作都没有成功,现场没有被进一步破坏
20201110221430
20201110221400

进一步检查发现客户有11月9日的备份以及全部的归档,直接执行

run{
set newname for datafile  31 to '+ARCH';
set newname for datafile  19 to '+ARCH';
set newname for datafile  20 to '+ARCH';
set newname for datafile  35 to '+ARCH';
set newname for datafile  52 to '+ARCH';
set newname for datafile  16 to '+ARCH';
set newname for datafile  30 to '+ARCH';
set newname for datafile  28 to '+ARCH';
restore datafile 31,19,20,35,52,16,30,28;
switch datafile all;
}

然后catalog注册日志,继续恢复,实现数据0丢失,数据库打开
20201110222050


庆幸夜间的所有误操作没有产生任何实质性破坏,不然就算有备份,后果也是比较麻烦的(客户的设备io特别慢,还原8个数据文件使用了近7个小时,全库还原只能呵呵……)

因为人工误操作导致resetlogs scn不一致恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:因为人工误操作导致resetlogs scn不一致恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有个朋友,数据库损坏有一段时间,前期找过我们,对其进行分析,判断强制打开库应该就可以了;后来不知道什么原因有几批人进行了恢复,把现场完全破坏,检查数据库发现,数据库文件有部分resetlogs scn不一致
1


通过工具(Oracle恢复小工具—Oracle Recovery Tools)对文件相关值进行修改,并尝试打开库
20201108133342

SQL> recover database until cancel;
ORA-00279: change 15764982275935 generated at 11/06/2020 00:27:23 needed for thread 1
ORA-00289: suggestion : /jiaoh/oradata/orcl/archlog/arch_1_5_995131008.log
ORA-00280: change 15764982275935 for thread 1 is in sequence #5

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 3 needs more recovery to be consistent
ORA-01110: data file 3: '/u01/oracle/oradata/orcl/undotbs01.dbf'


ORA-01112: media recovery not started


SQL> alter database open resetlogs;

Database altered.

数据库alert日志报ORA-600 4097错误

un Nov 08 12:17:06 2020
Errors in file /u01/oracle/diag/rdbms/orcl/orcl/trace/orcl_j006_135739.trc  (incident=2604819):
ORA-00600: internal error code, arguments: [4097], [31], [2], [338312], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl/incident/incdir_2604819/orcl_j006_135739_i2604819.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

对undo进行处理,数据库后台无明显报错,建议客户逻辑导出数据导入新库,恢复完成.

ORA-15096: lost disk write detected

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-15096: lost disk write detected

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一例由于存储掉电导致asm磁盘组,由于ORA-15096: lost disk write detected,导致无法mount的恢复请求

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */
NOTE: cache registered group DATA number=2 incarn=0x73886b6a
NOTE: cache began mount (first) of group DATA number=2 incarn=0x73886b6a
NOTE: Assigning number (2,2) to disk (/dev/asm-data3)
NOTE: Assigning number (2,1) to disk (/dev/asm-data2)
NOTE: Assigning number (2,0) to disk (/dev/asm-data1)
Fri Nov 06 19:06:56 2020
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 94 for pid 30, osid 11596
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/asm-data1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/asm-data2
NOTE: cache opening disk 2 of grp 2: DATA_0002 path:/dev/asm-data3
NOTE: cache mounting (first) external redundancy group 2/0x73886B6A (DATA)
Fri Nov 06 19:06:57 2020
* allocate domain 2, invalid = TRUE
kjbdomatt send to inst 2
Fri Nov 06 19:06:57 2020
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=25.7986 group=2 (DATA)
NOTE: starting recovery of thread=2 ckpt=33.364 group=2 (DATA)
NOTE: BWR validation signaled ORA-15096
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11596.trc:
ORA-15096: lost disk write detected
NOTE: crash recovery signalled OER-15096
ERROR: ORA-15096 signalled during mount of diskgroup DATA
NOTE: cache dismounting (clean) group 2/0x73886B6A (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11596, image: oracle@db1 (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to inst 2
detach from dom 2, sending detach message to inst 2
freeing rdom 2
NOTE: detached from domain 2
NOTE: cache dismounted group 2/0x73886B6A (DATA)
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x73886b6a
NOTE: cache deleting context for group DATA 2/0x73886b6a
GMON dismounting group 2 at 95 for pid 30, osid 11596
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15096: lost disk write detected
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:45277:148} */

通过判断,通过一系列处理之后,数据库进行了mount操作发现报错ORA-600 2130

Fri Nov 06 17:03:27 2020
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 40 slaves
Fri Nov 06 17:03:29 2020
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_pr00_7393.trc  (incident=195869):
ORA-00600: internal error code, arguments: [2130], [2], [1], [2], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_195869/ynhis1_pr00_7393_i195869.trc
Fri Nov 06 17:03:30 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-10877 signalled during: ALTER DATABASE RECOVER  database  ...

判断redo异常,通过resetlogs打开库,发现报错ORA-00600 2662

Fri Nov 06 18:21:32 2020
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 8670753264
Resetting resetlogs activation ID 306909514 (0x124b114a)
Redo thread 2 enabled by open resetlogs or standby activation
Fri Nov 06 18:21:39 2020
Setting recovery target incarnation to 2
Initializing SCN for created control file
Database SCN compatibility initialized to 3
Warning - High Database SCN: Current SCN value is 8670753267, threshold SCN value is 0
Fri Nov 06 18:21:39 2020
Assigning activation ID 408224320 (0x18550240)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /orabak/data/group_1.289.954514319
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Nov 06 18:21:40 2020
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc  (incident=231847):
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365], [4194545], [], [], [], [], [],[]
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_231847/ynhis1_ora_24310_i231847.trc
Fri Nov 06 18:21:42 2020
Dumping diagnostic data in directory=[cdmp_20201106182142],requested by(instance=1,osid=24310),summary=[incident=231847]
Fri Nov 06 18:21:43 2020
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_ora_24310.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [2], [80818679], [2], [93545365],[4194545],[],[],[],[],[],[]
Error 704 happened during db open, shutting down database
USER (ospid: 24310): terminating the instance due to error 704
Instance terminated by USER, pid = 24310
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (24310) as a result of ORA-1092

处理该错误之后,数据库resetlog之后,数据库open成功但是报错ORA-00600 4137

Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc  (incident=255799):
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/ynhis/ynhis1/incident/incdir_255799/ynhis1_smon_26195_i255799.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
Fri Nov 06 18:30:46 2020
replication_dependency_tracking turned off (no async multimaster replication found)
ORACLE Instance ynhis1 (pid = 23) - Error 600 encountered while recovering transaction (25, 33).
Errors in file /u01/app/oracle/diag/rdbms/ynhis/ynhis1/trace/ynhis1_smon_26195.trc:
ORA-00600: internal error code, arguments: [4137], [25.33.122556], [0], [0], [], [], [], [], [], [], [], []

对异常undo进行处理,数据库可以正常启动关闭,然后安排数据导出导入新库操作,恢复完成.