一次完美的asm disk被格式化ntfs恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:一次完美的asm disk被格式化ntfs恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又一起win rac的asm disk被格式化为ntfs,导致数据库异常恢复的请求,客户描述有一个500G的data磁盘组(只有一个磁盘,被误操作进行了格式化).格式化asm disk之后
asm的alert日志报错

Tue Jul 09 17:09:45 2019
NOTE: ASM client orcl1:ORCL disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file D:\APP\ADMINISTRATOR\diag\asm\+asm\+asm1\trace\+asm1_ora_5376.trc
Tue Jul 09 17:10:19 2019
Errors in file D:\APP\ADMINISTRATOR\diag\asm\+asm\+asm1\trace\+asm1_lgwr_1448.trc:
ORA-27070: async read/write failed
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。
WARNING: Write Failed. group:1 disk:0 AU:15 offset:876544 size:4096
NOTE: unable to write any mirror side for diskgroup DATA
NOTE: cache initiating offline of disk 0 group DATA
NOTE: process _lgwr_+asm1 (3764:1448) initiating offline of disk 0.4042281525 (DATA_0000) with mask 0x7e in group 1
Tue Jul 09 17:10:19 2019
WARNING: Disk 0 (DATA_0000) in group 1 mode 0x7f is now being offlined
WARNING: Disk 0 (DATA_0000) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 0/0xf0f05235, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 10 for pid 15, osid 1448
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
WARNING: Disk 0 (DATA_0000) in group 1 mode 0x7f offline is being aborted
WARNING: Offline of disk 0 (DATA_0000) in group 1 and mode 0x7f failed on ASM inst 1
Tue Jul 09 17:10:20 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
NOTE: unable to offline disks after getting write error for diskgroup DATA
Tue Jul 09 17:10:20 2019
NOTE: cache dismounting (not clean) group 1/0xBDB0A2C0 (DATA) 
NOTE: disk 0 had IO error
NOTE: messaging CKPT to quiesce pins Windows thread id: 520528, image: ORACLE.EXE (B000)
Tue Jul 09 17:10:20 2019
NOTE: Deferred communication with ASM instance
Errors in file D:\APP\ADMINISTRATOR\diag\asm\+asm\+asm1\trace\+asm1_ora_6140.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 4
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=38.3028 last written ABA 38.3029

数据库的alert日志报错

Tue Jul 09 17:09:12 2019
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc:
ORA-27072: 文件 I/O 错误
WARNING: IO Failed. group:1 disk(number.incarnation):0.0xf0f05235 disk_path:\\.\ORCLDISKDATA0
	 AU:1305 disk_offset(bytes):1368456704 io_size:512 operation:Write type:asynchronous
	 result:I/O error process_id:5396
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc:
ORA-15080: 与磁盘的同步 I/O 操作失败
WARNING: failed to write mirror side 1 of virtual extent 23 logical extent 0 
   of file 261 in group 1 on disk 0 allocation unit 1305 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc:
ORA-00345: 重做日志写入块 47231 计数 1 出错
ORA-00312: 联机日志 1 线程 1: '+DATA/orcl/onlinelog/group_1.261.909498607'
ORA-15081: 无法将 I/O 操作提交到磁盘
ORA-15081: 无法将 I/O 操作提交到磁盘
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl1\trace\orcl1_lgwr_5396.trc:
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 21) 设备未就绪。

机器上查看磁盘信息
format_ntfs2
format_ntfs


这个客户比较幸运,asm disk被格式化之后,没有进行任何写操作,理解对现场进行了保护,没有任何的二次破坏.因为处理过多起类似故障(oracle asm disk格式化恢复—格式化为ntfs文件系统,oracle asm disk格式化恢复—格式化为ext4文件系统又一例asm格式化文件系统恢复),有一定的经验,一般三种方案恢复数据文件:
1)运气好直接通过kfed进行修复asm disk,然后mount起来,然后把数据文件拷贝到文件系统中
2)通过相关工具把asm disk中相关的数据文件拷贝到文件系统中
3)如果损坏的严重,通过底层碎片,把相关数据文件恢复到文件系统中
拷贝完成数据文件之后,然后根据文件的情况有几种可能性:
1)直接open数据库,处理可能的其他坏块
2)通过一些方法强制拉库,然后对其进行导出导入新库
3)通过工具直接恢复表数据,甚至恢复部分核心数据
这次的恢复运气不错,对asm disk进行一系列修复之后,asm 磁盘组mount成功

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1            80000M
NTFS                             \Device\Harddisk0\Partition2           491133M
ORCLDISKOCR0                     \Device\Harddisk1\Partition1            34132M
ORCLDISKOCR1                     \Device\Harddisk1\Partition2            34132M
ORCLDISKOCR2                     \Device\Harddisk1\Partition3            34133M
ORCLDISKDATA0                    \Device\Harddisk2\Partition1           511997M
ORCLDISKFRA0                     \Device\Harddisk3\Partition2          1225000M
NTFS                             \Device\Harddisk3\Partition3          1225800M
                                 \Device\Harddisk3\Partition4          1314861M
C:\Users\Administrator>sqlplus / as sysasm

SQL*Plus: Release 11.2.0.1.0 Production on 星期六 7月 13 11:20:02 2019

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64
bit Production
With the Real Application Clusters and Automatic Storage Management options

mount数据库拷贝文件

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期六 7月 13 16:02:08 2019

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup  mount;
ORACLE 例程已经启动。

Total System Global Area 3.4206E+10 bytes
Fixed Size                  2192864 bytes
Variable Size            7516195360 bytes
Database Buffers         2.6642E+10 bytes
Redo Buffers               45727744 bytes
数据库已装载

RMAN>  backup as copy   database format 'I:\rmanback\df_%U.dbf';

启动 backup 于 13-7月 -19
使用通道 ORA_DISK_1
使用通道 ORA_DISK_2
使用通道 ORA_DISK_3
使用通道 ORA_DISK_4
通道 ORA_DISK_1: 启动数据文件副本
输入数据文件: 文件号=00006 名称=+DATA/orcl/datafile/XFF5.dmp
通道 ORA_DISK_2: 启动数据文件副本
输入数据文件: 文件号=00016 名称=+DATA/orcl/datafile/XFF53
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00017 名称=+DATA/orcl/datafile/XFF54
通道 ORA_DISK_4: 启动数据文件副本
输入数据文件: 文件号=00018 名称=+DATA/orcl/datafile/XFF55
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-16_0VU6I1FO.DBF
标记=TAG20190713T120927 RECID=2 STAMP=1013517954
通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:36:29
通道 ORA_DISK_2: 启动数据文件副本
输入数据文件: 文件号=00019 名称=+DATA/orcl/datafile/XFF5.281.988042407
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-17_10U6I1FO.DBF
标记=TAG20190713T120927 RECID=1 STAMP=1013517951
通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:36:29
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00020 名称=+DATA/orcl/datafile/XFF5.282.988042943
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-18_11U6I1FO.DBF
标记=TAG20190713T120927 RECID=3 STAMP=1013517962
通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:36:37
通道 ORA_DISK_4: 启动数据文件副本
输入数据文件: 文件号=00021 名称=+DATA/orcl/datafile/XFF5.283.988043279
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-6_0UU6I1FO.DBF
标记=TAG20190713T120927 RECID=4 STAMP=1013518052
通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:38:12
通道 ORA_DISK_1: 启动数据文件副本
输入数据文件: 文件号=00022 名称=+DATA/orcl/datafile/XFF5.284.988043721
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-20_13U6I3K5.DBF
标记=TAG20190713T120927 RECID=5 STAMP=1013519584
通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:27:10
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00009 名称=+DATA/orcl/datafile/XFF51.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-21_14U6I3KD.DBF
标记=TAG20190713T120927 RECID=6 STAMP=1013519586
通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:27:03
通道 ORA_DISK_4: 启动数据文件副本
输入数据文件: 文件号=00010 名称=+DATA/orcl/datafile/XFF52.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-19_12U6I3K5.DBF
标记=TAG20190713T120927 RECID=7 STAMP=1013519599
通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:27:26
通道 ORA_DISK_2: 启动数据文件副本
输入数据文件: 文件号=00003 名称=+DATA/orcl/datafile/undotbs1.258.909498475
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-22_15U6I3NC.DBF
标记=TAG20190713T120927 RECID=8 STAMP=1013519695
通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:27:19
通道 ORA_DISK_1: 启动数据文件副本
输入数据文件: 文件号=00005 名称=+DATA/orcl/datafile/undotbs2.264.909498739
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-UNDOTBS1_FNO-3_18U6I57K.DB
F 标记=TAG20190713T120927 RECID=9 STAMP=1013520920
通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:22:02
通道 ORA_DISK_2: 启动数据文件副本
输入数据文件: 文件号=00012 名称=+DATA/orcl/datafile/XFF5.275.954036129
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-UNDOTBS2_FNO-5_19U6I5AJ.DB
F 标记=TAG20190713T120927 RECID=10 STAMP=1013521015
通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:22:03
通道 ORA_DISK_1: 启动数据文件副本
输入数据文件: 文件号=00013 名称=+DATA/orcl/datafile/XFF5.276.954036233
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-10_17U6I574.DBF
标记=TAG20190713T120927 RECID=11 STAMP=1013521231
通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:27:30
通道 ORA_DISK_4: 启动数据文件副本
输入数据文件: 文件号=00002 名称=+DATA/orcl/datafile/sysaux.257.909498475
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-9_16U6I574.DBF
标记=TAG20190713T120927 RECID=12 STAMP=1013521244
通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:27:37
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00023 名称=+DATA/orcl/datafile/XFF60.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-12_1AU6I6GU.DBF
标记=TAG20190713T120927 RECID=13 STAMP=1013522267
通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:22:25
通道 ORA_DISK_2: 启动数据文件副本
输入数据文件: 文件号=00024 名称=+DATA/orcl/datafile/XFF61.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-13_1BU6I6JU.DBF
标记=TAG20190713T120927 RECID=15 STAMP=1013522367
通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:22:26
通道 ORA_DISK_1: 启动数据文件副本
输入数据文件: 文件号=00011 名称=+DATA/orcl/datafile/system.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-SYSAUX_FNO-2_1CU6I6QM.DBF
标记=TAG20190713T120927 RECID=14 STAMP=1013522361
通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:18:50
通道 ORA_DISK_4: 启动数据文件副本
输入数据文件: 文件号=00001 名称=+DATA/orcl/datafile/system.256.909498475
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-23_1DU6I6QT.DBF
标记=TAG20190713T120927 RECID=16 STAMP=1013522400
通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:19:19
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00025 名称=+DATA/orcl/datafile/XFF62.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-25_1HU6I7V4.DBF
标记=TAG20190713T120927 RECID=17 STAMP=1013522490
通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:01:35
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00026 名称=+DATA/orcl/datafile/XFF63.dmp
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-SYSTEM_FNO-1_1GU6I7U0.DBF
标记=TAG20190713T120927 RECID=18 STAMP=1013522526
通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:02:47
通道 ORA_DISK_4: 启动数据文件副本
输入数据文件: 文件号=00007 名称=+DATA/orcl/datafile/precise.dbf
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-SYSTEM_FNO-11_1FU6I7U0.DBF
 标记=TAG20190713T120927 RECID=20 STAMP=1013522576
通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:03:33
通道 ORA_DISK_1: 启动数据文件副本
输入数据文件: 文件号=00004 名称=+DATA/orcl/datafile/users.259.909498477
RMAN-03009: backup 命令 (ORA_DISK_3 通道上, 在 07/13/2019 14:03:01 上) 失败
ORA-19566: 超出损坏块限制 0 (文件 +DATA/orcl/datafile/XFF63.dmp)
继续执行其他作业步骤, 将不重新运行失败的作业
通道 ORA_DISK_3: 启动数据文件副本
输入数据文件: 文件号=00014 名称=+DATA/orcl/datafile/test.280.972807149
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-PRECISE_FNO-7_1JU6I837.DBF
 标记=TAG20190713T120927 RECID=19 STAMP=1013522572
通道 ORA_DISK_4: 数据文件复制完毕, 经过时间: 00:00:46
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-USERS_FNO-4_1KU6I84L.DBF
标记=TAG20190713T120927 RECID=21 STAMP=1013522591
通道 ORA_DISK_1: 数据文件复制完毕, 经过时间: 00:00:15
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-TEST_FNO-14_1LU6I84L.DBF
标记=TAG20190713T120927 RECID=22 STAMP=1013522591
通道 ORA_DISK_3: 数据文件复制完毕, 经过时间: 00:00:15
输出文件名=I:\RMANBACK\DF_DATA_D-ORCL_I-1437279340_TS-XFF5_FNO-24_1EU6I7R0.DBF
标记=TAG20190713T120927 RECID=23 STAMP=1013522974
通道 ORA_DISK_2: 数据文件复制完毕, 经过时间: 00:11:44
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: backup 命令 (ORA_DISK_3 通道上, 在 07/13/2019 14:03:01 上) 失败
ORA-19566: 超出损坏块限制 0 (文件 +DATA/orcl/datafile/XFF63.dmp)

运气不错,除+DATA/orcl/datafile/XFF63.dmp文件上面有坏块之外其他文件没有发现坏块,对该文件进行特殊方式拷贝处理
dbv检查该文件

C:\Users\Administrator>dbv file=i:/rmanback/26.dbf

DBVERIFY: Release 11.2.0.1.0 - Production on 星期六 7月 13 14:09:32 2019

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = I:\RMANBACK\26.DBF
页 132735 流入 - 很可能是介质损坏
Corrupt block relative dba: 0x0682067f (file 26, block 132735)
Fractured block found during dbv:
Data in bad block:
 type: 40 format: 2 rdba: 0x0682067f
 last change scn: 0x0000.333eaa21 seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xaa550000
 check value in block header: 0x4242
 computed block checksum: 0xb24f



DBVERIFY - 验证完成

检查的页总数: 153600
处理的页总数 (数据): 9283
失败的页总数 (数据): 0
处理的页总数 (索引): 4789
失败的页总数 (索引): 0
处理的页总数 (其他): 139463
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 64
标记为损坏的总页数: 1
流入的页总数: 1
加密的总页数        : 0
最高块 SCN            : 1029666541 (0.1029666541)

这次运气非常好,该文件也只有一个坏块,整体来说,把正在运行的asm disk磁盘格式化为ntfs格式化,整个数据库文件只发现一个坏块.拷贝数据文件,redo,ctl等之后

通过重建控制文件尝试open数据库

SQL> @ctl.sql

控制文件已创建。
SQL> RECOVER DATABASE;
完成介质恢复。
SQL> ALTER DATABASE OPEN;

数据库已更改。

通过dba_extents 定位坏块对象,然后根据实际情况处理(index直接rebuild,表跳过,lob置空等方法),确定数据没有问题,重建磁盘组,数据回迁,恢复完美完成

ORA-00600: internal error code, arguments: [2252]

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600: internal error code, arguments: [2252]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库版本10.2.0.4,启动成功之后立马crash,让我们协助解决
version


Thu Jul  4 13:03:10 2019
Completed: ALTER DATABASE OPEN
Thu Jul  4 13:03:10 2019
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
RECO: terminating instance due to error 476
Instance terminated by RECO, pid = 22268

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检查scn相关信息
scn


从ORA-600 2252错误信息看,由于scn可能超过该数据库的天花板理论上而导致该问题,而reco进程主要是由于分布式事务引起,通过和客户确认,该库有通过dblink去访问11204版本oracle,而从2019年6月23日之后scn的算法发生了一些改变(SCN Compatibility问题汇总-2019年6月23日),导致数据库可以支持更大的scn,从而当低版本需要进行分布式事务操作之时,可能导致数据库异常.

处理方案:通过临时屏蔽分布式事务,让数据库临时正常工作;长期解决方案需要把数据库版本升级,避免scn引起相关问题

ORA-00600: internal error code, arguments: [16703], [1403], [32]

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600: internal error code, arguments: [16703], [1403], [32]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有网友咨询数据库启动报ORA-00600: internal error code, arguments: [16703], [1403], [32]错误,这个错误和以往遇到的不太一样,以前恢复的一些案例
tab$恢复错误汇总
10g数据库遭遇ORA-600 16703
12C数据库遭遇ORA-600 16703
tab$被恶意删除sys用户之外记录
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 1.3892E+10 bytes
Fixed Size                  5420776 bytes
Variable Size            2281703704 bytes
Database Buffers         1.1576E+10 bytes
Redo Buffers               28131328 bytes
数据库装载完毕。
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [32], [], [], [],[], [], [], [], [], []
进程 ID: 9512
会话 ID: 272 序列号: 22801

查看alert日志

Sun Jun 30 14:47:55 2019
SMON: enabling cache recovery
Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc  (incident=177881) (PDBNAME=CDB$ROOT):
ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\SHORCL\diag\rdbms\orcl\orcl\incident\incdir_177881\orcl_ora_7824_i177881.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Jun 30 14:47:57 2019
Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc:
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], []
Sun Jun 30 14:47:57 2019
Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], []
Sun Jun 30 14:47:57 2019
Errors in file D:\APP\SHORCL\diag\rdbms\orcl\orcl\trace\orcl_ora_7824.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [32], [], [], [], [], [], [], [], [], []
Sun Jun 30 14:47:57 2019
Error 704 happened during db open, shutting down database
USER (ospid: 7824): terminating the instance due to error 704
Sun Jun 30 14:48:02 2019
Instance terminated by USER, pid = 7824
ORA-1092 signalled during: ALTER DATABASE OPEN...

根据以往经验,这个很可能也是tab$数据被删除导致。经过分析,该库的区别是由于该库是12C的pdb
12.1.0.2


pdb

通过分析确认,确实是tab$数据被删除,通过bbed反向删除处理,实现时间完美恢复,open之后删除恶意脚本,数据库直接使用,实现完美恢复

SQL> startup mount
ORACLE 例程已经启动。

Total System Global Area 1.3892E+10 bytes
Fixed Size                  5420776 bytes
Variable Size            2281703704 bytes
Database Buffers         1.1576E+10 bytes
Redo Buffers               28131328 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

SQL> select 'drop '||object_type||' '||owner||'.'||object_name||';' from dba_obj
ects where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP');

'DROP'||OBJECT_TYPE||''||OWNER||'.'||OBJECT_NAME||';'
--------------------------------------------------------------------------------

drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR;
drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP;

SQL>
SQL> drop TRIGGER SYS.DBMS_SUPPORT_DBMONITOR;

触发器已删除。

SQL> drop PROCEDURE SYS.DBMS_SUPPORT_DBMONITORP;

过程已删除。

SQL> shutdown immediate;
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup
ORACLE 例程已经启动。

Total System Global Area 1.3892E+10 bytes
Fixed Size                  5420776 bytes
Variable Size            2281703704 bytes
Database Buffers         1.1576E+10 bytes
Redo Buffers               28131328 bytes
数据库装载完毕。
数据库已经打开。
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDBORCL                        MOUNTED
SQL> alter session set  container=pdborcl;

会话已更改。

SQL> alter database open;

数据库已更改。

ERROR: diskgroup XXXX was not mounted

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ERROR: diskgroup XXXX was not mounted

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

aix平台10.2.0.5 2节点RAC,由于节点2系统盘故障,通过节点1镜像系统,复制到节点2,结果由于节点2磁盘顺序和节点1不匹配,aix工程师进行了相关操作之后,节点1重启之后datadg磁盘组无法mount

SQL> alter diskgroup datadg mount 
Mon Jun 10 23:23:46 CST 2019
NOTE: cache registered group DATADG number=1 incarn=0x8cf61164
Mon Jun 10 23:23:46 CST 2019
NOTE: Hbeat: instance first (grp 1)
Mon Jun 10 23:23:50 CST 2019
NOTE: start heartbeating (grp 1)
Mon Jun 10 23:23:50 CST 2019
NOTE: cache dismounting group 1/0x8CF61164 (DATADG) 
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATADG was not mounted

检查datadg磁盘组相关信息

Tue Jan 29 19:21:45 CST 2019
NOTE: start heartbeating (grp 2)
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/rhdisk6
Tue Jan 29 19:21:45 CST 2019
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATADG_0001 path:/dev/rhdisk7
NOTE: cache opening disk 2 of grp 2: DATADG_0002 path:/dev/rhdisk8
NOTE: cache opening disk 3 of grp 2: DATADG_0003 path:/dev/rhdisk9
NOTE: cache mounting (first) group 2/0x60E59155 (DATADG)
* allocate domain 2, invalid = TRUE 
Tue Jan 29 19:21:45 CST 2019
NOTE: attached to recovery domain 2
Tue Jan 29 19:21:45 CST 2019
NOTE: cache recovered group 2 to fcn 0.849668
Tue Jan 29 19:21:45 CST 2019
NOTE: LGWR attempting to mount thread 1 for disk group 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.849668 ABA 
NOTE: seq=21 blk=5394 
Tue Jan 29 19:21:46 CST 2019
NOTE: cache mounting group 2/0x60E59155 (DATADG) succeeded
SUCCESS: diskgroup DATADG was mounted

通过这里可以看出来datadg磁盘组是由rhdisk6-9 四块磁盘组成,查询相关磁盘信息发现
asm-foreign


这里确定rhdisk7磁盘异常,通过kfed分析磁盘情况

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                           34 ; 0x001: 0x22
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                   49407 ; 0x004: blk=49407
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                    58396 ; 0x010: 0x0000e41c
kfbh.fcn.wrap:                   131072 ; 0x014: 0x00020000
kfbh.spare1:                 4294967064 ; 0x018: 0xffffff18
kfbh.spare2:                 2105310074 ; 0x01c: 0x7d7c7b7a
005918A00 00002200 0000C0FF 00000000 00000000  [."..............]
005918A10 0000E41C 00020000 FFFFFF18 7D7C7B7A  [............z{|}]
005918A20 00000000 00000000 00000000 00000000  [................]
  Repeat 253 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
006EF8A00 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

D:\BaiduNetdiskDownload\xifenfei>kfed read rhdisk7.dd blkn=2|more
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                33554432 ; 0x004: blk=33554432
kfbh.block.obj:                16777344 ; 0x008: file=128
kfbh.check:                  3844041089 ; 0x00c: 0xe51f6981
kfbh.fcn.base:               1297484544 ; 0x010: 0x4d560b00
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdatb10.aunum:                       0 ; 0x000: 0x00000000
kfdatb10.shrink:                  49153 ; 0x004: 0xc001
kfdatb10.ub2pad:                  20555 ; 0x006: 0x504b
kfdatb10.auinfo[0].link.next:      2048 ; 0x008: 0x0800
kfdatb10.auinfo[0].link.prev:      2048 ; 0x00a: 0x0800
kfdatb10.auinfo[0].free:              0 ; 0x00c: 0x0000
kfdatb10.auinfo[0].total:         49153 ; 0x00e: 0xc001
kfdatb10.auinfo[1].link.next:      4096 ; 0x010: 0x1000
kfdatb10.auinfo[1].link.prev:      4096 ; 0x012: 0x1000
kfdatb10.auinfo[1].free:              0 ; 0x014: 0x0000
kfdatb10.auinfo[1].total:             0 ; 0x016: 0x0000
kfdatb10.auinfo[2].link.next:      6144 ; 0x018: 0x1800
kfdatb10.auinfo[2].link.prev:      6144 ; 0x01a: 0x1800
kfdatb10.auinfo[2].free:              0 ; 0x01c: 0x0000
kfdatb10.auinfo[2].total:             0 ; 0x01e: 0x0000
kfdatb10.auinfo[3].link.next:      8192 ; 0x020: 0x2000
kfdatb10.auinfo[3].link.prev:      8192 ; 0x022: 0x2000
kfdatb10.auinfo[3].free:              0 ; 0x024: 0x0000

对比磁盘可能的损坏情况,由于在aix 平台asm disk的block有一个特征一般0082开头,通过工具打开磁盘,检索该标记对比
正常磁盘
0082-good


异常磁盘
0082-had

通过上述分析,大概评估rhdisk7 元数据部分损坏的不光是block 0和1,人工修复继续使用的可能性不太大,而且基于客户的数据库不大,采取方案是直接拷贝数据文件、redo、控制文件到文件系统,然后在本地文件系统open库
open_db

运气不错,实现完美恢复数据0丢失

WARNING: Read Failed.导致asm磁盘组异常

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:WARNING: Read Failed.导致asm磁盘组异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户对asm dg进行扩容,一段时间之后,asm data 磁盘组直接dismount

Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0027' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0026' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638 
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 
(DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0

后续继续mount data 磁盘组成功,但是立马又dismount

Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD  DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0027' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0026' SIZE 511993M ,
'/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638 
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 
(DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0

对于上述的故障现象,本质原因是由于asm 磁盘组增加新磁盘之后,开始做rebalance,但是由于遭遇到 27号盘上有IO读错误,使得asm磁盘组无法正常完成rebalance,因而data磁盘组无法稳定的mount。解决该问题思路,通过patch asm磁盘组,禁止rebalance,从而使得data磁盘组不再dismount,再进行后续恢复