asm disk被加入到另外一个磁盘组故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:asm disk被加入到另外一个磁盘组故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有朋友在aix环境对其中一个rac的asm磁盘组进行扩容
add_disk


之后另外一套rac的磁盘组直接dismount

Wed Aug 23 12:44:02 2023
NOTE: SMON starting instance recovery for group DATA domain 2 (mounted)
NOTE: F1X0 found on disk 0 au 2 fcn 0.128808679
NOTE: SMON skipping disk 7 - no header
NOTE: cache initiating offline of disk 7 group DATA
NOTE: process _smon_+asm1 (1770932) initiating offline of disk 7.3422955792 (DATA_0007) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 7/0xcc062910, mask = 0x6a, op = clear
Wed Aug 23 12:44:02 2023
GMON updating disk modes for group 2 at 7 for pid 17, osid 1770932
ERROR: Disk 7 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Wed Aug 23 12:44:02 2023
NOTE: cache dismounting (not clean) group 2/0x7FE6D808 (DATA) 
WARNING: Offline for disk DATA_0007 in mode 0x7f failed.
Wed Aug 23 12:44:02 2023
NOTE: halting all I/Os to diskgroup 2 (DATA)
ERROR: No disks with F1X0 found on disk group DATA
NOTE: aborting instance recovery of domain 2 due to diskgroup dismount
NOTE: SMON skipping lock domain (2) validation because diskgroup being dismounted
Abort recovery for domain 2
Wed Aug 23 12:44:02 2023
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x7fe6d808 (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_2360526.trc:
ORA-15130: diskgroup "DATA" is being dismounted
[

再次尝试mount该磁盘组,报ORA-15042和ORA-15038错误

SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=2 incarn=0x79e6d861
NOTE: cache began mount (first) of group DATA number=2 incarn=0x79e6d861
NOTE: Assigning number (2,0) to disk (/dev/rhdisk31)
NOTE: Assigning number (2,3) to disk (/dev/rhdisk33)
NOTE: Assigning number (2,4) to disk (/dev/rhdisk34)
NOTE: Assigning number (2,5) to disk (/dev/rhdisk35)
NOTE: Assigning number (2,6) to disk (/dev/rhdisk36)
NOTE: Assigning number (2,9) to disk (/dev/rhdisk39)
NOTE: Assigning number (2,1) to disk (/dev/rhdisk8)
NOTE: Assigning number (2,2) to disk (/dev/rhdisk9)
Wed Aug 23 12:58:46 2023
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 11 for pid 27, osid 3736034
NOTE: Assigning number (2,7) to disk ()
NOTE: Assigning number (2,8) to disk ()
GMON querying group 2 at 12 for pid 27, osid 3736034
NOTE: cache dismounting (clean) group 2/0x79E6D861 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 3736034, image: oracle@hbbz01 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x79E6D861 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=2 incarn=0x79e6d861
NOTE: cache deleting context for group DATA 2/0x79e6d861
GMON dismounting group 2 at 13 for pid 27, osid 3736034
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "8" is missing from group number "2" 
ORA-15042: ASM disk "7" is missing from group number "2" 
ORA-15038: disk '/dev/rhdisk37' mismatch on 'Time Stamp' with target disk group [2129689239] [2062898314]
ERROR: alter diskgroup data mount

怀疑把报错这个磁盘组的rhdisk37加入到另外一套rac的asm中了(也就是说两套asm使用了同一块磁盘),aix操作系统层面分析确认

---对asm扩容的机器上
# lscfg -vpl hdisk15
  hdisk15          U78C5.001.DQD076A-P2-C4-T1-W200C00A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk

        Manufacturer................NETAPP  
        Machine Type and Model......LUN C-Mode      
        ROS Level and ID............9000
        Serial Number...............80DYz]L/OpCA
        Device Specific.(Z0)........FAS8020         


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block

---磁盘组dismount的机器上
# lscfg -vpl hdisk37      
  hdisk37          U5802.001.9K87776-P1-C1-T1-W200500A098BC9A83-L0  MPIO NetApp FCP Default PCM Disk

        Manufacturer................NETAPP  
        Machine Type and Model......LUN C-Mode      
        ROS Level and ID............9000
        Serial Number...............80DYz]L/OpCA
        Device Specific.(Z0)........FAS8020         


  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block

通过lscfg 命令确认两套rac使用了同一块盘导致一个磁盘组异常,在新加的机器上查询确认新盘被破坏情况(新加入的磁盘由于reblance操作,已经被写入了380G左右数据[也就意味着这个磁盘在老磁盘组中最少会丢失380G数据]
20230905140603


对于这种情况,dismount磁盘组是外部冗余不可能直接mount起来,只能通过以前处理的类似方法:
asm disk header 彻底损坏恢复
asm磁盘加入vg恢复
asm磁盘dd破坏恢复
asm disk 磁盘部分被清空恢复
再一例asm disk被误加入vg并且扩容lv恢复
fdisk分区导致asm disk破坏数据库恢复
再一起asm disk被格式化成ext3文件系统故障恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统
ORA-15063: ASM discovered an insufficient number of disks for diskgroup 恢复
通过底层处理恢复出来没有覆盖的数据块中数据
20230827200941

再使用dul恢复出来其中数据,完成这次故障的核心数据恢复

ORA-600 ksuloget2 恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 ksuloget2 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户在win 32位的操作系统上调至sga超过2G,数据库运行过程中报ORA-600 ksuloget2错误

Thread 1 cannot allocate new log, sequence 43586
Checkpoint not complete
  Current log# 1 seq# 43585 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO01.LOG
Fri Aug 04 14:57:02 2023
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_42996.trc  (incident=67481):
ORA-00600: 内部错误代码, 参数: [ksuloget2], [0xFEBA6208], [0xFEBA3B08], [500], [0xFEBA622C], [], [], [], [], []
Thread 1 advanced to log sequence 43586 (LGWR switch)
  Current log# 2 seq# 43586 mem# 0: D:\ORACLE\ORADATA\ORCL\REDO02.LOG

重启数据库,进行尝试恢复继续报ORA-600 ksuloget2

Thu Aug 17 17:38:27 2023
ALTER DATABASE RECOVER  database using backup controlfile  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 24 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile  ...
Thu Aug 17 17:39:01 2023
ALTER DATABASE RECOVER LOGFILE 'D:\oracle\flash_recovery_area\orcl\ARCHIVELOG\2023_08_04\REDO03.LOG'  
Media Recovery Log D:\oracle\flash_recovery_area\orcl\ARCHIVELOG\2023_08_04\REDO03.LOG
Thu Aug 17 17:39:01 2023
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc  (incident=110724):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00342: archived log does not have expected resetlogs SCN 685171428
ORA-00334: archived log: 'D:\ORACLE\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2023_08_04\REDO03.LOG'
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc:
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00342: archived log does not have expected resetlogs SCN 685171428
ORA-00334: archived log: 'D:\ORACLE\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2023_08_04\REDO03.LOG'
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_5604.trc  (incident=110709):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00342: archived log does not have expected resetlogs SCN 685171428
ORA-00334: archived log: 'D:\ORACLE\FLASH_RECOVERY_AREA\ORCL\ARCHIVELOG\2023_08_04\REDO03.LOG'
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_110709\orcl_ora_5604_i110709.trc
ORA-600 signalled during:ALTER DATABASE RECOVER LOGFILE 'D:\oracle\flash_recovery_area\orcl\2023_08_04\REDO03.LOG'
ALTER DATABASE RECOVER CANCEL 
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc  (incident=110725):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc  (incident=110726):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc:
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc  (incident=110727):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_pr00_5528.trc  (incident=110728):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
Errors in file d:\oracle\diag\rdbms\orcl\orcl\trace\orcl_ora_5604.trc  (incident=110710):
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [], []
ORA-00600: internal error code, arguments: [ksuloget2], [0xFEBA6E38], [0xFEBA3B08], [500], [0xFEBA6E5C], [], [], [
Incident details in: d:\oracle\diag\rdbms\orcl\orcl\incident\incdir_110710\orcl_ora_5604_i110710.trc

由于是应用日志失败,屏蔽日志一致性,强制打开数据库,检查数据ok,业务可以直接使用,对于这类问题,官方建议:ORA-600: [Ksuloget2] Hit on Windows When SGA Greater Than 1G (Doc ID 836109.1)
20230819105750


Oracle 19C 报ORA-704 ORA-01555故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 19C 报ORA-704 ORA-01555故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

异常断电导致数据库无法启动,尝试对数据文件进行recover操作,报ORA-00283 ORA-00742 ORA-00312错误,由于redo写丢失无法正常应用

D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期日 7月 30 07:49:19 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 9274 块 18057 中检测到写入丢失情况
ORA-00312: 联机日志 1 线程 1: 'D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG'

屏蔽数据一致性,尝试强制打开库,报ORA-00604,ORA-00704,ORA-01555错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 9 with name
"_SYSSMU9_4165470211$" too small
进程 ID: 4036
会话 ID: 2277 序列号: 40707

alert日志对应错误

2023-07-30T06:54:43.457383+08:00
.... (PID:5836): Clearing online redo logfile 1 complete
.... (PID:5836): Clearing online redo logfile 2 complete
.... (PID:5836): Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 3572089731 (0xd4e9c383)
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO03.LOG: Thread 1 Group 3 was previously cleared
2023-07-30T06:54:43.863676+08:00
Setting recovery target incarnation to 2
2023-07-30T06:54:44.816771+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2023-07-30T06:54:44.957395+08:00
Assigning activation ID 3664275149 (0xda6866cd)
2023-07-30T06:54:44.957395+08:00
TT00 (PID:4640): Gap Manager starting
2023-07-30T06:54:45.004305+08:00
Redo log for group 1, sequence 1 is not located on DAX storage
2023-07-30T06:54:46.176153+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG
Successful open of redo thread 1
2023-07-30T06:54:46.191771+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2023-07-30T06:54:46.223036+08:00
TT03 (PID:1816): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
2023-07-30T06:54:46.332398+08:00
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 
0x0000000017b852a7
):
2023-07-30T06:54:46.332398+08:00
select ctime, mtime, stime from obj$ where obj# = :1
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.348028+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Error 704 happened during db open, shutting down database
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc  (incident=474502):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-01092: ORACLE 实例终止。强制断开连接
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_474502\xff_ora_5836_i474502.trc
2023-07-30T06:54:47.785549+08:00
opiodr aborting process unknown ospid (5836) as a result of ORA-603
2023-07-30T06:54:47.816792+08:00
ORA-603 : opitsk aborting process
License high water mark = 6
USER (ospid: (prelim)): terminating the instance due to ORA error 

这类错误比较常见,参考以前类似恢复:
在数据库open过程中常遇到ORA-01555汇总
数据库open过程遭遇ORA-1555对应sql语句补充
Oracle Recovery Tools恢复—ORA-00704 ORA-01555故障
使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
对于本次故障,通过Oracle Recovery Tools工具快速处理
patch


open数据库成功

SQL> alter database open;

数据库已更改。

SQL>
SQL>
SQL> select status,count(1) from v$datafile group by status;

STATUS           COUNT(1)
-------------- ----------
SYSTEM                  1
ONLINE                 61

Exadata磁盘损坏导致磁盘组无法mount恢复(oracle一体机磁盘组异常恢复)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Exadata磁盘损坏导致磁盘组无法mount恢复(oracle一体机磁盘组异常恢复)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

Oracle Exadata客户,在换盘过程中,cell节点又一块磁盘损坏,导致datac1磁盘组(该磁盘组是normal方式冗余)无法mount

Thu Jul 20 22:01:21 2023
SQL> alter diskgroup datac1 mount force 
NOTE: cache registered group DATAC1 number=1 incarn=0x0728ad12
NOTE: cache began mount (first) of group DATAC1 number=1 incarn=0x0728ad12
NOTE: Assigning number (1,35) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_11_dm01celadm03)
NOTE: Assigning number (1,31) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_07_dm01celadm03)
NOTE: Assigning number (1,24) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_00_dm01celadm03)
NOTE: Assigning number (1,25) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_01_dm01celadm03)
NOTE: Assigning number (1,27) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_03_dm01celadm03)
NOTE: Assigning number (1,33) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_09_dm01celadm03)
NOTE: Assigning number (1,30) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_06_dm01celadm03)
NOTE: Assigning number (1,28) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_04_dm01celadm03)
NOTE: Assigning number (1,26) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_02_dm01celadm03)
NOTE: Assigning number (1,1) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_08_dm01celadm03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_10_dm01celadm03)
NOTE: Assigning number (1,29) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_05_dm01celadm03)
NOTE: Assigning number (1,3) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_07_dm01celadm02)
NOTE: Assigning number (1,4) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_06_dm01celadm02)
NOTE: Assigning number (1,5) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_00_dm01celadm02)
NOTE: Assigning number (1,6) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_10_dm01celadm02)
NOTE: Assigning number (1,7) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_08_dm01celadm02)
NOTE: Assigning number (1,8) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_03_dm01celadm02)
NOTE: Assigning number (1,9) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_11_dm01celadm02)
NOTE: Assigning number (1,10) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_01_dm01celadm02)
NOTE: Assigning number (1,11) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_04_dm01celadm02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_05_dm01celadm02)
NOTE: Assigning number (1,43) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_02_dm01celadm02)
NOTE: Assigning number (1,36) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_07_dm01celadm01)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_09_dm01celadm01)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_11_dm01celadm01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_08_dm01celadm01)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_00_dm01celadm01)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_03_dm01celadm01)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_06_dm01celadm01)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_05_dm01celadm01)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_01_dm01celadm01)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_02_dm01celadm01)
NOTE: Assigning number (1,47) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_10_dm01celadm01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_04_dm01celadm01)
Thu Jul 20 22:01:28 2023
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 450 for pid 30, osid 171838
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,39) to disk ()
GMON querying group 1 at 451 for pid 30, osid 171838
NOTE: cache closing disk 32 of grp 1: (not open) 
NOTE: process _user171838_+asm1 (171838) 
     initiating offline of disk 39.3915945266 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 39/0xe9689532, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 452 for pid 30, osid 171838
NOTE: cache closing disk 32 of grp 1: (not open) 
ERROR: Disk 39 cannot be offlined, since all the disks [39, 32] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline for disk  in mode 0x7f failed.
NOTE: cache dismounting (not clean) group 1/0x0728AD12 (DATAC1) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 171838, image: oracle@dm01dbadm01.gyzq.cn (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x0728AD12 (DATAC1) 
NOTE: cache ending mount (fail) of group DATAC1 number=1 incarn=0x0728ad12
NOTE: cache deleting context for group DATAC1 1/0x0728ad12
NOTE: cache closing disk 32 of grp 1: (not open) 
GMON dismounting group 1 at 453 for pid 30, osid 171838
NOTE: Disk DATAC1_CD_08_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_08_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_08_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x1 marked for de-assignment
NOTE: Disk DATAC1_CD_09_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_09_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM01 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATAC1 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "39" in group "DATAC1" may result in a data loss
ORA-15042: ASM disk "39" is missing from group number "1" 
ORA-15042: ASM disk "32" is missing from group number "1" 
ERROR: alter diskgroup datac1 mount force

故障原因是由于asm disk 32还已经损坏在换盘过程中(数据没有reblance完成),又损坏了asm disk 39,而这两份磁盘中有数据互为镜像,因此磁盘组无法正常mount起来.

检查cell节点celldisk和griddisk情况,确认底层磁盘损坏
cellcli


对于这种情况,因为normal冗余的两份数据都有部分丢失,无法直接恢复数据,通过底层磁盘级别恢复(参考以前一次的Oracle exadata故障恢复:Oracle Exadata坏盘导致磁盘组无法mount恢复),然后比较顺利恢复数据,实现业务数据0丢失

SQL> alter datac1 mount;

Diskgroup altered.

SQL> alter diskgroup datac1 check all;

Diskgroup altered.

多套库顺利open成功
20230728124241


在实际恢复过程中由于客户进行了各种尝试,直接新镜像盘然后插入新盘,强制拉磁盘组drop异常disk操作等,导致第一现场发生一些破坏,增加了恢复难道,但是最终通过各种方法弥补,实现了预期的恢复效果(业务数据0丢失)

ORA-01122 ORA-01208 故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-01122 ORA-01208 故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库突然故障ORA-01122 ORA-01208,导致实例crash

Tue Jul 11 09:06:43 2023
Thread 1 cannot allocate new log, sequence 254989
Private strand flush not complete
  Current log# 3 seq# 254988 mem# 0: E:\APP\ADMINISTRATOR\ORADATA\xff\REDO03.LOG
Thread 1 advanced to log sequence 254989 (LGWR switch)
  Current log# 1 seq# 254989 mem# 0: E:\APP\ADMINISTRATOR\ORADATA\xff\REDO01.LOG
Tue Jul 11 09:09:46 2023
Read of datafile 'E:\APP\ADMINISTRATOR\ORADATA\xff\SYSTEM01.DBF' (fno 1) header failed with ORA-01208
Rereading datafile 1 header found valid data
Repaired corruption in datafile 1 header
Read of datafile 'E:\APP\ADMINISTRATOR\ORADATA\xff\SYSAUX01.DBF' (fno 2) header failed with ORA-01208
Rereading datafile 2 header found valid data
Repaired corruption in datafile 2 header
Read of datafile 'E:\APP\ADMINISTRATOR\ORADATA\xff\UNDOTBS01.DBF' (fno 3) header failed with ORA-01208
Rereading datafile 3 header failed with ORA-01208
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ckpt_5820.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 3 failed verification check
ORA-01110: data file 3: 'E:\APP\ADMINISTRATOR\ORADATA\xff\UNDOTBS01.DBF'
ORA-01208: data file is an old version - not accessing current version
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ckpt_5820.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 3 failed verification check
ORA-01110: data file 3: 'E:\APP\ADMINISTRATOR\ORADATA\xff\UNDOTBS01.DBF'
ORA-01208: data file is an old version - not accessing current version
CKPT (ospid: 5820): terminating the instance due to error 1242
…………
Tue Jul 11 09:10:10 2023
Instance terminated by CKPT, pid = 5820
Tue Jul 11 09:18:32 2023

重启实例无法open

Tue Jul 11 09:18:41 2023
alter database mount exclusive
Successful mount of redo thread 1, with mount id 1485684209
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_406776.trc:
ORA-01113: file 3 needs media recovery
ORA-01110: data file 3: 'E:\APP\ADMINISTRATOR\ORADATA\xff\UNDOTBS01.DBF'
ORA-1113 signalled during: alter database open...

通过Oracle Database Recovery Check工具分析
20230715200500


确认数据库恢复需要sequence为254986的日志,但是数据库为非归档模式,redo已经被覆盖,因此常规方法无法正常open库,通过Oracle Recovery Tools工具快速修改文件头实现数据库文件头一致,open数据库成功
20230417230141