RAC环境redo在各节点本地导致数据库故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:RAC环境redo在各节点本地导致数据库故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近一个运行在win平台的rac,由于断电之后,集群两个节点均无法正常启动,客户进行了一系列尝试,结果到了ora-600 kclchkblk_4错误无法继续.
通过对数据库日志分析,回溯了故障大概的原因,启动的时候报错为:
节点1启动报错

Sun Aug 03 15:21:22 2025
alter database open
This instance was first to open
Beginning crash recovery of 2 threads
 parallel recovery started with 32 processes
Started redo scan
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_7108.trc:
ORA-00314: 日志 11 (用于线程 2) 要求的 sequence# 147717 与 147541 不匹配
ORA-00312: 联机日志 11 线程 2: 'D:\REDOLOG\REDO011.LOG'
Abort recovery for domain 0
Aborting crash recovery due to error 314
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_7108.trc:
ORA-00314: 日志 11 (用于线程 2) 要求的 sequence# 147717 与 147541 不匹配
ORA-00312: 联机日志 11 线程 2: 'D:\REDOLOG\REDO011.LOG'
Abort recovery for domain 0
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_7108.trc:
ORA-00314: 日志 11 (用于线程 2) 要求的 sequence# 147717 与 147541 不匹配
ORA-00312: 联机日志 11 线程 2: 'D:\REDOLOG\REDO011.LOG'
ORA-314 signalled during: alter database open...

节点2启动报错

Sat Aug 02 15:45:43 2025
Successful mount of redo thread 2, with mount id 1735887907
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {1:47460:124} */
ALTER DATABASE OPEN /* db agent *//* {1:47460:124} */
This instance was first to open
Beginning crash recovery of 2 threads
Sat Aug 02 15:45:49 2025
 parallel recovery started with 32 processes
Started redo scan
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl2\trace\orcl2_ora_3444.trc:
ORA-00314: ?? 1 (???? 1) ??? sequence# 67782 ? 60818 ???
ORA-00312: ???? 1 ?? 1: 'D:\REDOLOG\REDO01.LOG'
Abort recovery for domain 0
Aborting crash recovery due to error 314
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl2\trace\orcl2_ora_3444.trc:
ORA-00314: ?? 1 (???? 1) ??? sequence# 67782 ? 60818 ???
ORA-00312: ???? 1 ?? 1: 'D:\REDOLOG\REDO01.LOG'
Abort recovery for domain 0
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl2\trace\orcl2_ora_3444.trc:
ORA-00314: ?? 1 (???? 1) ??? sequence# 67782 ? 60818 ???
ORA-00312: ???? 1 ?? 1: 'D:\REDOLOG\REDO01.LOG'
ORA-314 signalled during: ALTER DATABASE OPEN /* db agent *//* {1:47460:124} */...

看到这两个报错信息得出两个结论:
1)比较明显节点1需要thead 2的 group 11 sequence为147717,但是实际group 11文件的sequence为147541;而节点2启动需要thread 1的group 1 sequence为67782,但是实际中group 1文件的sequnece为60818,这两个都相差比较多,属于非正常的情况,很可能是文件本身有问题
2)这是一套win的rac架构,理论上redo应该在共享文件系统(一般是asm中),而这个第一感觉很可能是本地文件系统中
客户当时恢复之时查询信息截图
ORA-00314-ORA-00312


查看了两个节点的最后redo切换信息

--节点1
Sat Aug 02 10:49:31 2025
Thread 1 advanced to log sequence 67782 (LGWR switch)
  Current log# 1 seq# 67782 mem# 0: D:\REDOLOG\REDO01.LOG

--节点2(redo每组2G,节点2长时间没跑业务,之时做数据库导出操作,所以切换时间比较久远)
Sat Jul 26 16:56:42 2025
Thread 2 advanced to log sequence 147717 (LGWR switch)
  Current log# 11 seq# 147717 mem# 0: D:\REDOLOG\REDO011.LOG

并查看两个机器d:/redolog信息(客户自行resetlogs之后的,非第一现场,但是可以确认两个节点各自有一份redo文件
redo


本来这个是一个比较小的故障,只要把节点2的thread 1的redo拷贝到到节点1或者节点1的thread 2的redo拷贝到节点2,然后正常open库即可,现场恢复对rac不太熟悉,直接按照互联网上检索的处理方法,加上_allow_resetlogs_corruption然后强制拉库,结果不太幸运,拉库失败报ORA-600 kclchkblk_4错误
kclchkblk_4


Sun Aug 03 18:59:24 2025
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 21497084214
Resetting resetlogs activation ID 1543012633 (0x5bf88119)
Sun Aug 03 18:59:46 2025
Setting recovery target incarnation to 3
Sun Aug 03 18:59:46 2025
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
Sun Aug 03 18:59:48 2025
Assigning activation ID 1735960667 (0x6778a85b)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\REDOLOG\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Aug 03 18:59:49 2025
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_8508.trc  (incident=728324):
ORA-00600: 内部错误代码, 参数: [kclchkblk_4], [5], [200595988], [5], [22247740], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\incident\incdir_728324\orcl1_ora_8508_i728324.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sun Aug 03 18:59:53 2025
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_8508.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [kclchkblk_4], [5], [200595988], [5], [22247740], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_ora_8508.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [kclchkblk_4], [5], [200595988], [5], [22247740], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 8508): terminating the instance due to error 704
Sun Aug 03 18:59:54 2025
opiodr aborting process unknown ospid (9480) as a result of ORA-1092
Sun Aug 03 19:00:09 2025
Instance terminated by USER, pid = 8508
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (8508) as a result of ORA-1092

这个故障之后,客户那边无法自行恢复,让我这边介入处理,对于这个错误以前处理比较多,一般就是scn问题,通过Patch SCN小工具快速解决
QQ20250808-185918


数据库open成功之后主要报一些ORA-600 4137,ORA-600 6006等错误

Database Characterset is ZHS16GBK
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc  (incident=1176205):
ORA-00600: 内部错误代码, 参数: [4137], [1.14.2713957], [0], [0], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\incident\incdir_1176205\orcl1_smon_8300_i1176205.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Aug 08 19:03:14 2025
ORACLE Instance orcl1 (pid = 25) - Error 600 encountered while recovering transaction (1, 14).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc:
ORA-00600: 内部错误代码, 参数: [4137], [1.14.2713957], [0], [0], [], [], [], [], [], [], [], []
Fri Aug 08 19:03:15 2025
ORACLE Instance orcl1 (pid = 25) - Error 600 encountered while recovering transaction (5, 19).
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc:
ORA-00600: 内部错误代码, 参数: [4137], [5.19.2318502], [0], [0], [], [], [], [], [], [], [], []
Starting background process MMON
Fri Aug 08 19:03:18 2025
MMON started with pid=29, OS id=4624
Fri Aug 08 19:03:19 2025
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc  (incident=1176207):
ORA-00600: 内部错误代码, 参数: [6006], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\incident\incdir_1176207\orcl1_smon_8300_i1176207.trc
Starting background process MMNL
Fri Aug 08 19:03:19 2025
MMNL started with pid=30, OS id=8344
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORACLE Instance orcl1 (pid = 25) - Error 600 encountered while recovering transaction (46, 28) on object 197344.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl1\trace\orcl1_smon_8300.trc:
ORA-00600: 内部错误代码, 参数: [6006], [1], [], [], [], [], [], [], [], [], [], []

通过重建undo解决该错误,数据库稳定运行,没有再crash和报明显错误,导出核心数据,完成本次恢复任务.

Oracle 21c 202507补丁-21.19

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 21c 202507补丁-21.19

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

 Description  Database Update  GI Update  Windows Bundle Patch
  JUL2025 (21.19.0.0.0) 38068980  38068887  37962964
  APR2025 (21.18.0.0.0) 37655430  37642955  37532378
  JAN2025 (21.17.0.0.0) 37350281  37349593  37500148
  OCT2024 (21.16.0.0.0) 36991631  36990664  36878842
  JUL2024 (21.15.0.0.0) 36696242  36696109  36521898
  APR2024 (21.14.0.0.0) 36352352  36352207  36219877
  JAN2024 (21.13.0.0.0) 36041222  36031790  35962857
  OCT2023 (21.12.0.0.0) 35740258  35738010  35681617
  JUL2023  (21.11.0.0.0) 35428978  35427907  35347974
  APR2023 (21.10.0.0.0) 35134934  35132566  35046488
  JAN2023 (21.9.0.0.0) 34839741  34838415  34750812
  Oct2022 (21.8.0.0.0) 34527084  34526142  34468137
  JUL2022 (21.7.0.0.0) 34160444  34155589  34110698
  APR2022 (21.6.0.0.0) 33843745  33859395  33829143
  JAN2022 (21.5.0.0.0) 33516412  33531909  33589769
 OCT2021 (21.4.0.0.0) 33239276  33250101  NA

参考:Assistant: Download Reference for Oracle Database/GI Update, Revision, PSU, SPU(CPU), Bundle Patches, Patchsets and Base Releases (Doc ID 2118136.2)

ORA-600 kcratr_nab_less_than_odr和ORA-600 4194故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kcratr_nab_less_than_odr和ORA-600 4194故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户一个11.2.0.1的库由于断电导致启动的时候报ORA-600 kcratr_nab_less_than_odr错误
ORA-600-kcratr_nab_less_than_odr


Wed Aug 06 15:55:07 2025
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 27 processes
Started redo scan
Completed redo scan
 read 11787 KB redo, 4129 data blocks need recovery
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_3520.trc  (incident=2624321):
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [111988], [42537], [43285], [], [], [], [], [], [], []
Incident details in: d:\app\xff\diag\rdbms\primary\orcl\incident\incdir_2624321\orcl_ora_3520_i2624321.trc
Aborting crash recovery due to error 600
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_3520.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [111988], [42537], [43285], [], [], [], [], [], [], []
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_3520.trc:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [111988], [42537], [43285], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

类似的ORA-600 kcratr_nab_less_than_odr故障处理不少,但是应用redo完成之后,再open报错的不多:
12c启动报kcratr_nab_less_than_odr
又一例ORA-600 kcratr_nab_less_than_odr
ORA-600 kcratr_nab_less_than_odr故障解决
差点被误操作的ORA-600 kcratr_nab_less_than_odr故障
11.2.0.4库中遇到ORA-600 kcratr_nab_less_than_odr报错
但是这个客户比较不幸,经过一系列尝试打开数据库,但是报ORA-600 4194错误

Wed Aug 06 17:09:33 2025
ALTER DATABASE RECOVER  database until cancel using backup controlfile  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 28 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel using backup controlfile  ...
Wed Aug 06 17:10:17 2025
ALTER DATABASE RECOVER    LOGFILE 'D:\app\xff\oradata\orcl\REDO05.LOG'  
Media Recovery Log D:\app\xff\oradata\orcl\REDO05.LOG
Wed Aug 06 17:10:17 2025
Incomplete recovery applied all redo ever generated.
Recovery completed through change 2118813100 time 08/06/2025 15:02:15
Completed: ALTER DATABASE RECOVER    LOGFILE 'D:\app\xff\oradata\orcl\REDO05.LOG'  
Wed Aug 06 17:14:05 2025
alter database open resetlogs
Archived Log entry 188143 added for thread 1 sequence 111985 ID 0x634438e6 dest 1:
Archived Log entry 188144 added for thread 1 sequence 111986 ID 0x634438e6 dest 1:
Archived Log entry 188145 added for thread 1 sequence 111987 ID 0x634438e6 dest 1:
Archived Log entry 188146 added for thread 1 sequence 111984 ID 0x634438e6 dest 1:
Archived Log entry 188147 added for thread 1 sequence 111988 ID 0x634438e6 dest 1:
RESETLOGS after complete recovery through change 2118813100
Resetting resetlogs activation ID 1665415398 (0x634438e6)
Wed Aug 06 17:15:03 2025
Archived Log entry 188148 added for thread 1 sequence 1 ID 0x677cfac1 dest 1:
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_smon_2428.trc  (incident=3264274):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\xff\diag\rdbms\primary\orcl\incident\incdir_3264274\orcl_smon_2428_i3264274.trc
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Aug 06 17:15:06 2025
QMNC started with pid=34, OS id=3052 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open resetlogs

后续由于该undo异常报ORA-01595和ORA-600 4194等错误导致数据库crash

Wed Aug 06 17:15:49 2025
Doing block recovery for file 3 block 888
Resuming block recovery (PMON) for file 3 block 888
Block recovery from logseq 2, block 316 to scn 2118813732
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\APP\xff\ORADATA\ORCL\REDO02.LOG
Block recovery stopped at EOT rba 2.447.16
Block recovery completed at rba 2.447.16, scn 0.2118813732
Doing block recovery for file 3 block 1960
Resuming block recovery (PMON) for file 3 block 1960
Block recovery from logseq 2, block 316 to scn 2118813622
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: D:\APP\xff\ORADATA\ORCL\REDO02.LOG
Block recovery completed at rba 2.317.16, scn 0.2118813626
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_smon_2428.trc:
ORA-01595: error freeing extent (2) of rollback segment (27))
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

接手这个库之后,处理工作相对比较简单,直接对异常undo进行重建即可

Thu Aug 07 09:35:08 2025
create undo tablespace undotbs2 datafile 'D:\APP\xff\ORADATA\ORCL\undotbs02.dbf' size 128M autoextend on
Completed: create undo tablespace undotbs2 datafile 'D:\APP\xff\ORADATA\ORCL\undotbs02.dbf' size 128M autoextend on
Thu Aug 07 09:37:53 2025
drop tablespace UNDOTBS1 including contents and datafiles
Deleted file D:\APP\xff\ORADATA\ORCL\UNDOTBS01.DBF
Completed: drop tablespace UNDOTBS1 including contents and datafiles

然后数据库报大量的ORA-600 kdsgrp1错误

Thu Aug 07 10:03:53 2025
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_6156.trc  (incident=5121829):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Thu Aug 07 10:04:00 2025
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_4564.trc  (incident=5121501):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Thu Aug 07 10:04:06 2025
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_9160.trc  (incident=5121837):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Thu Aug 07 10:04:16 2025
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_5868.trc  (incident=5121821):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Thu Aug 07 10:04:28 2025
Errors in file d:\app\xff\diag\rdbms\primary\orcl\trace\orcl_ora_4448.trc  (incident=5122077):
ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
* kdsgrp1-1: *************************************************
            row 0x013a9ae9.0 continuation at
            0x013a9ae9.0 file# 4 block# 3840745 slot 0 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 0 ..... nrows: 2
kdsgrp - dump CR block dba=0x013a9ae9
Block header dump:  0x013a9ae9
 Object id on Block? Y
 seg/obj: 0x31dd9  csc: 0x00.7dc0f246  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x13a9082 ver: 0x01 opc: 0
     inc: 2  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

通过分析是由于index和表记录不匹配导致,对index进行重新rebuild完成本次修复任务.

一次非常幸运的ORA-600 16703(tab$被清空)故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次非常幸运的ORA-600 16703(tab$被清空)故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

这次的ORA-600 16703的故障比较特殊,客户的一套rac运行了5年多没有重启,这次由于异常导致其中一个节点重启,然后触发了tab$被清空,异常节点启动报ORA-600 16703错误.朋友在故障之后,第一时间没有对在运行的节点进行重启(虽然也无法对外提供业务服务),使得恢复工作相对简单一些,恢复效果也是最完美的.这个是我在对于软件安装介质注入恶意脚本,300天之后重启触发tab$被清空的相关恢复case中,最完美的一次(以前遇到过一次客户是虚拟化环境通过cdp回退然后类似方法处理ORA-600 16703直接把orachk备份表插入到tab$恢复),凸显了这位朋友在故障发生之后对于问题的准确判断和果断的应对能力.
有朋友和我反馈,他们数据库突然报大量ORA-600错误,业务无法正常操作,我分析相关日志确认:节点2重启之后节点1开始报大量ORA-600错误,但是节点一直处于open状态

Fri Jul 25 15:28:53 2025
Decreasing number of real time LMS from 3 to 0
Fri Jul 25 15:29:18 2025
Reconfiguration started (old inc 13, new inc 15)
List of instances:
 1 2 (myinst: 1) 
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Jul 25 15:29:18 2025
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Jul 25 15:29:18 2025
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Jul 25 15:29:18 2025
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Fri Jul 25 15:29:20 2025
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:15 errcnt:0
Fri Jul 25 15:30:07 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_252634.trc  (incident=77234):
ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [],[],[],[],[],[],[],[]
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl1/incident/incdir_77234/orcl1_ora_252634_i77234.trc
Fri Jul 25 15:30:18 2025
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Jul 25 15:30:19 2025
Sweep [inc][77234]: completed
Sweep [inc2][77234]: completed
Fri Jul 25 15:30:27 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_261587.trc  (incident=76487):
ORA-00600: internal error code, arguments: [ktsircinfo_num1],[0],[0],[0], [], [], [], [], [], [], [], []

通过grep筛选报错信息

[root@iZbp11c0qyuuo1gr7j98upZ tmp]# egrep "ORA-00600|ORA-07445" alert_1.txt |sort -u
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:delseg], [28941391], [], [], [], []
ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [DEL], [0], [35038924], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [277736], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [28829570], [], [], [], [], [], [], []
ORA-07445: exception encountered: core dump [qknSetParent()+9] [SIGSEGV] [ADDR:0x10354] 
   [PC:0x1A48B9B] [Address not mapped to object] []
ORA-07445: exception encountered: core dump [qksxaMoveQbAnnotations()+168] [SIGSEGV]
   [ADDR:0x20304] [PC:0x1594954] [Address not mapped to object] []
ORA-07445: 出现异常错误: 核心转储 [qknExpRegIni_int()+87] [SIGSEGV] [ADDR:0x8C] 
   [PC:0x1A4D729] [Address not mapped to object] []
ORA-07445: 出现异常错误: 核心转储 [qksxaMoveQbAnnotations()+168] [SIGSEGV] [ADDR:0x0] 
   [PC:0x1594954] [SI_KERNEL(general_protection)] []

既然是由于节点2重启导致节点1报错,那分析节点2重启相关情况,第一次重启成功之后,数据库开始报ORA-600错误

Fri Jul 25 15:29:29 2025
QMNC started with pid=46, OS id=363757 
Fri Jul 25 15:29:31 2025
minact-scn: Inst 2 is a slave inc#:15 mmon proc-id:363622 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Jul 25 15:29:33 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248519):
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248519/orcl2_m003_363779_i248519.trc
Starting background process SMCO
Fri Jul 25 15:29:35 2025
SMCO started with pid=57, OS id=363802 
Fri Jul 25 15:29:35 2025
Completed: ALTER DATABASE OPEN /* db agent *//* {2:23784:2} */
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248520):
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248520/orcl2_m003_363779_i248520.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x10] [PC:0x2FDA4BB,kgmdelsis()+219][flags:0x0,count:1]
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248521):
ORA-07445: exception encountered: core dump [kgmdelsis()+219] [SIGSEGV] 
  [ADDR:0x10] [PC:0x2FDA4BB] [Address not mapped to object] []
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248521/orcl2_m003_363779_i248521.trc
Use ADRCI or Support Workbench to package the incident.
Fri Jul 25 15:29:47 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_363808.trc  (incident=248559):
ORA-00600: internal error code, arguments: [kkposds2], [18446744073709551615], [18446744073709551615], 
  [18446744073709551615], [], [], [], [], [], [], [], []

然后第二次重启数据库无法open成功,而是报ORA-600 16703错误

ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */
Picked broadcast on commit scheme to generate SCNs
ARCH: STARTING ARCH PROCESSES
Fri Jul 25 15:41:23 2025
ARC0 started with pid=39, OS id=369231 
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Fri Jul 25 15:41:24 2025
ARC1 started with pid=40, OS id=369242 
Fri Jul 25 15:41:24 2025
ARC2 started with pid=41, OS id=369244 
Fri Jul 25 15:41:24 2025
ARC3 started with pid=42, OS id=369246 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 2 opened at log sequence 33585
  Current log# 7 seq# 33585 mem# 0: +DATA/orcl/onlinelog/group_7.269.1011373611
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc  (incident=260494):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_260494/orcl2_ora_369210_i260494.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
SUCCESS: diskgroup FRA was mounted
Fri Jul 25 15:41:30 2025
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 369210): terminating the instance due to error 704
Instance terminated by USER, pid = 369210
ORA-1092 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */...
opiodr aborting process unknown ospid (369210) as a result of ORA-1092
Fri Jul 25 15:41:31 2025
ORA-1092 : opitsk aborting process

到这一步基本上就清晰了,大概率是遭遇到以前恢复的类似case,tab$数据被清空导致,类似案例
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
通过在故障主机上找到安装介质,验证md5确认该程序是被注入恶意代码程序
md5


这个库由于还有一个节点处于open状态,相对处理比较简单,直接把备份的表数据反向插入回去即可

SYS@orcl1> select count(1) from ORACHK3C08C86E063530510ACD937;

  COUNT(1)
----------
     20696

SYS@orcl1> insert into tab$ select * from ORACHK3C08C86E063530510ACD937;

20696 rows created.

SYS@orcl1> commit;

Commit complete.

SYS@orcl1> select object_name,to_char(CREATED,'yyyy-mm-dd hh24:mi:ss') from dba_objects 
          2 where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP');

OBJECT_NAME                                  TO_CHAR(CREATED,'YY
-------------------------------------------- -------------------
DBMS_SUPPORT_DBMONITORP                      2019-06-19 17:06:46
DBMS_SUPPORT_DBMONITOR                       2019-06-19 17:06:46

然后清理掉恶意脚本,分别重启两个节点,完成数据恢复任务
2025-07-26_215903_578


这次故障能够快速顺利的恢复,和客户发现故障之后保留第一现场,没有把一个open的节点也重启有很大关系,open的节点也重启了,那后续恢复工作会麻烦很多,效果可能也没有这样的完美.

Oracle 19c 202507补丁(RUs+OJVM)-19.28

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 19c 202507补丁(RUs+OJVM)-19.28

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

19.0.0.0
 Description  Database Update  GI Update  Windows Bundle Patch
 JUL2025 (19.28.0.0.0) 37960098  37957391  37962957
 APR2025 (19.27.0.0.0) 37642901  37641958  37532350
 JAN2025 (19.26.0.0.0) 37260974  37257886  37486199
 OCT2024 (19.25.0.0.0) 36912597  36916690  36878821
 JUL2024 (19.24.0.0.0) 36582781  36582629  36521936
 APR2024 (19.23.0.0.0) 36233263  36233126  36219938
 JAN2024 (19.22.0.0.0) 35943157  35940989  35962832
 OCT2023 (19.21.0.0.0) 35643107  35642822  35681552
 JUL2023 (19.20.0.0.0) 35320081  35319490  35348034
 APR2023 (19.19.0.0.0) 35042068  35037840  35046439
 JAN2023 (19.18.0.0.0) 34765931  34762026  34750795
 Oct2022 (19.17.0.0.0) 34419443  34416665  34468114
 JUL2022 (19.16.0.0.0) 34133642  34130714  34110685
 APR2022 (19.15.0.0.0) 33806152  33803476  33829175
 JAN2022 (19.14.0.0.0) 33515361  33509923  33575656
 OCT2021(19.13.0.0.0) 33192793  33182768  33155330
 JUL2021 (19.12.0.0.0) 32904851  32895426  32832237
 APR2021 (19.11.0.0.0) 32545013  32545008  32409154
 JAN2021 (19.10.0.0.0) 32218454  32226239  32062765
 OCT2020 (19.9.0.0.0) 31771877  31750108  31719903
 JUL2020  (19.8.0.0.0) 31281355  31305339  31247621
 APR2020 (19.7.0.0.0) 30869156  30899722  30901317
 JAN2020 (19.6.0.0.0) 30557433  30501910  30445947
 OCT2019 (19.5.0.0.0) 30125133  30116789  30151705
 JUL2019 (19.4.0.0.0) 29834717  29708769   NA
 APR2019 (19.3.0.0.0) 29517242  29517302   NA

 

19.0.0.0
 Description  OJVM Update  OJVM + DB Update  OJVM + GI Update
 JUL2025 (19.28.0.0.250715)  37847857  37952354  37952382
 APR2025 (19.27.0.0.250415)  37499406  37591483  37591516
 JAN2025 (19.26.0.0.250121)  37102264  37262172  37262208
 OCT2024 (19.25.0.0.241015)  36878697  36866623  36866740
 JUL2024 (19.24.0.0.240716)  36414915  36522340  36522439
 APR2024 (19.23.0.0.240416)  36199232  36209492  36209493
 JAN2024 (19.22.0.0.240116)  35926646  36031426  36031453
 OCT2023 (19.21.0.0.231017)  35648110  35742413  35742441
 JUL2023 (19.20.0.0.230718)  35354406  35370174  35370167
 APR2023 (19.19.0.0.230418)  35050341  35058163  35058172
 JAN2023 (19.18.0.0.230117)  34786990  34773489  34773504
 OCT2022 (19.17.0.0.221018)  34411846  34449114  34449117
 JUL2022 (19.16.0.0.220719)  34086870  34160831  34160854
 APR2022 (19.15.0.0.220419)  33808367  33859194  33859214
 JAN2022 (19.14.0.0.220118)  33561310  33567270  33567274
 OCT2021 (19.13.0.0.211019)  33192694  33248420  33248471
 JUL2021 (19.12.0.0.210720)  32876380  32900021  32900083
 APR2021 (19.11.0.0.210420)  32399816  32578972  32578973
 JAN2021 (19.10.0.0.210119)  32067171  32126828  32126842
 OCT2020 (19.9.0.0.201020)  31668882  31720396  31720429
 JUL2020 (19.8.0.0.200714)  31219897  31326362  31326369
 APR2020 (19.7.0.0.200414)  30805684  30783543  30783556
 JAN2020 (19.6.0.0.200114)  30484981  30463595  30463609
 OCT2019 (19.5.0.0.191015)  30128191  30133124  30133178
 JUL2019 (19.4.0.0.190716)  29774421  29699079  29699097
 APR2019 (19.3.0.0.190416)  29548437  29621253  29621299

参考:Assistant: Download Reference for Oracle Database/GI Update, Revision, PSU, SPU(CPU), Bundle Patches, Patchsets and Base Releases (Doc ID 2118136.2)