一次非常幸运的ORA-600 16703(tab$被清空)故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:一次非常幸运的ORA-600 16703(tab$被清空)故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

这次的ORA-600 16703的故障比较特殊,客户的一套rac运行了5年多没有重启,这次由于异常导致其中一个节点重启,然后触发了tab$被清空,异常节点启动报ORA-600 16703错误.朋友在故障之后,第一时间没有对在运行的节点进行重启(虽然也无法对外提供业务服务),使得恢复工作相对简单一些,恢复效果也是最完美的.这个是我在对于软件安装介质注入恶意脚本,300天之后重启触发tab$被清空的相关恢复case中,最完美的一次(以前遇到过一次客户是虚拟化环境通过cdp回退然后类似方法处理ORA-600 16703直接把orachk备份表插入到tab$恢复),凸显了这位朋友在故障发生之后对于问题的准确判断和果断的应对能力.
有朋友和我反馈,他们数据库突然报大量ORA-600错误,业务无法正常操作,我分析相关日志确认:节点2重启之后节点1开始报大量ORA-600错误,但是节点一直处于open状态

Fri Jul 25 15:28:53 2025
Decreasing number of real time LMS from 3 to 0
Fri Jul 25 15:29:18 2025
Reconfiguration started (old inc 13, new inc 15)
List of instances:
 1 2 (myinst: 1) 
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Jul 25 15:29:18 2025
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Jul 25 15:29:18 2025
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri Jul 25 15:29:18 2025
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Fri Jul 25 15:29:20 2025
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:15 errcnt:0
Fri Jul 25 15:30:07 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_252634.trc  (incident=77234):
ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [],[],[],[],[],[],[],[]
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl1/incident/incdir_77234/orcl1_ora_252634_i77234.trc
Fri Jul 25 15:30:18 2025
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Jul 25 15:30:19 2025
Sweep [inc][77234]: completed
Sweep [inc2][77234]: completed
Fri Jul 25 15:30:27 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_261587.trc  (incident=76487):
ORA-00600: internal error code, arguments: [ktsircinfo_num1],[0],[0],[0], [], [], [], [], [], [], [], []

通过grep筛选报错信息

[root@iZbp11c0qyuuo1gr7j98upZ tmp]# egrep "ORA-00600|ORA-07445" alert_1.txt |sort -u
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:delseg], [28941391], [], [], [], []
ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [DEL], [0], [35038924], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [277736], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [28829570], [], [], [], [], [], [], []
ORA-07445: exception encountered: core dump [qknSetParent()+9] [SIGSEGV] [ADDR:0x10354] 
   [PC:0x1A48B9B] [Address not mapped to object] []
ORA-07445: exception encountered: core dump [qksxaMoveQbAnnotations()+168] [SIGSEGV]
   [ADDR:0x20304] [PC:0x1594954] [Address not mapped to object] []
ORA-07445: 出现异常错误: 核心转储 [qknExpRegIni_int()+87] [SIGSEGV] [ADDR:0x8C] 
   [PC:0x1A4D729] [Address not mapped to object] []
ORA-07445: 出现异常错误: 核心转储 [qksxaMoveQbAnnotations()+168] [SIGSEGV] [ADDR:0x0] 
   [PC:0x1594954] [SI_KERNEL(general_protection)] []

既然是由于节点2重启导致节点1报错,那分析节点2重启相关情况,第一次重启成功之后,数据库开始报ORA-600错误

Fri Jul 25 15:29:29 2025
QMNC started with pid=46, OS id=363757 
Fri Jul 25 15:29:31 2025
minact-scn: Inst 2 is a slave inc#:15 mmon proc-id:363622 status:0x2
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Jul 25 15:29:33 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248519):
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248519/orcl2_m003_363779_i248519.trc
Starting background process SMCO
Fri Jul 25 15:29:35 2025
SMCO started with pid=57, OS id=363802 
Fri Jul 25 15:29:35 2025
Completed: ALTER DATABASE OPEN /* db agent *//* {2:23784:2} */
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248520):
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248520/orcl2_m003_363779_i248520.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x10] [PC:0x2FDA4BB,kgmdelsis()+219][flags:0x0,count:1]
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc  (incident=248521):
ORA-07445: exception encountered: core dump [kgmdelsis()+219] [SIGSEGV] 
  [ADDR:0x10] [PC:0x2FDA4BB] [Address not mapped to object] []
ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248521/orcl2_m003_363779_i248521.trc
Use ADRCI or Support Workbench to package the incident.
Fri Jul 25 15:29:47 2025
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_363808.trc  (incident=248559):
ORA-00600: internal error code, arguments: [kkposds2], [18446744073709551615], [18446744073709551615], 
  [18446744073709551615], [], [], [], [], [], [], [], []

然后第二次重启数据库无法open成功,而是报ORA-600 16703错误

ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */
Picked broadcast on commit scheme to generate SCNs
ARCH: STARTING ARCH PROCESSES
Fri Jul 25 15:41:23 2025
ARC0 started with pid=39, OS id=369231 
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Fri Jul 25 15:41:24 2025
ARC1 started with pid=40, OS id=369242 
Fri Jul 25 15:41:24 2025
ARC2 started with pid=41, OS id=369244 
Fri Jul 25 15:41:24 2025
ARC3 started with pid=42, OS id=369246 
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Thread 2 opened at log sequence 33585
  Current log# 7 seq# 33585 mem# 0: +DATA/orcl/onlinelog/group_7.269.1011373611
Successful open of redo thread 2
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc  (incident=260494):
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_260494/orcl2_ora_369210_i260494.trc
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
SUCCESS: diskgroup FRA was mounted
Fri Jul 25 15:41:30 2025
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 369210): terminating the instance due to error 704
Instance terminated by USER, pid = 369210
ORA-1092 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */...
opiodr aborting process unknown ospid (369210) as a result of ORA-1092
Fri Jul 25 15:41:31 2025
ORA-1092 : opitsk aborting process

到这一步基本上就清晰了,大概率是遭遇到以前恢复的类似case,tab$数据被清空导致,类似案例
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
通过在故障主机上找到安装介质,验证md5确认该程序是被注入恶意代码程序
md5


这个库由于还有一个节点处于open状态,相对处理比较简单,直接把备份的表数据反向插入回去即可

SYS@orcl1> select count(1) from ORACHK3C08C86E063530510ACD937;

  COUNT(1)
----------
     20696

SYS@orcl1> insert into tab$ select * from ORACHK3C08C86E063530510ACD937;

20696 rows created.

SYS@orcl1> commit;

Commit complete.

SYS@orcl1> select object_name,to_char(CREATED,'yyyy-mm-dd hh24:mi:ss') from dba_objects 
          2 where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP');

OBJECT_NAME                                  TO_CHAR(CREATED,'YY
-------------------------------------------- -------------------
DBMS_SUPPORT_DBMONITORP                      2019-06-19 17:06:46
DBMS_SUPPORT_DBMONITOR                       2019-06-19 17:06:46

然后清理掉恶意脚本,分别重启两个节点,完成数据恢复任务
2025-07-26_215903_578


这次故障能够快速顺利的恢复,和客户发现故障之后保留第一现场,没有把一个open的节点也重启有很大关系,open的节点也重启了,那后续恢复工作会麻烦很多,效果可能也没有这样的完美.

2025年第一起ORA-600 16703故障恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:2025年第一起ORA-600 16703故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

又有一个客户数据库启动报ORA-600 16703错误
ora-600-16703


查看alert日志信息

Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_8784.trc  (incident=20617):
ORA-00600: 内部错误代码, 参数: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_20617\xff_ora_8784_i20617.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_8784.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_8784.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 8784): terminating the instance due to error 704
Instance terminated by USER, pid = 8784
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (8784) as a result of ORA-1092

这个故障一般是由于oracle安装介质被注入了恶意脚本,在数据库运行300天之后重启数据库就会遭遇该问题:
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
对于这个问题,我们这边经过多年的恢复经验积累和研究,可以实现直接open库并且不用做逻辑迁移,完美恢复之后业务直接使用,最大限度恢复数据和最小限度减少停机时间
以往类似恢复case汇总:
tab$恢复错误汇总
ORA-600 16703故障再现
tab$异常被处理之后报ORA-600 13304故障处理
ORA-600 16703直接把orachk备份表插入到tab$恢复
最近遇到几起ORA-600 16703故障(tab$被清空),请引起重视
ORA-00600: internal error code, arguments: [16703], [1403], [32]
aix平台tab$被删除可能出现ORA-600 [16703], [1403], [28]错误
ORA-00600: internal error code, arguments: [16703], [1403], [4] 原因
ORA-00600: internal error code, arguments: [16703], [1403], [4] 故障处理

ORA-600 16703故障,客户找人恢复数据库,数据库被进一步恶意破坏—ORA-00704 ORA-00922

ORA-600 16703故障再现

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 16703故障再现

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

从第一次发现ORA-600 16703(警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703)至今已经7年多时间了,最近依旧有客户中招,提醒各位注意该问题

Sat Sep 14 21:43:29 2024
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Sep 14 21:43:29 2024
SMON: enabling cache recovery
Errors in file D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_6264.trc  (incident=8561):
ORA-00600: 内部错误代码, 参数: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Incident details in: D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_8561\orcl_ora_6264_i8561.trc
Sat Sep 14 21:43:31 2024
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_6264.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Errors in file D:\ORACLE\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_6264.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00600: 内部错误代码, 参数: [16703], [1403], [20], [], [], [], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 6264): terminating the instance due to error 704
Instance terminated by USER, pid = 6264
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6264) as a result of ORA-1092

由于此类故障出现较多,破坏性加大,对其进行了深入的研究,在没有破坏现场的情况下,通过对tab$进行直接重建,实现数据库完美恢复(数据0丢失,数据库无需逻辑迁移[原库直接可用])
ora-600-16703


以前关于此类报错的文章:
10g数据库遭遇ORA-600 16703
12C数据库遭遇ORA-600 16703
ORA-600 kzrini:!uprofile处理
ORA-600 16703故障解析—tab$表被清空
近期又遇到ORA-600 16703和ORA-702故障
ORA-00600: internal error code, arguments: [16703], [1403], [4] 原因
tab$异常被处理之后报ORA-600 13304故障处理
最近遇到几起ORA-600 16703故障(tab$被清空),请引起重视
ORA-600 16703直接把orachk备份表插入到tab$恢复
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
aix平台tab$被删除可能出现ORA-600 [16703], [1403], [28]错误
ORA-00600: internal error code, arguments: [16703], [1403], [4] 故障处理
ORA-00600: internal error code, arguments: [16703], [1403], [32]
ORA-600 16703故障,客户找人恢复数据库,数据库被进一步恶意破坏—ORA-00704 ORA-00922
尽可能不要从互联网下载Oracle安装介质和Patch,避免被注入恶意脚本,并检查已经存在的安装介质的sha256码

最近遇到几起ORA-600 16703故障(tab$被清空),请引起重视

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:最近遇到几起ORA-600 16703故障(tab$被清空),请引起重视

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近由于安装介质问题,导致tab$被清空的case又多了起来,最近遇到几个客户咨询此类问题
20240321201804
20240321201813
20240321201822


以前写过一些类似的文档,参考:
tab$恢复错误汇总
ORA-600 kzrini:!uprofile处理
10g数据库遭遇ORA-600 16703
12C数据库遭遇ORA-600 16703
ORA-00600: internal error code, arguments: [16703], [1403], [32]
ORA-600 16703直接把orachk备份表插入到tab$恢复
ORA-600 16703故障解析—tab$表被清空
tab$异常被处理之后报ORA-600 13304故障处理

aix平台tab$被删除可能出现ORA-600 [16703], [1403], [28]错误
ORA-00600: internal error code, arguments: [16703], [1403], [4] 故障处理
ORA-600 16703故障,客户找人恢复数据库,数据库被进一步恶意破坏—ORA-00704 ORA-00922
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703

近期又遇到ORA-600 16703和ORA-702故障

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:近期又遇到ORA-600 16703和ORA-702故障

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

比较常见的由于软件注入恶意代码或者被人恶意破坏的常见故障ORA-600 16703和ORA-702故障
ORA-702


上面的故障是由于bootstrap$表被删除导致,以前处理过类似case:
恶意删除bootstrap$导致数据库无法正常启动
恶意删除bootstrap$导致数据库无法正常启动
2019恢复第一单—ORA-704-ORA-702恢复
ORA-00702: bootstrap verison ” inconsistent with version ’8.0.0.0.0′
ORA-600-16703

该故障的原因是一般是由于在互联网上下载了带恶意脚本的oracle安装介质,300天之后数据库重启会导致tab$被删除,参见:
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
再次提醒oracle使用者:
1. 不要使用从第三方网站下载非官方的数据库访问软件(发生过被注入恶意脚本的plsql dev案例:plsql dev引起的数据库被黑勒索比特币实现原理分析和解决方案
2. 从官方下载oracle 安装程序和补丁程序(防止被注入恶意脚本,参考:警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703