ORA-600 4194 causing a "SMON encountered 100 out of maximum 100 non-fatal internal errors" failure



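Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal liability.]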

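The customer's database (version 11.2.0.3) went down in a data-center power outage. After it was started again, it ran for a while and then crashed on its own: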

Sat Sep 20 20:31:14 2025
QMNC started with pid=39, OS id=10637 
Completed: ALTER DATABASE OPEN
Starting background process CJQ0
Sat Sep 20 20:31:14 2025
CJQ0 started with pid=44, OS id=10654 
Setting Resource Manager plan SCHEDULER[0x318E]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Starting background process VKRM
Sat Sep 20 20:31:17 2025
VKRM started with pid=40, OS id=10680 
Sat Sep 20 20:38:01 2025
Starting background process SMCO
Sat Sep 20 20:38:01 2025
SMCO started with pid=38, OS id=10955 
Sat Sep 20 20:56:54 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_11564.trc (incident=148368):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082886, block 29263 to scn 74449804596
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery stopped at EOT rba 1082886.29264.16
Block recovery completed at rba 1082886.29264.16, scn 17.1435360559
Block recovery from logseq 1082886, block 29263 to scn 74449804590
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery completed at rba 1082886.29264.16, scn 17.1435360559
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_11564.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Sat Sep 20 20:56:58 2025
Sweep [inc][148368]: completed
Sweep [inc2][148368]: completed
Sat Sep 20 21:00:20 2025
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0xBC44AC1][PC:0x932F8EA,kgegpa()+40][flags:0x0,count:1]
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0xBC44AC1][PC:0x932DEF3,kgebse()+771][flags:0x2,count:2]
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0xBC44AC1][PC:0x932DEF3,kgebse()+771][flags:0x2,count:2]
Sat Sep 20 21:00:21 2025
Block recovery from logseq 1082886, block 29263 to scn 74449804596
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo14.log
……………………
Sat Sep 20 21:05:00 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc(incident=148296):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082886, block 32045 to scn 74449805729
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery completed at rba 1082886.32056.16, scn 17.1435361698
Block recovery from logseq 1082886, block 32045 to scn 74449806046
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery completed at rba 1082886.32321.16, scn 17.1435362015
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
*******
Block recovery completed at rba 1082898.52054.16, scn 17.1444838013
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 99 out of maximum 100 non-fatal internal errors.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc(incident=164458):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082898, block 52038 to scn 74459282045
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52054.16, scn 17.1444838014
Block recovery from logseq 1082898, block 52038 to scn 74459282088
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52104.16, scn 17.1444838057
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 100 out of maximum 100 non-fatal internal errors.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc  (incident=164459):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082898, block 52038 to scn 74459282045
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52054.16, scn 17.1444838014
Block recovery from logseq 1082898, block 52038 to scn 74459282101
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52130.16, scn 17.1444838070
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON exceeded the maximum limit of 100 internal error(s).
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 10516): terminating the instance due to error 474
Mon Sep 22 04:05:28 2025
System state dump requested by(instance=1,osid=10516 (SMON)),summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_10463.trc
Instance terminated by SMON, pid = 10516
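
The SMON messages in the log above count up toward a limit before the instance is terminated. As a minimal sketch (not part of the original troubleshooting), the Python below pulls that progression out of alert-log lines. The regular expressions are written against the message text exactly as it appears in this log; the function name `summarize_smon_errors` and the `sample` list are illustrative assumptions.

```python
import re

# Patterns written against the alert-log messages shown above.
COUNT_RE = re.compile(
    r"SMON encountered (\d+) out of maximum (\d+) non-fatal internal errors"
)
EXCEEDED_RE = re.compile(r"SMON exceeded the maximum limit of (\d+) internal error")
TERMINATE_RE = re.compile(
    r"SMON \(ospid: (\d+)\): terminating the instance due to error (\d+)"
)


def summarize_smon_errors(lines):
    """Summarize SMON's non-fatal error counter from alert-log lines."""
    summary = {
        "max_count": 0,               # highest "N out of maximum M" value seen
        "limit": None,                # the M reported by the log
        "exceeded": False,            # True once the "exceeded" message appears
        "terminated_by_error": None,  # error number in the termination message
    }
    for line in lines:
        m = COUNT_RE.search(line)
        if m:
            summary["max_count"] = max(summary["max_count"], int(m.group(1)))
            summary["limit"] = int(m.group(2))
            continue
        m = EXCEEDED_RE.search(line)
        if m:
            summary["exceeded"] = True
            summary["limit"] = int(m.group(1))
            continue
        m = TERMINATE_RE.search(line)
        if m:
            summary["terminated_by_error"] = int(m.group(2))
    return summary


# Lines copied from the alert log above (illustrative subset).
sample = [
    "SMON encountered 1 out of maximum 100 non-fatal internal errors.",
    "SMON encountered 99 out of maximum 100 non-fatal internal errors.",
    "SMON encountered 100 out of maximum 100 non-fatal internal errors.",
    "SMON exceeded the maximum limit of 100 internal error(s).",
    "SMON (ospid: 10516): terminating the instance due to error 474",
]
print(summarize_smon_errors(sample))
```

Running a check like this over a full alert log shows how close SMON is to its limit before the instance is lost.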

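The cause is fairly clear from the message "Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.": when the SMON process operated on MON_MODS$/MON_MODS_ALL$, it hit ORA-600 4194 and the operation failed. By default SMON retries 100 times (controlled by the parameter _smon_internal_errlimit); when the operation still has not succeeded, SMON is forcibly terminated, which crashes the instance. Subsequent attempts to restart the database then failed: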

Mon Sep 22 09:00:03 2025
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 1360 KB redo, 405 data blocks need recovery
Started redo application at
 Thread 1: logseq 1082898, block 49410
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo08.log
Completed redo application of 0.60MB
Completed crash recovery at
 Thread 1: logseq 1082898, block 52130, scn 74459302102
 405 data blocks read, 405 data blocks written, 1360 redo k-bytes read
Thread 1 advanced to log sequence 1082899 (thread open)
Thread 1 opened at log sequence 1082899
  Current log# 9 seq# 1082899 mem# 0: /oracledb/oradata/orcl/redo09.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_78465.trc  (incident=164779):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Block recovery from logseq 1082899, block 3 to scn 74459302111
Recovery of Online Redo Log: Thread 1 Group 9 Seq 1082899 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo09.log
Block recovery stopped at EOT rba 1082899.5.16
Block recovery completed at rba 1082899.5.16, scn 17.1444858077
Block recovery from logseq 1082899, block 3 to scn 74459302108
Recovery of Online Redo Log: Thread 1 Group 9 Seq 1082899 Reading mem 0
  Mem# 0: /oracledb/oradata/orcl/redo09.log
Block recovery completed at rba 1082899.5.16, scn 17.1444858077
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_78465.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_78465.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 78465): terminating the instance due to error 600
Instance terminated by USER, pid = 78465
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (78465) as a result of ORA-1092
Mon Sep 22 09:00:08 2025
ORA-1092 : opitsk aborting process
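
The alert log prints SCNs in two notations: a plain decimal number (for example `scn 74459302102`) and a dotted `wrap.base` pair (for example `scn 17.1444858077`). To compare them, the sketch below converts between the two, assuming the usual layout in which the base is the low 32 bits and the wrap is the remaining high part. The function names are illustrative.

```python
BASE_BITS = 32  # assumption: the "base" part of wrap.base is a 32-bit value


def wrap_base_to_scn(text):
    """Convert a 'wrap.base' string such as '17.1435360559' to a decimal SCN."""
    wrap, base = text.split(".")
    return int(wrap) * 2**BASE_BITS + int(base)


def scn_to_wrap_base(scn):
    """Convert a decimal SCN to its 'wrap.base' string form."""
    wrap, base = divmod(scn, 2**BASE_BITS)
    return f"{wrap}.{base}"


# Values taken from the alert log excerpts above.
print(wrap_base_to_scn("17.1435360559"))
print(wrap_base_to_scn("17.1444858077"))
print(scn_to_wrap_base(74459302102))
```

Under this assumption, `17.1444858077` corresponds to 74459302109. That falls between the two block-recovery targets in the open attempt above (74459302108 and 74459302111), just past the crash-recovery end SCN 74459302102.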

After several more attempts by the customer, the database eventually could no longer even be mounted:

Mon Sep 22 19:14:14 2025
ALTER DATABASE   MOUNT
USER (ospid: 11679): terminating the instance
System state dump requested by (instance=1, osid=11679), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_11615.trc
Dumping diagnostic data in directory=[cdmp_20250922191419],requested by(instance=1,osid=11679)
   ,summary=[abnormal instance termination].
Instance terminated by USER, pid = 11679

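Handling this failure is relatively simple:
1) Using the current damaged control file, the data files and redo logs on the operating system, and the database character set recorded in the alert log, construct a CREATE CONTROLFILE statement and recreate the control file (rectl) for this database.
2) ORA-600 4194 clearly points to an undo problem; after dealing with the rollback segments of the damaged undo, open the database.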
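
Step 1 amounts to assembling a CREATE CONTROLFILE statement from facts gathered elsewhere. The sketch below only renders such a statement as text from hypothetical inputs; it does not connect to or modify any database, and it is not the author's actual script. The helper name `build_create_controlfile`, the database name, redo sizes, data file paths and character set are all placeholders. Only the three redo log paths are taken from the alert log above, and the real database has more redo groups and data files than are listed here. The choice between NORESETLOGS/RESETLOGS and ARCHIVELOG/NOARCHIVELOG depends on the actual state of the database and is left as a parameter.

```python
def build_create_controlfile(db_name, logfiles, datafiles, charset,
                             resetlogs=False, archivelog=False):
    """Render a CREATE CONTROLFILE statement as text; nothing is executed.

    logfiles  -- list of (group_number, path, size) tuples
    datafiles -- list of data file paths
    """
    if not logfiles or not datafiles:
        raise ValueError("at least one redo log and one data file are required")

    log_mode = "RESETLOGS" if resetlogs else "NORESETLOGS"
    arch_mode = "ARCHIVELOG" if archivelog else "NOARCHIVELOG"
    log_lines = ",\n".join(
        f"  GROUP {group} '{path}' SIZE {size}" for group, path, size in logfiles
    )
    data_lines = ",\n".join(f"  '{path}'" for path in datafiles)

    return (
        f'CREATE CONTROLFILE REUSE DATABASE "{db_name}" {log_mode} {arch_mode}\n'
        f"LOGFILE\n{log_lines}\n"
        f"DATAFILE\n{data_lines}\n"
        f"CHARACTER SET {charset};"
    )


# Hypothetical inputs: sizes, data file names and character set are placeholders.
statement = build_create_controlfile(
    db_name="ORCL",
    logfiles=[
        (8, "/oracledb/oradata/orcl/redo08.log", "512M"),
        (9, "/oracledb/oradata/orcl/redo09.log", "512M"),
        (14, "/oracledb/oradata/orcl/redo14.log", "512M"),
    ],
    datafiles=[
        "/oracledb/oradata/orcl/system01.dbf",
        "/oracledb/oradata/orcl/sysaux01.dbf",
        "/oracledb/oradata/orcl/undotbs01.dbf",
    ],
    charset="ZHS16GBK",
)
print(statement)
```

Generating the statement from explicit lists makes it easier to check every data file and redo member against what exists on disk before anything is run in SQL*Plus.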