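Contact: Mobile/WeChat (+86 17813235971) QQ (107644445)
Title: ORA-600 4194 leading to an "SMON encountered 100 out of maximum 100 non-fatal internal errors" failure
Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal liability.]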
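The customer's database runs version 11.2.0.3. After a power outage in the machine room, the database was started, ran for a while, and then crashed on its own: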
Sat Sep 20 20:31:14 2025
QMNC started with pid=39, OS id=10637
Completed: ALTER DATABASE OPEN
Starting background process CJQ0
Sat Sep 20 20:31:14 2025
CJQ0 started with pid=44, OS id=10654
Setting Resource Manager plan SCHEDULER[0x318E]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Starting background process VKRM
Sat Sep 20 20:31:17 2025
VKRM started with pid=40, OS id=10680
Sat Sep 20 20:38:01 2025
Starting background process SMCO
Sat Sep 20 20:38:01 2025
SMCO started with pid=38, OS id=10955
Sat Sep 20 20:56:54 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_11564.trc (incident=148368):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082886, block 29263 to scn 74449804596
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery stopped at EOT rba 1082886.29264.16
Block recovery completed at rba 1082886.29264.16, scn 17.1435360559
Block recovery from logseq 1082886, block 29263 to scn 74449804590
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery completed at rba 1082886.29264.16, scn 17.1435360559
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_11564.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Sat Sep 20 20:56:58 2025
Sweep [inc][148368]: completed
Sweep [inc2][148368]: completed
Sat Sep 20 21:00:20 2025
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0xBC44AC1][PC:0x932F8EA,kgegpa()+40][flags:0x0,count:1]
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0xBC44AC1][PC:0x932DEF3,kgebse()+771][flags:0x2,count:2]
Exception[type:SIGSEGV,Address not mapped to object][ADDR:0xBC44AC1][PC:0x932DEF3,kgebse()+771][flags:0x2,count:2]
Sat Sep 20 21:00:21 2025
Block recovery from logseq 1082886, block 29263 to scn 74449804596
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo14.log
……………………
Sat Sep 20 21:05:00 2025
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc(incident=148296):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082886, block 32045 to scn 74449805729
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery completed at rba 1082886.32056.16, scn 17.1435361698
Block recovery from logseq 1082886, block 32045 to scn 74449806046
Recovery of Online Redo Log: Thread 1 Group 14 Seq 1082886 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo14.log
Block recovery completed at rba 1082886.32321.16, scn 17.1435362015
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
*******
Block recovery completed at rba 1082898.52054.16, scn 17.1444838013
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 99 out of maximum 100 non-fatal internal errors.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc(incident=164458):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082898, block 52038 to scn 74459282045
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52054.16, scn 17.1444838014
Block recovery from logseq 1082898, block 52038 to scn 74459282088
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52104.16, scn 17.1444838057
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 100 out of maximum 100 non-fatal internal errors.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc (incident=164459):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 1082898, block 52038 to scn 74459282045
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52054.16, scn 17.1444838014
Block recovery from logseq 1082898, block 52038 to scn 74459282101
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo08.log
Block recovery completed at rba 1082898.52130.16, scn 17.1444838070
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON exceeded the maximum limit of 100 internal error(s).
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_10516.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 10516): terminating the instance due to error 474
Mon Sep 22 04:05:28 2025
System state dump requested by(instance=1,osid=10516 (SMON)),summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_10463.trc
Instance terminated by SMON, pid = 10516
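The block recovery lines in the log print SCNs in two forms: a plain decimal ("to scn 74449804596") and a dotted pair ("scn 17.1435360559"). Assuming the dotted form is the usual wrap.base notation, where the full SCN equals wrap * 2^32 + base, a small Python sketch makes the two comparable. The function names here are illustrative only and are not part of any Oracle tool.

```python
# Sketch: convert between the two SCN notations seen in the alert log.
# Assumption: "17.1435360559" is wrap.base, with SCN = wrap * 2**32 + base.

def scn_to_decimal(wrap_base: str) -> int:
    """Turn a 'wrap.base' SCN string into a single decimal SCN."""
    wrap, base = wrap_base.split(".")
    return int(wrap) * 2**32 + int(base)


def decimal_to_scn(scn: int) -> str:
    """Turn a decimal SCN back into 'wrap.base' notation."""
    wrap, base = divmod(scn, 2**32)
    return f"{wrap}.{base}"


# Values taken from the first block recovery attempt in the log.
print(scn_to_decimal("17.1435360559"))   # 74449804591
print(decimal_to_scn(74449804596))       # 17.1435360564
```

Under that assumption, the first block recovery completed at SCN 74449804591, just below the requested target of 74449804596, which matches the "stopped at EOT" message logged immediately before it.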
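The cause is fairly clear from the message "Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.": while operating on MON_MODS$/MON_MODS_ALL$, the SMON process hit ORA-600 [4194] and the operation failed. By default SMON retries 100 times (controlled by the parameter _smon_internal_errlimit); when the operation still has not succeeded, the SMON process is forcibly terminated, which crashes the instance. A subsequent attempt to restart the database then failed to open it: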
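To see how close SMON is to that limit before the instance goes down, the counter lines can be pulled out of the alert log. The following is a minimal Python sketch that only matches the two message formats quoted in the log above; the function name and the returned dictionary layout are my own choices, not anything provided by Oracle.

```python
import re

# Patterns copied from the message text that appears in the alert log above.
COUNT_RE = re.compile(
    r"SMON encountered (\d+) out of maximum (\d+) non-fatal internal errors"
)
EXCEEDED_RE = re.compile(r"SMON exceeded the maximum limit of (\d+) internal error")


def smon_error_progress(alert_text: str) -> dict:
    """Report the highest SMON non-fatal error count seen, the limit, and
    whether the 'exceeded' message (instance termination) was logged."""
    highest, limit = 0, None
    for match in COUNT_RE.finditer(alert_text):
        highest = max(highest, int(match.group(1)))
        limit = int(match.group(2))
    exceeded = EXCEEDED_RE.search(alert_text)
    if exceeded and limit is None:
        limit = int(exceeded.group(1))
    return {"highest": highest, "limit": limit, "exceeded": exceeded is not None}


sample = """SMON encountered 1 out of maximum 100 non-fatal internal errors.
SMON encountered 99 out of maximum 100 non-fatal internal errors.
SMON encountered 100 out of maximum 100 non-fatal internal errors.
SMON exceeded the maximum limit of 100 internal error(s).
"""
print(smon_error_progress(sample))
```

In practice the text would come from reading the alert log file; a count that keeps climbing toward the limit is the signal that the instance is going to be terminated by SMON.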
Mon Sep 22 09:00:03 2025
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
parallel recovery started with 32 processes
Started redo scan
Completed redo scan
read 1360 KB redo, 405 data blocks need recovery
Started redo application at
Thread 1: logseq 1082898, block 49410
Recovery of Online Redo Log: Thread 1 Group 8 Seq 1082898 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo08.log
Completed redo application of 0.60MB
Completed crash recovery at
Thread 1: logseq 1082898, block 52130, scn 74459302102
405 data blocks read, 405 data blocks written, 1360 redo k-bytes read
Thread 1 advanced to log sequence 1082899 (thread open)
Thread 1 opened at log sequence 1082899
Current log# 9 seq# 1082899 mem# 0: /oracledb/oradata/orcl/redo09.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_78465.trc (incident=164779):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Block recovery from logseq 1082899, block 3 to scn 74459302111
Recovery of Online Redo Log: Thread 1 Group 9 Seq 1082899 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo09.log
Block recovery stopped at EOT rba 1082899.5.16
Block recovery completed at rba 1082899.5.16, scn 17.1444858077
Block recovery from logseq 1082899, block 3 to scn 74459302108
Recovery of Online Redo Log: Thread 1 Group 9 Seq 1082899 Reading mem 0
Mem# 0: /oracledb/oradata/orcl/redo09.log
Block recovery completed at rba 1082899.5.16, scn 17.1444858077
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_78465.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_78465.trc:
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 78465): terminating the instance due to error 600
Instance terminated by USER, pid = 78465
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (78465) as a result of ORA-1092
Mon Sep 22 09:00:08 2025
ORA-1092 : opitsk aborting process
After the customer made several more attempts, the database finally could not even be mounted:
Mon Sep 22 19:14:14 2025
ALTER DATABASE MOUNT
USER (ospid: 11679): terminating the instance
System state dump requested by (instance=1, osid=11679), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_11615.trc
Dumping diagnostic data in directory=[cdmp_20250922191419],requested by(instance=1,osid=11679) ,summary=[abnormal instance termination].
Instance terminated by USER, pid = 11679
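Handling this failure is relatively straightforward:
1) Using the currently damaged control file (ctl), the data files and redo logs found at the operating-system level, and the database character set recorded in the alert log, construct a CREATE CONTROLFILE statement and re-create the control file for this database (rectl).
2) Since ORA-600 [4194] clearly points to an undo problem, deal with the rollback segments of the damaged undo and then open the database.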
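Step 1 amounts to assembling a CREATE CONTROLFILE statement from file lists gathered by hand. As a rough Python sketch of that assembly, the helper below joins the pieces in the layout that a control file trace backup normally uses. Everything passed in the example call is an assumption made for illustration: the database name ORCL is guessed from the diag path, the redo sizes, the data file name, the character set, and the NORESETLOGS/NOARCHIVELOG choices are hypothetical and must be replaced with the real values for the damaged database.

```python
# Sketch: assemble a CREATE CONTROLFILE statement from manually collected facts.
# The helper name and its arguments are illustrative, not an Oracle utility.

def build_create_controlfile(db_name, charset, logfiles, datafiles,
                             resetlogs=False, archivelog=False):
    """logfiles: list of (group_number, path, size); datafiles: list of paths."""
    log_clause = ",\n".join(
        f"  GROUP {group} '{path}' SIZE {size}" for group, path, size in logfiles
    )
    data_clause = ",\n".join(f"  '{path}'" for path in datafiles)
    reset = "RESETLOGS" if resetlogs else "NORESETLOGS"
    mode = "ARCHIVELOG" if archivelog else "NOARCHIVELOG"
    return (
        f'CREATE CONTROLFILE REUSE DATABASE "{db_name}" {reset} {mode}\n'
        f"LOGFILE\n{log_clause}\n"
        f"DATAFILE\n{data_clause}\n"
        f"CHARACTER SET {charset};"
    )


# Redo paths and group numbers come from the alert log above; sizes, the
# data file name, the database name, and the character set are hypothetical.
statement = build_create_controlfile(
    db_name="ORCL",
    charset="ZHS16GBK",
    logfiles=[
        (8, "/oracledb/oradata/orcl/redo08.log", "512M"),
        (9, "/oracledb/oradata/orcl/redo09.log", "512M"),
        (14, "/oracledb/oradata/orcl/redo14.log", "512M"),
    ],
    datafiles=["/oracledb/oradata/orcl/system01.dbf"],
)
print(statement)
```

A real statement has to list every redo group and every data file of the database, so the generated text should be reviewed against the files on disk before it is run in NOMOUNT state.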