ORA-00470: LGWR process terminated with error

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-00470: LGWR process terminated with error

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户win 10.2.0.1数据库由于断电之后无法正常启动,报ORA-00470错误

SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area  293601280 bytes
Fixed Size                  1248600 bytes
Variable Size              92275368 bytes
Database Buffers          192937984 bytes
Redo Buffers                7139328 bytes
数据库装载完毕。
SQL> recoer database;
SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-00470: LGWR 进程因错误而终止

查看alert日志发现报ORA-600 kcrfwfl_nab错误,导致后台进程异常

Mon May 13 10:38:52 2019
alter database open
Mon May 13 10:38:52 2019
Thread 1 opened at log sequence 1699
  Current log# 3 seq# 1699 mem# 0: D:\ORACLE\PRODUCT\10.2.0\ORADATA\ORACLE\REDO03.LOG
Successful open of redo thread 1
Mon May 13 10:38:52 2019
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon May 13 10:38:52 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_lgwr_2308.trc:
ORA-00600: internal error code, arguments: [kcrfwfl_nab], [4294967295], [102401], [], [], [], [], []

Mon May 13 10:38:53 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_lgwr_2308.trc:
ORA-00600: internal error code, arguments: [kcrfwfl_nab], [4294967295], [102401], [], [], [], [], []

Mon May 13 10:38:53 2019
LGWR: terminating instance due to error 470
Mon May 13 10:38:53 2019
ORA-470 signalled during: alter database open...
Mon May 13 10:38:53 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_pmon_2928.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_reco_3224.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_smon_3064.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_ckpt_3640.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_dbw0_1976.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_mman_2356.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\bdump\oracle_psp0_3876.trc:
ORA-00470: LGWR process terminated with error

Mon May 13 10:38:54 2019
Instance terminated by LGWR, pid = 2308

尝试启动数据库到upgrade模式,报ORA-600 2758错误

SQL> alter database open upgrade;
alter database open upgrade
*
第 1 行出现错误:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [],[], []

对应alert日志报错

Mon May 13 10:45:50 2019
Completed redo application
Mon May 13 10:45:50 2019
Completed crash recovery at
 Thread 1: logseq 1699, block 4294967295, scn 56807170
 0 data blocks read, 0 data blocks written, 0 redo blocks read
Mon May 13 10:45:50 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\udump\oracle_ora_3220.trc:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [], [], []

Mon May 13 10:45:51 2019
Aborting crash recovery due to error 600
Mon May 13 10:45:51 2019
Errors in file d:\oracle\product\10.2.0\db_1\admin\oracle\udump\oracle_ora_3220.trc:
ORA-00600: 内部错误代码, 参数: [2758], [3], [4294967295], [102400], [10], [], [], []

ORA-600 signalled during: alter database open upgrade...

通过上述相关报错分析,以及ORA-600 kcrfwfl_nab和ORA-600 2758报错的相关资料查询,确定是由于redo和ctl损坏导致,通过强制拉库,恢复成功

SQL> recover database;
完成介质恢复。
SQL> alter database open resetlogs;

数据库已更改。

存储双活系统逻辑损坏数据库抢救恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:存储双活系统逻辑损坏数据库抢救恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

计划休假的前一夜晚上节点朋友求救电话,说xx医院核心his系统的Oracle数据库很多表报ORA-8103错误,业务无法正常办理.
ora-8103


通过dbv检查文件发现连续坏块
2

根据以往经验数据库出现类似这样的错误,很可能是底层问题,查看系统日志发现大量磁盘错误
3

该报错时间和应用反馈系统异常时间基本上匹配,初步怀疑是硬件或者os异常导致.因为客户数据库大量表表ORA-8103,而且有文件出现连片被置空,无法准确定位数据库损坏情况(置空值数据库级别的物理损坏,ora-8103是逻辑错误在表不被访问的情况下无法检查出来),考虑分析客户的硬件环境,备份容灾情况,分析选择最佳方案.
通过和客户沟通以及检查数据库的相关情况发现信息如下:
1)存储使用的是xx厂商的双活方案,这种存储级解决方案对于该故障来说没用,因为是lun的逻辑级别损坏,损坏数据同时同步到两套存储上.
2)数据库库容灾使用的是某厂家的cdp同步容灾,客户对cdp库进行分析,发现数据同步异常,基本上该方案也无法使用
3)数据库的备份情况:由于存放数据库备份的存储电池异常和有坏盘导致存储写io效率非常低,客户在3天之前停止掉了文件系统中的rman备份;有tsm的带库备份,结果检查发现竟无一次备份成功.
故障进一步扩大
针对客户情况,确定是节点2有明显异常,准备停掉节点2的数据库和集群,然后看下在节点1上是否有改善,结果发现把节点2的crs停掉之后,节点1的库直接crash,通过分析发现asm disk有一块盘磁盘前几M表直接置空(应该是在关闭crs之前就已经异常,只是因为磁盘头部分数据没有相关操作,因此没有触发相关问题),当一个节点关闭会去写磁盘头信息,asm发现异常直接dismount 节点1的磁盘组了,从而使得节点1的库异常.
4

5

现在的情况:
1)现在的asm 磁盘组异常(其中一个磁盘头前几M损坏),也就是说在原库基础上直接修复的概率基本上没有可能
2)cdp数据异常,不可用
3)在数据库相关服务器中找到一份4天之前的一次全备
恢复思路:
1.客户准备新空间,直接把4天之前的备份还原到本地文件系统中
2.通过底层工具对于有磁盘损坏的asm磁盘组进行分析,尝试恢复归档日志和redo(尽可能做到最大限度恢复数据)
3.通过备份还原4天之前的备份结合我们恢复的归档日志和redo尝试完全恢复数据
4.问题风险,就算归档日志和redo从损坏的asm 磁盘组中恢复出来,但是也有可能损坏,导致后面无法恢复到最新数据(造成数据丢失)
实际操作:
1. 由于客户在昨天晚上故障之后增加了一些undo数据文件,使得无法正常全库restore database(因为ctl中数据文件信息比备份集中多)
2. 后续由于10204 rac还原到单机出现ORA-600 kgeade_is_1错误
6

3. 数据库恢复完成之后,出现sqlplus 操作数据库正常,plsql dev和应用访问数据库报ora-27092的问题
7

最后运气不错,经过一系列努力,数据库open成功,应用也正常访问,最初生产环境中损坏的表现在查询也不再报ORA-8103,dbv检查异常文件也ok
8

9

再次提醒各位朋友:
1)你的数据库备份是否正常,建议定期做故障演练
2)选择合适数据库的容灾方案,建议定期检查或者演练
3)存储双活可以解决硬件故障问题,但是还要有适当的解决方案来规避存储逻辑错误风险.

数据库open过程遭遇ORA-1555对应sql语句补充

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:数据库open过程遭遇ORA-1555对应sql语句补充

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在2015年的在数据库open过程中常遇到ORA-01555汇总文章中写过oracle open过程中可能会遇到ORA-01555错误,对应的sql语句.最近的恢复中又遇到两个新的,对其进行补充
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1

Thu May 09 02:10:27 2019
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: bqbdby3c400p7, SCN: 0x0000.3e785fc7):
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
Errors in file /home/u01/diag/rdbms/orcl/orcl/trace/orcl_ora_15929.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 91 with name "_SYSSMU91_1360910548$" too small
Errors in file /home/u01/diag/rdbms/orcl/orcl/trace/orcl_ora_15929.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 91 with name "_SYSSMU91_1360910548$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 15929): terminating the instance due to error 704
Instance terminated by USER, pid = 15929
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (15929) as a result of ORA-1092
Thu May 09 02:10:28 2019
ORA-1092 : opitsk aborting process

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null

NSA2 started with pid=41, OS id=32571518 
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, Query Duration=0 sec, SCN: 0x0005.e4bea784):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from 
    obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/xifenfei_std/xifenfei/trace/xifenfei_ora_18939904.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_542380376$" too small
Errors in file /u01/app/oracle/diag/rdbms/xifenfei_std/xifenfei/trace/xifenfei_ora_18939904.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 7 with name "_SYSSMU7_542380376$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 18939904): terminating the instance due to error 704
Instance terminated by USER, pid = 18939904
ORA-1092 signalled during: alter database open RESETLOGS...
opiodr aborting process unknown ospid (18939904) as a result of ORA-1092

ORA-600 kfrValAcd30 恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kfrValAcd30 恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于存储控制器损坏,修复控制器之后,asm无法正常mount,报ORA-600 kfrValAcd30错误,让我们提供技术支持
kfrValAcd30


asm alert日志报错

Wed Apr 03 16:50:57 2019
SQL> alter diskgroup data mount 
NOTE: cache registered group DATA number=1 incarn=0x14248741
NOTE: cache began mount (first) of group DATA number=1 incarn=0x14248741
NOTE: Assigning number (1,0) to disk (ORCL:DATAVOL1)
Wed Apr 03 16:51:03 2019
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 15 
kfdp_queryBg(): 15 
NOTE: cache opening disk 0 of grp 1: DATAVOL1 label:DATAVOL1
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x14248741 (DATA)
Wed Apr 03 16:51:04 2019
* allocate domain 1, invalid = TRUE 
Wed Apr 03 16:51:04 2019
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=27.2697 group=1 (DATA)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_15951.trc  (incident=23394):
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM2/incident/incdir_23394/+ASM2_ora_15951_i23394.trc
Abort recovery for domain 1
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DATA
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [27], [2697], [28], [2697], [], [], [], [], []
ERROR: alter diskgroup data mount
NOTE: cache dismounting (clean) group 1/0x14248741 (DATA) 
NOTE: lgwr not being msg'd to dismount
freeing rdom 1
Wed Apr 03 16:51:05 2019
Sweep [inc][23394]: completed
Sweep [inc2][23394]: completed
Wed Apr 03 16:51:05 2019
Trace dumping is performing id=[cdmp_20190403165105]
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x14248741 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x14248741
kfdp_dismount(): 16 
kfdp_dismountBg(): 16 
NOTE: De-assigning number (1,0) from disk (ORCL:DATAVOL1)
ERROR: diskgroup DATA was not mounted
NOTE: cache deleting context for group DATA 1/337938241

mos相关记录
参考:ORA-600 [KFRVALACD30] in ASM (Doc ID 2123013.1)
kfrValAcd30-mos


ORA-00600: internal error code, arguments: [kfrValAcd30]可能是bug或者硬件故障导致.基于客户的情况,最大可能就是由于硬件故障导致asm 磁盘组的acd无法正常进行,从而无法mount成功.如果运气好,通过kfed相关修复可以正常mount成功,运气不好可以通过底层进行恢复数据文件,从而最大限度恢复数据.

ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在做18c模拟故障测试中,经过自己一系列折腾,主要遭遇了ORA-600 kcffo_online_pdb_check: fno_system 和 ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt错误,这些都是pdb特有的,主要是由于一些bug引起,在非pdb环境中不太可能遇到.其实这也就是说明由于pdb机制的引入,使得后续的数据库异常恢复中会更加复杂.
18c数据库open ORA-00603 ORA-01092 ORA-00600报错

[oracle@ora11g tmp]$ ss

SQL*Plus: Release 18.0.0.0.0 - Production on Sat Apr 20 21:18:09 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.


Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
         0

BANNER
--------------------------------------------------------------------------------
BANNER_FULL
--------------------------------------------------------------------------------
BANNER_LEGACY
--------------------------------------------------------------------------------
    CON_ID
----------

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [],
[], [], []
Process ID: 55775
Session ID: 135 Serial number: 49652

alert日志信息

Database Characterset is AL32UTF8
2019-04-20T18:20:54.256841+08:00
No Resource Manager plan active
2019-04-20T18:20:56.751241+08:00
replication_dependency_tracking turned off (no async multimaster replication found)
2019-04-20T18:20:57.862516+08:00
Starting background process AQPC
2019-04-20T18:20:58.341991+08:00
AQPC started with pid=45, OS id=55830 
2019-04-20T18:21:01.476252+08:00
PDB$SEED(2):Autotune of undo retention is turned on. 
2019-04-20T18:21:01.738732+08:00
Pdb PDB$SEED hit error 1157 during open read only (2) and will be closed.
2019-04-20T18:21:01.755310+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
PDB$SEED(2):JIT: pid 55775 requesting stop
PDB$SEED(2):Buffer Cache flush deferred for PDB 2
Could not open PDB$SEED error=1157
2019-04-20T18:21:01.887601+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf'
2019-04-20T18:21:03.385503+08:00
PDB1(3):Autotune of undo retention is turned on. 
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc  (incident=66865) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:03.682428+08:00
PDB2(4):Autotune of undo retention is turned on. 
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66865/orcl18c_p000_55808_i66865.trc
2019-04-20T18:21:12.863880+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:12.879921+08:00
Pdb PDB1 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:12.880506+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p000_55808.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
PDB1(3):JIT: pid 55808 requesting stop
2019-04-20T18:21:12.915407+08:00
Dumping diagnostic data in directory=[cdmp_20190420182112], requested by (instance=1, osid=55808 (P000)), summary=[incident=66865].
2019-04-20T18:21:12.989890+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
2019-04-20T18:21:13.004575+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc  (incident=66873) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66873/orcl18c_p001_55810_i66873.trc
2019-04-20T18:21:17.218642+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.222057+08:00
Pdb PDB2 hit error 600 during open read write (5) and will be closed.
2019-04-20T18:21:17.222236+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_p001_55810.trc:
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [4], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:17.260941+08:00
Dumping diagnostic data in directory=[cdmp_20190420182117], requested by (instance=1, osid=55810 (P001)), summary=[incident=66873].
2019-04-20T18:21:17.262023+08:00
PDB2(4):JIT: pid 55810 requesting stop
PDB2(4):Buffer Cache flush deferred for PDB 4
2019-04-20T18:21:17.352939+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:17.483695+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66857) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66857/orcl18c_ora_55775_i66857.trc
2019-04-20T18:21:22.612339+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:21:26.062635+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66858) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66858/orcl18c_ora_55775_i66858.trc
2019-04-20T18:21:26.506644+08:00
Dumping diagnostic data in directory=[cdmp_20190420182126], requested by (instance=1, osid=55775), summary=[incident=66857].
2019-04-20T18:21:30.119381+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2019-04-20T18:21:30.119505+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119629+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
2019-04-20T18:21:30.119719+08:00
Error 600 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_55775.trc  (incident=66859) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_66859/orcl18c_ora_55775_i66859.trc
2019-04-20T18:21:30.346811+08:00
Dumping diagnostic data in directory=[cdmp_20190420182130], requested by (instance=1, osid=55775), summary=[incident=66858].
2019-04-20T18:21:34.551418+08:00
opiodr aborting process unknown ospid (55775) as a result of ORA-603
2019-04-20T18:21:34.720885+08:00
ORA-603 : opitsk aborting process
License high water mark = 4
2019-04-20T18:21:34.754766+08:00
USER (ospid: 55775): terminating the instance due to ORA error 600
2019-04-20T18:21:35.839992+08:00
Instance terminated by USER, pid = 55775

alert日志提示文件不存在,实际上文件是存在的

[root@ora11g ~]# 
[root@ora11g ~]# ls -l /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf
-rw-r-----. 1 oracle oinstall 283123712 4月  13 23:59 /u01/app/oracle/oradata/ORCL18C/pdbseed/system01.dbf

主要错误是ORA-600 kcffo_online_pdb_check: fno_system 查询mos发现主要是由于数据库bug导致,查询mos发现不少bug
KCFFO_ONLINE_PDB_CHECK FNO_SYSTEM]


官方解释,主要是由于kcffo_online_pdb函数执行异常导致

Function kcffo_online_pdb_check  Check if it ok to online the files in a 
pluggable database. An error is  signalled if it is not ok. This routine will 
grab a file enqueue for the relevant files and save it in the NULL-terminated 
array fenqsp. The caller must release these after kcffo_online_pdb() or if an 
error occurs.

通过人工修改文件状态,绕过该错误,cdb数据库open成功,但是pdb依旧无法正常open

SQL> alter database open;

Database altered.

SQL>  alter session set container=PDB1;

Session altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []

alert日志

PDB1(3):alter database open
PDB1(3):Autotune of undo retention is turned on. 
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc  (incident=69185) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/incident/incdir_69185/orcl18c_ora_56178_i69185.trc
2019-04-20T18:31:41.761479+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2019-04-20T18:31:42.097465+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Pdb PDB1 hit error 600 during open read write (1) and will be closed.
2019-04-20T18:31:42.097847+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl18c/orcl18c/trace/orcl18c_ora_56178.trc:
ORA-00600: internal error code, arguments: [kcvfdb_pdb_set_clean_scn: cleanckpt], [3], [1739494], [38655308813], [2], [], [], [], [], [], [], []
PDB1(3):JIT: pid 56178 requesting stop
2019-04-20T18:31:42.098808+08:00
Dumping diagnostic data in directory=[cdmp_20190420183142], requested by (instance=1, osid=56178), summary=[incident=69185].
2019-04-20T18:31:42.138818+08:00
PDB1(3):Buffer Cache flush deferred for PDB 3
PDB1(3):ORA-600 signalled during: alter database open...

主要错误是ORA-600 kcvfdb_pdb_set_clean_scn: cleanckpt,通过查询mos,依旧发现mos上有的主要可能的bug
kcvfdb_pdb_set_clean_scn cleanckpt


通过人工修改数据文件的checkpoint scn解决该问题,pdb open成功