ORA-600 [kcrfr_update_nab_2]恢复支持

APPLIES TO:

Oracle Database – Enterprise Edition – Version 10.2.0.2 to 10.2.0.4 [Release 10.2]
Information in this document applies to any platform.
Oracle Server Enterprise Edition – Version: 10.2.0.2 to 10.2.0.4

SYMPTOMS

After Database Crashing or Shutdown abort we are unable to open database, in the alert.log we find the error   ORA-600[kcrfr_update_nab_2]

==> In the alert.log we can see that database crashes while performing instance recovery:

Tue Oct 07 13:30:28 2008
Starting ORACLE instance (normal)
..
ALTER DATABASE OPEN
Tue Oct 07 13:30:39 2008
Beginning crash recovery of 1 threads
Tue Oct 07 13:30:39 2008
Started redo scan
Tue Oct 07 13:30:41 2008
Errors in file ….ORCL_ora_3148.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x3C2C5CD0], [2], [], [], []
Tue Oct 07 13:30:46 2008
Aborting crash recovery due to error 600

==> In the trace file we can see the following error stack

start recovery at logseq 18989, block 1312, scn 0

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x3C2C5CD0], [2], [], [], [],
Current SQL statement for this session:
alter database open
—– Call Stack Trace:
ksedst <- ksedmp <- ksfdmp <- kgerinv <- kgeasnmierr
<- kcrfr_update_nab <- kcrfr_read <- kcrfr_read_buffer <- kcrfrgv <- kcratr1
<- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv
<- opiexe <- opiosq0 <- kpooprx <- kpoal8 <- opiodr
<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv
<- sou2o <- opimai_real <- opimai <- OracleThreadStart@

CAUSE

This issue has been reported in following bugs:

Bug 5692594
Hdr: 5692594 10.2.0.1 RDBMS 10.2.0.1 RECOVERY PRODID-5 PORTID-226 ORA-600
Abstract: AFTER DATABASE CRASHED DOESN’T OPEN ORA-600 [KCRFR_UPDATE_NAB_2]
Status: 95,Closed, Vendor OS Problem
 

Bug 6655116
Hdr: 6655116 10.2.0.3 RDBMS 10.2.0.3 RECOVERY PRODID-5 PORTID-23
Abstract: INSTANCES CRASH WITH ORA-600 [KCRFR_UPDATE_NAB_2] AFTER DISK FAILURE

Status: 95,Closed, Vendor OS Problem

EXPLANATION
The assert ORA-600: [kcrfr_update_nab_2] is a direct result of a lost write   in the current on line log that we are attempting to resolve.   So, this confirms the theory that this is a OS/hardware   lost write issue not an internal oracle bug. In fact the assert  ORA-600: [kcrfr_update_nab_2] is how we detect a lost log write.

SOLUTION

There are some bugs that match with this issue and all have been closed as Vendor OS/problem.
The error is caused by a corruption in the on line redo-log, probably a lost write in the file.

The best solution in this case is to restore database from backup and recover it until the sequence before the current on line redo-log

 

这类问题,由于写丢失无法直接open成功,如果需要,可以联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

ORA-600 [kcrfrgv_nextlwn_scn]

有朋友公司数据库由于_use_single_log_writer参数,导致 异常断电之后,数据库无法启动,报ORA-600 [kcrfrgv_nextlwn_scn]错误,通过分析是由于该参数导致.

DESCRIPTION

Oracle 12c introduces a new default feature of using multiple LGWRs which may lead to DEADLOCK / Database Hang or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery on IBM AIX and potentially on HPUX Itanium 64bit.

The database may become unusable and fail to be OPEN.

OCCURRENCE

This issue is specific to RDBMS version 12c (12.1.0.1 or 12.1.0.2) where the new default feature of using multiple LGWRs is introduced.

It affects databases on IBM AIX and potentially on HPUX Itanium 64bit

SYMPTOMS

ORACLE on IBM AIX or HPUX Itanium 64bit with RDBMS Version 12c.

DEADLOCK or ORA-742 “Log read detects lost write” or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery caused by bug 21915719.

PMON may terminate the instance while extensive block recovery is being performed.

A DEADLOCK example is with LG0[n] waiting on ‘LGWR worker group ordering’. Example from a System State Dump trace file:

PROCESS 18: LG01
SO: 0x7000101f95ad720, type: 4, owner: 0x7000101f84195f8, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
proc=0x7000101f84195f8, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
(session) sid: 865 ser: 1 trans: 0x0, creator: 0x7000101f84195f
Current Wait Stack:
0: waiting for ‘LGWR worker group ordering’
lwn_id=0x58, phase=0x1, =0x0
wait_id=4947 seq_num=4948 snap_id=1
wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
wait times: max=infinite, heur=13 min 21 sec
wait counts: calls=1 os=267
in_wait=1 iflags=0x5a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 817, ser: 1
Dumping final blocker:
inst: 1, sid: 817, ser: 1
There are 730 sessions blocked by this session.
.
.
PROCESS 17: LG00
SO: 0x7000101f85bcc60, type: 4, owner: 0x7000101f93eeb20, flag: INIT/-/-/0x00
if: 0x3 c: 0x3
proc=0x7000101f93eeb20, name=session, file=ksu.h LINE:13590 ID:, pg=0
conuid=0
(session) sid: 817 ser: 1 trans: 0x0, creator: 0x7000101f93eeb20
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for ‘LGWR worker group ordering’
lwn_id=0x56, phase=0x1, =0x0
wait_id=1630680 seq_num=57841 snap_id=1
wait times: snap=13 min 21 sec, exc=13 min 21 sec, total=13 min 21 sec
wait times: max=infinite, heur=13 min 21 sec
wait counts: calls=2 os=268
in_wait=1 iflags=0x15a0
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 865, ser: 1
Dumping final blocker:
inst: 1, sid: 865, ser: 1

The instance may fail to OPEN with errors ORA-600 [kcrfrgv_nextlwn_scn]  and/or ORA-600 [krr_process_read_error_2]:

Recovery Session Failed with:ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krr_process_read_error_2],

Alter database open fails with:

ORA-00600: internal error code, arguments: [kcrfrgv_nextlwn_scn] …..
ORA-600 signalled during: ALTER DATABASE OPEN…

WORKAROUND

Disable the new feature of multiple LGWR slave processes by proactively setting _use_single_log_writer=true.  

Setting _use_single_log_writer = true is a safe workaround; it is the behavior before 12c where multiple LGWR slave groups were not available.

ALTER SYSTEM SET “_use_single_log_writer”=TRUE SID=’*’ SCOPE=SPFILE;
— Restart the database or all instances of the RAC database

Note that while _use_single_log_writer=true is not set, then error ORA-600 [kcrfrgv_nextlwn_scn] might be produced avoiding the database to OPEN.  Once the problem is introduced, _use_single_log_writer=true may not fix it. _use_single_log_writer = true prevents inconsistencies in the redo log to be introduced which causes that error.

If the parameter does not help, because the problem was already introduced when _use_single_log_writer=true had not been proactively set, then Point in Time Recovery (PITR) or Flashback Database are the options to recover from this situation.

这类问题,由于写丢失无法直接open成功,如果需要,可以联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

Alter database open ORA-7445 [kkcnrli0] signalled

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.3 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.

SYMPTOMS

Alter database open signaled ORA-7445 kkcnrli0

at :

Wed Jan 17 12:09:17 2018
CJQ0 started with pid=174, OS id=71144
Completed: ALTER DATABASE OPEN /* db agent *//* {1:39551:2} */
Wed Jan 17 12:09:18 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x198F36F, kkcnrli0()+639] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/zdlra/zdlra1/trace/zdlra1_q003_71365.trc (incident=247784):
ORA-07445: exception encountered: core dump [kkcnrli0()+639] [SIGSEGV] [ADDR:0x0] [PC:0x198F36F] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/zdlra/zdlra1/incident/incdir_247784/zdlra1_q003_71365_i247784.trc

 

Trace files

shows :

*** SERVICE NAME:(SYS$BACKGROUND) 2018-01-17 12:09:18.108
*** MODULE NAME:(Streams) 2018-01-17 12:09:18.108 <—————-
*** CLIENT DRIVER:() 2018-01-17 12:09:18.108
*** ACTION NAME:(QMON Slave) 2018-01-17 12:09:18.108 <————–

========= Dump for incident 247784 (ORA 7445 [kkcnrli0]) ========


….

*** 2018-01-17 12:09:18.110
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
[TOC00004]
—– SQL Statement (None) —–
Current SQL information unavailable – no cursor.
[TOC00004-END]

[TOC00005]
—– Call Stack Trace —–
… kkcnrli0 kkcnrli kponPurgeUnreachLoc kwqmnslv kwsbsmspm
kwsbgcbkms ksvrdp opirip opidrv sou2o
opimai_real ssthrdmain main

 

CAUSE

Bug 17722075 – ORA-7445 [kkcnrli] in Qnnn process during ALTER DATABASE OPEN (Doc ID 17722075.8)

SOLUTION

User  may either upgrade to the releases that were fixed

12.2.0.1 (Base Release)
12.1.0.2.160719 (Jul 2016) Database Patch Set Update (DB PSU)
12.1.0.2.160719 (Jul 2016) Database Proactive Bundle Patch
11.2.0.4.160719 Exadata Database Bundle Patch (Jul 2016)
12.1.0.2.160719 (Jul 2016) Bundle Patch for Windows Platforms

or

check if there were one-off patch according to your RDBMS oraInventory version

or

simply workaorund by restart database.

0kb数据文件恢复或者文件丢失恢复

接到一个朋友恢复请求,由于rose频繁切换导致文件系统部分数据文件变化为0kb和文件丢失.
故障现象
部分数据文件变化为0kb和文件丢失.
file_lost
file_size_0


这里比较明显,数据库的users03变为了0kb和users04丢失.数据库alert日志报错信息如下:

Completed: alter database mount exclusive
alter database open
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-27047: ??????????
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 38) 已到文件结尾。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dbw0_12008.trc:
ORA-01157: ????/?????? 8 - ??? DBWR ????
ORA-01110: ???? 8: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS04.DBF'
ORA-27041: ??????
OSD-04002: 无法打开文件
O/S-Error: (OS 2) 系统找不到指定的文件。
Errors in file E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_12040.trc:
ORA-01157: ????/?????? 7 - ??? DBWR ????
ORA-01110: ???? 7: 'E:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS03.DBF'
ORA-1157 signalled during: alter database open...
Fri May 04 09:35:10 2018
Checker run found 2 new persistent data failures

alert日志的报错也比较明显,users03是文件超过了大小(大小为0kb,读取之后肯定超过大小),users04提示无法打开文件(文件在文件系统层面已经丢失).现在问题比较明显由于文件系统故障导致文件大小为0和丢失

碎片扫描恢复
常规的方法肯定无法恢复,比较好的方法只能是底层碎片扫描重组,结合多种扫描工具,最后发现一个做底层恢复的朋友的工具效果不错,扫描结果如下
file_scan


通过工具分析坏块情况

C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.7_10.ora

DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.7_10.ora

………………

页 382565 标记为损坏
Corrupt block relative dba: 0x01c5d665 (file 7, block 382565)
Completely zero block found during dbv:

页 382566 标记为损坏
Corrupt block relative dba: 0x01c5d666 (file 7, block 382566)
Completely zero block found during dbv:

页 382567 标记为损坏
Corrupt block relative dba: 0x01c5d667 (file 7, block 382567)
Completely zero block found during dbv:



DBVERIFY - 验证完成

检查的页总数: 1374720
处理的页总数 (数据): 27582
失败的页总数 (数据): 0
处理的页总数 (索引): 20114
失败的页总数 (索引): 0
处理的页总数 (其他): 1319752
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 7271
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271996 (0.228271996)


C:\Users\Administrator>dbv FiLe=D:\0504\ORCL_TS.4_FILE.8_8.ora

DBVERIFY: Release 11.2.0.4.0 - Production on 星期六 5月 5 08:52:53 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - 开始验证: FILE = D:\0504\ORCL_TS.4_FILE.8_8.ora


DBVERIFY - 验证完成

检查的页总数: 1136896
处理的页总数 (数据): 36639
失败的页总数 (数据): 0
处理的页总数 (索引): 57038
失败的页总数 (索引): 0
处理的页总数 (其他): 1043218
处理的总页数 (段)  : 0
失败的总页数 (段)  : 0
空的页总数: 1
标记为损坏的总页数: 0
流入的页总数: 0
加密的总页数        : 0
最高块 SCN            : 228271997 (0.228271997)

C:\Users\Administrator>

scan_resulte


这里通过分析恢复的两个文件总的block数量2511618,其中连续损坏7271个block损坏,由于出现问题之后,数据库被offline这两个文件继续启动运行了几个小时,导致少量block被覆盖,恢复软件直接置空.后续的恢复比较顺利,正常open数据库,然后处理坏块对象(正好不是业务核心表的lob字段,所有部分丢失影响不是非常大).

温馨提醒:
1. 数据文件和备份不要放在同一个阵列上,更不能是同一个分区(卷)上
2. 出现此类问题之后,应当理解停止对该分区的任何写操作,方式丢失或者大小为0的文件被覆盖.

ORA-00704 ORA-00702 恢复

数据库启动报ORA-01092 ORA-00704 ORA-00702错误

SQL> startup
ORACLE instance started.

Total System Global Area 3056513024 bytes
Fixed Size                  2257152 bytes
Variable Size             704646912 bytes
Database Buffers         2332033024 bytes
Redo Buffers               17575936 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Process ID: 27344
Session ID: 191 Serial number: 3

bootstrap-ORA-00702


查看alert日志

Mon Apr 09 16:22:34 2018
ALTER DATABASE   MOUNT
Successful mount of redo thread 1, with mount id 1383493834
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Mon Apr 09 16:22:39 2018
ALTER DATABASE OPEN
Thread 1 opened at log sequence 3
  Current log# 3 seq# 3 mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_27344.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_27344.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '' inconsistent with version '8.0.0.0.0'
Error 704 happened during db open, shutting down database
USER (ospid: 27344): terminating the instance due to error 704
Instance terminated by USER, pid = 27344
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (27344) as a result of ORA-1092
Mon Apr 09 16:22:40 2018
ORA-1092 : opitsk aborting process

错误比较明显是由于数据库open过程中bootstrap异常导致,出现此类错误一般是由于软件介质和db不匹配或者bootstrap表的block故障导致.

官方说明

Versions 9.2, 10.1, 10.2, 11.1, 11.2, 12.1

Error:  ORA-00702 bootstrap verison '%s' inconsistent with version '%s' 
---------------------------------------------------------------------------
Cause:  The reading version of the boostrap is incompatible with the current 
	bootstrap version. 
Action: Restore a version of the software that is compatible with this 
	bootstrap version

由于bootstrap$以及其中相关表处理比较特殊,如果您遭遇此类bootstrap$相关异常无法解决,需要恢复支持,请联系我们
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com