控制文件异常导致ORA-00600[kccsbck_first]

今天接到一个朋友求救他们的his系统数据库不能访问,情况比较紧急,让我帮忙处理.登录数据库得到信息如下:

操作系统:windows 2003
数据库:8.1.7
容灾方案:双机+emc存储镜像
备份:数据库无任何备份
启动到mount报错类此:
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [4141358753], [], [], [], [], []

这个问题在上周的数据库恢复中遇到过一次,他们也是因为双机的案例,当时的情况见:双机mount数据库出现ORA-00600[kccsbck_first],有了上次的思维,我开始也怀疑是客户的双机的问题,但是客户说双机在半年前就关闭了,没有启动过;因为我对win的双机不太熟悉,怕他们双机软自动启动系统然后接管oracle从而导致这个问题,然后让客户检查另一台机器,确定没有启动和接管oracle 服务.
然后查询MOS发现win上面的特殊之处:是控制文件corruption导致故障(不过dbv检查不出来),而且三个控制文件有同样的问题
MOS记录如下(不过8.1.7也存在同样问题) [ID 291684.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 9.0.1.5 and later   [Release: 9.0.1 and later ]
Information in this document applies to any platform.
***Checked for relevance on 09-APR-2012***
Symptoms



Alter database mount exclusive results in

ORA-00600: internal error code, arguments: [kccsbck_first], [1], [2141358753], [], [], [], [], []

The description of the error is:

'We receive this error because we are attempting to be the first 
thread/instance to mount the database and cannot because it appears that 
at least one other thread has mounted the database already'.


However in this case the database was a standalone database on Windows.
It had only one oracle service running.

The operating system was rebooted, the oracle service was deleted and a new service created.
Even then the error persisted. 
Cause

There was some corruption present in the controlfile.


Solution

In this case the problem was resolved by:

+ Taking a backup of the old control file

+ Recreating the control file using the following document
How to Recreate a Controlfile	 [Document 735106.1]

因为数据库不能mount,所以不能使用backup controlfile to trace;
因为是win系统,没有任何的控制文件备份,只能把控制文件拷贝到linux下面通过strings命令,自己编辑创建控制文件脚本(noresetlogs).执行脚本创建控制文件,recover database,应用redo文件恢复,然后resetlogs库,恢复成功(注意:8i中不需要另外增加临时文件)

双机mount数据库出现ORA-00600[kccsbck_first]

一朋友的数据库在做数据库恢复的时候,数据库不能启动到mount状态,检查发现
alert日志错误如下

Mon Aug 27 10:00:18 2012
ALTER DATABASE   MOUNT
Mon Aug 27 10:00:23 2012
Errors in file /oracle/admin/wf2009/udump/orcl_ora_7042.trc:
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [1208656276], [], [], [], [], []
Mon Aug 27 10:00:23 2012
ORA-600 signalled during: ALTER DATABASE   MOUNT...

查询mos发现解释

The ORA-600 [kccsbck_first] error occurs when Oracle detects that another instance
has this database already mounted. For some reason, Oracle already sees a thread
with a heartbeat. This could be the expected behaviour if running OPS. In such a
case the parallel_server parameter needs to be set. In cases where Parallel Server
is not linked in, this is not the expected behaviour.

在非集群环境中,当该数据库已经在一个节点启动,然后另外一个节点再尝试启动数据库可能出现该错误.
检查环境发现是使用roseha的双机环境,当关闭当前节点的数据库时候,另外一个节点认为oracle down掉了,然后自动在该节点去尝试启动数据库,而当前操作节点去尝试mount数据库的时候发生该错误,因为该数据库的另外一个节点已经mount了.出现这样的情况,和朋友的存储资源的配置也有不合理之处,在接管资源之前,应该先分析和处理存储的挂载情况,而不是不卸载这边,另外一遍直接挂载(也就是存储两边都挂载)

解决办法
这个问题的本质就是因为ha的两边都启动了数据库导致,关闭一边的roseha或者关闭主机就可以了
在做数据库恢复的时候,尽量排查掉其他因素的影响,比如:rac在一个节点上操作,ha关闭双机软件等