双机mount数据库出现ORA-00600[kccsbck_first]

一朋友的数据库在做数据库恢复的时候,数据库不能启动到mount状态,检查发现
alert日志错误如下

Mon Aug 27 10:00:18 2012
ALTER DATABASE   MOUNT
Mon Aug 27 10:00:23 2012
Errors in file /oracle/admin/wf2009/udump/orcl_ora_7042.trc:
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [1208656276], [], [], [], [], []
Mon Aug 27 10:00:23 2012
ORA-600 signalled during: ALTER DATABASE   MOUNT...

查询mos发现解释

The ORA-600 [kccsbck_first] error occurs when Oracle detects that another instance
has this database already mounted. For some reason, Oracle already sees a thread
with a heartbeat. This could be the expected behaviour if running OPS. In such a
case the parallel_server parameter needs to be set. In cases where Parallel Server
is not linked in, this is not the expected behaviour.

在非集群环境中,当该数据库已经在一个节点启动,然后另外一个节点再尝试启动数据库可能出现该错误.
检查环境发现是使用roseha的双机环境,当关闭当前节点的数据库时候,另外一个节点认为oracle down掉了,然后自动在该节点去尝试启动数据库,而当前操作节点去尝试mount数据库的时候发生该错误,因为该数据库的另外一个节点已经mount了.出现这样的情况,和朋友的存储资源的配置也有不合理之处,在接管资源之前,应该先分析和处理存储的挂载情况,而不是不卸载这边,另外一遍直接挂载(也就是存储两边都挂载)

解决办法
这个问题的本质就是因为ha的两边都启动了数据库导致,关闭一边的roseha或者关闭主机就可以了
在做数据库恢复的时候,尽量排查掉其他因素的影响,比如:rac在一个节点上操作,ha关闭双机软件等

通过odu验证rman backup对于truncate对象备份处理

rman backup 对于truncate和drop等相关操作的extent到底是怎么处理的,这里通过rman backup 结合odu证明出来,在较新版本的rman中,rman backup 并未完全的备份这些被认为不需要的extent.
创建模拟环境

SQL> select * from v$version;
 
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> create tablespace xifenfei datafile '/u01/oracle/oradata/XFF/xifenfei01.dbf' 
   2 size 10m autoextend on maxsize 10g;

Tablespace created.

SQL> conn chf/xifenfei
Connected.

SQL> create table t_xifenfei tablespace xifenfei
  2  as
  3  select * from dba_objects;

Table created.

SQL> insert into t_xifenfei
  2  select * from dba_objects;

50055 rows created.

SQL> commit;

Commit complete.

SQL> select BYTES from dba_free_space where TABLESPACE_NAME='XIFENFEI';

     BYTES
----------
    983040

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  12582912

SQL> select 12582912-983040 from dual;

12582912-983040
---------------
       11599872

SQL> select object_id,data_object_id from dba_objects where object_name='T_XIFENFEI';

 OBJECT_ID DATA_OBJECT_ID
---------- --------------
     51833          51833

--这里我们得到信息有:
--1.dataobj#=51833
--2.使用数据文件空间为:11599872

rman备份no truncate table 数据文件

[oracle@xifenfei ~]$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Thu Dec 15 06:00:05 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database: XFF (DBID=3440302261)

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/no_truncate_xifenfei';

Starting backup at 15-DEC-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=158 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/oracle/oradata/XFF/xifenfei01.dbf
channel ORA_DISK_1: starting piece 1 at 15-DEC-11
channel ORA_DISK_1: finished piece 1 at 15-DEC-11
piece handle=/u01/oracle/oradata/tmp/no_truncate_xifenfei tag=TAG20111215T060343 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 15-DEC-11

truncate table 操作

[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Dec 15 06:03:58 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> truncate table chf.t_xifenfei;

Table truncated.

rman备份truncate table 数据文件

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/truncate_xifenfei';

Starting backup at 15-DEC-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=140 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00005 name=/u01/oracle/oradata/XFF/xifenfei01.dbf
channel ORA_DISK_1: starting piece 1 at 15-DEC-11
channel ORA_DISK_1: finished piece 1 at 15-DEC-11
piece handle=/u01/oracle/oradata/tmp/truncate_xifenfei tag=TAG20111215T060445 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 15-DEC-11

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/*_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 06:03 /u01/oracle/oradata/tmp/no_truncate_xifenfei
-rw-r----- 1 oracle oinstall   630784 Dec 15 06:04 /u01/oracle/oradata/tmp/truncate_xifenfei

odu挖rman备份前数据文件

ODU> unload dict
CLUSTER C_USER# file_no: 1 block_no: 89
TABLE OBJ$ file_no: 1 block_no: 121
CLUSTER C_OBJ# file_no: 1 block_no: 25
CLUSTER C_OBJ# file_no: 1 block_no: 25
found IND$'s obj# 19
found IND$'s dataobj#:2,ts#:0,file#:1,block#:25,tab#:3
found TABPART$'s obj# 266
found TABPART$'s dataobj#:266,ts#:0,file#:1,block#:2121,tab#:0
found INDPART$'s obj# 271
found INDPART$'s dataobj#:271,ts#:0,file#:1,block#:2161,tab#:0
found TABSUBPART$'s obj# 278
found TABSUBPART$'s dataobj#:278,ts#:0,file#:1,block#:2217,tab#:0
found INDSUBPART$'s obj# 283
found INDSUBPART$'s dataobj#:283,ts#:0,file#:1,block#:2257,tab#:0
found IND$'s obj# 19
found IND$'s dataobj#:2,ts#:0,file#:1,block#:25,tab#:3
found LOB$'s obj# 151
found LOB$'s dataobj#:2,ts#:0,file#:1,block#:25,tab#:6
found LOBFRAG$'s obj# 299
found LOBFRAG$'s dataobj#:299,ts#:0,file#:1,block#:2393,tab#:0

ODU> scan extent tablespace 6

scan extent start: 2011-12-15 06:12:28
scanning extent...
scanning extent finished.
scan extent completed: 2011-12-15 06:12:28

ODU> unload table chf.t_xifenfei object 51833 

Unloading table: T_XIFENFEI,object ID: 51833
Unloading segment,storage(Obj#=51833 DataObj#=51833 TS#=6 File#=5 Block#=11 Cluster=0)
100110 rows unloaded
--这里可以看到odu全部找到被truncate掉的记录条数

使用rman 备份后数据文件

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@xifenfei odu]$ rm /u01/oracle/oradata/XFF/xifenfei01.dbf
[oracle@xifenfei odu]$ rman target /

Recovery Manager: Release 10.2.0.4.0 - Production on Thu Dec 15 06:14:00 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

connected to target database (not started)

RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area     318767104 bytes

Fixed Size                     1267236 bytes
Variable Size                104860124 bytes
Database Buffers             205520896 bytes
Redo Buffers                   7118848 bytes

RMAN> restore datafile 5;

Starting restore at 15-DEC-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=157 devtype=DISK

channel ORA_DISK_1: starting datafile backupset restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00005 to /u01/oracle/oradata/XFF/xifenfei01.dbf
channel ORA_DISK_1: reading from backup piece /u01/oracle/oradata/tmp/truncate_xifenfei
channel ORA_DISK_1: restored backup piece 1
piece handle=/u01/oracle/oradata/tmp/truncate_xifenfei tag=TAG20111215T060445
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 15-DEC-11

odu挖rman还原后数据文件

ODU> scan extent tablespace 6

scan extent start: 2011-12-15 06:14:43
scanning extent...
scanning extent finished.
scan extent completed: 2011-12-15 06:14:43

ODU>  unload table chf.t_xifenfei object 51833 

Unloading table: T_XIFENFEI,object ID: 51833
Unloading segment,storage(Obj#=51833 DataObj#=51833 TS#=6 File#=5 Block#=11 Cluster=0)
4774 rows unloaded
--odu只找到极少数数据4774/100110

通过odu挖rman备份前和备份后的数据文件,得知rman backup备份的过程,对绝大多数truncate的表的原始数据未正常备份(为什么是绝大多数,我无法给出解释),这里也可以看出rman backup并非是真正意义上的完全物理上复制(和rman copy还是有区别,copy不能完全被取代)

rman备份对各种数据块操作

有不少人对于rman的backup功能,到底备份数据文件的什么级别,一直有着不明确的说法,我这里以10.2.0.4版本的rman backup 测试,进行一个简单的说明.这里提供的是一种思路.如果你在实际工作中,遇到一些rman到底会不会备份相关数据块的时候,可以通过类此的试验来证明你的版本的rman的功能.
模拟环境

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> create tablespace xifenfei datafile '/u01/oracle/oradata/XFF/xifenfei01.dbf' 
   2 size 10m autoextend on next 10m maxsize 30g;

Tablespace created.

备份空数据文件

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  10485760

SQL> select BYTES from dba_free_space where TABLESPACE_NAME='XIFENFEI';

     BYTES
----------
  10420224

SQL> SELECT 10485760-10420224 FROM DUAL;

10485760-10420224
-----------------
            65536

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/no_table_xifenfei';

[root@xifenfei tmp]# ls -l no_table_xifenfei
-rw-r----- 1 oracle oinstall 106496 Dec 15 01:03 no_table_xifenfei

从这里可以看出来rman备份的时候,数据文件中未格式化的块并没有备份(数据文件10m,备份集只有106k左右,比文件实际使用的65536b稍微大点)

备份create表数据文件

SQL> create table t_rman tablespace xifenfei
  2  as
  3  select * from chf.t_xifenfei1;

Table created.

SQL> select BYTES from dba_free_space where TABLESPACE_NAME='XIFENFEI';

     BYTES
----------
   9371648

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  20971520

SQL> select 20971520-9371648 from dual;

20971520-9371648
----------------
        11599872

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/crt_table_xifenfei';

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/crt_table_xifenfei
-rw-r----- 1 oracle oinstall 11608064 Dec 15 01:29 /u01/oracle/oradata/tmp/crt_table_xifenfei

这里可以得出结论,rman的备份集大小可以从一定程度上近似等于数据文件使用空间大小

备份truncate表数据文件

SQL> truncate table t_rman;

Table truncated.

SQL> SELECT 20840448-9371648 from dual;

20840448-9371648
----------------
        11468800

SQL> select 20971520-20840448 from dual;

20971520-20840448
-----------------
           131072

RMAN>  backup tablespace xifenfei format '/u01/oracle/oradata/tmp/truncate_table_xifenfei';

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/truncate_table_xifenfei
-rw-r----- 1 oracle oinstall 630784 Dec 15 01:30 /u01/oracle/oradata/tmp/truncate_table_xifenfei

通过这里可以看出来,truncate 对象后,数据文件释放了对象空间,rman备份集也同样未备份这部分空间

备份insert表数据文件

SQL> insert into t_rman  select * from chf.t_xifenfei1;

100062 rows created.

SQL> commit;

Commit complete.

SQL> alter system checkpoint;

System altered.

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  20971520

SQL> select BYTES from dba_free_space where TABLESPACE_NAME='XIFENFEI';

     BYTES
----------
   9371648

SQL>  select 20971520 - 9371648 from dual;

20971520-9371648
----------------
        11599872

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/insert_table_xifenfei';

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/insert_table_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 02:19 /u01/oracle/oradata/tmp/insert_table_xifenfei

和直接创建表的出来结论相似

备份delete表数据文件

SQL> delete from t_rman;

100062 rows deleted.

SQL> commit;

Commit complete.

SQL>  alter system checkpoint;

System altered.

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  20971520

SQL>  select BYTES from dba_free_space where TABLESPACE_NAME='XIFENFEI';

     BYTES
----------
   9371648

SQL> select 20971520 - 9371648 from dual;

20971520-9371648
----------------
        11599872

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/delete_table_xifenfei';

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/delete_table_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 02:45 /u01/oracle/oradata/tmp/delete_table_xifenfei

这里是直接delete数据,产生了明显的高水位现象(高水位之下部分无数据),但是rman备份,还是会备份高水位之下的所有数据

备份drop表数据文件

SQL> drop table t_rman;

Table dropped.

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  20971520

SQL> select BYTES from v$datafile where name='/u01/oracle/oradata/XFF/xifenfei01.dbf';

     BYTES
----------
  20971520
SQL> select sum(bytes) from  dba_free_space where TABLESPACE_NAME='XIFENFEI';

SUM(BYTES)
----------
  20905984

SQL> select 20971520-20905984 from dual;

20971520-20905984
-----------------
            65536

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/drop_table_xifenfei';

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/drop_table_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 02:51 /u01/oracle/oradata/tmp/drop_table_xifenfei

在10g中,因为默认使用回收站功能,对象还存在回收站中,rman为了使得还原出来的数据库可以继续使用回收站中相应的表的闪回功能,所以也会备份回收站中数据

备份purge表数据文件

SQL> select OBJECT_NAME,ORIGINAL_NAME from user_recyclebin;

OBJECT_NAME                    ORIGINAL_NAME
------------------------------ --------------------------------
BIN$tBHa31bTe3jgQKjACgEImw==$0 T_RMAN

SQL> purge table "BIN$tBHa31bTe3jgQKjACgEImw==$0";

Table purged.

RMAN> backup tablespace xifenfei format '/u01/oracle/oradata/tmp/PURGE_table_xifenfei';

[root@xifenfei ~]# ls -l /u01/oracle/oradata/tmp/PURGE_table_xifenfei
-rw-r----- 1 oracle oinstall 106496 Dec 15 03:08 /u01/oracle/oradata/tmp/PURGE_table_xifenfei

可以看到purge表之后,其实效果类此truncate(当然truncate做的工作更多),rman备份集大小和无数据对象时相同,结合drop和purge也可以知道在删除大对象或者比较多对象而且又确定不再需要,且有rman备份,这个时候建议直接加上purge.

各个备份集汇总

[root@xifenfei tmp]# ll *table_xifenfei
-rw-r----- 1 oracle oinstall 11608064 Dec 15 01:29 crt_table_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 02:45 delete_table_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 02:51 drop_table_xifenfei
-rw-r----- 1 oracle oinstall 11640832 Dec 15 02:19 insert_table_xifenfei
-rw-r----- 1 oracle oinstall   106496 Dec 15 01:03 no_table_xifenfei
-rw-r----- 1 oracle oinstall   106496 Dec 15 03:08 PURGE_table_xifenfei
-rw-r----- 1 oracle oinstall   630784 Dec 15 01:30 truncate_table_xifenfei

rman的备份功能本身就是在不断的增强,不同的版本会有不同的结果,最明显的就是在9i版本会备份truncate的数据.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

SQL> create tablespace xifenfei datafile
  2  '/u01/oracle/oradata/xifenfei/xifenfei01.dbf' size 10m autoextend on next 10m maxsize 10240m;

Tablespace created.

SQL> create table t_xifenfei tablespace xifenfei
  2  as
  3  select * from dba_objects;

Table created.

SQL> insert into t_xifenfei
  2  select * from dba_objects;

30803 rows created.

SQL> commit;

Commit complete.

SQL> alter system checkpoint;

System altered.

RMAN> backup tablespace xifenfei format '/tmp/no_truncate_xifenfei';

SQL> truncate table t_xifenfei;

Table truncated.

[oracle@xifenfei ~]$ ls -l /tmp/*truncate_xifenfei
-rw-r-----  1 oracle oinstall 7004160 Aug 26 22:52 /tmp/no_truncate_xifenfei
-rw-r-----  1 oracle oinstall 7004160 Aug 26 22:53 /tmp/truncate_xifenfei

11GR2 Control file enqueue hold time tracking dump

如果你比较心细,可能在11.2的数据库中发现alert文件中存在存在类此下面的记录

Errors in file /oradb/diag/rdbms/offon/offon2/trace/offon2_ckpt_19660878.trc:

查看trace文件发现

*** 2012-08-01 03:36:03.520
  1: 1450ms (rw) file: kct.c line: 1011 count: 1 total: 1450ms time: 1594117
  2: 890ms (rw) file: kcrf.c line: 10012 count: 6 total: 4266ms time: 1820928
  3: 830ms (ro) file: kcf.c line: 5306 count: 1 total: 830ms time: 1594116
  4: 530ms (rw) file: kcv.c line: 11783 count: 1 total: 530ms time: 3207607
Control file enqueue hold time tracking dump at time: 3376956

*** 2012-08-03 02:14:38.714
  1: 1450ms (rw) file: kct.c line: 1011 count: 1 total: 1450ms time: 1594117
  2: 890ms (rw) file: kcrf.c line: 10012 count: 7 total: 4953ms time: 1820928
  3: 830ms (ro) file: kcf.c line: 5306 count: 1 total: 830ms time: 1594116
  4: 530ms (rw) file: kcv.c line: 11783 count: 1 total: 530ms time: 3207607
Control file enqueue hold time tracking dump at time: 3384212

这个类此我们在10g中看到的lgwr的警告类此,见Warning: log write time 560ms, size 3KB,其实也就是oracle对lgwr进程写入日志慢的时候的一个trace功能记录下来的.
我们这里遇到的是因为oracle对ckpt进程的trace,当control file enqueue holding time tracking size超过10的时候,就会记录到trace文件中,oracle 内部文档对于该问题的一些描述如下:

About the issue, this is the expected behavior on 11.2. 

New controlfile enqueue hold time tracking statistics have been added in 11.2 to 
aid diagnosis of controlfile transaction related performance related issues:

Control File Enqueue AWR Statistics:

* max cf enq hold time - The maximum amount of time in milliseconds a client has held the control file enqueue.
* total cf enq hold time - The total amount of time in milliseconds all clients have held the control file enqueue.
* total number of cf enq holders - The total number of times clients have held the control file enqueue.

Periodically, the CKPT process dumps statistics for the top N control file enqueue holders. 
N defaults to 10, but can be modified with the static hidden parameter:
_controlfile_enqueue_holding_time_tracking_size.The dump looks like the following:

Preface: "Control file enqueue hold time tracking dump at time: [relative time]".

* a. Time the client has held the control file enqueue.
* b. Type of client's control file enqueue transaction - rw or ro.
* c. File name where the client obtained control file enqueue.
* d. Line number where the client obtained control file enqueue.
* e. Number of times the client has held the control file enqueue since it became a member of the top N.
* f. Total time the client has held the control file in all those times from [e].
* g. Relative time the client obtained the control file enqueue from [a].

查询数据库默认值

SQL> col value for a24
SQL> col description for a70
SQL> set linesize 180
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3    where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  /
Enter value for param: _controlfile_enqueue_holding_time_tracking_size
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_controlfile_enqueue_holding_time_tracking_size%')

NAME                                               VALUE    DESCRIPTION
-------------------------------------------------- -------- ------------------------------------------------
_controlfile_enqueue_holding_time_tracking_size    10       control file enqueue holding time tracking size

虽然在alert日志中使用Error的形式显示该错误,但是这只是一个oracle作为诊断数据库Control File Enqueue性能的一个依据,大部分情况下,我们可以选择忽略或者关闭该诊断功能.屏蔽该提示方法如下:

The way to shut off is set _controlefile_enqueue_holding_time_tracking_size = 0 then restart the database

-- spfile
alter system set "_controlfile_enqueue_holding_time_tracking_size"=0 scope=spfile;

-- pfile
_controlfile_enqueue_holding_time_tracking_size=0

Restart database 

ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log

数据库alert日志出现如下错误

Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)

  VERSION INFORMATION:
        TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
        TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
        Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
  Time: 21-AUG-2012 09:50:15
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12537

TNS-12537: TNS:connection closed
    ns secondary err code: 12560
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0
opiodr aborting process unknown ospid (15204768) as a result of ORA-609

错误的原因

The ORA-609 error is thrown when a client connection of any kind failed to complete or 
aborted the connection process before the connection/authentication process was complete.

Very often, this connection abort is due to a timeout.  Beginning with 10gR2, a default value for 
inbound connect timeout has been set at 60 seconds.  
This time limit is often inadequate for the entire connection process to complete.    

We have also discovered that the ORA-609 occurs frequently in installations 
where the database is monitored by DB Console and the Enterprise Manager agent (emagent).
After the DB Console is started and as a matter of routine, the emagent will repeatedly try to 
connect to the target instances.  
We can see frequent emagent connections in the listener.log without error.  
However, on occasion it may have failed to complete the connection process at the database so an ORA-609 is thrown.  
The emagent will simply retry the connection and may be successful on the subsequent try.  
(Provided there is no real fault occurring at the listener or database).  
This temporary failure to connect will not be reported back to DB Console and there will be no indication,
 except for the ORA-609, that a fault occurred.

出现这个问题的主要原因是因为从10.2开始inbound connect timeout默认为60 seconds,而在很多建立连接过程可能超过这个时间从而出现类此错误,常见的诱因是DB Console 和 Enterprise Manager agent (emagent). EM会重复的尝试连接到数据库。其过程中会偶尔的出现连接超时的问题,但是接下来会继续尝试,并获得成功。这种临时的失败不会导致EM报错而只会以ora-609的形式记录在alert log中.

处理方法

For that reason, we often recommend increasing the values for INBOUND_CONNECT_TIMEOUT at both listener 
and server side sqlnet.ora file as a preventive measure.  
If the problem  is due to connection timeouts, an increase in the following parameters should eliminate 
or reduce the occurrence of the ORA-609s.

e.g.
Sqlnet.ora: SQLNET.INBOUND_CONNECT_TIMEOUT=180
Listener.ora: INBOUND_CONNECT_TIMEOUT_listener_name=120

These settings are in seconds.  Again, the default is 60.

问题跟踪方法

If the issue persists and inbound connect does not have any effect, the following steps are intended to 
help locate  the client that may be causing the errors.

1)  Suppress the TNS errors in the alert.log by setting the following listener.ora file parameter:  

DIAG_ADR_ENABLED_listener_name=OFF

This will cause the TNS errors to be posted to the ORACLE_HOME/network/log/sqlnet.log file that is 
local to the database and 
may yield useful information about the client's address.

For example, here's a snippet from a server side sqlnet.log where client address info was posted:

Production Time: 15-FEB-2010 07:15:01

Fatal NI connect error 12537, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=yourhost)(Port=1521))(CONNECT_DATA=(SID=PROD1DR)
(CID=(PROGRAM=sqlplus)(HOST=client_host)(USER=client))))

Observe the PROGRAM and HOST fields on the last line.  This is where the connection originated.
Be sure to match timestamps in the sqlnet.log with the timestamps of the alert.log errors.  
Once you've located the offending client, 
you can enable client tracing to try and determine the cause:

TRACE_LEVEL_CLIENT=16
TRACE_DIRECTORY_CLIENT=<dir location>
TRACE_TIMESTAMP_CLIENT=TRUE
DIAG_ADR_ENABLED=off   <<<<<11g or newer client requirement

If you need assistance with client or server tracing, please open an SR with Global Customer Support.


2)  Check the listener.log for client connections that were logged at timestamps that match the ORA-609 
timestamps as they appear in the alert.log.  
The client information is recorded in each listener.log entry.  
Since this error occurs AFTER the listener has handled the connection, 
do not expect to see errors in the listener.log.

Here's an example snippet of an incoming client connection that was posted to the listener.log:



20-JAN-2009 17:08:45 (CONNECT_DATA=(SID=orcl)(CID=
(PROGRAM=D:\oracle\product\10.1.0\Db_1\perl\5.6.1\bin\MSWin32-x86\perl.exe)(HOST=myclient)

Note that the exact timestamp, program name and client host will often be recorded.  Again, 
once you've located the offending client, 
enable tracing (see above) to try to capture the connection failure.

3)  Enable server side Oracle Net tracing and capture the TNS error along with the incoming connection.
Match the PID that accompanies the ORA-609 to the server trace label.  e.g.  

ORA-609 : opiodr aborting process unknown ospid (4799_1)  *Note the PID

This PID would correspond to server trace labeled:  svr_4799.trc.  Check the server trace for either 
TNS error (the 609 will not appear) 
and try to locate the originating client address.  If assistance is needed for this investigation, 
please open an SR with Oracle Support.

See below for instuctions on enabling Oracle Net server tracing.

The following details the discovery of the source of an ORA-609 for a real case:

The alert.log reports the following messages intermittently but frequently:

Mon Nov 16 22:39:22 2009
ORA-609 : opiodr aborting process unknown ospid (nnnn)

Enabled Oracle Net server tracing:

TRACE_LEVEL_SERVER=16
TRACE_DIRECTORY_SERVER=<dir location>
TRACE_TIMESTAMP_SERVER=TRUE
DIAG_ADR_ENABLED=off

Reloaded listener and wait for error to appear again.:


ORA-609 : opiodr aborting process unknown ospid (5233_1)

Note that the server trace file set that corresponded to this event was named svr_5233*.trc.
Of course the timestamps of the alert.log event and the server trace creation matched as well.

A review of the server trace showed only an EOF failure and the  TNS-12537 error:


Read unexpected EOF ERROR 
nserror: nsres: id=0, op=68, ns=12537

In this particular case, there was no information about the client in the trace. 
This is atypical for a server trace.   
It may be that the client aborted before all the client information was posted to the file.  
However, there was post in the listener.log f
or an emagent connection that was established at the same point in time.

Here's an excerpt from a listener.log entry where an emagent establishes a connection:

PROGRAM=D:\oracle\product\10.1.0\Db_1\bin\emagent.exe)

Checked the EM Agent traces and logs and discovered the following entry:

Fatal NI connect error 12547, connecting to:
(LOCAL=NO)

VERSION INFORMATION:
TNS for Solaris: Version 11.1.0.7.0 - Production
Oracle Bequeath NT Protocol Adapter for Solaris: Version 11.1.0.7.0 - Production
TCP/IP NT Protocol Adapter for Solaris: Version 11.1.0.7.0 - Production
Time: 16-NOV-2009 22:39:22

****Tracing to file: /backup/sid_traces/sqlnetlog/svr_5233.trc 

Tns error struct:

ns main err code: 12547
TNS-12547: TNS:lost contact
ns secondary err code: 12560
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0

****Note the name of the server trace which contains the PID:  svr_5233.trc 
Also, the timestamp of the agent event matches the timestamp of the alert.log error.



Check the following locations for EM Agent traces. If working with support on this issue and 
the EM Agent is suspected, upload ALL files under:

$ORACLE_HOME/sysman/log/emagent.trc < Single node agent trace location
$ORACLE_HOME/host/sysman/log/emagent.trc < RAC agent trace location

It was determined that in this case, the emagent was aborting the connection 
before it was complete and then simply reconnecting 
and succeeding on the subsequent try.  No errors were reported in the listener log or listener trace. 
No errors were returned to the DB Console.
There was no apparent outage of any kind.  No action was taken to correct the ORA-609 in this case.  
It was decided that the message was informational and completely benign. 

参考文档:
ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log (Doc ID 1116960.1)
Troubleshooting Guide ORA-609 : Opiodr aborting process unknown ospid (Doc ID 1121357.1)