Objects without statistics: dynamic sampling at SQL execution

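While analyzing an incident in which an ORA-7445 error brought an Oracle database down, I found a very large SQL statement similar to the one below (copying it into a document showed that it was about 5 MB in size).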

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) 
opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE 
*/ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("DCREDITMSG_00")
 FULL("DCREDITMSG_00") NO_PARALLEL_INDEX("DCREDITMSG_00") */ 1 AS C1, CASE WHEN
 "DCREDITMSG_00"."PHONE_NO"='(actual phone number)' OR "DCREDITMSG_00"."PHONE_NO"='(actual phone number)' OR 
... (many more OR conditions) ... OR "DCREDITMSG_00"."PHONE_NO"='(actual phone number)' OR
"DCREDITMSG_00"."PHONE_NO"='(actual phone number)' THEN 1 ELSE 0 END AS C2 FROM 
"BSSADMIN"."DCREDITMSG_00" SAMPLE BLOCK (0.032410 , 1) SEED (1) "DCREDITMSG_00") SAMPLESUB

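At the time, this statement for some reason caused a large memory leak in the SQL area, which eventually brought the database down. The experiment below tracks down where this kind of odd SQL comes from.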
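
Recursive statements of this kind carry the /* OPT_DYN_SAMP */ comment, so on a live instance they can be looked up in the shared pool. The query below is a minimal sketch, assuming access to v$sql. The memory columns only give a rough idea of how much each cursor occupies.

```sql
-- Largest dynamic sampling cursors currently in the shared pool.
-- sql_text in v$sql holds only the first 1000 characters of each statement.
SELECT sql_id,
       sharable_mem,
       persistent_mem,
       runtime_mem,
       SUBSTR(sql_text, 1, 80) AS sql_text_start
  FROM v$sql
 WHERE sql_text LIKE 'SELECT /* OPT_DYN_SAMP */%'
 ORDER BY sharable_mem DESC;
```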

Create a test table

SQL> create table t_xifenfei 
  2  as
  3  select  * from dba_objects;

Table created.

SQL> select count(*) from t_xifenfei;

  COUNT(*)
----------
     74605

SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables 
  2  where  table_name='T_XIFENFEI' and owner='CHF';

  NUM_ROWS LAST_ANALYZE
---------- ------------

What this shows:
1. The table contains 74605 rows.
2. No statistics have been gathered on the table.

Check the execution plan

SQL>  set autotrace trace exp
SQL> select /*+ dynamic_sampling(t 0) */ * from t_xifenfei t;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 88868 |    17M|   299   (2)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 88868 |    17M|   299   (2)| 00:00:04 |
--------------------------------------------------------------------------------
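--The dynamic_sampling(t 0) hint sets the sampling level to 0, which turns dynamic sampling off for this table,
--so the 88868 rows shown here are the optimizer's default estimate for a table without statistics, not a sampled value.
--Against the actual 74605 rows, that is still acceptable in principle.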

SQL> select * from t_xifenfei;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 73449 |    14M|   298   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 73449 |    14M|   298   (1)| 00:00:04 |
--------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)
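--With no statistics available, the database samples the object dynamically (level=2).
--The table really holds 74605 rows and the sampled estimate is 73449, closer than the 88868 obtained with the dynamic_sampling(t 0) hint.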
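
The level=2 in the Note section matches the default value of the optimizer_dynamic_sampling parameter in 10g and 11g. As a sketch, the level can be checked and overridden per statement. The level 4 in the hint is only an illustrative value, not something used in the test above.

```sql
-- Current setting (2 is the default in 10g and 11g).
SELECT name, value
  FROM v$parameter
 WHERE name = 'optimizer_dynamic_sampling';

-- Per-statement override: level 0 disables sampling, higher levels sample more blocks.
SELECT /*+ dynamic_sampling(t 4) */ COUNT(*)
  FROM t_xifenfei t;
```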

Trace the automatic sampling with event 10046

SQL> conn / as sysdba
Connected.
SQL> oradebug  setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select count(*) from CHF.t_xifenfei;

  COUNT(*)
----------
     74605

SQL>  oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_27967.trc

Analyze the trace file

*** 2012-07-12 15:42:34.991
WAIT #0: nam='SQL*Net message from client' ela= 56716427 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078954991525
=====================
PARSING IN CURSOR #3063864268 len=404 dep=1 uid=0 oct=3 lid=0 tim=1342078955037387 
hv=4184780033 ad='385d3708' sqlid='3gjvvxzwqxb81'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) 
opt_param('parallel_execution_enabled', 'false') 
NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM 
(SELECT /*+ NO_PARALLEL("T_XIFENFEI") FULL("T_XIFENFEI") NO_PARALLEL_INDEX("T_XIFENFEI") */ 1 AS C1, 
1 AS C2 FROM "CHF"."T_XIFENFEI" SAMPLE BLOCK (5.790441 , 1) SEED (1) "T_XIFENFEI") SAMPLESUB
END OF STMT
PARSE #3063864268:c=6000,e=5404,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=205916192,tim=1342078955037303
EXEC #3063864268:c=0,e=206,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=205916192,tim=1342078955037901
FETCH #3063864268:c=4998,e=4759,p=0,cr=65,cu=0,mis=0,r=1,dep=1,og=1,plh=205916192,tim=1342078955042730
STAT #3063864268 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=65 pr=0 pw=0 time=4795 us)'
STAT #3063864268 id=2 cnt=4253 pid=1 pos=1 obj=76370 op='TABLE ACCESS SAMPLE T_XIFENFEI 
(cr=65 pr=0 pw=0 time=8247 us cost=19 size=61752 card=5146)'
CLOSE #3063864268:c=0,e=7,dep=1,type=0,tim=1342078955043024
=====================
PARSING IN CURSOR #3063864784 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342078955043465 hv=2174183953 ad='3ed2d700' 
sqlid='fadutqq0tfuhj'
select count(*) from CHF.t_xifenfei
END OF STMT
PARSE #3063864784:c=51991,e=51648,p=0,cr=66,cu=0,mis=1,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043464
EXEC #3063864784:c=0,e=75,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043645
WAIT #3063864784: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078955043705

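The trace confirms that before our own statement ran, the database executed a recursive (dep=1) sampling statement that starts with SELECT /* OPT_DYN_SAMP */ and carries a long list of hints.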
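
The numbers in this trace line up with the estimate in the earlier plan. The sampling statement read 5.790441 percent of the blocks and found 4253 rows (cnt=4253 in the STAT line), and scaling that count up by the sampling percentage gives roughly the 73449 rows reported by autotrace. This is a back-of-the-envelope check, not a statement of the optimizer's internal formula.

```sql
-- rows seen in the sample / sampled fraction of blocks
SELECT ROUND(4253 / (5.790441 / 100)) AS estimated_rows
  FROM dual;
```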

Gather statistics and check the execution plan again

SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI');

PL/SQL procedure successfully completed.

SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables 
  2  where  table_name='T_XIFENFEI' and owner='CHF';

  NUM_ROWS LAST_ANALYZE
---------- ------------
     74605 12-JUL-12

SQL>  set autotrace trace exp
SQL> select * from t_xifenfei;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 74605 |  7139K|   298   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 74605 |  7139K|   298   (1)| 00:00:04 |
--------------------------------------------------------------------------------
--The plan no longer reports dynamic sampling

Run the 10046 trace again

SQL> conn / as sysdba
Connected.
SQL> oradebug  setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select count(*) from CHF.t_xifenfei;

  COUNT(*)
----------
     74605

SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_29780.trc

Analyze the trace file

*** 2012-07-12 16:14:53.914
Oradebug command 'EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12' console output: <none>
WAIT #0: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080893914307

*** 2012-07-12 16:14:59.376
WAIT #0: nam='SQL*Net message from client' ela= 5461608 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080899376008
=====================
PARSING IN CURSOR #3063709248 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342080899379562 hv=2174183953 
ad='3ed2d700' sqlid='fadutqq0tfuhj'
select count(*) from CHF.t_xifenfei
END OF STMT

This shows that once statistics exist, the database no longer runs the heavily hinted sampling statement before the query.

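Summary and recommendations
Dynamic sampling was first introduced in Oracle 9i R2. It was designed so that the CBO has enough information to produce a sound execution plan when segments (tables, indexes, partitions) have not been analyzed, and it can be seen as a complement to regular statistics gathering. When a segment has no statistics, dynamic sampling reads a sample of the object's data blocks directly to obtain the statistics the CBO needs.

Oracle introduced the feature for the benefit of the CBO and in principle it is a good thing, but occasionally it causes surprises. If you find tables in the database that have no statistics, gather them manually. Oracle's automatic statistics job is not always reliable either: more than once I have found 10g databases in which some tables had no statistics.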
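
A minimal sketch of that manual check, assuming the CHF schema and T_XIFENFEI table from the test above. Temporary tables are excluded because they normally have no gathered statistics.

```sql
-- Permanent tables in the CHF schema that have never been analyzed.
SELECT owner, table_name
  FROM dba_tables
 WHERE owner = 'CHF'
   AND temporary = 'N'
   AND last_analyzed IS NULL
 ORDER BY table_name;

-- Gather statistics for one of them, including its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'CHF',
    tabname => 'T_XIFENFEI',
    cascade => TRUE);
END;
/
```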

Oracle 11g new feature: adding a column with a default value

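Before 11g, adding a column with a default value to a large table was a painful operation, and sometimes online redefinition was the only practical way to do it. In 11g, adding a new column with a default value (declared NOT NULL, as in the test below) only updates the data dictionary, which makes the operation far more efficient.
Adding a column (with a default value) in 10g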

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> create table t_xifenfei 
  2  as select object_id,object_name from dba_objects;

Table created.

SQL> select count(*) from t_xifenfei;

  COUNT(*)
----------
     49827

SQL> desc t_xifenfei
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OBJECT_ID                                          NUMBER
 OBJECT_NAME                                        VARCHAR2(128)

SQL> set timing on                                                                      
SQL> alter table t_xifenfei add c_xff varchar2(100) default 'www.orasos.com' not null;

Table altered.

Elapsed: 00:00:06.13
--Took about 6 seconds

SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei where object_name='OBJ$';

ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAAMwJAAEAAAAB8AAr          4        124         43

SQL> alter system dump datafile 4 block 124;

System altered.

Elapsed: 00:00:00.08

Adding a column (with a default value) in 11g

SQL> select * from v$version;

BANNER
-----------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> create table t_xifenfei 
  2  as select object_id,object_name from dba_objects;

Table created.

SQL> select count(*) from t_xifenfei;

  COUNT(*)
----------
     74605

SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei where object_name='OBJ$';

ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAASpRAAEAAAACrAAu          4        171         46

SQL> alter system dump datafile 4 block 171;

System altered.


SQL> set timing on
SQL> alter table t_xifenfei add c_xff varchar2(100) default 'www.orasos.com' not null;

Table altered.

Elapsed: 00:00:00.19
--Took only 0.19 seconds

SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from t_xifenfei where object_name='OBJ$';

ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAASpRAAEAAAACrAAu          4        171         46

Elapsed: 00:00:00.04
SQL> alter system dump datafile 4 block 171;

System altered.

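Comparing the same ADD COLUMN with a default value in 10g and 11g shows that 11g is far faster (0.19 seconds against 6.13 seconds, even though the 11g table holds more rows). The block dumps taken above explain why.
Dump analysis
Dump in 11g before adding the column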

tab 0, row 0, @0x1f74
tl: 12 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
tab 0, row 1, @0x1f66
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 2f
col  1: [ 7]  49 5f 55 53 45 52 31
tab 0, row 2, @0x1f5b
tl: 11 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
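
The column values in these dumps are raw bytes in Oracle's internal formats. As a sketch, UTL_RAW can convert them back, shown here for row 0 of the dump above (col 0 is a NUMBER, col 1 is a VARCHAR2 whose bytes are plain ASCII).

```sql
SELECT UTL_RAW.CAST_TO_NUMBER(HEXTORAW('C115'))         AS object_id,   -- 20
       UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('49434F4C24')) AS object_name  -- ICOL$
  FROM dual;
```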

Dump in 11g after adding the column

tab 0, row 0, @0x1f74
tl: 12 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
tab 0, row 1, @0x1f66
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 2f
col  1: [ 7]  49 5f 55 53 45 52 31
tab 0, row 2, @0x1f5b
tl: 11 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
tab 0, row 3, @0x1f4f
tl: 12 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 10
col  1: [ 5]  55 4e 44 4f 24

Dump in 10g after adding the column
The 10g dump before adding the column looks like the 11g dump before adding the column, so it is not shown here.

tab 0, row 0, @0x1f63
tl: 29 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 1, @0x1f44
tl: 31 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 2d
col  1: [ 7]  49 5f 55 53 45 52 31
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 2, @0x1f28
tl: 28 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 3, @0x1f0b
tl: 29 fb: --H-FL-- lb: 0x2  cc: 3
col  0: [ 2]  c1 10
col  1: [ 5]  55 4e 44 4f 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d

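The comparison shows that in 11g the existing rows are dumped exactly as they were before the column was added (cc: 2, no third column), whereas in 10g every existing row was rewritten to store the default value (cc: 3). In other words, 10g physically adds the default value to the existing data, while 11g does not touch the rows that already exist.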
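
Since 11g does not write the value into the existing rows, it has to come from the data dictionary at query time. A minimal sketch of how this could be checked, assuming the CHF.T_XIFENFEI table and C_XFF column from the test above:

```sql
-- The default is recorded in the data dictionary...
SELECT column_name, nullable, data_default
  FROM dba_tab_columns
 WHERE owner = 'CHF'
   AND table_name = 'T_XIFENFEI'
   AND column_name = 'C_XFF';

-- ...and queries still return it for rows whose blocks were never rewritten.
SELECT c_xff, COUNT(*)
  FROM chf.t_xifenfei
 GROUP BY c_xff;
```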
Dump test for newly inserted rows in 11g

SQL>  insert into chf.t_xifenfei(object_id,object_name)
  2  select object_id,object_name FROM DBA_OBJECTS;

74605 rows created.

SQL> commit;

Commit complete.

SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

System altered.

SQL> /

System altered.
--Flushing the buffer cache before the dump is recommended

SQL> 
SQL> 
SQL> select   rowid,
  2   dbms_rowid.rowid_relative_fno(rowid)rel_fno,
  3   dbms_rowid.rowid_block_number(rowid)blockno,
  4   dbms_rowid.rowid_row_number(rowid) rowno
  5  from chf.t_xifenfei where object_name='OBJ$';

ROWID                 REL_FNO    BLOCKNO      ROWNO
------------------ ---------- ---------- ----------
AAASpRAAEAAAACrAAu          4        171         46
AAASpRAAEAAAB5TAAu          4       7763         46

SQL> ALTER SYSTEM DUMP DATAFILE 4 BLOCK 7763;

System altered.

--Dump contents
tab 0, row 0, @0x4e3
tl: 29 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 15
col  1: [ 5]  49 43 4f 4c 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 1, @0x500
tl: 31 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 2f
col  1: [ 7]  49 5f 55 53 45 52 31
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 2, @0x51f
tl: 28 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 1d
col  1: [ 4]  43 4f 4e 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
tab 0, row 3, @0x53b
tl: 29 fb: --H-FL-- lb: 0x1  cc: 3
col  0: [ 2]  c1 10
col  1: [ 5]  55 4e 44 4f 24
col  2: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d

This shows that for rows inserted afterwards in 11g, the default value is physically stored in the data block.

The DDL_LOCK_TIMEOUT parameter in Oracle 11g

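Today kaums gave a customer a training session on 11g new features, and it turns out there are quite a few good new things that I had not paid much attention to before. I will write them up over the next few posts.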
DDL_LOCK_TIMEOUT specifies a time limit for how long DDL statements will wait in a DML lock queue. The default value of zero indicates a status of NOWAIT. The maximum value of 1,000,000 seconds will result in the DDL statement waiting forever to acquire a DML lock.
If a lock is not acquired before the timeout period expires, then an error is returned.
The ddl_lock_timeout parameter in the database

SQL> select * from v$version;

BANNER
---------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> show parameter ddl_lock_timeout;

NAME                                 TYPE       VALUE
------------------------------------ ---------- --------------
ddl_lock_timeout                     integer    0

Test with the default ddl_lock_timeout (same behavior as earlier versions)

--Session 1
SQL> set time on
10:27:57 SQL> create table t_xifenfei as
10:28:05   2  select * from dba_users;

Table created.

Elapsed: 00:00:00.08
10:28:17 SQL> delete from t_xifenfei where username='CHF';

1 row deleted.

--Session 2
SQL> set timing on
SQL> ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE;
ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE
            *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired


Elapsed: 00:00:00.01
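
When the DDL fails like this, the sessions holding DML locks on the table can be looked up before deciding what to do. A minimal sketch, assuming the T_XIFENFEI table from this test and access to the v$ views:

```sql
-- Sessions currently holding a lock on the table.
-- locked_mode 3 is the row-exclusive mode taken by INSERT, UPDATE and DELETE.
SELECT s.sid, s.serial#, s.username, s.program, lo.locked_mode
  FROM v$locked_object lo
  JOIN dba_objects o ON o.object_id = lo.object_id
  JOIN v$session   s ON s.sid = lo.session_id
 WHERE o.object_name = 'T_XIFENFEI';
```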

Test with ddl_lock_timeout set and the wait timing out

--Session 1
10:28:17 SQL> delete from t_xifenfei where username='CHF';

1 row deleted.

--Session 2
SQL>  ALTER SESSION SET ddl_lock_timeout=10;

Session altered.

Elapsed: 00:00:00.00
SQL> ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE;
ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE
            *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired


Elapsed: 00:00:10.01

Test with ddl_lock_timeout set and the lock released before the timeout

--Session 2
SQL> ALTER SESSION SET ddl_lock_timeout=30;

Session altered.

Elapsed: 00:00:00.00

--Session 1
10:30:00 SQL> COMMIT;

Commit complete.

Elapsed: 00:00:00.00

--Session 2
SQL> ALTER TABLE T_XIFENFEI DROP COLUMN AUTHENTICATION_TYPE;

Table altered.

Elapsed: 00:00:07.91

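In earlier versions, DDL never waited for a lock (NOWAIT behavior). These tests show that ddl_lock_timeout goes some way toward solving the problem of DDL failing because we do not know whether DML is still active on the table. It reduces the need to retry the DDL by hand, or to query the relevant views, find the blocking sessions and kill them. It is a very handy new feature when changing table structures.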
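
On versions without ddl_lock_timeout, the manual retrying described above could be scripted. The block below is only a sketch: the retry count and one-second pause are hypothetical values, the DDL is the one from the test, and DBMS_LOCK.SLEEP requires EXECUTE on DBMS_LOCK. Unlike ddl_lock_timeout, it polls and does not queue for the lock.

```sql
DECLARE
  resource_busy EXCEPTION;
  PRAGMA EXCEPTION_INIT(resource_busy, -54);   -- ORA-00054
  max_attempts CONSTANT PLS_INTEGER := 10;     -- hypothetical: up to 10 tries
BEGIN
  FOR attempt IN 1 .. max_attempts LOOP
    BEGIN
      EXECUTE IMMEDIATE
        'ALTER TABLE t_xifenfei DROP COLUMN authentication_type';
      EXIT;                                    -- DDL succeeded, stop retrying
    EXCEPTION
      WHEN resource_busy THEN
        IF attempt = max_attempts THEN
          RAISE;                               -- give up and surface ORA-00054
        END IF;
        DBMS_LOCK.SLEEP(1);                    -- wait one second before the next try
    END;
  END LOOP;
END;
/
```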

Recovering dropped PL/SQL with flashback query

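Today a friend accidentally dropped a procedure in an 11g production database, and I had him recover it with a flashback query against the dba_source view.
From 10g onward, if the drop of a PL/SQL object is noticed in time (before the undo is overwritten), flashback query can be used to recover it.
Create a PL/SQL procedure and drop it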

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> create or replace PROCEDURE  p_test_del(in_put varchar2) as
  2  begin
  3  dbms_output.put_line(in_put);
  4  end;
  5  /

Procedure created.

SQL> set serveroutput on
SQL> exec p_test_del('www.orasos.com');
www.orasos.com

PL/SQL procedure successfully completed.

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;

TO_CHAR(SYSDATE,'YY
-------------------
2012-07-02 08:34:45

SQL> drop  PROCEDURE  p_test_del;

Procedure dropped.

Recover the PL/SQL

SQL> set pagesize 0
SQL> column text format a4000
SQL> spool /tmp/get_delete_proc.xff
SQL> SELECT text
  2         FROM DBA_source AS OF TIMESTAMP TO_TIMESTAMP('2012-07-02 08:34:45', 'YYYY-MM-DD HH24:MI:SS')
  3        WHERE OWNER = 'CHF' AND NAME = 'P_TEST_DEL' ORDER BY LINE;
PROCEDURE  p_test_del(in_put varchar2) as
begin
dbms_output.put_line(in_put);
end;

SQL> spool off;
SQL> !more /tmp/get_delete_proc.xff
SQL> SELECT text
  2         FROM DBA_source AS OF TIMESTAMP TO_TIMESTAMP('2012-07-02 08:34:45', 'YYYY-MM-DD HH24:MI:SS')
  3        WHERE OWNER = 'CHF' AND NAME = 'P_TEST_DEL' ORDER BY LINE;
create PROCEDURE  p_test_del(in_put varchar2) as 
begin                         
dbms_output.put_line(in_put); 
end;        

SQL> spool off;
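
If the exact name of the dropped object is not known, the same flashback technique can be used to list what has disappeared. A sketch only, assuming the CHF schema and a 30-minute window chosen for illustration. It fails with ORA-01555 if the required undo has already been overwritten.

```sql
-- Stored code that existed 30 minutes ago but is gone now.
SELECT owner, name, type
  FROM dba_source AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '30' MINUTE)
 WHERE owner = 'CHF'
MINUS
SELECT owner, name, type
  FROM dba_source
 WHERE owner = 'CHF';
```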

Recreate the PL/SQL

SQL> create PROCEDURE  p_test_del(in_put varchar2) as 
  2  begin                                     
  3  dbms_output.put_line(in_put);             
  4  end; 
  5  /

Procedure created.

SQL> set serveroutput on
SQL> exec p_test_del('惜分飞');
惜分飞

PL/SQL procedure successfully completed.

Case study: ORA-00600 [kcbshlc_1] brings a database down

On one server, an ORA-00600 [kcbshlc_1] error caused PMON to fail, which brought the database down.

Sun Jul  8 17:20:10 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
Errors in file /opt/oracle/admin/xff/bdump/xff_pmon_16412.trc:
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
Sun Jul  8 17:20:12 2012
PMON: terminating instance due to error 472

Analyze the trace file

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.9-89.ELsmp
Version:	#1 SMP Mon Apr 20 10:33:05 EDT 2009
Machine:	x86_64
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 16412, image: oracle@localhost.localdomain (PMON)

*** 2012-07-08 03:00:11.351
*** SERVICE NAME:(SYS$BACKGROUND) 2012-07-08 03:00:11.338
*** SESSION ID:(1105.1) 2012-07-08 03:00:11.338
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
freeing in-flux r/w latch for process state: 1fc165d248
... in-flux r/w latch  1fc1fcc9b0 Child cache buffers chains level=1 child#=4753 
        Location from where latch is held: kcbgtcr: kslbegin excl: 
        Context saved from call: 113266196
        state=busy(exclusive) (val=0x2000000000000071) holder orapid = 113
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     139 (2, 1341687611, 2)
     192 (2, 1341687611, 2)
     191 (2, 1341687611, 2)
     173 (2, 1341687611, 2)
     185 (2, 1341687611, 2)
     176 (2, 1341687611, 2)
     174 (2, 1341687611, 2)
     118 (2, 1341687611, 2)
     190 (2, 1341687611, 2)
     179 (2, 1341687611, 2)
     184 (1, 1341687611, 1)
     189 (1, 1341687611, 1)
     177 (1, 1341687611, 1)
     195 (1, 1341687611, 1)
     187 (1, 1341687611, 1)
     194 (1, 1341687611, 1)
     147 (1, 1341687611, 1)
     183 (1, 1341687611, 1)
     143 (1, 1341687611, 1)
     144 (1, 1341687611, 1)
     186 (1, 1341687611, 1)
     188 (1, 1341687611, 1)
     196 (1, 1341687611, 1)
     145 (1, 1341687611, 1)
     193 (1, 1341687611, 1)
     waiter count=25
*** 2012-07-08 03:50:06.228
 wsd 0x1f8169ac20, sbuf 0xac1ffafe8, setid 10, op 3
lcuridx 1, lasz 0x3c1ffc110
*** 2012-07-08 16:30:05.294
freeing in-flux r/w latch for process state: 20406507f0
... in-flux r/w latch  1f81265f28 Child cache buffers chains level=1 child#=14180 
        Location from where latch is held: kcbgtcr: kslbegin excl: 
        Context saved from call: 71341989
        state=busy(exclusive) (val=0x2000000000000066) holder orapid = 102
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     121 (2, 1341736205, 2)
     116 (2, 1341736205, 2)
     125 (2, 1341736205, 2)
     140 (2, 1341736205, 2)
     145 (2, 1341736205, 2)
     waiter count=5
freeing in-flux r/w latch for process state: 1fc165f9d0
... in-flux r/w latch  1f813aec18 Child cache buffers chains level=1 child#=20914 
        Location from where latch is held: kcbrls: kslbegin: 
        Context saved from call: 96505705
        state=busy(exclusive) (val=0x200000000000007b) holder orapid = 123
*** 2012-07-08 17:20:10.876
 wsd 0x1f8169a6c8, sbuf (nil), setid 9, op 0
lcuridx 0, lasz (nil)
*** 2012-07-08 17:20:10.876
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcbshlc_1], [33], [], [], [], [], [], []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgerinv()+161        call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FBFFFCEB0 ? 7FBFFFCF10 ?
                                                   7FBFFFCE50 ? 000000000 ?
kgeasnmierr()+163    call     kgerinv()            0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000000 ?
kcbshlc()+239        call     kgeasnmierr()        0066876E0 ? 2A97200260 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
kslilcr()+770        call     kcbshlc()            0066876E0 ? 1F801DDB28 ?
                                                   7FBFFFCF10 ? 7FBFFFCE50 ?
                                                   000000000 ? 000000021 ?
ksl_cleanup()+1567   call     kslilcr()            7FBFFFCE50 ? 000000000 ?
                                                   7FBFFFDCE0 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxfl()+492         call     ksl_cleanup()        000000000 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksuxda()+55          call     ksuxfl()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksucln()+1390        call     ksuxda()             1FC165B8E0 ? 000000000 ?
                                                   000000000 ? 1F801DDB28 ?
                                                   0066876E0 ? 000000021 ?
ksbrdp()+794         call     ksucln()             060008100 ? 000000000 ?
                                                   FFFFFFFF9720ED9F ?
                                                   1F801DDB28 ? 0066876E0 ?
                                                   000000021 ?
opirip()+616         call     ksbrdp()             060008100 ? 000000000 ?
                                                   000000001 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opidrv()+582         call     opirip()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FBFFFF698 ? 060008100 ?
                                                   0066876E0 ? 000000021 ?
opimai_real()+317    call     sou2o()              7FBFFFF670 ? 000000032 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
main()+116           call     opimai_real()        000000003 ? 7FBFFFF700 ?
                                                   000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
__libc_start_main()  call     main()               000000003 ? 7FBFFFF700 ?
+219                                               000000004 ? 7FBFFFF698 ?
                                                   0066876E0 ? 000000021 ?
_start()+42          call     __libc_start_main()  000713984 ? 000000001 ?
                                                   7FBFFFF848 ? 005288D00 ?
                                                   000000000 ? 000000003 ?

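The trace shows that the database runs on 64-bit Linux and is version 10.2.0.4.
Cause of the error:
While PMON was cleaning up process state 1fc165d248, the latch was still held by orapid = 113, so the cleanup failed.
While PMON was cleaning up process state 20406507f0, the latch was still held by orapid = 102, so the cleanup failed.
While PMON was cleaning up process state 1fc165f9d0, the latch was still held by orapid = 123, so the cleanup failed.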
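
On an instance that is still up, the orapid values and child latch numbers from such a trace can be mapped to sessions and latch statistics. A sketch, using the values taken from the trace above. After an instance crash these numbers no longer refer to the same processes, so it is only useful while the problem is in progress.

```sql
-- orapid in the trace corresponds to v$process.pid.
SELECT p.pid, p.spid, s.sid, s.serial#, s.username, s.program
  FROM v$process p
  LEFT JOIN v$session s ON s.paddr = p.addr
 WHERE p.pid IN (102, 113, 123);

-- Contention figures for the cache buffers chains child latches named in the trace.
SELECT child#, gets, misses, sleeps
  FROM v$latch_children
 WHERE name = 'cache buffers chains'
   AND child# IN (4753, 14180, 20914);
```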

A search of MOS note 443909.1
shows that this is unpublished Bug 4723109. The fix is to apply Patch 4723109.