ROW CACHE LOCK等待事件

ROW CACHE LOCK基础说明
ROW CACHE LOCK等待事件是一个共享池相关的等待事件。是由于对于字典缓冲的访问造成的。
P1 – Cache Id
P2 – Mode Held
P3 – Mode Requested

mode 和REQUEST的取值:
KQRMNULL 0 null mode – not locked
KQRMS 3 share mode
KQRMX 5 exclusive mode
KQRMFAIL 10 fail to acquire instance lock

如果是RAC/OPS环境,前台进程发出锁请求,LCK0进程发出锁请求。如果是单实例模式,由前台进程直接发出锁请求。
在RAC/OPS环境下,前台进程会循环等待锁的获取,最多会等待60秒钟。在单实例环境,前台进程会循环1000次,等待3秒钟。PMON进程无论在哪种模式,都会等待5秒钟。
要注意的是单实例模式下和多实例模式下申请该锁调用的模块是不同的(kqrget()- 单实例,kqgigt()- 多实例)。
如果发现这个等待十分高,一般来说可能由于2种原因,一是共享池太小了,需要增加共享池,另外一种情况是SQL分析过于频繁,对于共享池的并发访问量过大。对于任何一种情况,绝大多数情况下加大共享池会有助于降低该等待,不过加大共享池的时候也要注意,并不一定所有的情况下增加共享池都会有明显的效果,特别是对于第二种情况,精确的分析十分重要。另外进一步分析,弄清楚哪些ROW CACHE的等待最为严重,有助于解决问题。

SQL查询

--查询row cache lock等待
select *  from v$session_wait where wait_class = 'row cache lock';

--查询rowcache 名称
select * from v$rowcache where cache# = &p1;

ENQUEUE TYPE
DC_TABLESPACES
Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.

DC_SEQUENCES
Check for appropriate caching of sequences for the application requirements.

DC_USERS
Deadlock and resulting “WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!” can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.

DC_SEGMENTS
This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.

DB_ROLLBACK_SEGMENTS
This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.

DC_AWR_CONTROL
This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these.

Process OS id : xxxxx alive after kill

Process OS id : xxxxx alive after kill警告

Mon May 21 04:55:06 2012
Shutting down instance (immediate)
License high water mark = 373
Mon May 21 04:55:06 2012
Stopping Job queue slave processes
Mon May 21 04:55:06 2012
Job queue slave processes stopped
All dispatchers and shared servers shutdown
Mon May 21 04:55:14 2012
Process OS id : 9922 alive after kill
Errors in file 
Mon May 21 04:55:16 2012
Process OS id : 8159 alive after kill
Errors in file /oracle/admin/resultdb/udump/resultdb_ora_14639.trc
Mon May 21 04:55:17 2012
Process OS id : 8285 alive after kill
Errors in file /oracle/admin/resultdb/udump/resultdb_ora_14639.trc
Mon May 21 04:55:33 2012
ALTER DATABASE CLOSE NORMAL

错误原因

On some platforms it takes some time to kill processes--AIX being one of those platforms

There have been previous reports of shutdown taking time on AIX and after all Oracle waits were taken out of
the picture it was determined to be due to the way the kill command is implemented on that platform. 


Bug 4931101 ERRORS IN ALERT LOG DURING SHUTDOWN

处理建议

Ignore the error  as all processes will be closed and shutdown will complete successfully.

补充说明
本次出问题的数据库是运行在 linux 平台上的 10.2.0.3

主键表插入数据不提交,外键表插入数据被阻塞

有客户和我说:他在含主外键的表中实验发现,在主表数据未提交,然后在外键表插入该数据数据时,出现外键表hang住现象.我开始以为是不同的会话,根据oracle数据库的一致性原则,应该新会话在外键表中不能知道这个记录的存在,直接报错.
可是我实验结果证明:外键表会被阻塞.分析原因如下:
模拟环境

SQL> create table t_p(id number primary key,name varchar2(100));

Table created.

SQL> create table t_f(fid number primary key,pid number, foreign key(pid) references  t_p(id));

Table created.

--会话1
SQL> insert into t_p values(1,'xifenfei');

1 row created.

SQL> commit;

Commit complete.

--会话2
SQL> insert into t_f values(1,1);

1 row created.

SQL> commit;

--会话1
SQL> insert into t_p values(2,'XIFENFEI');

1 row created.

--会话2
SQL> insert into t_f values(2,2);
--hang住

通过实验发现,当主键数据没有提交,然后在外键表中插入该数据外键数据时,该条记录会处于hang住状态(等待),那是什么原因导致了这个等待呢?对会话2做一个10046的trace,发现如下

*** 2012-05-17 17:25:41.757
WAIT #3065187488: nam='enq: TX - row lock contention' ela= 27002895 name|mode=1415053316 usn<<16 
| slot=262151 sequence=588 obj#=-1 tim=1337246741756917
EXEC #3065187488:c=4000,e=27004456,p=0,cr=2,cu=14,mis=0,r=0,dep=0,og=1,
plh=0,tim=1337246741757690
ERROR #3065187488:err=1013 tim=1337246741757751
STAT #3065187488 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL  
(cr=0 pr=0 pw=0 time=12 us)'
WAIT #3065187488: nam='SQL*Net break/reset to client' ela= 581 driver 
id=1650815232 break?=0 p3=0 obj#=-1 tim=1337246741782587
WAIT #3065187488: nam='SQL*Net message to client' ela= 2 driver id=1650815232 
#bytes=1 p3=0 obj#=-1 tim=1337246741782668

通过这个trace发现,是因为TX锁导致了外键表上的插入操作被阻塞.出现该问题的原因
有两种可能:1.两次插入(主键表和外键表分别插入)在主键表上有不兼容锁;2.外键表上有不兼容性锁.

使用oradebug跟踪会话

oradebug setmypid
--EVENT 10704跟踪锁的使用情况
oradebug EVENT 10704 trace name context forever,level 10
--插入数据操作
oradebug EVENT 10704 trace name context off
oradebug TRACEFILE_NAME

跟踪主键表插入数据

*** 2012-05-17 19:05:52.410
ksqgtl *** TM-00012892-00000000 mode=3 flags=0x401 timeout=21474836 ***
ksqgtl: xcb=0x343c0e54, ktcdix=2147483647, topxcb=0x343c0e54
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:05:52.411
ksucti: init txn DID from session DID 
ksqgtl:
        ksqlkdid: 0001-0013-0000000F

*** 2012-05-17 19:05:52.429
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0013-0000000F
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0013-0000000F
ksqgtl: RETURNS 0

*** 2012-05-17 19:05:52.430
ksqgtl *** TM-00012894-00000000 mode=3 flags=0x401 timeout=21474836 ***
ksqgtl: xcb=0x343c0e54, ktcdix=2147483647, topxcb=0x343c0e54
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:05:52.430
ksucti: init session DID from txn DID: 
ksqgtl:
        ksqlkdid: 0001-0013-0000000F

*** 2012-05-17 19:05:52.430
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0013-0000000F
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0013-0000000F
ksqgtl: RETURNS 0

*** 2012-05-17 19:05:52.431
ksqgtl *** TX-00050019-00000307 mode=6 flags=0x401 timeout=0 ***
ksqgtl: xcb=0x343c0e54, ktcdix=2147483647, topxcb=0x343c0e54
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:05:52.431
ksucti: init session DID from txn DID: 
ksqgtl:
        ksqlkdid: 0001-0013-0000000F

*** 2012-05-17 19:05:52.431
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0013-0000000F
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0013-0000000F
ksqgtl: RETURNS 0



SQL> SELECT TO_NUMBER(12892,'xxxxxxx') from dual;

TO_NUMBER(12892,'XXXXXXX')
--------------------------
                     75922

SQL> SELECT TO_NUMBER(12894,'xxxxxxx') from dual;

TO_NUMBER(12894,'XXXXXXX')
--------------------------
                     75924

SQL> select object_name from dba_objects where object_id in(75922,75924);

OBJECT_NAM
----------
T_P
T_F

通过锁使用情况跟踪可以知道,在主键表插入一条记录时,先在主键表获得TM锁,然后外键表获得TM锁,最后主键表获得TX MODE=6的锁。

跟踪外键表插入数据

*** 2012-05-17 19:49:24.912
ksqgtl *** TM-00012892-00000000 mode=3 flags=0x401 timeout=21474836 ***
ksqgtl: xcb=0x343a45e8, ktcdix=2147483647, topxcb=0x343a45e8
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:49:24.912
ksucti: init txn DID from session DID 
ksqgtl:
        ksqlkdid: 0001-0015-00000064

*** 2012-05-17 19:49:24.913
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0015-00000064
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0015-00000064
ksqgtl: RETURNS 0

*** 2012-05-17 19:49:24.913
ksqgtl *** TM-00012894-00000000 mode=3 flags=0x401 timeout=21474836 ***
ksqgtl: xcb=0x343a45e8, ktcdix=2147483647, topxcb=0x343a45e8
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:49:24.913
ksucti: init session DID from txn DID: 
ksqgtl:
        ksqlkdid: 0001-0015-00000064

*** 2012-05-17 19:49:24.913
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0015-00000064
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0015-00000064
ksqgtl: RETURNS 0

*** 2012-05-17 19:49:24.913
ksqgtl *** TX-0002001f-0000034a mode=6 flags=0x401 timeout=0 ***
ksqgtl: xcb=0x343a45e8, ktcdix=2147483647, topxcb=0x343a45e8
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:49:24.913
ksucti: init session DID from txn DID: 
ksqgtl:
        ksqlkdid: 0001-0015-00000064

*** 2012-05-17 19:49:24.914
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0015-00000064
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0015-00000064
ksqgtl: RETURNS 0

*** 2012-05-17 19:49:24.914
ksqgtl *** TX-00050019-00000307 mode=4 flags=0x10021 timeout=21474836 ***
ksqgtl: xcb=0x343a45e8, ktcdix=2147483647, topxcb=0x343a45e8
        ktcipt(topxcb)=0x0

*** 2012-05-17 19:49:24.914
ksucti: init session DID from txn DID: 
ksqgtl:
        ksqlkdid: 0001-0015-00000064

*** 2012-05-17 19:49:24.914
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0015-00000064
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0015-00000064

*** 2012-05-17 19:49:24.914
ksqcmi: TX,50019,307 mode=4 timeout=21474836

从这里可以发现:先在主键表和外键表上加上TM锁,然后外键表获得TX MODE=6的锁(这边成功,因为该表上未有其他级别不兼容锁),再需要在主键表上获得TX MODE=4(表结构共享锁+所有记录共享锁),但是这个时候,发现该锁上已经在主键表插入数据未提交的时候,已经含有了TX MODE=6的锁,从而使得TX MODE=4无法获得,从而使得外键表插入数据处于阻塞状态.