library cache latch等待事件

产生library cache latch原因

The library cache latches protect the cached SQL statements and objects' definitions held 
in the library cache within the shared pool. The library cache latch must be acquired 
in order to add a new statement to the library cache. During a parse, Oracle searches 
the library cache for a matching statement. If one is not found, then Oracle will parse 
the SQL statement, obtain the library cache latch and insert the new SQL.

每一个sql被执行之前,先要到library cache中根据hash_value查找parent cursor,这就需要先获得library cache latch;找到parent cursor后,就会去查找对应的child cursor,当发现无法找到时,就会释放library cache latch,获得share pool latch分配空间给硬解析后的产生的执行计划;然后再次获得library cache latch进行把执行计划放入share pool,转入library cache pin+lock(null模式)开始执行sql.library cache latch 的个数有限(与CPU_COUNT参数相关),当数据库中出现大量硬解析的时候,某一个sql无法得到library cache latch就会开始spin,达到spin count后还没得到,就会开始sleep,达到sleep时间后,醒来还再次试图过的library cache latch得不到就在spin再得不到又sleep…依此类推.
综上可知:在sql执行的过程中可以看出在出现High Versions Count和Hard Parse的情况下都有可能出现library cache latch等待.
关于Hard Parse见:shared pool latch 等待事件
关于High Versions Count见:关于High Versions Count总结

1._KGL_LATCH_COUNT控制library cache latches数量

The hidden parameter _KGL_LATCH_COUNT controls the number of library cache latches. 
The default value should be adequate, but if contention for the library cache latch cannot  be resolved, 
it one may consider increasing this value. The default value for _KGL_LATCH_COUNT is the next prime number 
after CPU_COUNT. This value cannot exceed 67. 

2.Library cache: mutex X
在10g及其以后版本中,很多latch使用mutex代替,我们常见的Library cache: mutex X is similar to library cache wait in earlier version.(_kks_use_mutex_pin=false可以禁止mutex)

ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

alert中发现ORA-07445错误
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []错误,导致数据库down掉

Mon May 14 14:34:34 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p001_1280.trc:
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
Mon May 14 14:34:35 2012
Trace dumping is performing id=[cdmp_20120514143435]
Mon May 14 14:35:10 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc  (incident=164712):
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_164712\orcl_smon_1072_i164712.trc

分析trace文件

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 8 - type 586, 4 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:4892M/8189M, Ph+PgF:5638M/9795M, VA:925M/4095M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 12
Windows thread id: 1072, image: ORACLE.EXE (SMON)
--以上信息得出操作系统和数据库版本2003 sp2+oracle11g(11.1.0.6 32位)

Dump continued from file: d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching

========= Dump for incident 164712 (ORA 7445 [PC:0x7FFF65D0]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]

--这里的ORA-07445 [ACCESS_VIOLATION][UNABLE_TO_READ]根据经验结合这里的32位的环境,
--怀疑是sga使用的内存太多,ORACLE数据库不能读SGA相关内存导致

--在trace中找出相关参数配置.
[0004]: processes=300
[0004]: sessions=335
[0004]: __shared_pool_size=1124073472
[0004]: __large_pool_size=8388608
[0004]: __java_pool_size=16777216
[0004]: __streams_pool_size=251658240
[0004]: streams_pool_size=251658240
[0004]: sga_target=0
[0004]: __sga_target=1887436800
[0004]: memory_target=3145728000
[0004]: memory_max_target=4722786304
[0004]: db_block_size=8192
[0004]: __db_cache_size=478150656
[0004]: __shared_io_pool_size=0
[0004]: compatible=11.1.0.0.0
[0004]: log_buffer=8851456
[0004]: __pga_aggregate_target=780140544
--这里可以看到sga_target分配了内存为1887436800=1.7578125G
--pga_aggregate_target分配了780140544=0.7265625G
--两者内存之和大于2G,超过了32位ORACLE默认限制

查询MOS发现[1341681.1]
该错误原因

This is a resource issue (memory in particular). 32-bit windows systems, 
are limited to 2GB of addressable memory so if you are on this platform 
it's likely you are simply exceeding the capabilities of the 32bit operating system.

解决建议

First recommendation :

If you have not already done so, add the /3GB switch to your boot.ini file and reboot the server. The
boot.ini will be located in the root directory on the drive where windows is installed. The switch, /3GB,
is placed at the end of the line that executes the WinNT loading process. 
This will allow applications such as oracle access to 3Gb or memory instead of 2Gb.

Example:

[operating systems] multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT Server Version 4.00" /3GB

 Second recommendation :

You do not want to increase memory target. If anything, this should be decreased.
You are limited to under 2GB of addressable memory on 32bit windows (the limit is actually about 1.85GB). 
This is for both SGA and PGA memory for all instances; you have to reduce the SGA size for the instance. 
The recommendation is to reduce sga_target, memory_target, and memory_max_target.

9i库遇到ORA-01595/ORA-01594

在alert日志中发现ORA-01595/ORA-01594错误

Sat May 12 21:54:17 2012
Errors in file /oracle/app/admin/prmdb/bdump/prmdb2_smon_483522.trc:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed

分析trace文件

Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /oracle/app/product/9.2.0
System name:    AIX
Node name:      prmsvr2
Release:        3
Version:        5
Machine:        0008585FD600
Instance name: prmdb2
Redo thread mounted by this instance: 2
Oracle process number: 14
Unix process pid: 483522, image: oracle@prmsvr2 (SMON)

*** 2011-05-03 15:28:47.858
*** SESSION ID:(17.1) 2011-05-03 15:28:47.843
*** 2011-05-03 15:28:47.858
SMON: Parallel transaction recovery tried
*** 2011-07-11 17:13:52.028
SMON: Restarting fast_start parallel rollback
*** 2011-07-11 17:28:39.705
SMON: Parallel transaction recovery tried
*** 2012-05-12 21:54:17.246   --当前问题时间点
SMON: following errors trapped and ignored:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed

--通过trace文件,我们没有获得关于该错误的其他有用信息

查询MOS相关信息[280151.1]
出现该错误原因

This is a known problem and there is an Internal Bug:2181139 for this Issue.

The error is signaled because smon is shrinking a rollback segment and this fails 
because we need an extent to store some rollback information. This is a failure message 
for the shrinking. Subsequently smon would succeed in doing that.
--当smon在shrink rollback segment因为需要一个extent存放rollback

解决建议

Ignore these error messages.
Normally adding more undo space should solve the problem, 
but if space is not correcting the problem, please file an SR for assistance.

This error message logging is fixed in 10g.
--忽略该错误或者升级到10g