cursor: pin S wait on X 等待事件

cursor: pin S整体描述

cursor: pin S A session waits on this event when it wants to update a shared mutex pin and another session 
is currently in the process of updating a shared mutex pin for the same cursor object. This wait event should
rarely be seen because a shared mutex pin update is very fast.(Wait Time: Microseconds)

--Parameter说明
P1 Hash value of cursor

P2 Mutex value 
64 bit platforms
8 bytes are used.
Top 4 bytes hold the session id (if the mutex is held X)
Bottom 4 bytes hold the ref count (if the mutex is held S).

32 bit platforms 
4 bytes are used.
Top 2 bytes hold the session id (if the mutex is held X) 
Bottom 2 bytes hold the ref count (if the mutex is held S).

P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

--查询sql
SELECT a.*, s.sql_text
  FROM v$sql s,
       (SELECT sid,
               event,
               wait_class,
               p1 cursor_hash_value,
               p2raw Mutex_value,
               TO_NUMBER (SUBSTR (p2raw, 1, 8), 'xxxxxxxx') hold_mutex_x_sid
          FROM v$session_wait
         WHERE event LIKE 'cursor%') a
 WHERE s.HASH_VALUE = a.cursor_hash_value

cursor: pin S wait on X描述

- In previous versions of Oracle, library cache pin is protected by “library cache pin latch”.
- But in recent versions of Oracle(I believe it’s 10.2.0.2), 
  library cache pin for the cursor LCO is protected by mutext.
- Mutex is allocated per LCO, so it enables fine-grained access control.

“cursor: pin S wait on X” wait event is mostly related to mutex and hard parse.
- When a process hard parses the SQL statement, it should acquire exclusive 
  library cache pin for the corresponding LCO.
- This means that the process acquires the mutex in exclusive mode.
- Another process which also executes the same query needs to acquire the mutex 
  but it’s being blocked by preceding process. The wait event is “cursor: pin S wait on X”.

--发生cursor: pin S wait on X原因
Frequent Hard Parses
If the frequency of Hard Parsing is extremely high, then contention can occur on this pin.

High Version Counts
When Version counts become excessive, a long chain of versions needs to 
be examined and this can lead to contention on this event

Known bugs
Bug 5907779 - Self deadlock hang on "cursor: pin S wait on X" (typically from DBMS_STATS) [ID 5907779.8]
Bug 7568642: BLOCKING_SESSION EMPTY FOR "CURSOR: PIN S WAIT ON X"

ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复

open数据库alert日志报ORA-00600[kcratr1_lostwrt]错误

Mon May 14 14:57:28 2012
ALTER DATABASE OPEN
Mon May 14 14:57:29 2012
Beginning crash recovery of 1 threads
Mon May 14 14:57:29 2012
Started redo scan
Mon May 14 14:57:29 2012
Errors in file d:\oracle\admin\cqgasold\udump\cqgasold_ora_504.trc:
ORA-00600: 内部错误代码,参数: [kcratr1_lostwrt], [], [], [], [], [], [], []

ORA-600 signalled during: alter database open...

查询相关SCN
同一个查询中SCN相同,省略

SQL> select file#,online_status "STATUS",to_char(change#,'9999999999999999') "SCN",
  2  To_char(time,'yyyy-mm-dd hh24:mi:ss')"TIME" from v$recover_file;

未选定行

SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;

     FILE# SCN              STOP_SCN                                            
---------- ---------------- ----------------                                    
         1       7842987188                                                     
         2       7842987188                                                     
         3       7842987188                                                     

SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN"
  3  from v$datafile_header;

     FILE# SCN               RESETLOGS SCN                                      
---------- ----------------- -----------------                                  
         1        7842991811                 1                                  
         2        7842991811                 1                                  
         3        7842991811                 1                                  

这里看到奇怪现象datafile scn小于datafile_header scn,数据库异常断电一般来说也不会出现这样的情况,个人猜测是错误的恢复或者使用历史控制文件导致,对于这样的现状,我先尝试着使用using backup controlfile方式恢复,结果失败.估计控制文件有异常,本着先拉起库原则,重建控制文件.

进行完全恢复

SQL> recover database;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [3020], [8388617], [1], [23403], [25], [112],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTU SMU HEADER BLOCK'

尝试跳过坏块继续恢复

SQL> recover database allow 1 corruption;
ORA-00283: 恢复会话因错误而取消
ORA-00600: 内部错误代码,参数: [3020], [8388610], [1], [23403], [2264], [16],[], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 2)
ORA-10564: tablespace UNDOTBS1
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'
ORA-10560: block type 'KTFB Bitmapped File Space Header'

使用dbv检查坏块数量

C:\>dbv file='d:\oracle\oradata\cqgasold\undo_1.dbf' blocksize=8192

DBVERIFY: Release 9.2.0.5.0 - Production on 星期二 5月 15 19:43:42 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

DBVERIFY - 验证正在开始 : FILE = d:\oracle\oradata\cqgasold\undo_1.dbf

DBV-00200: 块, dba 8388617, 已经标记为崩溃
汇入的页369 - 可能是介质损坏
***
Corrupt block relative dba: 0x00800171 (file 2, block 369)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x00800171
 last change scn: 0x0001.d37c57db seq: 0x1 flg: 0x04
 consistency value in tail: 0x4e930260
 check value in block header: 0x8202, computed block checksum: 0x4e5f
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***

汇入的页417 - 可能是介质损坏
***
Corrupt block relative dba: 0x008001a1 (file 2, block 417)
Fractured block found during dbv:
Data in bad block -
 type: 2 format: 2 rdba: 0x008001a1
 last change scn: 0x0001.d37c53d4 seq: 0x2 flg: 0x04
 consistency value in tail: 0x4b6b0201
 check value in block header: 0x6ae7, computed block checksum: 0x5abc
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
…………
--类此记录很多,我放弃了跳过坏块修复的方法

恢复过程中提示坏块数据库文件离线恢复

SQL> alter database datafile 'd:\oracle\oradata\cqgasold\undo_1.dbf' offline;

数据库已更改。

SQL> recover database;
完成介质恢复。
SQL> alter database open;
alter database open
*
ERROR 位于第 1 行:
ORA-00604: 递归 SQL 层 1 出现错误
ORA-00376: 此时无法读取文件 2
ORA-01110: 数据文件 2: 'D:\ORACLE\ORADATA\CQGASOLD\UNDO_1.DBF'

到了这一步,根据经验,数据库被open的可能性很多了,很有可能是open以后因为smon回滚导致数据库down

查看日志,屏蔽回滚段,完成恢复

Tue May 15 19:59:52 2012
alter database open
Tue May 15 19:59:52 2012
Beginning crash recovery of 1 threads
Tue May 15 19:59:52 2012
Started redo scan
Tue May 15 19:59:52 2012
Completed redo scan
 323 redo blocks read, 82 data blocks need recovery
Tue May 15 19:59:52 2012
Started recovery at
 Thread 1: logseq 23404, block 3, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 4 Seq 23404 Reading mem 0
  Mem# 0 errs 0: F:\ORACLE\ORADATA\LOGCQGASOLD4.ORA
Tue May 15 19:59:52 2012
Completed redo application
Tue May 15 19:59:52 2012
Ended recovery at
 Thread 1: logseq 23404, block 326, scn 1.3548264979
 82 data blocks read, 82 data blocks written, 323 redo blocks read
Crash recovery completed successfully
Tue May 15 19:59:53 2012
Thread 1 advanced to log sequence 23405
Thread 1 opened at log sequence 23405
  Current log# 2 seq# 23405 mem# 0: D:\ORACLE\ORADATA\CQGASOLD\REDO02.LOG
Successful open of redo thread 1
Tue May 15 19:59:53 2012
SMON: enabling cache recovery
SMON: enabling tx recovery
Tue May 15 19:59:54 2012
Database Characterset is ZHS16GBK
Tue May 15 19:59:55 2012
replication_dependency_tracking turned off (no async multimaster replication found)
ORA-604 signalled during: alter database open...
Tue May 15 19:59:56 2012
SMON: about to recover undo segment 1
SMON: mark undo segment 1 as needs recovery
SMON: about to recover undo segment 2
SMON: mark undo segment 2 as needs recovery
SMON: about to recover undo segment 3
SMON: mark undo segment 3 as needs recovery
SMON: about to recover undo segment 4
SMON: mark undo segment 4 as needs recovery
SMON: about to recover undo segment 5
SMON: mark undo segment 5 as needs recovery
SMON: about to recover undo segment 6
SMON: mark undo segment 6 as needs recovery
SMON: about to recover undo segment 7
SMON: mark undo segment 7 as needs recovery
SMON: about to recover undo segment 8
SMON: mark undo segment 8 as needs recovery
SMON: about to recover undo segment 9
SMON: mark undo segment 9 as needs recovery
SMON: about to recover undo segment 10
SMON: mark undo segment 10 as needs recovery
Tue May 15 20:00:37 2012
Shutting down instance (abort)

看到这里,可以大概确定是因为undo文件离线,导致回滚段异常.
这个问题,基本上可以确定通过隐含参数屏蔽回滚段,然后open数据库,重建undo删除异常undo,数据库恢复完成。

记录一次rman备份ORA-19502/ORA-27063错误原因分析

rman备份出现ORA-19502/ORA-27063错误

RMAN> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13> 14> 15> 16> 17> 18> 19> 20> 
allocated channel: t11
channel t11: sid=824 instance=ncdb1 devtype=DISK

allocated channel: t12
channel t12: sid=838 instance=ncdb1 devtype=DISK

allocated channel: t13
channel t13: sid=809 instance=ncdb1 devtype=DISK

allocated channel: t14
channel t14: sid=886 instance=ncdb1 devtype=DISK

allocated channel: t15
channel t15: sid=620 instance=ncdb1 devtype=DISK

allocated channel: t16
channel t16: sid=599 instance=ncdb1 devtype=DISK

allocated channel: t17
channel t17: sid=482 instance=ncdb1 devtype=DISK

allocated channel: t18
channel t18: sid=506 instance=ncdb1 devtype=DISK
一共开通8个通道

channel t12: starting full datafile backupset
channel t12: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t12: starting piece 1 at 14-MAY-12
--通道12备份数据文件

channel t17: starting full datafile backupset
channel t17: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t17: starting piece 1 at 14-MAY-12
--通道17备份数据文件

channel t15: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mpnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t15: backup set complete, elapsed time: 06:07:59
channel t11: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mlnb04jk_1_1 tag=TAG20120514T204954 comment=NONE
channel t11: backup set complete, elapsed time: 06:17:25
channel t16: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mqnb04jm_1_1 tag=TAG20120514T204954 comment=NONE
channel t16: backup set complete, elapsed time: 06:34:49
channel t14: finished piece 1 at 15-MAY-12
piece handle=/rman/db_monb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t14: backup set complete, elapsed time: 06:40:05
channel t18: finished piece 1 at 15-MAY-12
piece handle=/rman/db_msnb04jn_1_1 tag=TAG20120514T204954 comment=NONE
channel t18: backup set complete, elapsed time: 06:43:38
channel t13: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mnnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t13: backup set complete, elapsed time: 07:40:56
--这里可以看出rman的备份完成了通道11/13/14/15/16/18,也就是说目前为止通道12/17未完成.

RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel
--通道12报错(硬盘空间不足)

channel t11: starting full datafile backupset
channel t11: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t11: starting piece 1 at 15-MAY-12
--在通道12报错后,通道11已经完成了上次备份,所以启动备份通道12出错的数据文件

RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel
--通道17也因为磁盘空间报错

channel t13: starting full datafile backupset
channel t13: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t13: starting piece 1 at 15-MAY-12
--通道13也尝试备份通道17失败的数据文件

RMAN-03009: failure of backup command on t11 channel at 05/15/2012 04:39:59
ORA-19504: failed to create file "/rman/db_mtnb104u_1_1"
ORA-27044: unable to write the header block of file
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: 3
Addition
--因为当前没有空闲空间,通道11终止,
--这个时候rman异常终止,导致后续的通道13终止记录未打印到日志

阅读完rman日志,很好理解因为存放rman备份的磁盘空间不足导致了一系列错误

检查磁盘剩余空间

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.75    3%     6548     1% /
/dev/hd2          10.00      4.55   55%    84383     8% /usr
/dev/hd9var        5.00      4.04   20%     6290     1% /var
/dev/hd3           5.00      3.87   23%     1551     1% /tmp
/dev/hd1          10.00      9.91    1%      382     1% /home
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       5.00      4.89    3%     3502     1% /opt
/dev/archalv      99.00     82.98   17%       96     1% /archa
/dev/fslv01       40.00     19.49   52%    72324     2% /ora10
/dev/fslv00     1800.00    467.25   75%       10     1% /rman

这下让人迷糊了,磁盘空间还剩余467.25G,怎么会报错呢?

分析原因

RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel

RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel

这两个通道在写入rman备份到磁盘中的时候,在05/15/2012 04:39:58发现磁盘空间不足,两个通道分别准备写入30480897/30753665号块的时候出错,那么当时这两个通道分别写入的数据块数为30480896/30753664,写入文件大小为(30480896+30753664)*8192/1024/1024/1024=467.1826171875G.这里可以看出磁盘剩余空间467.25G,其实当时已经写入了467.1826171875G,继续写入的时候出错.然后rman为了保证备份的正确性,自动删除了当时已经备份的467.1826171875G错误的备份文件.从而在备份结束后看到磁盘空间还有大量剩余而rman包空间不足的现象.

library cache latch等待事件

产生library cache latch原因

The library cache latches protect the cached SQL statements and objects' definitions held 
in the library cache within the shared pool. The library cache latch must be acquired 
in order to add a new statement to the library cache. During a parse, Oracle searches 
the library cache for a matching statement. If one is not found, then Oracle will parse 
the SQL statement, obtain the library cache latch and insert the new SQL.

每一个sql被执行之前,先要到library cache中根据hash_value查找parent cursor,这就需要先获得library cache latch;找到parent cursor后,就会去查找对应的child cursor,当发现无法找到时,就会释放library cache latch,获得share pool latch分配空间给硬解析后的产生的执行计划;然后再次获得library cache latch进行把执行计划放入share pool,转入library cache pin+lock(null模式)开始执行sql.library cache latch 的个数有限(与CPU_COUNT参数相关),当数据库中出现大量硬解析的时候,某一个sql无法得到library cache latch就会开始spin,达到spin count后还没得到,就会开始sleep,达到sleep时间后,醒来还再次试图过的library cache latch得不到就在spin再得不到又sleep…依此类推.
综上可知:在sql执行的过程中可以看出在出现High Versions Count和Hard Parse的情况下都有可能出现library cache latch等待.
关于Hard Parse见:shared pool latch 等待事件
关于High Versions Count见:关于High Versions Count总结

1._KGL_LATCH_COUNT控制library cache latches数量

The hidden parameter _KGL_LATCH_COUNT controls the number of library cache latches. 
The default value should be adequate, but if contention for the library cache latch cannot  be resolved, 
it one may consider increasing this value. The default value for _KGL_LATCH_COUNT is the next prime number 
after CPU_COUNT. This value cannot exceed 67. 

2.Library cache: mutex X
在10g及其以后版本中,很多latch使用mutex代替,我们常见的Library cache: mutex X is similar to library cache wait in earlier version.(_kks_use_mutex_pin=false可以禁止mutex)

ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

alert中发现ORA-07445错误
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []错误,导致数据库down掉

Mon May 14 14:34:34 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p001_1280.trc:
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
Mon May 14 14:34:35 2012
Trace dumping is performing id=[cdmp_20120514143435]
Mon May 14 14:35:10 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc  (incident=164712):
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_164712\orcl_smon_1072_i164712.trc

分析trace文件

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 8 - type 586, 4 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:4892M/8189M, Ph+PgF:5638M/9795M, VA:925M/4095M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 12
Windows thread id: 1072, image: ORACLE.EXE (SMON)
--以上信息得出操作系统和数据库版本2003 sp2+oracle11g(11.1.0.6 32位)

Dump continued from file: d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching

========= Dump for incident 164712 (ORA 7445 [PC:0x7FFF65D0]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]

--这里的ORA-07445 [ACCESS_VIOLATION][UNABLE_TO_READ]根据经验结合这里的32位的环境,
--怀疑是sga使用的内存太多,ORACLE数据库不能读SGA相关内存导致

--在trace中找出相关参数配置.
[0004]: processes=300
[0004]: sessions=335
[0004]: __shared_pool_size=1124073472
[0004]: __large_pool_size=8388608
[0004]: __java_pool_size=16777216
[0004]: __streams_pool_size=251658240
[0004]: streams_pool_size=251658240
[0004]: sga_target=0
[0004]: __sga_target=1887436800
[0004]: memory_target=3145728000
[0004]: memory_max_target=4722786304
[0004]: db_block_size=8192
[0004]: __db_cache_size=478150656
[0004]: __shared_io_pool_size=0
[0004]: compatible=11.1.0.0.0
[0004]: log_buffer=8851456
[0004]: __pga_aggregate_target=780140544
--这里可以看到sga_target分配了内存为1887436800=1.7578125G
--pga_aggregate_target分配了780140544=0.7265625G
--两者内存之和大于2G,超过了32位ORACLE默认限制

查询MOS发现[1341681.1]
该错误原因

This is a resource issue (memory in particular). 32-bit windows systems, 
are limited to 2GB of addressable memory so if you are on this platform 
it's likely you are simply exceeding the capabilities of the 32bit operating system.

解决建议

First recommendation :

If you have not already done so, add the /3GB switch to your boot.ini file and reboot the server. The
boot.ini will be located in the root directory on the drive where windows is installed. The switch, /3GB,
is placed at the end of the line that executes the WinNT loading process. 
This will allow applications such as oracle access to 3Gb or memory instead of 2Gb.

Example:

[operating systems] multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT Server Version 4.00" /3GB

 Second recommendation :

You do not want to increase memory target. If anything, this should be decreased.
You are limited to under 2GB of addressable memory on 32bit windows (the limit is actually about 1.85GB). 
This is for both SGA and PGA memory for all instances; you have to reduce the SGA size for the instance. 
The recommendation is to reduce sga_target, memory_target, and memory_max_target.