记录一次rman备份ORA-19502/ORA-27063错误原因分析

Posted on 2012 年 05 月 15 日 by 惜分飞

rman备份出现ORA-19502/ORA-27063错误

RMAN> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13> 14> 15> 16> 17> 18> 19> 20> 
allocated channel: t11
channel t11: sid=824 instance=ncdb1 devtype=DISK

allocated channel: t12
channel t12: sid=838 instance=ncdb1 devtype=DISK

allocated channel: t13
channel t13: sid=809 instance=ncdb1 devtype=DISK

allocated channel: t14
channel t14: sid=886 instance=ncdb1 devtype=DISK

allocated channel: t15
channel t15: sid=620 instance=ncdb1 devtype=DISK

allocated channel: t16
channel t16: sid=599 instance=ncdb1 devtype=DISK

allocated channel: t17
channel t17: sid=482 instance=ncdb1 devtype=DISK

allocated channel: t18
channel t18: sid=506 instance=ncdb1 devtype=DISK
一共开通8个通道

channel t12: starting full datafile backupset
channel t12: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t12: starting piece 1 at 14-MAY-12
--通道12备份数据文件

channel t17: starting full datafile backupset
channel t17: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t17: starting piece 1 at 14-MAY-12
--通道17备份数据文件

channel t15: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mpnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t15: backup set complete, elapsed time: 06:07:59
channel t11: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mlnb04jk_1_1 tag=TAG20120514T204954 comment=NONE
channel t11: backup set complete, elapsed time: 06:17:25
channel t16: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mqnb04jm_1_1 tag=TAG20120514T204954 comment=NONE
channel t16: backup set complete, elapsed time: 06:34:49
channel t14: finished piece 1 at 15-MAY-12
piece handle=/rman/db_monb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t14: backup set complete, elapsed time: 06:40:05
channel t18: finished piece 1 at 15-MAY-12
piece handle=/rman/db_msnb04jn_1_1 tag=TAG20120514T204954 comment=NONE
channel t18: backup set complete, elapsed time: 06:43:38
channel t13: finished piece 1 at 15-MAY-12
piece handle=/rman/db_mnnb04jl_1_1 tag=TAG20120514T204954 comment=NONE
channel t13: backup set complete, elapsed time: 07:40:56
--这里可以看出rman的备份完成了通道11/13/14/15/16/18，也就是说目前为止通道12/17未完成.

RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel
--通道12报错(硬盘空间不足)

channel t11: starting full datafile backupset
channel t11: specifying datafile(s) in backupset
input datafile fno=00008 name=/dev/rnc32g_39
input datafile fno=00016 name=/dev/rnc32g_47
input datafile fno=00024 name=/dev/rnc32g_57
input datafile fno=00032 name=/dev/rnc32g_25
input datafile fno=00040 name=/dev/rnc32g_33
input datafile fno=00048 name=/dev/rnc32g_3
input datafile fno=00056 name=/dev/rnc32g_11
input datafile fno=00064 name=/dev/rnc32g_19
input datafile fno=00072 name=/dev/rnc32g_67
input datafile fno=00080 name=/dev/rnc32g_106
input datafile fno=00088 name=/dev/rnc32g_114
input datafile fno=00096 name=/dev/rnc32g_87
input datafile fno=00104 name=/dev/rnc32g_95
input datafile fno=00112 name=/dev/rnc32g_103
input datafile fno=00120 name=/dev/rnc32g_75
input datafile fno=00003 name=/dev/rnc50_sysaux
input datafile fno=00130 name=/dev/rnc32g_119
channel t11: starting piece 1 at 15-MAY-12
--在通道12报错后,通道11已经完成了上次备份,所以启动备份通道12出错的数据文件

RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel
--通道17也因为磁盘空间报错

channel t13: starting full datafile backupset
channel t13: specifying datafile(s) in backupset
input datafile fno=00002 name=/dev/rnc32g_22
input datafile fno=00013 name=/dev/rnc32g_44
input datafile fno=00021 name=/dev/rnc32g_54
input datafile fno=00029 name=/dev/rnc32g_62
input datafile fno=00037 name=/dev/rnc32g_30
input datafile fno=00045 name=/dev/rnc32g_38
input datafile fno=00053 name=/dev/rnc32g_8
input datafile fno=00061 name=/dev/rnc32g_16
input datafile fno=00069 name=/dev/rnc32g_64
input datafile fno=00077 name=/dev/rncundo_33g_4
input datafile fno=00085 name=/dev/rnc32g_111
input datafile fno=00093 name=/dev/rnc32g_84
input datafile fno=00101 name=/dev/rnc32g_92
input datafile fno=00109 name=/dev/rnc32g_100
input datafile fno=00117 name=/dev/rnc32g_72
input datafile fno=00006 name=/dev/rnc50_4g_1
channel t13: starting piece 1 at 15-MAY-12
--通道13也尝试备份通道17失败的数据文件

RMAN-03009: failure of backup command on t11 channel at 05/15/2012 04:39:59
ORA-19504: failed to create file "/rman/db_mtnb104u_1_1"
ORA-27044: unable to write the header block of file
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: 3
Addition
--因为当前没有空闲空间,通道11终止,
--这个时候rman异常终止,导致后续的通道13终止记录未打印到日志

阅读完rman日志,很好理解因为存放rman备份的磁盘空间不足导致了一系列错误

检查磁盘剩余空间

Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4          10.00      9.75    3%     6548     1% /
/dev/hd2          10.00      4.55   55%    84383     8% /usr
/dev/hd9var        5.00      4.04   20%     6290     1% /var
/dev/hd3           5.00      3.87   23%     1551     1% /tmp
/dev/hd1          10.00      9.91    1%      382     1% /home
/proc                 -         -    -         -     -  /proc
/dev/hd10opt       5.00      4.89    3%     3502     1% /opt
/dev/archalv      99.00     82.98   17%       96     1% /archa
/dev/fslv01       40.00     19.49   52%    72324     2% /ora10
/dev/fslv00     1800.00    467.25   75%       10     1% /rman

这下让人迷糊了,磁盘空间还剩余467.25G,怎么会报错呢？

分析原因

RMAN-03009: failure of backup command on t12 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30481025 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mmnb04jl_1_1", blockno 30480897 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t12 disabled, job failed on it will be run on another channel

RMAN-03009: failure of backup command on t17 channel at 05/15/2012 04:39:58
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753793 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
Additional information: -1
Additional information: 1048576
ORA-19502: write error on file "/rman/db_mrnb04jm_1_1", blockno 30753665 (blocksize=8192)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
channel t17 disabled, job failed on it will be run on another channel

这两个通道在写入rman备份到磁盘中的时候,在05/15/2012 04:39:58发现磁盘空间不足,两个通道分别准备写入30480897/30753665号块的时候出错,那么当时这两个通道分别写入的数据块数为30480896/30753664,写入文件大小为(30480896+30753664)*8192/1024/1024/1024=467.1826171875G.这里可以看出磁盘剩余空间467.25G,其实当时已经写入了467.1826171875G,继续写入的时候出错.然后rman为了保证备份的正确性,自动删除了当时已经备份的467.1826171875G错误的备份文件.从而在备份结束后看到磁盘空间还有大量剩余而rman包空间不足的现象.

library cache latch等待事件

Posted on 2012 年 05 月 14 日 by 惜分飞

产生library cache latch原因

The library cache latches protect the cached SQL statements and objects' definitions held 
in the library cache within the shared pool. The library cache latch must be acquired 
in order to add a new statement to the library cache. During a parse, Oracle searches 
the library cache for a matching statement. If one is not found, then Oracle will parse 
the SQL statement, obtain the library cache latch and insert the new SQL.

每一个sql被执行之前，先要到library cache中根据hash_value查找parent cursor,这就需要先获得library cache latch；找到parent cursor后，就会去查找对应的child cursor,当发现无法找到时，就会释放library cache latch，获得share pool latch分配空间给硬解析后的产生的执行计划；然后再次获得library cache latch进行把执行计划放入share pool,转入library cache pin+lock(null模式)开始执行sql.library cache latch 的个数有限(与CPU_COUNT参数相关)，当数据库中出现大量硬解析的时候，某一个sql无法得到library cache latch就会开始spin,达到spin count后还没得到，就会开始sleep,达到sleep时间后,醒来还再次试图过的library cache latch得不到就在spin再得不到又sleep…依此类推.
综上可知:在sql执行的过程中可以看出在出现High Versions Count和Hard Parse的情况下都有可能出现library cache latch等待.
关于Hard Parse见:shared pool latch 等待事件
关于High Versions Count见:关于High Versions Count总结

1._KGL_LATCH_COUNT控制library cache latches数量

The hidden parameter _KGL_LATCH_COUNT controls the number of library cache latches. 
The default value should be adequate, but if contention for the library cache latch cannot  be resolved, 
it one may consider increasing this value. The default value for _KGL_LATCH_COUNT is the next prime number 
after CPU_COUNT. This value cannot exceed 67.

2.Library cache: mutex X
在10g及其以后版本中,很多latch使用mutex代替,我们常见的Library cache: mutex X is similar to library cache wait in earlier version.(_kks_use_mutex_pin=false可以禁止mutex)

ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

Posted on 2012 年 05 月 14 日 by 惜分飞

alert中发现ORA-07445错误
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []错误,导致数据库down掉

Mon May 14 14:34:34 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_p001_1280.trc:
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
Mon May 14 14:34:35 2012
Trace dumping is performing id=[cdmp_20120514143435]
Mon May 14 14:35:10 2012
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc  (incident=164712):
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_164712\orcl_smon_1072_i164712.trc

分析trace文件

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU                 : 8 - type 586, 4 Physical Cores
Process Affinity    : 0x00000000
Memory (Avail/Total): Ph:4892M/8189M, Ph+PgF:5638M/9795M, VA:925M/4095M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 12
Windows thread id: 1072, image: ORACLE.EXE (SMON)
--以上信息得出操作系统和数据库版本2003 sp2+oracle11g(11.1.0.6 32位)

Dump continued from file: d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_smon_1072.trc
ORA-07445: exception encountered: core dump [PC:0x7FFF65D0] [ACCESS_VIOLATION] 
[ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: Buffer cache miss for IOQ batching

========= Dump for incident 164712 (ORA 7445 [PC:0x7FFF65D0]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]

--这里的ORA-07445 [ACCESS_VIOLATION][UNABLE_TO_READ]根据经验结合这里的32位的环境,
--怀疑是sga使用的内存太多,ORACLE数据库不能读SGA相关内存导致

--在trace中找出相关参数配置.
[0004]: processes=300
[0004]: sessions=335
[0004]: __shared_pool_size=1124073472
[0004]: __large_pool_size=8388608
[0004]: __java_pool_size=16777216
[0004]: __streams_pool_size=251658240
[0004]: streams_pool_size=251658240
[0004]: sga_target=0
[0004]: __sga_target=1887436800
[0004]: memory_target=3145728000
[0004]: memory_max_target=4722786304
[0004]: db_block_size=8192
[0004]: __db_cache_size=478150656
[0004]: __shared_io_pool_size=0
[0004]: compatible=11.1.0.0.0
[0004]: log_buffer=8851456
[0004]: __pga_aggregate_target=780140544
--这里可以看到sga_target分配了内存为1887436800=1.7578125G
--pga_aggregate_target分配了780140544=0.7265625G
--两者内存之和大于2G,超过了32位ORACLE默认限制

查询MOS发现[1341681.1]
该错误原因

This is a resource issue (memory in particular). 32-bit windows systems, 
are limited to 2GB of addressable memory so if you are on this platform 
it's likely you are simply exceeding the capabilities of the 32bit operating system.

解决建议

First recommendation :

If you have not already done so, add the /3GB switch to your boot.ini file and reboot the server. The
boot.ini will be located in the root directory on the drive where windows is installed. The switch, /3GB,
is placed at the end of the line that executes the WinNT loading process. 
This will allow applications such as oracle access to 3Gb or memory instead of 2Gb.

Example:

[operating systems] multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT Server Version 4.00" /3GB

 Second recommendation :

You do not want to increase memory target. If anything, this should be decreased.
You are limited to under 2GB of addressable memory on 32bit windows (the limit is actually about 1.85GB). 
This is for both SGA and PGA memory for all instances; you have to reduce the SGA size for the instance. 
The recommendation is to reduce sga_target, memory_target, and memory_max_target.

9i库遇到ORA-01595/ORA-01594

Posted on 2012 年 05 月 14 日 by 惜分飞

在alert日志中发现ORA-01595/ORA-01594错误

Sat May 12 21:54:17 2012
Errors in file /oracle/app/admin/prmdb/bdump/prmdb2_smon_483522.trc:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed

分析trace文件

Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /oracle/app/product/9.2.0
System name:    AIX
Node name:      prmsvr2
Release:        3
Version:        5
Machine:        0008585FD600
Instance name: prmdb2
Redo thread mounted by this instance: 2
Oracle process number: 14
Unix process pid: 483522, image: oracle@prmsvr2 (SMON)

*** 2011-05-03 15:28:47.858
*** SESSION ID:(17.1) 2011-05-03 15:28:47.843
*** 2011-05-03 15:28:47.858
SMON: Parallel transaction recovery tried
*** 2011-07-11 17:13:52.028
SMON: Restarting fast_start parallel rollback
*** 2011-07-11 17:28:39.705
SMON: Parallel transaction recovery tried
*** 2012-05-12 21:54:17.246   --当前问题时间点
SMON: following errors trapped and ignored:
ORA-01595: error freeing extent (2) of rollback segment (19))
ORA-01594: attempt to wrap into rollback segment (19) extent (2) which is being freed

--通过trace文件,我们没有获得关于该错误的其他有用信息

查询MOS相关信息[280151.1]
出现该错误原因

This is a known problem and there is an Internal Bug:2181139 for this Issue.

The error is signaled because smon is shrinking a rollback segment and this fails 
because we need an extent to store some rollback information. This is a failure message 
for the shrinking. Subsequently smon would succeed in doing that.
--当smon在shrink rollback segment因为需要一个extent存放rollback

解决建议

Ignore these error messages.
Normally adding more undo space should solve the problem, 
but if space is not correcting the problem, please file an SR for assistance.

This error message logging is fixed in 10g.
--忽略该错误或者升级到10g

关于High Versions Count总结

Posted on 2012 年 05 月 13 日 by 惜分飞

High Versions Count后果
sql查询–>hash对比确定是否存在shared pool中(不讨论不存在情况)–>选择合适的children
可能导致library cache latch contention

When you have unnecessary versions of a cursor, each time that cursor is executed, 
the parse engine has to search through the list of versions to see which is the cursor that you want. 
This wastes CPU cycles that you could be using on something else.

High version counts can easily cause high contention for library cache latches.
A process parsing a SQL statement with many versions (children cursors) will need to scan
through all these children while holding on to a library cache latch.
This means that other processes needing the same latch will have to wait and
can lead to significant database-wide performance degradation.

引起High Versions Count原因

•UNBOUND_CURSOR - The existing child cursor was not fully built (in other words, it was not optimized) 

•SQL_TYPE_MISMATCH - The SQL type does not match the existing child cursor 

•**OPTIMIZER_MISMATCH - The optimizer environment does not match the existing child cursor. 
For example:
select count(*) from emp; ->> 1 PARENT, 1 CHILD 
alter session set optimizer_mode=ALL_ROWS 
select count(*) from emp; ->> 1 PARENT, 2 CHILDREN 
    (The optimizer mode has changed and therefore 
       the existing child cannot be reused)
(The same applies with events - if I turned on tracing with 10046 than 
I would get the OPTIMIZER_MISMATCH again and a 3rd child) 

•OUTLINE_MISMATCH - The outlines do not match the existing child cursor 
If my user had created stored outlines previously for this command and they were 
stored in seperate categories (say "OUTLINES1" and "OUTLINES2") running:-

alter session set use_stored_outlines = OUTLINES1; 
select count(*) from emp; 
alter session set use_stored_oulines= OUTLINES2; 
select count(*) from emp;
--> Would create a 2nd child as the outline used is different than the first run. 

•STATS_ROW_MISMATCH - The existing statistics do not match the existing child cursor. 
Check that 10046/sql_trace is not set on all sessions as this can cause this. 

•LITERAL_MISMATCH - Non-data literal values do not match the existing child cursor 

•SEC_DEPTH_MISMATCH - Security level does not match the existing child cursor 

•EXPLAIN_PLAN_CURSOR - The child cursor is an explain plan cursor and should not be shared. 
Explain plan statements will generate a new child by default - the mismatch will be this. 

•BUFFERED_DML_MISMATCH - Buffered DML does not match the existing child cursor 

•PDML_ENV_MISMATCH - PDML environment does not match the existing child cursor 

•INST_DRTLD_MISMATCH - Insert direct load does not match the existing child cursor 

•SLAVE_QC_MISMATCH -The existing child cursor is a slave cursor and the new one was issued by the coordinator 
(or, the existing child cursor was issued by the coordinator and the new one is a slave cursor). 

•TYPECHECK_MISMATCH - The existing child cursor is not fully optimized 

•AUTH_CHECK_MISMATCH - Authorization/translation check failed for the existing child cursor 

The user does not have permission to access the object in any previous version of the cursor. 
A typical example would be where each user has it's own copy of a table 

•**BIND_MISMATCH - The bind metadata does not match the existing child cursor. For example:
variable a varchar2(100); 
select count(*) from emp where ename = :a ->> 1 PARENT, 1 CHILD 
variable a varchar2(400); 
select count(*) from emp where ename = :a ->> 1 PARENT, 2 CHILDREN 
   (The bind 'a' has now changed in definition)

•DESCRIBE_MISMATCH - The typecheck heap is not present during the describe for the child cursor 

•LANGUAGE_MISMATCH - The language handle does not match the existing child cursor 

•TRANSLATION_MISMATCH - The base objects of the existing child cursor do not match. 
The definition of the object does not match any current version. 
Usually this is indicative of the same issue as "AUTH_CHECK_MISMATCH" where the object is different.

•ROW_LEVEL_SEC_MISMATCH - The row level security policies do not match 

•INSUFF_PRIVS - Insufficient privileges on objects referenced by the existing child cursor 

•INSUFF_PRIVS_REM - Insufficient privileges on remote objects referenced by the existing child cursor 

•REMOTE_TRANS_MISMATCH - The remote base objects of the existing child cursor do not match 
USER1: select count(*) from table@remote_db 
USER2: select count(*) from table@remote_db 

  (Although the SQL is identical, the dblink pointed to 
   by remote_db may be a private dblink which resolves 
   to a different object altogether)

•LOGMINER_SESSION_MISMATCH 

•INCOMP_LTRL_MISMATCH 

•OVERLAP_TIME_MISMATCH - error_on_overlap_time mismatch 

•SQL_REDIRECT_MISMATCH - sql redirection mismatch 

•MV_QUERY_GEN_MISMATCH - materialized view query generation 

•USER_BIND_PEEK_MISMATCH - user bind peek mismatch 

•TYPCHK_DEP_MISMATCH - cursor has typecheck dependencies 

•NO_TRIGGER_MISMATCH - no trigger mismatch 

•FLASHBACK_CURSOR - No cursor sharing for flashback 

•ANYDATA_TRANSFORMATION - anydata transformation change 

•INCOMPLETE_CURSOR - incomplete cursor.
When bind length is upgradeable (i.e. we found a child cursor that matches everything
else except that the bind length is not long enough), we mark the old cursor is not usable
and build a new one.  This means the version can be ignored.

•TOP_LEVEL_RPI_CURSOR - top level/rpi cursor 
In a Parallel Query invocation this is expected behaviour (we purposely do not share)

•DIFFERENT_LONG_LENGTH - different long length 

•LOGICAL_STANDBY_APPLY - logical standby apply mismatch 

•DIFF_CALL_DURN - different call duration 

•BIND_UACS_DIFF - bind uacs mismatch 

•PLSQL_CMP_SWITCHS_DIFF - plsql compiler switches mismatch 

•CURSOR_PARTS_MISMATCH - cursor-parts executed mismatch 

•STB_OBJECT_MISMATCH - STB object different (now exists) 

•ROW_SHIP_MISMATCH - row shipping capability mismatch 

•PQ_SLAVE_MISMATCH - PQ slave mismatch 
Check you want to be using PX with this reason code, as the problem could be caused by running 
lots of small SQL statements which do not really need PX. If you are on < 11i you may be hitting Bug:4367986

•TOP_LEVEL_DDL_MISMATCH - top-level DDL cursor 

•MULTI_PX_MISMATCH - multi-px and slave-compiled cursor 

•BIND_PEEKED_PQ_MISMATCH - bind-peeked PQ cursor 

•MV_REWRITE_MISMATCH - MV rewrite cursor 

•ROLL_INVALID_MISMATCH - rolling invalidation window exceeded 
This is caused by the rolling invalidation capability in DBMS_STATS. 
The child cannot be shared as it's invalidation window is exceeded. See:
Note:557661.1  Rolling Cursor Invalidations with DBMS_STATS in Oracle10g (Doc ID 557661.1)

•OPTIMIZER_MODE_MISMATCH - optimizer mode mismatch 

•PX_MISMATCH - parallel query mismatch 
If running 11.1.0.6 and RAC see Bug:7352775. 
Check that if (on each instance) parallel_instance_groups is set then instance_groups is set to the same.

•MV_STALEOBJ_MISMATCH - mv stale object mismatch 

•FLASHBACK_TABLE_MISMATCH - flashback table mismatch 

•LITREP_COMP_MISMATCH - literal replacement compilation mismatch 

New in 11g :
•PLSQL_DEBUG - debug mismatch Session has debugging parameter plsql_debug set to true

•LOAD_OPTIMIZER_STATS  - Load optimizer stats for cursor sharing

•ACL_MISMATCH   -  Check ACL mismatch

•FLASHBACK_ARCHIVE_MISMATCH  - Flashback archive mismatch

•LOCK_USER_SCHEMA_FAILED  - Failed to lock user and schema

•REMOTE_MAPPING_MISMATCH  - Remote mapping mismatch

•LOAD_RUNTIME_HEAP_FAILED  - Runtime heap mismatch

•HASH_MATCH_FAILED  - Hash mismatch
Set to "Y" if sharing fails due to a hash mismatch, such as the case with mismatched histogram data 
or a range predicate marked as unsafe by literal replacement (See Bug:3461251)

New in 11.2  :
•PURGED_CURSOR - cursor marked for purging
The cursor has been marked for purging with dbms_shared_pool.purge

•BIND_LENGTH_UPGRADEABLE - bind length upgradeable
 Could not be shared because a bind variable size was smaller than the new value beiing inserted 
(marked as BIND_MISMATCH in earlier versions). 

•USE_FEEDBACK_STATS - cardinality feedback
Cardinality feedback is being used and therefore a new plan could be formed for the current execution.


•BIND_EQUIV_FAILURE - The bind value's selectivity does not match that used to optimize the existing child cursor

There is no longer  ROW_LEVEL_SEC_MISMATCH in 11.2.

High Versions Count查询
1.使用version_rpt function查询

--install version_rpt3_21
connect / as sysdba
start version_rpt3_21.sql

-- Generate reports for all cursors with more than 100 versions using SQL_ID (10g and up)
select b.* from v$sqlarea a ,table(version_rpt(a.sql_id)) b where loaded_versions >=100;

-- Generate reports for all cursors with more than 100 versions using HASH_VALUE
select b.* from v$sqlarea a ,table(version_rpt(null,a.hash_value)) b where loaded_versions>=100;

-- Generate the report for cursor with sql_id cyzznbykb509s
select * from table(version_rpt('cyzznbykb509s'));

2.直接查询

select sql_id,version_count, ADDRESS,sql_text
from   v$sqlarea 
where version_count > 10 
order by version_count, hash_value;

•Version 9.2.X.X and below : 
select * from v$sql_shared_cursor where kglhdpar = '0000000386BC2E58'
•Version 10.0.X.X and above:
select * from v$sql_shared_cursor where address = '0000000386BC2E58'

NOTE:The 'Y's denote a mismatch

High Versions Count跟踪

CURSORTRACE(10G及其以后版本)

TO trace on using:-  
alter system set events 
'immediate trace name cursortrace level 577, address hash_value';
(level 578/580 can be used for high level tracing (577=level 1, 578=level 2, 580=level 3) 

To turn off tracing use:- 
alter system set events 
'immediate trace name cursortrace level 2147483648, address 1';

Please note: BUG:5555371 exists in 10.2 (fixed in 10.2.0.4) where cursor trace cannot fully be turned off 
and single line entries will still be made to the trace file as a result. 
The w/a is to restart the instance. How invasive this BUG is depends on the executions of the cursor 
(and the size of the resultant trace file additions) 

cursordump(11GR2)
alter system set events 'immediate trace name cursordump level 16'

version_rpt3_21脚本:下载
参考:Troubleshooting: High Version Count Issues [ID 296377.1]