启动数据库报错
SQL> startup
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1260720 bytes
Variable Size 150995792 bytes
Database Buffers 8388608 bytes
Redo Buffers 7127040 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
alert.log导错
Wed Aug 10 12:31:11 2011
Errors in file /u01/admin/xienfei/udump/xff_ora_8568.trc:
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], []
xff_ora_8568.trc内容
[ktbdchk] -- readers_dsz -- bad dscn
scn: 0x0000.b1e60c00scn: 0x0000.0011fca1
*** 2011-08-10 12:31:11.998
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
根据上面错误判断,错误的scn为b1e60c00,不是整个数据文件的scn错误
而应该是一个对象的scn错误,所以继续在xff_ora_8568.trc文件中查找b1e60c00
找到结果如下:
Block header dump: 0x0040007a
Object id on Block? Y
seg/obj: 0x12 csc: 0x00.b1e60c00 itc: 1 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0008.02a.000001d9 0x00802341.01bb.04 ---- 1 fsc 0x0000.0011ae7c
data_block_dump,data header at 0x20fd6044
===============
tsiz: 0x1fb8
hsiz: 0xea
pbl: 0x20fd6044
bdba: 0x0040007a
76543210
flag=--------
ntab=1
nrow=108
frre=-1
fsbo=0xea
fseo=0x453
avsp=0x369
tosp=0x369
0xe:pti[0] nrow=108 offs=0
根据这个提示,发现dba为:0040007a的对象异常,查找对应的file_id,block
SQL> SELECT DBMS_UTILITY.data_block_address_file (TO_NUMBER ('40007a', 'XXXXXXXX')) file_id,
2 DBMS_UTILITY.data_block_address_block (TO_NUMBER ('40007a', 'XXXXXXXX')) block_id
3 FROM DUAL;
FILE_ID BLOCK_ID
---------- ----------
1 122
使用bbed查看file=1,block=122的scn情况
BBED> p ktbbh
struct ktbbh, 48 bytes @20
ub1 ktbbhtyp @20 0x01 (KDDBTDATA)
union ktbbhsid, 4 bytes @24
ub4 ktbbhsg1 @24 0x00000012
ub4 ktbbhod1 @24 0x00000012
struct ktbbhcsc, 8 bytes @28
ub4 kscnbas @28 0xb1e60c00
ub2 kscnwrp @32 0x0000
b2 ktbbhict @36 1
ub1 ktbbhflg @38 0x02 (NONE)
ub1 ktbbhfsl @39 0x00
ub4 ktbbhfnx @40 0x00000000
struct ktbbhitl[0], 24 bytes @44
struct ktbitxid, 8 bytes @44
ub2 kxidusn @44 0x0008
ub2 kxidslt @46 0x002a
ub4 kxidsqn @48 0x000001d9
struct ktbituba, 8 bytes @52
ub4 kubadba @52 0x00802341
ub2 kubaseq @56 0x01bb
ub1 kubarec @58 0x04
ub2 ktbitflg @60 0x0001 (NONE)
union _ktbitun, 2 bytes @62
b2 _ktbitfsc @62 0
ub2 _ktbitwrp @62 0x0000
ub4 ktbitbas @64 0x0011ae7c
果然发现scn为0xb1e60c00,现在把其修改为:0x00124ac6(注意规则,一般linux下都是倒序)
BBED> set offset 28
OFFSET 28
BBED> m /x c64a1200
BBED-00209: invalid number (c64a1200)
小技巧,一次性修改报错,尝试一次修改一点
BBED> m /x c64a
File: /u01/oradata/xienfei/system01.dbf (0)
Block: 122 Offsets: 28 to 43 Dba:0x00000000
------------------------------------------------------------------------
c64ae6b1 00000000 01000200 00000000
<32 bytes per line>
BBED> set offset +2
OFFSET 30
BBED> m /x 1200
File: /u01/oradata/xienfei/system01.dbf (0)
Block: 122 Offsets: 30 to 45 Dba:0x00000000
------------------------------------------------------------------------
12000000 00000100 02000000 00000800
<32 bytes per line>
BBED> set offset -2
OFFSET 28
BBED> dump
File: /u01/oradata/xienfei/system01.dbf (0)
Block: 122 Offsets: 28 to 43 Dba:0x00000000
------------------------------------------------------------------------
c64a1200 00000000 01000200 00000000
<32 bytes per line>
BBED> sum apply
Check value for File 0, Block 122:
current = 0x3a4e, required = 0x3a4e
SQL> startup
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1260720 bytes
Variable Size 150995792 bytes
Database Buffers 8388608 bytes
Redo Buffers 7127040 bytes
Database mounted.
Database opened.
Daily Archives: 2011 年 09 月 15 日
cursor: pin S事件
A session waits for “cursor: pin S” when it wants a specific mutex in S (share) mode on a specific cursor and there is no concurrent X holder but it could not acquire that mutex immediately. This may seem a little strange as one might question why there should be any form of wait to get a mutex which has no session holding it in an incompatible mode. The reason for the wait is that in order to acquire the mutex in S mode (or release it) the session has to increment (or decrement) the mutex reference count and this requires an exclusive atomic update to the mutex structure itself. If there are concurrent sessions trying to make such an update to the mutex then only one session can actually increment (or decrement) the reference count at a time. A wait on “cursor: pin S” thus occurs if a session cannot make that atomic change immediately due to other concurrent requests.
Mutexes are local to the current instance in RAC environments.
Oracle10g中引用的mutexes机制一定程度的替代了library cache pin,其结构更简单,get&set的原子操作更快捷。
它相当于,每个child cursor下面都有一个mutexes这样的简单内存结构,当有session要执行该SQL而需要pin cursor操作的时候,session只需要以shared模式set这个内存位+1,表示session获得该mutex的shared mode lock.可以有很多session同时具有这个mutex的shared mode lock;但在同一时间,只能有一个session在操作这个mutext +1或者-1。+1 -1的操作是排它性的原子操作。如果因为session并行太多,而导致某个session在等待其他session的mutext +1/-1操作,则该session要等待cursor: pin S等待事件。
当看到系统有很多session等待cursor: pin S事件的时候,要么是CPU不够快,要么是某个SQL的并行执行次数太多了而导致在child cursor上的mutex操作争用。如果是Capacity的问题,则可以升级硬件。如果是因为SQL的并行太多,则要么想办法降低该SQL执行次数,要么将该SQL复制成N个其它的SQL。
select /*SQL 1*/object_name from t where object_id=?
select /*SQL 2*/object_name from t where object_id=?
select /*SQL …*/object_name from t where object_id=?
select /*SQL N*/object_name from t where object_id=?
这样就有了N个SQL Cursor,N个Mutex内存结构,就将争用分散开来,类似partition的作用了
