启动数据库报错 SQL> startup ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 1260720 bytes Variable Size 150995792 bytes Database Buffers 8388608 bytes Redo Buffers 7127040 bytes Database mounted. ORA-01092: ORACLE instance terminated. Disconnection forced alert.log导错 Wed Aug 10 12:31:11 2011 Errors in file /u01/admin/xienfei/udump/xff_ora_8568.trc: ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [] xff_ora_8568.trc内容 [ktbdchk] -- readers_dsz -- bad dscn scn: 0x0000.b1e60c00scn: 0x0000.0011fca1 *** 2011-08-10 12:31:11.998 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [] Current SQL statement for this session: select ctime, mtime, stime from obj$ where obj# = :1 根据上面错误判断,错误的scn为b1e60c00,不是整个数据文件的scn错误 而应该是一个对象的scn错误,所以继续在xff_ora_8568.trc文件中查找b1e60c00 找到结果如下: Block header dump: 0x0040007a Object id on Block? Y seg/obj: 0x12 csc: 0x00.b1e60c00 itc: 1 flg: - typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0008.02a.000001d9 0x00802341.01bb.04 ---- 1 fsc 0x0000.0011ae7c data_block_dump,data header at 0x20fd6044 =============== tsiz: 0x1fb8 hsiz: 0xea pbl: 0x20fd6044 bdba: 0x0040007a 76543210 flag=-------- ntab=1 nrow=108 frre=-1 fsbo=0xea fseo=0x453 avsp=0x369 tosp=0x369 0xe:pti[0] nrow=108 offs=0 根据这个提示,发现dba为:0040007a的对象异常,查找对应的file_id,block SQL> SELECT DBMS_UTILITY.data_block_address_file (TO_NUMBER ('40007a', 'XXXXXXXX')) file_id, 2 DBMS_UTILITY.data_block_address_block (TO_NUMBER ('40007a', 'XXXXXXXX')) block_id 3 FROM DUAL; FILE_ID BLOCK_ID ---------- ---------- 1 122 使用bbed查看file=1,block=122的scn情况 BBED> p ktbbh struct ktbbh, 48 bytes @20 ub1 ktbbhtyp @20 0x01 (KDDBTDATA) union ktbbhsid, 4 bytes @24 ub4 ktbbhsg1 @24 0x00000012 ub4 ktbbhod1 @24 0x00000012 struct ktbbhcsc, 8 bytes @28 ub4 kscnbas @28 0xb1e60c00 ub2 kscnwrp @32 0x0000 b2 ktbbhict @36 1 ub1 ktbbhflg @38 0x02 (NONE) ub1 ktbbhfsl @39 0x00 ub4 ktbbhfnx @40 0x00000000 struct ktbbhitl[0], 24 bytes @44 struct ktbitxid, 8 bytes @44 ub2 kxidusn @44 0x0008 ub2 kxidslt @46 0x002a ub4 kxidsqn @48 0x000001d9 struct ktbituba, 8 bytes @52 ub4 kubadba @52 0x00802341 ub2 kubaseq @56 0x01bb ub1 kubarec @58 0x04 ub2 ktbitflg @60 0x0001 (NONE) union _ktbitun, 2 bytes @62 b2 _ktbitfsc @62 0 ub2 _ktbitwrp @62 0x0000 ub4 ktbitbas @64 0x0011ae7c 果然发现scn为0xb1e60c00,现在把其修改为:0x00124ac6(注意规则,一般linux下都是倒序) BBED> set offset 28 OFFSET 28 BBED> m /x c64a1200 BBED-00209: invalid number (c64a1200) 小技巧,一次性修改报错,尝试一次修改一点 BBED> m /x c64a File: /u01/oradata/xienfei/system01.dbf (0) Block: 122 Offsets: 28 to 43 Dba:0x00000000 ------------------------------------------------------------------------ c64ae6b1 00000000 01000200 00000000 <32 bytes per line> BBED> set offset +2 OFFSET 30 BBED> m /x 1200 File: /u01/oradata/xienfei/system01.dbf (0) Block: 122 Offsets: 30 to 45 Dba:0x00000000 ------------------------------------------------------------------------ 12000000 00000100 02000000 00000800 <32 bytes per line> BBED> set offset -2 OFFSET 28 BBED> dump File: /u01/oradata/xienfei/system01.dbf (0) Block: 122 Offsets: 28 to 43 Dba:0x00000000 ------------------------------------------------------------------------ c64a1200 00000000 01000200 00000000 <32 bytes per line> BBED> sum apply Check value for File 0, Block 122: current = 0x3a4e, required = 0x3a4e SQL> startup ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 1260720 bytes Variable Size 150995792 bytes Database Buffers 8388608 bytes Redo Buffers 7127040 bytes Database mounted. Database opened.
Daily Archives: 2011 年 09 月 15 日
cursor: pin S事件
A session waits for “cursor: pin S” when it wants a specific mutex in S (share) mode on a specific cursor and there is no concurrent X holder but it could not acquire that mutex immediately. This may seem a little strange as one might question why there should be any form of wait to get a mutex which has no session holding it in an incompatible mode. The reason for the wait is that in order to acquire the mutex in S mode (or release it) the session has to increment (or decrement) the mutex reference count and this requires an exclusive atomic update to the mutex structure itself. If there are concurrent sessions trying to make such an update to the mutex then only one session can actually increment (or decrement) the reference count at a time. A wait on “cursor: pin S” thus occurs if a session cannot make that atomic change immediately due to other concurrent requests.
Mutexes are local to the current instance in RAC environments.
Oracle10g中引用的mutexes机制一定程度的替代了library cache pin,其结构更简单,get&set的原子操作更快捷。
它相当于,每个child cursor下面都有一个mutexes这样的简单内存结构,当有session要执行该SQL而需要pin cursor操作的时候,session只需要以shared模式set这个内存位+1,表示session获得该mutex的shared mode lock.可以有很多session同时具有这个mutex的shared mode lock;但在同一时间,只能有一个session在操作这个mutext +1或者-1。+1 -1的操作是排它性的原子操作。如果因为session并行太多,而导致某个session在等待其他session的mutext +1/-1操作,则该session要等待cursor: pin S等待事件。
当看到系统有很多session等待cursor: pin S事件的时候,要么是CPU不够快,要么是某个SQL的并行执行次数太多了而导致在child cursor上的mutex操作争用。如果是Capacity的问题,则可以升级硬件。如果是因为SQL的并行太多,则要么想办法降低该SQL执行次数,要么将该SQL复制成N个其它的SQL。
select /*SQL 1*/object_name from t where object_id=?
select /*SQL 2*/object_name from t where object_id=?
select /*SQL …*/object_name from t where object_id=?
select /*SQL N*/object_name from t where object_id=?
这样就有了N个SQL Cursor,N个Mutex内存结构,就将争用分散开来,类似partition的作用了