Analysis of an ORA-00600 [kdsgrp1] error

Database version

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

Identifying the affected object

--Method 1: look up the file#/block# reported in the trace file in DBA_EXTENTS
*** SESSION ID:(795.16405) 2012-04-05 09:36:11.958
            row 080095ee.26 continuation at
            file# 32 block# 38382 slot 39 not found
**************************************************
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 39 ..... nrows: 19
**************************************************

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 32
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 32
Enter value for block_id: 38382
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 38382 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
AHV8
TBL_IVR_LOG
TABLE PARTITION    CSS_PARTITION                  IVR_LOG_2012_MONTH04
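The hex value 080095ee in the trace line is a relative data block address (RDBA). The same value appears as "Block header dump: 0x080095ee" in Method 3. In an RDBA, the upper 10 bits hold the relative file number and the lower 22 bits hold the block number.

The sketch below is an illustration, not an Oracle tool. It decodes the RDBA and should reproduce file 32, block 38382. Inside the database, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform the same split. The helper name decode_rdba is hypothetical.

```python
# Split a 32-bit relative data block address (RDBA) into its two fields:
# the upper 10 bits hold the relative file number, the lower 22 bits the block number.
FILE_SHIFT = 22
BLOCK_MASK = (1 << 22) - 1  # 0x3FFFFF


def decode_rdba(rdba_hex: str) -> tuple[int, int]:
    """Return (relative_file#, block#) for an RDBA given as a hex string (hypothetical helper)."""
    rdba = int(rdba_hex, 16)  # accepts "080095ee" or "0x080095ee"
    return rdba >> FILE_SHIFT, rdba & BLOCK_MASK


if __name__ == "__main__":
    # "row 080095ee.26 continuation at file# 32 block# 38382 ..." from the trace above
    print(decode_rdba("080095ee"))  # (32, 38382)
```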

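--Method 2: check the current SQL statement and the last wait in the process state dump (file#=20, block#=95ee are hex values, i.e. file 32, block 38382)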
*** 2012-04-05 09:36:11.965
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO TBL_CONTACT_INFO_FAILED_TMP 
select * from TBL_IVR_LOG

SO: 70000017f954f50, type: 4, owner: 70000017f65a840, flag: INIT/-/-/0x00
(session) sid: 795 trans: 70000017464a1e8, creator: 70000017f65a840, flag: (40110041) USR/- BSY/-/-/-/-/-
              DID: 0002-0067-000305BD, short-term DID: 0002-0067-000305BE
              txn branch: 0
              oct: 2, prv: 0, sql: 70000015180ee98, psql: 700000180d67550, user: 49/AHV8
service name: SYS$USERS
O/S info: user: oracle10, term: UNKNOWN, ospid: 12976218, machine: zwq_kfdb2
              program: oracle@zwq_kfdb2 (J002)
last wait for 'db file sequential read' blocking sess=0x0 seq=226 wait_time=17071 seconds since wait started=1
                file#=20, block#=95ee, blocks=1
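This is a minimal sketch that reads file#=20 and block#=95ee in the wait line as hex; the matching numbers support that reading. Converting them gives file 32, block 38382. Packing them back into 10 + 22 bits gives the RDBA 0x080095ee seen in Methods 1 and 3. The helper names are hypothetical.

```python
# Assumption: the file#/block# values in the "last wait for ..." line are printed in hex.
def wait_params_to_decimal(file_hex: str, block_hex: str) -> tuple[int, int]:
    """Convert the hex file#/block# of the wait line to decimal (hypothetical helper)."""
    return int(file_hex, 16), int(block_hex, 16)


def to_rdba(file_no: int, block_no: int) -> int:
    """Recombine file# (upper 10 bits) and block# (lower 22 bits) into a 32-bit RDBA."""
    return (file_no << 22) | block_no


if __name__ == "__main__":
    f, b = wait_params_to_decimal("20", "95ee")
    print(f, b)                 # 32 38382
    print(hex(to_rdba(f, b)))   # 0x80095ee
```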

--Method 3: the seg/obj field in the block header dump is the data object ID in hex
Block header dump:  0x080095ee
 Object id on Block? Y
 seg/obj: 0x11eeb  csc: 0x6f2.848e814  itc: 2  flg: E  typ: 1 - DATA
     brn: 1  bdba: 0x7c09c89 ver: 0x01 opc: 0
     inc: 0  exflg: 0


SQL> select to_number('11eeb','xxxxxxxx') from dual;

TO_NUMBER('11EEB','XXXXXXXX')
-----------------------------
                        73451

SQL> select owner,object_name,subobject_name,object_type from dba_objects where data_object_id='73451';

OWNER
------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
SUBOBJECT_NAME                 OBJECT_TYPE
------------------------------ -------------------
AHV8
TBL_IVR_LOG
IVR_LOG_2012_MONTH04           TABLE PARTITION

Verifying whether the block is really corrupt

SQL> select name from v$datafile where file#=32;

NAME
------------------------------------------------------
/dev/rdb1_data27

[zwq_kfdb2:/home/oraeye]dbv file='/dev/rdb1_data27' blocksize=8192

DBVERIFY: Release 10.2.0.4.0 - Production on Fri Apr 13 15:33:10 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/rdb1_data27
  

DBVERIFY - Verification complete

Total Pages Examined         : 1048448
Total Pages Processed (Data) : 947357
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 4756
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 96335
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 297329920 (1778.297329920)

SQL> select count(*) from AHV8.TBL_IVR_LOG partition(IVR_LOG_2012_MONTH04);

  COUNT(*)
----------
   8798030

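Summary: dbv reports no corrupt pages in the data file, and a full count of the partition now completes without error. The problem was therefore evidently caused by a corrupt copy of the block in memory (the buffer cache). Over time that buffer aged out of the buffer cache, so the error can no longer be reproduced and no further action was required.

If an in-memory corruption has not yet aged out, you can flush the buffer cache (ALTER SYSTEM FLUSH BUFFER_CACHE). If the block on disk is corrupt, decide how to handle it based on the actual situation.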

Oracle functions for converting between decimal and binary

Decimal to binary

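CREATE OR REPLACE FUNCTION NUMBER_TO_BIT(V_NUM NUMBER)
RETURN VARCHAR2 IS
  V_RTN VARCHAR2(128);  -- result buffer; the original VARCHAR(8) overflows (ORA-06502) for inputs above 255
  V_N1  NUMBER;
  V_N2  NUMBER;
BEGIN
  V_N1 := TRUNC(V_NUM);  -- assumes a non-negative integer input
  LOOP
    V_N2  := MOD(V_N1, 2);           -- lowest remaining bit
    V_N1  := TRUNC(V_N1 / 2);        -- shift right by one bit
    V_RTN := TO_CHAR(V_N2) || V_RTN; -- prepend the bit
    EXIT WHEN V_N1 = 0;
  END LOOP;
  -- left-pad with zeros to at least 8 bits; longer results are returned whole
  RETURN LPAD(V_RTN, GREATEST(8, LENGTH(V_RTN)), '0');
END;
/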

SQL> select NUMBER_TO_BIT(208) from dual;

NUMBER_TO_BIT(208)
-----------------------------
11010000

Binary to decimal

CREATE OR REPLACE FUNCTION BIT_TO_NUMBER(P_BIN IN VARCHAR2) RETURN NUMBER AS
  V_SQL    VARCHAR2(30000) := 'SELECT BIN_TO_NUM(';
  V_RETURN NUMBER;
BEGIN
  -- reject inputs of 256 or more bits
  IF LENGTH(P_BIN) >= 256 THEN
    RAISE_APPLICATION_ERROR(-20001, 'INPUT BIN TOO LONG!');
  END IF;
  -- LTRIM strips every leading '0'/'1'; anything left over is an invalid character
  IF LTRIM(P_BIN, '01') IS NOT NULL THEN
    RAISE_APPLICATION_ERROR(-20002, 'INPUT STR IS NOT VALID BIN VALUE!');
  END IF;
  -- build "SELECT BIN_TO_NUM(1,1,0,...) FROM DUAL", one argument per bit
  FOR I IN 1 .. LENGTH(P_BIN) LOOP
    V_SQL := V_SQL || SUBSTR(P_BIN, I, 1) || ',';
  END LOOP;
  V_SQL := RTRIM(V_SQL, ',') || ') FROM DUAL';
  EXECUTE IMMEDIATE V_SQL
    INTO V_RETURN;
  RETURN V_RETURN;
END;
/

SQL> SELECT BIT_TO_NUMBER('11010000') FROM DUAL;

BIT_TO_NUMBER('11010000')
-------------------------
                      208
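As a quick cross-check outside the database, here is a Python sketch of the same two conversions. number_to_bit mirrors the repeated-division loop and the 8-bit left padding of NUMBER_TO_BIT. bit_to_number simply uses Python's int(..., 2) in place of BIN_TO_NUM. Both function names are illustrative only.

```python
def number_to_bit(n: int, width: int = 8) -> str:
    """Repeated division by 2, as in NUMBER_TO_BIT; left-pads to at least `width` bits."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    bits = ""
    while True:
        bits = str(n % 2) + bits  # prepend the lowest remaining bit
        n //= 2                   # shift right by one bit
        if n == 0:
            break
    return bits.rjust(width, "0")  # rjust pads but never truncates


def bit_to_number(bits: str) -> int:
    """Counterpart of BIT_TO_NUMBER: parse a string of 0s and 1s."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("input is not a valid binary string")
    return int(bits, 2)


if __name__ == "__main__":
    print(number_to_bit(208))         # 11010000
    print(bit_to_number("11010000"))  # 208
```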

Oracle ROWID analysis

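Background on ROWIDs
Since Oracle 8, the (extended) ROWID has the format OOOOOOFFFBBBBBBRRR. It occupies 10 bytes internally: 32-bit data_object_id + 10-bit relative file# + 22-bit block# + 16-bit row#. O is the data object ID, F is the relative file number, B is the block number, and R is the row (slot) number within the block.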

Each ROWID character is a base-64 digit with the following decimal values:

A-Z <==> 0 - 25 (26)
a-z <==> 26 - 51 (26)
0-9 <==> 52 - 61 (10)
+/ <==> 62 - 63 (2)

Test environment

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> create table t_xff (id number,name varchar2(100));

Table created.

SQL> insert into t_xff values(1,'www.orasos.com');

1 row created.

SQL> commit;

Commit complete.

SQL>  alter table t_xff move;

Table altered.

SQL> select rowid,a.* from t_xff a;

ROWID                      ID NAME
------------------ ---------- ------------------------------
AADye6AAEAAAtCcAAA          1 www.orasos.com

Computing each field

Data Object number = AADye6
File               = AAE
Block              = AAAtCc
ROW                = AAA

DATA_OBJECT_ID
AADye6
58*64^0+30*64^1+50*64^2+3*64^3+0*64^4+0*64^5
58+1920+204800+786432+0+0=993210

RFILE#
AAE
4*64^0+0*64^1+0*64^2
4+0+0=4

BLOCK
AAAtCc
28*64^0+2*64^1+45*64^2+0*64^3+0*64^4+0*64^5
28+128+184320+0+0+0=184476
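The manual calculation above can be sketched in Python as follows. It assumes the 6/3/6/3-character split and the A-Z, a-z, 0-9, +, / digit order described earlier. Each chunk is read as a big-endian base-64 number, so the leftmost character is the most significant. decode_rowid is a hypothetical helper; DBMS_ROWID, used in the next step, is the supported way to do this inside Oracle.

```python
import string

# Base-64 digits used by extended ROWIDs: A-Z = 0-25, a-z = 26-51, 0-9 = 52-61, + = 62, / = 63
ROWID_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
DIGIT = {ch: i for i, ch in enumerate(ROWID_ALPHABET)}


def base64_to_int(chunk: str) -> int:
    """Interpret `chunk` as a big-endian base-64 number."""
    value = 0
    for ch in chunk:
        value = value * 64 + DIGIT[ch]
    return value


def decode_rowid(rowid: str) -> tuple[int, int, int, int]:
    """Split OOOOOOFFFBBBBBBRRR into (data_object_id, rfile#, block#, row#)."""
    if len(rowid) != 18:
        raise ValueError("extended ROWIDs are 18 characters long")
    return (
        base64_to_int(rowid[0:6]),    # OOOOOO
        base64_to_int(rowid[6:9]),    # FFF
        base64_to_int(rowid[9:15]),   # BBBBBB
        base64_to_int(rowid[15:18]),  # RRR
    )


if __name__ == "__main__":
    print(decode_rowid("AADye6AAEAAAtCcAAA"))  # (993210, 4, 184476, 0)
```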

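Verifying the result (the ROWID encodes DATA_OBJECT_ID, not OBJECT_ID; the two differ here because ALTER TABLE ... MOVE gave the table a new segment and hence a new data object ID)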

SQL> select object_id,data_object_id from dba_objects where object_name='T_XFF';

 OBJECT_ID DATA_OBJECT_ID
---------- --------------
    993209         993210

SQL> select dbms_rowid.rowid_object('AADye6AAEAAAtCcAAA') data_object_id#,
  2         dbms_rowid.rowid_relative_fno('AADye6AAEAAAtCcAAA') rfile#,
  3         dbms_rowid.rowid_block_number('AADye6AAEAAAtCcAAA') block#,
  4     dbms_rowid.rowid_row_number('AADye6AAEAAAtCcAAA') row# from dual;  

DATA_OBJECT_ID#     RFILE#     BLOCK#       ROW#
--------------- ---------- ---------- ----------
         993210          4     184476          0

Analysis via DUMP(ROWID)

SQL> select rowid,dump(rowid) from t_xff;

ROWID              DUMP(ROWID)
------------------ ------------------------------------------
AADye6AAEAAAtCcAAA Typ=69 Len=10: 0,15,39,186,1,2,208,156,0,0

DATA_OBJECT_ID
0,15,39,186
15*256*256+39*256+186=993210

RFILE#
1,2 (take the first 10 bits)
00000001 00 = 4

BLOCK
2,208,156 (take the last 22 bits)
000010 11010000 10011100
2^17+208*256+156=184476
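The byte-level reading can likewise be sketched in Python. It assumes the 10-byte layout given earlier: a 32-bit data_object_id, then a 10-bit relative file# and 22-bit block# packed into the next four bytes, then a 16-bit row number. Treating bytes 5 to 8 as a single 32-bit value avoids the manual bit slicing. decode_rowid_dump is a hypothetical name.

```python
def decode_rowid_dump(raw: list[int]) -> tuple[int, int, int, int]:
    """Decode the 10 bytes of DUMP(ROWID) (Typ=69) into (data_object_id, rfile#, block#, row#).

    Assumed layout: 32-bit data_object_id | 10-bit rfile# | 22-bit block# | 16-bit row#
    """
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    data_obj = int.from_bytes(bytes(raw[0:4]), "big")
    dba = int.from_bytes(bytes(raw[4:8]), "big")  # file# and block# share these 4 bytes
    rfile = dba >> 22                             # top 10 bits
    block = dba & ((1 << 22) - 1)                 # low 22 bits
    row = int.from_bytes(bytes(raw[8:10]), "big")
    return data_obj, rfile, block, row


if __name__ == "__main__":
    # Typ=69 Len=10: 0,15,39,186,1,2,208,156,0,0
    print(decode_rowid_dump([0, 15, 39, 186, 1, 2, 208, 156, 0, 0]))  # (993210, 4, 184476, 0)
```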

Large numbers of ora_j0** processes in an Oracle 9i database

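1. Multiple ora_j0** processes found
The processes restart very frequently, roughly once a minute, and 20 ora_j0** processes are started each time (compare the start times in the two snapshots below).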

[prmsvr2:/home/oraeye]ps -ef|grep ora_j0|grep -v grep
oracle 213120 1 0 11:01:55 - 0:00 ora_j019_prmdb2 
oracle 282808 1 0 11:01:55 - 0:00 ora_j012_prmdb2 
oracle 299178 1 0 11:01:54 - 0:00 ora_j004_prmdb2 
oracle 434212 1 0 11:01:55 - 0:00 ora_j014_prmdb2 
oracle 475238 1 0 11:01:54 - 0:00 ora_j011_prmdb2 
oracle 487562 1 0 11:01:54 - 0:00 ora_j008_prmdb2 
oracle 614612 1 0 11:01:55 - 0:00 ora_j017_prmdb2 
oracle 717008 1 0 11:01:54 - 0:00 ora_j009_prmdb2 
oracle 721012 1 0 11:01:54 - 0:00 ora_j006_prmdb2 
oracle 749618 1 0 11:01:54 - 0:00 ora_j007_prmdb2 
oracle 770268 1 0 11:01:54 - 0:00 ora_j005_prmdb2 
oracle 811114 1 0 11:01:55 - 0:00 ora_j015_prmdb2 
oracle 831550 1 0 11:01:55 - 0:00 ora_j016_prmdb2 
oracle 847940 1 0 11:01:55 - 0:00 ora_j013_prmdb2 
oracle 938076 1 0 11:01:54 - 0:00 ora_j010_prmdb2 
oracle 942224 1 0 11:01:54 - 0:00 ora_j002_prmdb2 
oracle 974968 1 0 11:01:55 - 0:00 ora_j018_prmdb2 
oracle 434602 1 0 11:01:54 - 0:00 ora_j000_prmdb2 
oracle 668104 1 0 11:01:54 - 0:00 ora_j003_prmdb2 
oracle 983424 1 0 11:01:54 - 0:00 ora_j001_prmdb2 

[prmsvr2:/home/oraeye]ps -ef|grep ora_j0|grep -v grep
oracle 454676 1 1 11:02:56 - 0:00 ora_j012_prmdb2 
oracle 696366 1 1 11:02:56 - 0:00 ora_j011_prmdb2 
oracle 749628 1 1 11:02:56 - 0:00 ora_j019_prmdb2 
oracle 770276 1 1 11:02:56 - 0:00 ora_j017_prmdb2 
oracle 794824 1 1 11:02:56 - 0:00 ora_j010_prmdb2 
oracle 827464 1 1 11:02:55 - 0:00 ora_j009_prmdb2 
oracle 831560 1 1 11:02:56 - 0:00 ora_j016_prmdb2 
oracle 847946 1 1 11:02:56 - 0:00 ora_j014_prmdb2 
oracle 888972 1 1 11:02:55 - 0:00 ora_j007_prmdb2 
oracle 934064 1 1 11:02:55 - 0:00 ora_j008_prmdb2 
oracle 938080 1 1 11:02:56 - 0:00 ora_j013_prmdb2 
oracle 942232 1 0 11:02:55 - 0:00 ora_j001_prmdb2 
oracle 962760 1 1 11:02:55 - 0:00 ora_j006_prmdb2 
oracle 434608 1 1 11:02:55 - 0:00 ora_j004_prmdb2 
oracle 528810 1 2 11:02:56 - 0:00 ora_j015_prmdb2 
oracle 549228 1 1 11:02:55 - 0:00 ora_j005_prmdb2 
oracle 668112 1 1 11:02:55 - 0:00 ora_j003_prmdb2 
oracle 709090 1 1 11:02:55 - 0:00 ora_j002_prmdb2 
oracle 905720 1 2 11:02:56 - 0:00 ora_j018_prmdb2 
oracle 971040 1 1 11:02:55 - 0:00 ora_j000_prmdb2 

[prmsvr2:/home/oraeye]ps -ef|grep ora_j0|grep -v grep |wc -l
      20
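To put a number on "roughly once a minute", the two ps snapshots can be compared programmatically. This sketch takes a few lines from each snapshot and keys them by slave name. It then reports the gap between start times for slaves whose PID changed.

It assumes the STIME column is in HH:MM:SS form, which only holds for processes started on the current day. The CJQ0 line further down, started "Feb 10", would not parse. The function names are hypothetical.

```python
from datetime import datetime

# Excerpts of the two `ps -ef | grep ora_j0` snapshots above
# (columns: user pid ppid c stime tty time cmd).
SNAPSHOT_1 = """\
oracle 213120 1 0 11:01:55 - 0:00 ora_j019_prmdb2
oracle 282808 1 0 11:01:55 - 0:00 ora_j012_prmdb2
oracle 299178 1 0 11:01:54 - 0:00 ora_j004_prmdb2
"""
SNAPSHOT_2 = """\
oracle 454676 1 1 11:02:56 - 0:00 ora_j012_prmdb2
oracle 749628 1 1 11:02:56 - 0:00 ora_j019_prmdb2
oracle 434608 1 1 11:02:55 - 0:00 ora_j004_prmdb2
"""


def start_times(snapshot: str) -> dict[str, tuple[int, datetime]]:
    """Map each job slave name to (pid, start time); assumes STIME is HH:MM:SS."""
    result = {}
    for line in snapshot.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        pid, stime, cmd = int(fields[1]), fields[4], fields[7]
        result[cmd] = (pid, datetime.strptime(stime, "%H:%M:%S"))
    return result


def restart_gaps(before: str, after: str) -> dict[str, float]:
    """Seconds between start times for slaves present in both snapshots with a new PID."""
    t1, t2 = start_times(before), start_times(after)
    return {
        name: (t2[name][1] - t1[name][1]).total_seconds()
        for name in t1.keys() & t2.keys()
        if t1[name][0] != t2[name][0]  # a different PID means the slave was restarted
    }


if __name__ == "__main__":
    for name, gap in sorted(restart_gaps(SNAPSHOT_1, SNAPSHOT_2).items()):
        print(name, gap)
```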

2. Other parameters

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

SQL> show parameter job;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     20

SQL> select count(*) from DBA_JOBS_RUNNING;

  COUNT(*)
----------
         0

SQL> SELECT count(*) FROM v$session_wait where event='jobq slave wait';

  COUNT(*)
----------
        20

SQL> select count(*) from dba_jobs;

  COUNT(*)
----------
         2

SQL> col name for a32
SQL> col value for a24
SQL> col description for a70
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  /
Enter value for param: _JOB_QUEUE_INTERVAL
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_JOB_QUEUE_INTERVAL%')

NAME                             VALUE
-------------------------------- ------------------------
DESCRIPTION
----------------------------------------------------------------------
_job_queue_interval              5
Wakeup interval in seconds for job queue co-ordinator

3. Tracing the CJQ process with event 10046

[prmsvr2:/oracle]ps -ef|grep cjq
  oracle  327870       1   0   Feb 10      - 770:39 ora_cjq0_prmdb2 
  oracle  929872  794774   0 13:24:59  pts/2  0:00 grep cjq 

SQL> oradebug setospid 327870
Oracle pid: 19, Unix process pid: 327870, image: oracle@prmsvr2 (CJQ0)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug event 10046 trace name context forever, level 12
Statement processed.
SQL>  oradebug event 10046 trace name context off;
Statement processed.
SQL> oradebug tracefile_name
/oracle/app/admin/prmdb/bdump/prmdb2_cjq0_327870.trc

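4. Examining the CJQ 10046 trace file
The trace contains a large number of 'process startup' waits, and successive batches of these waits occur about one minute apart.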

WAIT #0: nam='process startup' ela= 59247 p1=74 p2=0 p3=0
WAIT #0: nam='process startup' ela= 51486 p1=74 p2=1 p3=0
WAIT #0: nam='process startup' ela= 51629 p1=74 p2=2 p3=0
WAIT #0: nam='process startup' ela= 48205 p1=74 p2=3 p3=0
WAIT #0: nam='process startup' ela= 47625 p1=74 p2=4 p3=0
WAIT #0: nam='process startup' ela= 55945 p1=74 p2=5 p3=0
WAIT #0: nam='process startup' ela= 47633 p1=74 p2=6 p3=0
WAIT #0: nam='process startup' ela= 51809 p1=74 p2=7 p3=0
WAIT #0: nam='process startup' ela= 57371 p1=74 p2=8 p3=0
WAIT #0: nam='process startup' ela= 50249 p1=74 p2=9 p3=0
WAIT #0: nam='process startup' ela= 51683 p1=74 p2=10 p3=0
WAIT #0: nam='process startup' ela= 47933 p1=74 p2=11 p3=0
WAIT #0: nam='process startup' ela= 47699 p1=74 p2=12 p3=0
WAIT #0: nam='process startup' ela= 47800 p1=74 p2=13 p3=0
WAIT #0: nam='process startup' ela= 47947 p1=74 p2=14 p3=0
WAIT #0: nam='process startup' ela= 52071 p1=74 p2=15 p3=0
WAIT #0: nam='process startup' ela= 47776 p1=74 p2=16 p3=0
WAIT #0: nam='process startup' ela= 47764 p1=74 p2=17 p3=0
WAIT #0: nam='process startup' ela= 47684 p1=74 p2=18 p3=0
WAIT #0: nam='process startup' ela= 47790 p1=74 p2=19 p3=0
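The trace can be summarized with a short script. This sketch parses the WAIT lines above (only the first four are included), counts the 'process startup' waits, and totals ela. In 9i traces ela is in microseconds, so each slave startup here costs about 50 ms.

The sketch also relies on the documented parameters of this event (type and process#). It assumes p1 is the process type as a character code (chr(74) = 'J') and p2 is the slave number, so the lines map to j000, j001, and so on. The function name is hypothetical.

```python
import re

# The first four CJQ0 10046 trace lines from above.
TRACE = """\
WAIT #0: nam='process startup' ela= 59247 p1=74 p2=0 p3=0
WAIT #0: nam='process startup' ela= 51486 p1=74 p2=1 p3=0
WAIT #0: nam='process startup' ela= 51629 p1=74 p2=2 p3=0
WAIT #0: nam='process startup' ela= 48205 p1=74 p2=3 p3=0
"""

WAIT_RE = re.compile(
    r"nam='(?P<event>[^']+)' ela= *(?P<ela>\d+) p1=(?P<p1>\d+) p2=(?P<p2>\d+)"
)


def summarize_startups(trace: str) -> dict:
    """Count 'process startup' waits, total their ela (microseconds), list the slaves started."""
    count, total_us, slaves = 0, 0, []
    for m in WAIT_RE.finditer(trace):
        if m.group("event") != "process startup":
            continue
        count += 1
        total_us += int(m.group("ela"))
        # Assumption: p1 = process type as a character code, p2 = slave number.
        slaves.append(f"{chr(int(m.group('p1'))).lower()}{int(m.group('p2')):03d}")
    return {"count": count, "total_us": total_us, "slaves": slaves}


if __name__ == "__main__":
    print(summarize_startups(TRACE))
```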

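With substantial help from Oracle, the cause was finally identified: Bug 4339922: CJQ PROCESS WAKE UP JOB QUEUE PROCESSES EVERY 1 MINUTES.(THERE IS NO JOBS).

Oracle no longer provides new patches for 9i. The constantly restarting ora_j0** processes also consume little system or database resource. So if the database cannot be upgraded, you can set job_queue_processes to a suitable value and otherwise ignore the problem; for example, a value matching the number of jobs actually defined, since DBA_JOBS contains only 2.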

A closer look at specifying an index in a hint

Test environment
Create a table with a bitmap index and a B-tree index

SQL> create table t_xifenfei as 
  2  select object_id,object_name from dba_objects;

Table created.

SQL> create index b_tree_t_xifenfei on t_xifenfei(object_id);

Index created.

SQL> CREATE BITMAP INDEX  bitmap_t_xifenfei on t_xifenfei(object_name);

Index created.

SQL> BEGIN
  2  dbms_stats.gather_table_stats(USER,'T_XIFENFEI',cascade => true);
  3  END;
  4  / 

PL/SQL procedure successfully completed.

Without an index hint

SQL> SET AUTOT TRACE EXPL STAT 
SQL> SELECT OBJECT_ID FROM t_xifenfei;

845708 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |   841K|  4109K|   886   (3)| 00:00:11 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI |   841K|  4109K|   886   (3)| 00:00:11 |
--------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      60525  consistent gets
          0  physical reads
          0  redo size
   15543305  bytes sent via SQL*Net to client
     620649  bytes received via SQL*Net from client
      56382  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     845708  rows processed

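Because object_id may contain NULLs (the column is not declared NOT NULL), and a single-column B-tree index stores no entries for NULL keys, b_tree_t_xifenfei cannot return every row. The optimizer therefore does not use it, and the full table scan is exactly what we expected.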

index hint b_tree_t_xifenfei

SQL> SET  LINESIZE 150
SQL> SELECT /*+ INDEX(T b_tree_t_xifenfei) */object_id from t_xifenfei t;

845708 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1935372603

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |   841K|  4109K| 18940   (1)| 00:03:48 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T_XIFENFEI        |   841K|  4109K| 18940   (1)| 00:03:48 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                   |       |       |            |          |
|   3 |    BITMAP INDEX FULL SCAN    | BITMAP_T_XIFENFEI |       |       |            |          |
--------------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      91537  consistent gets
          0  physical reads
          0  redo size
   42362633  bytes sent via SQL*Net to client
     620649  bytes received via SQL*Net from client
      56382  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     845708  rows processed

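Again, because object_id may contain NULLs, b_tree_t_xifenfei is not used. The puzzling part is that even if b_tree_t_xifenfei cannot be used, the optimizer should not choose BITMAP_T_XIFENFEI either. That plan's cost (18940) is much higher than the full table scan's (886).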

Index hint naming an invalid index

SQL> SELECT /*+ INDEX(T abc) */object_id from t_xifenfei t;

845708 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1935372603

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |   841K|  4109K| 18940   (1)| 00:03:48 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T_XIFENFEI        |   841K|  4109K| 18940   (1)| 00:03:48 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                   |       |       |            |          |
|   3 |    BITMAP INDEX FULL SCAN    | BITMAP_T_XIFENFEI |       |       |            |          |
--------------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      91537  consistent gets
          0  physical reads
          0  redo size
   42362633  bytes sent via SQL*Net to client
     620649  bytes received via SQL*Net from client
      56382  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     845708  rows processed

Here the hint names an index (abc) that does not exist, yet BITMAP_T_XIFENFEI is still used, which is even more puzzling.

Explanation
If the INDEX hint specifies no indexes, then the optimizer considers the cost of a scan on each available index on the table and then performs the index scan with the lowest cost. The database can also choose to scan multiple indexes and merge the results, if such an access path has the lowest cost. The optimizer does not consider a full table scan.
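In other words, a hint that names an unusable index behaves like an INDEX hint with no index specified. The optimizer costs every usable index on the table, picks the index (or combination of indexes) with the lowest cost, and does not consider a full table scan.
When we hinted b_tree_t_xifenfei, object_id could be NULL (the column is not defined as NOT NULL), so that index could not be used. The hint was effectively the same as naming an invalid index. The optimizer therefore considered the other usable indexes on the table and chose the bitmap index instead of a full table scan, because bitmap indexes do store NULL keys.
Tip: be careful when specifying an index in a hint. If the index is unsuitable or invalid, the statement may run less efficiently; here consistent gets rose from 60525 to 91537.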