记一次含AND-EQUAL执行计划调优

1.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

2.发现含有AND-EQUAL执行计划

SQL> SELECT 
  2  COUNT(TABXNPRESM1_.DETAIL_ID) AS X0_0_
  3    FROM QXTDEV1.TAB_XN_PRESMS TABXNPRESM0_, QXTDEV1.TAB_XN_PRESMS_DETAIL TABXNPRESM1_
  4  WHERE  (TABXNPRESM0_.COMPANY_ID = 346240)
  5     AND (TABXNPRESM0_.EMPLOYEE_ID = 0)
  6     AND (TABXNPRESM0_.PRE_TIME >= TO_DATE('2011/12/25', 'yyyy/mm/dd'))
  7     AND (TABXNPRESM0_.PRE_TIME < TO_DATE('2011/12/26', 'yyyy/mm/dd') + 1)
  8     AND ((TABXNPRESM0_.SEND_TYPE = 1) OR (TABXNPRESM0_.SEND_TYPE = 0))
  9     AND ((TABXNPRESM1_.RESULT = -1) OR (TABXNPRESM1_.RESULT = 0))
 10     AND (TABXNPRESM0_.PRE_ID = TABXNPRESM1_.PRE_ID);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=1 Bytes=40)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'TAB_XN_PRESMS_DETAIL'(Cost=3 Card=2 Bytes=30)
   3    2       NESTED LOOPS (Cost=11 Card=1 Bytes=40)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'TAB_XN_PRESMS' (Cost=8 Card=1 Bytes=25)
   5    4           AND-EQUAL
   6    5             INDEX (RANGE SCAN) OF 'IDX_XN_PRESMS_2' (NON-UNIQUE)
   7    5             INDEX (RANGE SCAN) OF 'IDX_XN_PRESMS_1' (NON-UNIQUE) (Cost=4 Card=638)
   8    3         INDEX (RANGE SCAN) OF 'IDX_XN_PRESMS_DETAIL' (NON-UNIQUE) (Cost=2 Card=2)



Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     272188  consistent gets
          0  physical reads
          0  redo size
        375  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

3.关于AND-EQUAL解释
If the WHERE clauses uses columns of many single-column indexes, then Oracle executes the statement by performing a range scan on each index to retrieve the rowids of the rows that satisfy each condition. Oracle then merges the sets of rowids to obtain a set of rowids of rows that satisfy all conditions. Oracle then accesses the table using these rowids.
Oracle can merge up to five indexes. If the WHERE clause uses columns of more than five single-column indexes, then Oracle merges five of them, accesses the table by rowid, and then tests the resulting rows to determine whether they satisfy the remaining conditions before returning them.
大概的意思是当where条件后面含有多个列的单列索引时(不超过5个),会先得到每个条件的rowid,然后这些rowid进行merges,得到一个rowid的结果集,最后根据这些rowid取表中记录。

4.表/列/索引相关信息

--index和列信息
SQL> SELECT INDEX_NAME, TABLE_NAME, COLUMN_NAME
  2    FROM USER_IND_COLUMNS
  3   WHERE TABLE_NAME IN ('TAB_XN_PRESMS', 'TAB_XN_PRESMS_DETAIL');
 
INDEX_NAME                     TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------ --------------------
IDX_XN_PRESMS_1                TAB_XN_PRESMS                  COMPANY_ID
IDX_XN_PRESMS_2                TAB_XN_PRESMS                  EMPLOYEE_ID
IDX_XN_PRESMS_3                TAB_XN_PRESMS                  PRE_TIME
PK_TAB_XN_PRESMS               TAB_XN_PRESMS                  PRE_ID
IDX_XN_PRESMS_4                TAB_XN_PRESMS                  SEND_TYPE
IDX_XN_PRESMS_DETAIL           TAB_XN_PRESMS_DETAIL           PRE_ID
IDX_XN_PRESMS_DETAIL_2         TAB_XN_PRESMS_DETAIL           SEND_TIME
PK_TAB_XN_PRESMS_DETAIL        TAB_XN_PRESMS_DETAIL           DETAIL_ID
 
8 rows selected

--index的统计信息
SQL> SELECT TABLE_NAME,
  2         INDEX_NAME,
  3         TO_CHAR(LAST_ANALYZED, 'yyyy-mm-dd hh24:mi:ss')
  4    FROM USER_INDEXES
  5   WHERE TABLE_NAME IN ('TAB_XN_PRESMS', 'TAB_XN_PRESMS_DETAIL');
 
TABLE_NAME                     INDEX_NAME                     TO_CHAR(LAST_ANALYZED,'YYYY-MM
------------------------------ ------------------------------ ------------------------------
TAB_XN_PRESMS                  IDX_XN_PRESMS_1                2011-12-29 09:25:32
TAB_XN_PRESMS                  IDX_XN_PRESMS_2                2011-12-29 09:25:35
TAB_XN_PRESMS                  IDX_XN_PRESMS_3                2011-12-29 09:25:39
TAB_XN_PRESMS                  IDX_XN_PRESMS_4                2011-12-29 09:25:21
TAB_XN_PRESMS_DETAIL           IDX_XN_PRESMS_DETAIL           2011-12-29 09:20:03
TAB_XN_PRESMS_DETAIL           IDX_XN_PRESMS_DETAIL_2         2011-12-29 09:20:01
TAB_XN_PRESMS                  PK_TAB_XN_PRESMS               2011-12-29 09:25:46
TAB_XN_PRESMS_DETAIL           PK_TAB_XN_PRESMS_DETAIL        2011-12-29 09:20:02
 
8 rows selected

--列的唯一度情况
SQL>  SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT
  2     FROM USER_TAB_COLS
  3    WHERE (TABLE_NAME, COLUMN_NAME) IN
  4          (SELECT TABLE_NAME, COLUMN_NAME
  5             FROM USER_IND_COLUMNS
  6            WHERE TABLE_NAME IN ('TAB_XN_PRESMS', 'TAB_XN_PRESMS_DETAIL'))
  7            ORDER BY table_name,NUM_DISTINCT DESC;
 
TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------------------------ ------------
TAB_XN_PRESMS                  PRE_ID                              1999270
TAB_XN_PRESMS                  PRE_TIME                            1342594
TAB_XN_PRESMS                  EMPLOYEE_ID                           10676
TAB_XN_PRESMS                  COMPANY_ID                             3136
TAB_XN_PRESMS                  SEND_TYPE                                10
TAB_XN_PRESMS_DETAIL           DETAIL_ID                           3863184
TAB_XN_PRESMS_DETAIL           PRE_ID                              1996872
TAB_XN_PRESMS_DETAIL           SEND_TIME                            437526
 
8 rows selected

通过这些信息可以得出:
1)统计信息是最新收集过的
2)因为有多个单列index,数据库为了使得cost最小,可能选择了不合适的index(IDX_XN_PRESMS_2[EMPLOYEE_ID]/IDX_XN_PRESMS_1[COMPANY_ID]),使得出现AND-EQUAL,从而逻辑读偏高。对于这个sql,应该使用唯一度比较高的IDX_XN_PRESMS_3[PRE_TIME]
3)也可以通过修改index,实现程序高效,但是考虑到会影响启动程序,在没有十足的把握之前遵守hint优先原则

5.增加hint提示

SQL> SELECT /*+ index(TABXNPRESM0_ IDX_XN_PRESMS_3) */
  2  COUNT(TABXNPRESM1_.DETAIL_ID) AS X0_0_
  3    FROM QXTDEV1.TAB_XN_PRESMS TABXNPRESM0_, QXTDEV1.TAB_XN_PRESMS_DETAIL TABXNPRESM1_
  4  WHERE  (TABXNPRESM0_.COMPANY_ID = 346240)
  5     AND (TABXNPRESM0_.EMPLOYEE_ID = 0)
  6     AND (TABXNPRESM0_.PRE_TIME >= TO_DATE('2011/12/25', 'yyyy/mm/dd'))
  7     AND (TABXNPRESM0_.PRE_TIME < TO_DATE('2011/12/26', 'yyyy/mm/dd') + 1)
  8     AND ((TABXNPRESM0_.SEND_TYPE = 1) OR (TABXNPRESM0_.SEND_TYPE = 0))
  9     AND ((TABXNPRESM1_.RESULT = -1) OR (TABXNPRESM1_.RESULT = 0))
 10     AND (TABXNPRESM0_.PRE_ID = TABXNPRESM1_.PRE_ID);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=641 Card=1 Bytes=40)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'TAB_XN_PRESMS_DETAIL'(Cost=3 Card=2 Bytes=30)
   3    2       NESTED LOOPS (Cost=641 Card=1 Bytes=40)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'TAB_XN_PRESMS' (Cost=638 Card=1 Bytes=25)
   5    4           INDEX (RANGE SCAN) OF 'IDX_XN_PRESMS_3' (NON-UNIQUE) (Cost=63 Card=22286)
   6    3         INDEX (RANGE SCAN) OF 'IDX_XN_PRESMS_DETAIL' (NON-UNIQUE) (Cost=2 Card=2)



Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1422  consistent gets
          0  physical reads
          0  redo size
        375  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

从这里可以看出,程序的逻辑读下降了很多(从272188下降到1422),得到了优化效果,提高了程序执行效率
结合上篇:BITMAP CONVERSION FROM ROWIDS,总结一个经验,如果同时使用到了一个表的多个index,效率一般情况下不会太高。同时也给各位提个醒,index并非越多越好,有时候会适得其反(建立index时需要考虑情况)

清理表部分数据方法

最近两天,开发要求,清理掉某些表的部分数据,因为不能停业务,不能采用cast+rman方式实现.只能自己写脚本删除,在这些删除数据中,经过总结,共有三种类型:
1.删除单个表数据
删除a表以dealdate为范围的部分数据

DECLARE 
  P_SQL   VARCHAR2(300) := 
  'DELETE FROM a WHERE dealdate<to_date('||''''||
'2010-11-01 00:00:00'||''''||','||''''||'yyyy-mm-dd hh24:mi:ss'||''''||')';
  P_COUNT NUMBER := 1000;
BEGIN
  WHILE 1 = 1 LOOP
    EXECUTE IMMEDIATE P_SQL || ' and rownum <= :nu'
      USING P_COUNT;
    IF SQL%NOTFOUND THEN
      EXIT;
    END IF;
    COMMIT;
  END LOOP;
  COMMIT;
END;

2.删除两个关联表数据
tab_a,tab_b两个表通过共有的PRE_ID列关联,然后按照tab_a.ACCEPT_TIME列为条件删除两个表中数据,tab_b表中数据比tab_a多很多

DECLARE
  CURSOR CUR IS
    SELECT B.ROWID BID, A.ROWID AID
      FROM tab_a A, tab_b B
     WHERE A.PRE_ID = B.PRE_ID
       AND A.ACCEPT_TIME <
           TO_DATE('2010-11-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
     ORDER BY B.ROWID;
  V_COUNTER NUMBER;
BEGIN
  V_COUNTER := 0;
  FOR ROW IN CUR LOOP
    DELETE FROM tab_a WHERE ROWID = ROW.AID;
    DELETE FROM tab_b WHERE ROWID = ROW.BID;
    V_COUNTER := V_COUNTER + 1;
    IF (V_COUNTER >= 1000) THEN
      COMMIT;
      V_COUNTER := 0;
    END IF;
  END LOOP;
  COMMIT;
END;

3.删除某个表中重复列
删除tab_a表中的COMPANY_ID/PY_DES/PY_DES_Q/PY_TYPE/RELATE_ID列重复数据

DECLARE
  CURSOR CUR IS
    SELECT A.ROWID AID
      FROM tab_a A
     WHERE ROWID NOT IN (SELECT MAX(B.ROWID)
                           FROM tab_a B
                          WHERE A.COMPANY_ID = B.COMPANY_ID
                            AND A.PY_DES = B.PY_DES
                            AND A.PY_DES_Q = B.PY_DES_Q
                            AND A.PY_TYPE = B.PY_TYPE
                            AND A.RELATE_ID = B.RELATE_ID)
     ORDER BY A.ROWID;
  V_COUNTER NUMBER;
BEGIN
  V_COUNTER := 0;
  FOR ROW IN CUR LOOP
    DELETE FROM tab_a WHERE ROWID = ROW.AID;
    V_COUNTER := V_COUNTER + 1;
    IF (V_COUNTER >= 1000) THEN
      COMMIT;
      V_COUNTER := 0;
    END IF;
  END LOOP;
  COMMIT;
END;

ORA-07445[kslgetl()+120]/ORA-00108错误解决

一.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

二.alert中发现ORA-07445[kslgetl()+120]/ORA-00108错误

Mon Dec 19 09:19:42 2011
found dead dispatcher 'D000', pid = (13, 1)
Mon Dec 19 09:19:42 2011
dispatcher 'D000' encountered error getting listening address
Mon Dec 19 09:19:42 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_16297.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Mon Dec 19 09:19:45 2011
found dead dispatcher 'D000', pid = (21, 2)
Mon Dec 19 09:19:45 2011
dispatcher 'D000' encountered error getting listening address
Mon Dec 19 09:19:45 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_16299.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously

三.trace文件信息

Oracle process number: 15
Unix process pid: 10607, image: oracle@gongantest (D000)

Warning: keltnfy call to ldmInit failed with error 46
*** 2011-12-19 19:21:40.100
network error encountered getting listening address:
  NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
  NS Secondary Error: TNS-12560: TNS:protocol adapter error
  NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x208, PC: [0x7a06b8, kslgetl()+120]
*** 2011-12-19 19:21:40.107
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
ssexhd()+629         call     ksedmp()             000000003 ? 000000001 ?
                                                   2B91BE8F7D50 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
__restore_rt()+0     call     ssexhd()             00000000B ? 2B91BE8F8D70 ?
                                                   2B91BE8F8C40 ? 2B91BE8F7DB0 ?
                                                   2B91BE8F7CF0 ? 000000000 ?
kslgetl()+120        signal   __restore_rt()       0600E7560 ? 0000000E8 ?
                                                   09AAE5728 ? 0000009A9 ?
                                                   000003980 ? 09AAE5740 ?
ksfglt()+108         call     kslgetl()            0600E7560 ? 000000001 ?
                                                   09AAE5728 ? 0000009A9 ?
                                                   000003980 ? 09AAE5740 ?

四.在MOS上找到相关文章[ID 1298804.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7 - Release: 11.1 to 11.1
Information in this document applies to any platform.
Symptoms

The following errors are seen in the trace file written by an ORA-7445 [kslgetl]:

network error encountered getting listening address:
NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
NS Secondary Error: TNS-12560: TNS:protocol adapter error
NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x130, PC: [0x82f09dc, kslgetl()+80]

The trace file indicates that there is no session:

Current SQL information unavailable - no session.

The Call Stack Trace in the ORA-7445 trace file contains a function list similar to:

 kslgetl <- PGOSF57_ksfglt
<- kghfre <- kmnsbf <- nsbfr <- nsiofrrg <- nsiocancel
<- nsopen_shutitdown <- nsclose <- nsgblclose <- nsgblTRMHelper <- nsgblRealTerm
<- nlstdstp <- npinlt <- ksuabt <- opidrv <- sou2o
<- opimai_real <- main <- libc_start_main

Cause

The trace file first reports: Warning: keltnfy call to ldmInit failed with error 46 

The ORA-7445 is not the starting point here. This exception is just a spin-off from ORA-180 and it is possible that different internal errors may be seen, such as ORA-600 [504], depending on what is happening when the ORA-180 is encountered.

The cause for the ORA-180 is related to the inital message at the beginning of the trace file: "keltnfy call to ldmInit failed with error 46" and this is followed by: "network error encountered getting listening address:" 

The error code (here: 46) is the key for solving the issue.
This warning says that ldmInit() returned error 46 which is LDMERR_HOST_NOT_FOUND (host not found).

This error is returned if the OS call gethostbyname() fails with an error. So these appears to be a network specific issue.
Solution

1) Check permission on /etc/hosts

$ ls -l /etc/hosts -rw-r--r-- 2 root root 194 Oct 17 2006 /etc/hosts


Check if /etc/hosts file is correctly configured


<ip address> <fully qualified hostname> <simple or short hostname> <alias, if applicable> ( all of this on one line ).


2) Check the hostname:

$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname


3) Check if /etc/nodename is correctly configured
If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.


$ nslookup <shortname> $ nslookup <long name> $ nslookup <ip address>

The forward and reverse lookup should succeed and return consistent address/info.


4) Check nsswitch.conf


$ more nsswitch.confhosts: files dnsMake sure host lookup is also done through the /etc/hosts file and not just dns. It is recommended that FILES come first before DNS.


Also, check the resolv.conf. This makes sure that the DNS is working properly.

在这里看到虽然数据库的版本不一样,alert和trace中的错误不完全一致,但是很相似。是由于主机名不能被正常访问导致,所以尝试这从主机名相关部分着手解决。

五.查看主机名相关信息,解决问题

[oracle@gongantest ~]$  ls -l /etc/hosts
-rw-r--r-- 2 root root 176 Dec 16 13:43 /etc/hosts
[oracle@gongantest ~]$ hostname
gongantest
[oracle@gongantest ~]$ ping gongantest
ping: unknown host gongantest
[oracle@gongantest ~]$ more  /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
#192.168.11.60    gongantest
--发现改主机名的dns被注释掉
[oracle@gongantest ~]$ ping gongantest
PING gongantest (192.168.11.60) 56(84) bytes of data.
From 192.168.9.66 icmp_seq=2 Destination Host Unreachable
From 192.168.9.66 icmp_seq=3 Destination Host Unreachable
From 192.168.9.66 icmp_seq=4 Destination Host Unreachable

--- gongantest ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3999ms, pipe 3
--除掉注释,测试不通
[root@gongantest ~]# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:14:22:10:96:CA  
          inet addr:192.168.9.66  Bcast:192.168.11.255  Mask:255.255.252.0
          inet6 addr: fe80::214:22ff:fe10:96ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4207222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2482964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:1156547557 (1.0 GiB)  TX bytes:565900103 (539.6 MiB)
--发现该机器的ip为192.168.9.66,修改hosts文件。
--初步确定出现问题的原因是因为机器迁移,使用了新ip,注释掉了hosts文件中错误ip
[root@gongantest ~]# vi /etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
192.168.9.66    gongantest

[oracle@gongantest ~]$ ping gongantest
PING gongantest (192.168.9.66) 56(84) bytes of data.
64 bytes from gongantest (192.168.9.66): icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from gongantest (192.168.9.66): icmp_seq=2 ttl=64 time=0.031 ms
64 bytes from gongantest (192.168.9.66): icmp_seq=3 ttl=64 time=0.033 ms

--- gongantest ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.031/0.033/0.037/0.007 ms
--ping测试正常

继续观察

--观察d000进程是否启动
[oracle@gongantest ~]$ ps -ef|grep ora_
oracle   10180     1  0 13:23 ?        00:00:00 ora_pmon_gaxt
oracle   10182     1  0 13:23 ?        00:00:00 ora_psp0_gaxt
oracle   10184     1  0 13:23 ?        00:00:00 ora_mman_gaxt
oracle   10186     1  0 13:23 ?        00:00:00 ora_dbw0_gaxt
oracle   10188     1  0 13:23 ?        00:00:02 ora_lgwr_gaxt
oracle   10190     1  0 13:23 ?        00:00:00 ora_ckpt_gaxt
oracle   10192     1  0 13:23 ?        00:00:00 ora_smon_gaxt
oracle   10194     1  0 13:23 ?        00:00:00 ora_reco_gaxt
oracle   10196     1  0 13:23 ?        00:00:00 ora_cjq0_gaxt
oracle   10198     1  0 13:23 ?        00:00:00 ora_mmon_gaxt
oracle   10200     1  0 13:23 ?        00:00:00 ora_mmnl_gaxt
oracle   10204     1  0 13:23 ?        00:00:00 ora_s000_gaxt
oracle   10210     1  0 13:23 ?        00:00:00 ora_arc0_gaxt
oracle   10212     1  0 13:23 ?        00:00:00 ora_arc1_gaxt
oracle   10214     1  0 13:23 ?        00:00:00 ora_qmnc_gaxt
oracle   10218     1  0 13:23 ?        00:00:00 ora_j000_gaxt
oracle   10222     1  0 13:24 ?        00:00:00 ora_q000_gaxt
oracle   10234     1  0 13:24 ?        00:00:00 ora_q001_gaxt
oracle   10609     1  0 13:32 ?        00:00:00 ora_d000_gaxt
oracle   10639  9962  0 13:35 pts/0    00:00:00 grep ora_

--观察alert日志未出现该错误
Mon Dec 26 13:32:18 2011
Errors in file /opt/oracle/admin/gaxt/bdump/gaxt_ora_10607.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Mon Dec 26 13:32:21 2011
found dead dispatcher 'D000', pid = (13, 85)
Mon Dec 26 13:36:04 2011
Thread 1 advanced to log sequence 232 (LGWR switch)
  Current log# 1 seq# 232 mem# 0: /opt/oracle/oradata/gaxt/redo01.log

至此ORA-07445[kslgetl()+120]/ORA-00108错误解决

记录一次oer 8102.2处理

1.alert日志

Tue Dec 20 22:09:45 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_27442.trc:
Wed Dec 21 22:10:45 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_32761.trc:
Thu Dec 22 22:11:46 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_5935.trc:
Fri Dec 23 22:12:47 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_11382.trc:

Mnnn performs manageability tasks dispatched to them by MMON. Tasks performed include taking Automatic Workload Repository snapshots and Automatic Database Diagnostic Monitor analysis.
从这个时间点来看,应该是数据库启动GATHER_STATS_JOB收集统计信息时发现这个错误。

2.bas_m000_11382.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      bas
Release:        2.6.9-78.ELsmp
Version:        #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine:        x86_64
Instance name: BAS
Redo thread mounted by this instance: 1
Oracle process number: 24
Unix process pid: 11382, image: oracle@bas (m000)

*** ACTION NAME:(Auto-Purge Slave Action) 2011-12-23 22:12:47.074
*** MODULE NAME:(MMON_SLAVE) 2011-12-23 22:12:47.074
*** SERVICE NAME:(SYS$BACKGROUND) 2011-12-23 22:12:47.074
*** SESSION ID:(5465.2033) 2011-12-23 22:12:47.074
oer 8102.2 - obj# 4152, rdba: 0x00401f7c(afn 1, blk# 8060)
kdk key 8102.2:
  ncol: 2, len: 10
  key: (10):  02 c1 0a 06 00 c0 04 dc 00 00
  mask: (4096): 
 09 00 00 00 00 fb d1 c0 00 00 00 00 00 70 f8 fe bf 7f 00 00 00 cd 7d 5d 01
oer 8102.<code> - obj# <object id>, rdba: <rdba value>(afn <file#>, blk# <block#>)
kdk key 8102.2:
ncol: <number of columns in the key including the rowid>, len: <key length>
key: (<length>):<hexadecimal value>

obj#:   object_id for the affected index in dba_objects.
rdba:   relative data block address where the key is supposed to be stored in the index.
afn:     absolute file number where the affected index block is stored. 
(file_id in dba_data_files, file# in v$datafile).
blk#:   Index block number where the key is supposed to be stored.

出现oer 8102.2的错误,有两种可能:1.坏块,2.表和索引数据不一致

3.找出相关对象

SQL> col object_name for a30
SQL> col owner for a10
SQL> select object_name,owner,object_type
  2  from dba_objects where object_id=4152;

OBJECT_NAME                    OWNER      OBJECT_TYPE
------------------------------ ---------- -------------------
WRI$_SEGADV_OBJLIST_IDX_TS     SYS        INDEX

SQL> select OWNER,TABLE_NAME from dba_indexes 
   2 where index_name='WRI$_SEGADV_OBJLIST_IDX_TS';

OWNER      TABLE_NAME
---------- ------------------------------
SYS        WRI$_SEGADV_OBJLIST

SQL> ANALYZE TABLE sys.WRI$_SEGADV_OBJLIST VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE sys.WRI$_SEGADV_OBJLIST VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

4.分析坏块(逻辑/物理)

SQL> ANALYZE INDEX WRI$_SEGADV_OBJLIST_IDX_TS VALIDATE STRUCTURE;

Index analyzed.

SQL> ANALYZE TABLE WRI$_SEGADV_OBJLIST VALIDATE STRUCTURE;

Table analyzed.

[oracle@bas bdump]$ dbv file=/opt/app/oracle/oradata/BAS/system01.dbf

DBVERIFY: Release 10.2.0.1.0 - Production on Sat Dec 24 21:14:38 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /opt/app/oracle/oradata/BAS/system01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 552960
Total Pages Processed (Data) : 360156
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 167596
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1961
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 23247
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 2890198330 (2750.2890198330)

检测证明,对象以及对象所属的数据文件,无坏块现象

5.分析表和index不一致

--找出index对应列
SQL> SELECT table_name , column_name from dba_ind_columns  
  2  WHERE index_name='WRI$_SEGADV_OBJLIST_IDX_TS' order by table_name; 

TABLE_NAME                     COLUMN_NAME
------------------------------ --------------------
WRI$_SEGADV_OBJLIST            TS_ID

--确定对应列是否允许为null
SQL> desc WRI$_SEGADV_OBJLIST
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 AUTO_TASKID                                        NUMBER
 TS_ID                                              NUMBER
 OBJN                                               NUMBER
 OBJD                                               NUMBER
 STATUS                                             VARCHAR2(40)
 TASK_ID                                            NUMBER
 REASON                                             VARCHAR2(40)
 REASON_VALUE                                       NUMBER
 CREATION_TIME                                      TIMESTAMP(6)
 PROC_TASKID                                        NUMBER
 END_TIME                                           TIMESTAMP(6)
 SEGMENT_OWNER                                      VARCHAR2(30)
 SEGMENT_NAME                                       VARCHAR2(81)
 PARTITION_NAME                                     VARCHAR2(30)
 SEGMENT_TYPE                                       VARCHAR2(18)
 TABLESPACE_NAME                                    VARCHAR2(30)

--确认在表中对应列是否有空值
SQL> SELECT /*+ FULL(t1) */ count(TS_ID)
  2   FROM WRI$_SEGADV_OBJLIST t1 
  3    WHERE t1.TS_ID IS NULL;

COUNT(TS_ID)
------------
           0
--表比index多数据
SQL> SELECT /*+ FULL(t1) */ TS_ID
  2  FROM WRI$_SEGADV_OBJLIST t1
  3  MINUS
  4  SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ TS_ID
  5  FROM WRI$_SEGADV_OBJLIST t where  ts_id is not null;

no rows selected


--index中数据条数
SQL> SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ count(TS_ID)
  2  FROM WRI$_SEGADV_OBJLIST t
  3  where ts_id is not null;

COUNT(TS_ID)
------------
         901

--表中数据条数
SQL> SELECT /*+ FULL(t1) */ count(TS_ID)
  2  FROM WRI$_SEGADV_OBJLIST t1 ;

COUNT(TS_ID)
------------
         937

--index中不同值数量
SQL> SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ 
  2  COUNT(DISTINCT TS_ID)
  3  FROM WRI$_SEGADV_OBJLIST t WHERE TS_ID IS NOT NULL;

COUNT(DISTINCTTS_ID)
--------------------
                   5
--表中不同值数量
SQL> SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ TS_ID
  2  FROM WRI$_SEGADV_OBJLIST t WHERE ts_id IS NOT NULL
  3  MINUS
  4  SELECT /*+ FULL(t1) */ TS_ID
  5  FROM WRI$_SEGADV_OBJLIST t1 ;

     TS_ID
----------
         4
--对比可以知道index中的唯一值比表中,这个也就解释了,为什么表中总条数多,
--但是他们两做减法的时候,记录为空

--索引表比表多数据
SQL> SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ TS_ID
  2  FROM WRI$_SEGADV_OBJLIST t WHERE ts_id IS NOT NULL
  3  MINUS
  4  SELECT /*+ FULL(t1) */ TS_ID
  5  FROM WRI$_SEGADV_OBJLIST t1 ;

     TS_ID
----------
         4

上面的检测证明:1.表中有索引中无的数据,2.索引中有表中不存在数据

6.解决问题

SQL> alter index WRI$_SEGADV_OBJLIST_IDX_TS rebuild online;

Index altered.

--测试index中总条数
SQL> SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ count(TS_ID)
  2   FROM WRI$_SEGADV_OBJLIST t
  3    where  ts_id is not null;

COUNT(TS_ID)
------------
         937

--无多余index项(以前唯一值为4的记录已经不存在)
SQL> SELECT /*+ index(t WRI$_SEGADV_OBJLIST_IDX_TS) */ TS_ID
  2      FROM WRI$_SEGADV_OBJLIST t WHERE ts_id IS NOT NULL
  3    MINUS
  4      SELECT /*+ FULL(t1) */ TS_ID
  5      FROM WRI$_SEGADV_OBJLIST t1 ;

no rows selected

--通过上述测试,证明表和index不一致问题解决

通过ROWID找回坏块数据

一.准备环境

C:\Users\XIFENFEI>sqlplus chf/xifenfei

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 12月 23 10:49:52 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> create tablespace t_xff datafile 'E:\ORACLE\ORADATA\XFF\t_xff01.dbf'
  2  size 10m autoextend on next 10m maxsize 1g;

表空间已创建。

SQL> create table t_xifenfei tablespace t_xff
  2  as
  3  select * from dba_objects;

表已创建。

SQL> select count(*) from t_xifenfei;

  COUNT(*)
----------
     73286

二.发现坏块
使用ULtraEdit破坏数据(关闭数据库执行)

SQL> select count(*) from t_xifenfei;
select count(*) from t_xifenfei
                     *
第 1 行出现错误:
ORA-01578: ORACLE 数据块损坏 (文件号 13, 块号 373)
ORA-01110: 数据文件 13: 'E:\ORACLE\ORADATA\XFF\T_XFF01.DBF'

三.查询坏块相关信息

The "LOW_RID" is the lowest rowid INSIDE the corrupt block:
SELECT dbms_rowid.rowid_create(1,<DATA_OBJECT_ID>,<RFN>,<BL>,0) LOW_RID
from DUAL;

The "HI_RID" is the first rowid AFTER the corrupt block:
SELECT dbms_rowid.rowid_create(1,<DATA_OBJECT_ID>,<RFN>,<BL>+1,0) HI_RID
from DUAL;

SQL> col tablespace_name for a30
SQL> col segment_type for a5
SQL> col owner for a10
SQL> col segment_name for a20
SQL> SELECT tablespace_name, segment_type, owner, segment_name
  2                 FROM dba_extents
  3                WHERE file_id =13
  4  AND 373 between block_id AND block_id + blocks - 1 ;

TABLESPACE_NAME                SEGME OWNER      SEGMENT_NAME
------------------------------ ----- ---------- --------------------
T_XFF                          TABLE CHF        T_XIFENFEI

SQL> SELECT data_object_id
  2            FROM dba_objects
  3   WHERE object_name = 'T_XIFENFEI'  and owner='CHF';

DATA_OBJECT_ID
--------------
         77759
--坏块的最小rowid
SQL> select dbms_rowid.rowid_create(1, 77759,13,373,0) from dual;

DBMS_ROWID.ROWID_C
------------------
AAAS+/AANAAAAF1AAA

坏块的最大rowid(block+1得到)
SQL> select dbms_rowid.rowid_create(1, 77759,13,374,0) from dual;

DBMS_ROWID.ROWID_C
------------------
AAAS+/AANAAAAF2AAA

四.根据rowid找回数据

SQL> SELECT /*+ ROWID(A) */ COUNT(*) FROM T_XIFENFEI A
  2  WHERE ROWID>='AAAS+/AANAAAAF2AAA';

  COUNT(*)
----------
     55858

SQL> SELECT /*+ ROWID(A) */ COUNT(*) FROM T_XIFENFEI A
  2  WHERE  ROWID<'AAAS+/AANAAAAF1AAA';

  COUNT(*)
----------
     17358

SQL> SELECT 77759-55858-17358 from dual;

77759-55858-17358
-----------------
             4543

SQL> CREATE TABLE T_XIFENFEI_BAK  TABLESPACE T_XFF
  2  AS
  3  SELECT /*+ ROWID(A) */ * FROM T_XIFENFEI A
  4  WHERE ROWID>='AAAS+/AANAAAAF2AAA';

表已创建。

SQL> INSERT INTO  T_XIFENFEI_BAK
  2  SELECT /*+ ROWID(A) */ * FROM T_XIFENFEI A
  3  WHERE  ROWID<'AAAS+/AANAAAAF1AAA';

已创建17358行。

SQL> COMMIT;

提交完成。

SQL> SELECT COUNT(*) FROM T_XIFENFEI_BAK;

  COUNT(*)
----------
     73216

五.和dbms_repair解决坏块对比

SQL> CONN / AS SYSDBA
已连接。
SQL> exec dbms_repair.skip_corrupt_blocks('CHF','T_XIFENFEI');

PL/SQL 过程已成功完成。

SQL>  select skip_corrupt from dba_tables where table_name='T_XIFENFEI';

SKIP_COR
--------
ENABLED

SQL> select count(*) from chf.t_xifenfei;

  COUNT(*)
----------
     73216

通过跳过坏块和rowid功能对比可以看出,两者丢失的数据是相同的,如果有index,同样利用rowid结合index,可能会找回部分数据。当然dbms_repair也提供了类此的功能。两种方法的使用看个人的爱好与习惯。