ORACLE在AIX中产生SOFTWARE PROGRAM ABNORMALLY TERMINATED警告原因

数据库中发现如下错误
该错误的解决方案:ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

Dump file /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/db_1
System name:    AIX
Node name:  sgerp5
Release:    1
Version:    6
Machine:    00C8F0564C00
Instance name: sgerp5
Redo thread mounted by this instance: 1
Oracle process number: 138
Unix process pid: 7602504, image: oracle@sgerp5 (m000)
 
*** 2012-05-11 03:52:35.200
*** SESSION ID:(752.5029) 2012-05-11 03:52:35.200
*** CLIENT ID:() 2012-05-11 03:52:35.200
*** SERVICE NAME:(SYS$BACKGROUND) 2012-05-11 03:52:35.200
*** MODULE NAME:(MMON_SLAVE) 2012-05-11 03:52:35.200
*** ACTION NAME:(Auto-Purge Slave Action) 2012-05-11 03:52:35.200
 
Dump continued from file: /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] 
[SIGSEGV] [ADDR:0xB38F0000000049][PC:0x100213C08] [Address not mapped to object] []

errpt错误说明
在产生7445错误的同时观察aix系统错误日志发现SOFTWARE PROGRAM ABNORMALLY TERMINATED错误

sgerp5_[oracle]-->errpt -aj A924A5FC
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC

Date/Time:       Fri May 11 03:52:55 BEIST 2012
Sequence Number: 471
Machine Id:      00C8F0564C00
Node Id:         sgerp5
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC         

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
               7602504
FILE SYSTEM SERIAL NUMBER
          14
INODE NUMBER
           0      367648
CORE FILE NAME
/oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core
PROGRAM NAME
oracle
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER
sskgmcrea 0

PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 2

ADDITIONAL INFORMATION
skgdbgcra 224
??
ksdbgcra 3D0
ssexhd 978
??

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/oracle SIG/6 FLDS/skgdbgcra VALU/224

错误原因

This error is logged when a software program abnormally ends and causes a core dump. Users might
not be exiting applications correctly, the system might have been shut down while users were
working in application, or the user's terminal might have locked up and the application stopped
1)这里也就是说如果oracle进程在aix机器上异常终止,并且产生了一个core dump文件,
  就会出现SOFTWARE PROGRAM ABNORMALLY TERMINATED警告信息
2)用户登录系统没有正常退出,而系统被关闭
3)用户强制终止一个一个lock,而导致进程停止

本次AIX日志警告原因:由于进程7602504异常终止(ORA-07445错误)并且产生了 /oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core dump 文件,从而有了AIX中的SOFTWARE PROGRAM ABNORMALLY TERMINATED警告信息

ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

alert发现如下错误ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xB38F0000000049] 
[PC:0x100213C08, dbgrmqmqpk_query_pick_key()+0f88]
Errors in file /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc  (incident=579300):
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049] 
[PC:0x100213C08] [Address not mapped to object] []
Incident details in: /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc

trace文件部分信息

Dump file /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/db_1
System name:	AIX
Node name:	sgerp5
Release:	1
Version:	6
Machine:	00C8F0564C00
Instance name: sgerp5
Redo thread mounted by this instance: 1
Oracle process number: 138
Unix process pid: 7602504, image: oracle@sgerp5 (m000)
--确定是m000进程出现异常,而该进程是awr收集统计信息进程MMON的子进程

*** 2012-05-11 03:52:35.200
*** SESSION ID:(752.5029) 2012-05-11 03:52:35.200
*** CLIENT ID:() 2012-05-11 03:52:35.200
*** SERVICE NAME:(SYS$BACKGROUND) 2012-05-11 03:52:35.200
*** MODULE NAME:(MMON_SLAVE) 2012-05-11 03:52:35.200
*** ACTION NAME:(Auto-Purge Slave Action) 2012-05-11 03:52:35.200
 
Dump continued from file: /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049] 
[PC:0x100213C08] [Address not mapped to object] []

----- Incident Context Dump -----
Address: 0x1104bdb68
Incident ID: 579300
Problem Key: ORA 7445 [dbgrmqmqpk_query_pick_key()+0f88]
Error: ORA-7445 [dbgrmqmqpk_query_pick_key()+0f88] [SIGSEGV] [ADDR:0xB38F0000000049] [PC:0x100213C08] 
[Address not mapped to object] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: dbgeEndDDEInvocation [diag_dde]
[03]: ssexhd []
[04]: 47dc []<-- Signaling
[05]: dbgrmqmfs_fetch_setup [ams_comp]
[06]: dbgrmqmf_fetch_real [ams_comp]
[07]: dbgrmqmf_fetch [ams_comp]
[08]: dbgrip_fetch_record [ami_comp]
[09]: dbgrip_relation_iterator [ami_comp]
[10]: dbgripricm_rltniter_wcbf_mt [ami_comp]
[11]: dbgripdrm_dmldrv_mt [ami_comp]
[12]: dbghmm_delete_info_records []
[13]: dbghmo_purge_hm_schema []
[14]: dbgrupipscb_hm_pgsvc_cbf [diag_adr]
[15]: dbgruppm_purge_main [diag_adr]
[16]: dbkrapg_auto_purge [rdbms_adr]
[17]: kewraps_auto_purge_slave []
[18]: kebm_slave_main []
[19]: ksvrdp [ksv_trace]
[20]: opirip []
[21]: opidrv []
[22]: sou2o []
[23]: opimai_real []
[24]: main []
[25]: __start []
MD [00]: 'SID'='752.5029' (0x3)
MD [01]: 'ProcId'='138.46' (0x3)
MD [02]: 'PQ'='(0, 1336679546)' (0x7)
MD [03]: 'Client ProcId'='oracle@sgerp5.7602504_1' (0x0)

MOS关于该问题记录
问题原因

This is due to Bug 9390347 fixed in 12.1 & 11.2.0.2, where a core dump can occur 
in module dbgrmqmqpk_query_pick_key() whilst purging HM contents from ADR.

解决方案

- Either install our 11.2.0.2 patchset
- Or download and apply Patch 9390347 if available for your version/platform.
- On Windows, you can also install Bundle Patch 11.1.0.7.31 or above.
There is no workaround to this error, however the error is not serious and does not cause any harm to your database.

常驻连接池(Database Resident Connection Pool)

一.介绍常驻连接池(Database Resident Connection Pool,Oracle DRCP)
数据库驻留连接池是Oracle Database 11g的一个新特性,专门为了解决在需要支持大量连接的环境对可扩性的迫切需求而设计的。数据库驻留连接池把数据库服务器进程和对话汇合起来(这样的组合称之为池服务器),通过从单主机或不同主机发出的多个应用软件进程的连接进行共享。由一个连接代理(Connection Broker)进程控制着数据库后台进程中的池服务器。连接代理会持续的连接客户并对客户进行验证。当需要进行某种数据库活动时,客户将请求连接代理提供池服务器,使用完毕后再将它们释放以供其他客户重新使用。当池服务器处在使用当中时,相当于一台专用服务器。对于来自常驻通道中的客户端连接请求,连接代理会为其选择一个合适的池服务器,并把客户端请求交给该池服务器处理,不再干涉。此后客户通过和该池服务器的直接对话来完成所有的数据库活动。当客户完成请求任务释放池服务器后,连接代理将重新接管该池服务器。

二.什么时间使用DRCP
1 使用较小内存的、大量的客户端连接
2 客户端应用是相似的,可以共享或重用会话
3 客户端占用数据库连接的周期相当短
4 会话不需要跨客户请求
5 客户端有众多的主机与进程

三.Dedicated Servers,Shared Servers与DRCP的内存需求和区别对比
一般情况下,由于每个会话需要消耗400k的内存,每个进程需要消耗4m的内存,现在我们以DRCP的pool size是100,shared server的shared server进程是100为例,假如有5000个客户端连接到这些环境,则这些主机的内存分配如下:

A Dedicated Server
Memory used = 5000 X (400 KB + 4 MB) = 22 GB

B Shared Server
Memory used = 5000 X 400 KB + 100 X 4 MB = 2.5 GB
Out of the 2.5 GB, 2 GB is allocated from the SGA
.
C Database Resident Connection Pooling
Memory used = 100 X (400 KB + 4 MB) + (5000 X 35KB)= 615 MB


四.使用DRCP时,当实例有活动pooled server,有以下限制:
1 不能shutdown database;
2 不能停掉DRCP;
3 不能用database link连接到不同实例的DRCP;
4 不能使用Advanced Security Option (ASO),比如encryption等

五.客户端如何连接到DRCP
如果是专用服务器连接,则SERVER=DEDICATED,如果是DRCP连接,则SEVER=POOLED。如果要指定客户端请求到DRCP,则客户端的tnsnames.ora中的连接字符串必须指定连接类型是POOLED,配置方式如下所示:

ORA11G_P =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
    )
    (CONNECT_DATA =
     (SERVER=POOLED)  --注意
      (SERVICE_NAME = ora11g)
    )
  )

说明:
1)如果在tnsnames.ora中指定了SERVER=POOLED,但并没有在实例中启动DRCP,则当客户请求连接时,DB会报ORA-12520错误。
2)11g的客户端才可以使用DRCP,如果10g的客户端在tnsnames.ora中指定了SERVER=POOLED,则连接时报ORA-56606。

六.DRCP配置/查询
1.DRCP启动/关闭

--查看当前DRCP状态
SQL> select CONNECTION_POOL,status from dba_cpool_info;

CONNECTION_POOL                STATUS
------------------------------ --------------------------------
SYS_DEFAULT_CONNECTION_POOL    INACTIVE

--启动DRCP
SQL> execute dbms_connection_pool.start_pool;

PL/SQL procedure successfully completed.

SQL> select CONNECTION_POOL,status from dba_cpool_info;

CONNECTION_POOL                STATUS
------------------------------ --------------------------------
SYS_DEFAULT_CONNECTION_POOL    ACTIVE

--关闭DRCP
SQL> exec dbms_connection_pool.stop_pool

PL/SQL procedure successfully completed.

SQL> select CONNECTION_POOL,status from dba_cpool_info;

CONNECTION_POOL                STATUS
------------------------------ --------------------------------
SYS_DEFAULT_CONNECTION_POOL    INACTIVE

2.修改DRCP参数

--dbms_connection_pool.configure_pool
exec dbms_connection_pool.configure_pool(
	POOL_NAME=>'SYS_DEFAULT_CONNECTION_POOL',
	minsize=>10,
	maxsize=>100,
	INCRSIZE=>10,
	SESSION_CACHED_CURSORS=>50,
	inactivity_timeout=>3000,
	max_think_time=>100,
	MAX_USE_SESSION=>10000,
	MAX_LIFETIME_SESSION=>36000
   );

--dbms_connection_pool.alter_param
exec dbms_connection_pool.alter_param(
	POOL_NAME=>'SYS_DEFAULT_CONNECTION_POOL',
	PARAM_NAME=>'MINSIZE',
	PARAM_VALUE=>'2');

3.DRCP视图

DBA_CPOOL_INFO 
displays configuration information about all Database Resident Connection Pools in the database.

V$CPOOL_STATS 
displays information about the Database Resident Connection Pool statistics for an instance


V$CPOOL_CC_STATS 
displays information about the connection class level statistics for 
the Database Resident Connection Pool per instance.

V$CPOOL_CONN_INFO 
displays connection information about each connection to the connection broker.

V$CPOOL_CC_INFO 
displays information about the pool-to-connection class mapping for 
the Database Resident Connection Pool per instance.

七.DRCP相关进程

oracle   11715     1  0 21:38 ?        00:00:00 ora_n000_ora11g
oracle   11719     1  0 21:38 ?        00:00:00 ora_l000_ora11g
oracle   11723     1  0 21:38 ?        00:00:00 ora_l001_ora11g
oracle   11727     1  0 21:38 ?        00:00:00 ora_l002_ora11g
oracle   11731     1  0 21:38 ?        00:00:02 ora_l003_ora11g
oracle   12490     1  0 21:57 ?        00:00:00 ora_l004_ora11g
oracle   12494     1  0 21:57 ?        00:00:00 ora_l005_ora11g
oracle   12498     1  0 21:57 ?        00:00:00 ora_l006_ora11g
oracle   12502     1  0 21:57 ?        00:00:00 ora_l007_ora11g
oracle   12506     1  0 21:57 ?        00:00:00 ora_l008_ora11g
oracle   12510     1  0 21:57 ?        00:00:00 ora_l009_ora11g
oracle   12514     1  0 21:57 ?        00:00:01 ora_l010_ora11g
oracle   12518     1  0 21:57 ?        00:00:00 ora_l011_ora11g
oracle   12522     1  0 21:57 ?        00:00:00 ora_l012_ora11g
oracle   12526     1  0 21:57 ?        00:00:00 ora_l013_ora11g
oracle   12530     1  0 21:57 ?        00:00:00 ora_l014_ora11g
oracle   12534     1  0 21:57 ?        00:00:00 ora_l015_ora11g
oracle   12538     1  0 21:57 ?        00:00:00 ora_l016_ora11g
oracle   12542     1  0 21:57 ?        00:00:00 ora_l017_ora11g
oracle   12546     1  0 21:57 ?        00:00:00 ora_l018_ora11g
oracle   12550     1  0 21:57 ?        00:00:00 ora_l019_ora11g

ora_n000_ora11g 
Connection Broker Process

ora_l000_ora11g 
Pooled Server Process(Handles client requests in Database Resident Connection Pooling)

提高短连接性能方法测试

创建测试脚本
通过在三个会话中同时执行test_login.sh脚本,模拟当数据库多个短连接情况性能

[oracle@xifenfei tmp]$ more test_login.sh 
#!/bin/bash
echo "start login database `date`*********" >>/tmp/test_1.log
e=2000
for((i=1;i<=$e;i=i+1))
do
/tmp/login_oracle.sh 
done
echo "end login database `date`*********" >>/tmp/test_1.log

[oracle@xifenfei tmp]$ more login_oracle.sh 
#!/bin/bash
sqlplus chf/xifenfei@ORA11G_P<<XFF>/dev/null 
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;
exit
XFF 

--ORA11G_P根据不同的测试情景指定不同名称

情况1:一个监听情况下

start login database Tue May  1 18:03:32 CST 2012*********
start login database Tue May  1 18:03:35 CST 2012*********
start login database Tue May  1 18:03:37 CST 2012*********

end login database Tue May  1 18:08:20 CST 2012*********
end login database Tue May  1 18:08:25 CST 2012*********
end login database Tue May  1 18:08:26 CST 2012*********

--计算2000个会话登录/查询/推出时间
4:48
4:40
4:49

情况2:三个监听,客户端配置tns负载均衡

--监听配置
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
    )
  )
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
     (GLOBAL_DBNAME = ora11g)
     (ORACLE_HOME = /u01/oracle/oracle/product/11.2.0/db_1)
     (SID_NAME = ora11g)
    )
  )
ADR_BASE_LISTENER = /u01/oracle

LISTENER1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1522))
    )
  )
SID_LIST_LISTENER1 =
  (SID_LIST =
    (SID_DESC =
     (GLOBAL_DBNAME = ora11g)
     (ORACLE_HOME = /u01/oracle/oracle/product/11.2.0/db_1)
     (SID_NAME = ora11g)
    )
  )
ADR_BASE_LISTENER1 = /u01/oracle


LISTENER2 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1523))
    )
  )
SID_LIST_LISTENER2 =
  (SID_LIST =
    (SID_DESC =
     (GLOBAL_DBNAME = ora11g)
     (ORACLE_HOME = /u01/oracle/oracle/product/11.2.0/db_1)
     (SID_NAME = ora11g)
    )
  )
ADR_BASE_LISTENER2 = /u01/oracle

--tns配置
ORA11G_M =
  (DESCRIPTION =
      (LOAD_BALANCE=ON)
      (FAILOVER=ON)
      (ADDRESS_LIST =
       (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
       (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1522))
       (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1523))
       (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
     (SERVER=DEDICATED)
      (SERVICE_NAME = ora11g)
    )
  )

--测试结果
start login database Tue May  1 17:51:45 CST 2012*********
start login database Tue May  1 17:51:49 CST 2012*********
start login database Tue May  1 17:51:51 CST 2012*********

end login database Tue May  1 17:55:58 CST 2012*********
end login database Tue May  1 17:56:06 CST 2012*********
end login database Tue May  1 17:56:09 CST 2012*********

--计算2000个会话登录/查询/推出时间
4:13
4:17
4:18

情况2:使用常驻连接池DRCP(11g新特性)

--启动默认DRCP
SQL> exec dbms_connection_pool.start_pool();

PL/SQL procedure successfully completed.

--tns配置
ORA11G_P =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.10)(PORT = 1521))
    )
    (CONNECT_DATA =
     (SERVER=POOLED)  --注意
      (SERVICE_NAME = ora11g)
    )
  )

--执行结果
start login database Tue May  1 18:19:58 CST 2012*********
start login database Tue May  1 18:20:01 CST 2012*********
start login database Tue May  1 18:20:03 CST 2012*********

end login database Tue May  1 18:23:16 CST 2012*********
end login database Tue May  1 18:23:19 CST 2012*********
end login database Tue May  1 18:23:21 CST 2012*********

--计算2000个会话登录/查询/推出时间
3:16
3:18
3:19

总结
如果在数据库短连接过程中发现监听是瓶颈的时候,可以考虑使用多个监听+tns 负载均衡,从一定程度上缓解监听瓶颈.如果是11g数据库可以考虑使用其心功能DRCP,从而很大程度上提高短连接过程中数据库的效率.因为DRCP还属于11g的新功能稳定性不知道如何?使用该功能前,请一定要做好相关测试工作.如有可能还是建议从应用层面尽可能的使用长连接,提高数据库会话效率.