Analysis of Database Connection Delays Caused by Hostname Resolution

I. Current State

[oracle@node1 ~]$ /sbin/ifconfig
eth1      Link encap:Ethernet  HWaddr 00:25:90:04:AB:6B  
          inet addr:192.168.9.140  Bcast:192.168.15.255  Mask:255.255.248.0
          inet6 addr: fe80::225:90ff:fe04:ab6b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23530402 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10959123 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:15308483748 (14.2 GiB)  TX bytes:10087987532 (9.3 GiB)
--The IP address is 192.168.9.140

[oracle@node1 ~]$ more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               ecp-db localhost.localdomain localhost
192.168.9.140   node1.srtcloud.com
--The hostname node1.srtcloud.com maps to IP 192.168.9.140

[oracle@node1 ~]$ lsnrctl status
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1.srtcloud.com)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.5.0 - Production
Start Date                04-NOV-2011 09:08:51
Uptime                    21 days 4 hr. 58 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /opt/oracle/product/10.2.0/db_1/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1.srtcloud.com)(PORT=1521)))
Services Summary...
Service "ecp" has 2 instance(s).
  Instance "ecp", status UNKNOWN, has 1 handler(s) for this service...
  Instance "ecp", status READY, has 1 handler(s) for this service...
Service "ecpXDB" has 1 instance(s).
  Instance "ecp", status READY, has 1 handler(s) for this service...
Service "ecp_XPT" has 1 instance(s).
  Instance "ecp", status READY, has 1 handler(s) for this service...
Service "ora11g" has 2 instance(s).
  Instance "ora11g", status UNKNOWN, has 1 handler(s) for this service...
  Instance "ora11g", status READY, has 1 handler(s) for this service...
Service "ora11gXDB" has 1 instance(s).
  Instance "ora11g", status READY, has 1 handler(s) for this service...
The command completed successfully
--Note: ora11g is an Oracle 11g database and ecp is an Oracle 10g database
--The listener currently listens on the hostname node1.srtcloud.com

[oracle@node1 ~]$ more /opt/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora 
# tnsnames.ora Network Configuration File: /opt/oracle/product/10.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

ECP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1.srtcloud.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ecp)
    )
  )

ORA11G =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1.srtcloud.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ora11g)
    )
  )
--TNS also connects through the hostname
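Every descriptor above names a host rather than an IP address, so each connection has to go through name resolution first. To see which descriptors depend on resolution, the HOST values can be pulled out of tnsnames.ora (or listener.ora) and checked for being IP literals. This is a minimal Python sketch; `classify_hosts` and `HOST_RE` are hypothetical helpers, and the regular expression assumes the simple `(HOST = value)` form shown above rather than the full Oracle Net grammar.

```python
import ipaddress
import re

# Matches "(HOST = value)" or "(HOST=value)"; assumes the simple form used above.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def classify_hosts(tns_text):
    """Return (host, needs_resolution) for every HOST= value found in tns_text."""
    results = []
    for host in HOST_RE.findall(tns_text):
        try:
            ipaddress.ip_address(host)
            needs_resolution = False  # an IP literal skips name resolution
        except ValueError:
            needs_resolution = True   # a hostname goes through nsswitch (files/dns)
        results.append((host, needs_resolution))
    return results


if __name__ == "__main__":
    sample = """
ORA11G =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1.srtcloud.com)(PORT = 1521))
    (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = ora11g))
  )
"""
    for host, needs in classify_hosts(sample):
        print(host, "needs resolution" if needs else "IP literal")
```

Running it against the sample prints `node1.srtcloud.com needs resolution`.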

[oracle@node1 ~]$ more /etc/resolv.conf 
nameserver 211.155.235.201
nameserver 211.155.235.188
--The DNS servers currently configured (both are valid, reachable servers)

[oracle@node1 ~]$ more /etc/nsswitch.conf |grep hosts:
hosts:     files dns
--Hostname resolution order: /etc/hosts (files) first, then DNS
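The order on the `hosts:` line is what decides whether /etc/hosts or DNS is consulted first. This minimal Python sketch reads that line and reports whether `files` comes before `dns`. `hosts_sources` and `files_before_dns` are hypothetical helper names, and the parser only strips bracketed action items such as `[NOTFOUND=return]` rather than interpreting them.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the ordered sources on the 'hosts:' line, e.g. ['files', 'dns']."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            body = line[len("hosts:"):]
            body = re.sub(r"\[[^\]]*\]", " ", body)  # ignore [STATUS=action] items
            return body.split()
    return []


def files_before_dns(nsswitch_text):
    """True if /etc/hosts ('files') is consulted before DNS (or DNS is not used)."""
    sources = hosts_sources(nsswitch_text)
    if "dns" not in sources:
        return True
    return "files" in sources and sources.index("files") < sources.index("dns")


if __name__ == "__main__":
    print(files_before_dns("hosts:     files dns"))
    print(files_before_dns("hosts:     dns files"))
```

The first call reflects the configuration above and prints `True`; the second reflects the reversed order used later in the simulation and prints `False`.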

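II. Why the Database Works Normally
1. TNS: the client connects to the database through TNS. The TNS entry specifies a hostname, so the name must be resolved. Because the current resolution order consults /etc/hosts first, the client reads the hosts file, obtains the IP address, and then connects to the corresponding database and contacts the listener.
2. Listener: the listener is configured with a hostname, which is likewise resolved to an IP address through the hosts file.
3. Everything works normally here because the hosts file resolves the hostname.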
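The "files" step described above amounts to scanning /etc/hosts for a line that lists the requested name. This minimal Python sketch mimics that lookup on the hosts file shown earlier; `lookup_hosts` is a hypothetical helper, and it ignores details of the real NSS files module (for example, returning multiple addresses or handling IPv6 specially).

```python
def lookup_hosts(hosts_text, name):
    """Return the first IP whose line lists `name` (canonical name or alias), else None.

    /etc/hosts format: an IP address followed by one or more names; '#' starts a comment.
    Name matching is case-insensitive, as hostnames are.
    """
    wanted = name.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (n.lower() for n in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    hosts = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               ecp-db localhost.localdomain localhost
192.168.9.140   node1.srtcloud.com
"""
    print(lookup_hosts(hosts, "node1.srtcloud.com"))
```

With the hosts file above, the lookup returns `192.168.9.140` without any DNS query, which is why the connection is fast when `files` comes first.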

III. Simulating the Connection Delay

[oracle@node1 ~]$ more /etc/nsswitch.conf |grep hosts:
hosts:     dns files
--Query the DNS server first, then fall back to the hosts file

[oracle@node1 ~]$ more /etc/resolv.conf 
nameserver 11.1.1.1
--An invalid DNS server (this address does not run a DNS service)

[oracle@node1 ~]$ sqlplus chf/xifenfei@ora11g

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Nov 25 14:44:55 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.
--The session hangs here for a long time

[oracle@node1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 25-NOV-2011 14:48:26

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1.srtcloud.com)(PORT=1521)))
--This also hangs for a long time

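--Cause: when resolving the hostname, the resolver queries the DNS server first. Because that IP is not a DNS server, the resolver waits until the query times out,
--and only then reads the hosts file to obtain the IP address. This is why logging in or checking the listener status takes so long.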
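The same wait can be reproduced outside sqlplus and lsnrctl by timing a lookup through the system resolver. On glibc systems, Python's `socket.getaddrinfo` calls the C `getaddrinfo`, which follows the order in /etc/nsswitch.conf. This minimal sketch uses a hypothetical helper, `time_resolution`, and port 1521 only as a placeholder.

```python
import socket
import sys
import time


def time_resolution(host, port=1521):
    """Resolve host through the system resolver and time the lookup.

    Returns (elapsed_seconds, addresses); addresses is empty if resolution failed.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []
    return time.monotonic() - start, addresses


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    elapsed, addrs = time_resolution(name)
    print(f"{name}: {addrs} in {elapsed:.2f}s")
```

For example, saving this as a file and running it with `node1.srtcloud.com` as the argument under the `dns files` order and the unreachable nameserver should show an elapsed time of several seconds before 192.168.9.140 is returned. With `files dns`, the result should come back almost immediately.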

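DNS-related delays take many forms; this is just one of the simplest and most common examples. When dealing with listener delays caused by DNS resolution, consider the following:
1. Unless there is a specific reason not to, use IP addresses in the listener and TNS configuration.
2. If you must use hostnames, resolve them through the hosts file where possible, and set the lookup order so that files comes first (DNS servers introduce many uncertain, uncontrollable factors).
3. If you must rely on DNS, list a stable DNS server first, and avoid situations where a DNS server is unreachable or does not know the hostname.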
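For point 3, it helps to check which nameserver the resolver will try first and what timeout and retry values apply. This minimal Python sketch parses resolv.conf text. `parse_resolv_conf` is a hypothetical helper. The defaults and caps follow the resolv.conf(5) man page: timeout 5 s capped at 30, attempts 2 capped at 5, and at most 3 nameservers.

```python
# Values documented in resolv.conf(5) for glibc.
DEFAULT_TIMEOUT, MAX_TIMEOUT = 5, 30
DEFAULT_ATTEMPTS, MAX_ATTEMPTS = 2, 5
MAXNS = 3  # only the first three nameserver lines are used


def parse_resolv_conf(text):
    """Return (nameservers, timeout, attempts) parsed from resolv.conf text."""
    nameservers = []
    timeout, attempts = DEFAULT_TIMEOUT, DEFAULT_ATTEMPTS
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                key, _, value = opt.partition(":")
                if key == "timeout" and value.isdigit():
                    timeout = min(int(value), MAX_TIMEOUT)
                elif key == "attempts" and value.isdigit():
                    attempts = min(int(value), MAX_ATTEMPTS)
    return nameservers[:MAXNS], timeout, attempts


if __name__ == "__main__":
    servers, t, a = parse_resolv_conf("nameserver 11.1.1.1\n")
    print("queried first:", servers[0] if servers else None, "timeout:", t, "attempts:", a)
```

Against the simulated configuration, this reports 11.1.1.1 as the first server with the default 5-second timeout and 2 attempts. These are the values that determine how long an unreachable first server can stall each lookup. The actual retry timing inside glibc is more involved than a simple multiplication of the two.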