Main iscsiadm commands

Posted on 2012-06-28 by 惜分飞

Disks currently present

[root@xifenfei ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2355    18916506   83  Linux
/dev/sda2            2356        2610     2048287+  82  Linux swap / Solaris

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2610    20964793+  83  Linux

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         100      803218+  83  Linux
/dev/sdd2             101        1000     7229250   83  Linux

Check that iSCSI is installed and running

[root@xifenfei ~]# rpm -aq|grep iscsi
iscsi-initiator-utils-6.2.0.872-10.0.1.el5
[root@xifenfei ~]#  chkconfig --list |grep iscsi
iscsi           0:off   1:off   2:on    3:on    4:on    5:on    6:off
iscsid          0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@xifenfei ~]# ps -ef|grep iscs
root      2753     2  0 Jun21 ?        00:00:00 [iscsi_eh]
root     15793     1  0 09:08 ?        00:00:00 brcm_iscsiuio
root     15800     1  0 09:08 ?        00:00:00 iscsid
root     15802     1  0 09:08 ?        00:00:00 iscsid
root     19533 15269  0 10:11 pts/1    00:00:00 grep iscs

Configure iSCSI storage

[root@xifenfei ~]# iscsiadm -m discovery -t sendtargets -p 192.168.1.254:3260
192.168.1.254:3260,1 iqn.2006-01.com.openfiler:tsn.32b32087937b
[root@xifenfei ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.32b32087937b -p 192.168.1.254:3260 -l
Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260]
Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] successful.
[root@xifenfei ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.32b32087937b -p 192.168.1.254:3260 \
> --op update -n node.startup -v automatic

Disks currently present (sde, sdf and sdg are the new iSCSI disks)

[root@xifenfei ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2355    18916506   83  Linux
/dev/sda2            2356        2610     2048287+  82  Linux swap / Solaris

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2610    20964793+  83  Linux

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         100      803218+  83  Linux
/dev/sdd2             101        1000     7229250   83  Linux

Disk /dev/sde: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sdg: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

Disk /dev/sdg doesn't contain a valid partition table

Remove iSCSI storage

[root@xifenfei ~]# iscsiadm -m node --logoutall=all
Logging out of session [sid: 3, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260]
Logout of [sid: 3, target: iqn.2006-01.com.openfiler:tsn.32b32087937b, portal: 192.168.1.254,3260] successful.
[root@xifenfei ~]# iscsiadm -m node --op delete --targetname iqn.2006-01.com.openfiler:tsn.32b32087937b

Disks currently present (the iSCSI disks are gone)

[root@xifenfei ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        2355    18916506   83  Linux
/dev/sda2            2356        2610     2048287+  82  Linux swap / Solaris

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2610    20964793+  83  Linux

Disk /dev/sdc: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         100      803218+  83  Linux
/dev/sdd2             101        1000     7229250   83  Linux

iSCSI command summary

Adding iSCSI storage
(1) Discover the iSCSI target: iscsiadm -m discovery -t st -p ISCSI_IP
(2) List the discovered node records: iscsiadm -m node
(3) Log in to the iSCSI target: iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l
(4) Log in automatically at boot: iscsiadm -m node -T LUN_NAME -p ISCSI_IP --op update -n node.startup -v automatic

Removing iSCSI storage
(1) Log out of one iSCSI target: iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u
(2) Log out of all iSCSI sessions: iscsiadm -m node --logoutall=all
(3) Delete the discovered node record: iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP
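
The add and remove steps above can be collected into a small dry-run helper. This is a minimal sketch, not part of the original post: the function names iscsi_attach_cmds and iscsi_detach_cmds are hypothetical, and they only print the iscsiadm commands (with the same flags as the summary) so that they can be reviewed before anything is run.

```shell
# Hypothetical helpers: print, but do not run, the iscsiadm commands from the summary.
# $1 = portal (ISCSI_IP[:port]), $2 = target name (LUN_NAME in the summary)
iscsi_attach_cmds() {
    echo "iscsiadm -m discovery -t st -p $1"
    echo "iscsiadm -m node -T $2 -p $1 -l"
    echo "iscsiadm -m node -T $2 -p $1 --op update -n node.startup -v automatic"
}

iscsi_detach_cmds() {
    echo "iscsiadm -m node -T $2 -p $1 -u"
    echo "iscsiadm -m node -o delete -T $2 -p $1"
}

# Example with the portal and target used earlier in this post
iscsi_attach_cmds 192.168.1.254:3260 iqn.2006-01.com.openfiler:tsn.32b32087937b
iscsi_detach_cmds 192.168.1.254:3260 iqn.2006-01.com.openfiler:tsn.32b32087937b
```

Once the printed commands look right, the output can be piped to sh on a host that has iscsi-initiator-utils installed.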

Logging in to a node that requires authentication
(1) Enable CHAP authentication
iscsiadm -m node -T LUN_NAME -o update --name node.session.auth.authmethod --value=CHAP
* -o is equivalent to --op
(2) Set the user name
iscsiadm -m node -T LUN_NAME --op update --name node.session.auth.username --value=[username]
(3) Set the password
iscsiadm -m node -T LUN_NAME --op update --name node.session.auth.password --value=[password]
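
For a CHAP-protected target the order matters: the three auth settings have to be stored in the node record before the login is attempted. The sketch below only prints the commands in that order. The function name iscsi_chap_cmds is hypothetical, the -p argument is added here by analogy with the other commands in this post, and the user name and secret in the example call are placeholders.

```shell
# Hypothetical helper: print the commands for a CHAP-protected target, in order.
# $1 = portal, $2 = target name, $3 = CHAP user name, $4 = CHAP password
iscsi_chap_cmds() {
    echo "iscsiadm -m node -T $2 -p $1 --op update --name node.session.auth.authmethod --value=CHAP"
    echo "iscsiadm -m node -T $2 -p $1 --op update --name node.session.auth.username --value=$3"
    echo "iscsiadm -m node -T $2 -p $1 --op update --name node.session.auth.password --value=$4"
    # Log in only after the three settings above are stored in the node record
    echo "iscsiadm -m node -T $2 -p $1 -l"
}

# Placeholder credentials, for illustration only
iscsi_chap_cmds 192.168.1.254:3260 iqn.2006-01.com.openfiler:tsn.32b32087937b chapuser chapsecret
```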

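gv$ views do not return information for all nodes

Posted on 2012-06-28 by 惜分飞

Today I ran into something odd: on an Oracle 9i RAC on AIX 5.3, the gv$ views returned only the local instance's rows instead of rows from all nodes. All nodes were otherwise running normally, and nothing abnormal was found apart from the gv$ views.
Node 1 while the problem was present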

SQL> show parameter clu;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string      192.168.6.24

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;

   INST_ID HOST_NAME  STATUS
---------- ---------- ------------
         1 zwq_crm1   OPEN

SQL> show parameter par;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
log_parallelism                      integer     1
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_execution_message_size      integer     2152
parallel_instance_group              string
parallel_max_servers                 integer     5
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_server                      boolean     TRUE
parallel_server_instances            integer     2
parallel_threads_per_cpu             integer     2
partition_view_enabled               boolean     FALSE
recovery_parallelism                 integer     0

SQL> !ps -ef|grep p0
  oracrm 1929258       1   0 09:49:19      -  0:01 ora_p005_crm1 
  oracrm  745844       1   0   Jun 26      -  0:20 ora_p004_crm1 
  oraeye 2421272 3948648   0 21:53:49  pts/0  0:00 grep p0 
  oracrm 3060406       1   0   Jun 26      -  0:20 ora_p002_crm1 
  oracrm 3170868       1   0   Jun 20      -  2:13 ora_p000_crm1 
  oracrm  787414       1   0   Jun 26      -  0:20 ora_p001_crm1 
  oracrm 2552690       1   0   Jun 26      -  0:20 ora_p003_crm1

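1. Each node allows at most 5 parallel processes (parallel_max_servers=5), yet parallel processes up to p005 are already running (6 in total)
2. The gv$ views return information for only one node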
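
The first observation can be checked mechanically by counting the ora_pNNN_ processes in the ps output and comparing the count with parallel_max_servers. A minimal sketch, assuming the process naming seen above (the function name px_headroom is made up for illustration):

```shell
# Print how many more PX slaves could start: parallel_max_servers minus running slaves.
# $1 = parallel_max_servers; `ps -ef` output is read from stdin.
px_headroom() {
    # grep -c exits non-zero when the count is 0, so guard it with || true
    running=$(grep -c 'ora_p[0-9][0-9][0-9]_' || true)
    echo $(( $1 - running ))
}

# Node 1 listing from above; the "grep p0" line is not counted
px_headroom 5 <<'EOF'
  oracrm 1929258       1   0 09:49:19      -  0:01 ora_p005_crm1
  oracrm  745844       1   0   Jun 26      -  0:20 ora_p004_crm1
  oraeye 2421272 3948648   0 21:53:49  pts/0  0:00 grep p0
  oracrm 3060406       1   0   Jun 26      -  0:20 ora_p002_crm1
  oracrm 3170868       1   0   Jun 20      -  2:13 ora_p000_crm1
  oracrm  787414       1   0   Jun 26      -  0:20 ora_p001_crm1
  oracrm 2552690       1   0   Jun 26      -  0:20 ora_p003_crm1
EOF
# prints -1: six slaves are running against a limit of five
```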

Node 2 while the problem was present
The symptoms are the same as on node 1

SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;

   INST_ID HOST_NAME  STATUS
---------- ---------- ------------
         2 zwq_crm2   OPEN

SQL> show parameter par;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
log_parallelism                      integer     1
parallel_adaptive_multi_user         boolean     FALSE
parallel_automatic_tuning            boolean     FALSE
parallel_execution_message_size      integer     2152
parallel_instance_group              string
parallel_max_servers                 integer     5
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_server                      boolean     TRUE
parallel_server_instances            integer     2
parallel_threads_per_cpu             integer     2
partition_view_enabled               boolean     FALSE
recovery_parallelism                 integer     0

SQL> !ps -ef|grep p0
  oracrm 1867938       1   0 15:17:25      -  0:00 ora_p004_crm2 
  oracrm 2633748       1   0 09:49:19      -  0:01 ora_p005_crm2 
  oraeye 3059876 1007714   0 21:54:01  pts/0  0:00 grep p0 
  oracrm  323884       1 120   Jun 20      - 10692:47 ora_p000_crm2 
  oracrm 1839818       1   0 06:16:32      -  0:00 ora_p003_crm2 
  oracrm  459660       1 107   Jun 26      - 1857:00 ora_p001_crm2 
  oracrm 2351894       1   0 16:52:52      -  0:00 ora_p002_crm2

Run a 10046 trace on the affected node 2

SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 4
Statement processed.
SQL> select * from gv$version;

   INST_ID BANNER
---------- ----------------------------------------------------------------
         2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
         2 PL/SQL Release 9.2.0.8.0 - Production
         2 CORE 9.2.0.8.0       Production
         2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
         2 NLSRTL Version 9.2.0.8.0 - Production

SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> oradebug TRACEFILE_NAME
/oracle9/app/admin/crm/udump/crm2_ora_517066.trc

Analysing the 10046 trace shows

PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41759005850609 hv=775381991 ad='21195808'
select * from gv$version
END OF STMT
PARSE #1:c=20000,e=33894,p=1,cr=59,cu=3,mis=1,r=0,dep=0,og=4,tim=41759005850607
BINDS #1:
kxfpg1srv
        could not start P006, inst 1
kxfpg1srv
        could not start local P006
EXEC #1:c=0,e=3540,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41759005854254
FETCH #1:c=0,e=28,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41759005854347
FETCH #1:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=4,tim=41759005854678
STAT #1 id=1 cnt=5 pid=0 pos=1 obj=0 op='VIEW  '
STAT #1 id=2 cnt=5 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '

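The 10046 trace shows what happens: when a gv$ view (gv$version in this trace) is queried on node 2, Oracle has to use parallel execution processes to reach node 1. Because parallel_max_servers on node 1 is 5 and the running parallel processes had already reached that limit, starting a parallel process on node 1 failed (could not start P006, inst 1), so the gv$ view returned only the local node's data.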

Fixing the gv$ view problem

--After restarting both nodes, the query works again
SQL> col host_name for a10
SQL> select inst_id,HOST_NAME,STATUS from gv$instance;

   INST_ID HOST_NAME  STATUS
---------- ---------- ------------
         1 zwq_crm1   OPEN
         2 zwq_crm2   OPEN

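This problem is most likely an Oracle bug (it is a 9i release, so I could not dig deeper) or the result of hung parallel processes. At the time I considered killing the database's parallel processes, but the customer did not agree and chose the safer restart, so I could not test that guess.

10046 trace of a gv$ query when it works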

SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select * from gv$version;

   INST_ID BANNER
---------- ----------------------------------------------------------------
         1 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
         1 PL/SQL Release 9.2.0.8.0 - Production
         1 CORE 9.2.0.8.0       Production
         1 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
         1 NLSRTL Version 9.2.0.8.0 - Production
         2 Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
         2 PL/SQL Release 9.2.0.8.0 - Production
         2 CORE 9.2.0.8.0       Production
         2 TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
         2 NLSRTL Version 9.2.0.8.0 - Production

10 rows selected.

SQL> oradebug EVENT 10046 trace name context off
Statement processed.
SQL> oradebug TRACEFILE_NAME
/oracle9/app/admin/crm/udump/crm1_ora_1708916.trc

Analysing the 10046 trace

PARSING IN CURSOR #1 len=24 dep=0 uid=0 oct=3 lid=0 tim=41752681925071 hv=775381991 ad='92ef64f0'
select * from gv$version
END OF STMT
PARSE #1:c=0,e=6770,p=0,cr=12,cu=3,mis=1,r=0,dep=0,og=4,tim=41752681925070
BINDS #1:
WAIT #1: nam='PX Deq: reap credit' ela= 21 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 237 p1=268500992 p2=1 p3=504403208016510312
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 64 p1=268500992 p2=2 p3=504403208016510312
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 46 p1=268566528 p2=1 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Join ACK' ela= 855 p1=268566528 p2=2 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq Credit: send blkd' ela= 188 p1=268566528 p2=1 p3=504403208016502096
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 406 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 7442 p1=200 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 4 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Parse Reply' ela= 1664 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
EXEC #1:c=0,e=11572,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=41752681936727
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Execute Reply' ela= 166 p1=200 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Execute Reply' ela= 310 p1=200 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
FETCH #1:c=0,e=557,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=4,tim=41752681937320
WAIT #1: nam='SQL*Net message from client' ela= 223 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1650815232 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 18 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=5
WAIT #1: nam='DFS lock handle' ela= 254 p1=1128857605 p2=9 p3=1
WAIT #1: nam='DFS lock handle' ela= 263 p1=1128857605 p2=9 p3=3
WAIT #1: nam='DFS lock handle' ela= 73 p1=1128857605 p2=9 p3=2
WAIT #1: nam='DFS lock handle' ela= 363 p1=1128857605 p2=9 p3=2
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 24 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Signal ACK' ela= 3 p1=0 p2=1 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: Signal ACK' ela= 155 p1=10 p2=2 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
FETCH #1:c=0,e=1735,p=0,cr=0,cu=0,mis=0,r=9,dep=0,og=4,tim=41752681939329
WAIT #1: nam='DFS lock handle' ela= 249 p1=1128857605 p2=9 p3=1
WAIT #1: nam='DFS lock handle' ela= 258 p1=1128857605 p2=9 p3=3
WAIT #1: nam='DFS lock handle' ela= 66 p1=1128857605 p2=9 p3=2
WAIT #1: nam='DFS lock handle' ela= 369 p1=1128857605 p2=9 p3=2
WAIT #1: nam='PX Deq: reap credit' ela= 3 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 19 p1=0 p2=0 p3=0
WAIT #1: nam='PX Deq: reap credit' ela= 1 p1=0 p2=0 p3=0
WAIT #1: nam='enqueue' ela= 201 p1=1347616774 p2=2 p3=0
WAIT #1: nam='enqueue' ela= 41 p1=1347616774 p2=2 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 6981860 p1=1650815232 p2=1 p3=0
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='VIEW  '
STAT #1 id=2 cnt=0 pid=1 pos=1 obj=304 op='FIXED TABLE FULL X$VERSION '

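This shows that on a healthy RAC, gv$ views are queried in parallel by default

A sad case: session count surges on a system without HugePages

Posted on 2012-06-27 by 惜分飞

Today a friend reported that one of their databases had hung and that they could not even log in to the server over ssh. With no other option, they rebooted the server to recover, and now wanted my help finding the cause.
Findings from the AWR report

My friend told me the database normally holds about 200 sessions; this time the count suddenly shot up to more than 750, which is abnormal.

Analysing the listener log

During the period examined, connections were frequent overall and one IP connected abnormally often. From this, my first suspicion was that memory on the database server had been exhausted, which hung the system and the database.

Checking the system log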

Jun 26 14:35:55 result01 kernel: [5613531.566617] Free swap  = 0kB
Jun 26 14:35:55 result01 kernel: [5613531.566618] Total swap = 2104504kB
Jun 26 14:35:55 result01 kernel: [5613531.566620] Free swap:            0kB
Jun 26 14:35:55 result01 kernel: [5613531.591073] 2359296 pages of RAM
Jun 26 14:35:55 result01 kernel: [5613531.591074] 318236 reserved pages
Jun 26 14:35:55 result01 kernel: [5613531.591075] 73353 pages shared
Jun 26 14:35:56 result01 kernel: [5613531.591076] 529 pages swap cached
Jun 26 14:35:56 result01 kernel: [5613531.591079] Out of Memory: Kill process 8904 (oracle) score 891 and children.
Jun 26 14:35:56 result01 kernel: [5613531.591201] Out of memory: Killed process 8904 (oracle).
Jun 26 14:35:56 result01 kernel: [5613531.592280] oracle invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

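The log shows that both physical memory and swap were exhausted; short of memory, the kernel started killing some oracle processes. This confirms that memory exhaustion caused the hang.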
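
When a syslog file is long, the OOM kills can be pulled out with a one-line filter. This is a sketch that assumes the message wording seen in the log above ("Out of memory: Killed process PID (name)"); other kernel versions word the message differently, and the function name oom_victims is made up for illustration.

```shell
# Read kernel log lines on stdin and print "<pid> <command>" for each OOM kill.
oom_victims() {
    sed -n 's/.*Out of memory: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Two lines from the log above; only the second one reports a kill
oom_victims <<'EOF'
Jun 26 14:35:55 result01 kernel: [5613531.566617] Free swap  = 0kB
Jun 26 14:35:56 result01 kernel: [5613531.591201] Out of memory: Killed process 8904 (oracle).
EOF
# prints: 8904 oracle
```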

Why the system hung
Why did memory run out when the session count unexpectedly rose from 200 to 750?

cat /proc/meminfo
MemTotal:      8164240 kB
SwapTotal:     2104504 kB
PageTables:      69732 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

sga_target=3674210304
pga_aggregate_target=1732247552

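This gives us several facts:
1. The server has 8 GB of memory and 2 GB of swap
2. The database is not using HugePages
3. The database's SGA and PGA settings

Estimating memory use
Total memory used by the database: (3674210304+1732247552)/1024/1024=5156M (the PGA may not be fully used, or may exceed its target)
Next, combine the actual sga_target=3674210304 with the session count.
A conservative estimate of the system memory used by the Oracle processes to map the SGA with 4 KB pages: 3674210304/(4*1024)*1.5*750/1024/1024=960M
The same estimate with HugePages (2 MB pages): 3674210304/(2*1024*1024)*1.5*750/1024/1024=1.9M
So Oracle's total memory use comes to: 5156+960=6116M
By this conservative calculation, about 1.8G is left for the system
Other activity on the system eventually used up what remained, and the system and database hung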
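
The arithmetic above can be reproduced with a short helper. This is only a sketch of the post's own formula: the 1.5 factor is the author's rule of thumb rather than a measured value, and the function name estimate_mb is made up for illustration.

```shell
# Estimate from the post: SGA / page size * 1.5 * sessions, reported in MB.
# $1 = SGA size in bytes, $2 = page size in bytes, $3 = session count
estimate_mb() {
    awk -v sga="$1" -v page="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", sga / page * 1.5 * n / 1024 / 1024 }'
}

estimate_mb 3674210304 4096 750      # 4 KB pages -> 962.4
estimate_mb 3674210304 2097152 750   # 2 MB pages -> 1.9
```

The two results match the rounded 960M and 1.9M figures used in the estimate.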

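Summary
This is a real example on Linux where, because HugePages was not configured, a sudden surge in sessions exhausted system memory and hung both the system and the database.
Because this database's SGA is not very large, the system memory Oracle consumed was not outrageous; with a 32g SGA and 1000 sessions it would take about 12g of system memory.
This shows the benefits of configuring HugePages on Linux: it not only lightens the CPU's load but also reduces system memory consumption. Barring special circumstances, HugePages is recommended for database systems on Linux.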
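
As a starting point for sizing, the number of huge pages needed to hold the SGA is the SGA size divided by Hugepagesize, rounded up. The sketch below does only that division. The function name hugepages_needed is hypothetical, and the result is a lower bound for the vm.nr_hugepages setting, not a complete configuration.

```shell
# Smallest number of huge pages that can hold the SGA.
# $1 = SGA size in bytes, $2 = Hugepagesize in kB (as shown in /proc/meminfo)
hugepages_needed() {
    page_bytes=$(( $2 * 1024 ))
    echo $(( ($1 + page_bytes - 1) / page_bytes ))   # round up to whole pages
}

hugepages_needed 3674210304 2048   # sga_target and Hugepagesize from this case -> 1752
```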

To find the TX Enqueue contention in a RAC or OPS environment

Posted on 2012-06-27 by 惜分飞

I came across this article today while looking up TX enqueues and am sharing it here.

PURPOSE 
------------- 
To find the TX Enqueue contention in a RAC or OPS environment 

What is TX Enqueue ? 
In short, Oracle maintains a queue for transactions. 

How Many Resources ? 
1/ active transaction 

How Many Locks? 
1/transaction + 1/process waiting for a locked row by that 
transaction. 

How Many Users? 
1 + 1/ process waiting for something locked by this transaction. 

Who Uses? 
All processes 

What need to investigate? 
The mode of TX (6/4), Holding/Waiting/Requesting 

SCOPE & APPLICATION 
===================== 

This document will help to analyze the application design related to transaction bottlenecks 
and database performance tuning. 

Let's start with an example: 
=================== 
create table akdas (A1 number, Col1 Varchar2(10), Col2 Varchar2(10)); 
insert into akdas values(5,'Hello','Hi'); 
insert into akdas values(6,'Sudip','Datta'); 
insert into akdas values(7,'Preetam','Roy'); 
insert into akdas values(8,'Michael','Polaski'); 

From Node 1: 
========== 
update akdas set a1=11 where a1=6; 

From Node 2: 
========== 
update akdas set a1=12 where a1=7; 
update akdas set a1=11 where a1=6;  /* this will wait for Node1: to complete the transaction */ 

This note analyzes only TX mode 6 (Exclusive). 


1. Now run the following query to track down the problem: Who is waiting 
=================================================================== 
prompt 
prompt Query 1. Waiting for TX Enqueue where mode is Exclusive 
prompt ===================================== 
prompt 
set linesize 100 
set pagesize 66 
col c1 for a15 
col c1 heading "Program Name " 
select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST 
from gv$lock l,gv$session s 
where l.type like 'TX' and l.REQUEST =6 
and l.inst_id=s.inst_id and l.sid=s.sid 
order by id1 
/ 

Output will be here 
=============== 
   INST_ID        SID Program Name    TY        ID1        ID2      LMODE    REQUEST
---------- ---------- --------------- -- ---------- ---------- ---------- ----------
         2         13 sqlplus@opcbsol TX     393236        780          0          6
                      2 (TNS V1-V3)

It is clear that SID 13 of instance 2 is doing a DML and waiting on REQUEST Mode 6. 


2. Let's run the next query to find who is holding 
=========================================== 

prompt 
prompt 
prompt Query 2. Holding for TX Enqueue where mode is Exclusive (6) 
prompt ======================================= 
prompt 
set linesize 100 
set pagesize 66 
col c1 for a15 
col c1 heading "Program Name " 
select l.inst_id,l.SID,program c1,l.TYPE,l.ID1,l.ID2,l.LMODE,l.REQUEST 
from gv$lock l,gv$session s 
where l.type like 'TX' and l.LMODE =6 and (l.ID1,l.ID2) in 
(select id1,id2 from gv$lock where type like 'TX' and REQUEST =6) 
and l.inst_id=s.inst_id and l.sid=s.sid 
order by id1 
/ 

Output will be here 
=============== 
   INST_ID        SID Program Name    TY        ID1        ID2      LMODE    REQUEST
---------- ---------- --------------- -- ---------- ---------- ---------- ----------
         1         12 sqlplus@opcbsol TX     393236        780          6          0
                      1 (TNS V1-V3)

So the holder is SID 12 on instance 1, where LMODE = 6. 


3. Let's find out the exact file#, block# and Record# where it is waiting 
=============================================================== 

prompt 
prompt 
prompt Query 3. Object# ,File#, Block# and Slot# TX Enqueue in detail 
prompt ======================================== 
prompt 
set linesize 110 
col c0 for 999 
col c0 heading "INS" 
col c1 for a15 
col c1 heading "Program Name " 
select inst_id c0,sid,program c1,ROW_WAIT_OBJ# object_no, ROW_WAIT_FILE# Rfile_no,
ROW_WAIT_BLOCK# Block_no ,ROW_WAIT_ROW# Row_no 
from gv$session 
where (inst_id,sid) in (select inst_id,sid from gv$session_wait where p1='1415053318') 
/ 

Output Will be here 
=============== 
 INS        SID Program Name     OBJECT_NO   RFILE_NO   BLOCK_NO     ROW_NO
---- ---------- --------------- ---------- ---------- ---------- ----------
   2         13 sqlplus@opcbsol       7261          9      12346          1
                2 (TNS V1-V3)
  

From the output, it is clear that it is waiting on Relative_File# 9, Block# 12346, Row Number 1. 
Here Row Number 1 means the slot number in the block 12346. This Row_No start from 0 (zero). 
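
The literal 1415053318 in the gv$session_wait filter is not arbitrary. For enqueue waits, p1 packs the two-character lock name into its upper two bytes and the requested mode into its lower two bytes, so this value is TX in mode 6 (hex 54580006). A small sketch for decoding such values follows; the function name decode_p1 is made up for illustration and is not part of the note.

```shell
# Decode an enqueue wait's p1 value into "<lock type> <mode>".
decode_p1() {
    c1=$(( ($1 >> 24) & 255 ))   # first character of the lock type
    c2=$(( ($1 >> 16) & 255 ))   # second character of the lock type
    mode=$(( $1 & 65535 ))       # requested lock mode
    # printf understands octal escapes, so convert each byte to 3-digit octal first
    printf "\\$(printf '%03o' "$c1")\\$(printf '%03o' "$c2") %d\n" "$mode"
}

decode_p1 1415053318   # the value used in queries 3 and 4 -> TX 6
```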


4. Let's Find the object details 
============================= 

prompt 
prompt 
prompt Query 4. Object Involve for TX Enqueue in detail 
prompt =============================== 
prompt 
set linesize 100 
set pagesize 100 
col owner for a10 
col object_name for a20 
col object_type for a10 
select owner,object_name,object_id,object_type 
from dba_objects 
where 
object_id in (select ROW_WAIT_OBJ# from gv$session 
where (inst_id, sid) in (select inst_id,sid from gv$session_wait where p1='1415053318')) 
/ 

Output Will be here 
=============== 
OWNER      OBJECT_NAME           OBJECT_ID OBJECT_TYP
---------- -------------------- ---------- ----------
AKDAS      AKDAS                      7261 TABLE


5. Let’s find the row value details 
============================= 

prompt 
prompt 
prompt Query 5. Finding the row value 
prompt ==================== 
prompt 
select * from <Owner>.<Table Name>  where rowid like 
DBMS_ROWID.ROWID_CREATE(1,&Object_No,&Rfile_No, &Block_No, &Row_Number) 
/ 
From query 3 and 4  we will get the value for all variables. 
Owner = AKDAS 
Table_Name = AKDAS 
Object_No = 7261 
Rfile_No =  9 
Block_No = 12346 
Row_Number = 1 

Output Will be here 
=============== 
        A1 Col1       Col2
---------- ---------- ----------
         6 Sudip      Datta

So we can drill down to the row value where TX Enqueue contention exists. 


6. Let’s find the user activity that is "Holder" and "Waiter" 
==================================================== 

set linesize 120 
set pagesize 66 
col c0 for 999 
col c0 heading "INS" 
col c1 for a9 
col c1 heading "OS User" 
col c2 for a9 
col c2 heading "Oracle User" 
col c3 for a15 
col c3 heading "Program Name" 
col b1 for a9 
col b1 heading "Unix PID" 
col b2 for 9999 justify left 
col b2 heading "ORA SID" 
col b3 for 999999 justify left 
col b3 heading "SERIAL#" 
col sql_text for a45 
set space 1 
break on b1 nodup on c0 nodup on c3 nodup on c1 nodup on c2 nodup on b2 nodup on b3 skip 2 
select a.inst_id c0,b.sid b2,c.spid b1, b.program c3, b.username c2,b.serial# b3, a.sql_text 
  from gv$sql a, gv$session b, gv$process c 
 where 
   a.address = b.sql_address 
   and b.paddr = c.addr 
   and a.hash_value = b.sql_hash_value 
   and a.inst_id=b.inst_id and a.inst_id=c.inst_id 
   and a.inst_id like '&inst_id' and b.sid like '&sid' 
 order by c.spid,a.hash_value 
/ 

This query prompts for the instance number and SID, which you can get from steps 1 and 2. 
Remember that you can see the waiter's activity but may not see the holder's. 
The reason is that the holder sits idle after its DML operation, so the holder's SQL 
may not appear in gv$sql. 

All of these queries can also be run on a single-instance database, but every GV$ view must be replaced with its V$ 
counterpart, and because V$ views have no INST_ID column, the INST_ID references must be removed.

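Source: How to Find TX Enqueue Contention in RAC or OPS [ID 179582.1]

Recovering from a lost access$ table in Oracle 11g

Posted on 2012-06-26 by 惜分飞

I recently came across two cases in which an 11g database lost its access$ table after an abnormal shutdown and could no longer open normally. Why the table goes missing is still unknown. Here is one way to fix the problem in upgrade mode.
Database version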

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "xifenfei" from dual;

xifenfei
--------------------------------------
2012-06-22 05:28:57

Database startup fails with ORA-00704

SQL> startup
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 1782
Session ID: 125 Serial number: 5

Finding the cause of ORA-00704

SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_2010.trc
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 2010
Session ID: 125 Serial number: 5

The trace file shows

PARSE ERROR #3063868604:len=56 dep=1 uid=0 oct=3 lid=0 tim=1340312320595472 err=942
select order#,columns,types from access$ where d_obj#=:1
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist

*** 2012-06-22 04:58:40.596
USER (ospid: 2010): terminating the instance due to error 704
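
In a large 10046 trace, the statement that failed to parse can be found by printing the line that follows each PARSE ERROR entry. A minimal sketch, assuming the layout seen in the excerpt above, where the SQL text sits on the line after PARSE ERROR (the function name failed_parse_sql is made up for illustration):

```shell
# Read a 10046 trace on stdin and print the SQL text that follows each PARSE ERROR line.
failed_parse_sql() {
    awk '/^PARSE ERROR/ { if ((getline line) > 0) print line }'
}

# Two lines from the trace above
failed_parse_sql <<'EOF'
PARSE ERROR #3063868604:len=56 dep=1 uid=0 oct=3 lid=0 tim=1340312320595472 err=942
select order#,columns,types from access$ where d_obj#=:1
EOF
```

On a real system the function would read the trace file reported by oradebug TRACEFILE_NAME.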

Start the database in upgrade mode

SQL> startup  upgrade
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
Database opened.

Create the access$ table and index

SQL> create table access$
  2  ( d_obj#        number not null,
  3    order#        number not null,
  4    columns       raw(126),
  5    types         number not null)
  6    storage (initial 10k next 100k maxextents unlimited pctincrease 0)
  7  /

Table created.

SQL> create index i_access1 on
  2    access$(d_obj#, order#)
  3    storage (initial 10k next 100k maxextents unlimited pctincrease 0)
  4  /

Index created.
--The CREATE statements can be found in ?\RDBMS\ADMIN\dcore.bsq

Restart the database

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             448792060 bytes
Database Buffers           67108864 bytes
Redo Buffers                5861376 bytes
Database mounted.
Database opened.

Purpose of the access$ table (thanks to vmcd for this)
When a database object is first referenced in a PL/SQL program, the PL/SQL engine checks the ACCESS$ table (owned by SYS) to see if the executor of the program has authority on that database object.
Whether losing the rows previously stored in access$ seriously affects the system is still unknown; if you know, please tell me.