网关不通致使vip/lsnr资源异常

crs_stat显示节点1的listener和vip时断时续(一会online,一会offline)

rac1-> crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.devdb.db   application    ONLINE    ONLINE    rac1        
ora....b1.inst application    ONLINE    ONLINE    rac1        
ora....b2.inst application    ONLINE    ONLINE    rac2        
ora....SM1.asm application    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    OFFLINE               
ora.rac1.gsd   application    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    ONLINE    ONLINE    rac2        
ora....SM2.asm application    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    ONLINE    OFFLINE               
ora.rac2.gsd   application    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    ONLINE    ONLINE    rac1        
rac1-> crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.devdb.db   application    ONLINE    ONLINE    rac1        
ora....b1.inst application    ONLINE    ONLINE    rac1        
ora....b2.inst application    ONLINE    ONLINE    rac2        
ora....SM1.asm application    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    OFFLINE               
ora.rac1.gsd   application    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    ONLINE    ONLINE    rac2        
ora....SM2.asm application    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    ONLINE    ONLINE    rac2

查看crsd.log日志

0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` failed.
0startRunnable: setting CLI values
0Attempting to start `ora.rac1.vip` on member `rac1`
0Start of `ora.rac1.vip` on member `rac1` succeeded.
0startRunnable: setting CLI values
0Attempting to start `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
0Start of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
u_freem: mem passed is null
0CheckResource error for ora.rac1.vip error code = 1
0In stateChanged, ora.rac1.vip target is ONLINE
0ora.rac1.vip on rac1 went OFFLINE unexpectedly
0StopResource: setting CLI values
0Attempting to stop `ora.rac1.vip` on member `rac1`
0Stop of `ora.rac1.vip` on member `rac1` succeeded.
0ora.rac1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
0ora.rac1.vip failed on rac1 relocating.
0StopResource: setting CLI values
0Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
0Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` failed.
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` succeeded.
0CRS-1002: Resource 'ora.rac1.vip' is already running on member 'rac2'

这里可以看出由于vip资源失败,致使lsnr资源也出现失败,紧接着又是启动vip,再启动lsnr。所以使得我们通过crs_stat -t观察资源情况时,看到这两个进程一直处于波动状态

分析ora.rac1.vip.log日志

[ora.rac1.vip]: clsrcexecut:env ORACLE_CONFIG_HOME=/u01/app/oracle/product/10.2.0/crs_1
[ora.rac1.vip]: clsrcexecut:cmd=/u01/app/oracle/product/10.2.0/crs_1/bin/racgeut -e 
_USR_ORA_DEBUG=0 54 /u01/app/oracle/product/10.2.0/crs_1/bin/racgvip check rac1
[ora.rac1.vip]: clsrcexecut: rc = 1, time = 6.430s
[ora.rac1.vip]: end for resource = ora.rac1.vip, action=check,status=1,time=6.450s
[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
[ora.rac1.vip]: clsrcstartorp: Error with malloc
[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
Interface eth0 checked failed (host=rac1)
Invalid parameters, or failed to bring up VIP (host=rac1)

通过这里发现:从eth0网卡ping192.168.1.1(网关)不通,导致VIP资源不能正常工作

核实问题原因/解决
我们人工从节点1上ping 网关(192.168.1.1),果真不通.继续检查发现,网关服务器上意外的开启了防火墙,对部分进来的包进行了过滤,恰好节点1在被禁止之列,使得节点1 ping 网关不成功,从而出现该了该错误.关闭防火墙或者重新设置规则后,rac工作正常,未出现vip和lsnr资源出现波动情况.

OCR/Vote disk 维护操作

数据库版本

SQL>  select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

ocr测试(可以online处理)

rac2-> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     160396
         Used space (kbytes)      :       4376
         Available space (kbytes) :     156020
         ID                       : 1302494786
         Device/File Name         : /dev/raw/raw11
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

rac2-> more /etc/oracle/ocr.loc 
ocrconfig_loc=/dev/raw/raw11
local_only=false

--增加ocr镜像
[root@rac2 bin]# ./ocrconfig -replace ocrmirror /dev/raw/raw12

rac2-> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     160396
         Used space (kbytes)      :       4376
         Available space (kbytes) :     156020
         ID                       : 1302494786
         Device/File Name         : /dev/raw/raw11
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw12
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

rac2-> more /etc/oracle/ocr.loc 
#Device/file  getting replaced by device /dev/raw/raw12 
ocrconfig_loc=/dev/raw/raw11
ocrmirrorconfig_loc=/dev/raw/raw12
local_only=false

--删除ocr
[root@rac2 bin]# ./ocrconfig -replace ocr

rac2-> more /etc/oracle/ocr.loc 
#Device/file /dev/raw/raw11 being deleted 
ocrconfig_loc=/dev/raw/raw12
local_only=false

rac2-> ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     160396
         Used space (kbytes)      :       4376
         Available space (kbytes) :     156020
         ID                       : 1302494786
         Device/File Name         : /dev/raw/raw12
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

--补充删除ocr镜像
[root@rac2 bin]# ./ocrconfig -replace ocrmirror

Vote disk测试(10g offline/11g online)

--关闭crs
[root@rac2 bin]# ./crsctl stop crs
[root@rac1 bin]# ./crsctl stop crs

--查询vote disk
rac2-> crsctl query css votedisk
 0.     0    /dev/raw/raw31

--增加vote disk
[root@rac2 bin]# ./crsctl add css votedisk /dev/raw/raw23 -force
Now formatting voting disk: /dev/raw/raw23
successful addition of votedisk /dev/raw/raw23.
[root@rac2 bin]# ./crsctl add css votedisk /dev/raw/raw33 -force
Now formatting voting disk: /dev/raw/raw33
successful addition of votedisk /dev/raw/raw33.
[root@rac2 bin]# ./crsctl add css votedisk /dev/raw/raw32 -force
Now formatting voting disk: /dev/raw/raw32
successful addition of votedisk /dev/raw/raw32.

rac2-> crsctl query css votedisk
 0.     0    /dev/raw/raw31
 1.     0    /dev/raw/raw23
 2.     0    /dev/raw/raw33
 3.     0    /dev/raw/raw32

located 4 votedisk(s).

--删除vote disk
[root@rac2 bin]# ./crsctl delete css votedisk /dev/raw/raw33 -force
successful deletion of votedisk /dev/raw/raw33.

--启动crs
[root@rac2 bin]# ./crsctl start crs
[root@rac1 bin]# ./crsctl start crs

补充官方操作说明[ID 428681.1]

RAC 10g升级到10.2.0.5

1.Back Up database
一般情况下rman备份

2.备份ocr和vote disk

[root@rac2 bin]# ./ocrconfig -export /tmp/ocr_export.bak
[root@rac2 bin]# more /etc/oracle/ocr.loc 
ocrconfig_loc=/dev/raw/raw11
local_only=FALSE
[root@rac2 bin]# dd if=/dev/raw/raw11 of=/tmp/ocr_dd.bak
[root@rac2 bin]# dd if=/dev/raw/raw31 of=/tmp/vote_dd.bak

3.Update Oracle Time Zone Definitions
Actions for the DSTv4 update in the 10.2.0.5 patchset [ID 1086400.1]

4.Stopping All Processes
滚动升级关闭一个节点所有进程,非滚动升级关闭所有进程

$ isqlplusctl stop
$ emctl stop dbconsole
$ srvctl stop service -d db_name [-s service_name_list [-i inst_name]]
$ srvctl stop instance -d db_name -i inst_name
$ srvctl stop asm -n node
$ srvctl stop listener -n node [-l listenername]
$ srvctl stop nodeapps -n node
# CRS_home/bin/crsctl stop crs(root执行,滚动升级不需要关闭)

5.Back Up the System
$ORACLE_BASE中文件,主要包括(db和crs安装文件/oraInventory文件)

6.升级crs软件
执行./runInstaller选择crs目录

执行下面命令
# CRS_home/bin/crsctl stop crs
# CRS_home/install/root102.sh

7.升级db软件
关闭crs和db所有进程(步骤同4)
执行./runInstaller选择db目录

执行下面命令
# ORACLE_HOME/root.sh

8.升级数据库
8.1)检查数据库升级需要满足条件,对存在不合适之处,进行修正
How to Download and Run Oracle’s Database Pre-Upgrade Utility [ID 884522.1]

SQL> STARTUP UPGRADE
SQL> SPOOL upgrade_info.log 
SQL> @/rdbms/admin/utlu102i.sql
SQL> SPOOL OFF
SQL> ALTER SYSTEM SET CLUSTER_DATABASE=FALSE SCOPE=spfile;
--其他根据upgrade_info.log中提示修改
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP UPGRADE

8.2)启动监听
srvctl start listener -n node

8.3)升级数据库

SQL> SPOOL patch.log
SQL> @?/rdbms/admin/catupgrd.sql
--检查patch.log,发现有错误查找原因,重新执行catupgrd.sql脚本
SQL> SPOOL OFF
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP
SQL> @?/rdbms/admin/utlrp.sql
SQL> ALTER SYSTEM SET CLUSTER_DATABASE=TRUE SCOPE=spfile;
--包括其他修改调整参数
SQL> SHUTDOWN IMMEDIATE
--使用rac管理相关命令,启动需要启动资源

9.修改相关目录权限
# ORACLE_HOME/install/changePerm.sh

具体操作步骤请阅读README.html

永久表空间出现临时段不能扩展原因探讨

数据库版本

SQL> select * from v$version;

BANNER
-----------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

创建5M测试表空间

SQL> CREATE TABLESPACE T_1652 DATAFILE '/tmp/t_1652_01.dbf' size 5M 
  2  AUTOEXTEND OFF LOGGING PERMANENT EXTENT MANAGEMENT LOCAL AUTOALLOCATE 
  3  SEGMENT SPACE MANAGEMENT AUTO blocksize 8192;

Tablespace created.

测试CTAS

SQL> create table CHF.T_XIFENFEI TABLESPACE T_1652 as
  2  select LPAD('XIFENFEI',1024,'F') "C_XFF" from  dual connect by level <=3500;
create table CHF.T_XIFENFEI TABLESPACE T_1652 as
*
ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace T_1652


SQL> create table CHF.T_XIFENFEI TABLESPACE T_1652 as
  2  select LPAD('XIFENFEI',1024,'F') "C_XFF" from  dual connect by level <=3000;

Table created.

测试CREATE INDEX

SQL> create index chf.i_xifenfei on chf.t_xifenfei(c_xff)
  2  tablespace t_1652;
create index chf.i_xifenfei on chf.t_xifenfei(c_xff)
                                   *
ERROR at line 1:
ORA-01658: unable to create INITIAL extent for segment in tablespace T_1652


SQL> Select MAX(d.bytes) total_bytes,
  2         nvl(SUM(f.Bytes), 0) free_bytes,
  3         d.file_name,
  4         MAX(d.bytes) - nvl(SUM(f.bytes), 0) used_bytes,
  5  from   DBA_FREE_SPACE f , DBA_DATA_FILES d
  6  where  f.tablespace_name(+) = d.tablespace_name
  7  and    f.file_id(+) = d.file_id
  8  and    d.tablespace_name = 'T_1652'
  9  group by d.file_name;

TOTAL_BYTES FREE_BYTES FILE_NAME                 USED_BYTES  
----------- ---------- ------------------------- ---------- 
    5242880          0 /tmp/t_1652_01.dbf           5242880

SQL> drop table chf.t_xifenfei purge;

Table dropped.

SQL> create table CHF.T_XIFENFEI TABLESPACE T_1652 as
  2  select LPAD('XIFENFEI',1024,'F') "C_XFF" from  dual connect by level <=2000;

Table created.

SQL> Select MAX(d.bytes) total_bytes,
  2         nvl(SUM(f.Bytes), 0) free_bytes,
  3         d.file_name,
  4         MAX(d.bytes) - nvl(SUM(f.bytes), 0) used_bytes,
  5  from   DBA_FREE_SPACE f , DBA_DATA_FILES d
  6  where  f.tablespace_name(+) = d.tablespace_name
  7  and    f.file_id(+) = d.file_id
  8  and    d.tablespace_name = 'T_1652'
  9  group by d.file_name;


TOTAL_BYTES FREE_BYTES FILE_NAME                 USED_BYTES   
----------- ---------- ------------------------- ---------- 
    5242880    1048576 /tmp/t_1652_01.dbf           4194304  

SQL> create index chf.i_xifenfei on chf.t_xifenfei(c_xff)
  2  tablespace t_1652;
create index chf.i_xifenfei on chf.t_xifenfei(c_xff)
                                   *
ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace T_1652

SQL> ALTER DATABASE DATAFILE '/tmp/t_1652_01.dbf' RESIZE 10M;

Database altered.

SQL> create index chf.i_xifenfei on chf.t_xifenfei(c_xff)
  2  tablespace t_1652;

Index created.

测试MOVE

SQL> drop table chf.t_xifenfei purge;

Table dropped.

SQL> create table CHF.T_XIFENFEI TABLESPACE T_1652 as
  2  select LPAD('XIFENFEI',1024,'F') "C_XFF" from  dual connect by level <=3500;

Table created.

SQL> alter table chf.t_xifenfei move;
alter table chf.t_xifenfei move
*
ERROR at line 1:
ORA-01652: unable to extend temp segment by 128 in tablespace T_1652


SQL> Select MAX(d.bytes) total_bytes,
  2         nvl(SUM(f.Bytes), 0) free_bytes,
  3         d.file_name,
  4         MAX(d.bytes) - nvl(SUM(f.bytes), 0) used_bytes,
  5  from   DBA_FREE_SPACE f , DBA_DATA_FILES d
  6  where  f.tablespace_name(+) = d.tablespace_name
  7  and    f.file_id(+) = d.file_id
  8  and    d.tablespace_name = 'T_1652'
  9  group by d.file_name;

TOTAL_BYTES FREE_BYTES FILE_NAME                 USED_BYTES  
----------- ---------- ------------------------- ---------- 
   10485760    4194304 /tmp/t_1652_01.dbf           6291456  

SQL>  ALTER DATABASE DATAFILE '/tmp/t_1652_01.dbf' RESIZE 15M;

Database altered.

SQL> alter table chf.t_xifenfei move;

Table altered.

这里可以发现CTAS,CREATE INDEX,MOVE操作都有个共同点:需要一次性创建一个较大SEGMENT,但是这个SEGMENT的创建过程是在数据库中逐渐实现(非初始化指定大小)。
也就是说,ORACLE对这些对象的处理方法是:对于这样的segment先当作临时段处理,当处理完成后,再把这些在永久表空间中的临时段转换为永久段;所以当这些永久表空间中的临时段在扩展的时候,遇到该永久表空间不足,而该段目前还是临时段(在永久表空间中的临时段),就出现了ORA-01652提示一个永久表空间unable to extend temp segment

表空间online出现ORA-00600[kcbz_check_objd_typ]处理过程

online表空间出现ORA-00600[kcbz_check_objd_typ]

Fri Mar 30 14:09:24 2012
alter tablespace xff offline
Fri Mar 30 14:09:28 2012
Completed: alter tablespace xff offline
Fri Mar 30 17:49:59 2012
alter tablespace xff rename datafile '/oradataa/xifenfei.dbf' to '/oradatab/xifenfei_bak.dbf'
Fri Mar 30 17:50:03 2012
Completed: alter tablespace xff rename datafile '/oradataa/xifenfei.dbf' to '/oradatab/xifenfei_bak.dbf'
Fri Mar 30 17:50:03 2012
alter tablespace coweb_bak_new online
Fri Mar 30 17:53:08 2012
Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_21275.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ], [0], [0], [1], [], [], [], []

分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/ora10/product
System name:	Linux
Node name:	FH-DB-01
Release:	2.6.18-92.el5
Version:	#1 SMP Tue Apr 29 13:16:15 EDT 2008
Machine:	x86_64
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 182
Unix process pid: 21275, image: oracle@FH-DB-01 (TNS V1-V3)

*** 2012-03-30 17:50:16.469
*** ACTION NAME:() 2012-03-30 17:50:15.939
*** MODULE NAME:(sqlplus@FH-DB-01 (TNS V1-V3)) 2012-03-30 17:50:15.939
*** SERVICE NAME:(SYS$USERS) 2012-03-30 17:50:15.939
*** SESSION ID:(921.23041) 2012-03-30 17:50:15.939
*** SESSION ID:(921.23041) 2012-03-30 17:50:15.939
OBJD MISMATCH typ=6, seg.obj=-2, diskobj=294051, dsflg=0, dsobj=294044, tid=294044, cls=1
Input data (nil), 0, 0
Formatted dump of block:
buffer tsn: 10 rdba: 0x0435c094 (1024/70631572)
scn: 0x0b2d.2b4f8874 seq: 0x01 flg: 0x04 tail: 0x88740601
frmt: 0x02 chkval: 0x5626 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x000000006DBB2000 to 0x000000006DBB4000
06DBB2000 0000A206 0435C094 2B4F8874 04010B2D  [......5.t.O+-...]
06DBB2010 00005626 00000002 00047CA3 2B4F8874  [&V.......|..t.O+]
06DBB2020 00000B2D 00320002 0435C091 00000000  [-.....2...5.....]
06DBB2030 00000000 00000000 00000000 00000000  [................]
        Repeat 2 times
06DBB2060 00000000 02800000 00000000 00240000  [..............$.]
06DBB2070 1F3C1F60 00000000 00000000 00000000  [`.<.............]
06DBB2080 00000000 00001F60 00001F5C 00000000  [....`...\.......]
06DBB2090 00000000 00000000 00000000 00000000  [................]
        Repeat 501 times
06DBB3FF0 00000000 00000000 00000000 88740601  [..............t.]
Block header dump:  0x0435c094
 Object id on Block? Y
 seg/obj: 0x47ca3  csc: 0xb2d.2b4f8874  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x435c091 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

这里可以得出几个结论:
1)版本平台信息
2)通过下面信息都是知道,data_object_id=294044的对象出现异常(seg.obj=-2)

OBJD MISMATCH typ=6, seg.obj=-2, diskobj=294051, dsflg=0, dsobj=294044, tid=294044, cls=1

Block header dump:  0x0435c094
 Object id on Block? Y
 seg/obj: 0x47ca3  csc: 0xb2d.2b4f8874  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x435c091 ver: 0x01 opc: 0
     inc: 0  exflg: 0

3)结合dba_objects视图查询出来是xifenfei_index 索引出现异常

问题原因猜测
因为没有找到权威解释,对于这里在online表空间的时候出现ORA-00600[kcbz_check_objd_typ]错误,个人的猜测可能是在online的时候,验证数据字典中关于该表空间中对象的相关记录和该表空间总的存储数据比较,然后发现不匹配,所以出现了该错误。

处理方法
虽然该表空间离线,但是我们可以使用drop操作直接删除数据字典中index的记录,然后再online表空间,这样可以绕过数据字典和表空间中的存储比较。当表空间online成功后,然后再创建index