创建DBFS

DBFS(Oracle Database File System)就是Oracle数据库11gR2中提供的能够在Linux和Solaris操作系统中将Oracle数据库当成文件系统来使用的功能.在DBFS内部,文件是以SecureFiles LOBs(对比与以前的BasicFiles LOBs)的形式存储在数据表中.这个功能第一个是存储图片或者文档,第二个功能就是在RAC或者XD中部署OGG是一个不错的选择.
安装fuse相关包

[root@xifenfei ~]# mount /dev/cdrom /media
mount: block device /dev/cdrom is write-protected, mounting read-only
[root@xifenfei ~]# cd /media/Server
[root@xifenfei Server]# ls fuse*
fuse-2.7.4-8.0.1.el5.x86_64.rpm      fuse-devel-2.7.4-8.0.1.el5.x86_64.rpm  fuse-libs-2.7.4-8.0.1.el5.x86_64.rpm
fuse-devel-2.7.4-8.0.1.el5.i386.rpm  fuse-libs-2.7.4-8.0.1.el5.i386.rpm
[root@xifenfei Server]# rpm -ivh fuse-libs-2.7.4-8.0.1.el5.x86_64.rpm
warning: fuse-libs-2.7.4-8.0.1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
Preparing...                ########################################### [100%]
   1:fuse-libs              ########################################### [100%]
[root@xifenfei Server]# rpm -ivh  fuse-devel-2.7.4-8.0.1.el5.x86_64.rpm
warning: fuse-devel-2.7.4-8.0.1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
Preparing...                ########################################### [100%]
   1:fuse-devel             ########################################### [100%]

[root@xifenfei Server]# rpm -ivh  fuse-2.7.4-8.0.1.el5.x86_64.rpm
warning: fuse-devel-2.7.4-8.0.1.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
Preparing...                ########################################### [100%]
   1:fuse                   ########################################### [100%]

系统配置

[root@xifenfei Server]# cd /
[root@xifenfei /]# mkdir dbfs
[root@xifenfei /]# chown ora11g:oinstall dbfs
[root@xifenfei /]# ls -l|grep dbfs
drwxr-xr-x   2 ora11g oinstall  4096 Sep  1 16:41 dbfs
[root@xifenfei /]# echo "/usr/local/lib" >> /etc/ld.so.conf.d/usr_local_lib.conf
[root@xifenfei /]# export ORACLE_HOME=/u01/oracle11
[root@xifenfei /]# ln -s $ORACLE_HOME/lib/libclntsh.so.11.1 /usr/local/lib/libclntsh.so.11.1
[root@xifenfei /]# ln -s $ORACLE_HOME/lib/libnnz11.so /usr/local/lib/libnnz11.so
[root@xifenfei /]# ln -s /lib64/libfuse.so.2 /usr/local/lib/libfuse.so.2
[root@xifenfei /]# cd /usr/local/lib
[root@xifenfei lib]# ls -l
total 0
lrwxrwxrwx 1 root root 35 Sep  1 16:45 libclntsh.so.11.1 -> /u01/oracle11/lib/libclntsh.so.11.1
lrwxrwxrwx 1 root root 19 Sep  1 16:46 libfuse.so.2 -> /lib64/libfuse.so.2
lrwxrwxrwx 1 root root 29 Sep  1 16:46 libnnz11.so -> /u01/oracle11/lib/libnnz11.so
[root@xifenfei lib]# ldconfig
[root@xifenfei lib]# chmod +x /usr/bin/fusermount
[root@xifenfei lib]# ls -l /usr/bin/fusermount
lrwxrwxrwx 1 root root 15 Sep  1 16:37 /usr/bin/fusermount -> /bin/fusermount
[root@xifenfei lib]#  ls -l /bin/fusermount
-rwsr-x--x 1 root fuse 23544 Oct 18  2011 /bin/fusermount

相关表空间/用户配置

SQL> create tablespace dbfs_ts 
  2  datafile '/u01/oradata/ora11g/xifenfei01.dbf' 
  3  size 20m autoextend on next 10m maxsize 30g;

Tablespace created.

SQL> create user dbfs  identified by dbfs
  2   default tablespace dbfs_ts
  3   quota unlimited on dbfs_ts;

User created.

SQL> grant create session, resource, create view, dbfs_role to dbfs ;

Grant succeeded.

创建filesystem

[ora11g@xifenfei admin]$ cd $ORACLE_HOME/rdbms/admin
[ora11g@xifenfei admin]$ sqlplus dbfs/dbfs@ora11g

SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 1 16:43:04 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @dbfs_create_filesystem.sql dbfs_ts my_dbfs
No errors.
--------
CREATE STORE:
begin dbms_dbfs_sfs.createFilesystem(store_name => 'FS_MY_DBFS', tbl_name =>
'T_MY_DBFS', tbl_tbs => 'dbfs_ts', lob_tbs => 'dbfs_ts', do_partition => false,
partition_key => 1, do_compress => false, compression => '', do_dedup => false,
do_encrypt => false); end;
--------
REGISTER STORE:
begin dbms_dbfs_content.registerStore(store_name=> 'FS_MY_DBFS', provider_name
=> 'sample1', provider_package => 'dbms_dbfs_sfs'); end;
--------
MOUNT STORE:
begin dbms_dbfs_content.mountStore(store_name=>'FS_MY_DBFS',
store_mount=>'my_dbfs'); end;
--------
CHMOD STORE:
declare m integer; begin m := dbms_fuse.fs_chmod('/my_dbfs', 16895); end;
No errors.

挂载dbfs

[ora11g@xifenfei ~]$ more /home/ora11g/xifenfei_pwd 
dbfs
[ora11g@xifenfei ~]$ nohup dbfs_client dbfs@ora11g /dbfs <xifenfei_pwd &
[1] 3694
[ora11g@xifenfei ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      3.9G  3.2G  462M  88% /
/dev/sda1              99M   24M   71M  25% /boot
tmpfs                1002M  184M  818M  19% /dev/shm
/dev/sdb1              20G  8.9G  9.9G  48% /u01
df: `/dbfs': Resource temporarily unavailable

查询mos发现 OS 2.6.32-100.26.2.el5 to: 2.6.32-300.10.1.el5uek (Linux UEK Kernel)会出现该问题,解决方法是升级Kernel或者不使用UEK

[ora11g@xifenfei ~]$ uname -a
Linux xifenfei 2.6.32-300.10.1.el5uek #1 SMP Wed Feb 22 17:37:40 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

不幸刚好中招,现在升级是不太可能的事情,只能使用其他kernel来启动系统(下图选择第二个)

重新挂载dbfs并且测试

[ora11g@xifenfei ~]$ uname -a
Linux xifenfei 2.6.18-308.el5 #1 SMP Sat Feb 25 12:40:07 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[ora11g@xifenfei ~]$ nohup dbfs_client dbfs@ora11g /dbfs <xifenfei_pwd &
[1] 3694
[ora11g@xifenfei ~]$ nohup: appending output to `nohup.out'
[ora11g@xifenfei ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      3.9G  3.2G  462M  88% /
/dev/sda1              99M   24M   71M  25% /boot
tmpfs                1006M  184M  822M  19% /dev/shm
/dev/sdb1              20G  8.9G  9.9G  48% /u01
dbfs-dbfs@ora11g:/     19M  120K   19M   1% /dbfs
[ora11g@xifenfei ~]$ cd /dbfs
[ora11g@xifenfei dbfs]$ ls
my_dbfs
[ora11g@xifenfei dbfs]$ cd my_dbfs/
[ora11g@xifenfei my_dbfs]$ ls
[ora11g@xifenfei my_dbfs]$ cat /etc/passwd>xifenfei.chf
[ora11g@xifenfei my_dbfs]$ ll
total 2
-rw-r--r-- 1 ora11g oinstall 1736 Sep  1 21:05 xifenfei.chf

卸载dbfs

[ora11g@xifenfei ~]$ fusermount -u /dbfs
[ora11g@xifenfei ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      3.9G  3.2G  462M  88% /
/dev/sda1              99M   24M   71M  25% /boot
tmpfs                1006M  184M  822M  19% /dev/shm
/dev/sdb1              20G  8.9G  9.9G  48% /u01

删除filesystem

cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user/dbfs_user
SQL> @dbfs_drop_filesystem.sql my_dbfs

解决因/etc/fstab错误导致系统不能启动故障

发现如果人品不好做试验都是问题很多,晚上又把fstab给写错了,导致系统不能起来,因为当时处理该故障未截图,后续在网上找了几张图片,大体说明处理思路
系统启动报 filesystems 失败,输入root密码进入repair filesystem模式

尝试修改 /etc/fstab 发现系统是read-only模式

重新mount -n -o remount,rw /重新mount文件系统

重新修改/etc/fstab,除掉错误记录,然后使用init 6/reboot命令重启系统

安装ORACLE db /tmp空间不足解决办法

因测试需要装一个ORACLE 11G,在安装检测阶段报下图错误

详细信息

Free Space: xifenfei:/tmp - This is a prerequisite condition to test 
whether sufficient free space is available in the file system.
Error: 
 - 
PRVF-7501 : Sufficient space is not available at location "/tmp" 
on node "xifenfei" [Required space = 1GB ]  
- Cause:  Not enough free space at location specified.  - Action:  
Free up additional space or select another location. 
Expected Value
 : 1GB
Actual Value
 : 238MB

错误提示很明显ORACLE安装过程需要1G的临时空间,但是现在/tmp只有238M,空间明显不足,是的oracle检测失败,为了安装过程不出意外,决定分析并解决该问题

磁盘空间使用情况

[ora11g@xifenfei ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      3.9G  3.3G  238M  93% /
/dev/sda1              99M   24M   71M  25% /boot
tmpfs                1002M     0 1002M   0% /dev/shm
/dev/sdb1              20G  7.8G   11G  42% /u01

这里可以看出来/tmp没有另外的分配分区,而是挂载在/下面,也就是说,/tmp最多使用的空间就是/dev/mapper/VolGroup00-LogVol00分区能够使用的最大空间,也就是238M,证明ORACLE的检查程序说的是事实。

解决该问题
1.建立新tmp目录

[root@xifenfei ora11g]# mkdir /u01/tmp
[root@xifenfei ora11g]# chown root:root /u01/tmp
[root@xifenfei ora11g]# chmod 1777 /u01/tmp

2.设置数据库用户变量

vi db_home/.bash_profile
export TEMP=/u01/tmp
export TMPDIR=/u01/tmp

[ora11g@xifenfei ~]$ env|grep TMP
TMPDIR=/u01/tmp
[ora11g@xifenfei ~]$ env|grep TEMP
TEMP=/u01/tmp

3.重新运行runInstaller
4.安装完成清理相关/u01/tmp 和相关环境变了,让数据库使用系统默认(根据实际情况处理)

ORA-600和ORA-7445错误文章汇总

ORA-600和ORA-7445错误都是很多dba忌讳的错误,这里对本blog截止于:2012年09月15日的相关类此错误进行一个简单汇总,方便大家查阅.同时感谢“广州-紫恒”朋友在百忙中帮我整理出来.
ORA-600错误集合
ORA-600 [12235]
ORA-00600 [2662]
ORA-00600[4454]
ORA-00600[KSSADP1]
重现ORA-600[4000]异常
ORA-00600[4194]故障解决
ORA-00600[ktspNextL1:4]
ORA-600[4194]/[4193]解决
TOAD导致ORA-00600[17281]
ORA-00600[729]分析和处理方法
使用bbed解决ORA-00600[2662]
记录一次ORA-600[13013]处理过程
因为高版本引起ORA-00600[17059]
记录另一起ORA-00600[13013]处理
异常断电导致current redo损坏处理
记录一次ORA-00600[kdsgrp1]分析
异常断电导致current redo损坏处理
记录一次ORA-00600[2252]故障解决
因使用OEM引起ORA-00600[12761]
10.2.0.5出现ORA-00600[kcblasm_1]
ORA-00600[kgscLogOff-notempty]
通过bbed解决ORA-00600[4000]案例
ORA-00600 [ktbdchk1: bad dscn] 解决
11G RAC库 ORA-00600[ktubko_1]错误
收集统计信息出现ORA-00600[ksxprqfre3]
ORA-607/ORA-600[4194]不一定是重大灾难
创建控制文件遭遇ORA-00600[3753]故障解决
控制文件异常导致ORA-00600[kccsbck_first]
数据库报ORA-00607/ORA-00600[4194]错误
ORA-00600[kcratr_nab_less_than_odr]故障解决
PL/SQL Developer编译过程引起ora-600[15419]
ORA-07445[kslgetl()+120]/ORA-00108错误解决
双机mount数据库出现ORA-00600[kccsbck_first]
使用bbed解决ORA-00607/ORA-00600[4194]故障
通过bbed模拟ORA-00607/ORA-00600[4194]故障
truncate table强制终止导致ORA-00600[ktspfundo-2]
客户端版本导致ORA-00600[kssadd_stage: null parent]
动态修改PGA_AGGREGATE_TARGET 导致ORA-600[723]
ORA-00600[kcratr1_lostwrt]/ORA-00600[3020]错误恢复
ORA-600 [LibraryCacheNotEmptyOnClose] on shutdown
ORA-600[6749] 发生在 SYSMAN.MGMT_METRICS_RAW表
老版本PL/SQL Developer操作数据库导致ORA-00600[17113]
ASMM表空间强制终止DML操作导致ORA-600 [ktspfupdst-1]
表空间online出现ORA-00600[kcbz_check_objd_typ]处理过程
spfile被覆盖导致ORA-600[kmgs_parameter_update_timeout_1]

ORA-7445错误集合
ORA-7445[__milli_memcpy]分析
8i升级到9i出现ORA-07445[pevm_MOVC_i()+18]
ORA-07445[kslgetl()+120]/ORA-00108错误解决
遭遇ORA-07445[kkdliac()+346]使用odu抢救数据
ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]
ORA-07445 [ACCESS_VIOLATION] [UNABLE_TO_READ] []

ORA-30013导致RAC 节点down掉

今天一朋友让我帮忙分析他们的9.2.0.2 rac 节点2异常down掉原因,相关信息如下:
前提信息

OS:HP-UX B.11.31
DB:9.2.0.2.0 RAC

节点2alert日志信息

Fri Sep  7 13:13:49 2012
ARC0: Completed archiving  log 11 thread 2 sequence 11651
Fri Sep  7 13:31:56 2012
Errors in file /oracle/admin/agent/udump/agent2_ora_797.trc:
ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], []
ORA-00028: your session has been killed
Fri Sep  7 13:31:58 2012
Errors in file /oracle/admin/agent/bdump/agent2_pmon_5938.trc:
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Fri Sep  7 13:31:58 2012
PMON: terminating instance due to error 30013
Fri Sep  7 13:31:58 2012
Errors in file /oracle/admin/agent/bdump/agent2_lms7_6033.trc:
ORA-30013: undo tablespace '' is currently in use
Fri Sep  7 13:31:58 2012
…………
Errors in file /oracle/admin/agent/bdump/agent2_lms0_6027.trc:
ORA-30013: undo tablespace '' is currently in use
Fri Sep  7 13:31:58 2012
System state dump is made for local instance
Fri Sep  7 13:32:03 2012
Instance terminated by PMON, pid = 5938
Fri Sep  7 14:34:35 2012

这里可以看到因为ORA-30013的错误使得pmon进程异常,从而使得该rac的节点2 down掉.同时这里还发现了ORA-00600[kgavsd_3]错误,是否是因为该ORA-600导致了数据库异常down还是一个偶然机会,我们继续分析

查看ORA-600[kgavsd_3]相关trace文件

*********START PLSQL RUNTIME DUMP************
***Got ORA-28 while running PLSQL***
***********END PLSQL RUNTIME DUMP************
*** 2012-09-07 13:31:56.740
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], []
ORA-00028: your session has been killed
Current SQL statement for this session:
--用户补档
DECLARE
  OUT_ERR_CODE NUMBER;
  OUT_ERR_MSG  VARCHAR2(1000);
  V_COUNT NUMBER;
BEGIN   
    WHILE TRUE LOOP
      SELECT COUNT(*) INTO V_COUNT FROM amc_stat_log  where proc_name in('pRunOdsChannelWareData') 
       and run_param=201208 AND STATE='A';
      IF V_COUNT>0 THEN
         dbms_output.put_line('exit loop '|| sysdate);
         EXIT; 
      END IF;
      sys.Dbms_Lock.sleep(600);
      dbms_output.put_line('wake up '|| sysdate);
    END LOOP;
   
   PKG_AME_ODS_DATA.P_Add_TO_AgentServ(201208,OUT_ERR_CODE,OUT_ERR_MSG);
   PKG_AME_ODS_DATA.P_Update_Serv_Ware_ID(201208, OUT_ERR_CODE, OUT_ERR_MSG);
   PKG_AMS_SETTLE.P_COMMISION_51PRE_FLAG(201208,OUT_ERR_CODE, OUT_ERR_MSG); 
END;
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
c000000e84ec59d0        14  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
Cannot find symbol in .
Cannot find symbol in .
Cannot find symbol in .
ksedmp()+512         call     9fffffffffff3940     000000000 ?
                                                   C000000000000A17 ?
                                                   40000000025B6540 ?
ksfdmp()+64          call     9fffffffffff3940     000000003 ?
kgerinv()+352        call     9fffffffffff3940     60000000000466B0 ?
                                                   000000003 ?
                                                   C000000000000714 ?
                                                   4000000004EBA6A0 ?
                                                   00001821B ?
                                                   6000000000468EA8 ?
kgesinv()+48         call     9fffffffffff3940     60000000000466B0 ?
                                                   600000000059BD98 ?
                                                   600000000046B070 ?
                                                   60000000000179C0 ?
                                                   6000000000017950 ?
kgesin()+112         call     9fffffffffff3940     60000000000466B0 ?
                                                   600000000059BD98 ?
                                                   4000000000B44C10 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFF4310 ?
$cold_kgavsd_stackl  call     9fffffffffff3940     60000000000466B0 ?
et_done()+1184                                     600000000059BD98 ?
                                                   4000000000B44C10 ?
                                                   000000001 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   00001FE19 ?
pbesd_stacklet_done  call     9fffffffffff3940     60000000000466B0 ?
()+96                                              000000002 ? 000000000 ?
                                                   9FFFFFFFBEC6AE68 ?
pfrrun()+3328        call     9fffffffffff3940     9FFFFFFFBEC6AE68 ?
                                                   C000000000001D42 ?
                                                   4000000001ABBDA0 ?
                                                   9FFFFFFFBEC6B06C ?
                                                   9FFFFFFFFFFF64F0 ?
                                                   6000000000468EA8 ?
                                                   000000000 ? 000000000 ?
peicnt()+480         call     0000000000000000     9FFFFFFFBEC6AEE8 ?
                                                   C000000000000996 ?
                                                   40000000019F3680 ?
kkxexe()+816         call     0000000000000000     9FFFFFFFFFFF64F0 ?
                                                   9FFFFFFFBEC6AE68 ?
                                                   C00000000000099B ?
                                                   4000000001DD90F0 ?
                                                   00000FE4F ?
                                                   9FFFFFFFFFFF5F00 ?
                                                   60000000000467F0 ?
                                                   4000000000B603F0 ?
opiexe()+11168       call     0000000000000000     000000000 ?
                                                   C000000000002C60 ?
                                                   4000000001BEF980 ?
                                                   00000800F ?
                                                   9FFFFFFFFFFF6470 ?
                                                   9FFFFFFFBEC6AEB2 ?
                                                   6000000000040900 ?
                                                   9FFFFFFFBEC6B534 ?
opiall0()+3184       call     _etext_f()+23058430  000000004 ? 0000000C4 ?
                              09110686928          9FFFFFFFFFFF7B40 ?
                                                   C000000000002BDF ?
                                                   4000000001B26CD0 ?
                                                   000000000 ? 00000C893 ?
                                                   9FFFFFFFFFFF6690 ?
Cannot find symbol in .
kpoal8()+2064        call     9fffffffffff7ad0     000000001 ?
                                                   9FFFFFFFFFFF8304 ?
                                                   FFFFFFFFBFFFFFFF ?
                                                   9FFFFFFFFFFF83E4 ?
                                                   FFFFFFFFFFE7FBDF ?
                                                   9FFFFFFFFFFF7B88 ?
                                                   000000000 ?
                                                   6000000000474528 ?
opiodr()+3584        call     9fffffffffff81fc     6000000000040950 ?
                                                   000000000 ? 000000000 ?
                                                   C000000000002C60 ?
                                                   4000000001C09FE0 ?
                                                   00000C50B ?
                                                   9FFFFFFFFFFF81F0 ?
                                                   9FFFFFFFFFFF81D8 ?
ttcpip()+3776        call     _etext_f()+23058430  00000005E ? 000000014 ?
                              09114957288          9FFFFFFFFFFFA5F0 ?
                                                   6000000000040918 ?
                                                   C000000000001ABD ?
                                                   4000000001AB3BA0 ?
                                                   000000000 ? 00000C59B ?
opitsk()+1872        call     9fffffffffffa200     6000000000049FC0 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFFA5F0 ?
                                                   000000001 ?
                                                   9FFFFFFFFFFFA740 ?
                                                   9FFFFFFFFFFFA564 ?
                                                   9FFFFFFFBF780058 ?
                                                   000000000 ?
opiino()+3184        call     000000000000057b     000000000 ? 000000000 ?
                                                   C00000000000132B ?
                                                   4000000001F78730 ?
                                                   000008001 ?
opiodr()+3584        call     0000000000000000     6000000000548A38 ?
                                                   4000000000B606F0 ?
                                                   6000000000548A38 ?
                                                   C000000000002C60 ?
                                                   4000000001C09FE0 ?
                                                   00000A201 ?
                                                   9FFFFFFFFFFFBC90 ?
                                                   4000000000B606F0 ?
opidrv()+976         call     _etext_f()+23058430  00000003C ? 000000004 ?
                              09114957288          9FFFFFFFFFFFEFB0 ?
                                                   6000000000040918 ?
sou2o()+80           call     _etext_f()+23058430  000000004 ? 000000004 ?
                              09114957288          9FFFFFFFFFFFEFB0 ?
main()+352           call     _etext_f()+23058430  9FFFFFFFFFFFEFD0 ?
                              09114957288          9FFFFFFFFFFFEFD4 ?
                                                   60000000004744F0 ?
                                                   9FFFFFFFFFFFEFB0 ?
main_opd_entry()+80  call     _etext_f()+23058430  000000000 ?
                              09114957288          9FFFFFFFFFFFF498 ?
                                                   C000000000000004 ?
                                                   C00000000002BE30 ?
 
--------------------- Binary Stack Dump ---------------------

Process global information:
     process: c000000d6428c0c0, call: c000000e46e772a8, xact: 0000000000000000, 
     curses: c000000d6437d020, usrses: c000000d6437d020
  ----------------------------------------
  SO: c000000d6428c0c0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
  (process) Oracle pid=282, calls cur/top: c000000e46e772a8/c000000e46e772a8, flag: (0) -
            int error: 28, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: c000000d64234f18 1 6
              last post sent: 0 0 104
              last post sent-location: kglpsl: in loop
              last process posted by me: c000000d6428e900 23 0
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: c000000d62234ee0
    O/S info: user: oracle, term: UNKNOWN, ospid: 797
    OSD pid info: Unix process pid: 797, image: oracle@gzagent2 (TNS V1-V3)
    ----------------------------------------
    SO: c000000d6437d020, type: 4, owner: c000000d6428c0c0, flag: INIT/-/-/0x00
    (session) trans: 0000000000000000, creator: c000000d6428c0c0, flag: (41) USR/- BSY/-/-/-/KIL/-
              DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
              txn branch: 0000000000000000
              oct: 0, prv: 0, sql: c000000e76b03f10, psql: 0000000000000000, user: 31/CUSTOM
    O/S info: user: huangqianhai_lc, term: SVCTAG-D1MLV2X, ospid: 11124:11796, machine: WORKGROUP\SVCTAG-D1MLV2X
              program: plsqldev.exe
    application name: PL/SQL Developer, hash value=1190136663
    action name: 测试窗口 - 新建, hash value=3604520210
    last wait for 'null event' blocking sess=0x0 seq=142 wait_time=567341620
                =ea60, =0, =0
    temporary object counter: 0
      ----------------------------------------

通过这里可以看出来是因为pl/sql dev进行一个plsql的操作导致该错误发生,查询MOS[ID 403575.1]发现

Applies to:

Oracle Server - Enterprise Edition - Version: 9.2.0.7 and later   [Release: 9.2 and later ]
Information in this document applies to any platform.
***Checked for relevance on 27-Oct-2010***
Symptoms

The following errors appears in the alert log file :

Probe:read_pipe: receive failed, status 3
Probe:S:debug_loop: timeout. Action 1

*********START PLSQL RUNTIME DUMP************
***Got ORA-604 while running PLSQL***
***********END PLSQL RUNTIME DUMP************

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgavsd_3], [0], [], [], [], [], [], []
ORA-00604: error occurred at recursive SQL level 2

Current SQL statement for this session:
begin :id := sys.dbms_transaction.local_transaction_id; end;


.
Cause

The ora-600 kgavsd_3 appears when calling kgavsd_stacklet_done function which is related to PLSQL DEBUG.

From traces, dbms_debug package is being used during trace generation.
The return code of 3 further indicates that the dbms_pipe message was interrupted.

Probably user cancelled a plsql program, so the appeared while trying to dump the stack

Solution

There is no data corruption over here. 
The error appears to be due to abnormal termination of aPL/SQL Developer application while executing a PL/SQL block.
Changing the PL/SQL and/or the procedure code could help in avoiding this error message.

Hence, this error can be safely ignored.

查找trace文件确实发现有name=SYS.DBMS_DEBUG,进一步表明该错误是由于plsql dev工具使用debug模式运行上面的plsql而引起该错误的发生,但是因为mos中记录和错误不是完全的一致,所以不能十分确定是该错误导致数据库down掉

继续分析ORA-30013

Error:      ORA-30013  (ORA-30013)
Text:      undo tablespace '%s' is currently in use 
---------------------------------------------------------------------------
Cause:    the specified undo tablespace is currently used by another instance. 
Action:    Wait for the undo tablespace to become available or change to another name and reissue the statement.

这个说明是没有疑问的:因为2节点配置的当前undo是UNDOTBS2,而UNDOTBS1是1节点使用的,证明这里的undo确实发生了错误,继续查询mos发现Bug 3368552

Hdr: 3368552 9.2.0.3 RDBMS 9.2.0.3 RAC PRODID-5 PORTID-23
Abstract: RAC:  ORA-30013 WHEN INSTANCE 2 ATTEMPTS TO ACCESS UNDO TABLESPACE OF INSTANCE 1
 
*** 01/12/04 06:21 am ***
TAR:
----
3554549.995
 
PROBLEM:
--------
The RAC database has been stable, but experienced an instance termination due 
to ORA-30031 error in the alert log (instance 2):
 
...
Tue Dec 23 03:01:46 2003
ARC1: Evaluating archive  log 4 thread 2 sequence 1116
ARC1: Beginning to archive log 4 thread 2 sequence 1116
Creating archive destination LOG_ARCHIVE_DEST_1: 
'/oracle/oradata/VLDB/logs/archives/VLDBN2/VLDB_0000001116_0002.arc'
ARC1: Completed archiving  log 4 thread 2 sequence 1116
Tue Dec 23 08:14:09 2003
Errors in file /oracle/admin/VLDB/bdump/vldbn2_pmon_22860.trc:
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use
Tue Dec 23 08:14:09 2003
PMON: terminating instance due to error 30013
Tue Dec 23 08:14:10 2003
System state dump is made for local instance
Tue Dec 23 08:14:12 2003
Trace dumping is performing id=[cdmp_20031223081410]
Tue Dec 23 08:14:14 2003
Instance terminated by PMON, pid = 22860
<eof>
 
Instance 1 alert log shows only the reconfiguration and the cdump info:
..
Tue Dec 23 03:54:13 2003
Errors in file /oracle/admin/VLDB/udump/vldbn1_ora_13564.trc:
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 4
Tue Dec 23 08:14:10 2003
Trace dumping is performing id=[cdmp_20031223081410]
Tue Dec 23 08:14:12 2003
Reconfiguration started
List of nodes: 0,
Global Resource Directory frozen
one node partition
Communication channels reestablished
...

因为在9.2.0.3的RAC中有着该bug,那么我们可以大胆猜测在9.2.0.2中应该存在该bug,那么结合上面的ORA-00600[kgavsd_3]错误,我们大概还原该事故的全部:
1.节点1 dml操作了程序中报错的plsql中要范围的部分表对象,但是未提交(或者正在执行)
2.节点2 有用户使用pl/sql dev去执行程序中的plsql,因为是debug模式执行,需要UNDOTBS1的块来构建cr,从而使得节点2去访问UNDOTBS1,引发了Bug 3368552 从而使得数据库直接kill掉该plsql dev会话,进而出现ORA-00600[kgavsd_3]错误和pmon进程异常使得节点2 down掉