使用flashback database找回被误删除表空间

我个人不怎么推荐使用flashback database功能,因为在生产库中,谁能够承受得起整个库的回退(也许特别极端的情况下可能需要使用到),今天帮网友恢复了一个案例:删除表空间,然后尝试着flashback database功能把这个删除的表空间找回来,但是他在整个操作过程中思路比较混乱,最后导致数据库不能正常起来。因为网友有这个方面的需求,我做了一个flashback database 找回表空间操作的试验(官方建议:flashback database在不改变数据文件的情况下使用)

1、恢复过程

SQL> select * from v$version;

BANNER
-----------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
YES

SQL> create tablespace xff_test datafile 'E:\ORACLE\ORADATA\XFF\xff_test01.dbf'
size 10m autoextend on next 1m maxsize 20m;

表空间已创建。

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;

TO_CHAR(SYSDATE,'YY
-------------------
2011-11-12 21:05:07

SQL> create table chf.t_flashback tablespace xff_test
  2  as
  3  select * from dba_objects;

表已创建。

SQL> select count(*) from chf.t_flashback;

  COUNT(*)
----------
     73211

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') from dual;

TO_CHAR(SYSDATE,'YY
-------------------
2011-11-12 21:06:35

SQL> host dir E:\ORACLE\ORADATA\XFF\xff_test01.dbf
 驱动器 E 中的卷没有标签。
 卷的序列号是 38D0-2A35

 E:\ORACLE\ORADATA\XFF 的目录

2011/11/12  21:04        10,493,952 XFF_TEST01.DBF
               1 个文件     10,493,952 字节
               0 个目录 14,644,822,016 可用字节

SQL> drop tablespace xff_test including contents and datafiles;

表空间已删除。

SQL> host dir E:\ORACLE\ORADATA\XFF\xff_test01.dbf
 驱动器 E 中的卷没有标签。
 卷的序列号是 38D0-2A35

 E:\ORACLE\ORADATA\XFF 的目录

找不到文件


SQL> shutdown immediate
数据库已经关闭。
已经卸载数据库。
ORACLE 例程已经关闭。
SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area  368263168 bytes
Fixed Size                  1374668 bytes
Variable Size             276825652 bytes
Database Buffers           83886080 bytes
Redo Buffers                6176768 bytes
数据库装载完毕。
SQL> flashback database to timestamp 
2  to_timestamp('2011-11-12 21:06:35','yyyy-mm-dd hh24:mi:ss');

闪回完成。


SQL> alter database open resetlogs;

数据库已更改。

SQL> select name from v$datafile;

NAME
-------------------------------------------------------------
E:\ORACLE\ORADATA\XFF\SYSTEM01.DBF
E:\ORACLE\ORADATA\XFF\SYSAUX01.DBF
E:\ORACLE\ORADATA\XFF\UNDOTBS01.DBF
E:\ORACLE\ORADATA\XFF\USERS01.DBF
E:\ORACLE\ORADATA\XFF\EXAMPLE01.DBF
E:\ORACLE\ORADATA\XFF\O_ORACLE.DBF
E:\ORACLE\ORADATA\XFF\XIFENFEI01.DBF
E:\ORACLE\ORADATA\XFF\P_TEST01.DBF
E:\ORACLE\ORADATA\XFF\SYS_MG01.DBF
E:\ORACLE\ORADATA\XFF\P101.DBF
E:\ORACLE\ORADATA\XFF\P201.DBF

NAME
---------------------------------------------------------------

E:\ORACLE\ORADATA\XFF\P301.DBF
E:\ORACLE\11_2_0\DATABASE\UNNAMED00013
E:\ORACLE\ORADATA\XFF\OGG01.DBF
E:\ORACLE\ORADATA\XFF\SPOT01.DBF

已选择15行。

SQL> desc chf.t_flashback;
ERROR:
ORA-04043: 对象 chf.t_flashback 不存在

SQL> COL ERROR FOR A20
SQL> SELECT FILE#,ONLINE_STATUS,ERROR FROM V$RECOVER_FILE;

     FILE# ONLINE_ ERROR
---------- ------- --------------------
        13 OFFLINE FILE NOT FOUND

SQL> SELECT NAME FROM V$DATAFILE WHERE FILE#=13;

NAME
-----------------------------------------------------------------

E:\ORACLE\11_2_0\DATABASE\UNNAMED00013

SQL> HOST DIR E:\ORACLE\11_2_0\DATABASE\UNNAMED00013
 驱动器 E 中的卷没有标签。
 卷的序列号是 38D0-2A35

 E:\ORACLE\11_2_0\DATABASE 的目录

找不到文件

SQL> ALTER DATABASE CREATE DATAFILE 13 
AS 'E:\ORACLE\ORADATA\XFF\xff_test01.dbf';

数据库已更改。

SQL> HOST DIR E:\ORACLE\ORADATA\XFF\xff_test01.dbf
 驱动器 E 中的卷没有标签。
 卷的序列号是 38D0-2A35

 E:\ORACLE\ORADATA\XFF 的目录

2011/11/12  21:25        10,493,952 XFF_TEST01.DBF
               1 个文件     10,493,952 字节
               0 个目录 14,640,848,896 可用字节

SQL> recover datafile 13;
ORA-00279: 更改 10903431152368 (在 11/12/2011 21:04:40 生成) 对于线程 1
是必需的
ORA-00289: 建议: E:\ORACLE\ARCHIVELOG\ARC0000000241_0753489409.0001
ORA-00280: 更改 10903431152368 (用于线程 1) 在序列 #241 中


指定日志: {<RET>=suggested | filename | AUTO | CANCEL}
auto
已应用的日志。
完成介质恢复。

SQL> alter database datafile 13 online;

数据库已更改。

SQL> select count(*) from chf.t_flashback;

  COUNT(*)
----------
     73211

2、alert中关键信息

Sat Nov 12 21:12:30 2011
flashback database to timestamp to_timestamp('2011-11-12 21:06:35','yyyy-mm-dd hh24:mi:ss')
Flashback Restore Start
Flashback: created tablespace #18: 'XFF_TEST' in the controlfile.
Flashback: created OFFLINE file 'UNNAMED00013' for tablespace #18 in the controlfile.
Filename was:
'E:\ORACLE\ORADATA\XFF\XFF_TEST01.DBF' when dropped.
File will have to be restored from a backup and recovered.
Flashback: deleted datafile #13 in tablespace #18 from control file.
Flashback: dropped tablespace #18: 'XFF_TEST' from the control file.
Flashback Restore Complete
Flashback Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 2 slaves
Sat Nov 12 21:12:38 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 241 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\XFF\REDO01
Flashback recovery: Added file #13 to control file as OFFLINE and 'UNNAMED00013'
because it was dropped during the flashback interval
or it was added during flashback media recovery.
File was originally created as:
'E:\ORACLE\ORADATA\XFF\XFF_TEST01.DBF'
File will have to be restored from a backup or
recreated using ALTER DATABASE CREATE DATAFILE command,
and the file has to be onlined and recovered.
Incomplete Recovery applied until change 10903431152644 time 11/12/2011 21:05:11
Sat Nov 12 21:12:44 2011
Flashback Media Recovery Complete
Completed: flashback database to timestamp to_timestamp('2011-11-12 21:06:35','yyyy-mm-dd hh24:mi:ss')
--flashback database 操作过程
……
Sat Nov 12 21:25:29 2011
ALTER DATABASE CREATE DATAFILE 13 AS 'E:\ORACLE\ORADATA\XFF\xff_test01.dbf'
Completed: ALTER DATABASE CREATE DATAFILE 13 AS 'E:\ORACLE\ORADATA\XFF\xff_test01.dbf'
Sat Nov 12 21:25:59 2011
ALTER DATABASE RECOVER  datafile 13  
Media Recovery Start
Serial Media Recovery started
Media Recovery start incarnation depth : 1, target inc# : 3, irscn : 10903431152644
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 13  ...
Sat Nov 12 21:28:27 2011
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log E:\ORACLE\ARCHIVELOG\ARC0000000241_0753489409.0001
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
  Mem# 0: E:\ORACLE\ORADATA\XFF\REDO01
Completed: ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Sat Nov 12 21:29:04 2011
alter database datafile 13 online
Completed: alter database datafile 13 online
--恢复被删除数据文件过程,flashbackup database会创建表空间,但是数据文件需要人工干预

Oracle JDBC Thin Driver Formats

虽然自己对java不熟悉,但是做为一个dba,经常会遇到java和数据库连接的配置文件,参考朋友网站,记录下来以便以后查找

格式一:  Oracle JDBC Thin using a ServiceName
jdbc:oracle:thin:@//<host>:<port>/<service_name> 
Example: jdbc:oracle:thin:@//192.168.2.1:1521/xifenfei
 
格式二: Oracle JDBC Thin using an SID
jdbc:oracle:thin:@<host>:<port>:<SID> 
Example: jdbc:oracle:thin:192.168.2.1:1521:xff
--注意这里的格式,@后面有//, 这是与使用SID的主要区别。

格式三:Oracle JDBC Thin using a TNSName
jdbc:oracle:thin:@<TNSName> 
Example: jdbc:oracle:thin:@GL 
--Support for TNSNames was added in the driver release 10.2.0.1

来源:Java JDBC Thin Driver 连接 Oracle 三种方法说明

Linux/Win中ORACLE_SID错误时表现

Linux系统中

[oracle@report ~]$ echo $ORACLE_SID
wlwjj
[oracle@report ~]$ export ORACLE_SID=xifenfei
[oracle@report ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 23:36:27 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> exit
Disconnected
[oracle@report ~]$  sqlplus abc/abc

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 23:42:32 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory

Win系统

E:\oracle\11_2_0\NETWORK\ADMIN>ECHO %ORACLE_SID%
xff

E:\oracle\11_2_0\NETWORK\ADMIN>SET ORACLE_SID=xifenfei

E:\oracle\11_2_0\NETWORK\ADMIN>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 11月 11 23:31:14 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-12560: TNS: 协议适配器错误


请输入用户名:

E:\oracle\11_2_0\NETWORK\ADMIN>sqlplus chf/xifenfei

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 11月 11 23:34:01 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-12560: TNS: 协议适配器错误


请输入用户名:

SQLNET.AUTHENTICATION_SERVICES参数说明

一、官方文档说明

作用
Use the parameter SQLNET.AUTHENTICATION_SERVICES to enable one or more authentication services. 
If authentication has been installed, 
it is recommended that this parameter be set to either none or to one of the authentication methods.

默认值
None

一般可选值
NONE for no authentication methods. A valid username and password can be used to access the database.
ALL for all authentication methods
NTS for Windows NT native authentication(An authentication method that enables 
a client single login access to a Windows NT server and a database running on the server)

为了加深对这几个参数的理解,通过实验证明,这几个参数在不同的系统中的作用

二、win系统

Microsoft Windows [版本 6.1.7601]
版权所有 (c) 2009 Microsoft Corporation。保留所有权利。

C:\Windows\system32>e:

E:\>cd E:\oracle\11_2_0\NETWORK\ADMIN

#sqlnet.ora文件不存在情况
E:\oracle\11_2_0\NETWORK\ADMIN>dir sqlnet.ora
 驱动器 E 中的卷没有标签。
 卷的序列号是 38D0-2A35

 E:\oracle\11_2_0\NETWORK\ADMIN 的目录

找不到文件

E:\oracle\11_2_0\NETWORK\ADMIN>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 11月 11 22:13:57 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-01031: 权限不足


请输入用户名:

#NTS情况
E:\oracle\11_2_0\NETWORK\ADMIN>more sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES=(NTS)

E:\oracle\11_2_0\NETWORK\ADMIN>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 11月 11 22:16:20 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

#NONE情况
E:\oracle\11_2_0\NETWORK\ADMIN>more sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES=(NONE)

E:\oracle\11_2_0\NETWORK\ADMIN>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 11月 11 22:17:18 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-01031: 权限不足


请输入用户名:

#ALL情况
E:\oracle\11_2_0\NETWORK\ADMIN>more sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES=(ALL)

E:\oracle\11_2_0\NETWORK\ADMIN>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期五 11月 11 22:18:02 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-12641: 验证服务无法初始化


请输入用户名:

三、linux系统

[oracle@report ~]$ cd /opt/oracle/product/10.2.0/db_1/network/admin/
#NTS情况
[oracle@report admin]$ more sqlnet.ora 
SQLNET.AUTHENTICATION_SERVICES = (NTS)
[oracle@report admin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 22:03:51 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

ERROR:
ORA-01031: insufficient privileges


Enter user-name: 

#NONE情况
[oracle@report admin]$ more sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES = (NONE)
[oracle@report admin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 22:04:31 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

ERROR:
ORA-01031: insufficient privileges


Enter user-name: 

#ALL情况
[oracle@report admin]$ more sqlnet.ora
SQLNET.AUTHENTICATION_SERVICES = (ALL)
[oracle@report admin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 22:05:07 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

#不存在sqlnet.ora文件情况
[oracle@report admin]$ ll sqlnet.ora
ls: sqlnet.ora: No such file or directory
[oracle@report admin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 22:05:41 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

四、补充说明
1、在win系统中使用all,提示ORA-12641,不是很清楚原因
2、在nts只有在win系统中有用,linux中无用
3、当不存在sqlnet.ora文件时,linux中可以正常登录,win中不能

HP Itaniums上一次ORA-240异常处理

一、问题由来
今天有网友和我说,他的数据库始终只能mount,不能打开到open状态,具体状态为:

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01154: database busy. Open, close, mount, and dismount not allowed now

启动数据库一直停留在Database mounted.
最后报ORA-03113: end-of-file on communication channel

二、数据库环境

OS version:HP-UX Itanium  version 11.31
Oracle version:Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit

三、日志文件错误
查看alert日志

Fri Nov 11 09:31:01 2011
Completed: ALTER DATABASE   MOUNT
Fri Nov 11 09:31:08 2011
ALTER DATABASE OPEN
Fri Nov 11 09:40:32 2011
Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_dbw0_3122.trc:
ORA-00240: control file enqueue held for more than 120 seconds
Fri Nov 11 09:47:18 2011
Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_dbw0_3122.trc:
Fri Nov 11 10:02:53 2011
Errors in file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 3122'
Fri Nov 11 10:02:55 2011
System State dumped to trace file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc
Killing enqueue blocker (pid=3122) on resource CF-00000000-00000000
 by killing session 54.1
Fri Nov 11 10:07:57 2011
Errors in file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 3122'
Fri Nov 11 10:07:59 2011
System State dumped to trace file /opt/oracle/oracle1/admin/gminer/udump/gminer_ora_3555.trc
Killing enqueue blocker (pid=3122) on resource CF-00000000-00000000
 by terminating the process
USER: terminating instance due to error 2103
Instance terminated by USER, pid = 3555

四、分析ORA-00240和ORA-00494

[oracle@node1 ~]$ oerr ora 240
00240, 00000, "control file enqueue held for more than %s seconds"
// *Cause:  The current process did not release the control file enqueue within
//          the maximum allowed time.
// *Action: Reissue any commands that failed and contact Oracle Support
//          Services with the error information.
[oracle@node1 ~]$ oerr ora 494
00494, 00000, "enqueue%s held for too long%s by '%s'"
// *Cause:  The specified process did not release the enqueue within
//          the maximum allowed time.
// *Action: Reissue any commands that failed and contact Oracle Support
//          Services with the incident information.
ORA-00240: control file enqueue held for more than 120 seconds
ORA-00494: enqueue [CF] held for too long (more than 900 seconds)

dbw0进程持有control file enqueue超过了默认的900s,导致系统kill掉持有这个enqueue的会话,然后数据库自动终止。现在问题定位在control file enqueue的持有时间为什么会超时(ORA-00240)上面

五、深入理解ORA-00240

ORA-00240: control file enqueue held for more than 120 seconds
The Developers confirmed that this is just a warning to let the DBA know that a CF enqueue is being held for more than 120 seconds. 
This is not an error, and error will occur if a CF enqueue is held for more than 900 seconds (15 minutes) and this is not the case here.
The message occurs when there are many datafiles in the database. DBWriter (dbw0) is taking too much time to release the CF enqueue due to having to open these datafiles.

从mos的这几句中,可以分析出来,是因为dbw0在open数据文件的过程中占用了太多时间的CF enqueue。这说明和存储或者asm有关系

六、继续分析日志
分析日志,找出出现ORA-00240和ORA-00494之前出现过什么错误

Stopping Job queue slave processes, flags = 7
Thu Nov 10 17:52:26 2011
Job queue slave processes stopped
Thu Nov 10 17:54:23 2011
Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_asmb_26101.trc:
ORA-00600: internal error code, arguments: [kffmAllocate_1], [8], [2], [], [], [], [], []
Thu Nov 10 17:54:24 2011
Errors in file /opt/oracle/oracle1/admin/gminer/bdump/gminer_asmb_26101.trc:
ORA-00600: internal error code, arguments: [kffmAllocate_1], [8], [2], [], [], [], [], []
Thu Nov 10 17:54:24 2011
ASMB: terminating instance due to error 486
Termination issued to instance processes. Waiting for the processes to exit
Instance terminated by ASMB, pid = 26101

通过这个,发现和asm有关系,更加确定了自己的猜测

七、分析asm日志

Fri Nov 11 09:44:42 2011
Errors in file /opt/oracle/oracle1/admin/+ASM/bdump/+asm_gmon_24008.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: 4
Additional information: 2044
Additional information: -1

+asm_gmon_24008.trc中内容也就报这些错误,不再贴出

八、mos中查找ORA-27091和ORA-27072
Bug 8236874 – HP-Itanium: Intermittent ORA-27091, ORA-27072, Additional information: 4 on HPUX中有类此描述

  ORA-01115: IO error reading block from file fff (block # bbb) 
  ORA-01110: data file 5: '/dev/xxx'
  ORA-27091: unable to queue I/O 
  ORA-27072: File I/O error 
  Additional information: 4  
  Additional information: xxx
  Additional information: yyy

问题很类此,进一步确信是asm的问题

八、在查询mos中ORA-00240和asm相关主题
发现HP Itanium – ORA-240 or process on ASM & Database hang [ID 1105825.1],阅读完毕,马上查去阅读gminer_ora_3555.trc文件
果然发现和mos中一致的描述

*** SESSION ID:(40.1) 2011-11-11 09:36:59.860
Waited for detached process: DBW0 for 300 seconds:
*** 2011-11-11 09:36:59.860
Dumping diagnostic information for DBW0:
OS pid = 3122
loadavg : 0.00 0.01 0.01
Swapinfo : 
	Avail = 124485.08Mb Used = 62062.20Mb
	Swap free = 62422.88Mb Kernel rsvd = 4492.72Mb
	Free Mem  = 60374.91Mb 
  F S      UID   PID  PPID  C PRI NI             ADDR   SZ            WCHAN    STIME TTY       TIME COMD
1401 S   oracle  3122     1  0 154 20 e00000090de8b980 55372 e0000009203b1340 09:19:12 ?         0:06 ora_dbw0_gminer
Attaching to program: /opt/oracle/oracle1/bin/oracle, process 3122
warning: The shared libraries were not privately mapped; setting a
breakpoint in a shared library will not work until you rerun the program;
stepping over longjmp calls will not work as expected.
Please set the kernel variable "shlib_debug_enable" to 1 to enable the shared library debugging
warning: Load module /opt/oracle/oracle1/lib/libskgxp10.so has been stripped.  
Debugging information is not available.
warning: Load module /opt/oracle/oracle1/lib/libhasgen10.so has been stripped.  
Debugging information is not available.
warning: Load module /opt/oracle/oracle1/lib/libocr10.so has been stripped.  
Debugging information is not available.
warning: Load module /opt/oracle/oracle1/lib/libocrb10.so has been stripped.  
Debugging information is not available.
warning: Load module /opt/oracle/oracle1/lib/libocrutl10.so has been stripped.  
Debugging information is not available.
warning: Load module /opt/oracle/oracle1/lib/libdbcfg10.so has been stripped.  
Debugging information is not available.
warning: Load module /opt/oracle/oracle1/lib/libnnz10.so has been stripped.  
Debugging information is not available.
0xc00000000043e3f0:0 in pw_wait+0x30 () from /usr/lib/hpux64/libc.so.1
(gdb) (gdb) #0  0xc00000000043e3f0:0 in pw_wait+0x30 () from /usr/lib/hpux64/libc.so.1
#1  0x40000000094e5ee0:0 in pw_wait () at sskgp.c:2245
#2  0x4000000002cb36c0:0 in sskgpwwait () at sskgp.c:926
#3  0x4000000002f16440:0 in skgpwwait () at skgp.c:2137
#4  0x4000000002e28770:0 in ksliwat () at ksl.c:7428
#5  0x400000000402a7f0:0 in kslwaitns_timed () at ksl.c:7652
#6  0x4000000002d3de40:0 in kskthbwt () at ksk.c:2263
#7  0x4000000002d3c940:0 in kslwait () at ksl.c:7616
#8  0x40000000041780e0:0 in ksvsubmit () at ksv.c:1975
#9  0x4000000004d5b710:0 in kfncSlaveSubmit () at kfnc.c:2913
#10 0x4000000004d63120:0 in kfncFileIdentify () at kfnc.c:1638
#11 0x4000000004cfb890:0 in kfioIdentify () at kfio.c:1431
#12 0x4000000004137c20:0 in ksfd_osmopn () at ksfd.c:16583
#13 0x40000000044c9850:0 in $cold_ksfdopn+0xc0 () at kfio.c:1654
#14 0x4000000002630aa0:0 in kcfbid () at kcf.c:13447
#15 0x40000000025b72f0:0 in kcfida () at kcf.c:13599
#16 0x4000000002d3bc80:0 in ksbabs () at ksb.c:1390
#17 0x4000000003aed5f0:0 in ksbrdp () at ksb.c:2971
#18 0x4000000002807f50:0 in opirip () at opirip.c:283
#19 0x400000000200f680:0 in $cold_opidrv+0x580 ()
#20 0x400000000277edf0:0 in sou2o () at sou2o.c:86
#21 0x4000000001fff020:0 in $cold_opimai_real+0x280 ()
#22 0x400000000269c5e0:0 in main () at opimai.c:173
(gdb) Detaching from program: /opt/oracle/oracle1/bin/oracle, process 3122

九、HP官网描述

Issue
Any questions on the use of software developed and maintained by Oracle should involve Oracle technical support. Oracle, not HP, supports Oracle software.
All ITRC HP-UX version 11.31 March 2009 patch bundles were applied to two HP Integrity rx3600-based Oracle RAC nodes. After this installation, the crsd.bin process from the Oracle RAC Clustering software seemed to hang at random intervals, causing a cluster check process launched by a third party to wait indefinitely and to eventually consume all available RAM memory.
A situation where crsd.bin hangs may cause RAC to issue a Transfer of Control (TOC), that is, a system restart with crash dump.

Solution
If any patches related to Oracle RAC are included in a patch bundle (whether it is HP-UX specific patching that RAC depends upon or vice versa), RAC requires that the binaries be relinked prior to starting them.
An alternative might be to (temporarily) uninstall the patch bundle.
The relink of the binaries should be done by a DBA, they should know how to perform that action.
In the current situation, relinking the Oracle RAC binaries resolved the issue.

NOTE:
It seems this problem can also be caused by the installation of HP patches PHKL_38762 or PHKL_39145. QXCR1000940361 tracked the problem. This has been fixed now, and the problem is not present anymore when installing the superseded patch PHKL_40208.
The problem of QXCR1000940361 has existed before the fix of PHKL_39145 was introduced so we dont set a patch warning.

十、解决方案
问题定位完毕是因为缺少打HP patches PHKL_38762 or PHKL_39145的补丁导致
根据HP官网的建议,打HP patches PHKL_38762 or PHKL_39145解决这个问题,也可以打上PHKL_40208补丁问题解决

十一、补充说明
1、因为系统以前可以运行,那么我猜测这个bug应该有偶然性,让网友重启asm,然后启动数据库正常
2、在处理问题过程中,因缺少耐心,开始没有仔细阅读trace文件,导致问题分析过程中走了一些弯路
3、这个是我第一次发现因为系统补丁没有打,导致数据库异常