ALERT_QUE表重建方法

最近关注了下网络上,出现了很多AQ$_ALERT_QT_N的index SYS_IOT_TOP_NNNN坏块引起的数据库异常(主要是SYSAUX表空间),因为他们是IOT表和主键的关系,不能简单的rebuild.查询了一些资料,得到一些信息

ALERT_QUE表用途
The ALERT_QUE is used by the Grid Control and DB Control Management Agents to monitor server-generated alerts

ALERT_QUE表重建方法

--方法1
SQL> connect / as sysdba

SQL>alter system enable restricted session;

To drop server alert schema.

SQL>@$ORACLE_HOME/rdbms/admin/catnoalr.sql

To re-create tables, sequence, type and queue for server alert

SQL>@$ORACLE_HOME/rdbms/admin/dbmsslrt.sql
SQL>@$ORACLE_HOME/rdbms/admin/catalrt.sql

To recompile the invalid objects

SQL>@$ORACLE_HOME/rdbms/admin/utlrp.sql

SQL> alter system disable restricted session; 
 

--方法2
SQL> connect / as sysdba

SQL>alter system enable restricted session;

To drop server alert schema.

SQL>@$ORACLE_HOME/rdbms/admin/catnoalr.sql

Rerun catproc.sql

SQL>@$ORACLE_HOME/rdbms/admin/catproc.sql

SQL> alter system disable restricted session; 

补充说明

By running the script up, the queue tables will be recreated and the messages 
in the queue will be lost.

For 11g you can use catmwin.sql which has the steps to recreate the ALERT_QT. 
Alternatively, for 11g you can use the catproc.sql to recreate.  

If this option may leave DBSNMP.MGMT_BSLN_INTERNAL invalid. 
To validate the same run catsnmp.sql [NOTE:603289.1]

客户端版本导致ORA-00600[kssadd_stage: null parent]

有客户一台应用不能正常工作,报ORA-00600[kssadd_stage: null parent]错误,重启中间件后工作正常.
alert日志

ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
Tue Jul 17 14:57:37 2012
Trace dumping is performing id=[cdmp_20120717145742]
Tue Jul 17 14:57:39 2012
Errors in file /oracle/10g/admin/fdjdb/udump/fdjdb2_ora_307720.trc:
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
Tue Jul 17 14:57:45 2012
Errors in file /oracle/10g/admin/fdjdb/udump/fdjdb2_ora_357344.trc:
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []

trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/10g/db
System name:    AIX
Node name:      ora2
Release:        1
Version:        6
Machine:        00CCFD354C00
Instance name: fdjdb2
Redo thread mounted by this instance: 2
Oracle process number: 89
Unix process pid: 111068, image: oracle@ora2

*** ACTION NAME:() 2012-07-17 15:08:42.043
*** MODULE NAME:(gsrvr.exe) 2012-07-17 15:08:42.043
*** SERVICE NAME:(fdjdb) 2012-07-17 15:08:42.043
*** SESSION ID:(991.44140) 2012-07-17 15:08:42.043
*** 2012-07-17 15:08:42.043
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kssadd_stage: null parent], [], [], [], [], [], [], []
No current SQL statement being executed.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              40D1A9663F9E7ABB ?
                                                   6ED89E14D59386B5 ?
ksedmp+0290          bl       ksedst               104A2CDB0 ?
ksfdmp+0018          bl       03F2735C
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              11041D938 ? 700000220FFBFC8 ?
                                                   110000770 ? 7000004FDF0C700 ?
                                                   FFFFFFFFFFF89C0 ?
kssadd_stage+0080    bl       kgeasnmierr          110195490 ? 110450040 ?
                                                   104AC46B8 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 7000004F0FA9208 ?
kqreqa+0058          bl       kssadd_stage         105670038 ? 104CF7BA0 ?
                                                   000000000 ? 000000000 ?
kqrpre1+0850         bl       kqreqa               100203514 ? 1101A2B20 ?
kqrpre+001c          bl       kqrpre1              710000770 ? 000000009 ?
                                                   FFFFFFFFFFF9088 ?
                                                   28A4202200000000 ?
                                                   10012AEE4 ? FFFFFFFFFFF9080 ?
                                                   000000000 ? 11022A3E0 ?
opiosq0+009c         bl       kqrpre               000000000 ? 000000000 ?
                                                   000000000 ? 1101A2B20 ?
                                                   FFFFFFFFFFF9198 ? 1104B7C60 ?
                                                   FFFFFFFFFFF9458 ?
kpooprx+0168         bl       opiosq0              4A00000001 ? 000000001 ?
                                                   000000000 ? A40000000000FF ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB964 ?
                                                   FFFFFFFFFFFB680 ?
                                                   5000000050 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103A1AD8 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F971E8
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F96034
sou2o+0090           bl       opidrv               3C02D9A29C ? 4A006E298 ?
                                                   FFFFFFFFFFFF8A0 ?
opimai_real+01bc     bl       01F939B4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0070         bl       main                 000000000 ? 000000000 ?

--------------------- Binary Stack Dump ---------------------

咨询客户得知访问该数据库的是通过中间件(OCI)+10g Release 1 (10.1) for Windows访问数据库,然后查询MOS[ID 752149.1]发现stack trace
kssadd_stage <- kqreqa <- kqrpre1 <- kqrpre <- opiosq0 <- kpooprx <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main <- start 和客户端版本和访问环境都和unpublished Bug 4937225相似
处理建议
客户端升级到10.2.0.3或者更高版本

8i升级到9i出现ORA-07445[pevm_MOVC_i()+18]

一个朋友数据库从8i升级到9i后,出现ORA-07445[pevm_MOVC_i()+18]错误
alert日志ORA-07445[pevm_MOVC_i()+18]

Mon Jul 16 12:21:54 2012
Errors in file /oracle/admin/ora8/udump/ora8_ora_8938.trc:
ORA-07445: exception encountered: core dump [pevm_MOVC_i()+18] [SIGSEGV] [Address not mapped to object] [0x7] [] []

trace文件

--版本平台信息
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /oracle/product/9.2.0
System name:	Linux
Node name:	localhost.localdomain
Release:	2.6.18-194.el5PAE
Version:	#1 SMP Tue Mar 16 22:00:21 EDT 2010
Machine:	i686
Instance name: ora8
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 8938, image: oracle@localhost.localdomain (TNS V1-V3)

--trace信息
*** 2012-07-16 12:21:54.399
*** SESSION ID:(12.6) 2012-07-16 12:21:54.399
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x7, PC: [0x9bfac06, pevm_MOVC_i()+18]
Registers:
%eax: 0x00000000 %ebx: 0x00000025 %ecx: 0x00000000
%edx: 0xbf93bf50 %edi: 0x00000000 %esi: 0x002ff1d8
%esp: 0xbf93bc28 %ebp: 0xbf93bc60 %eip: 0x09bfac06
%efl: 0x00010296
  pevm_MOVC_i()+6 (0x9bfabfa) mov %edi,0xffffffcc(%ebp)
  pevm_MOVC_i()+9 (0x9bfabfd) mov %esi,0xffffffd0(%ebp)
  pevm_MOVC_i()+12 (0x9bfac00) mov %ebx,0xffffffc8(%ebp)
  pevm_MOVC_i()+15 (0x9bfac03) mov 0x14(%ebp),%eax
> pevm_MOVC_i()+18 (0x9bfac06) movb 0x7(%eax),%dl
  pevm_MOVC_i()+21 (0x9bfac09) mov $0x0,0xfffffff0(%ebp)
  pevm_MOVC_i()+28 (0x9bfac10) movb %dl,0xffffffe0(%ebp)
  pevm_MOVC_i()+31 (0x9bfac13) movb %dl,0xffffffe0(%ebp)
  pevm_MOVC_i()+34 (0x9bfac16) cmpb $0x1,%dl
*** 2012-07-16 12:21:54.407
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [pevm_MOVC_i()+18] [SIGSEGV] [Address not mapped to object] [0x7] [] []
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp()+269         call     ksedst()+0           1 ? 0 ? 0 ? 1 ? 64252C31 ?
                                                   6666006C ?
ssexhd()+1108        call     ksedmp()+0           3 ? 0 ? 0 ? 0 ? 0 ? 0 ?
pevm_MOVC_i()+18     signal   ssexhd()+0           B ? BF93B8BC ? BF93B93C ?
pfrrun()+8458        call     pevm_MOVC_i()+0      2FF19C ? 16 ? BE14650 ? 0 ?
pricar()+1277        call     pfrrun()+0           2FF19C ? 1 ? BF93CCFC ?
                                                   AD638A0 ? 2DFBAC ? 0 ?
pricbr()+427         call     pricar()+0           BF93DA88 ? BF93D084 ?
                                                   9BEAE0C ? 1 ? 0 ? 98C93728 ?
prient2()+598        call     pricbr()+0           BF93DA88 ? BF93D084 ? 0 ?
prient()+1438        call     prient2()+0          BF93DA88 ? BF93D084 ? 1 ?
                                                   BF93E4E0 ? 0 ?
kkxrpc()+347         call     prient()+0           BF93DA88 ? AD638A0 ?
                                                   BF93E534 ? 38 ? 1C8C997 ? 0 ?
kporpc()+138         call     kkxrpc()+0           4C ? F ? BF93E63C ?
opiodr()+5238        call     kjushutdown()+2671   4C ? F ? BF93E63C ?
ttcpip()+2124        call     opiodr()+0           4C ? F ? BF93E63C ? 0 ?
Cannot find symbol in /lib/libc.so.6.
opitsk()+1635        call     ttcpip()+0           AD638A0 ? 4C ? BF93E63C ? 0 ?
                                                   BF93EF14 ? BF93EF10 ?
opiino()+602         call     opitsk()+0           0 ? 0 ? AD638A0 ? BE01DE0 ?
                                                   103 ? 0 ?
opiodr()+5238        call     kjushutdown()+2671   3C ? 4 ? BF9402E0 ?
opidrv()+517         call     opiodr()+0           3C ? 4 ? BF9402E0 ? 0 ?
sou2o()+25           call     opidrv()+0           3C ? 4 ? BF9402E0 ?
main()+182           call     sou2o()+0            BF9402C4 ? 3C ? 4 ?
                                                   BF9402E0 ? 0 ? 0 ?
00125E9C             call     main()+0             2 ? BF940384 ? BF940390 ?
                                                   88A810 ? 0 ? 1 ?
 
---------------------Binary Stack Dump ---------------------

--进程信息
Process global information:
     process: 0x962ba0b8, call: 0x96342cd8, xact: (nil), curses: 0x962e4070, usrses: 0x962e4070
  ----------------------------------------
  SO: 0x962ba0b8, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=15, calls cur/top: 0x96342cd8/0x96342cd8, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 196 0 4
              last post received-location: kslpsr
              last process to post me: 962b7828 1 6
              last post sent: 0 0 15
              last post sent-location: ksasnd
              last process posted by me: 962b7828 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x962d9444
    O/S info: user: oracle, term: UNKNOWN, ospid: 8938
    OSD pid info: Unix process pid: 8938, image: oracle@localhost.localdomain (TNS V1-V3)
    ----------------------------------------
    SO: 0x962e4070, type: 4, owner: 0x962ba0b8, flag: INIT/-/-/0x00
    (session) trans: (nil), creator: 0x962ba0b8, flag: (8000041) USR/- BSY/-/-/-/-/-
              DID: 0001-000F-00000004, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 0, prv: 0, sql: (nil), psql: 0x98c3b858, user: 95/DDDD
    O/S info: user: mis, term: LANDERSVR3, ospid: 7904:3012, machine: XANDER\LANDERSVR3
              program: c:\orant\bin\f50run32.exe c:\forms\bas9010.fmx
    application name: c:\orant\bin\f50run32.exe c:\forms\bas9010.fmx, hash value=0
    last wait for 'db file sequential read' blocking sess=0x0 seq=1277 wait_time=11
                file#=1, block#=21b, blocks=1

1.该用户程序是从8i升级到9i之后产生该错误
2.报错的访问程序是FORM 5

解决方案
查询MOS[ID 273411.1]发现是因为FORM 5和9i不兼容导致该错误,ORACLE未给出解决方案,言外之意,如果FORM不能升级,那就只能把ORACLE重新降级到8i.

温馨提示
在做oracle数据库升级前,需要实现进行评估,测试,如果是oracle相关软件和oracle数据库结合紧密,升级前最好需要和ORACLE技术人员确认是否兼容.

using backup controlfile 两种使用情况区别

使用备份控制文件和重建控制文件恢复,都需要使用到using backup controlfile命令,但是两种情况下却有着本质的区别
试验准备条件

SQL> select * from v$version;

BANNER
-------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select to_char(sysdate,'yyyy-mm-dd')"www.orasos.com" from dual;

www.orasos.com
--------------------
2012-07-13

SQL> alter database open resetlogs;

Database altered.

SQL> select name from v$controlfile;

NAME
----------------------------------------------------
/u01/oracle/oradata/ora11g/control01.ctl

SQL> !cp /u01/oracle/oradata/ora11g/control01.ctl /tmp/xff.ctl

SQL> alter system checkpoint;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> alter system switch logfile;     

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> alter system checkpoint;

System altered.

SQL> shutdown abort;    
ORACLE instance shut down.
SQL> !rm /u01/oracle/oradata/ora11g/control01.ctl

SQL> ! cp /tmp/xff.ctl /u01/oracle/oradata/ora11g/control01.ctl

SQL> startup
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             432014844 bytes
Database Buffers           83886080 bytes
Redo Buffers                5861376 bytes
Database mounted.
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/u01/oracle/oradata/ora11g/system01.dbf'
ORA-01207: file is more recent than control file - old control file

相关说明:
1.通过resetlogs使得试验更加清晰
2.通过多次的checkpoint实现增加scn,switch logfile实现日志组切换
3.通过模拟备份控制文件恢复

查询相关SCN

SQL> set linesize 150
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN",FUZZY
  3  from v$datafile_header;

     FILE# SCN                                RESETLOGS SCN                      FUZZY
---------- ---------------------------------- ---------------------------------- ------
         1           2118981                            2118577                  YES
         2           2118981                            2118577                  YES
         3           2118981                            2118577                  YES
         4           2118981                            2118577                  YES
         6           2118981                            2118577                  YES

SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;

     FILE# SCN                              STOP_SCN
---------- -------------------------------- --------------------------------
         1          2118580
         2          2118580
         3          2118580
         4          2118580
         6          2118580

SQL> select CONTROLFILE_CHANGE#   from v$database;

CONTROLFILE_CHANGE#
-------------------
            2118713

做关于控制文件和数据文件dump

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump file_hdrs 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_593.trc
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@xifenfei ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 03:05:48 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> oradebug setmypid;
Statement processed.
SQL>  oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_775.trc

分析file_hdrs 3 dump文件

--datafile 1的datafile header信息
Tablespace #0 - SYSTEM  rel_fn:1 
Creation   at   scn: 0x0000.00000007 09/18/2011 17:33:47
Backup taken at scn: 0x0000.00000000 01/01/1988 00:00:00 thread:0
 reset logs count:0x2eff82e3 scn: 0x0000.002053b1
 prev reset logs count:0x2e9e8451 scn: 0x0000.0016eaab
 recovered at 07/13/2012 02:56:18
 status:0x2004 root dba:0x00400208 chkpt cnt: 760 ctl cnt:759
begin-hot-backup file size: 0
Checkpointed at scn:  0x0000.00205545 07/13/2012 03:01:42

--datafile 1的控制文件中信息
DATA FILE #1: 
  name #7: /u01/oracle/oradata/ora11g/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:752 scn: 0x0000.002053b4 07/13/2012 02:59:18
 Stop scn: 0xffff.ffffffff 07/13/2012 02:58:43
 Creation Checkpointed at scn:  0x0000.00000007 09/18/2011 17:33:47

通过这里可以看出来:
datafile header的Checkpointed scn(00205545)>controfile datafile的Checkpoint scn(002053b4)
datafile header的checkpiont count(760)>controfile datafile的checkpiont count(752)
所以在数据库open的时候会报ORA-01207错误

尝试恢复数据库

SQL> recover database using backup controlfile;
ORA-00279: change 2118713 generated at 07/13/2012 02:58:43 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf
ORA-00280: change 2118713 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00308: cannot open archived log
'/u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3


SQL> 
SQL> select to_char(2118713,'xxxxxxx') from dual;

TO_CHAR(2118713,
----------------
  205439

分析controlf 3 dump文件

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 03/12/2012 22:17:06
 DB Name "ORA11G"
 Database flags = 0x00404000 0x00001000
 Controlfile Creation Timestamp  03/12/2012 22:17:07
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.002053b1 Resetlogs Timestamp  07/13/2012 02:58:43
 Prior resetlogs scn: 0x0000.0016eaab Prior resetlogs Timestamp  05/01/2012 13:14:57
 Redo Version: compatible=0xb200000
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.002053b4
 Threads: #Enabled=1, #Open=1, Head=1, Tail=1
 Max log members = 3, Max data members = 1
 Arch list: Head=0, Tail=0, Force scn: 0x0000.00000000scn: 0x0000.00000000
 Activation ID: 4184707968
 Controlfile Checkpointed at scn:  0x0000.00205439 07/13/2012 02:59:25 <==控制文件checkpiont,控制文件每3秒的一次checkpiont
 thread:0 rba:(0x0.0.0)

通过recover的提示和对于controlf 3 dump文件的分析,可以确定数据库使用备份控制文件恢复,需要改控制文件备份之时开始的所有归档日志

查询数据库当前redo情况

SQL> select member from v$logfile;

MEMBER
-------------------------------------------------------------
/u01/oracle/oradata/ora11g/redo03.log
/u01/oracle/oradata/ora11g/redo02.log
/u01/oracle/oradata/ora11g/redo01.log

SQL> select SEQUENCE#,STATUS,FIRST_CHANGE# ,NEXT_CHANGE# from v$log; 

 SEQUENCE# STATUS                           FIRST_CHANGE# NEXT_CHANGE#
---------- -------------------------------- ------------- ------------
         1 CURRENT                                2118577   2.8147E+14
         0 UNUSED                                       0            0
         0 UNUSED                                       0            0

SQL> archive log list;
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /u01/oracle/oradata/archivelog/ora11g
Oldest online log sequence     1
Current log sequence           1

SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo03.log';

System altered.

SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo02.log';

System altered.

SQL> alter system dump logfile '/u01/oracle/oradata/ora11g/redo01.log';

System altered.

因为数据库处于非归档模式,而得到的redo信息主要都是来自控制文件,所以只能通过dump redo来分析当前redo的情况

分析redo log dump

DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo03.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7660=0x1dec, File size=30720=0x7800
	File Number=3, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000003, SCN 0x00000020553d-0x000000205540"
 thread: 1 nab: 0x2 seq: 0x00000003 hws: 0x2 eot: 0 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.0020553d (2118973) 07/13/2012 03:01:34
 Next scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.0020553d (2118973) 07/13/2012 03:01:34
 Disk cksum: 0xa716 Calc cksum: 0xa716


DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo02.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7663=0x1def, File size=30720=0x7800
	File Number=2, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000005, SCN 0x000000205543-0xffffffffffff"
 thread: 1 nab: 0xffffffff seq: 0x00000005 hws: 0x1 eot: 1 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Disk cksum: 0xc3f9 Calc cksum: 0xc3f9
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000


DUMP OF REDO FROM FILE '/u01/oracle/oradata/ora11g/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
	Compatibility Vsn = 186646528=0xb200000
	Db ID=4173966754=0xf8c9ada2, Db Name='ORA11G'
	Activation ID=4184707968=0xf96d9380
	Control Seq=7663=0x1def, File size=30720=0x7800
	File Number=1, Blksiz=512, File Type=2 LOG
 descrip:"Thread 0001, Seq# 0000000004, SCN 0x000000205540-0x000000205543"
 thread: 1 nab: 0x2 seq: 0x00000004 hws: 0x2 eot: 0 dis: 0
 resetlogs count: 0x2eff82e3 scn: 0x0000.002053b1 (2118577)
 prev resetlogs count: 0x2e9e8451 scn: 0x0000.0016eaab (1501867)
 Low  scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Next scn: 0x0000.00205543 (2118979) 07/13/2012 03:01:36
 Enabled scn: 0x0000.002053b1 (2118577) 07/13/2012 02:58:43
 Thread closed scn: 0x0000.00205540 (2118976) 07/13/2012 03:01:35
 Disk cksum: 0xaa26 Calc cksum: 0xaa26
 Terminal recovery stop scn: 0x0000.00000000
 Terminal recovery  01/01/1988 00:00:00
 Most recent redo scn: 0x0000.00000000

通过对redo dump的分析可以得到:
1.最小的sequence#=3是redo03.log
2.current redo为redo02.log

继续尝试恢复

SQL> recover database using backup controlfile;
ORA-00279: change 2118713 generated at 07/13/2012 02:58:43 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_1_788497123.dbf
ORA-00280: change 2118713 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
ORA-00326: log begins at change 2118979, need earlier change 2118713
ORA-00334: archived log: '/u01/oracle/oradata/ora11g/redo02.log'

关于重建控制文件后使用using backup controlfile总结:恢复的启动控制文件备份的scn,需要该控制文件备份后的所有归档日志.对于当前这个非归档,而且redo被覆盖的库,该方法无法正常恢复

重建控制文件并做controlf 3 dump

SQL> alter database backup controlfile to trace as '/tmp/ctl.trace';

Database altered.

SQL> shutdown immediate
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "ORA11G" NORESETLOGS  NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/u01/oracle/oradata/ora11g/redo01.log'  SIZE 15M BLOCKSIZE 512,
  GROUP 2 '/u01/oracle/oradata/ora11g/redo02.log'  SIZE 15M BLOCKSIZE 512,
  GROUP 3 '/u01/oracle/oradata/ora11g/redo03.log'  SIZE 15M BLOCKSIZE 512
DATAFILE
  '/u01/oracle/oradata/ora11g/system01.dbf',
  '/u01/oracle/oradata/ora11g/sysaux01.dbf',
  '/u01/oracle/oradata/ora11g/undotbs01.dbf',
  '/u01/oracle/oradata/ora11g/users01.dbf',
  '/u01/oracle/oradata/ora11g/xifenfei02.dbf'
CHARACTER SET AL32UTF8
;
ORACLE instance started.

Total System Global Area  523108352 bytes
Fixed Size                  1346052 bytes
Variable Size             432014844 bytes
Database Buffers           83886080 bytes
Redo Buffers                5861376 bytes
SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18  
Control file created.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_1867.trc

尝试数据库恢复

SQL>  recover database using backup controlfile;
ORA-00279: change 2118981 generated at 07/13/2012 03:01:42 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf
ORA-00280: change 2118981 for thread 1 is in sequence #5


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00308: cannot open archived log
'/u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3


SQL> 
SQL> select to_char(2118981,'xxxxxxx') from dual;

TO_CHAR(2118981,
----------------
  205545

SQL> set linesize 150
SQL> select file#,to_char(checkpoint_change#,'9999999999999999') "SCN",
  2  to_char(RESETLOGS_CHANGE#,'9999999999999999') "RESETLOGS SCN",FUZZY
  3  from v$datafile_header;

     FILE# SCN                                RESETLOGS SCN                      FUZZY
---------- ---------------------------------- ---------------------------------- ------
         1           2118981                            2118577                  YES
         2           2118981                            2118577                  YES
         3           2118981                            2118577                  YES
         4           2118981                            2118577                  YES
         6           2118981                            2118577                  YES

SQL> select file#,to_char(checkpoint_change#,'999999999999999') "SCN",
  2  to_char(last_change#,'999999999999999')"STOP_SCN" from v$datafile;

     FILE# SCN                              STOP_SCN
---------- -------------------------------- --------------------------------
         1          2118981
         2          2118981
         3          2118981
         4          2118981
         6          2118981

分析 controlf 3 dump文件

***************************************************************************
DATABASE ENTRY
***************************************************************************
 (size = 316, compat size = 316, section max = 1, section in-use = 1,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 1, numrecs = 1)
 07/13/2012 03:24:51
 DB Name "ORA11G"
 Database flags = 0x00400102 0x00001000
 Controlfile Creation Timestamp  07/13/2012 03:24:51
 Incmplt recovery scn: 0x0000.00000000
 Resetlogs scn: 0x0000.002053b1 Resetlogs Timestamp  07/13/2012 02:58:43
 Prior resetlogs scn: 0x0000.0016eaab Prior resetlogs Timestamp  05/01/2012 13:14:57
 Redo Version: compatible=0xb200000
 #Data files = 5, #Online files = 5
 Database checkpoint: Thread=1 scn: 0x0000.00205543   

***************************************************************************
DATA FILE RECORDS
***************************************************************************
 (size = 520, compat size = 520, section max = 100, section in-use = 6,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1: 
  name #8: /u01/oracle/oradata/ora11g/system01.dbf
creation size=0 block size=8192 status=0x12 head=8 tail=8 dup=1
 tablespace 0, index=1 krfil=1 prev_file=0
 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
 Checkpoint cnt:760 scn: 0x0000.00205545 07/13/2012 03:01:42
 Stop scn: 0xffff.ffffffff 07/13/2012 03:24:51
 Creation Checkpointed at scn:  0x0000.00000007 09/18/2011 17:33:47

完成恢复

SQL> recover database using backup controlfile;
ORA-00279: change 2118981 generated at 07/13/2012 03:01:42 needed for thread 1
ORA-00289: suggestion : /u01/oracle/oradata/archivelog/ora11g/1_5_788497123.dbf
ORA-00280: change 2118981 for thread 1 is in sequence #5


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/oracle/oradata/ora11g/redo02.log
Log applied.
Media recovery complete.

关于重建控制文件后使用using backup总结:重建控制文件后,恢复的起点是datafile header scn 最小值,需要改scn之后的所有日志

未收集统计信息对象—执行sql动态采样

在一次ORA-7445导致oracle数据库down掉故障分析中,发现一条类似的sql非常大(通过复制到文档确定该sql大小是5M左右)

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) 
opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE 
*/ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("DCREDITMSG_00")
 FULL("DCREDITMSG_00") NO_PARALLEL_INDEX("DCREDITMSG_00") */ 1 AS C1, CASE WHEN
 "DCREDITMSG_00"."PHONE_NO"='具体电话号码' OR "DCREDITMSG_00"."PHONE_NO"='具体电话号码' OR 
……………………N多OR "DCREDITMSG_00"."PHONE_NO"='具体电话号码'
"DCREDITMSG_00"."PHONE_NO"='具体电话号码' THEN 1 ELSE 0 END AS C2 FROM 
"BSSADMIN"."DCREDITMSG_00" SAMPLE BLOCK (0.032410 , 1) SEED (1) "DCREDITMSG_00") SAMPLESUB

当时该sql因某种原因导致大量的sql area中很多内存泄露,最终导致数据库down掉.通过实验找出类此奇怪SQL.

创建模拟表

SQL> create table t_xifenfei 
  2  as
  3  select  * from dba_objects;

Table created.

SQL> select count(*) from t_xifenfei;

  COUNT(*)
----------
     74605

SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables 
  2  where  table_name='T_XIFENFEI' and owner='CHF';

  NUM_ROWS LAST_ANALYZE
---------- ------------

得出信息:
1.该表一共有记录数74605条
2.该表未收集统计信息

查看执行计划

SQL>  set autotrace trace exp
SQL> select /*+ dynamic_sampling(t 0) */ * from t_xifenfei t;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 88868 |    17M|   299   (2)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 88868 |    17M|   299   (2)| 00:00:04 |
--------------------------------------------------------------------------------
--通过hint指定动态采样sql相关对象统计信息,
可以看到我们实际的表记录是74605而数据库采样出来的记录为88868,原则上还是可以接受

SQL> select * from t_xifenfei;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 73449 |    14M|   298   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 73449 |    14M|   298   (1)| 00:00:04 |
--------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)
--因为没有统计信息,数据库动态采样sql相关对象统计信息
可以看到我们实际的表记录是74605而数据库采样出来的记录为73449,比手工指定采样准确

对自动采样进行10046跟踪

SQL> conn / as sysdba
Connected.
SQL> oradebug  setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select count(*) from CHF.t_xifenfei;

  COUNT(*)
----------
     74605

SQL>  oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_27967.trc

分析trace文件

*** 2012-07-12 15:42:34.991
WAIT #0:nam='SQL*Net message from client'ela= 56716427 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078954991525
=====================
PARSING IN CURSOR #3063864268 len=404 dep=1 uid=0 oct=3 lid=0 tim=1342078955037387 
hv=4184780033 ad='385d3708' sqlid='3gjvvxzwqxb81'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) 
opt_param('parallel_execution_enabled', 'false') 
NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM 
(SELECT /*+ NO_PARALLEL("T_XIFENFEI") FULL("T_XIFENFEI") NO_PARALLEL_INDEX("T_XIFENFEI") */ 1 AS C1, 
1 AS C2 FROM "CHF"."T_XIFENFEI" SAMPLE BLOCK (5.790441 , 1) SEED (1) "T_XIFENFEI") SAMPLESUB
END OF STMT
PARSE #3063864268:c=6000,e=5404,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=205916192,tim=1342078955037303
EXEC #3063864268:c=0,e=206,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=205916192,tim=1342078955037901
FETCH #3063864268:c=4998,e=4759,p=0,cr=65,cu=0,mis=0,r=1,dep=1,og=1,plh=205916192,tim=1342078955042730
STAT #3063864268 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=65 pr=0 pw=0 time=4795 us)'
STAT #3063864268 id=2 cnt=4253 pid=1 pos=1 obj=76370 op='TABLE ACCESS SAMPLE T_XIFENFEI 
(cr=65 pr=0 pw=0 time=8247 us cost=19 size=61752 card=5146)'
CLOSE #3063864268:c=0,e=7,dep=1,type=0,tim=1342078955043024
=====================
PARSING IN CURSOR #3063864784 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342078955043465 hv=2174183953 ad='3ed2d700' 
sqlid='fadutqq0tfuhj'
select count(*) from CHF.t_xifenfei
END OF STMT
PARSE #3063864784:c=51991,e=51648,p=0,cr=66,cu=0,mis=1,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043464
EXEC #3063864784:c=0,e=75,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2715729601,tim=1342078955043645
WAIT #3063864784: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342078955043705

通过这个trace的分析,果然发现在执行我们需要的sql前,执行了SELECT /* OPT_DYN_SAMP */组成的一个复杂的采样sql语句.

收集统计信息查看执行计划

SQL> EXEC DBMS_STATS.gather_table_stats('CHF','T_XIFENFEI');

PL/SQL procedure successfully completed.

SQL> select NUM_ROWS,LAST_ANALYZED from dba_tables 
  2  where  table_name='T_XIFENFEI' and owner='CHF';

  NUM_ROWS LAST_ANALYZE
---------- ------------
     74605 12-JUL-12

SQL>  set autotrace trace exp
SQL> select * from t_xifenfei;

Execution Plan
----------------------------------------------------------
Plan hash value: 548923532

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 74605 |  7139K|   298   (1)| 00:00:04 |
|   1 |  TABLE ACCESS FULL| T_XIFENFEI | 74605 |  7139K|   298   (1)| 00:00:04 |
--------------------------------------------------------------------------------
--执行计划未提示dynamic sampling

继续做10046

SQL> conn / as sysdba
Connected.
SQL> oradebug  setmypid
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> select count(*) from CHF.t_xifenfei;

  COUNT(*)
----------
     74605

SQL> oradebug TRACEFILE_NAME
/u01/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_29780.trc

分析trace文件

*** 2012-07-12 16:14:53.914
Oradebug command 'EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12' console output: <none>
WAIT #0: nam='SQL*Net message to client' ela= 8 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080893914307

*** 2012-07-12 16:14:59.376
WAIT #0: nam='SQL*Net message from client' ela= 5461608 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1342080899376008
=====================
PARSING IN CURSOR #3063709248 len=35 dep=0 uid=0 oct=3 lid=0 tim=1342080899379562 hv=2174183953 
ad='3ed2d700' sqlid='fadutqq0tfuhj'
select count(*) from CHF.t_xifenfei
END OF STMT

通过这里可以发现,当有了统计信息后,数据库不能再使用那条N多hint的sql去动态采样统计信息.

总结建议
动态采样(Dynamic Sampling)技术的最初提出是在Oracle 9i R2,在段(表,索引,分区)没有分析的情况下,为了使CBO 优化器得到足够的信息以保证做出正确的执行计划而发明的一种技术,可以把它看做分析手段的一种补充。当段对象没有统计信息时(即没有做分析),动态采样技术可以通过直接从需要分析的对象上收集数据块(采样)来获得CBO需要的统计信息。为了cbo,oracle引进了该功能,原则上说是一个很不错的东西,但是偶尔也是会出现一些意外,所以如果发现数据库中有表未做统计分析,建议手工处理下,ORACLE的自动收集统计信息程序也有不靠谱的时候(发现多次10g的库中有部分表未收集统计信息)