处理fast_recovery_area无剩余空间案例

一、打开数据库报错

[oracle@node1 ora11g]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Dec 19 15:42:04 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size                  2230072 bytes
Variable Size            1493174472 bytes
Database Buffers          637534208 bytes
Redo Buffers                4947968 bytes
SQL> alter database mount;

Database altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 19489
Session ID: 96 Serial number: 1

二、错误信息
1.alert日志内容

Mon Dec 19 15:23:29 2011
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Started redo scan
Completed redo scan
 read 0 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 3, block 93169, scn 12899730
Recovery of Online Redo Log: Thread 1 Group 3 Seq 3 Reading mem 0
  Mem# 0: /opt/oracle/oradata/ora11g/redo03.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 3, block 93169, scn 12919731
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Mon Dec 19 15:23:29 2011
LGWR: STARTING ARCH PROCESSES
Mon Dec 19 15:23:29 2011
ARC0 started with pid=27, OS id=19539 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Dec 19 15:23:30 2011
ARC1 started with pid=28, OS id=19548 
Mon Dec 19 15:23:30 2011
ARC2 started with pid=29, OS id=19550 
Mon Dec 19 15:23:30 2011
ARC3 started with pid=30, OS id=19552 
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_arc2_19550.trc:
ORA-19815: 警告: db_recovery_file_dest_size 字节 (共 4322230272 字节) 已使用 100.00%, 尚有 0 字节可用。
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.
************************************************************************
ARC2: Error 19809 Creating archive log file to '/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_2_%u_.arc'
Errors in file /opt/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19489.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 4322230272 bytes is 100.00% used, and has 0 remaining bytes available.
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.
************************************************************************
ARCH: Error 19809 Creating archive log file to '/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_1_%u_.arc'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance ora11g - Archival Error
ORA-16038: 日志 2 sequence# 2 无法归档
ORA-19809: 超出了恢复文件数的限制
ORA-00312: 联机日志 2 线程 1: '/opt/oracle/oradata/ora11g/redo02.log'
Errors in file /opt/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19489.trc:
ORA-16038: log 1 sequence# 1 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1: '/opt/oracle/oradata/ora11g/redo01.log'
USER (ospid: 19489): terminating the instance due to error 16038
Mon Dec 19 15:23:31 2011
System state dump requested by (instance=1, osid=19489), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_diag_19463.trc
Dumping diagnostic data in directory=[cdmp_20111219152331], requested by (instance=1, osid=19489), summary=[abnormal instance termination].
Instance terminated by USER, pid = 19489

2.trace文件信息(ora11g_ora_19489.trc)

*** 2011-12-19 15:23:31.026 4320 krsh.c
ARCH: Error 19809 Creating archive log file to '/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_1_%u_.arc'
*** 2011-12-19 15:23:31.026 2932 krsi.c
krsi_dst_fail: dest:1 err:19809 force:0 blast:1
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 1 thread 1: '/opt/oracle/oradata/ora11g/redo01.log'
ORA-16038: log 1 sequence# 1 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1: '/opt/oracle/oradata/ora11g/redo01.log'
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-kcfopd()+5962<-adbdrv()+51834<-opiexe()+18384<-opiosq0()+3870<-kpooprx()
+274<-kpoal8()+829<-opiodr()+916<-ttcpip()+2242<-opitsk()+1673<-opiino()+966<-opiodr()+916<-opidrv()+570
<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252 
----- End of Abridged Call Stack Trace -----

这个错误很明显:因为数据库归档日志放置在fast_recovery_area中,而空间已满,导致联机日志sequence# 2不能被归档,数据库无法打开

三、修改db_recovery_file_dest_size,打开数据库

SQL>  Alter system set db_recovery_file_dest_size=20G scope=both;

System altered.

SQL> alter database open;

Database altered.

设置较大db_recovery_file_dest_size,先打开数据库,再解决问题(减少down机时间是dba一大准则)

四、删除历史归档日志
从sequence# 为2中很明显看出来,数据库进行了resetlogs打开,所以前面的归档日志,在原则上已经无效(不再使用原始备份集恢复),因为数据库的控制文件中,无原归档日志信息,所以无法使用rman删除归档日志。那只能使用os命令先删除掉历史归档日志,然后再使用rman处理

[root@node1 archivelog]# find ./ -mtime +1| xargs rm -rf
[root@node1 archivelog]# ll
总计 72
drwxr-x--- 2 oracle oinstall 4096 12-18 22:35 2011_12_18
drwxr-x--- 2 oracle oinstall 4096 12-19 13:25 2011_12_19

RMAN> crosscheck archivelog all;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=68 device type=DISK
validation failed for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_10_31/o1_mf_1_5_7bxbhkof_.arc RECID=1 STAMP=766015219
validation failed for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_11_01/o1_mf_1_6_7bxw2gpo_.arc RECID=2 STAMP=766033231
……………………………………
validation failed for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_12/o1_mf_1_105_7gc3co97_.arc RECID=132 STAMP=770306728
validation failed for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_12/o1_mf_1_106_7gc3cv1w_.arc RECID=123 STAMP=770306728
validation failed for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_12/o1_mf_1_107_7gc3mbpr_.arc RECID=127 STAMP=770306728
validation succeeded for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_1_7gxtrlnq_.arc RECID=134 STAMP=770312597
validation succeeded for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_2_7gxtrloz_.arc RECID=135 STAMP=770312597
validation succeeded for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_3_7gxtrodg_.arc RECID=136 STAMP=770312599
validation failed for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_12/o1_mf_1_1_7gc3ojqw_.arc RECID=126 STAMP=770306728
……………………………………
validation succeeded for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_20_7gxlq29k_.arc RECID=113 STAMP=770306728
validation succeeded for archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_19/o1_mf_1_21_7gxl3zdm_.arc RECID=114 STAMP=770306728
Crosschecked 136 objects

RMAN> DELETE EXPIRED  archivelog all;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=68 device type=DISK
List of Archived Log Copies for database with db_unique_name ORA11G
……………………………………
Do you really want to delete the above objects (enter YES or NO)? yes
deleted archived log
……………………………………
deleted archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_16/o1_mf_1_14_7gpood3n_.arc RECID=115 STAMP=770306728
deleted archived log
archived log file name=/opt/oracle/fast_recovery_area/ORA11G/archivelog/2011_12_17/o1_mf_1_15_7gqhvvhh_.arc RECID=106 STAMP=770306727
Deleted 124 EXPIRED objects

五、补充说明
1.使用os命令删除fast_recovery_area内容后,需要使用crosscheck检测(如:archivelog all,backup等)。
2.然后使用 DELETE EXPIRED命令删除(archivelog all,backup等)
3.fast_recovery_area设置合适大小+合适的策略
4.resetlogs打开数据库后,做好备份
5.fast_recovery_area无剩余空间处理思路
5.1)如果数据库不能登录:重启至mount,增大fast_recovery_area,open数据库,然后使用rman删除历史垃圾数据(备份集,日志,闪回日志等)
5.2)如果数据库可以使用sys登录,增大fast_recovery_area(使其数据库可以正常工作),然后使用rman处理垃圾数据

使用dbms_backup_restore包恢复数据库

Oracle提供了一个包:DBMS_BACKUP_RESTORE.DBMS_BACKUP_RESTORE 包是由dbmsbkrs.sql 和 prvtbkrs.plb 这两个脚本创建的.catproc.sql 脚本运行后会调用这两个包.所以是每个数据库都有的这个包是Oracle服务器和操作系统之间IO操作的接口.由恢复管理器直接调用。而且据说这两个脚本的功能是内建到Oracle的一些库文件中的.
由此可见,我们可以在数据库 nomount 情况下调用这些package ,来达到我们的恢复目的
1、启动数据库到nomount状态

[oracle@node1 ora11g]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Mon Dec 19 13:34:22 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size                  2230072 bytes
Variable Size            1493174472 bytes
Database Buffers          637534208 bytes
Redo Buffers                4947968 bytes

2、恢复controlfile

SQL> DECLARE
  2  devtype varchar2(256);
  3  done boolean;
  4  BEGIN
  5  devtype:=sys.dbms_backup_restore.deviceAllocate (type=>'',ident=>'t1');
  6  sys.dbms_backup_restore.restoreSetDatafile;
  7  sys.dbms_backup_restore.restoreControlfileTo(cfname=>'/opt/oracle/oradata/ora11g/control01.ctl');
  8  sys.dbms_backup_restore.restoreBackupPiece(done=>done,handle=>'/tmp/ora11g_02mu1avd_1_1.rman', params=>null);
  9  sys.dbms_backup_restore.deviceDeallocate;
 10  END;
 11  /

PL/SQL procedure successfully completed.

3、恢复数据文件

SQL> DECLARE
  2  devtype varchar2(256);
  3  done boolean;
  4  BEGIN
  5  devtype:=sys.dbms_backup_restore.deviceAllocate (type=>'',ident=>'t1');
  6  sys.dbms_backup_restore.restoreSetDatafile;
  7  sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>1,toname=>'/opt/oracle/oradata/ora11g/system01.dbf');
sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>2,toname=>'/opt/oracle/oradata/ora11g/sysaux01.dbf');
sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>3,toname=>'/opt/oracle/oradata/ora11g/undotbs01.dbf');
  8    9   10  sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>4,toname=>'/opt/oracle/oradata/ora11g/users01.dbf');
 11  sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>5,toname=>'/opt/oracle/oradata/ora11g/example01.dbf');
 12  sys.dbms_backup_restore.restoreBackupPiece(done=>done,handle=>'/tmp/ora11g_01mu1aqq_1_1.rman', params=>null);
 13  sys.dbms_backup_restore.deviceDeallocate;
 14  END;
 15  /

PL/SQL procedure successfully completed.

4、启动数据库

SQL> alter database mount;
SQL> alter database recover until cancel using backup controlfile;
SQL> alter database open resetlogs;

5、特别说明
在oracle 10g及其以后版本中,因为rman中有catalog with start命令,可以实现rman备份的加载,所以不需要使用dbms_backup_restore包处理,在oracle 9i及其以前版本中,可能因为没有catalog库,控制文件中又没有了备份集信息,需要采用这种方法处理数据文件还原,然后根据实际情况,使用ALTER DATABASE REGISTER LOGFILE 添加日志,进行恢复

Memory Notification: Library Cache Object loaded into SGA

0.环境

--系统环境
[oracle@bas bdump]$ more /etc/redhat-release 
Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

--数据库版本
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

1.alert日志信息

Sun Dec 18 02:03:38 2011
Memory Notification: Library Cache Object loaded into SGA
Heap size 7607K exceeds notification threshold (2048K)
Details in trace file /opt/app/oracle/admin/BAS/udump/bas_ora_29900.trc

2.bas_ora_29900.trc文件信息

[oracle@bas bdump]$ more  /opt/app/oracle/admin/BAS/udump/bas_ora_29900.trc
/opt/app/oracle/admin/BAS/udump/bas_ora_29900.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      bas
Release:        2.6.9-78.ELsmp
Version:        #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine:        x86_64
Instance name: BAS
Redo thread mounted by this instance: 1
Oracle process number: 34
Unix process pid: 29900, image: oracle@bas (TNS V1-V3)

*** 2011-12-18 02:03:35.244
*** SERVICE NAME:(SYS$USERS) 2011-12-18 02:03:35.229
*** SESSION ID:(5465.1) 2011-12-18 02:03:35.229
Memory Notification: Library Cache Object loaded into SGA
Heap size 5249K exceeds notification threshold (2048K)
LIBRARY OBJECT HANDLE: handle=25d38a9d8 mutex=0x25d38ab08(0)

3.警告原因

These are warning messages that should not cause the program responsible for these errors to fail.  They appear as a result of new event messaging mechanism and memory manager in 10g Release 2.
The meaning is that the process is just spending a lot of time in finding free memory extents during an allocate as the memory may be heavily fragmented.  Fragmentation in memory is impossible to eliminate completely, however, continued messages of large allocations in memory indicate there are tuning opportunities on the application.  
The messages do not imply that an ORA-4031 is about to happen. 

从这里可以看出来,这个只是分配大的内存块(超过_kgl_large_heap_warning_threshold参数值)的一个警告信息,不会对系统的性能以及ORA-4031产生什么影响,如果不是很在意这个警告,可以忽略

4.解决方法

In 10g we have a new undocumented parameter that sets the KGL heap size warning threshold.   This parameter was not present in 10gR1.  Warnings are written if heap size exceeds this threshold.
    
Set  _kgl_large_heap_warning_threshold  to a reasonable high value or zero to prevent these warning messages. Value needs to be set in bytes. 

If you want to set this to 8192 (8192 * 1024) and are using an spfile: 

(logged in as "/ as sysdba")

SQL> alter system set "_kgl_large_heap_warning_threshold"=8388608 scope=spfile ; 

SQL> shutdown immediate 
SQL> startup 

If using an "old-style" init parameter, 

Edit the init parameter file and add 

_kgl_large_heap_warning_threshold=8388608

NOTE:  
1)The default threshold in 10.2.0.1 is 2M.   So these messages could show up frequently in some application environments.
2)In 10.2.0.2,  the threshold was increased to 50MB after regression tests, so this should be a reasonable and recommended value. 

参考MOS:330239.1

INSTEAD OF触发器实现视图DML操作

有网友询问了,一个多表关联视图,怎么实现dml操作,其实这个可以使用INSTEAD OF触发器实现
1、准备实验环境
创建两个表和一个视图(两表union关联),并各自插入一条模拟数据

C:\Users\XIFENFEI>sqlplus chf/xifenfei

SQL*Plus: Release 11.2.0.1.0 Production on 星期日 12月 18 10:57:05 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> CREATE TABLE XFF_T1 (ID NUMBER PRIMARY KEY, NAME VARCHAR2(30));

表已创建。

SQL> CREATE TABLE XFF_T2 (ID NUMBER PRIMARY KEY, NAME VARCHAR2(30));

表已创建。

SQL> CREATE VIEW V_XFF_T AS SELECT * FROM XFF_T1 UNION ALL SELECT * FROM XFF_T2;


视图已创建。

SQL> INSERT INTO XFF_T1 VALUES (1, 'XFF_T1');

已创建 1 行。

SQL> INSERT INTO XFF_T2 VALUES (2, 'XFF_T2');

已创建 1 行。

SQL> COMMIT;

提交完成。

SQL> SELECT * FROM V_XFF_T;

        ID NAME                           
---------- ------------------------------ 
         1 XFF_T1                         
         2 XFF_T2                         

2、尝试dml操作视图
插入、删除、更新dml操作全部失败

SQL> INSERT INTO V_XFF_T VALUES (3, 'V_XFF_T', );
INSERT INTO V_XFF_T VALUES (3, 'V_XFF_T')
            *
第 1 行出现错误:
ORA-01732: 此视图的数据操纵操作非法


SQL> DELETE FROM V_XFF_T WHERE ID=2;
DELETE FROM V_XFF_T WHERE ID=2
            *
第 1 行出现错误:
ORA-01732: 此视图的数据操纵操作非法


SQL> UPDATE V_XFF_T SET NAME='XIFENFEI' WHERE ID=3;
UPDATE V_XFF_T SET NAME='XIFENFEI' WHERE ID=2
       *
第 1 行出现错误:
ORA-01732: 此视图的数据操纵操作非法

3、创建INSTEAD OF触发器

SQL> CREATE OR REPLACE TRIGGER INSTEADOF_T
  2  INSTEAD OF INSERT OR UPDATE OR DELETE ON V_XFF_T
  3  FOR EACH ROW
  4  BEGIN
  5  IF INSERTING THEN
  6  INSERT INTO XFF_T2 VALUES (:NEW.ID, :NEW.NAME);
  7  ELSIF UPDATING THEN
  8  UPDATE XFF_T2 SET ID = :NEW.ID, NAME = :NEW.NAME
  9  WHERE ID = :OLD.ID;
 10  ELSIF DELETING THEN
 11  DELETE XFF_T2 WHERE ID = :OLD.ID;
 12  END IF;
 13  END;
 14  /

触发器已创建

4、重试dml操作
使用基于INSTEAD OF触发器实现了对复杂视图的dml操作

SQL> INSERT INTO V_XFF_T VALUES (3, 'V_XFF_T');

已创建 1 行。

SQL> SELECT * FROM V_XFF_T;

        ID NAME                           
---------- ------------------------------ 
         1 XFF_T1                         
         2 XFF_T2                         
         3 V_XFF_T                        

SQL> DELETE FROM V_XFF_T WHERE ID=2;

已删除 1 行。

SQL> SELECT * FROM V_XFF_T;

        ID NAME                           
---------- ------------------------------ 
         1 XFF_T1                         
         3 V_XFF_T                        

SQL> UPDATE V_XFF_T SET NAME='XIFENFEI' WHERE ID=3;

已更新 1 行。

SQL> SELECT * FROM V_XFF_T;

        ID NAME                           
---------- ----------------------------- 
         1 XFF_T1                         
         3 XIFENFEI    

SQL> COMMIT;

提交完成。                    

5、查询基表情况
因为编写的INSTEAD OF触发器是对XFF_T2表作用,所以所有关于该视图的操作,都映射到XFF_T2表中

SQL> SELECT * FROM XFF_T1;

        ID NAME                           
---------- ------------------------------ 
         1 XFF_T1                         

SQL> SELECT * FROM XFF_T2;

        ID NAME                           
---------- ------------------------------ 
         3 XIFENFEI                        

sysaux数据文件异常恢复

案例背景:在我接手这个库之前,因为某种原因sysaux表空间的数据文件离线,该库非归档模式,无备份
一、sysaux数据文件离线原因

Mon Jun  7 03:03:22 2010
KCF: write/open error block=0x67009 online=1
     file=3 /opt/app/oracle/oradata/BAS/sysaux01.dbf
     error=27072 txt: 'Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 421897
Additional information: -1'
Automatic datafile offline due to write error on
file 3: /opt/app/oracle/oradata/BAS/sysaux01.dbf

因为该数据库是非归档模式,估计以前的dba也是一段时间后发现sysaux被离线,因为不是归档模式,无法恢复,就一直放置着,让库处于这样的状态中。

二、sysaux数据文件online
1、使用bbed修改sysaux数据文件的scn,见:bbed 修改datafile header

2、尝试online过程日志如下

Sat Dec 17 19:33:36 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
Sat Dec 17 19:33:36 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:33:37 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 19:38:38 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
Sat Dec 17 19:38:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:38:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 19:39:47 2011
ALTER DATABASE RECOVER  datafile 3  
Sat Dec 17 19:39:47 2011
Media Recovery Start
 parallel recovery started with 7 processes
Sat Dec 17 19:39:47 2011
Recovery of Online Redo Log: Thread 1 Group 6 Seq 13545 Reading mem 0
  Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0602.log
  Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0601.log
Sat Dec 17 19:39:47 2011
Recovery of Online Redo Log: Thread 1 Group 7 Seq 13546 Reading mem 0
  Mem# 0 errs 0: /opt/app/oracle/oradata/BAS/redo0702.log
  Mem# 1 errs 0: /opt/app/oracle/oradata/BAS/redo0701.log
Sat Dec 17 19:39:47 2011
Media Recovery Complete (BAS)
Completed: ALTER DATABASE RECOVER  datafile 3  
Sat Dec 17 19:39:58 2011
alter database datafile 3 online
Sat Dec 17 19:39:58 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc:
ORA-01157: cannot identify/lock data file 3 - see DBWR trace file
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'
Sat Dec 17 19:39:58 2011
Completed: alter database datafile 3 online

这个过程虽然在sqlplus中提示online成功,但是alert中的错误警告,以及smon进程占用100%的cup资源,最终导致数据库hang住。

2、分析alert日志和trace文件

alert日志中
Sat Dec 17 19:38:38 2011
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.

bas_smon_27197.trc中
[oracle@bas bdump]$ more /opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc
/opt/app/oracle/admin/BAS/bdump/bas_smon_27197.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/app/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      bas
Release:        2.6.9-78.ELsmp
Version:        #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine:        x86_64
Instance name: BAS
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 27197, image: oracle@bas (SMON)

*** SERVICE NAME:() 2011-12-17 19:23:33.179
*** SESSION ID:(5490.1) 2011-12-17 19:23:33.179
SMON: about to recover undo segment 70
ORACLE Instance BAS (pid = 17) - Error 376 encountered while recovering transaction (70, 41) on object 8964.
*** 2011-12-17 19:23:33.188
ksedmp: internal or fatal error
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'

通过这些证明smon在利用undo segment 70在回滚sysaux中的内容,但是因为某种原因该回滚段异常,不能进行回滚,是的smon一直尝试回滚,但是始终不成功,最后数据库hang住,需要解决sysaux的问题,首先需要解决这个回滚段问题(删除异常回滚段)

3、删除异常回滚段,online datafile 3

强制kill掉smon进程,重启数据库
[oracle@bas bdump]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Dec 17 19:52:14 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 8589934592 bytes
Fixed Size                  2034520 bytes
Variable Size            1124074664 bytes
Database Buffers         7398752256 bytes
Redo Buffers               65073152 bytes
Database mounted.

--为了数据库不hang掉,先offline datafile 3
SQL> alter database datafile 3 offline;

Database altered.

SQL> select segment_name,status from dba_rollback_segs;
select segment_name,status from dba_rollback_segs
                                *
ERROR at line 1:
ORA-01219: database not open: queries allowed on fixed tables/views only


SQL> alter database open;

Database altered.

SQL> select segment_name,status from dba_rollback_segs where status='NEEDS RECOVERY';

SEGMENT_NAME                   STATUS
------------------------------ ----------------
_SYSSMU70$                     NEEDS RECOVERY

SQL> create pfile='/tmp/pfile' from spfile;

File created.

关闭数据库,在pfile中增加
*._corrupted_rollback_segments=(_SYSSMU70$)

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 8589934592 bytes
Fixed Size                  2034520 bytes
Variable Size            1124074664 bytes
Database Buffers         7398752256 bytes
Redo Buffers               65073152 bytes
Database mounted.
Database opened.
SQL> drop  rollback segment "_SYSSMU70$";

Rollback segment dropped.

SQL> alter database datafile 3 online;
alter database datafile 3 online
*
ERROR at line 1:
ORA-01113: file 3 needs media recovery
ORA-01110: data file 3: '/opt/app/oracle/oradata/BAS/sysaux01.dbf'


SQL> recover  datafile 3 ;
Media recovery complete.
SQL> alter database datafile 3 online;

Database altered.

三、解决坏块问题
1、alert日志中坏块记录

Sat Dec 17 20:33:31 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'
Sat Dec 17 20:33:54 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28027.trc:
Sat Dec 17 20:43:32 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_smon_27772.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 4571)
ORA-01110: data file 1: '/opt/app/oracle/oradata/BAS/system01.dbf'

2、查询坏块对象

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
 WHERE FILE_ID = &FILE_ID
   AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;  3    4  
Enter value for file_id: 1
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 1
Enter value for block_id: 4571
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 4571 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
                                           
OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
SYS
SMON_SCN_TIME_SCN_IDX
INDEX              SYSTEM

SQL> alter index SMON_SCN_TIME_SCN_IDX rebulid online;
alter index SMON_SCN_TIME_SCN_IDX rebulid online
                                  *
ERROR at line 1:
ORA-02243: invalid ALTER INDEX or ALTER MATERIALIZED VIEW option

SQL> select table_name from dba_indexes where index_name='SMON_SCN_TIME_SCN_IDX';

TABLE_NAME
------------------------------
SMON_SCN_TIME

3、解决坏块问题

SQL> truncate table SMON_SCN_TIME;
truncate table SMON_SCN_TIME
               *
ERROR at line 1:
ORA-03292: Table to be truncated is part of a cluster

SQL> truncate  cluster SMON_SCN_TIME;
truncate  cluster SMON_SCN_TIME
                  *
ERROR at line 1:
ORA-00943: cluster does not exist

SQL> SELECT dbms_metadata.get_ddl('TABLE','SMON_SCN_TIME','SYS') FROM dual;
 
DBMS_METADATA.GET_DDL('TABLE','SMON_SCN_TIME','SYS')
--------------------------------------------------------------------------------
 
  CREATE TABLE "SYS"."SMON_SCN_TIME"
   (    "THREAD" NUMBER,
        "TIME_MP" NUMBER,
        "TIME_DP" DATE,
        "SCN_WRP" NUMBER,
        "SCN_BAS" NUMBER,
        "NUM_MAPPINGS" NUMBER,
        "TIM_SCN_MAP" RAW(1200),
        "SCN" NUMBER DEFAULT 0,
        "ORIG_THREAD" NUMBER DEFAULT 0           /* for downgrade */
   ) CLUSTER "SYS"."SMON_SCN_TO_TIME" ("THREAD")

SQL> truncate cluster smon_scn_to_time;

Cluster truncated.

SQL> alter system flush buffer_cache;

System altered.

四、解决AUTO_SPACE_ADVISOR_JOB引起bug

Sat Dec 17 21:00:38 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:41 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:44 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:47 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:50 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_1], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:54 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:00:57 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:01:00 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []
Sat Dec 17 21:01:03 2011
Errors in file /opt/app/oracle/admin/BAS/bdump/bas_m000_28144.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ_3], [0], [0], [1], [], [], [], []

查看MOS发现(430223.1,785899.1)Segment advisor带来的buffer坏块,可以禁用AUTO_SPACE_ADVISOR_JOB并清空buffer来解决,最终解决办法,升级数据库

SQL> exec dbms_scheduler.disable('AUTO_SPACE_ADVISOR_JOB'); 

PL/SQL procedure successfully completed.

SQL> alter system flush buffer_cache;

System altered.

至此这次数据库sysaux数据文件异常恢复完全结束。再次提醒各位,数据库一定要做好备份和归档工作。