rman通过nfs备份

挂载nfs

[root@xifenfei tmp]# mount -t nfs 192.168.1.90:/tmp/nfs /nfs
[root@xifenfei tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              18G   12G  5.2G  70% /
tmpfs                 737M     0  737M   0% /dev/shm
/dev/sdb1              20G  7.8G   11G  42% /u01/oracle/oradata
192.168.1.90:/tmp/nfs
                       18G   13G  3.9G  77% /nfs

rman备份

[oracle@xifenfei nfs]$ rman target /

Recovery Manager: Release 10.2.0.1.0 - Production on Wed May 30 16:31:40 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: XFF (DBID=3426707456)

RMAN> backup datafile 1 format '/nfs/rman/system01_%U';

Starting backup at 3-JUN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=146 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 05/30/2012 16:32:17
ORA-19602: cannot backup or copy active file in NOARCHIVELOG mode
continuing other job steps, job failed will not be re-run
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: starting piece 1 at 3-JUN-12
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 05/30/2012 16:32:20
ORA-19504: failed to create file "/nfs/rman/system01_02nc9rgh_1_1"
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 3

重新挂载nfs

[root@xifenfei ~]# umount /nfs
[root@xifenfei tmp]# mount -t nfs -o rw,bg,hard,rsize=32768,wsize=32768,
>vers=3,nointr,timeo=600,tcp 192.168.1.90:/tmp/nfs /nfs

rman重新备份

[oracle@xifenfei ~]$ rman target /

Recovery Manager: Release 10.2.0.1.0 - Production on Wed May 30 16:38:14 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: XFF (DBID=3426707456)

RMAN> backup datafile 1 format '/nfs/rman/system01_%U';

Starting backup at 3-JUN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=143 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/oracle/oradata/XFF/system01.dbf
channel ORA_DISK_1: starting piece 1 at 3-JUN-12
channel ORA_DISK_1: finished piece 1 at 3-JUN-12
piece handle=/nfs/rman/system01_07nc9rrv_1_1 tag=TAG20120530T163823 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:15
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: starting piece 1 at 3-JUN-12
channel ORA_DISK_1: finished piece 1 at 3-JUN-12
piece handle=/nfs/rman/system01_08nc9s07_1_1 tag=TAG20120530T163823 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 3-JUN-12

查看nfs源端文件

[root@xifenfei rman]# pwd
/tmp/nfs/rman
[root@xifenfei rman]# ls -l
total 378300
-rw-r-----  1 54321 54321 379846656 Jun  3 13:45 system01_07nc9rrv_1_1
-rw-r-----  1 54321 54321   7143424 Jun  3 13:45 system01_08nc9s07_1_1

要点说明
Mount Options for Oracle files when used with NFS on NAS devices [ID 359515.1]

kewastUnPackStats(): bad magic 1 (0x0000000036407DFA, 0)

alert日志错误kewastUnPackStats(): bad magic 1 (0x0000000036407DFA, 0)
该错误是出现在win 2008 + 11.2.0.1的数据库中

Sun Jun 03 01:00:05 2012
kewastUnPackStats(): bad magic 1 (0x0000000036407DFA, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DFA, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DD8, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DD8, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DD8, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DD8, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DF0, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DF0, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DD8, 0)
kewastUnPackStats(): bad magic 1 (0x0000000036407DD8, 0)
Sun Jun 03 02:00:18 2012
kewastUnPackStats(): bad magic 1 (0x0000000035387DD4, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DD4, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
kewastUnPackStats(): bad magic 1 (0x0000000035387DDC, 0)
Sun Jun 03 03:00:32 2012
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DD4, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DD4, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DDC, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DDC, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DDC, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DDC, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DD4, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DD4, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DD4, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DD4, 0)
kewastUnPackStats(): bad magic 1 (0x000000001D8F7DDC, 0)

错误原因:
出现该问题是 Bug:8967729/unpublished bug 8730312
Active Session History中某些数据(the moduile, action program, etc. information)未被正确存储,然后awr程序在整点(默认配置)的时候访问V$ACTIVE_SESSION_HISTORY 视图失败,从而出现了该错误.出现该问题可能导致ash和awr使用异常或不准确

问题解决:
1.在Oracle12c release and the 11.2.0.2 and higher patchsets中被修复
2.打上one-off Patch:8730312(one-off补丁没有经过oracle的严格测试,在打该补丁前可能需要进行测试)

ORA-39126: 在 KUPW$WORKER.PUT_DDLS [TABLE_STATISTICS] 中 Worker 发生意外致命错误

使用impdp导入数据报如下错误导致导入终止

处理对象类型 SCHEMA_EXPORT/TABLE/TRIGGER
处理对象类型 SCHEMA_EXPORT/TABLE/INDEX/FUNCTIONAL_AND_BITMAP/INDEX
处理对象类型 SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/FUNCTIONAL_AND_BITMAP/INDEX_STATISTICS
处理对象类型 SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
ORA-39126: 在 KUPW$WORKER.PUT_DDLS [TABLE_STATISTICS] 中 Worker 发生意外致命错误

ORA-06502: PL/SQL: 数字或值错误
LPX-00225: end-element tag "HIST_GRAM_LIST_ITEM" does not match start-element tag "EPVALUE"

ORA-06512: 在 "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: 在 "SYS.KUPW$WORKER", line 9001

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
26ABF4B0     20462  package body SYS.KUPW$WORKER
26ABF4B0      9028  package body SYS.KUPW$WORKER
26ABF4B0     16665  package body SYS.KUPW$WORKER
26ABF4B0      3956  package body SYS.KUPW$WORKER
26ABF4B0      9725  package body SYS.KUPW$WORKER
26ABF4B0      1775  package body SYS.KUPW$WORKER
290D454C         2  anonymous block

ORA-39097: 数据泵作业出现意外的错误 -1427
ORA-39065: DISPATCH 中出现意外的主进程异常错误
ORA-01427: 单行子查询返回多个行

作业 "EAS"."SYS_IMPORT_SCHEMA_01" 因致命错误于 15:21:20 停止

从这里可以看出是在执行TABLE_STATISTICS的时候因为EPVALUE列的数据类型和导入数据不匹配,问题发生上面错误,导致impdp job终止.

解决办法
参考文档:[ID 878626.1]
1.如果数据已经expdp导出,建议在导入的时候屏蔽掉统计信息导入EXCLUDE=STATISTICS,导入后使用DBMS_STATS 重新收集统计信息
2.如果数据尚未expdp导出,建议在导出的时候屏蔽掉统计信息导出EXCLUDE=STATISTICS导入后使用DBMS_STATS 重新收集统计信息

undo坏块导致数据库异常终止案例

在处理的众多undo问题的数据库中,这个是第一例遇到因为undo出现坏块导致数据库自动down案例(oracle 9.2.0.8 aix)
undo坏块导致数据库down

Fri Jun  1 00:45:13 2012
Successfully onlined Undo Tablespace 1.
Fri Jun  1 00:45:13 2012
SMON: enabling tx recovery
Fri Jun  1 00:45:13 2012
Database Characterset is ZHS16GBK
Fri Jun  1 00:45:13 2012
SMON: about to recover undo segment 52
SMON: about to recover undo segment 52
SMON: mark undo segment 52 as available
SMON: Restarting fast_start parallel rollback
SMON: about to recover undo segment 52
SMON: mark undo segment 52 as available
SMON: ignoring slave err,downgrading to serial rollback
SMON: about to recover undo segment 52
ORACLE Instance acc1 (pid = 9) - Error 1578 encountered while recovering transaction (52, 29).
Fri Jun  1 00:45:14 2012
Errors in file /oraacc/app/admin/acc/bdump/acc1_smon_734734.trc:
ORA-01578: ORACLE data block corrupted (file # 169, block # 55887)
ORA-01110: data file 169: '/dev/raccount07_01lv'
Fri Jun  1 00:45:15 2012
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open
Fri Jun  1 00:45:16 2012
Errors in file /oraacc/app/admin/acc/bdump/acc1_pmon_766940.trc:
ORA-00474: SMON process terminated with error
Fri Jun  1 00:45:16 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 766940

这里可以看出数据库因为回滚段52出现坏块导致回滚的时候smon终止,数据库down

检测坏块

$ dbv file='/dev/raccount07_01lv' blocksize=8192

DBVERIFY: Release 9.2.0.8.0 - Production on Fri Jun 1 01:25:10 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/raccount07_01lv

DBV-00200: Block, dba 708893135, already marked corrupted

DBV-00200: Block, dba 708893151, already marked corrupted

DBV-00200: Block, dba 708893263, already marked corrupted


DBVERIFY - Verification complete

Total Pages Examined         : 1048320
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1048320
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 0
Total Pages Marked Corrupt   : 3
Total Pages Influx           : 0
Highest block SCN            : 12843497215383 (2990.1545000343)

检测对应文件号/数据块号

SQL> select DBMS_UTILITY.data_block_address_file(708893135) file#,
  2  DBMS_UTILITY.data_block_address_block(708893135) block#  from dual;

     FILE#     BLOCK#
---------- ----------
       169      55759

SQL> select DBMS_UTILITY.data_block_address_file(708893151) file#,
  2  DBMS_UTILITY.data_block_address_block(708893151) block#  from dual;

     FILE#     BLOCK#
---------- ----------
       169      55775

SQL> select DBMS_UTILITY.data_block_address_file(708893263) file#,
  2  DBMS_UTILITY.data_block_address_block(708893263) block#  from dual;

     FILE#     BLOCK#
---------- ----------
       169      55887

解决办法

--隐含参数
_corrupted_rollback_segments= _SYSSMU52$

--open库后
alter session set "_smu_debug_mode"=4;
drop rollback segment "_SYSSMU52$";

连续两次REMOTE_LISTENER 设置为null导致pmon和listener异常

平台系统版本相关信息

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

SQL> show parameter cluster;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string      192.168.16.11

SQL> select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') "www.orasos.com" from dual;

www.orasos.com
-------------------
2012-05-30 11:07:12

pmon和监听负载
pmon和LISTENER进程负载均比较高

  PID     %CPU ResSize    Char Command    
10617230  72.9  143924 21014  ora_pmon_ahunicom1  
22675560  49.9  142000  1547  oracleahunicom1 (LOCAL=NO)
 5243206  30.6   49728  2579  /oracle10/app/product/db/10.2.0/bin/tnslsnr LISTENER -inherit

监听日志
每秒钟很多类此pmon注册监听信息

27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0
27-FEB-2012 14:12:04 * service_update * ahunicom1 * 0

通过这两点可以确定是因为pmon在不停的动态注册监听导致监听日志,pmon,listener进程异常

查询MOS[ID 982068.1]
问题原因

After altering the value of the parameter REMOTE_LISTENER, 
excessive CPU is seen for the TNS listener process (TNSLSNR) and the listener.log file grows rapidly. 
Alert log confirms the REMOTE_LISTENER parameter in the SPFILE was altered.
Listener.log shows continuous service_update triggered from PMON to the TNS listener, 100's per second. 
REMOTE_LISTENER had been set to null twice

查询alert日志,果真发现:
ALTER SYSTEM SET remote_listener='' SCOPE=BOTH SID='AHUNICOM1';
ALTER SYSTEM SET remote_listener='' SCOPE=BOTH;

解决方案

alter system set remote_listener = 'remote_rac' scope=memory sid = 'AHUNICOM1';
alter system set remote_listener = '' scope=memory sid = 'AHUNICOM1';
--然后重启节点,pmon和监听恢复正常