shell监控dataguard备库是否正常应用日志

一直在思索怎么去监控dg比较方便,又能够做到比较适用.想到了几种方法:
1.使用主备库两边的alert日志,但是这样的方法需要配置ssh,用来一个节点获取另外一个节点的alert日志
2.通过查询v$archived_log或者其他相关视图,然后主备库进行比较,但是这个需要访问另外一个库,需要另外库的登录信息
3.通过查询备库的v$archived_log视图,粗略评估dg是否工作正常.
这里我选择了3,dg的监控大部分时候是为了让人及时的发现日志应用异常,然后人工干预处理,从而减少修改gap或者重建dg的概率.而这个额监控可以在很大程度上发现dg应用归档日志异常,从而确定dg是否工作正常,如果发现工作异常,及时处理,可以减少很多工作量,甚至拯救你的数据.

#!/bin/bash
source ~/.bash_profile

#check time(M)
export CHECK_M=120
export RESULT_FILE=/tmp/dg_switch_check.log

$ORACLE_HOME/bin/sqlplus -silent "/ as sysdba" <<XFF>/tmp/check_dg.log
set pagesize 0 feedback off verify off heading off echo off
 select ceil((sysdate-next_time)*24*60) "M"
 from v\$archived_log 
 where applied='YES' AND SEQUENCE#=(SELECT MAX(SEQUENCE#) 
 FROM V\$ARCHIVED_LOG WHERE applied='YES');
exit;
XFF
GET_M=`cat /tmp/check_dg.log`
rm /tmp/check_dg.log

if [ ${CHECK_M} -lt ${GET_M} ]; 
then
    echo "check dataguard time:`date`">$RESULT_FILE
    echo "The last time application archivelog happened in $GET_M minutes ago">>$RESULT_FILE
else
 echo ''>$RESULT_FILE
fi

针对这样的脚本,根据你的dg归档切换的频率,设置监控dg的最近一次日志应用与当前时间差,然后判断dg是否工作正常.根据监控程序的特点,可以通过判断结果集文件,然后邮件/短信或者其他方式处理.

因v$archived_log视图记录异常导致dg MRP进程异常

版本信息

操作系统Linux 4.8 x86
数据库版本ORACLE 9.2.0.4

alert日志报错
MRP进程出现异常报ORA-00310/ORA-00334错误

…………
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_75.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_76.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_77.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_78.dbf
Media Recovery Log /u01/oracle/oradata/xifenfei/redo02.log
MRP0: Background Media Recovery terminated with error 310
Thu Nov  8 07:44:39 2012
Errors in file /u01/oracle/admin/lunar/bdump/lunar_mrp0_25625.trc:
ORA-00310: archived log contains sequence 85; sequence 79 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
Recovery interrupted.
MRP0: Background Media Recovery process shutdown

trace文件

*** SESSION ID:(17.13) 2012-11-08 07:24:12.986
Background Managed Standby Recovery process started
*** 2012-11-08 07:24:18.023
Managed Recovery: Active posted.
*** 2012-11-08 07:41:03.171
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_64.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_65.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_66.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_67.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_68.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_69.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_70.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_71.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_72.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_73.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_74.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_75.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_76.dbf
*** 2012-11-08 07:41:39.083
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_77.dbf
*** 2012-11-08 07:44:39.171
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_78.dbf
Media Recovery Log /u01/oracle/oradata/xifenfei/redo02.log
Background Media Recovery terminated with error 310
ORA-00310: archived log contains sequence 85; sequence 79 required
ORA-00334: archived log: '/u01/oracle/oradata/xifenfei/redo02.log'
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 21990Kb in 1221.38s => 0.02 Mb/sec
Longest record: 1Kb, moves: 0/92129 (0%)
Change moves: 34869/213735 (16%), moved: 2Mb
----------------------------------------------
*** 2012-11-08 07:44:39.404
Managed Recovery: Not Active posted.
Background Media Recovery process shutdown
*** 2012-11-08 07:44:39.406

猜想数据库恢复需要sequence为79的归档日志,但是该归档日志对应的为文件为redo02.log,而该redo02的seq为85所以使得MRP进程异常

使用rman尝试恢复

RMAN> recover database ;

Starting recover at 08-NOV-12
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=8 devtype=DISK

starting media recovery

unable to find archive log
archive log thread=1 sequence=79
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/08/2012 08:12:48
RMAN-06054: media recovery requesting unknown log: thread 1 scn 12286829427051

测试证明rman也无法正常的恢复该异常问题

验证猜想

SQL> select name from v$archived_log where SEQUENCE#=79;

NAME
-------------------------------------------------------------------
/u01/oracle/oradata/lunar/arch/1_79.dbf
/u01/oracle/oradata/xifenfei/redo02.log
/u01/oracle/oradata/lunar/arch/1_79.dbf

SQL> select dest_id,name from v$archived_log where SEQUENCE#=79;

   DEST_ID  NAME
----------  ---------------------------------------------------
         2  /u01/oracle/oradata/lunar/arch/1_79.dbf
         1  /u01/oracle/oradata/xifenfei/redo02.log
         1  /u01/oracle/oradata/lunar/arch/1_79.dbf

SQL> select sequence#,group# from v$log;

 SEQUENCE#     GROUP#
---------- ----------
        86          1
        85          2
        87          3

SQL> select member from v$logfile where group#=2;

MEMBER
---------------------------------------------------------------
/u01/oracle/oradata/xifenfei/redo02.log

通过查询上面相关视图,证实了猜想是因为redo log被注册进入了v$archived_log导致该故障,解决该问题的思路是把redo log file从备库控制文件的v$archived_log视图中拿掉.具体方法是:
1.如果主库正常,那直接生成standby controlfile来实现
2.如果主库也是相同情况,那么先重建主库控制文件,然后重建standby controlfile来实现(该方法需要维护窗口)

解决问题思路

--查询主库,确定主库正常
SQL> select name from v$archived_log where SEQUENCE#=79;

NAME
------------------------------------------------------------
/u01/oracle/oradata/xifenfei/archive/1_79.dbf
lunar

--重新创建standby controlfile
###############################################################
注:如果主库和备库的数据文件路径不完全一致,
建议通过设置db_file_name_convert来直接实现备库数据文件路径的转换,
而不建议通过alter database rename file来实现重命名
###############################################################
SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS 
   2 '/u01/oracle/oradata/lunar/control01.ctl' reuse;

Database altered.

--重新启动备库
SQL> startup nomount;
ORACLE instance started.

Total System Global Area  353441008 bytes
Fixed Size                   451824 bytes
Variable Size             184549376 bytes
Database Buffers          167772160 bytes
Redo Buffers                 667648 bytes

SQL> alter database mount standby database;

Database altered.

--开启日志应用
SQL> alter database recover managed standby database disconnect from session;

Database altered.

--alert日志
Thu Nov  8 08:28:16 2012
Completed: alter database recover managed standby database di
Thu Nov  8 08:28:20 2012
Restarting dead background process QMN0
QMN0 started with pid=13
Thu Nov  8 08:29:45 2012
Fetching gap sequence for thread 1, gap sequence 79-87
Trying FAL server: xifenfei
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_79.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_80.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_81.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_82.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_83.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_84.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_85.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_86.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_87.dbf
Media Recovery Log /u01/oracle/oradata/lunar/arch/1_88.dbf
Media Recovery Waiting for thread 1 seq# 89

使用dbms_pumpdata执行expdp操作

使用dbms_pumpdata执行expdp操作

set serverout on 
declare 
  h1 number; -- Datapump handle 
  dir_name varchar2(30); -- Directory Name 
  job_status VARCHAR2(30); 
begin 
  dir_name := 'DATA_PUMP_DIR'; 
  h1 := dbms_datapump.open(
  operation =>'EXPORT', --是export还是impport
--导出表配置
  job_mode =>'TABLE',  --job_mode可以为SCHEMA/TABLE等
--导出用户配置
  job_mode =>'SCHEMA',
  remote_link => NULL, --是否使用dblink导出(就是NETWORK_LINK)
  job_name =>'TABLE_XFF' --job_name expdpjob的名称
  ); 

  dbms_datapump.add_file(handle =>h1, 
                         filename => 'XIFENFEI.DMP', 
                         directory => dir_name, 
                         filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE, 
                         reusefile => 1); -- value of 1 instructs to overwrite existing file 

  dbms_datapump.add_file(handle =>h1, 
                         filename => 'XIFENFEI.LOG', 
                         directory => dir_name, 
                         filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE, 
                         reusefile => 1); 

--导出表配置
  dbms_datapump.metadata_filter(handle =>h1, 
                         name => 'TABLE_FILTER', 
                         value => 'CHF.T_XIFENFEI'); 

--导出用户配置
  dbms_datapump.metadata_filter (handle => dp_handle, 
                                 name => 'SCHEMA_EXPR', 
                                 value => 'IN (''CHF'')'); 
-- Start the job. 
  dbms_datapump.start_job(h1); 
  dbms_datapump.wait_for_job (handle => dp_handle, 
                              job_state => job_status); 
  dbms_output.put_line ('DataPump Export - '||to_char(sysdate,'DD/MM/YYYY HH24:MI:SS')||' Status '||job_status); 
  begin 
     dbms_datapump.detach(handle => h1); 
   end; 
end; 
/

awrload导入awr数据出现两种常见错误说明

ORA-06502/ORA-06512错误

SQL> @?/rdbms/admin/awrload.sql
~~~~~~~~~~
AWR LOAD
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  This script will load the AWR data from a dump file. The   ~
~  script will prompt users for the following information:    ~
~     (1) name of directory object                            ~
~     (2) name of dump file                                   ~
~     (3) staging schema name to load AWR data into           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Specify the Directory Name
~~~~~~~~~~~~~~~~~~~~~~~~~~

Directory Name                 Directory Path
------------------------------ -------------------------------------------------
DATA_PUMP_DIR                  /u01/oracle/admin/ora11g/dpdump/
ORACLE_OCM_CONFIG_DIR          /u01/oracle/oracle/product/11.2.0/db_1/ccr/state
XMLDIR                         /u01/oracle/oracle/product/11.2.0/db_1/rdbms/xml

Choose a Directory Name from the list above (case-sensitive).

Enter value for directory_name: DATA_PUMP_DIR

Using the dump directory: DATA_PUMP_DIR

Specify the Name of the Dump File to Load
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please specify the prefix of the dump file (.dmp) to load:

Enter value for file_name: awrdat_70441_70885-vpos-primary

Loading from the file name: awrdat_70441_70885-vpos-primary.dmp
begin
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at line 2

因为导出来的awr数据库的编码是AL32UTF8,而现在的编码是ZHS16GBK,所以解决方法是设置NLS_LANG为合适编码

[oracle@xifenfei ~]$ env|grep NLS_LANG
NLS_LANG=AMERICAN_AMERICA.ZHS16GBK
[oracle@xifenfei ~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

ORA-31640

SQL> @?/rdbms/admin/awrload.sql
~~~~~~~~~~
AWR LOAD
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~  This script will load the AWR data from a dump file. The   ~
~  script will prompt users for the following information:    ~
~     (1) name of directory object                            ~
~     (2) name of dump file                                   ~
~     (3) staging schema name to load AWR data into           ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Specify the Directory Name
~~~~~~~~~~~~~~~~~~~~~~~~~~

Directory Name                 Directory Path
------------------------------ -------------------------------------------------
DATA_PUMP_DIR                  /u01/oracle/admin/ora11g/dpdump/
ORACLE_OCM_CONFIG_DIR          /u01/oracle/oracle/product/11.2.0/db_1/ccr/state
XMLDIR                         /u01/oracle/oracle/product/11.2.0/db_1/rdbms/xml

Choose a Directory Name from the list above (case-sensitive).

Enter value for directory_name: DATA_PUMP_DIR

Using the dump directory: DATA_PUMP_DIR

Specify the Name of the Dump File to Load
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please specify the prefix of the dump file (.dmp) to load:

Enter value for file_name: awrdat_70441_70885-vpos-primary  <--注意输入

Loading from the file name: awrdat_70441_70885-vpos-primary.dmp

…………

| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|  Loading the AWR data from the following
|  directory/file:
|   /u01/oracle/admin/ora11g/dpdump/
|   awrdat_70441_70885-vpos-primar.dmp    <--提示的文件名
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|  *** AWR Load Started ...
|
|  This operation will take a few moments. The
|  progress of the AWR load operation can be
|  monitored in the following directory/file:
|   /u01/oracle/admin/ora11g/dpdump/
|   awrdat_70441_70885-vpos-primar.log
|
DBMS_DATAPUMP.ADD_FILE(dump file)
ORA-39001: invalid argument value
Exception encountered in AWR_LOAD
begin
*
ERROR at line 1:
ORA-20115: datapump import encountered error:
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31640: unable to open dump file 
"/u01/oracle/admin/ora11g/dpdump/awrdat_70441_70885-vpos-primar.dmp" for read
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 1717
ORA-06512: at line 3


begin
*
ERROR at line 1:
ORA-20106: AWR tables do not exist for the 'AWR_STAGE' user
ORA-06512: at "SYS.DBMS_SWRF_INTERNAL", line 2920
ORA-00942: table or view does not exist
ORA-06512: at line 3


... Dropping AWR_STAGE user

End of AWR Load

这里我们可以看到,dmp文件名为31个字符,而在提示文件名的时候是30个字符,从而出现了dmp文件不存在而导致的相关错误,解决方案重命名dmp文件,名称不超过30个字符

windows中使用dd程序

很多习惯了Linux和Uinx的人都困惑windows中没有dd这样的好命令,经过不懈的努力终于发现了win dd
帮助命令

E:\>dd --help
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.

dd [bs=SIZE] [count=BLOCKS] [if=FILE] [of=FILE] [seek=BLOCKS] [skip=BLOCKS] 
[--size] [--list] [--progress]
SIZE and BLOCKS may have one of the following suffix:
 k = 1024
 M = 1048576
 G = 1073741824
default block size (bs) is 512 bytes
skip specifies the starting offset of the input file (if)
seek specifies the starting offset of the output file (of)

磁盘列表

E:\>dd --list
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.

Win32 Available Volume Information
\\.\Volume{7cf2f59c-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume1
  fixed media
  Mounted on \\.\c:

\\.\Volume{9ba58399-a85a-11e1-b19f-005056c00008}\
  link to \\?\Device\HarddiskVolume5
  removeable media
  Mounted on \\.\h:

\\.\Volume{7cf2f59d-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume2
  fixed media
  Mounted on \\.\d:

\\.\Volume{7cf2f59e-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume3
  fixed media
  Mounted on \\.\e:

\\.\Volume{7cf2f59f-6207-11e1-b81a-806e6f6e6963}\
  link to \\?\Device\HarddiskVolume4
  fixed media
  Mounted on \\.\f:

\\.\Volume{d0f53c6a-6272-11e1-8729-806e6f6e6963}\
  link to \\?\Device\CdRom0
  CD-ROM
  Mounted on \\.\g:


NT Block Device Objects
\\?\Device\CdRom0
  size is 2147483647 bytes

Virtual input devices
 /dev/zero   (null data)
 /dev/random (pseudo-random data)
 -           (standard input)

Virtual output devices
 -           (standard output)
 /dev/null   (discard the data)

拷贝磁盘头(asm修复常见)

E:\>dd if=\\.\Volume{9ba58399-a85a-11e1-b19f-005056c00008} 
of=f:\usb.img bs=1M count=10
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.

10+0 records in
10+0 records out

E:\>dir f:\usb.img
 驱动器 F 中的卷没有标签。
 卷的序列号是 000F-FFCB

 f:\ 的目录

2012/11/13  23:39        10,485,760 usb.img
               1 个文件     10,485,760 字节
               0 个目录 30,501,912,576 可用字节

拷贝文件

E:\>dd if=f:\usb.img   of=f:\usb_new.img bs=1M count=5
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.

5+0 records in
5+0 records out

E:\>dir f:\usb*
 驱动器 F 中的卷没有标签。
 卷的序列号是 000F-FFCB

 f:\ 的目录

2012/11/13  23:39        10,485,760 usb.img
2012/11/13  23:46         5,242,880 usb_new.img
               2 个文件     15,728,640 字节
               0 个目录 30,496,669,696 可用字节