win平台rman备份和删除dg备库归档日志脚本

总觉得使用windows跑oracle是不靠谱的事情,可以这个世界上总有很多人喜欢做类似这样的事情,对于数据库比较常见的两件事情:rman和删除dg备库归档日志,在linux/unix平台上使用shell实现很简单,可是跑到win里面,就变的烦了,不是因为其麻烦,而是因为用的人少,不知道怎么下手处理该事情,我编写了简单的实现初级功能的win下面rman备份和删除备库归档日志脚本,供大家参考,也更加欢迎朋友提出来更加好的处理方法(win是真心的不懂)
rman备份脚本

--backup_oracle.bat文件
rman target / cmdfile=D:\backup\rman\backup_db.rman 
log=d:/backup/rman/logfile/rmanlog%date:~0,4%%date:~5,2%%date:~8,2%.log

--backup_db.rman文件
CONFIGURE RETENTION POLICY TO REDUNDANCY = 2;
CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
backup filesperset = 5 as compressed backupset database format 'd:/backup/rman/full_%U.rman';
sql 'alter system archive log current';
backup  filesperset = 50 as compressed backupset archivelog  all format 'd:/backup/rman/arch_%U.rman'  delete input;
DELETE noprompt OBSOLETE;
crosscheck backup;
delete noprompt expired backup;
backup  format 'd:/backup/rman/ctl_%U.rman' current controlfile;
backup spfile format 'd:/backup/rman/spfile_%U.rman' ;
exit;

backup_oracle.bat文件加入到计划任务即可

删除dg备库归档日志(已经应用)

--delete_dg_archivelog.bat
rem 注意修改 部署目录
cd D:\win_xifenfei
d:
rem 注意delete_archive.sql 查询是否有记录

echo delete archivelog staring > delete_archivelog.bak
sqlplus / as sysdba @delete_archive.sql
echo rman target / cmdfile=rman_checkcross.rman>>delete_archivelog.bat 
delete_archivelog.bat >>delete_dg_archivelog_%DATE:~0,4%%DATE:~5,2%%DATE:~8,2%".log
exit

--delete_archive.sql
set lines 150
col name for a150
set pagesize 0 feedback off verify off heading off echo off
spool delete_archivelog.bat
select 'del '||name from v$archived_log where APPLIED='YES' AND NAME IS NOT NULL and DEST_ID=1;
spool off
exit;

--rman_checkcross.rman
crosscheck archivelog all;
delete noprompt expired archivelog all;
exit

delete_dg_archivelog.bat加入到计划任务即可

db_block_checksum实质是通过flg_kcbh来控制block checksum

db_block_checksum 用于DBWn和direct loader数据块写入到磁盘时,基于块内的所有字节计算得出一个校验值并将其写入块头。在该参数设置为typical和full时,当读入时候重新计算校验和写出时候的校验对比,如果不同则认为是块损坏。如果设置为FULL模式,则基于update/delete应用程序语句级别的改变发生后,校验值会被重新计算并写入。同时对于日志块,在写入之前,同样会生产校验值并写入到块头。该参数主要是防止IO硬件和IO子系统的错误。
这里提示我们是在系统参数级别使用db_block_checksum来控制block是否进行验证,那在block本身级别,是否有类似的值来控制,实现对block值的checksum?通过dump结合bbed给出相关答案
db_block_checksum为TYPICAL测试

CDB_CDB$ROOT@SYS> show parameter db_block_checksum;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_checksum                    string      TYPICAL
CDB_CDB$ROOT@SYS> conn / as sysdba
已连接。
CDB_CDB$ROOT@SYS> alter session set container=pdb;

会话已更改。

CDB_CDB$ROOT@SYS> alter database open;

数据库已更改。

CDB_CDB$ROOT@SYS> alter session set current_schema=chf;

会话已更改。

CDB_CDB$ROOT@SYS> create table t_xifenfei(id number,name varchar2(100));

表已创建。

CDB_CDB$ROOT@SYS> insert into t_xifenfei values(1,'www.orasos.com');

已创建 1 行。

CDB_CDB$ROOT@SYS> alter system checkpoint;

系统已更改。

CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei;

   FILE_NO   BLOCK_NO
---------- ----------
         9      19229

CDB_CDB$ROOT@SYS> SELECT NAME FROM V$DATAFILE WHERE FILE#=9;

NAME
---------------------------------------------------------------------
E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF

dump 该block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xe08b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 77 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

这里可以看到因为db_block_checksum=TYPICAL,因此插入记录的时候,dump block发现flg: 0x06 和 chkval: 0xe08b

使用bbed查看相关记录
补充:在win系统中,bbed查看block和block num相差1

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b1d
   ub4 bas_kcbh                             @8        0x3c232935
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe08b
   ub2 spare3_kcbh                          @18       0x0000

通过对比bbed和dump出来数据,可以得出flg: 0x06(flg_kcbh),chkval: 0xe08b(chkval_kcbh)

使用bbed修改block验证chkval_kcbh
提问:在使用bbed修改block的时候,一般都需要使用sum apply 处理下,为什么呢?

BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 77772e78 6966656e l ,...?.www.xifen
 6665692e 636f6d01 063529            l fei.com..5)

 <16 bytes per line>

--没有修改任何值,sum的current和required值相同,而且和chkval_kcbh也相同
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xe08b

--尝试修改值
BBED> m /x 78 offset 8173
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230            Offsets: 8173 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 78772e78 6966656e 6665692e 636f6d01 063529

 <32 bytes per line>

BBED> d /v offset 8165
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19230   Offsets: 8165 to 8191  Dba:0x00000000
-------------------------------------------------------
 2c010202 c1021077 78772e78 6966656e l ,...?.wxw.xifen
 6665692e 636f6d01 063529            l fei.com..5)

 <16 bytes per line>

--发现current<>required
BBED> sum
Check value for File 0, Block 19230:
current = 0xe08b, required = 0xef8b

--apply把current修改为required值
BBED> sum apply
Check value for File 0, Block 19230:
current = 0xef8b, required = 0xef8b

--发现chkval_kcbh也修改为了required值
BBED> p kcbh.chkval_kcbh
ub2 chkval_kcbh                             @16       0xef8b

通过这里可以发现,在flg_kcbh含(KCBHFCKV)的情况下,如果block发生改变,则运行sum apply之后chkval_kcbh也发生改变

继续dump block

buffer tsn: 3 rdba: 0x02404b1d (9/19229)
scn: 0xb8c.3c232935 seq: 0x01 flg: 0x06 tail: 0x29350601
frmt: 0x02 chkval: 0xef8b type: 0x06=trans data
Hex dump of block: st=0, typ_found=1


tab 0, row 0, @0x1f81
tl: 23 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [16]  77 78 77 2e 78 69 66 65 6e 66 65 69 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19229 maxblk 19229

证明上述结论正确:dump.flg=bbed.flg_kcbh,dump.chkval=bbed.chkval_kcbh

db_block_checksum为FALSE测试

CDB_CDB$ROOT@SYS> alter system set db_block_checksum=false;

系统已更改。

CDB_CDB$ROOT@SYS> drop table t_xifenfei_new purge;

表已删除。

CDB_CDB$ROOT@SYS> alter session set current_schema=chf;

会话已更改。

CDB_CDB$ROOT@SYS>  create table t_xifenfei_new(id number,name varchar2(100));

表已创建。

CDB_CDB$ROOT@SYS>  insert into t_xifenfei_new values(1,'www.orasos.com');

已创建 1 行。

CDB_CDB$ROOT@SYS> commit;

提交完成。

CDB_CDB$ROOT@SYS>  select dbms_rowid.rowid_relative_fno(rowid) file_no,
 2  dbms_rowid.rowid_block_number(rowid) block_no  from t_xifenfei_new;

   FILE_NO   BLOCK_NO
---------- ----------
         9      19237

CDB_CDB$ROOT@SYS> alter system checkpoint;

系统已更改。

CDB_CDB$ROOT@SYS> alter system dump datafile 9 block 19237;

系统已更改。

dump block分析

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  77 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

这里可以发现当设置db_block_checksum=false之时,插入数据,显示flg: 0x02,chkval: 0x0000,由此猜测无对block写入进行部分验证(tailchk依然验证)

使用bbed修改block

BBED> set filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF'
        FILENAME        E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF

BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified


BBED> set blocksize 8192
        BLOCKSIZE       8192

BBED> set block 8192
        BLOCK#          8192

BBED> set blocksize 19238
BBED-00108: illegal BLOCKSIZE (19238) specified


BBED> set block 19238
        BLOCK#          19238

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x0000
   ub2 spare3_kcbh                          @18       0x0000
--这里看到flg_kcbh与chkval_kcbh和dump结果一致

BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000
--required为0,表示不验证

BBED> d /v offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7777772e 6f726173 6f732e63 6f6d0106 l www.orasos.com..
 b2c8                                l 踩

 <16 bytes per line>

BBED> set mode edit
        MODE            Edit

BBED> m /x 78
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 b2c8

 <32 bytes per line>

BBED> d /v
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238   Offsets: 8174 to 8191  Dba:0x00000000
-------------------------------------------------------
 7877772e 6f726173 6f732e63 6f6d0106 l xww.orasos.com..
 b2c8                                l 踩

 <16 bytes per line>

--修改了block,但是sum依然提示required为0
BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000

BBED> sum apply
Check value for File 0, Block 19238:
current = 0x0000, required = 0x0000

dump block核对

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x02 tail: 0xc8b20601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

tab 0, row 0, @0x1f83
tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

修改flg_kcbh测试

BBED> m /x 06 offset 15
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06000000 00010000 00d86b01 00aec823

 <32 bytes per line>

BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x04b3
--修改flg_kcbh=x06后,sum中的required出现了非0值,表示已经启动了block完整性检测


BBED> m /x 79 offset 1876
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 1876 to 1891           Dba:0x00000000
------------------------------------------------------------------------
 79391b01 0000c24e 07000205 c406573f

 <32 bytes per line>

BBED> sum
Check value for File 0, Block 19238:
current = 0x0000, required = 0x049d

--修改了chkval_kcbh值
BBED> sum apply
Check value for File 0, Block 19238:
current = 0x049d, required = 0x049d

再次检查dump block

buffer tsn: 3 rdba: 0x02404b25 (9/19237)
scn: 0xb8c.3c23c8b2 seq: 0x01 flg: 0x06 tail: 0xc8b20601
frmt: 0x02 chkval: 0x049d type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

tl: 21 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [14]  78 77 77 2e 6f 72 61 73 6f 73 2e 63 6f 6d
end_of_block_dump
End dump data blocks tsn: 3 file#: 9 minblk 19237 maxblk 19237

证明修改flg_kcbh后,block的sum验证起效

完整测试

--flg_kcbh=2,chkval_kcbh为04b3,修改block观察变化
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x02404b25
   ub4 bas_kcbh                             @8        0x3c23c8b2
   ub2 wrp_kcbh                             @12       0x0b8c
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x02 (KCBHFDLC)
   ub2 chkval_kcbh                          @16       0x04b3
   ub2 spare3_kcbh                          @18       0x0000

BBED> m /x 11 offset 7184
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 110000c2 39b50002 05c40721 07280000

 <32 bytes per line>

BBED> undo
BBED> modify /x 00 filename 'E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF' block 19238. offset 7184.
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 7184 to 7199           Dba:0x00000000
------------------------------------------------------------------------
 000000c2 39b50002 05c40721 07280000

 <32 bytes per line>


BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106

 <32 bytes per line>

BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04b3
--证明当flg_kcbh=2修改block之后chkval_kcbh依然为04b3,证明flg_kcbh不含(KCBHFCKV),
--修改block不会导致chkval_kcbh改变,也就是说,该block为启用db_block_checksum 

--flg_kcbh=6的时候验证修改block导致的chkval_kcbh变化
BBED> m /x 06 offset 15
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets:   15 to   30           Dba:0x00000000
------------------------------------------------------------------------
 06b30400 00010000 00d86b01 00aec823

 <32 bytes per line>

--刚刚修改flg_kcbh=6,马上看到required非0
BBED> sum
Check value for File 0, Block 19238:
current = 0x04b3, required = 0x04da

BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da

--尝试修改block
BBED> m /x 22 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 2277772e 6f726173 6f732e63 6f6d0106

 <32 bytes per line>

--required发生改变
BBED> sum
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04e9

BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04e9

--修改为以前值,验证required
BBED> m /x 11 offset 8174
 File: E:\APP\XIFENFEI\ORADATA\CDB\PDB\PDB_USERS01.DBF (0)
 Block: 19238            Offsets: 8174 to 8189           Dba:0x00000000
------------------------------------------------------------------------
 1177772e 6f726173 6f732e63 6f6d0106

 <32 bytes per line>

BBED> sum
Check value for File 0, Block 19238:
current = 0x04e9, required = 0x04da

BBED> sum apply
Check value for File 0, Block 19238:
current = 0x04da, required = 0x04da
--发现当block修改回来后,required值和以前一致(也就是说chkval_kcbh值还原)
--进步一说明chkval_kcbh取决于block内部值

通过相关测试db_block_checksum是在实例级别启动block checksum,但是具体到每个block是通过flg_kcbh来控制,而具体体现是在chkval_kcbh值上

使用UltraEdit修改oracle二进制文件

对数据库OPEN的过程做10046,因为数据库没有open,那写着语句是写在什么地方的呢?通过使用UltraEdit打开$ORACLE_HOME/bin/oracle文件,可以发现很多sql语句,而且与open过程的10046很多一致.

044acdc0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044acdd0h: 32 2E 6E 61 6D 65 2C 20 6F 32 2E 6F 62 6A 23 20 ; 2.name, o2.obj# 
044acde0h: 66 72 6F 6D 20 69 6E 64 24 20 69 2C 20 6F 62 6A ; from ind$ i, obj
044acdf0h: 24 20 6F 31 2C 20 6F 62 6A 24 20 6F 32 2C 20 75 ; $ o1, obj$ o2, u
044ace00h: 73 65 72 24 20 75 20 77 68 65 72 65 20 6F 31 2E ; ser$ u where o1.
044ace10h: 6F 77 6E 65 72 23 20 3D 20 3A 31 20 20 61 6E 64 ; owner# = :1  and
044ace20h: 20 6F 31 2E 74 79 70 65 23 20 3D 20 32 20 61 6E ;  o1.type# = 2 an
044ace30h: 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 61 6E ; d i.type# = 9 an
044ace40h: 64 20 69 2E 62 6F 23 20 3D 20 6F 31 2E 6F 62 6A ; d i.bo# = o1.obj
044ace50h: 23 20 20 61 6E 64 20 69 2E 6F 62 6A 23 20 3D 20 ; #  and i.obj# = 
044ace60h: 6F 32 2E 6F 62 6A 23 20 61 6E 64 20 6F 32 2E 6F ; o2.obj# and o2.o
044ace70h: 77 6E 65 72 23 20 3D 20 75 2E 75 73 65 72 23 20 ; wner# = u.user# 
044ace80h: 61 6E 64 20 69 2E 74 79 70 65 23 20 3D 20 39 20 ; and i.type# = 9 
044ace90h: 61 6E 64 20 6F 31 2E 6F 77 6E 65 72 23 20 21 3D ; and o1.owner# !=
044acea0h: 20 6F 32 2E 6F 77 6E 65 72 23 00 00 00 00 00 00 ;  o2.owner#......
044aceb0h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acec0h: 73 65 6C 65 63 74 20 75 2E 6E 61 6D 65 2C 20 6F ; select u.name, o
044aced0h: 2E 6E 61 6D 65 2C 20 6F 2E 6F 62 6A 23 20 66 72 ; .name, o.obj# fr
044acee0h: 6F 6D 20 6F 62 6A 24 20 6F 2C 20 75 73 65 72 24 ; om obj$ o, user$
044acef0h: 20 75 2C 20 69 6E 64 24 20 69 20 77 68 65 72 65 ;  u, ind$ i where
044acf00h: 20 6F 2E 6F 77 6E 65 72 23 3D 3A 31 20 61 6E 64 ;  o.owner#=:1 and
044acf10h: 20 6F 2E 6F 77 6E 65 72 23 3D 75 2E 75 73 65 72 ;  o.owner#=u.user
044acf20h: 23 20 61 6E 64 20 6F 2E 6F 62 6A 23 3D 69 2E 6F ; # and o.obj#=i.o
044acf30h: 62 6A 23 20 61 6E 64 20 69 2E 74 79 70 65 23 3D ; bj# and i.type#=
044acf40h: 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; 9...............
044acf50h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
044acf60h: 73 65 6C 65 63 74 20 76 6E 61 6D 65 20 66 72 6F ; select vname fro
044acf70h: 6D 20 73 79 73 2E 73 6E 61 70 24 20 77 68 65 72 ; m sys.snap$ wher
044acf80h: 65 20 73 6F 77 6E 65 72 20 3D 20 3A 31 20 61 6E ; e sowner = :1 an
044acf90h: 64 20 69 6E 73 74 73 69 74 65 20 3D 20 30 20 20 ; d instsite = 0  
044acfa0h: 61 6E 64 20 28 62 69 74 61 6E 64 28 66 6C 61 67 ; and (bitand(flag
044acfb0h: 2C 20 32 36 38 34 33 35 34 35 36 29 20 3D 20 30 ; , 268435456) = 0
044acfc0h: 20 6F 72 20 62 69 74 61 6E 64 28 6F 62 6A 66 6C ;  or bitand(objfl
044acfd0h: 61 67 2C 20 33 32 29 20 3D 20 30 29 00 00 00 00 ; ag, 32) = 0)....

既然数据库OPEN的过程很多sql是在oracle二进制文件中写的,那是否可以考虑通过修改二进制文件中的sql语句来改变执行计划,甚至可以尝试修改这些语句使得异常的数据库能够正常open.这里演示通过修改sql语句来展示改变执行计划(不使用index,而改用全表扫描)
10046 捕获语句和执行计划

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475346332403 hv=854877822 ad='2f2be060'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where o.obj#=:1
END OF STMT
PARSE #2:c=1000,e=1258,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346332394
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b69ef000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2000,e=1382,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475346333978
WAIT #2: nam='db file sequential read' ela= 17820 file#=1 block#=220 blocks=1 obj#=-1 tim=1345475346351927
FETCH #2:c=1000,e=18054,p=1,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475346352100
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=1 pw=0 time=18047 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=1 pw=0 time=18014 us)'

使用UE查看该sql在二进制文件内容

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 2E 6F 62 6A ; j$ o where o.obj
0459c6d0h: 23 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; #=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

修改二进制文件中sql语句

0459c650h: 3A 35 2C 3A 36 2C 3A 37 29 00 00 00 00 00 00 00 ; :5,:6,:7).......
0459c660h: 73 65 6C 65 63 74 20 6F 2E 6F 77 6E 65 72 23 2C ; select o.owner#,
0459c670h: 6F 2E 6E 61 6D 65 2C 6F 2E 6E 61 6D 65 73 70 61 ; o.name,o.namespa
0459c680h: 63 65 2C 6F 2E 72 65 6D 6F 74 65 6F 77 6E 65 72 ; ce,o.remoteowner
0459c690h: 2C 6F 2E 6C 69 6E 6B 6E 61 6D 65 2C 6F 2E 73 75 ; ,o.linkname,o.su
0459c6a0h: 62 6E 61 6D 65 2C 6F 2E 64 61 74 61 6F 62 6A 23 ; bname,o.dataobj#
0459c6b0h: 2C 6F 2E 66 6C 61 67 73 20 66 72 6F 6D 20 6F 62 ; ,o.flags from ob
0459c6c0h: 6A 24 20 6F 20 77 68 65 72 65 20 6F 62 6A 23 2B ; j$ o where obj#+
0459c6d0h: 30 3D 3A 31 00 00 00 00 00 00 00 00 00 00 00 00 ; 0=:1............
0459c6e0h: 73 65 6C 65 63 74 20 74 79 70 65 23 2C 63 6F 6E ; select type#,con

这里通过对obj#加上一个常量0,使得该sql在执行之时不会使用obj$.obj#上的index,从而使用全表扫描,来实现我们修改执行计划,屏蔽该index的目的
修改后的10046观察

PARSING IN CURSOR #2 len=116 dep=2 uid=0 oct=3 lid=0 tim=1345475781593851 hv=493726595 ad='2f2ba76c'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,o.dataobj#,o.flags from obj$ o where obj#+0=:1
END OF STMT
PARSE #2:c=1000,e=1095,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781593840
BINDS #2:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=b6a86000  bln=22  avl=03  flg=05
  value=893
EXEC #2:c=2999,e=1603,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1345475781595754
WAIT #2: nam='db file sequential read' ela= 18417 file#=1 block#=121 blocks=1 obj#=-1 tim=1345475781614344
WAIT #2: nam='db file sequential read' ela= 1000 file#=1 block#=123 blocks=1 obj#=-1 tim=1345475781615726
WAIT #2: nam='db file sequential read' ela= 1462 file#=1 block#=124 blocks=1 obj#=-1 tim=1345475781617720
WAIT #2: nam='db file sequential read' ela= 492 file#=1 block#=125 blocks=1 obj#=-1 tim=1345475781618452
WAIT #2: nam='db file sequential read' ela= 1358 file#=1 block#=126 blocks=1 obj#=-1 tim=1345475781620361
WAIT #2: nam='db file sequential read' ela= 500 file#=1 block#=127 blocks=1 obj#=-1 tim=1345475781621039
WAIT #2: nam='db file sequential read' ela= 63132 file#=1 block#=128 blocks=1 obj#=-1 tim=1345475781684316
WAIT #2: nam='db file sequential read' ela= 989 file#=1 block#=5233 blocks=1 obj#=-1 tim=1345475781685559
WAIT #2: nam='db file sequential read' ela= 792 file#=1 block#=5234 blocks=1 obj#=-1 tim=1345475781686583
FETCH #2:c=7999,e=90994,p=9,cr=13,cu=0,mis=0,r=1,dep=2,og=4,tim=1345475781686865
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS FULL OBJ$ (cr=13 pr=9 pw=0 time=90977 us)'

很明显,到这里我们通过UE修改oracle二进制文件,实现了sql语句执行计划的更改.
免责申明:本方法仅供在非常特殊或者测试环境下使用,一般生产环境请勿模仿,否则后果自负

误杀进程导致rac hang住

有客户反馈系统hang住,不能归档,需要我们紧急介入分析
节点1日志
出现redo不能归档,redo日志都已经被写满,人工执行了ALTER SYSTEM ARCHIVE LOG CURRENT,数据库就开始把redo全部归档,但是后面产生的redo又不能归档,当redo全部写满之后,数据库有出现大量log file switch (archiving needed)等待

Tue Sep 24 22:05:37 2013
Thread 1 advanced to log sequence 47282 (LGWR switch)
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:07:31 2013
ORACLE Instance q9db1 - Can not allocate log, archival required
Thread 1 cannot allocate new log, sequence 47283
All online logs needed archiving
  Current log# 6 seq# 47282 mem# 0: +DATA/q9db/onlinelog/group_6.1244.818697409
Tue Sep 24 22:28:17 2013
ALTER SYSTEM ARCHIVE LOG
Archived Log entry 259646 added for thread 1 sequence 47266 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:18 2013
Thread 1 advanced to log sequence 47283 (LGWR switch)
  Current log# 7 seq# 47283 mem# 0: +DATA/q9db/onlinelog/group_7.1243.818697415
Archived Log entry 259647 added for thread 1 sequence 47267 ID 0x354620c2 dest 1:
Archived Log entry 259648 added for thread 1 sequence 47268 ID 0x354620c2 dest 1:
Archived Log entry 259649 added for thread 1 sequence 47269 ID 0x354620c2 dest 1:
Archived Log entry 259650 added for thread 1 sequence 47270 ID 0x354620c2 dest 1:
Archived Log entry 259651 added for thread 1 sequence 47271 ID 0x354620c2 dest 1:
Archived Log entry 259652 added for thread 1 sequence 47272 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:28 2013
Archived Log entry 259653 added for thread 1 sequence 47273 ID 0x354620c2 dest 1:
Archived Log entry 259654 added for thread 1 sequence 47274 ID 0x354620c2 dest 1:
Archived Log entry 259655 added for thread 1 sequence 47275 ID 0x354620c2 dest 1:
Archived Log entry 259656 added for thread 1 sequence 47276 ID 0x354620c2 dest 1:
Archived Log entry 259657 added for thread 1 sequence 47277 ID 0x354620c2 dest 1:
Archived Log entry 259658 added for thread 1 sequence 47278 ID 0x354620c2 dest 1:
Archived Log entry 259659 added for thread 1 sequence 47279 ID 0x354620c2 dest 1:
Tue Sep 24 22:28:39 2013
Archived Log entry 259660 added for thread 1 sequence 47280 ID 0x354620c2 dest 1:
Archived Log entry 259661 added for thread 1 sequence 47281 ID 0x354620c2 dest 1:
Archived Log entry 259662 added for thread 1 sequence 47282 ID 0x354620c2 dest 1:
Tue Sep 24 22:29:39 2013
Thread 1 advanced to log sequence 47284 (LGWR switch)
  Current log# 8 seq# 47284 mem# 0: +DATA/q9db/onlinelog/group_8.1242.818697417
Tue Sep 24 22:31:18 2013
Thread 1 advanced to log sequence 47285 (LGWR switch)
  Current log# 16 seq# 47285 mem# 0: +DATA/q9db/onlinelog/group_16.1884.827003545
Thread 1 advanced to log sequence 47286 (LGWR switch)
  Current log# 17 seq# 47286 mem# 0: +DATA/q9db/onlinelog/group_17.1885.827003587

节点2日志
节点2中出现大量的IPC Send timeout

Tue Sep 24 15:22:19 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 18:51:55 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Tue Sep 24 18:57:54 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:03:57 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
Receiver: inst 1 binc 464003926 ospid 1566
Tue Sep 24 19:09:53 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]
…………
Tue Sep 24 20:22:00 2013
IPC Send timeout detected. Sender: ospid 4008 [oracle@q9db02.800best.com (PING)]

节点1因为不能归档hang住,节点2紧接着也就hang住。对节点1hang住之时对两个节点分别做systemstate dump,使用ass进行分析得到节点1和节点2的记录大体如下:
节点1

393:waiting for 'log file switch (archiving needed)'
394:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
395:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
397:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
398:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
451:waiting for 'SQL*Net message from client'
469:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
470:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
471:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
618:waiting for 'log file switch (archiving needed)'
     Cmd: Insert
626:waiting for 'log file switch (archiving needed)'
     Cmd: Insert

NO BLOCKING PROCESSES FOUND

节点2

515:waiting for 'gc buffer busy acquire'
     Cmd: Insert
516:waiting for 'gc buffer busy acquire'
     Cmd: Insert
517:waiting for 'gc buffer busy acquire'
     Cmd: Insert
518:waiting for 'gc buffer busy acquire'
     Cmd: Insert
519:waiting for 'gc buffer busy acquire'
     Cmd: Insert
520:waiting for 'gc buffer busy acquire'
     Cmd: Select
521:waiting for 'gc current request'
     Cmd: Insert
522:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
523:waiting for 'gc buffer busy acquire'
     Cmd: Insert
524:waiting for 'SQL*Net message from client'
525:waiting for 'gc buffer busy acquire'
     Cmd: Insert
526:waiting for 'gc buffer busy acquire'
     Cmd: Insert
527:waiting for 'enq: TX - row lock contention'[Enq TX-00BA0020-001E3E3C]
     Cmd: Select
528:waiting for 'SQL*Net message from client'
529:waiting for 'gc buffer busy acquire'
     Cmd: Select

                    Resource Holder State
    Enq TX-0005001E-0022374F   223: waiting for 'gc current request'
    Enq TX-0047001B-002BCEB2   247: waiting for 'gc current request'
    Enq TX-015B001E-000041FF   330: waiting for 'gc current request'
    Enq TX-00010010-002EA7CD   179: waiting for 'gc current request'
    Enq TX-00BA0020-001E3E3C    ??? Blocker

Object Names
~~~~~~~~~~~~
Enq TX-0005001E-0022374F
Enq TX-0047001B-002BCEB2
Enq TX-015B001E-000041FF
Enq TX-00010010-002EA7CD
Enq TX-00BA0020-001E3E3C

通过这里,我们可以明白,节点2的很多事务hang住是因为请求gc current request,而该等待是因为节点1无法归档,有些block无法正常传输到节点2,导致节点2一直hang在这里,然后就出现IPC Send timeout;节点1上的事务阻塞甚至hang住是因为无法归档导致.到此需要定位的问题是为什么节点1不能归档
继续分析节点1 alert日志

Tue Sep 24 15:18:20 2013
opidrv aborting process O000 ospid (7332) as a result of ORA-28
Immediate Kill Session#: 1904, Serial#: 1065
Immediate Kill Session: sess: 0x24a2522a38  OS pid: 7338
Immediate Kill Session#: 3597, Serial#: 11107
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 7320
Tue Sep 24 15:18:23 2013
opidrv aborting process W000 ospid (7980) as a result of ORA-28
Tue Sep 24 15:18:23 2013
opidrv aborting process W001 ospid (8560) as a result of ORA-28
Tue Sep 24 15:18:35 2013
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: STARTING ARCH PROCESSES
Tue Sep 24 15:18:35 2013
ARC0 started with pid=66, OS id=10793 
Tue Sep 24 15:18:35 2013
Errors in file /u01/app/oracle/diag/rdbms/q9db/q9db1/trace/q9db1_nsa2_12635.trc:
ORA-00028: your session has been killed
LNS: Failed to archive log 8 thread 1 sequence 47156 (28)
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thread 1 advanced to log sequence 47157 (LGWR switch)
ARC0: STARTING ARCH PROCESSES
  Current log# 9 seq# 47157 mem# 0: +DATA/q9db/onlinelog/group_9.1241.818697421
Tue Sep 24 15:18:36 2013
ARC1 started with pid=81, OS id=10805 
Tue Sep 24 15:18:36 2013
ARC2 started with pid=84, OS id=10807 
Tue Sep 24 15:18:36 2013
ARC3 started with pid=87, OS id=10809 
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
Error 1031 received logging on to the standby
PING[ARC1]: Heartbeat failed to connect to standby 'q9adgdg'. Error is 1031.
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Archived Log entry 259135 added for thread 1 sequence 47156 ID 0x354620c2 dest 1:
Error 1031 received logging on to the standby
FAL[server, ARC3]: Error 1031 creating remote archivelog file 'q9adgdg'
FAL[server, ARC3]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance q9db1 - Archival Error. Archiver continuing.
Tue Sep 24 15:18:46 2013
opidrv aborting process O001 ospid (9605) as a result of ORA-28
Tue Sep 24 15:18:46 2013
opidrv aborting process O000 ospid (10813) as a result of ORA-28
Tue Sep 24 15:18:46 2013
Immediate Kill Session#: 2909, Serial#: 369
Immediate Kill Session: sess: 0x24226c7200  OS pid: 9091
Immediate Kill Session#: 3380, Serial#: 30271
Immediate Kill Session: sess: 0x2422782c58  OS pid: 10265
Immediate Kill Session#: 3597, Serial#: 11109
Immediate Kill Session: sess: 0x24c27cf498  OS pid: 10267
Tue Sep 24 15:20:14 2013
Restarting dead background process DIAG
Tue Sep 24 15:20:14 2013
DIAG started with pid=64, OS id=20568 
Restarting dead background process PING
Tue Sep 24 15:20:14 2013
PING started with pid=68, OS id=20570 
Restarting dead background process LMHB
Tue Sep 24 15:20:14 2013
LMHB started with pid=70, OS id=20572 
Restarting dead background process SMCO
…………
Tue Sep 24 15:23:13 2013
ARC0: Detected ARCH process failure
Tue Sep 24 15:23:13 2013
Thread 1 advanced to log sequence 47158 (LGWR switch)
  Current log# 10 seq# 47158 mem# 0: +DATA/q9db/onlinelog/group_10.1240.818697423
ARC0: STARTING ARCH PROCESSES
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the heartbeat ARCH
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance q9db1 - Archival Error
ORA-00028: your session has been killed

查看ARCn进程

[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   20718 12870  0 22:07 pts/14   00:00:00 grep ora_ar
[oracle@q9db01 ~]$ ps -ef|grep ora_ar
oracle   25998 12870  0 22:07 pts/14   00:00:00 grep ora_ar

这里基本上明白了,因为客户的系统从15:15开始由于中间件程序异常,导致大量会话连接数据库,然后dba为了防止其他业务不受影响,然后开始大量通过alter system kill session,误杀了不少系统进程,包括ARCn(0,1,2,3)进程,在后面ARCn进程因为某种原因无法正常启动,导致redo无法归档,所有的redo组写满系统即hang住,该系统由于大量kill session已经导致了实例本身异常(正常情况ARCn进程kill之后会自动重启),处理方案:先增加redo组配合定时人工归档,等待业务低峰重启节点1,解决问题。温馨提醒:kill进程请小心

ALERT: Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels

随着去ioe的潮流越来越汹涌,随着内存越来越便宜,使用Liunx大内存跑核心业务的系统越来越多。因为大内存,多会话,使得配置HugePages也成了必然的选择,在以前的文章中介绍过怎么去配置HugePages(Liunx系统中Oracle使用HugePages配置)以及因为没有配置Hugepage导致的故障(因未配置Hugepage会话数添增悲剧案例).但是随着Linux 6(RedHat 6, OEL 6, SLES 11 and UEK2 kernels等)的广泛应用,他们所在HugePages方面所具有的新特性(Transparent HugePages)很多人可能还没有注意到,Oracle强烈建议无论是rac还是单实例都关闭该特性.
检查是否启用Transparent HugePages方法

To check if the Transparent HugePages are enabled in your server execute the following:

Default/Enabled setting is  [always]:

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] never

 Disabled setting is [never]:

# cat /sys/kernel/mm/transparent_hugepage/enabled
always [never]
If "enabled" is NOT set to "[never]", the Transparent HugePages are being used.

You can also issue:

# grep AnonHugePages /proc/meminfo
If the output contains a line like "AnonHugepages: xxxx kB", with a value > 0kB the kernel is using Transparent HugePages.

禁用Transparent HugePages方法

Add the following to the kernel boot line in /etc/grub.conf (this is the preferred method) and reboot the server:

transparent_hugepage=never
Once modified the line will read similar to the following example:

title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
        initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img

OR

Add the following lines in /etc/rc.local and reboot the server:

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

具体见文档:ALERT Disable Transparent HugePages on SLES11 RHEL6 OEL6 and UEK2 Kernels(Doc ID 1557478.1)