Rman的format格式中的%s参数说明

%c 备份片的拷贝数
%d 数据库名称
%D 位于该月中的第几天 (DD)
%M 位于该年中的第几月 (MM)
%F 一个基于DBID唯一的名称,这个格式的形式为c-IIIIIIIIII-YYYYMMDD-QQ,其中IIIIIIIIII为该数据库的DBID,YYYYMMDD为日期,QQ是一个1-256的序列
%n 数据库名称,向右填补到最大八个字符
%u 一个八个字符的名称代表备份集与创建时间
%p 该备份集中的备份片号,从1开始到创建的文件数
%U 一个唯一的文件名,代表%u_%p_%c
%s 备份集的号
%t 备份集时间戳
%T 年月日格式(YYYYMMDD)

Oracle限制IP访问

通过修改sqlnet.ora文件实现,如果没有在$ORACLE_HOME/network/admin中没有该文件,自己创建一个或者从samples中复制一个出来。

#是否检测上述参数的设置
tcp.validnode_checking=yes
#允许访问的ip
tcp.invited_nodes =(ip1,ip2,……)
#不允许访问的ip
tcp.excluded_nodes=(ip1,ip2,……)

需要注意的问题:
1、 需要设置参数为YES,这样才能激活。
2、 建议设置允许访问的IP,因为IP地址有可能被随意修改,就不能起到自己的目的。
3、 TCP当参数TCP.INVITED_NODES和TCP.EXCLUDED_NODES设置的地址相同的时候将覆盖TCP.INVITED_NODES设置(10G)。
4、 需要重启监听器才能生效。
5、 这个方式只是适合TCP协议。
6、 这个配置适用于9i以上版本。在9i之前的版本使用文件protocol.ora。
7、 在服务器上直接连接数据库不受影响。
8、 这种限制方式事通过监听器来限制的。
9、 这个限制只是针对IP检测,对于用户名检测事不支持的。
10、不能设置ip段和通配符
11、如果配置TCP.INVITED_NODES就必须配置TCP.EXCLUDED_NODES,否则监听不能正常启动,报如下错误
TNS-12560: TNS:protocol adapter error
TNS-00584: Valid node checking configuration error

如(测试192.168.9.215地址不能登录):
tcp.validnode_checking = yes
tcp.excluded_nodes=(192.168.9.215)
tcp.included_nodes=(192.168.9.215,192.168.11.12,127.0.0.1,211.155.227.172)

触发器引起ORA-04091

原因:在行级触发器中,不能查询自身表
场景重现:通过触发器实现test_count表中统计test表中行数

--创建子表
create table TEST
(id NUMBER, name varchar2(100), primary key (id));
--创建统计表
create table test_count (test_count int);
--创建触发器
CREATE OR REPLACE TRIGGER T_TEST
  AFTER INSERT OR DELETE ON TEST
  FOR EACH ROW
DECLARE
  A NUMBER;
BEGIN
  SELECT  COUNT(*) INTO A FROM TEST;
  UPDATE TEST_COUNT SET TEST_COUNT = A;
END T_TEST;

模拟错误

INSERT INTO TEST (ID,NAME)VALUES(2,'abc');
ORA-04091: table CHF.TEST is mutating, trigger/function may not see it
ORA-06512: at "CHF.T_TEST", line 2
ORA-04088: error during execution of trigger 'CHF.T_TEST'

处理方法
通过自治事务实现(修改触发器)

CREATE OR REPLACE TRIGGER T_TEST
  AFTER INSERT OR DELETE ON TEST
  FOR EACH ROW
DECLARE
  A NUMBER;
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  SELECT  COUNT(*) INTO A FROM TEST;
  UPDATE TEST_COUNT SET TEST_COUNT = A;
  COMMIT;
END T_TEST;

PRAGMA AUTONOMOUS_TRANSACTION
当前的触发器作为已有事务的子事务运行,子事务自治管理,子事务的commit、rollback操作不影响父事务的状态

Read by other session等待事件

今天中午发现福建生产库报负载有点异常,处理思路记录下来:
1、使用top命令查看系统,发现系统负载是比以前要搞(平时都是1以下,今天已经稳定在4左右,总是有部分进城占用cpu比较高,系统cpu等待明显)
1.1)第一反应是有人执行sql导致,抓取占用cpu较高的spid,查询出对应sql,发现都是一些比较简单sql
1.2)查询这些spid的客户端是应用服务器,也就是说不是人为执行,那在一个稳定的系统中,不会出现sql突然改变的原因

2、查询系统是否因为有对象被阻塞导致,查询发现无对象被阻塞

3、查询系统等待事件,发现几十个read by other session等待,都是从一台web的服务器上连接过来
SELECT * FROM v$session WHERE wait_class#<>6;

4、read by other session等待事件比较陌生,幸好伴随有db file sequential read的等待事件,初步怀疑读取数据到内存中等待导致

5、查询资料发现
read by other session Definition: When information is requested from the database, Oracle will first read the data from disk into the database buffer cache. If two or more sessions request the same information, the first session will read the data into the buffer cache while other sessions wait.
In previous versions this wait was classified under the “buffer busy waits” event.
However, in Oracle 10.1 and higher this wait time is now broken out into the “read by other session” wait event. Excessive waits for this event are typically due to several processes repeatedly reading the same blocks, e.g. many sessions scanning the same index or performing full-table scans on the same table. Tuning this issue is a matter of finding and eliminating this contention.
Confio concludes with a summary that “read by other session waits” are very similar to buffer busy waits
When a session waits on the “read by other session” event, it indicates a wait for another session to read the data from disk into the Oracle buffer cache.
If this happens too often the performance of the query or the entire database can suffer. Typically this is caused by contention for “hot” blocks or objects so it is imperative to find out which data is being contended for. Once that is known this document lists several alternative methods for solving the issue.

总结:两个或者多个会话同时需要把硬盘中的对象装载到data buffer中,当其中一个会话把对象装入后,其他会话就处于read by other session等待状态;这个是oracle 10g 从oracle 9i的buffer busy waits中分离出来的,也是需要一种热块现象

6、根据FILE#,BLOCK#查询热块对象
SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
FROM DBA_EXTENTS A
WHERE FILE_ID = &FILE_ID
AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS – 1;

7、通过这个对象查询出对应的操作语句
select * from v$sql where upper(sql_text) like ‘%object_name%’;

8、直接查找热点块对象语句

SELECT *
  FROM (SELECT O.OWNER, O.OBJECT_NAME, O.OBJECT_TYPE, SUM(TCH) TOUCHTIME,
          FROM X$BH B, DBA_OBJECTS O
         WHERE B.OBJ = O.DATA_OBJECT_ID
           AND B.TS# > 0
         GROUP BY O.OWNER, O.OBJECT_NAME, O.OBJECT_TYPE
         ORDER BY SUM(TCH) DESC)
 WHERE ROWNUM <= 10
--或者
SELECT E.OWNER, E.SEGMENT_NAME, E.SEGMENT_TYPE
  FROM DBA_EXTENTS E,
       (SELECT *
          FROM (SELECT ADDR, TS#, FILE#, DBARFIL, DBABLK, TCH
                  FROM X$BH
                 ORDER BY TCH DESC)
         WHERE ROWNUM < 11) B
 WHERE E.RELATIVE_FNO = B.DBARFIL
   AND E.BLOCK_ID <= B.DBABLK
   AND E.BLOCK_ID + E.BLOCKS > B.DBABLK;

9、直接查找热点块操作语句

SELECT /*+rule*/
 HASH_VALUE, SQL_TEXT
  FROM V$SQLTEXT
 WHERE (HASH_VALUE, ADDRESS) IN
       (SELECT A.HASH_VALUE, A.ADDRESS
          FROM V$SQLTEXT A,
               (SELECT DISTINCT A.OWNER, A.SEGMENT_NAME, A.SEGMENT_TYPE
                  FROM DBA_EXTENTS A,
                       (SELECT DBARFIL, DBABLK
                          FROM (SELECT DBARFIL, DBABLK
                                  FROM X$BH
                                 ORDER BY TCH DESC)
                         WHERE ROWNUM < 11) B
                 WHERE A.RELATIVE_FNO = B.DBARFIL
                   AND A.BLOCK_ID <= B.DBABLK
                   AND A.BLOCK_ID + A.BLOCKS > B.DBABLK) B
         WHERE A.SQL_TEXT LIKE '%' || B.SEGMENT_NAME || '%'
           AND B.SEGMENT_TYPE = 'TABLE')
 ORDER BY HASH_VALUE, ADDRESS, PIECE;

10、也可以通过awr查询出来相关对象
Segments by Buffer Busy Waits
语句需要通过这些等待对象进行判断

到了1点钟左右,数据库read by other session等待事件消失,数据库恢复正常负载(因为系统还能够承受,所以当时没有采用kill进程的方法)

事后对查询出来的热点块对象和操作语句,已经访问服务器和开发确认的结果为:这个是一个报表功能,但是平时没有人用,所以也没有关注,今天突然被人用了下,导致这个问题发生,他们承诺在升级下个版本中解决这个问题。

Data Guard出现gap sequence修复

一、出现gap sequence现象
备库

Fetching gap sequence in thread 1, gap sequence 710-716
Tue May 31 15:02:38 2011
FAL[client]: Failed to request gap sequence 
 GAP - thread 1 sequence 710-716
 DBID 3240478808 branch 746916894
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------

主库

Tue May 31 13:50:47 2011
FAL[server]: Fail to queue the whole FAL gap
 GAP - thread 1 sequence 710-716
 DBID 3240478808 branch 746916894

二、修复操作
1、查询备库的scn

SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
    1154337
--在出现意外datafile header scn不一致的时候,需要根据提示归档日志,找出最小scn

2、确定主库是否添加数据文件

SQL> select FILE#,name from v$datafile where CREATION_CHANGE#> =1154337;
no rows selected

确定主库在这个scn之后是否有添加数据文件,如果添加文件,需要手工在备库添加

3、备库停止日志应用

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

4、主库增量备份并传输到备库上
主库进行增量备份

RMAN> BACKUP INCREMENTAL FROM SCN  1154337 DATABASE 
FORMAT '/home/oracle/xff_%U' tag 'XIFENFEI';
[oracle@localhost ~]$ scp xff* 192.168.1.30:/home/oracle/rman

说明:主库之前必须要做过rman的全备(没有全备的库,基于scn的增量备份也能够成功)

5、备库上进行恢复

RMAN> CATALOG START WITH '/home/oracle/rman';
RMAN> RECOVER DATABASE NOREDO;

说明:CATALOG START WITH是10g及其以后版本中才存在功能,没有该功能可以采用catalog或者复制主库的控制文件,rman备份放置和主库备份时相同目录实现。

6、主库上创建standby controlfile文件并传输到备库

RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY 
FORMAT '/home/oracle/xff_ctl.bck';
[oracle@localhost ~]$ scp xff_ctl.bck 192.168.1.30:/home/oracle/rman

创建standby controlfile两步可以需要根据实际情况考虑,大多数情况下不需要

7、备库恢复控制文件

RMAN> shutdown;
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/rman/xff_ctl.bck';
RMAN> alter database mount;

8、清空备库日志组

SQL> ALTER DATABASE CLEAR LOGFILE GROUP 1; 
注:如果采用了standby log模式,不需要清空,如果清空会出现
SQL> ALTER DATABASE CLEAR LOGFILE GROUP 1;
ALTER DATABASE CLEAR LOGFILE GROUP 1
*
ERROR at line 1:
ORA-19527: physical standby redo log must be renamed
ORA-00312: online log 1 thread 1: '/u01/oradata/xienfei/redo01.log'

说明:如果没有采用standby log模式,有几组需要清空几组

9、备库重设flashback

SQL> ALTER DATABASE FLASHBACK OFF; 
SQL> ALTER DATABASE FLASHBACK ON;

10、备库重新接收并应用日志

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

三、修复成功标志
1、sql中操作
在主库中执行alter system switch logfile;
分别主备库中执行select max(sequence#) from v$archived_log;如果一致标示修复成功

2、通过alert文件
主库

PING[ARC0]: Error 3113 when pinging standby xff.
Tue May 31 14:11:51 2011
Thread 1 advanced to log sequence 719
  Current log# 3 seq# 719 mem# 0: /u01/oradata/xienfei/redo03.log
Tue May 31 14:20:05 2011
Thread 1 advanced to log sequence 720
  Current log# 1 seq# 720 mem# 0: /u01/oradata/xienfei/redo01.log
Tue May 31 14:20:16 2011
ARC0: Standby redo logfile selected for thread 1 sequence 719 for destination LOG_ARCHIVE_DEST_2

备库

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Tue May 31 15:30:37 2011
Attempt to start background Managed Standby Recovery process (xff)
MRP0 started with pid=18, OS id=14704
Tue May 31 15:30:37 2011
MRP0: Background Managed Standby Recovery process started (xff)
Managed Standby Recovery not using Real Time Apply
 parallel recovery started with 2 processes
Media Recovery Log /u01/archive/1_718_746916894.arc
Tue May 31 15:30:43 2011
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Tue May 31 15:30:52 2011
RFS[1]: Successfully opened standby log 4: '/u01/oradata/xienfei/s_redo1.log'
Media Recovery Log /u01/archive/1_719_746916894.arc
Media Recovery Waiting for thread 1 sequence 720