ORA-01075: you are currently logged on

rm删除文件后alert中出现错误

Mon Apr 16 21:36:59 2012
Errors in file /home/oracle/oracle/admin/XGS/bdump/xgs_j000_1349.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01116: error in opening database file 3
ORA-01110: data file 3: '/home/oracle/oracle/oradata/XGS/sysaux01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
ORA-01116: error in opening database file 3
ORA-01110: data file 3: '/home/oracle/oracle/oradata/XGS/sysaux01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
ORA-01116: error in opening database file 6
ORA-01110: data file 6: '/home/oracle/oracle/oradata/XGS/undotbs02.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

数据库进程还在运行

oracle     779     1  0 21:21 ?        00:00:01 ora_pmon_XGS
oracle     781     1  0 21:21 ?        00:00:10 ora_psp0_XGS
oracle     783     1  0 21:21 ?        00:00:00 ora_mman_XGS
oracle     785     1  0 21:21 ?        00:00:00 ora_dbw0_XGS
oracle     787     1  0 21:21 ?        00:00:00 ora_lgwr_XGS
oracle     789     1  0 21:21 ?        00:00:00 ora_ckpt_XGS
oracle     791     1  0 21:21 ?        00:00:00 ora_smon_XGS
oracle     793     1  0 21:21 ?        00:00:00 ora_reco_XGS
oracle     795     1  0 21:21 ?        00:00:00 ora_cjq0_XGS
oracle     797     1  0 21:21 ?        00:00:01 ora_mmon_XGS
oracle     799     1  0 21:21 ?        00:00:00 ora_mmnl_XGS
oracle     801     1  0 21:21 ?        00:00:00 ora_d000_XGS
oracle     803     1  0 21:21 ?        00:00:00 ora_s000_XGS

尝试登陆数据库

[oracle@dbtest ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Mon Apr 16 21:40:06 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

ERROR:
ORA-01075: you are currently logged on


Enter user-name: sys
Enter password: 
ERROR:
ORA-00604: error occurred at recursive SQL level 2
ORA-01116: error in opening database file 1
ORA-01110: data file 1: '/home/oracle/oracle/oradata/XGS/system01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
ORA-00604: error occurred at recursive SQL level 1
ORA-01116: error in opening database file 6
ORA-01110: data file 6: '/home/oracle/oracle/oradata/XGS/undotbs02.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

问题原因

Internal triggers are trying to fire but one or more datafiles for the SYSAUX tablespace is offline, 
this is preventing the database from allowing new connections.

NOTE: At this point, you cannot connect to verify the status in V$DATAFILE, 
but you may find an indication of the offline datafile(s) in the alert.log file.

For example:
In one case, a media problem occurred which made disks unavailable.
This caused several files to be taken offline automatically including a SYSAUX datafile.

解决方法
kill进程,重启数据库到mount状态,然后根据特定情况恢复数据库或者online相关文件

记录一次ORA-00600[kdsgrp1]分析

数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

找出报错对象

--方法1
*** SESSION ID:(795.16405) 2012-04-05 09:36:11.958
            row 080095ee.26 continuation at
            file# 32 block# 38382 slot 39 not found
**************************************************
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 39 ..... nrows: 19
**************************************************

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 32
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 32
Enter value for block_id: 38382
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 38382 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
AHV8
TBL_IVR_LOG
TABLE PARTITION    CSS_PARTITION                  IVR_LOG_2012_MONTH04

--方法2
*** 2012-04-05 09:36:11.965
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO TBL_CONTACT_INFO_FAILED_TMP 
select * from TBL_IVR_LOG

SO: 70000017f954f50, type: 4, owner: 70000017f65a840, flag: INIT/-/-/0x00
(session) sid: 795 trans: 70000017464a1e8, creator: 70000017f65a840, flag: (40110041) USR/- BSY/-/-/-/-/-
              DID: 0002-0067-000305BD, short-term DID: 0002-0067-000305BE
              txn branch: 0
              oct: 2, prv: 0, sql: 70000015180ee98, psql: 700000180d67550, user: 49/AHV8
service name: SYS$USERS
O/S info: user: oracle10, term: UNKNOWN, ospid: 12976218, machine: zwq_kfdb2
              program: oracle@zwq_kfdb2 (J002)
last wait for 'db file sequential read' blocking sess=0x0 seq=226 wait_time=17071 seconds since wait started=1
                file#=20, block#=95ee, blocks=1

--方法3
Block header dump:  0x080095ee
 Object id on Block? Y
 seg/obj: 0x11eeb  csc: 0x6f2.848e814  itc: 2  flg: E  typ: 1 - DATA
     brn: 1  bdba: 0x7c09c89 ver: 0x01 opc: 0
     inc: 0  exflg: 0


SQL> select to_number('11eeb','xxxxxxxx') from dual;

TO_NUMBER('11EEB','XXXXXXXX')
-----------------------------
                        73451

SQL> select owner,object_name,subobject_name,object_type from dba_objects where data_object_id='73451';

OWNER
------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
SUBOBJECT_NAME                 OBJECT_TYPE
------------------------------ -------------------
AHV8
TBL_IVR_LOG
IVR_LOG_2012_MONTH04           TABLE PARTITION

验证是否真的坏块

SQL> select name from v$datafile where file#=32;

NAME
------------------------------------------------------
/dev/rdb1_data27

[zwq_kfdb2:/home/oraeye]dbv file='/dev/rdb1_data27' blocksize=8192

DBVERIFY: Release 10.2.0.4.0 - Production on Fri Apr 13 15:33:10 2012

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/rdb1_data27
  

DBVERIFY - Verification complete

Total Pages Examined         : 1048448
Total Pages Processed (Data) : 947357
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 4756
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 96335
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 297329920 (1778.297329920)

SQL> select count(*) from AHV8.TBL_IVR_LOG partition(IVR_LOG_2012_MONTH04);

  COUNT(*)
----------
   8798030

总结:很明显这次出现这个问题,因为内存中出现坏块导致,经过一段时间buffer cache中的坏块内容已经被老化,所以现在不能重现(甚至不用做任何操作)。如果内存中出现了坏块,而且还没有被老化掉,可以刷新data buffer;如果是数据块出现坏块,根据实际情况决定处理

ORACLE 十进制与二进制互转函数

十进制转换二进制

CREATE OR REPLACE FUNCTION NUMBER_TO_BIT(V_NUM NUMBER) 
RETURN VARCHAR IS V_RTN VARCHAR(8);--注意返回列长度
  V_N1  NUMBER;
  V_N2  NUMBER;
BEGIN
V_N1 := V_NUM;
    LOOP
      V_N2  := MOD(V_N1, 2);
      V_N1  := ABS(TRUNC(V_N1 / 2));
      V_RTN := TO_CHAR(V_N2) || V_RTN;
      EXIT WHEN V_N1 = 0;
    END LOOP;
--返回二进制长度
 SELECT lpad(V_RTN,8,0) 
    INTO   V_RTN
    FROM dual;
return V_RTN;
end;

SQL> select NUMBER_TO_BIT(208) from dual;

NUMBER_TO_BIT(208)
-----------------------------
11010000

二进制转换十进制

CREATE OR REPLACE FUNCTION BIT_TO_NUMBER(P_BIN IN VARCHAR2) RETURN NUMBER AS
  V_SQL    VARCHAR2(30000) := 'SELECT BIN_TO_NUM(';
  V_RETURN NUMBER;
BEGIN
  IF LENGTH(P_BIN) >= 256 THEN
    RAISE_APPLICATION_ERROR(-20001, 'INPUT BIN TOO LONG!');
  END IF;
  IF LTRIM(P_BIN, '01') IS NOT NULL THEN
    RAISE_APPLICATION_ERROR(-20002, 'INPUT STR IS NOT VALID BIN VALUE!');
  END IF;
  FOR I IN 1 .. LENGTH(P_BIN) LOOP
    V_SQL := V_SQL || SUBSTR(P_BIN, I, 1) || ',';
  END LOOP;
  V_SQL := RTRIM(V_SQL, ',') || ') FROM DUAL';
  EXECUTE IMMEDIATE V_SQL
    INTO V_RETURN;
  RETURN V_RETURN;
END;

SQL> SELECT BIT_TO_NUMBER('11010000') FROM DUAL;

BIT_TO_NUMBER('11010000')
-------------------------
                      208

ORACLE ROWID 分析

ROWID知识补充
Rowid改为:OOOOOOFFFBBBBBBRRR,占用10个字节(32bit data_object_id +10bit rfile# +22bit block +16bit)。其中,O是对象ID,F是文件ID,B是块ID,R是行ID。

Rowid对应值对应10十进制值

A-Z <==> 0 - 25 (26)
a-z <==> 26 - 51 (26)
0-9 <==> 52 - 61 (10)
+/ <==> 62 - 63 (2)

模拟环境

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> create table t_xff (id number,name varchar2(100));

Table created.

SQL> insert into t_xff values(1,'www.orasos.com');

1 row created.

SQL> commit;

Commit complete.

SQL>  alter table t_xff move;

Table altered.

SQL> select rowid,a.* from t_xff a;

ROWID                      ID NAME
------------------ ---------- ------------------------------
AADye6AAEAAAtCcAAA          1 www.orasos.com

相关值计算

Data Object number = AADye6
File               = AAE
Block              = AAAtCc
ROW                = AAA

DATA_OBJECT_ID
AADye6
58*64^0+30*64^1+50*64^2+3*64^3+0*64^4+0*64^5
58+1920+204800+786432+0+0=993210

RFILE#
AAE
4*64^0+0*64^1+0*64^2
4+0+0=4

BLOCK
AAAtCc
28*64^0+2*64^1+45*64^2+0*64^3+0*64^4+0*64^5
28+128+184320+0+0+0=184476

验证结果

SQL> select object_id,data_object_id from dba_objects where object_name='T_XFF';

 OBJECT_ID DATA_OBJECT_ID
---------- --------------
    993209         993210

SQL> select dbms_rowid.rowid_object('AADye6AAEAAAtCcAAA') data_object_id#,
  2         dbms_rowid.rowid_relative_fno('AADye6AAEAAAtCcAAA') rfile#,
  3         dbms_rowid.rowid_block_number('AADye6AAEAAAtCcAAA') block#,
  4     dbms_rowid.rowid_row_number('AADye6AAEAAAtCcAAA') row# from dual;  

DATA_OBJECT_ID#     RFILE#     BLOCK#       ROW#
--------------- ---------- ---------- ----------
         993210          4     184476          0

dump方式分析

SQL> select rowid,dump(rowid) from t_xff;

ROWID              DUMP(ROWID)
------------------ ------------------------------------------
AADye6AAEAAAtCcAAA Typ=69 Len=10: 0,15,39,186,1,2,208,156,0,0

DATA_OBJECT_ID
0,15,39,186
15*256*256+39*256+186=993210

RFILE#
1,2(取前10位)
000000001 00 =4

BLOCK
2,208,156(取后22位)
000010 11010000 10011100
2^17+208*256+156=184476

oracle 9i数据库存在大量ora_j0**进程

1.发现多个ora_j0**进程
可以发现进程重启非常频繁,大概1分钟重启一次,启动ora_j0**的个数为20个

[prmsvr2:/home/oraeye]ps -ef|grep ora_j0|grep -v grep
oracle 213120 1 0 11:01:55 - 0:00 ora_j019_prmdb2 
oracle 282808 1 0 11:01:55 - 0:00 ora_j012_prmdb2 
oracle 299178 1 0 11:01:54 - 0:00 ora_j004_prmdb2 
oracle 434212 1 0 11:01:55 - 0:00 ora_j014_prmdb2 
oracle 475238 1 0 11:01:54 - 0:00 ora_j011_prmdb2 
oracle 487562 1 0 11:01:54 - 0:00 ora_j008_prmdb2 
oracle 614612 1 0 11:01:55 - 0:00 ora_j017_prmdb2 
oracle 717008 1 0 11:01:54 - 0:00 ora_j009_prmdb2 
oracle 721012 1 0 11:01:54 - 0:00 ora_j006_prmdb2 
oracle 749618 1 0 11:01:54 - 0:00 ora_j007_prmdb2 
oracle 770268 1 0 11:01:54 - 0:00 ora_j005_prmdb2 
oracle 811114 1 0 11:01:55 - 0:00 ora_j015_prmdb2 
oracle 831550 1 0 11:01:55 - 0:00 ora_j016_prmdb2 
oracle 847940 1 0 11:01:55 - 0:00 ora_j013_prmdb2 
oracle 938076 1 0 11:01:54 - 0:00 ora_j010_prmdb2 
oracle 942224 1 0 11:01:54 - 0:00 ora_j002_prmdb2 
oracle 974968 1 0 11:01:55 - 0:00 ora_j018_prmdb2 
oracle 434602 1 0 11:01:54 - 0:00 ora_j000_prmdb2 
oracle 668104 1 0 11:01:54 - 0:00 ora_j003_prmdb2 
oracle 983424 1 0 11:01:54 - 0:00 ora_j001_prmdb2 

[prmsvr2:/home/oraeye]ps -ef|grep ora_j0|grep -v grep
oracle 454676 1 1 11:02:56 - 0:00 ora_j012_prmdb2 
oracle 696366 1 1 11:02:56 - 0:00 ora_j011_prmdb2 
oracle 749628 1 1 11:02:56 - 0:00 ora_j019_prmdb2 
oracle 770276 1 1 11:02:56 - 0:00 ora_j017_prmdb2 
oracle 794824 1 1 11:02:56 - 0:00 ora_j010_prmdb2 
oracle 827464 1 1 11:02:55 - 0:00 ora_j009_prmdb2 
oracle 831560 1 1 11:02:56 - 0:00 ora_j016_prmdb2 
oracle 847946 1 1 11:02:56 - 0:00 ora_j014_prmdb2 
oracle 888972 1 1 11:02:55 - 0:00 ora_j007_prmdb2 
oracle 934064 1 1 11:02:55 - 0:00 ora_j008_prmdb2 
oracle 938080 1 1 11:02:56 - 0:00 ora_j013_prmdb2 
oracle 942232 1 0 11:02:55 - 0:00 ora_j001_prmdb2 
oracle 962760 1 1 11:02:55 - 0:00 ora_j006_prmdb2 
oracle 434608 1 1 11:02:55 - 0:00 ora_j004_prmdb2 
oracle 528810 1 2 11:02:56 - 0:00 ora_j015_prmdb2 
oracle 549228 1 1 11:02:55 - 0:00 ora_j005_prmdb2 
oracle 668112 1 1 11:02:55 - 0:00 ora_j003_prmdb2 
oracle 709090 1 1 11:02:55 - 0:00 ora_j002_prmdb2 
oracle 905720 1 2 11:02:56 - 0:00 ora_j018_prmdb2 
oracle 971040 1 1 11:02:55 - 0:00 ora_j000_prmdb2 

[prmsvr2:/home/oraeye]ps -ef|grep ora_j0|grep -v grep |wc -l
      20

2.其他参数

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

SQL> show parameter job;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     20

SQL> select count(*) from DBA_JOBS_RUNNING;

  COUNT(*)
----------
         0

SQL> SELECT count(*) FROM v$session_wait where event='jobq slave wait';

  COUNT(*)
----------
        20
SQL> select count(*) from dba_jobs;

  COUNT(*)
----------
         2

SQL> col name for a32
SQL> col value for a24
SQL> col description for a70
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
  2    from x$ksppi a,x$ksppcv b
  3   where a.inst_id = USERENV ('Instance')
  4     and b.inst_id = USERENV ('Instance')
  5     and a.indx = b.indx
  6     and upper(a.ksppinm) LIKE upper('%&param%')
  7  order by name
  8  /
Enter value for param: _JOB_QUEUE_INTERVAL
old   6:    and upper(a.ksppinm) LIKE upper('%&param%')
new   6:    and upper(a.ksppinm) LIKE upper('%_JOB_QUEUE_INTERVAL%')

NAME                             VALUE
-------------------------------- ------------------------
DESCRIPTION
----------------------------------------------------------------------
_job_queue_interval              5
Wakeup interval in seconds for job queue co-ordinator

3.对cjq进程做10046

[prmsvr2:/oracle]ps -ef|grep cjq
  oracle  327870       1   0   Feb 10      - 770:39 ora_cjq0_prmdb2 
  oracle  929872  794774   0 13:24:59  pts/2  0:00 grep cjq 

SQL> oradebug setospid 327870
Oracle pid: 19, Unix process pid: 327870, image: oracle@prmsvr2 (CJQ0)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug event 10046 trace name context forever, level 12
Statement processed.
SQL>  oradebug event 10046 trace name context off;
Statement processed.
SQL> oradebug tracefile_name
/oracle/app/admin/prmdb/bdump/prmdb2_cjq0_327870.trc

4.查看cjq的10046文件
发现大量的process startup等待,而且两次批量运行之间的时间间隔在1分钟左右。

WAIT #0: nam='process startup' ela= 59247 p1=74 p2=0 p3=0
WAIT #0: nam='process startup' ela= 51486 p1=74 p2=1 p3=0
WAIT #0: nam='process startup' ela= 51629 p1=74 p2=2 p3=0
WAIT #0: nam='process startup' ela= 48205 p1=74 p2=3 p3=0
WAIT #0: nam='process startup' ela= 47625 p1=74 p2=4 p3=0
WAIT #0: nam='process startup' ela= 55945 p1=74 p2=5 p3=0
WAIT #0: nam='process startup' ela= 47633 p1=74 p2=6 p3=0
WAIT #0: nam='process startup' ela= 51809 p1=74 p2=7 p3=0
WAIT #0: nam='process startup' ela= 57371 p1=74 p2=8 p3=0
WAIT #0: nam='process startup' ela= 50249 p1=74 p2=9 p3=0
WAIT #0: nam='process startup' ela= 51683 p1=74 p2=10 p3=0
WAIT #0: nam='process startup' ela= 47933 p1=74 p2=11 p3=0
WAIT #0: nam='process startup' ela= 47699 p1=74 p2=12 p3=0
WAIT #0: nam='process startup' ela= 47800 p1=74 p2=13 p3=0
WAIT #0: nam='process startup' ela= 47947 p1=74 p2=14 p3=0
WAIT #0: nam='process startup' ela= 52071 p1=74 p2=15 p3=0
WAIT #0: nam='process startup' ela= 47776 p1=74 p2=16 p3=0
WAIT #0: nam='process startup' ela= 47764 p1=74 p2=17 p3=0
WAIT #0: nam='process startup' ela= 47684 p1=74 p2=18 p3=0
WAIT #0: nam='process startup' ela= 47790 p1=74 p2=19 p3=0

通过O记的大力帮助,终于找出了该问题的原因:Bug 4339922: CJQ PROCESS WAKE UP JOB QUEUE PROCESSES EVERY 1 MINUTES.(THERE IS NO JOBS).因为9i的版本oracle不再提供新补丁支持,ora_j0**相关进程不停重启不太占用系统和数据库资源,在不能升级数据库的情况下,可以考虑设置job_queue_processes到一个合适值,然后忽略该问题。