Solaris rm datafile recovery

今天早上接到有客户恢复请求,他一不小心在solaris系统中使用rm -rf oradata命令把一个分区下面的所有数据文件全部删除了。现在不知道怎么办,请求我们给予支持.上去检查发现比较幸运,数据库没有crash,看来直接考虑使用句柄进行恢复

root@CNISORCLSVR # uname -a
SunOS CNISORCLSVR 5.9 Generic_112233-08 sun4u sparc SUNW,Sun-Fire-880
root@CNISORCLSVR # ps -ef|grep lgwr   
  oracle   597     1  0   Mar 05 ?       17:14 ora_lgwr_xifenfei
    root 28069 28043  0 18:51:17 pts/2    0:00 grep lgwr
root@CNISORCLSVR # ls -ltr
total 189348454
-r--r--r--   1 oracle   dba       657920 Apr 26  2002 12
c---------   1 root     sys       13, 12 Mar 27  2004 8
c---------   1 root     sys       13, 12 Mar 27  2004 10
-rw-r-----   0 oracle   dba      34359730176 Nov 12  2013 291
-rw-r-----   0 oracle   dba      1073750016 Nov 13  2013 293
D---------   1 root     root           0 Mar  5 19:31 11
-rw-r-----   1 oracle   dba         1758 Mar  5 22:04 9
-rw-rw----   1 oracle   dba           24 Mar  5 22:04 13
s---------   0 root     root           0 Mar  8 00:45 14
-rw-r-----   1 oracle   dba      1887444992 Mar 12 03:27 289
-rw-r-----   1 oracle   dba      943726592 Mar 12 11:17 290
-rw-r-----   0 oracle   dba      4294975488 Mar 13 00:09 292
-rw-r-----   0 oracle   dba      268443648 Mar 13 01:33 288
-rw-r-----   0 oracle   dba      536879104 Mar 13 01:33 279
-rw-r-----   1 oracle   dba      134225920 Mar 13 01:33 278
-rw-r-----   0 oracle   dba      134225920 Mar 13 01:33 269
-rw-r-----   1 oracle   dba      268443648 Mar 13 01:33 267
-rw-r-----   1 oracle   dba      148119552 Mar 13 01:33 266
-rw-r-----   1 oracle   dba      10493952 Mar 13 01:33 265
-rw-r-----   1 oracle   dba      26222592 Mar 13 01:33 264
-rw-r-----   1 oracle   dba      62922752 Mar 13 01:33 263
-rw-r-----   1 oracle   dba      20979712 Mar 13 01:33 262
-rw-r-----   0 oracle   dba      134225920 Mar 13 01:33 287
-rw-r-----   1 oracle   dba      209723392 Mar 13 01:33 285
-rw-r-----   0 oracle   dba      536879104 Mar 13 01:33 283
-rw-r-----   1 oracle   dba      67117056 Mar 13 01:33 282
-rw-r-----   0 oracle   dba      536879104 Mar 13 01:33 281
-rw-r-----   0 oracle   dba      536879104 Mar 13 01:33 280
-rw-r-----   0 oracle   dba      536879104 Mar 13 01:33 276
-rw-r-----   0 oracle   dba      1073750016 Mar 13 01:33 275
-rw-r-----   0 oracle   dba      2214600704 Mar 13 01:33 274
-rw-r-----   0 oracle   dba      134225920 Mar 13 01:33 273
-rw-r-----   0 oracle   dba      536879104 Mar 13 01:33 272
c---------   1 root     sys       13,  2 Mar 13 02:00 5
c---------   1 root     sys       13,  2 Mar 13 02:00 4
c---------   1 root     sys       13,  2 Mar 13 02:00 3
c---------   1 root     sys       13,  2 Mar 13 02:00 2
c---------   1 root     sys       13,  2 Mar 13 02:00 1
c---------   1 root     sys       13,  2 Mar 13 02:00 0
--w-------   1 oracle   dba      4640842 Mar 13 04:43 7
--w-------   1 oracle   dba      4640842 Mar 13 04:43 6
-rw-r-----   0 oracle   dba      1207967744 Mar 13 18:21 271
-rw-r-----   0 oracle   dba      15929974784 Mar 13 18:39 284
-rw-r-----   0 oracle   dba      134225920 Mar 13 18:45 277
-rw-r-----   0 oracle   dba      2122326016 Mar 13 18:46 286
-rw-r-----   0 oracle   dba      9261031424 Mar 13 18:47 270
-rw-r-----   0 oracle   dba      18253619200 Mar 13 18:47 268
-rw-r-----   1 oracle   dba      134225920 Mar 13 18:51 261
-rw-r-----   1 oracle   dba      524296192 Mar 13 18:51 260
-rw-r-----   1 oracle   dba      104858112 Mar 13 18:52 259
-rw-r-----   1 oracle   dba      1941504 Mar 13 18:52 258
-rw-r-----   1 oracle   dba      1941504 Mar 13 18:52 257
-rw-r-----   1 oracle   dba      1941504 Mar 13 18:52 256

SQL> select file#,name from v$datafile wehre name like '/disk%';

     FILE# NAME
---------- --------------------------------------------------
         9 /disk/oradata/xifenfei/xifenfei.dbf
        10 /disk/oradata/xifenfei/CSSN.dbf
        11 /disk/oradata/xifenfei/NCSSN.dbf
        12 /disk/oradata/xifenfei/CSIC_RDS.dbf
        13 /disk/oradata/xifenfei/CSIC_CSSN.dbf
        14 /disk/oradata/xifenfei/CNIS_I.dbf
        15 /disk/oradata/xifenfei/CNIS.dbf
        16 /disk/oradata/xifenfei/TRSWCM6_CSSN.dbf
        17 /disk/oradata/xifenfei/TRSWCM6_CSSN_PLUGINS.dbf
        18 /disk/oradata/xifenfei/DIGIREF.dbf
        20 /disk/oradata/xifenfei/TRSWCM.dbf
        21 /disk/oradata/xifenfei/TRSWCM52_NSLC.dbf
        22 /disk/oradata/xifenfei/TRSWCM52_PLUGINS_NSLC.dbf
        24 /disk/oradata/xifenfei/TRSWCM_PLUGINS.dbf
        25 /disk/oradata/xifenfei/CNIS_ALL.dbf
        27 /disk/oradata/xifenfei/undotbs01.dbf
        28 /disk/oradata/xifenfei/TRS_IDS02.dbf
        29 /disk/oradata/xifenfei/xdb02.dbf

在solaris中比较郁闷,虽然进入了fd目录,但是无法知道哪些文件句柄是删除,哪些是正常的,因此没有办法,只能使用lsof进一步分析

root@CNISORCLSVR # pkgadd -d lsof-4.80-sol9-sparc-local

The following packages are available:
  1  IBMlsof     lsof
                 (sparc) 4.80

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]: all

Processing package instance <IBMlsof> from </tmp/lsof-4.80-sol9-sparc-local>

lsof
(sparc) 4.80
Vic Abell
Using </usr/local> as the package base directory.
## Processing package information.
## Processing system information.
## Verifying disk space requirements.
## Checking for conflicts with packages already installed.

The following files are already installed on the system and are being
used by another package:
* /usr/local/bin <attribute change only>

* - conflict with a file which does not belong to any package.

Do you want to install these conflicting files [y,n,?,q] y
## Checking for setuid/setgid programs.

The following files are being installed with setuid and/or setgid
permissions:
 /usr/local/bin/lsof <setgid sys>
 /usr/local/bin/sparcv7/lsof <setgid sys>
 /usr/local/bin/sparcv9/lsof <setgid sys>

Do you want to install these as setuid/setgid files [y,n,?,q] y

Installing lsof as <IBMlsof>

## Installing part 1 of 1.
/usr/local/bin/lsof
/usr/local/bin/sparcv7/lsof
/usr/local/bin/sparcv9/lsof
/usr/local/doc/lsof/00.README.FIRST
/usr/local/doc/lsof/00CREDITS
/usr/local/doc/lsof/00DCACHE
/usr/local/doc/lsof/00DIALECTS
/usr/local/doc/lsof/00DIST
/usr/local/doc/lsof/00FAQ
/usr/local/doc/lsof/00LSOF-L
/usr/local/doc/lsof/00MANIFEST
/usr/local/doc/lsof/00PORTING
/usr/local/doc/lsof/00QUICKSTART
/usr/local/doc/lsof/00README
/usr/local/doc/lsof/00TEST
/usr/local/doc/lsof/00XCONFIG
/usr/local/doc/lsof/lsof.man
/usr/local/man/man8/lsof.8
[ verifying class <none> ]

Installation of <IBMlsof> was successful.

root@CNISORCLSVR # ./lsof -p 597
COMMAND PID   USER   FD   TYPE        DEVICE    SIZE/OFF   NODE NAME
oracle  597 oracle  cwd   VDIR          85,5        2048 106299 /export/home/oracle/app/product/9.2.0/dbs
oracle  597 oracle  txt   VREG          85,5    61272672   2332 /export/home/oracle/app/product/9.2.0/bin/oracle
…………
oracle  597 oracle  260u  VREG          85,5   524296192 106517 /export/home/oracle/oradata/xifenfei/system01.dbf
oracle  597 oracle  261u  VREG          85,5   134225920 106518 /export/home/oracle/oradata/xifenfei/undotbs01.dbf
…………
oracle  597 oracle  268u  VREG        118,70 18253619200  109 /disk (/dev/dsk/c2t600A0B800029CEFA0000036C491B270Bd0s6)
oracle  597 oracle  269u  VREG        118,70   134225920  110 /disk (/dev/dsk/c2t600A0B800029CEFA0000036C491B270Bd0s6)
oracle  597 oracle  270u  VREG        118,70  9261031424  111 /disk (/dev/dsk/c2t600A0B800029CEFA0000036C491B270Bd0s6)
…………
oracle  597 oracle  293u  VREG        118,70  1073750016   14 /disk (/dev/dsk/c2t600A0B800029CEFA0000036C491B270Bd0s6)

到这一步,基本上定位/disk部分是我们需要恢复的数据,从而可以定位到句柄,然后结合数据文件信息,直接使用cp命令恢复出来文件.然后数据库层面recover并且online.

cd /proc/597/fd
cp 269 /disk/oradata/cnisora2/CSSN.dbf
chown oracle:dba /disk/oradata/xifenfei/CSSN.dbf

SQL> recover datafile 10;
ORA-00283: 恢复会话因错误而取消
ORA-01124: 无法恢复数据文件 10 - 文件在使用中或在恢复中
ORA-01110: 数据文件 10: '/disk/oradata/xifenfei/CSSN.dbf'


SQL> alter database datafile 10 offline;

数据库已更改。

SQL> recover datafile 10;
完成介质恢复。
SQL> alter database datafile 10 online;

数据库已更改。

至此基本上恢复完成,万幸是数据库没有crash,遇到此类问题,千万不要盲目关闭数据库.另外数据库备份重于一切

aix中procmap 查看oracle进程占用系统内存

procmap是用来显示进程地址空间,通过这个命令找出来的“read/write”表示为进程的私有内存,如果对应到oracle 进程的LOCAL中来,也就是对应了是oracle 会话进程占用的操作系统内存,和sga与pga无关,即ORACLE数据库进程占用的额外的系统内存,在计算oracle数据库消耗内存的时候,要考虑sga+pga+process占用的内存
procmap命令使用

$procmap 7931354
7931354 : oracleccicdx (LOCAL=NO) 
100000000            95504K  read/exec         oracle
110000035             2399K  read/write        oracle
9fffffff0000000         51K  read/exec         /usr/ccs/bin/usla64
9fffffff000cfe2          0K  read/write        /usr/ccs/bin/usla64
900000000b05930          2K  read/exec         /usr/lib/libC.a[shr3_64.o]
9001000a0122930          0K  read/write        /usr/lib/libC.a[shr3_64.o]
900000000ae6b00        118K  read/exec         /usr/lib/libC.a[shrcore_64.o]
9001000a030a100         12K  read/write        /usr/lib/libC.a[shrcore_64.o]
900000000ac8000        118K  read/exec         /usr/lib/libC.a[ansicore_64.o]
9001000a0300e00         36K  read/write        /usr/lib/libC.a[ansicore_64.o]
900000000411468          0K  read/exec         /usr/lib/libicudata.a[shr_64.o]
9001000a0121468          0K  read/write        /usr/lib/libicudata.a[shr_64.o]
90000000040f738          2K  read/exec         /usr/lib/libC.a[shr2_64.o]
9001000a0314738          0K  read/write        /usr/lib/libC.a[shr2_64.o]
9000000008dd800       1699K  read/exec         /usr/lib/libC.a[ansi_64.o]
9001000a0315a00        277K  read/write        /usr/lib/libC.a[ansi_64.o]
9000000008bab00        135K  read/exec         /usr/lib/libC.a[shr_64.o]
9001000a030eb00         19K  read/write        /usr/lib/libC.a[shr_64.o]
900000000708180       1732K  read/exec         /usr/lib/libicuuc.a[shr_64.o]
9001000a035cdac        180K  read/write        /usr/lib/libicuuc.a[shr_64.o]
900000000493d80       2510K  read/exec         /usr/lib/libicui18n.a[shr_64.o]
9001000a038a148        270K  read/write        /usr/lib/libicui18n.a[shr_64.o]
900000000473200         91K  read/exec         /usr/lib/libsrc.a[shr_64.o]
9001000a01127a8         55K  read/write        /usr/lib/libsrc.a[shr_64.o]
90000000045a300         98K  read/exec         /usr/lib/libcorcfg.a[shr_64.o]
9001000a04147c8         18K  read/write        /usr/lib/libcorcfg.a[shr_64.o]
900000000b16200        750K  read/exec         /usr/lib/liblvm.a[shr_64.o]
9001000a03dd028        219K  read/write        /usr/lib/liblvm.a[shr_64.o]
900000000444f00         82K  read/exec         /usr/lib/libcfg.a[shr_64.o]
9001000a03d58f0         26K  read/write        /usr/lib/libcfg.a[shr_64.o]
90000000040e3a0          2K  read/exec         /usr/lib/libcrypt.a[shr_64.o]
9001000a0106948          0K  read/write        /usr/lib/libcrypt.a[shr_64.o]
90000001615d860          5K  read/exec         /usr/lib/libc.a[aio_64.o]
9001000a3aed568          0K  read/write        /usr/lib/libc.a[aio_64.o]
9000000003efc00        120K  read/exec         /usr/lib/libodm.a[shr_64.o]
9001000a0107cc8         40K  read/write        /usr/lib/libodm.a[shr_64.o]
900000000bd2c80        147K  read/exec         /usr/lib/libperfstat.a[shr_64.o]
9001000a041a960         14K  read/write        /usr/lib/libperfstat.a[shr_64.o]
9000000017d7000          0K  read/exec         /usr/lib/libdl.a[shr_64.o]
9001000a0517000          0K  read/write        /usr/lib/libdl.a[shr_64.o]
9000000158ed100       8636K  read/exec         /oracle/product/db10gr2/lib/libjox10.a[shr.o]
8001000a0000b78        587K  read/write        /oracle/product/db10gr2/lib/libjox10.a[shr.o]
900000000a87000        257K  read/exec         /usr/lib/libpthreads.a[shr_xpg5_64.o]
9001000a0274000        559K  read/write        /usr/lib/libpthreads.a[shr_xpg5_64.o]
900000000000800       4025K  read/exec         /usr/lib/libc.a[shr_64.o]
9001000a0000020       1047K  read/write        /usr/lib/libc.a[shr_64.o]
         Total      121863K

简化命令,统计私有内存,procmap 7931354|grep “read/write” |awk -F ” ” ‘{print $2}’,通过相关计算的出来,在当前的操作系统和数据库版本中,一个LOCAL=NO进程占用系统内存为:5758KB

补充说明
1.操作系统版本

$oslevel -r
6100-06

2.数据库版本

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
NLSRTL Version 10.2.0.4.0 - Production

3.通过跟踪多个LOCAL=NO进程,发现类似进程占用的系统内存相同,估算给系统oracle进程占用的内存,可以通过该值进行大概估算
4.确认ORACLE使用的内存量不是以往认识的sga+pga,实际上应该是sga+pga+所有oracle进程占用
5.在linux中使用pmap来查看

ORACLE在AIX中产生SOFTWARE PROGRAM ABNORMALLY TERMINATED警告原因

数据库中发现如下错误
该错误的解决方案:ORA-07445[dbgrmqmqpk_query_pick_key()+0f88]

Dump file /oracle/diag/rdbms/sgerp5/sgerp5/incident/incdir_579300/sgerp5_m000_7602504_i579300.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/db_1
System name:    AIX
Node name:  sgerp5
Release:    1
Version:    6
Machine:    00C8F0564C00
Instance name: sgerp5
Redo thread mounted by this instance: 1
Oracle process number: 138
Unix process pid: 7602504, image: oracle@sgerp5 (m000)
 
*** 2012-05-11 03:52:35.200
*** SESSION ID:(752.5029) 2012-05-11 03:52:35.200
*** CLIENT ID:() 2012-05-11 03:52:35.200
*** SERVICE NAME:(SYS$BACKGROUND) 2012-05-11 03:52:35.200
*** MODULE NAME:(MMON_SLAVE) 2012-05-11 03:52:35.200
*** ACTION NAME:(Auto-Purge Slave Action) 2012-05-11 03:52:35.200
 
Dump continued from file: /oracle/diag/rdbms/sgerp5/sgerp5/trace/sgerp5_m000_7602504.trc
ORA-07445: exception encountered: core dump [dbgrmqmqpk_query_pick_key()+0f88] 
[SIGSEGV] [ADDR:0xB38F0000000049][PC:0x100213C08] [Address not mapped to object] []

errpt错误说明
在产生7445错误的同时观察aix系统错误日志发现SOFTWARE PROGRAM ABNORMALLY TERMINATED错误

sgerp5_[oracle]-->errpt -aj A924A5FC
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC

Date/Time:       Fri May 11 03:52:55 BEIST 2012
Sequence Number: 471
Machine Id:      00C8F0564C00
Node Id:         sgerp5
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC         

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
               7602504
FILE SYSTEM SERIAL NUMBER
          14
INODE NUMBER
           0      367648
CORE FILE NAME
/oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core
PROGRAM NAME
oracle
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER
sskgmcrea 0

PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 2

ADDITIONAL INFORMATION
skgdbgcra 224
??
ksdbgcra 3D0
ssexhd 978
??

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/oracle SIG/6 FLDS/skgdbgcra VALU/224

错误原因

This error is logged when a software program abnormally ends and causes a core dump. Users might
not be exiting applications correctly, the system might have been shut down while users were
working in application, or the user's terminal might have locked up and the application stopped
1)这里也就是说如果oracle进程在aix机器上异常终止,并且产生了一个core dump文件,
  就会出现SOFTWARE PROGRAM ABNORMALLY TERMINATED警告信息
2)用户登录系统没有正常退出,而系统被关闭
3)用户强制终止一个一个lock,而导致进程停止

本次AIX日志警告原因:由于进程7602504异常终止(ORA-07445错误)并且产生了 /oracle/diag/rdbms/sgerp5/sgerp5/cdump/core_7602504/core dump 文件,从而有了AIX中的SOFTWARE PROGRAM ABNORMALLY TERMINATED警告信息

ksh翻上/下条和自动补全功能

AIX默认安装ksh,对于习惯了bash的人来说,不能tab自动补全,不能翻上/下,感觉使用起来很不方便,在ksh中不能直接实现这些功能,可以使用另外的方法来完成
一.安装bash程序,使用起来就和bash一样

二.ksh中通过其他方法完成
翻上/下条功能
1、在主目录中 vi .profile
2、添加一行:export EDITOR=vi
3、保存.profile,重新登陆;或者source ~/.profile
现在如果要使用翻上/下条功能,只需要按下esc键,然后使用j/k翻上/下即可;如果要退回到输入功能,直接输入i,然后输入即可.其实所有操作就是和vi中的操作一样.

自动补全功能
使用esc+\

Posted in AIX |

通过netstat+rmsock查找AIX端口对应进程

rmsock除去不包含文件描述符的套接字。它接受 socket、tcpcb、inpcb、ripcb 或 rawcb 地址并将其转换成套接字地址。然后检查每个进程所有打开的文件以查找套接字的匹配。如果没找到匹配,对该套接字执行异常终止操作,而不考虑套接字 linger 选项的存在。套接字保留的端口号释放。如果发现匹配,文件描述符和主进程状态显示给用户。
命令格式:rmsock Address TypeofAddress

[zwq:/]netstat -Aan|grep 6200|grep LISTEN
f1000e0000307bb0 tcp4       0      0  *.6200             *.*                LISTEN
--f1000e0000307bb0 为系统内核地址

[zwq:/]rmsock f1000e0000307bb0 tcpcb
The socket 0x307808 is being held by proccess 5701830 (ons).

[zwq:/]ps -ef|grep 5701830|grep -v grep
oracle10  5701830  5112098   0   Apr 21      -  7:17 /oracle10/app/product/crs/10.2.0/opmn/bin/ons -d