oracleasm createdisk重新创建asm disk后数据0丢失恢复案例

有客户反馈他们重启系统之后,发现asmlib创建的asmdisk丢失了,然后又使用oracleasm deletedisk和createdisk重新创建的asm disk,最后发现asm diskgroup无法mount。让客户通过dd 备份5m数据,然后使用kfed分析
kefd分析结果

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img

kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                  3760689243 ; 0x00c: 0xe027905b
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=10
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=255
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000

E:\OneDrive\ORACLE\recover\no_backup\asm\kfedwin>kfed read H:\temp\asmlib\xx.img blkn=256|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                     256 ; 0x004: T=0 NUMB=0x100
kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:                  3925268785 ; 0x00c: 0xe9f6d931
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:32994098 ; 0x000: HOUR=0x12 DAYS=0x19 MNTH=0x
c YEAR=0x7dd
kfdpHdrPairBv1.first.super.time.lo:1614030848 ; 0x004: USEC=0x0 MSEC=0x10a SECS=
0x3 MINS=0x18
kfdpHdrPairBv1.first.super.last:      2 ; 0x008: 0x00000002
kfdpHdrPairBv1.first.super.next:      2 ; 0x00c: 0x00000002
kfdpHdrPairBv1.first.super.copyCnt:   1 ; 0x010: 0x01
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:    1 ; 0x014: 0x00000001
kfdpHdrPairBv1.first.super.copy[0]:   0 ; 0x018: 0x0000
kfdpHdrPairBv1.first.super.copy[1]:   0 ; 0x01a: 0x0000
kfdpHdrPairBv1.first.super.copy[2]:   0 ; 0x01c: 0x0000
……

因为kfed默认每个block为4k,这里提示256是ok的,255是损坏的,从而推测出来,很可能oracleasm createdisk损坏了1M的数据。由于默认au是1m,而且数据库版本是11.2.0.3,而且第256个blkn开始没有损坏,因此初步判断可以考虑使用备份asm disk header来恢复磁盘头
检查还原磁盘头的asm disk

[grid@xifenfei1 disks]$ kfed read DATA1
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  2776451033 ; 0x00c: 0xa57d47d9
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:    ORCLDISKDATA1 ; 0x000: length=13
kfdhdb.driver.reserved[0]:   1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]:           49 ; 0x00c: 0x00000031
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:               DATA_0000 ; 0x028: length=9
kfdhdb.grpname:                    DATA ; 0x048: length=4
kfdhdb.fgname:                DATA_0000 ; 0x068: length=9
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             32994099 ; 0x0a8: HOUR=0x13 DAYS=0x19 MNTH=0xc YEAR=0x7dd
kfdhdb.crestmp.lo:           2797442048 ; 0x0ac: USEC=0x0 MSEC=0x365 SECS=0x2b MINS=0x29
kfdhdb.mntstmp.hi:             33022061 ; 0x0b0: HOUR=0xd DAYS=0x3 MNTH=0x8 YEAR=0x7df
kfdhdb.mntstmp.lo:            816879616 ; 0x0b4: USEC=0x0 MSEC=0x26 SECS=0xb MINS=0xc
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
…………

证明磁盘头确实被比较完美的修复了,现在的任务是尝试mount磁盘组
mount磁盘组

[grid@xifenfei1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 6 20:54:53 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

asm diskgroup已经正常mount,使用asmcmd命令检查文件是否正常
分析磁盘组数据是否正常

[grid@xifenfei1 ~]$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   1622060   636493                0          636493              0             N  DATA/
ASMCMD> cd data
ASMCMD> ls
ORCL/
ASMCMD> cd orcl
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileorcl.ora
ASMCMD> cd datafile
ASMCMD> ls
XIFENFEI20130801.314.835191517
XIFENFEI20140101.321.835191571
XIFENFEI20140201.322.835191573
XIFENFEI20140301.323.835191573
…………
SYSAUX.270.835182535
SYSAUX.838.874669369
SYSTEM.271.835182533
SYSTEM.823.873555791
SYSTEM.945.883146947
…………

这里看到磁盘组里面的数据文件都正常,使用同样的方法,继续mount其他磁盘组。
尝试启动数据库

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 5010685952 bytes
Fixed Size                  2236968 bytes
Variable Size            2013269464 bytes
Database Buffers         2986344448 bytes
Redo Buffers                8835072 bytes
数据库装载完毕。
ORA-16038: 日志 14 sequence# 21145 无法归档
ORA-19504: 无法创建文件""
ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569'
ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'

查看数据库alert日志

ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Thu Aug 06 21:04:06 2015
Thread 2 advanced to log sequence 21146 (thread recovery)
Picked broadcast on commit scheme to generate SCNs
Thread 2 advanced to log sequence 21147 (before internal thread enable)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-19816: 警告: 文件可能存在于数据库未知的 db_recovery_file_dest 中。
ORA-17502: ksfdcre: 4 未能创建文件 +ARCH
ORA-15196: invalid ASM block header [kfc.c:19572] [check_kfbh] [1] [47962] [1344818371 != 630731762]
ORA-15130: diskgroup "ARCH" is being dismounted
ORA-15066: offlining disk "ARCH_0000" in group "ARCH" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ARCH: Error 19504 Creating archive log file to '+ARCH'
NOTE: Deferred communication with ASM instance
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-15130: diskgroup "ARCH" is being dismounted
NOTE: deferred map free for map id 754
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_27402.trc:
ORA-16038: 日志 14 sequence# 21145 无法归档
ORA-19504: 无法创建文件""
ORA-00312: 联机日志 14 线程 2: '+DATA/orcl/onlinelog/group_14.284.835184569'
ORA-00312: 联机日志 14 线程 2: '+ARCH/orcl/onlinelog/group_14.287.835184569'
ORA-16038 signalled during: ALTER DATABASE OPEN...
Thu Aug 06 21:04:10 2015
SUCCESS: diskgroup ARCH was dismounted
SUCCESS: diskgroup ARCH was dismounted
Thu Aug 06 21:04:10 2015
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ckpt_27353.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+ARCH/orcl/controlfile/current.256.835182531'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Aug 06 21:04:10 2015
System state dump requested by (instance=1, osid=27353 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_27318.trc
CKPT (ospid: 27353): terminating the instance due to error 221
Instance terminated by CKPT, pid = 27353

查看asm alert日志

Thu Aug 06 21:04:07 2015
WARNING: cache read  a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1
Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: a corrupted block from group ARCH was dumped to /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc
WARNING: cache read (retry) a corrupt block: group=2(ARCH) dsk=0 blk=1 disk=0 (ARCH_0000) incarn=3942486752 au=0 blk=1 count=1
Errors in file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc:
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ERROR: cache failed to read group=2(ARCH) dsk=0 blk=1 from disk(s): 0(ARCH_0000)
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26076] [endian_kfbh] [2147483648] [1] [0 != 1]
NOTE: cache initiating offline of disk 0 group ARCH
NOTE: process _user27462_+asm1 (27462) initiating offline of disk 0.3942486752 (ARCH_0000) with mask 0x7e in group 2
WARNING: Disk 0 (ARCH_0000) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 2, dsk = 0/0xeafd92e0, mask = 0x6a, op = clear
Thu Aug 06 21:04:07 2015
GMON updating disk modes for group 2 at 17 for pid 35, osid 27462
ERROR: Disk 0 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Thu Aug 06 21:04:07 2015
NOTE: cache dismounting (not clean) group 2/0x723D6245 (ARCH)
NOTE: messaging CKPT to quiesce pins Unix process pid: 27089, image: oracle@xifenfei1 (B000)
WARNING: Offline of disk 0 (ARCH_0000) in group 2 and mode 0x7f failed on ASM inst 1
Thu Aug 06 21:04:07 2015
NOTE: halting all I/Os to diskgroup 2 (ARCH)
System State dumped to trace file /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27462.trc
NOTE: AMDU dump of disk group ARCH created at /u01/app/11.2.0/grid/log/diag/asm/+asm/+ASM1/trace
Thu Aug 06 21:04:09 2015
NOTE: LGWR doing non-clean dismount of group 2 (ARCH)
NOTE: LGWR sync ABA=126.806 last written ABA 126.806

这里可以看出来,报错的block为arch磁盘组的第一个磁盘的第一个au的第二个block,而我们在开始的时候,已经分析了asm disk的第一个au完全损坏,并且我们使用了备份磁盘头进行来还原,勉强可以让磁盘组mount起来,但是由于数据库在启动的时候,需要对redo进行归档,而归档的过程需要写到arch磁盘组里面,这个时候需要访问到au=0 blk=1,而这个块本身是坏的,因此这个时候该块盘的disk就被offline掉了,而这个磁盘组是外部冗余的,因此磁盘组dismount了,所以数据库无法启动.

分析第一个au里面到底有哪些东西

SQL> select DISK_NUMBER,path from v$asm_disk;

DISK_NUMBER PATH
----------- --------------------------------------------------
          0 /dev/raw/raw1
          2 /dev/raw/raw3
          1 /dev/raw/raw2

[oracle@xifenfei raw]$ kfed read raw1 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw1 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw1 blkn=3|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw1 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw2 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw2 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw2 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw3 blkn=1|grep kfbh.type
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
[oracle@xifenfei raw]$ kfed read raw3 blkn=2|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL
[oracle@xifenfei raw]$ kfed read raw3 blkn=255|grep kfbh.type
kfbh.type:                            3 ; 0x002: KFBTYP_ALLOCTBL

通过一个测试机器的一个磁盘组进行分析,我们可以基本上确定asm 第一个au除了asm disk header的KFBTYP_DISKHEAD之外,其他主要是KFBTYP_FREESPC(Free Space Table)和KFBTYP_ALLOCTBL(allocator table),主要就是记录asm中au的分配情况,也就是进一步说明,如果我不对asm里面的数据使用更多的au分配或者回收au,在缺少第一个au的1-255个block的信息情况下,asm的磁盘组也不会dismount。根据这个思路,让数据库归档到本地,然后继续测试

继续open数据库

SQL> startup
ORACLE 例程已经启动。

Total System Global Area 5010685952 bytes
Fixed Size                  2236968 bytes
Variable Size            2013269464 bytes
Database Buffers         2986344448 bytes
Redo Buffers                8835072 bytes
数据库装载完毕。
SQL> alter database open;

数据库已更改。

LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Fri Aug 07 02:43:13 2015
ARC1 started with pid=34, OS id=22778 
Fri Aug 07 02:43:13 2015
ARC2 started with pid=35, OS id=22780 
Fri Aug 07 02:43:13 2015
ARC3 started with pid=36, OS id=22782 
ARC1: Archival started
ARC2: Archival started
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Fri Aug 07 02:43:24 2015
Thread 1 opened at log sequence 18604
  Current log# 10 seq# 18604 mem# 0: /tmp/xifenfei/otherfile/group_10.273.835182533
  Current log# 10 seq# 18604 mem# 1: /tmp/xifenfei/otherfile/group_10.263.835182533
Successful open of redo thread 1
Fri Aug 07 02:43:24 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Fri Aug 07 02:43:25 2015
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Fri Aug 07 02:43:26 2015
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:21328 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
Fri Aug 07 02:43:26 2015
Redo thread 2 internally disabled at seq 21147 (CKPT)
[21341] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:96999124 end:97000624 diff:1500 (15 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
Starting background process GTX0
Fri Aug 07 02:43:31 2015
GTX0 started with pid=37, OS id=22803 
Starting background process RCBG
Fri Aug 07 02:43:31 2015
RCBG started with pid=38, OS id=22805 
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Aug 07 02:43:34 2015
Archived Log entry 73876 added for thread 2 sequence 21145 ID 0x513c613f dest 1: <---果然有归档操作发生
Starting background process QMNC
Fri Aug 07 02:43:34 2015
QMNC started with pid=39, OS id=22812 
Fri Aug 07 02:43:35 2015
Archived Log entry 73877 added for thread 2 sequence 21146 ID 0x513c613f dest 1:
Fri Aug 07 02:43:35 2015
ARC0: Archiving disabled thread 2 sequence 21147
Archived Log entry 73878 added for thread 2 sequence 21147 ID 0x513c613f dest 1:
Fri Aug 07 02:43:37 2015
Completed: alter database open

现在到了这一步,基本上可以确定,数据库是零丢失恢复。由于asm 第一个au丢失数据严重,想要彻底修复比较难,考虑把数据库启动到mount/read only状态然后使用rman备份数据,然后进行重建asm 磁盘组,再迁移回来。至此完美恢复asmlib的磁盘被oracleasm重写的故障恢复,实现数据0丢失.当然在整个恢复过程没有于此的简单,涉及到在votedisk损坏的情况下,如何mount磁盘组,vote diskgroup的损坏修复问题,磁盘组在10g/11.1和11.2还原磁盘头备份的问题等问题.
虽然本次的恢复案例中,由于asmlib的asm disk不可见就轻易使用oracleasm createdisk命令对磁盘进行了重建,犯了一个很大错误,但是在重建之后,发现磁盘组依旧异常,未继续操作(比如重建磁盘组等),为最后的数据完全恢复创造了必要条件,使得客户的没有任何数据损失。如果再对除磁盘组继续复写操作,可能会导致数据永久性丢失。这个教训告诉我们:遇到自己不能把握的事情,及时终止,不要让错误越走越远

ORA-01115 ORA-01110 ORA-27067故障恢复案例

接到朋友恢复请求,aix 5.3,Oracle 10.2.0.1平台,数据库启动报ORA-01115 ORA-01110 ORA-27067错误,数据库无法正常打开

Mon Aug 10 13:25:22 2015
ALTER DATABASE   MOUNT
Mon Aug 10 13:25:29 2015
Setting recovery target incarnation to 1
Mon Aug 10 13:25:29 2015
Successful mount of redo thread 1, with mount id 432339141
Mon Aug 10 13:25:29 2015
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Mon Aug 10 13:25:36 2015
alter database open
Mon Aug 10 13:25:36 2015
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Mon Aug 10 13:25:37 2015
Started redo scan
Mon Aug 10 13:25:52 2015
Completed redo scan
 7889582 redo blocks read, 75305 data blocks need recovery
Mon Aug 10 13:25:53 2015
Errors in file /dc/admin/datacent/bdump/datacent_p002_144124.trc:
ORA-01115: IO error reading block from file 2 (block # 40704)
ORA-01110: data file 2: '/dc/oradata/datacent/undotbs01.dbf'
ORA-27067: size of I/O buffer is invalid
Additional information: 2
Additional information: 1572864
Mon Aug 10 13:25:53 2015
Aborting crash recovery due to slave death, attempting serial crash recovery
Mon Aug 10 13:25:53 2015
Beginning crash recovery of 1 threads
Mon Aug 10 13:25:53 2015
Started redo scan
Mon Aug 10 13:26:09 2015
Completed redo scan
 7889582 redo blocks read, 75305 data blocks need recovery
Mon Aug 10 13:26:12 2015
Aborting crash recovery due to error 1115
Mon Aug 10 13:26:12 2015
Errors in file /dc/admin/datacent/udump/datacent_ora_123384.trc:
ORA-01115: IO error reading block from file 2 (block # 39077)
ORA-01110: data file 2: '/dc/oradata/datacent/undotbs01.dbf'
ORA-27067: size of I/O buffer is invalid
Additional information: 2
Additional information: 1310720
ORA-1115 signalled during: alter database open...

这里报的前面两个错误ORA-01115 ORA-01110我们都非常熟悉,类似数据库启动遇到坏块或者io错误之时可能就会报如此错误。但是ORA-27067确实不多见,从mos上看,很多是由于rman备份之时的bug可能导致该错误。

dbv检测undo坏块文件

DBVERIFY: Release 10.2.0.1.0 - Production on Mon Aug 10 23:18:15 2015

Copyright (c) 1982, 2003, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /dc/oradata/datacent/undotbs01.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 329600
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 327504
Total Pages Processed (Seg)  : 17
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 2096
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 1887888 (0.1887888)

这里可以看到,undo文件本身并没有逻辑和物理的坏块,证明因为数据库异常的原因,可能是由于ORA-27067: size of I/O buffer is invalid导致。根据官方文档ORA-01115 ORA-27067 DURING PARALLEL INSTANCE RECOVERY AFTER INSTANCE CRASH中的解释,我们基本上可以确定很可能是由于10.2.0.1在aix平台的jfs2系统中,由于大量事务操作,突然abort掉数据库(也可能断电),从而数据库在启动的时候进行实例恢复,而由于内部的bug,导致实例恢复无法成功。通过我们处理后的,数据库完美启动,数据0丢失

数据库启动日志

Mon Aug 10 16:34:14 2015
alter database open
Mon Aug 10 16:34:14 2015
Beginning crash recovery of 1 threads
parallel recovery started with 15 processes
Mon Aug 10 16:34:14 2015
Started redo scan
Mon Aug 10 16:34:27 2015
Completed redo scan
7889582 redo blocks read, 0 data blocks need recovery
Mon Aug 10 16:34:27 2015
Started redo application at
Thread 1: logseq 664704, block 1286922
Mon Aug 10 16:34:27 2015
Recovery of Online Redo Log: Thread 1 Group 4 Seq 664704 Reading mem 0
Mem# 0 errs 0: /dev/rredo04
Mon Aug 10 16:34:32 2015
Recovery of Online Redo Log: Thread 1 Group 5 Seq 664705 Reading mem 0
Mem# 0 errs 0: /dev/rredo05
Mon Aug 10 16:34:38 2015
Recovery of Online Redo Log: Thread 1 Group 6 Seq 664706 Reading mem 0
Mem# 0 errs 0: /dev/rredo06
Mon Aug 10 16:34:40 2015
Completed redo application
Mon Aug 10 16:34:40 2015
Completed crash recovery at
Thread 1: logseq 664706, block 1017805, scn 8554793334
0 data blocks read, 0 data blocks written, 7889582 redo blocks read
Mon Aug 10 16:34:40 2015
Thread 1 advanced to log sequence 664707
Thread 1 opened at log sequence 664707
Current log# 1 seq# 664707 mem# 0: /dev/rredo01
Successful open of redo thread 1
Mon Aug 10 16:34:40 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Aug 10 16:34:40 2015
SMON: enabling cache recovery
Mon Aug 10 16:34:40 2015
Successfully onlined Undo Tablespace 1.
Mon Aug 10 16:34:40 2015
SMON: enabling tx recovery
Mon Aug 10 16:34:41 2015
Database Characterset is ZHS32GB18030
replication_dependency_tracking turned off (no async multimaster replication found)
WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected.
Mon Aug 10 16:34:41 2015
SMON: Parallel transaction recovery tried
Mon Aug 10 16:34:42 2015
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Aug 10 16:34:42 2015
Completed: alter database open
[/sql]

记录一次由于坏块和不恰当恢复引起各种ORA-600案例

朋友让我帮忙处理一个不能open的库,打开alert日志一看,傻眼了,里面是各种ORA-600的错误应有尽有,被折腾的够惨
故障后重启,无法启动主要表现在block坏块,引起的各种ORA-600等错误

Mon Mar 02 16:09:27 2015
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 23 processes
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Completed redo application of 0.27MB
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3104 OF FILE 3
Mon Mar 02 16:09:27 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 3613 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 272 OF FILE 3
Mon Mar 02 16:09:28 2015
RECOVERY OF THREAD 1 STUCK AT BLOCK 2512 OF FILE 3
Hex dump of (file 2, block 92889) in trace file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc
Corrupt block relative dba: 0x00816ad9 (file 2, block 92889)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 0 rdba: 0x6ad90000
 last change scn: 0x0000.00c6a052 seq: 0x1 flg: 0x00
 spare1: 0x6 spare2: 0xa2 spare3: 0x5d7e
 consistency value in tail: 0xa0520001
 check value in block header: 0x0
 block checksum disabled
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p007_4196.trc  (incident=3833):
ORA-00600: internal error code, arguments: [4502], [1], [], [], [], [], [], [], [], [], [], []
Mon Mar 02 16:09:28 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p013_4208.trc  (incident=3881):
ORA-00600: internal error code, arguments: [2037], [4259067], [4244307968], [159], [243], [0], [2162032704], [100728832], [], [], [], []
Slave exiting with ORA-1172 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p009_4200.trc:
ORA-01172: recovery of thread 1 stuck at block 3613 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p001_4184.trc:
ORA-01172: recovery of thread 1 stuck at block 2512 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
ORA-10388: parallel query server interrupt (failure)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p021_4224.trc:
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc  (incident=3697):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3697/oa_dbw2_4158_i3697.trc
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0xD2DDB7, kcbs_shrink_pool()+705] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_mman_4152.trc  (incident=3673):
ORA-07445: exception encountered: core dump [kcbs_shrink_pool()+705] [SIGSEGV] [ADDR:0x0] [PC:0xD2DDB7] [SI_KERNEL(general_protection)] []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_3673/oa_mman_4152_i3673.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_dbw2_4158.trc:
Mon Mar 02 16:09:34 2015
Instance terminated by DBW2, pid = 4158

第二次重启后增加新错误ORA-00600[17182]

Mon Mar 02 16:39:50 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_p002_4321.trc  (incident=4993):
ORA-00600: internal error code, arguments: [17182], [0x7F548C2BDBA8], [], [], [], [], [], [], [], [], [], []

进行了一些恢复处理后,日志中报错
主要体现在进行了不完全恢复,而且应该是对redo进行了重命名或者redo头损坏锁引起的一系列提示

Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
 read 962 KB redo, 256 data blocks need recovery
Started redo application at
 Thread 1: logseq 726, block 37343
Recovery of Online Redo Log: Thread 1 Group 3 Seq 726 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/oa/redo03.log
RECOVERY OF THREAD 1 STUCK AT BLOCK 1673 OF FILE 3
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-01172: recovery of thread 1 stuck at block 1673 of file 3
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: alter  database open...
Tue Mar 03 11:17:59 2015
Sweep [inc][17178]: completed
Sweep [inc][17177]: completed
Sweep [inc2][17178]: completed
Tue Mar 03 11:18:00 2015
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 24 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Tue Mar 03 11:18:06 2015
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-00266: name of archived log file needed
ORA-266 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL 
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
Slave exiting with ORA-1547 exception
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_pr00_6701.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/oa/system01.dbf'
ORA-10879 signalled during: ALTER DATABASE RECOVER CANCEL ...
Tue Mar 03 11:18:06 2015
Checker run found 4 new persistent data failures
Tue Mar 03 11:18:13 2015
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 12986989
Resetting resetlogs activation ID 3278679642 (0xc36cae5a)
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/oa/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6644.trc:

再一步折腾,增加了_allow_resetlogs_corruption= TRUE之后数据库报ORA-600[2662]

Tue Mar 03 11:19:26 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc  (incident=18195):
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/oa/oa/incident/incdir_18195/oa_ora_6864_i18195.trc
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_6864.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [2662], [0], [13007002], [0], [13016626], [4194545], [], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER (ospid: 6864): terminating the instance due to error 704
Instance terminated by USER, pid = 6864
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (6864) as a result of ORA-1092
Tue Mar 03 11:19:29 2015
ORA-1092 : opitsk aborting process

进一步折腾,可以看出来undo已经被其offline,无法正常访问,导致系统报ORA-704和ORA-00376

Wed Mar 04 21:10:58 2015
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17074.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u01/app/oracle/oradata/oa/undotbs01.dbf'
Error 704 happened during db open, shutting down database
USER (ospid: 17074): terminating the instance due to error 704
Instance terminated by USER, pid = 17074
ORA-1092 signalled during: alter database open...
opiodr aborting process unknown ospid (17074) as a result of ORA-1092
Wed Mar 04 21:11:00 2015
ORA-1092 : opitsk aborting process

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检测结果见附件(xifenfei_db_recover_20150304),这里可以知道undo 不知道怎么折腾的数据文件scn较大而且还offline,
通过一些列方法(bbed,隐含参数等)调整数据库scn,强制启动数据库,报如下错误

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,该错误怀疑是文件头scn不够大,块延迟清理导致,进一步增加scn尝试,最后依旧是ORA-00704/ORA-00604/ORA-01555错误

Wed Mar 04 22:50:23 2015
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 3nkd3g3ju5ph1, SCN: 0x0000.4000003e):
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Errors in file /u01/app/oracle/diag/rdbms/oa/oa/trace/oa_ora_17807.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 17807): terminating the instance due to error 704
Instance terminated by USER, pid = 17807
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (17807) as a result of ORA-1092

根据经验,在scn上做手脚估计难以解决给问题,对其启动过程做10046和errorstack分析发现

PARSING IN CURSOR #3 len=202 dep=2 uid=0 oct=3 lid=0 tim=1425481940448439 hv=3819099649 ad='64ff91af8' sqlid='3nkd3g3ju5ph1'
select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null
END OF STMT
PARSE #3:c=1000,e=334,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=0,tim=1425481940448439
BINDS #3:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a6f0  bln=22  avl=01  flg=05
  value=0
 Bind#1
  oacdty=01 mxl=32(06) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
  kxsbbbfp=7f5b3253a6b8  bln=32  avl=06  flg=05
  value="PROPS$"
 Bind#2
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f5b3253a688  bln=24  avl=02  flg=05
  value=1
EXEC #3:c=0,e=640,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,plh=2853959010,tim=1425481940449147
WAIT #3: nam='db file sequential read' ela= 5 file#=1 block#=345 blocks=1 obj#=37 tim=1425481940449186
WAIT #3: nam='db file sequential read' ela= 4 file#=1 block#=44528 blocks=1 obj#=37 tim=1425481940449221
WAIT #3: nam='db file sequential read' ela= 3 file#=1 block#=5505 blocks=1 obj#=37 tim=1425481940449247

*** 2015-03-04 23:12:20.450
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00604: error occurred at recursive SQL level 2
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10_3550978943$" too small
----- Current SQL Statement for this session (sql_id=g64r07v2jn8nq) -----
SELECT NULL FROM PROPS$ WHERE NAME='BOOTSTRAP_UPGRADE_ERROR'

这里可以发现是数据库在启动的过程中需要执行SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,而该语句递归调用了select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, spare2 from obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null and linkname is null and subname is null 语句。既然这样通过一些方法避免数据库启动之时查询SELECT NULL FROM PROPS$ WHERE NAME=’BOOTSTRAP_UPGRADE_ERROR’语句,果然数据库启动成功。

知识点补充
ORA-600 [4502] [a]

Arg [a] ITL entry with a lock count
Meaning: During ITL cleanout we clear all row locks but the ITL entry
	 still thinks there is an uncleared lock. Ie: ITL has a locked
	 row but there are no locked rows in the block 

大体意思是数据库发现undo 的itl已经被清除,但是block中的itl依然存在,从而出现ORA-600[4502],引起该问题除bug外主要原因是坏块

ORA-600 [2037] [a] [b] [d] [e] [f] [g]

Arg [a] Relative Data Block Address (RDBA) that the redo vector is for
Arg [b] The Block format  
Arg 1 RDBA in the block itself
Arg [d] The block type
Arg [e] The sequence number
Arg [f] Flags, if set  
Arg [g] The return value from the block head/tail checker.
DESCRIPTION:
  During recovery we are examining a block to ensure that it is not
  corrupt prior to applying any change vectors.
  The block has failed this check and this exception is raised

大体意思是在恢复过程中,正在检查的块,以确保它在应用任何变化向量之前不损坏。如果检查失败排除该异常ORA-600[2037],引起该问题除bug外主要原因是坏块

ORA-600 [kcbzpbuf_1],[a],[b]

Arg [a] Corruption reason
Arg [b] Calculate checksum flag

Corruption reason:
#define KCBH_GOOD    0                                     /* block is valid */
#define KCBH_ZERO    1             /* block header was entirely zero on disk */
#define KCBH_BROKEN  2      /* corruption could be from a partial disk write */
#define KCBH_CHKVAL  3               /* The check value for the block failed */
#define KCBH_CORRUPT 4     /* this is the wrong block or is not a data block */
#define KCBH_ZERONG  5               /* all zero block and it is not allowed */

Calculate checksum flag:
The possible values are 1 (Generate Checksum - db_block_checksum is enabled - default value) 
                        0 (do not generate checksum - db_block_checksum=false)

kcbzpbuf_1是该错误的源码函数

ORA-600 [17182] [a] [b] [d] [e]

DESCRIPTION:        
  Oracle has detected that the magic number in a memory chunk header has been overwritten.
  This is a heap (in memory) corruption and there is no underlying data corruption.
  The error may occur in the one of the process specific heaps 
  (the Call heap, PGA heap, or session heap) or in the shared heap (SGA).
1
ORACLE 发现在内存中重要的块头被重新,但是没有基础数据损坏,大部分和数据块或者内存损坏有关系.
<br>
ORA-600 [4552] [a] [b] 1 [d] [e]
1
DESCRIPTION:
  This assertion is raised because we are trying to unlock the rows in a 
  block, but receive an incorrect block type.
  The second argument is the block type received.

ORACLE尝试对某行进行解锁但是接收到了不正确的数据块类型,Arg [b]是接收到的数据块类型

ORA-600 [2662] [a] [b] [d] [e]

DESCRIPTION:

  A data block SCN is ahead of the current SCN.

  The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN 
  stored in a UGA variable.

  If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
  internal error.

ARGUMENTS:
  Arg [a]  Current SCN WRAP
  Arg [b]  Current SCN BASE
  Arg 1  dependent SCN WRAP
  Arg [d]  dependent SCN BASE 
  Arg [e]  Where present this is the DBA where the dependent SCN came from.

主要的含义就是oracle文件头scn比某个block dependent scn小从而出现该问题

ora-01092:ORACLE 实例终止 —access$表丢失恢复

有网友咨询数据库启动报ora-01092:ORACLE 实例终止。强制断开连接,请求帮忙处理
数据库版本

Trace file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_5648.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1
CPU                 : 1 - type 8664, 1 Physical Cores
Process Affinity    : 0x0000000000000000
Memory (Avail/Total): Ph:7605M/10239M, Ph+PgF:11979M/20477M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 18
Windows thread id: 5648, image: ORACLE.EXE (SHAD)

open数据库报ORA-01092: ORACLE 实例终止。强制断开连接

SQL> alter database open;
alter database open
*
第 1 行出现错误:
ORA-01092: ORACLE 实例终止。强制断开连接

alert日志

Thread 1 opened at log sequence 1008
  Current log# 3 seq# 1008 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO03.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_3964.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在
Error 704 happened during db open, shutting down database
USER (ospid: 3964): terminating the instance due to error 704
Instance terminated by USER, pid = 3964
ORA-1092 signalled during: ALTER DATABASE OPEN...
ORA-1092 : opiodr aborting process unknown ospid (3384_3964)

做10046分析日志

PARSE ERROR #1:len=56 dep=1 uid=0 oct=3 lid=0 tim=1796038335 err=942
select order#,columns,types from access$ where d_obj#=:1

*** 2015-01-27 21:24:50.794
----- Error Stack Dump -----
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-00942: 表或视图不存在

通过这里可以知道数据库在启动的过程中由于无法访问access$表从而出现ORA-00942错误,又是由于该sql是数据库内部调用因为出现ORA-00604错误.
出现该错误的原因是由于:BUG:12733463 – ORA-704, ORA-604 AND ORA-942 ON TABLE ACCESS$ DURING STARTUP
官方提供方法

1. Shutdown (abort) the instance and clean up any OS structures used by the instance. 
    Eg: Ensure there is no shared memory, semaphores etc.. left lying around

2. Retry the startup.

3. If the error persists try and recover the database or recover from a backup.

惜分飞处理方法

startup  upgrade

 create table access$
 ( d_obj#        number not null,
   order#        number not null,
   columns       raw(126),
   types         number not null)
   storage (initial 10k next 100k maxextents unlimited pctincrease 0)
/

create index i_access1 on
  access$(d_obj#, order#)
  storage (initial 10k next 100k maxextents unlimited pctincrease 0)
/

asm disk误设置pvid导致asm diskgroup无法mount恢复

转载:asm disk误设置pvid导致asm diskgroup无法mount恢复
有朋友找到我说他们把以前存储到AIX直连的存储切换为含光纤交换机的存储网络后,RAC无法启动,让我给予支持.通过分析是由于换盘之后开始磁盘顺序不对,维护人员对其asm disk 设置了pvid,导致asm 磁盘组无法正常mount,从而使得含votedisk的dg的asm disk无法正常访问,从而RAC的cssd进程无法启动,同样数据文件的磁盘组也无法mount,通过kfed修复成功,实现数据0丢失.
平台版本信息(2节点RAC)

$ sqlplus -v

SQL*Plus: Release 11.2.0.4.0 Production

$ uname -a
AIX db2 1 7 00F9733E4C00

GI日志报错信息

2014-12-20 16:44:08.769: 
[ohasd(6946818)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2014-12-20 16:44:11.775: 
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; 
Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:26.791: 
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; 
、Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-20 16:44:41.812: 
[cssd(9502756)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; 
Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

从这里可以看出来是由于RAC启动过程中无法获得votedisk使得其无法正常启动,通过分析日志找出来votedisk相关磁盘

2014-12-15 17:36:15.424: 
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk4; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.433: 
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk5; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log
2014-12-15 17:36:15.445: 
[cssd(10027070)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0/grid/log/db1/cssd/ocssd.log

从这里可以知道rhdisk4,5,6为votedisk对应磁盘,使用kfed查看磁盘头信息

$kfed read /dev/rhdisk4
kfbh.endian:                        201 ; 0x000: 0xc9
kfbh.hard:                          194 ; 0x001: 0xc2
kfbh.type:                          212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                        193 ; 0x003: 0xc1
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
1102BEE00 C9C2D4C1 00000000 00000000 00000000  [................]
1102BEE10 00000000 00000000 00000000 00000000  [................]
        Repeat 6 times
1102BEE80 00F9733D 67553E0A 00000000 00000000  [..s=gU>.........]
1102BEE90 00000000 00000000 00000000 00000000  [................]
  Repeat 246 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][212]

$kfed read /dev/rhdisk4 blkn=1
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3883664132 ; 0x00c: 0xe77c0304
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdfsb.aunum:                         0 ; 0x000: 0x00000000
kfdfsb.max:                         254 ; 0x004: 0x00fe
kfdfsb.cnt:                          23 ; 0x006: 0x0017
kfdfsb.bound:                         0 ; 0x008: 0x0000
kfdfsb.flag:                          1 ; 0x00a: B=1
kfdfsb.ub1spare:                      0 ; 0x00b: 0x00
kfdfsb.spare[0]:                      0 ; 0x00c: 0x00000000
kfdfsb.spare[1]:                      0 ; 0x010: 0x00000000
kfdfsb.spare[2]:                      0 ; 0x014: 0x00000000
kfdfse[0].fse:                      119 ; 0x018: FREE=0x7 FRAG=0x7
kfdfse[1].fse:                       16 ; 0x019: FREE=0x0 FRAG=0x1
…………

$kfed read /dev/rhdisk4 blkn=510
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     254 ; 0x004: blk=254
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3460116983 ; 0x00c: 0xce3d31f7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        2 ; 0x026: KFDGTP_NORMAL
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:                CRS_0000 ; 0x028: length=8
kfdhdb.grpname:                     CRS ; 0x048: length=3
kfdhdb.fgname:                 CRS_0000 ; 0x068: length=8
…………

由上述分析可以基本上确定是asm disk header 被破坏,进一步分析破坏原因

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active      
hdisk1          00f9733e21b953e6                    rootvg          active      
hdisk2          00f9733e21b97a83                    appvg           active      
hdisk3          00f9733e21b98434                    appvg           active      
hdisk4          00f9733d67553e0a                    None                        
hdisk5          00f9733d67553f31                    None                        
hdisk6          00f9733d67554011                    None                        
hdisk7          00f9733d67554165                    None                        
hdisk8          00f9733d675541e5                    None                        
hdisk9          00f9733d675542e4                    None                        
hdisk10         none                                None   

[db2/dev#]ls -l rhdisk*
crw-------    2 root     system       24,  1 Oct 18 11:45 rhdisk0
crw-------    1 root     system       24,  3 Oct 18 13:27 rhdisk1
crw-------    1 root     system       24,  5 Dec 20 20:02 rhdisk10
crw-------    1 root     system       24,  2 Oct 18 13:32 rhdisk2
crw-------    1 root     system       24,  0 Oct 18 13:32 rhdisk3
crw-rw----    1 grid     asmadmin     24,  8 Dec 20 20:02 rhdisk4
crw-rw----    1 grid     asmadmin     24,  9 Dec 20 20:02 rhdisk5
crw-rw----    1 grid     asmadmin     24, 10 Dec 20 20:02 rhdisk6
crw-rw----    1 grid     asmadmin     24,  4 Dec 20 20:02 rhdisk7
crw-rw----    1 grid     asmadmin     24,  6 Dec 20 20:02 rhdisk8
crw-rw----    1 grid     asmadmin     24,  7 Dec 20 20:02 rhdisk9

从这里基本上可以看出来,是由于磁盘头被重写了pvid,导致asm disk header 被破坏.进一步分析asm log,确定哪些磁盘被用作asm disk

SQL> CREATE DISKGROUP CRS NORMAL REDUNDANCY  DISK '/dev/rhdisk4',
'/dev/rhdisk5',
'/dev/rhdisk6' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ 
NOTE: Assigning number (1,0) to disk (/dev/rhdisk4)
NOTE: Assigning number (1,1) to disk (/dev/rhdisk5)
NOTE: Assigning number (1,2) to disk (/dev/rhdisk6)
NOTE: initializing header on grp 1 disk CRS_0000
NOTE: initializing header on grp 1 disk CRS_0001
NOTE: initializing header on grp 1 disk CRS_0002

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY  DISK 
'/dev/rhdisk9' SIZE 614400M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ 
NOTE: Assigning number (2,0) to disk (/dev/rhdisk9)
NOTE: initializing header on grp 2 disk DATA_0000


SQL> CREATE DISKGROUP FBA EXTERNAL REDUNDANCY  DISK 
'/dev/rhdisk8' SIZE 204800M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ 
NOTE: Assigning number (3,0) to disk (/dev/rhdisk8)
NOTE: initializing header on grp 3 disk FBA_0000


SQL> CREATE DISKGROUP ARCH EXTERNAL REDUNDANCY  DISK 
'/dev/rhdisk7' SIZE 102400M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */ 
NOTE: Assigning number (4,0) to disk (/dev/rhdisk7)
NOTE: initializing header on grp 4 disk ARCH_0000

这里可以确定asm disk为rhdisk[4-9],通过kfed分析全部和rhdisk4一样的问题,也符合lspv查询出来的结果,使用kfed repair修复asm disk header后

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> alter diskgroup fba mount;

Diskgroup altered.

SQL> alter diskgroup arch mount;

Diskgroup altered.

SQL> alter diskgroup crs mount;

Diskgroup altered.

SQL> select group_number,disk_number,path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH
------------ ----------- --------------------------------------------------
           2           0 /dev/rhdisk4
           2           1 /dev/rhdisk5
           2           2 /dev/rhdisk6
           1           0 /dev/rhdisk7
           4           0 /dev/rhdisk8
           3           0 /dev/rhdisk9

6 rows selected.

SQL> select group_number,name from v$asm_diskgroup;

GROUP_NUMBER NAME
------------ ------------------------------------------------------------
           1 ARCH
           2 CRS
           3 DATA
           4 FBA

这里证明通过kfed对磁盘头的修复,asm磁盘组已经全部mount成功,GI状态也恢复正常

[db2/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.CRS.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.FBA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.asm
               ONLINE  ONLINE       db1                      Started             
               ONLINE  ONLINE       db2                      Started             
ora.gsd
               OFFLINE OFFLINE      db1                                          
               OFFLINE OFFLINE      db2                                          
ora.net1.network
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.ons
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.registry.acfs
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.cvu
      1        ONLINE  ONLINE       db1                                          
ora.db1.vip
      1        ONLINE  ONLINE       db1                                          
ora.db2.vip
      1        ONLINE  ONLINE       db2                                          
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.oc4j
      1        ONLINE  ONLINE       db1                                          
ora.scan1.vip
      1        ONLINE  ONLINE       db1                                          

这里忽略了一个问题,在修复磁盘头之前没有清除pvid,导致磁盘头修复后,pvid依然存储在odm中

[db2/dev#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active      
hdisk1          00f9733e21b953e6                    rootvg          active      
hdisk2          00f9733e21b97a83                    appvg           active      
hdisk3          00f9733e21b98434                    appvg           active      
hdisk4          00f9733d67553e0a                    None                        
hdisk5          00f9733d67553f31                    None                        
hdisk6          00f9733d67554011                    None                        
hdisk7          00f9733d67554165                    None                        
hdisk8          00f9733d675541e5                    None                        
hdisk9          00f9733d675542e4                    None                        
hdisk10         none                                None   

通过分析发现fba磁盘组中无任何记录,使用该磁盘组进行直接清除pvid测试

$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:13:31 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fba dismount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$ exit
You have mail in /usr/spool/mail/root
[db2/#]chdev -l hdisk8 -a pv=clear
hdisk8 changed
[db2/#]lspv
hdisk0          00f9733ef7cf27e9                    rootvg          active      
hdisk1          00f9733e21b953e6                    rootvg          active      
hdisk2          00f9733e21b97a83                    appvg           active      
hdisk3          00f9733e21b98434                    appvg           active      
hdisk4          00f9733d67553e0a                    None                        
hdisk5          00f9733d67553f31                    None                        
hdisk6          00f9733d67554011                    None                        
hdisk7          00f9733d67554165                    None                        
hdisk8          none                                None                        
hdisk9          00f9733d675542e4                    None                        
hdisk10         none                                None                        
[db2/#]su - grid
$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Dec 21 03:15:19 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup fba mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

通过测试直接清除pvid asm 磁盘头依然工作正常,关闭GI,使用chdev清除hdisk[4-9]所有pvid,启动GI一切正常

[db1/#]crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.CRS.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.FBA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.asm
               ONLINE  ONLINE       db1                      Started             
               ONLINE  ONLINE       db2                      Started             
ora.gsd
               OFFLINE OFFLINE      db1                                          
               OFFLINE OFFLINE      db2                                          
ora.net1.network
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.ons
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.registry.acfs
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.cvu
      1        ONLINE  ONLINE       db1                                          
ora.db1.vip
      1        ONLINE  ONLINE       db1                                          
ora.db2.vip
      1        ONLINE  ONLINE       db2                                          
ora.nkora.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.oc4j
      1        ONLINE  ONLINE       db1                                          
ora.scan1.vip
      1        ONLINE  ONLINE       db1                                          
[db1/#]lspv
hdisk0          00f9733df7c7a9db                    rootvg          active      
hdisk1          00f9733d21dad8fe                    rootvg          active      
hdisk2          00f9733d21dbd08b                    appvg           active      
hdisk3          00f9733d21dbd2ab                    appvg           active      
hdisk4          none                                None                        
hdisk5          none                                None                        
hdisk6          none                                None                        
hdisk7          none                                None                        
hdisk8          none                                None                        
hdisk9          none                                None                        
hdisk10         none                                None 

至此设置pvid导致asm disk header损坏的asm 恢复正常,实现数据0丢失。
温馨提示:aix asm disk磁盘中不能设置pvid,否则将会导致asm disk header 损坏,无法正常mount