Oracle database recovery after a system failure

Contact: phone/WeChat (+86 13429648788), QQ (107644445)

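Author: 惜分飞 © All rights reserved. [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]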

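A system failure left the operating system unbootable. The customer got into the system by other means, copied out the data files, redo logs, control files and so on, installed a database of the same version, adjusted the relevant paths, and tried to start the database. Startup failed, and they asked us for technical support. Opening the database raised ORA-600 [2662]: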

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [2], [2313731576], [2],
[2313735660], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [2], [2313731575], [2],
[2313735660], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [2], [2313731573], [2],
[2313735660], [12583040], [], [], [], [], [], []
Process ID: 22446
Session ID: 577 Serial number: 1

Errors in the alert log:

Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22446.trc:
ORA-00600: internal error code, arguments:[2662],[2],[2313731573],[2],[2313735660],[12583040],[],[],[],[],[],[]
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_22446.trc:
ORA-00600: internal error code, arguments:[2662],[2],[2313731573],[2],[2313735660],[12583040],[],[],[],[],[],[]
Error 600 happened during db open, shutting down database
USER (ospid: 22446): terminating the instance due to error 600

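This error is fairly common, especially after _allow_resetlogs_corruption has been used to bypass consistency checks and force the database open. The fix is straightforward: adjust the database SCN, after which the database opens successfully. See the related cases tagged _allow_resetlogs_corruption.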

[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Jun 20 08:31:55 2020

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile';
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size                  2235208 bytes
Variable Size            2902459576 bytes
Database Buffers         1325400064 bytes
Redo Buffers               45686784 bytes
Database mounted.
SQL> alter database open;

Database altered.
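
The ORA-600 [2662] arguments shown earlier can be read as a pair of SCNs. The sketch below assumes the commonly documented layout (current SCN wrap, current SCN base, dependent SCN wrap, dependent SCN base, then the DBA of the block carrying the dependent SCN); the function names are illustrative only. It shows how far the database SCN lags behind the block SCN, which is the gap the SCN adjustment has to close.

```python
# Sketch: decode ORA-600 [2662] arguments, assuming the commonly documented
# layout [cur_wrap], [cur_base], [dep_wrap], [dep_base], [dba].

def full_scn(wrap: int, base: int) -> int:
    """Combine SCN wrap and base into a single number."""
    return (wrap << 32) | base

def scn_gap(args):
    """Return how far the dependent (block) SCN is ahead of the current SCN."""
    cur_wrap, cur_base, dep_wrap, dep_base = args[:4]
    return full_scn(dep_wrap, dep_base) - full_scn(cur_wrap, cur_base)

# Arguments from the first ORA-600 [2662] in the open resetlogs output.
args = [2, 2313731576, 2, 2313735660, 12583040]
gap = scn_gap(args)
print(gap)  # 4084
```

A positive gap means a block carries an SCN ahead of the database SCN, so the database SCN has to be moved past the dependent SCN before the open can succeed.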

Attempting to export the data:

[oracle@localhost ~]$ tail -f nohup.out 
. exporting foreign function library names for user XIFENFEI 
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user XIFENFEI 
About to export LIOVBJ2017's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export LIOVBJ2017's tables via Conventional Path ...
. . exporting table                           ABCD
                                                            1 rows exported
. . exporting table               TB_D_RECORD
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table                      TB_DRIVER
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table               TB_XFF
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table              TB_XFF_TM_REL
EXP-00008: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 1, block # 290344)
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
. . exporting table                    TB_LOCATION

The export fails because of a corrupt block at file # 1, block # 290344. Checking the data file with dbv:

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/system01.dbf

DBVERIFY: Release 11.2.0.3.0 - Production on Sat Jun 20 08:43:49 2020

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
Page 290344 is influx - most likely media corrupt
Corrupt block relative dba: 0x00446e28 (file 1, block 290344)
Fractured block found during dbv: 
Data in bad block:
 type: 6 format: 2 rdba: 0x00446e28
 last change scn: 0x0002.89e2b718 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x61980601
 check value in block header: 0xe118
 computed block checksum: 0xd680



DBVERIFY - Verification complete

Total Pages Examined         : 298240
Total Pages Processed (Data) : 257035
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 13457
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3598
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 24149
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 1
Total Pages Encrypted        : 0
Highest block SCN            : 3221247831 (2.3221247831)

dbv confirms there is only one corrupt block. Attempting to repair it with bbed:

BBED> set blocksize 8192
        BLOCKSIZE       8192

BBED> set block 290344
        BLOCK#          290344

BBED> map
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344                                Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Index Leaf)

 struct kcbh, 20 bytes                      @0       

 struct ktbbh, 72 bytes                     @20      

 struct kdxle, 32 bytes                     @92      

 sb2 kd_off[231]                            @124     

 ub1 freespace[3026]                        @586     

 ub1 rowdata[4508]                          @3612    

 ub4 tailchk                                @8188    


BBED> verify
DBVERIFY - Verification starting
FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
BLOCK = 290344

Block 290344 is corrupt
Corrupt block relative dba: 0x00446e28 (file 0, block 290344)
Fractured block found during verification
Data in bad block:
 type: 6 format: 2 rdba: 0x00446e28
 last change scn: 0x0002.89e2b718 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x61980601
 check value in block header: 0xe118
 computed block checksum: 0xd680


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED


BBED> set mode edit
        MODE            Edit

BBED> p kcbh
struct kcbh, 20 bytes                       @0       
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x00446e28
   ub4 bas_kcbh                             @8        0x89e2b718
   ub2 wrp_kcbh                             @12       0x0002
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0xe118
   ub2 spare3_kcbh                          @18       0x0000

BBED> p tailchk
ub4 tailchk                                 @8188     0x61980601

BBED> d /v
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344  Offsets: 8188 to 8191  Dba:0x00000000
-------------------------------------------------------
 01069861                            l ...a

 <16 bytes per line>

BBED> m /x 010618b7
 File: /home/oracle/app/oracle/oradata/orcl/system01.dbf (0)
 Block: 290344           Offsets: 8188 to 8191           Dba:0x00000000
------------------------------------------------------------------------
 010618b7 

 <32 bytes per line>

BBED> sum apply
Check value for File 0, Block 290344:
current = 0xe118, required = 0xe118

BBED> verify 
DBVERIFY - Verification starting
FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
BLOCK = 290344


DBVERIFY - Verification complete

Total Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 1
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED
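
The four bytes written with m /x above follow from the block header. As a sketch, assuming the usual layout of the tail check (low two bytes of the SCN base, then the block type, then the sequence byte), the expected value and its little-endian on-disk bytes can be derived as follows; the helper name is made up for illustration.

```python
import struct

def expected_tailchk(bas_kcbh: int, type_kcbh: int, seq_kcbh: int) -> int:
    """Tail check = low 16 bits of the SCN base, block type, sequence byte."""
    return ((bas_kcbh & 0xFFFF) << 16) | (type_kcbh << 8) | seq_kcbh

# Header values printed by "p kcbh" above.
tail = expected_tailchk(0x89E2B718, 0x06, 0x01)
on_disk = struct.pack("<I", tail).hex()  # little-endian byte order on disk

print(hex(tail))           # 0xb7180601
print(on_disk)             # 010618b7, the string passed to "m /x"
print(tail == 0x61980601)  # False: the stored tail did not match the header
```

The mismatch between the stored tail (0x61980601) and the value implied by the header is what dbv reports as a fractured block.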

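Continuing with the export, we hit ORA-08103. Related articles:
Simulating and resolving an ordinary ORA-08103
Simulating and resolving an extreme ORA-08103
Recovering from an ORA-08103 failure at database startup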

EXP-00056: ORACLE error 8103 encountered
ORA-08103: object no longer exists

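After handling this, all records except the affected one were recovered. The customer created a new database and imported the data, which essentially completed the recovery.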

Recovering an ASM disk damaged by dd

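A customer reported that, through an operator error, dd had been run against the ASM disks of one disk group, causing partial data loss. The customer's data files were stored in two disk groups, and one of them was overwritten with dd. It was mistakenly believed to hold only archived logs, but because the first disk group had run out of space, some data files had been placed in it.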

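Analysis of the ASM log showed that the disks hit by dd made up an entire disk group. We had recovered similar dd-damaged ASM disks before (see: Recovering with zero data loss when all ASM disk headers were damaged). In that case dd had destroyed relatively little, so the data could be recovered directly by repairing the disk group. This time 50M was overwritten, so recovering the data just by repairing the disk headers was essentially impossible. We scanned the disks with a tool (see: Recovering from a completely destroyed asm disk header) and analysed the scan results, but many data blocks turned out to be duplicated, so the files could not be reassembled cleanly.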

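This usually happens when the ASM disk group holds several copies of data with the same file number (for example, two databases in one disk group, so that the same data file number is stored more than once). Analysis from several angles showed that this database had no such duplicate files and that the disk group contained only one database. Further analysis of the little ASM alert log that remained (most of it had been purged) turned up entries like these: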

Sun Mar 14 05:25:40 CST 2020
NOTE: F1X0 found on disk 0 fcn 0.60289025
NOTE: cache opening disk 0 of grp 2: HIS_FLASH00 label:HIS_FLASH00
NOTE: cache opening disk 1 of grp 2: HIS_FLASH01 label:HIS_FLASH01
NOTE: cache opening disk 2 of grp 2: HIS_FLASH02 label:HIS_FLASH02
NOTE: cache opening disk 3 of grp 2: HIS_DATA03 label:HIS_DATA03
NOTE: cache mounting (first) group 2/0xCCD84BCB (HIS_FLASH)
* allocate domain 2, invalid = TRUE 
kjbdomatt send to node 0
Sun Mar 14 05:25:40 CST 2020
NOTE: attached to recovery domain 2
Sun Mar 14 05:25:40 CST 2020
NOTE: starting recovery of thread=1 ckpt=405.816 group=2
NOTE: advancing ckpt for thread=1 ckpt=405.819
NOTE: cache recovered group 2 to fcn 0.65493064
Sun Mar 14 05:25:40 CST 2020
NOTE: LGWR attempting to mount thread 1 for disk group 2
NOTE: LGWR mounted thread 1 for disk group 2
NOTE: opening chunk 1 at fcn 0.65493064 ABA 
NOTE: seq=406 blk=820 
Sun Mar 14 05:25:40 CST 2020
NOTE: cache mounting group 2/0xCCD84BCB (HIS_FLASH) succeeded
SUCCESS: diskgroup HIS_FLASH was mounted
Sun Mar 14 05:25:42 CST 2020
NOTE: recovering COD for group 2/0xccd84bcb (HIS_FLASH)
SUCCESS: completed COD recovery for group 2/0xccd84bcb (HIS_FLASH)
Sun Mar 14 05:25:47 CST 2020
Starting background process ASMB
ASMB started with pid=17, OS id=14599

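This points to the likely cause: the disk group was expanded in March, which triggered a rebalance of the data files and left duplicate copies of some blocks behind. In such cases the problem can be avoided entirely by combining the scan results with the ASM dictionary information. The data files were recovered this way and checked with dbv, and everything was normal.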
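
One way to picture the dictionary-based deduplication: when a rebalance leaves stale copies of a block behind, only the copy that sits in the AU named by the ASM extent map is current. The sketch below uses made-up data structures (a list of scanned block locations and an extent map keyed by file and extent number). It is not the tool used in this case, and the 1 MiB AU size, 8 KiB block size and all sample values are assumptions.

```python
# Sketch: choose the authoritative copy of each block using an ASM extent map.
# All names, sizes and sample values here are hypothetical.

AU_SIZE = 1024 * 1024      # assumed allocation unit size
BLOCK_SIZE = 8192          # assumed database block size
BLOCKS_PER_AU = AU_SIZE // BLOCK_SIZE

def pick_current_copies(scanned, extent_map):
    """scanned: iterable of (file_no, block_no, disk, au) found by a disk scan.
    extent_map: {(file_no, extent_no): (disk, au)} taken from ASM metadata.
    Returns {(file_no, block_no): (disk, au)}, keeping only mapped copies."""
    current = {}
    for file_no, block_no, disk, au in scanned:
        extent_no = block_no // BLOCKS_PER_AU
        if extent_map.get((file_no, extent_no)) == (disk, au):
            current[(file_no, block_no)] = (disk, au)
    return current

# Block 5 of file 300 was found twice: once in a stale AU on disk 0 and once
# in the AU the extent map points to on disk 3.
scanned = [(300, 5, 0, 1200), (300, 5, 3, 80), (300, 130, 1, 77)]
extent_map = {(300, 0): (3, 80), (300, 1): (1, 77)}
result = pick_current_copies(scanned, extent_map)
print(result)
```

The stale copy on disk 0 is dropped because no extent map entry points at it, which is the effect the dictionary information provides over a raw scan.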

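All files were handled the same way. Combined with the data files from the intact disk group, the database was opened directly, giving a complete recovery.
If you run into a situation like this and cannot resolve it, please contact us for professional ORACLE database recovery support.
Phone:13429648788    Q Q:107644445    E-Mail:dba@xifenfei.com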
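The complete recovery here was a matter of luck. With 50M overwritten by dd on each ASM disk, a four-disk disk group would normally be likely to lose part of some data files. This disk group originally stored archived logs, so the data files were written relatively far into the disks and were not touched by dd.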
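
The point about write position can be made concrete. Assuming the dd started at offset 0 of each disk and a 1 MiB allocation unit (neither is stated above), any extent living in the first 50 AUs is lost and everything beyond that survives:

```python
# Sketch: which allocation units does a dd overwrite destroy?
# The start offset (0) and the 1 MiB AU size are assumptions.

MIB = 1024 * 1024

def overwritten_aus(dd_bytes: int, au_size: int = MIB, start: int = 0) -> range:
    """Return the range of AU numbers touched by an overwrite."""
    first = start // au_size
    last = (start + dd_bytes - 1) // au_size
    return range(first, last + 1)

damaged = overwritten_aus(50 * MIB)
print(damaged)        # range(0, 50)
print(49 in damaged)  # True
print(50 in damaged)  # False
```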

ORA-07445 lmebucp error at startup of a 12.2 database

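A customer contacted us saying that the error they hit at database startup resembled the one described in "Database cannot open with ORA-7445 lmebucp", and asked us for recovery support. Analysis confirmed the database version as 12.2.0.1: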

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Productio
PL/SQL Release 12.2.0.1.0 - Production	
CORE 12.2.0.1.0 Production	
TNS for Linux: Version 12.2.0.1.0 - Production	
NLSRTL Version 12.2.0.1.0 - Production	

Errors in the alert log:

-- Initial errors
2020-05-11T03:43:06.787164+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639048):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639048/orcl_m004_3253_i639048.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T03:43:14.250993+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639049):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639049/orcl_m004_3253_i639049.trc
2020-05-11T03:43:21.286310+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc  (incident=639050):
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 line 2
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_639050/orcl_m004_3253_i639050.trc
2020-05-11T03:43:28.059048+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T03:43:28.074681+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m004_3253.trc:
ORA-00600: 内部错误代码, 参数: [kkpo_rcinfo_defstg:delseg], [72280], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 line 2
2020-05-11T08:31:22.416087+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_16511.trc:
ORA-12012: 自动执行作业 141 出错
ORA-30732: 表中不包含用户可见的列
ORA-06512: 在 line 1


--- Errors on restart after shutting down the database
2020-05-11T09:42:43.234769+08:00
ALTER DATABASE OPEN
2020-05-11T09:42:44.353085+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2020-05-11T09:42:44.660388+08:00
TT00: Gap Manager starting (PID:31134)
2020-05-11T09:42:45.180876+08:00
Thread 1 opened at log sequence 78596
  Current log# 2 seq# 78596 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
Successful open of redo thread 1
2020-05-11T09:42:45.181357+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x0][PC:0x10F4C112, lmebucp()+34][flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_31125.trc  (incident=646445):
ORA-07445: exception encountered:core dump[lmebucp()+34][SIGSEGV][ADDR:0x0][PC:0x10F4C112][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_646445/orcl_ora_31125_i646445.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-05-11T09:42:46.070049+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2020-05-11T09:42:53.344955+08:00
Instance Critical Process (pid: 41, ospid: 31125) died unexpectedly
PMON (ospid: 31026): terminating the instance due to error 12752
2020-05-11T09:42:53.377209+08:00
System state dump requested by (instance=1, osid=31026 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_31046_20200511094253.trc
2020-05-11T09:42:56.690224+08:00
Instance terminated by PMON, pid = 31026

10046 trace of the startup:

PARSING IN CURSOR #139821999713040 len=65 dep=1 uid=0 oct=3 lid=0 
tim=2200910467 hv=1762642493 ad='1bfd79130'sqlid='aps3qh1nhzkjx'
select line#, sql_text from bootstrap$ where obj# not in (:1, :2)
END OF STMT
PARSE #139821999713040:c=378,e=378,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=2200910467
BINDS #139821999713040:

 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2ad89fe6c8  bln=22  avl=02  flg=05
  value=59
 Bind#1
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=08 fl2=1000001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f2ad89fe698  bln=24  avl=06  flg=05
  value=4294967295
EXEC #139821999713040:c=219,e=658,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=867914364,tim=2200911218
WAIT #139821999713040: nam='db file sequential read' ela= 9 file#=1 block#=520 blocks=1 obj#=59 tim=2200911300
WAIT #139821999713040: nam='db file scattered read' ela= 19 file#=1 block#=521 blocks=3 obj#=59 tim=2200911559
FETCH #139821999713040:c=404,e=404,p=4,cr=5,cu=0,mis=0,r=0,dep=1,og=4,plh=867914364,tim=2200911646
STAT #139821999713040 id=1 cnt=0 pid=0 pos=1 obj=59 op='TABLE ACCESS FULL BOOTSTRAP$ (cr=5 pr=4 pw=0 str=1 time=406 us)'

*** 2020-05-11T12:25:30.977832+08:00
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x10F4C112, lmebucp()+34] [flags: 0x0, count: 1]
2020-05-11T12:25:31.026914+08:00
Incident 718445 created, dump file:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_718445/orcl_ora_14324_i718445.trc
ORA-07445: exception encountered:core dump [lmebucp()+34][SIGSEGV][ADDR:0x0][PC:0x10F4C112][Address not mapped to object]

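The full scan of bootstrap$ in the trace returns no rows (r=0 on the FETCH, cnt=0 in the STAT line) immediately before the crash. From this we can roughly conclude that the ORA-07445 lmebucp()+34 failure at restart is caused by an abnormal bootstrap$ table.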

BBED> p kcvfhrdb
ub4 kcvfhrdb                                @96       0x00400208

BBED> set block 523
        BLOCK#          523

BBED> map
 File: F:\temp\12.2\1\system01.dbf (0)
 Block: 523                                   Dba:0x00000000
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0

 struct ktbbh, 48 bytes                     @20

 struct kdbh, 14 bytes                      @68

 struct kdbt[1], 4 bytes                    @82

 sb2 kdbr[20]                               @86

 ub1 freespace[1015]                        @126

 ub1 rowdata[7047]                          @1141

 ub4 tailchk                                @8188

BBED> p *kdbr[1]
rowdata[6431]
-------------
ub1 rowdata[6431]                           @7572     0x3c

BBED> x /rnnc
rowdata[6431]                               @7572
-------------
flag@7572: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@7573: 0x01
cols@7574:    0
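
The root dba printed by bbed ties back to the 10046 trace. A data block address packs the relative file number into the top 10 bits and the block number into the low 22 bits, so 0x00400208 decodes as follows (the function name is illustrative):

```python
def decode_dba(dba: int):
    """Split a 32-bit data block address into (relative file#, block#)."""
    return dba >> 22, dba & 0x3FFFFF

file_no, block_no = decode_dba(0x00400208)
print(file_no, block_no)  # 1 520
```

That is file 1, block 520, the block read first by the bootstrap$ query in the trace (file#=1 block#=520); the scattered read that follows covers blocks 521 to 523.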

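bbed was used to locate the root dba and dump the relevant blocks. Picking a record at random confirmed that bootstrap$ contains no valid records. The database had also reported ORA-600 kkpo_rcinfo_defstg:delseg and ORA-30732 before the restart, so other base tables were probably damaged as well. We first repaired the bootstrap$ records, then used them to analyse the other related tables, and eventually established that the records in tab$ were damaged too. These were repaired with a batch loop in bbed, after which the database opened and the data was verified to be intact. The problem is solved, but the root cause was not found. It could be deliberate damage (direct manual deletion), a malicious script introduced by some tool, or database installation media carrying a malicious program (for both of the latter, compare: Analysis of and solution for database ransomware attacks introduced through plsql dev).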
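
The row flag 0x3c in the bbed output is what marks the sampled bootstrap$ row as unusable. A minimal sketch of the decoding, assuming the commonly described bit assignments for the row header flag byte (the table and function names are illustrative):

```python
# Row header flag bits, using the names that bbed prints.
ROW_FLAGS = {
    0x80: "KDRHFK",  # cluster key
    0x40: "KDRHFC",  # clustered table member
    0x20: "KDRHFH",  # head piece of the row
    0x10: "KDRHFD",  # row deleted
    0x08: "KDRHFF",  # first data piece
    0x04: "KDRHFL",  # last data piece
    0x02: "KDRHFP",  # first column continues from the previous piece
    0x01: "KDRHFN",  # last column continues in the next piece
}

def decode_row_flag(flag: int):
    """Return the names of all bits set in a row flag byte."""
    return [name for bit, name in ROW_FLAGS.items() if flag & bit]

def is_deleted(flag: int) -> bool:
    return bool(flag & 0x10)

print(decode_row_flag(0x3C))  # ['KDRHFH', 'KDRHFD', 'KDRHFF', 'KDRHFL']
print(is_deleted(0x3C))       # True
print(is_deleted(0x2C))       # False
```

Under these assumptions an ordinary single-piece row carries 0x2c, so the extra KDRHFD bit together with cols = 0 is consistent with the rows having been deleted.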

Recovering a dropped PDB in Oracle ASM

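I recently studied a new way to recover lost data files on ASM disks, using the COD (Continuing Operations Directory) and ACD (Active Change Directory). Changes to files in an ASM disk group (creation, deletion, extension, shrinking and so on) leave traces in the COD and ACD, much as data changes in an Oracle database are reflected in redo and undo. Analysing them reveals how a file was allocated within the disk group, which makes it possible to recover the data file. Here I simulate the scenario by creating a tablespace, inserting data, and dropping the tablespace (including its files), then analyse the relevant COD and ACD entries to recover the data files.
Creating the tablespace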

SQL> create tablespace xifenfei datafile '+data' size 1G;

Tablespace created.

SQL> alter tablespace xifenfei add datafile '+data' size 128M autoextend on;

Tablespace altered.

Creating a test table and inserting data

SQL> create table t_xifenfei tablespace xifenfei as
  2  select * from dba_objects;

Table created.

SQL> insert into t_xifenfei select * from t_xifenfei;

73013 rows created.

…………

SQL> insert into t_xifenfei select * from t_xifenfei;

18691328 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(1) FROM T_XIFENFEI;

  COUNT(1)
----------
  37382656

SQL> alter system checkpoint;

System altered.

SQL> select bytes/1024/1024/1024,TABLESPACE_NAME FROM USER_SEGMENTS  where segment_name='T_XIFENFEI';

BYTES/1024/1024/1024 TABLESPACE_NAME
-------------------- ------------------------------
          5.56738281 XIFENFEI

Dropping the tablespace

SQL> drop tablespace xifenfei including contents and datafiles;

Tablespace dropped.

Alert log entries

2020-04-23T18:23:43.088997-04:00
drop tablespace xifenfei including contents and datafiles
2020-04-23T18:23:46.226654-04:00
Deleted Oracle managed file +DATA/ORA18C/DATAFILE/xifenfei.262.1035571131
Deleted Oracle managed file +DATA/ORA18C/DATAFILE/xifenfei.263.1038507123
Completed: drop tablespace xifenfei including contents and datafiles

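Here we can see that the two deleted data files have ASM file numbers 262 and 263. To recover the tablespace data, these data files have to be recovered first. Because the files have been deleted, we need to find out how their data was allocated inside ASM before they can be restored. The first step is to look for the extent map of the file.
Trying to read the extent map directly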

[root@rac18c2 ~]#  kfed read /dev/xifenfei-sdb  aus=4194304 blkn=0|grep f1b1locn
kfdhdb.f1b1locn:                     10 ; 0x0d4: 0x0000000a
[root@rac18c2 ~]#  kfed read /dev/xifenfei-sdb  aus=4194304 aun=10 blkn=262|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     262 ; 0x004: blk=262
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  4132734069 ; 0x00c: 0xf6548475
kfbh.fcn.base:                     6741 ; 0x010: 0x00001a55
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:          1035571132 ; 0x000: A=0 NUMM=0x1edcc7de
kfffdb.node.frlist.number:          264 ; 0x004: 0x00000108
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       0 ; 0x00c: 0x00000000
kfffdb.lobytes:              1073750016 ; 0x010: 0x40002000
kfffdb.xtntcnt:                       0 ; 0x014: 0x00000000
kfffdb.xtnteof:                     257 ; 0x018: 0x00000101
kfffdb.blkSize:                    8192 ; 0x01c: 0x00002000
kfffdb.flags:                         1 ; 0x020: O=1 S=0 S=0 D=0 C=0 I=0 R=0 A=0
kfffdb.fileType:                      2 ; 0x021: 0x02
…………
kfffdb.mxshad:                        0 ; 0x498: 0x0000
kfffdb.mxprnt:                        0 ; 0x49a: 0x0000
kfffdb.fmtBlks:                  131073 ; 0x49c: 0x00020001
kfffde[0].xptr.au:           4294967295 ; 0x4a0: 0xffffffff
kfffde[0].xptr.disk:              65535 ; 0x4a4: 0xffff
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0 R=0 I=0
kfffde[0].xptr.chk:                  42 ; 0x4a7: 0x2a
kfffde[1].xptr.au:           4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk:              65535 ; 0x4ac: 0xffff
kfffde[1].xptr.flags:                 0 ; 0x4ae: L=0 E=0 D=0 S=0 R=0 I=0
kfffde[1].xptr.chk:                  42 ; 0x4af: 0x2a

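kfffdb.blkSize is 8192, which suggests this entry used to be a data file. However, the au and disk fields in kfffde are all set to f, meaning the direct extent map has been cleared, let alone the indirect extent map, so this route is a dead end. A different approach: since the extent map is cleared when a file is deleted from ASM, perhaps the corresponding ACD records can lead us to the data. Analysing the ACD turned up the following records from the time of the drop. They show the direct and indirect extent entries all being cleared, so these records alone cannot be used for recovery.
Trying to recover the extent map from the ACD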
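
The numbers in the kfed output are mutually consistent, which is a useful sanity check before going further. The sketch below assumes 4 KiB ASM metadata blocks; the AU size comes from aus=4194304 in the kfed commands, and the helper names are illustrative.

```python
# Sketch: cross-check the kfed output for ASM file 262.
# Assumes 4 KiB ASM metadata blocks; the AU size comes from aus=4194304.

AU_SIZE = 4194304
ASM_BLOCK = 4096   # assumed metadata block size
DB_BLOCK = 8192    # kfffdb.blkSize

def filedir_block_offset(f1b1_au: int, blkn: int) -> int:
    """Byte offset of a file directory block that lives in the first directory AU."""
    if blkn >= AU_SIZE // ASM_BLOCK:
        raise ValueError("block lives in a later file directory extent")
    return f1b1_au * AU_SIZE + blkn * ASM_BLOCK

def extents_needed(file_bytes: int) -> int:
    """Number of AU-sized extents needed to hold the file (ceiling division)."""
    return -(-file_bytes // AU_SIZE)

lobytes = 1073750016                  # kfffdb.lobytes
print(filedir_block_offset(10, 262))  # 43016192
print(lobytes // DB_BLOCK)            # 131073, equals kfffdb.fmtBlks
print(extents_needed(lobytes))        # 257, equals kfffdb.xtnteof
```

The file is 1 GiB plus one 8 KiB header block, which is why it needs 257 extents rather than 256.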

kfracdb2.lge[2].bcd[2].kfbl.blk:    262 ; 0x1cc: blk=262
kfracdb2.lge[2].bcd[2].kfbl.obj:      1 ; 0x1d0: file=1
kfracdb2.lge[2].bcd[2].kfcn.base:  6216 ; 0x1d4: 0x00001848
kfracdb2.lge[2].bcd[2].kfcn.wrap:     0 ; 0x1d8: 0x00000000
kfracdb2.lge[2].bcd[2].oplen:        20 ; 0x1dc: 0x0014
kfracdb2.lge[2].bcd[2].blkIndex:    262 ; 0x1de: 0x0106
kfracdb2.lge[2].bcd[2].flags:        28 ; 0x1e0: F=0 N=0 F=1 L=1 V=1 A=0 C=0 R=0
kfracdb2.lge[2].bcd[2].opcode:      162 ; 0x1e2: 0x00a2
kfracdb2.lge[2].bcd[2].kfbtyp:        4 ; 0x1e4: KFBTYP_FILEDIR
kfracdb2.lge[2].bcd[2].redund:       17 ; 0x1e5: SCHE=0x1 NUMB=0x1
kfracdb2.lge[2].bcd[2].pad:       63903 ; 0x1e6: 0xf99f
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xtntcnt:0 ; 0x1e8: 0x00000000
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xtntblk:0 ; 0x1ec: 0x0000
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xnum:0 ; 0x1ee: 0x0000
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xcnt:1 ; 0x1f0: 0x0001
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.setflg:0 ; 0x1f2: 0x00
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.flags:0 ; 0x1f3: O=0 S=0 S=0 D=0 C=0 I=0 R=0 A=0
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xptr[0].au:4294967292 ; 0x1f4: 0xfffffffc
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xptr[0].disk:0 ; 0x1f8: 0x0000
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xptr[0].flags:0 ; 0x1fa: L=0 E=0 D=0 S=0 R=0 I=0
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xptr[0].chk:41 ; 0x1fb: 0x29
kfracdb2.lge[2].bcd[2].au[0]:        10 ; 0x1fc: 0x0000000a
kfracdb2.lge[2].bcd[2].disks[0]:      0 ; 0x200: 0x0000


kfracdb2.lge[20].bcd[1].kfbl.blk:2147483648 ; 0xe54: blk=0 (indirect)
kfracdb2.lge[20].bcd[1].kfbl.obj:   262 ; 0xe58: file=262
kfracdb2.lge[20].bcd[1].kfcn.base: 3280 ; 0xe5c: 0x00000cd0
kfracdb2.lge[20].bcd[1].kfcn.wrap:    0 ; 0xe60: 0x00000000
kfracdb2.lge[20].bcd[1].oplen:       16 ; 0xe64: 0x0010
kfracdb2.lge[20].bcd[1].blkIndex:     0 ; 0xe66: 0x0000
kfracdb2.lge[20].bcd[1].flags:       28 ; 0xe68: F=0 N=0 F=1 L=1 V=1 A=0 C=0 R=0
kfracdb2.lge[20].bcd[1].opcode:     163 ; 0xe6a: 0x00a3
kfracdb2.lge[20].bcd[1].kfbtyp:      12 ; 0xe6c: KFBTYP_INDIRECT
kfracdb2.lge[20].bcd[1].redund:      17 ; 0xe6d: SCHE=0x1 NUMB=0x1
kfracdb2.lge[20].bcd[1].pad:      63903 ; 0xe6e: 0xf99f
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xtntblk:0 ; 0xe70: 0x0000
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xnum:0 ; 0xe72: 0x0000
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xcnt:1 ; 0xe74: 0x0001
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.ub2spare:0 ; 0xe76: 0x0000
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xptr[0].au:4294967292 ; 0xe78: 0xfffffffc
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xptr[0].disk:0 ; 0xe7c: 0x0000
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xptr[0].flags:0 ; 0xe7e: L=0 E=0 D=0 S=0 R=0 I=0
kfracdb2.lge[20].bcd[1].KFFIX_PEXT.xptr[0].chk:41 ; 0xe7f: 0x29
kfracdb2.lge[20].bcd[1].au[0]:      296 ; 0xe80: 0x00000128
kfracdb2.lge[20].bcd[1].disks[0]:     0 ; 0xe84: 0x0000

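The ACD records written at deletion time do not yield the extents allocated to the data file either. After analysing further ACD blocks, we finally found the records made when the extents were allocated: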

kfracdb2.lge[21].bcd[0].kfbl.blk:     2 ; 0x918: blk=2
kfracdb2.lge[21].bcd[0].kfbl.obj:2147483648 ; 0x91c: disk=0
kfracdb2.lge[21].bcd[0].kfcn.base: 2820 ; 0x920: 0x00000b04
kfracdb2.lge[21].bcd[0].kfcn.wrap:    0 ; 0x924: 0x00000000
kfracdb2.lge[21].bcd[0].oplen:       28 ; 0x928: 0x001c
kfracdb2.lge[21].bcd[0].blkIndex:     2 ; 0x92a: 0x0002
kfracdb2.lge[21].bcd[0].flags:       28 ; 0x92c: F=0 N=0 F=1 L=1 V=1 A=0 C=0 R=0
kfracdb2.lge[21].bcd[0].opcode:      73 ; 0x92e: 0x0049
kfracdb2.lge[21].bcd[0].kfbtyp:       3 ; 0x930: KFBTYP_ALLOCTBL
kfracdb2.lge[21].bcd[0].redund:      18 ; 0x931: SCHE=0x1 NUMB=0x2
kfracdb2.lge[21].bcd[0].pad:      63903 ; 0x932: 0xf99f
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.curidx:2416 ; 0x934: 0x0970
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.nxtidx:8 ; 0x936: 0x0008
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.prvidx:8 ; 0x938: 0x0008
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.asz:0 ; 0x93a: KFDASZ_1X
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.frag:0 ; 0x93b: 0x00
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.total:0 ; 0x93c: 0x0000
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.free:0 ; 0x93e: 0x0000
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.fnum:262 ; 0x940: 0x00000106
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.xnum:0 ; 0x944: 0x00000000
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.vaa.flags:8388608 ; 0x948: 0x00800000
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.lxnum:3 ; 0x94c: 0x03
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.spare1:0 ; 0x94d: 0x00
kfracdb2.lge[21].bcd[0].KFDAT_ALLOC2.spare2:0 ; 0x94e: 0x0000
kfracdb2.lge[21].bcd[0].au[0]:        0 ; 0x950: 0x00000000
kfracdb2.lge[21].bcd[0].au[1]:       11 ; 0x954: 0x0000000b
kfracdb2.lge[21].bcd[0].disks[0]:     0 ; 0x958: 0x0000
kfracdb2.lge[21].bcd[0].disks[1]:     0 ; 0x95a: 0x0000
kfracdb2.lge[21].bcd[1].kfbl.blk:   262 ; 0x95c: blk=262
kfracdb2.lge[21].bcd[1].kfbl.obj:     1 ; 0x960: file=1
kfracdb2.lge[21].bcd[1].kfcn.base: 3018 ; 0x964: 0x00000bca
kfracdb2.lge[21].bcd[1].kfcn.wrap:    0 ; 0x968: 0x00000000
kfracdb2.lge[21].bcd[1].oplen:       20 ; 0x96c: 0x0014
kfracdb2.lge[21].bcd[1].blkIndex:   262 ; 0x96e: 0x0106
kfracdb2.lge[21].bcd[1].flags:       28 ; 0x970: F=0 N=0 F=1 L=1 V=1 A=0 C=0 R=0
kfracdb2.lge[21].bcd[1].opcode:     162 ; 0x972: 0x00a2
kfracdb2.lge[21].bcd[1].kfbtyp:       4 ; 0x974: KFBTYP_FILEDIR
kfracdb2.lge[21].bcd[1].redund:      17 ; 0x975: SCHE=0x1 NUMB=0x1
kfracdb2.lge[21].bcd[1].pad:      63903 ; 0x976: 0xf99f
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xtntcnt:0 ; 0x978: 0x00000000
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xtntblk:0 ; 0x97c: 0x0000
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xnum:0 ; 0x97e: 0x0000
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xcnt:1 ; 0x980: 0x0001
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.setflg:0 ; 0x982: 0x00
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.flags:0 ; 0x983: O=0 S=0 S=0 D=0 C=0 I=0 R=0 A=0
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].au:297 ; 0x984: 0x00000129
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].disk:0 ; 0x988: 0x0000
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].flags:0 ; 0x98a: L=0 E=0 D=0 S=0 R=0 I=0
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].chk:2 ; 0x98b: 0x02
kfracdb2.lge[21].bcd[1].au[0]:       10 ; 0x98c: 0x0000000a
kfracdb2.lge[21].bcd[1].disks[0]:     0 ; 0x990: 0x0000
kfracdb2.lge[21].bcd[2].kfbl.blk:     2 ; 0x994: blk=2
kfracdb2.lge[21].bcd[2].kfbl.obj:     4 ; 0x998: file=4
kfracdb2.lge[21].bcd[2].kfcn.base: 3019 ; 0x99c: 0x00000bcb
kfracdb2.lge[21].bcd[2].kfcn.wrap:    0 ; 0x9a0: 0x00000000
kfracdb2.lge[21].bcd[2].oplen:        8 ; 0x9a4: 0x0008
kfracdb2.lge[21].bcd[2].blkIndex:     2 ; 0x9a6: 0x0002
kfracdb2.lge[21].bcd[2].flags:       28 ; 0x9a8: F=0 N=0 F=1 L=1 V=1 A=0 C=0 R=0
kfracdb2.lge[21].bcd[2].opcode:     211 ; 0x9aa: 0x00d3
kfracdb2.lge[21].bcd[2].kfbtyp:      16 ; 0x9ac: KFBTYP_COD_DATA
kfracdb2.lge[21].bcd[2].redund:      17 ; 0x9ad: SCHE=0x1 NUMB=0x1
kfracdb2.lge[21].bcd[2].pad:      63903 ; 0x9ae: 0xf99f
kfracdb2.lge[21].bcd[2].KFRCOD_DATA.offset:60 ; 0x9b0: 0x003c
kfracdb2.lge[21].bcd[2].KFRCOD_DATA.length:4 ; 0x9b2: 0x0004
kfracdb2.lge[21].bcd[2].KFRCOD_DATA.data[0]:1 ; 0x9b4: 0x01
kfracdb2.lge[21].bcd[2].KFRCOD_DATA.data[1]:0 ; 0x9b5: 0x00
kfracdb2.lge[21].bcd[2].KFRCOD_DATA.data[2]:0 ; 0x9b6: 0x00
kfracdb2.lge[21].bcd[2].KFRCOD_DATA.data[3]:0 ; 0x9b7: 0x00
kfracdb2.lge[21].bcd[2].au[0]:       16 ; 0x9b8: 0x00000010
kfracdb2.lge[21].bcd[2].disks[0]:     0 ; 0x9bc: 0x0000
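
As a rough illustration of how such records could be harvested, the sketch below parses kfed text output and collects the extent pointer from each KFFFD_PEXT record. The reading of the fields (kfbl.blk as the ASM file number, xnum as the extent number, AU values near 0xffffffff as cleared entries) is inferred from the dumps shown here, and the parser handles only the first pointer of direct-extent records within a single ACD block dump.

```python
import re

# Matches kfed lines such as
#   kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].au:297 ; 0x984: 0x00000129
LINE = re.compile(r"^kfracdb2\.lge\[(\d+)\]\.bcd\[(\d+)\]\.(\S+?):\s*(\d+)\s*;")

FREED = 0xFFFFFFF0  # assumption: AU values at or above this mark cleared entries

def parse_pext(text):
    """Collect {file_no: {extent_no: (disk, au)}} from KFFFD_PEXT records."""
    records = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            records.setdefault(key, {})[m.group(3)] = int(m.group(4))
    extents = {}
    for fields in records.values():
        au = fields.get("KFFFD_PEXT.xptr[0].au")
        if au is None or au >= FREED:
            continue
        file_no = fields["kfbl.blk"]  # file directory block = ASM file number
        xnum = fields["KFFFD_PEXT.xnum"]
        disk = fields["KFFFD_PEXT.xptr[0].disk"]
        extents.setdefault(file_no, {})[xnum] = (disk, au)
    return extents

sample = """\
kfracdb2.lge[21].bcd[1].kfbl.blk:   262 ; 0x95c: blk=262
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xnum:0 ; 0x97e: 0x0000
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].au:297 ; 0x984: 0x00000129
kfracdb2.lge[21].bcd[1].KFFFD_PEXT.xptr[0].disk:0 ; 0x988: 0x0000
kfracdb2.lge[2].bcd[2].kfbl.blk:    262 ; 0x1cc: blk=262
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xnum:0 ; 0x1ee: 0x0000
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xptr[0].au:4294967292 ; 0x1f4: 0xfffffffc
kfracdb2.lge[2].bcd[2].KFFFD_PEXT.xptr[0].disk:0 ; 0x1f8: 0x0000
"""
extent_map = parse_pext(sample)
print(extent_map)  # {262: {0: (0, 297)}}
```

The allocation record yields extent 0 of file 262 on disk 0, AU 297, while the record from the drop (AU 0xfffffffc) is skipped.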

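Records like these are aggregated to obtain the AU extent allocation for every file number. dd commands are then generated from the result and run to produce the files.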
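
A minimal sketch of the dd generation step, assuming one AU per extent, a single-disk map, and the 4 MiB AU size from the kfed commands. The function name and the second extent in the sample are invented for illustration; only extent 0 (disk 0, AU 297) comes from the ACD dump above.

```python
# Sketch: turn an extent map into dd commands that rebuild one ASM file.
# Device paths and the output file name are placeholders.

AU_SIZE = 4194304  # from aus=4194304 in the kfed commands

def dd_commands(extents, disk_paths, out_file):
    """extents: {extent_no: (disk_no, au)}; returns one dd command per extent."""
    cmds = []
    for xnum in sorted(extents):
        disk, au = extents[xnum]
        cmds.append(
            f"dd if={disk_paths[disk]} of={out_file} bs={AU_SIZE} "
            f"skip={au} seek={xnum} count=1 conv=notrunc"
        )
    return cmds

extents = {0: (0, 297), 1: (0, 298)}  # extent 1 is invented for the example
cmds = dd_commands(extents, {0: "/dev/xifenfei-sdb"}, "/tmp/262.dbf")
for cmd in cmds:
    print(cmd)
```

Here skip selects the source AU on the disk, seek places it at the extent's position in the output file, and conv=notrunc keeps earlier extents from being truncated.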

Checking the recovered files with dbv

[oracle@rac18c2 tmp]$ dbv file=262.dbf

DBVERIFY: Release 18.0.0.0.0 - Production on Tue Apr 28 09:45:37 2020

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /tmp/262.dbf


DBVERIFY - Verification complete

Total Pages Examined         : 131072
Total Pages Processed (Data) : 123400
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 631
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 7041
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 10001146011 (2.1411211419)

[oracle@rac18c2 tmp]$ dbv file=263.dbf

DBVERIFY: Release 18.0.0.0.0 - Production on Tue Apr 28 09:51:05 2020

Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /tmp/263.dbf



DBVERIFY - Verification complete

Total Pages Examined         : 643584
Total Pages Processed (Data) : 595146
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 821
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 36865
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 10001153042 (2.1411218450)
[oracle@rac18c2 tmp]$ 

Confirming the data in the recovered files with dul

DUL>  scan database
start scan database in parallel 1...
scan database completed.
DUL>  sample all segment 
start get segment info: data_obj#: 74635
finish get segment info: data_obj#: 74635
guess col def: 22
write segment info: 74635, 1, 8, 22
write sample rows: 10000
DUL>  unload 74635
 2020-04-24 22:32:11 unloading table segment 74635...
 2020-04-24 22:35:36 unloaded 37382656 rows.
DUL>

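Comparing the dbv results with the actual row count shows that the data recovered this way is completely intact: data files deleted from ASM can be recovered without a low-level fragment scan. In some special cases, combining this method with low-level fragment recovery gives an even more complete result. For the fairly typical case of a deleted Oracle PDB (where several data files share the same file number, so a low-level fragment scan cannot recover them directly), this method recovers the files very well.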
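Related articles:
Recovery from a completely destroyed ASM disk header
ASM failed to start normally: retrieving data files with dd
Recovering data files lost through improper ASM disk group operations
If you are unlucky enough to have ASM data files deleted or lost, or a PDB dropped by mistake, and need recovery, you can contact us for professional database recovery services.
Phone: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com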

Recovering an ASM disk that was added to a VG

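Author: 惜分飞 © All rights reserved [May not be reproduced in any form without my consent; otherwise I reserve the right to pursue legal liability.]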

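A customer asked for recovery: one of the disks in the Oracle ASM DATA disk group had been added to a VG, the disk group could no longer be mounted, and the database could not start. After logging in remotely, the analysis found the following:
OS-level analysis
Operation record from history
[Screenshot: add_asm_disk_to_vg]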


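It is fairly clear here that a disk was made into a PV and added to the VG, and 199G was then allocated to lv_home. The LVM information at the system level: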

--View PV information
[root@xff1 ~]# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup
  PV Size               277.98 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              71161
  Free PE               0
  Allocated PE          71161
  PV UUID               F6QO3f-065n-mwTW-Xbq2-Xx2y-c8HD-Tkr7V7
   
  --- Physical volume ---
  PV Name               /dev/sdg    <---- newly added disk
  VG Name               VolGroup
  PV Size               200.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              51199
  Free PE               255
  Allocated PE          50944
  PV UUID               i69vUG-nCIK-dtxL-FvpD-2WZd-bvLv-n7lwrb

[root@xff1 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_root
  LV Name                lv_root
  VG Name                VolGroup
  LV UUID                JUNnkN-m4zq-D0gh-h42b-cUM1-Wh1q-ZMtQE4
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:47 +0800
  LV Status              available
  # open                 1
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_home
  LV Name                lv_home
  VG Name                VolGroup
  LV UUID                eZTkLt-cNGX-371i-m8Bd-VdD9-q6Hz-wYDRIJ
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:08:54 +0800
  LV Status              available
  # open                 1
  LV Size                422.97 GiB      <----- LV size has become 422G, presumably the result of being extended by 199G
  Current LE             108281
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Path                /dev/VolGroup/lv_swap
  LV Name                lv_swap
  VG Name                VolGroup
  LV UUID                54P9ok-VpwO-zM68-hvwY-9GBf-89yb-8xQAMn
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2017-07-19 20:09:23 +0800
  LV Status              available
  # open                 1
  LV Size                4.00 GiB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1


[root@xff1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       50G  3.9G   43G   9% /
tmpfs                  63G  509M   63G   1% /dev/shm
/dev/sda1             477M   44M  408M  10% /boot
/dev/mapper/VolGroup-lv_home
                      417G  226G  170G  58% /home  <---- 199G was added yet only 170G is free, so roughly 30G or more has been used since the extension

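Based on this, it is essentially certain that the sdg disk was added to VolGroup and allocated to lv_home, and that data has been written to it (only 170G of free space remains in /home, while lv_home was extended by 199G).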
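The extent counts in the pvdisplay and lvdisplay output above can be cross-checked with a little arithmetic. The sketch below uses only numbers from that output; the final estimate of overwritten space is a rough lower bound that ignores filesystem reserved blocks and the rounding done by `df -h`.

```python
PE_SIZE_MIB = 4  # "PE Size 4.00 MiB" from pvdisplay


def extents_to_gib(extents, pe_size_mib=PE_SIZE_MIB):
    """Convert a number of LVM extents to GiB."""
    return extents * pe_size_mib / 1024


# Extents allocated on the newly added /dev/sdg
sdg_allocated_gib = extents_to_gib(50944)

# lv_home now, and what it would have been without the sdg extents
lv_home_now_gib = extents_to_gib(108281)
lv_home_before_gib = extents_to_gib(108281 - 50944)

# Every extent of /dev/sda2 is used: lv_root + lv_swap + the old part of lv_home
sda2_total_pe = 12800 + 1024 + (108281 - 50944)

# Rough lower bound on data written since the extension:
# space added minus the free space df reports now (170G)
min_written_gib = sdg_allocated_gib - 170

if __name__ == "__main__":
    print(f"allocated on sdg : {sdg_allocated_gib:.2f} GiB")
    print(f"lv_home now      : {lv_home_now_gib:.2f} GiB")
    print(f"lv_home before   : {lv_home_before_gib:.2f} GiB")
    print(f"sda2 extents used: {sda2_total_pe}")
    print(f"written, at least: {min_written_gib:.0f} GiB")
```

The allocated extents on sdg come to exactly the 199G mentioned in the analysis, and the extents of the three LVs add up to the allocated extents of the two PVs, so the whole of the new allocation went to lv_home.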
ASM-level analysis
The ASM disk group cannot be mounted and reports that a disk is missing

SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */ 
NOTE: cache registered group DATA number=1 incarn=0xa1dbff16
NOTE: cache began mount (first) of group DATA number=1 incarn=0xa1dbff16
NOTE: Assigning number (1,2) to disk (/dev/asmdisk3)
NOTE: Assigning number (1,1) to disk (/dev/asmdisk2)
Sat Apr 25 13:04:58 2020
ERROR: no read quorum in group: required 1, found 0 disks
NOTE: cache dismounting (clean) group 1/0xA1DBFF16 (DATA) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 81552, image: oracle@rac2db1 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xA1DBFF16 (DATA) 
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0xa1dbff16
NOTE: cache deleting context for group DATA 1/0xa1dbff16
GMON dismounting group 1 at 19 for pid 30, osid 81552
NOTE: Disk DATA_0001 in mode 0x9 marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x9 marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15040: diskgroup is incomplete
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:12056:279} */

Analyzing the disk with kfed
[Screenshot: kfed_lvm]


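The error is fairly clear: the ASM disk header has been replaced by LVM metadata (because the ASM disk was added to the VG). From the earlier analysis, the data written to this disk probably exceeds 30G. Reading an arbitrary AU with kfed confirms that it is damaged, which shows the initial judgment was basically correct.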

root@xff1:/home/oracle11g$kfed read /dev/asmdisk1 aun=10000
kfbh.endian:                         51 ; 0x000: 0x33
kfbh.hard:                           55 ; 0x001: 0x37
kfbh.type:                           32 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:                         42 ; 0x003: 0x2a
kfbh.block.blk:              1329801248 ; 0x004: blk=1329801248
kfbh.block.obj:              1128615502 ; 0x008: file=347726
kfbh.check:                  1094999892 ; 0x00c: 0x41445f54
kfbh.fcn.base:                675103060 ; 0x010: 0x283d4154
kfbh.fcn.wrap:               1448232275 ; 0x014: 0x56524553
kfbh.spare1:                 1598374729 ; 0x018: 0x5f454349
kfbh.spare2:                 1162690894 ; 0x01c: 0x454d414e
7F7843EAD400 2A203733 4F432820 43454E4E 41445F54  [37 * (CONNECT_DA]
7F7843EAD410 283D4154 56524553 5F454349 454D414E  [TA=(SERVICE_NAME]
7F7843EAD420 6361723D 29626432 44494328 5250283D  [=rac2db)(CID=(PR]
7F7843EAD430 4152474F 3A443D4D 4341505C DFCF3153  [OGRAM=D:\PACS1..]
7F7843EAD440 B3BEB7BB 6369445C 65536D6F 72657672  [....\DicomServer]
7F7843EAD450 445C524D 6D6F6369 76726553 524D7265  [MR\DicomServerMR]
7F7843EAD460 6578652E 4F482829 573D5453 362D4E49  [.exe)(HOST=WIN-6]
7F7843EAD470 51414C38 54553645 28294A30 52455355  [8LAQE6UT0J)(USER]
7F7843EAD480 6D64413D 73696E69 74617274 2929726F  [=Administrator))]
7F7843EAD490 202A2029 44444128 53534552 5250283D  [) * (ADDRESS=(PR]
7F7843EAD4A0 434F544F 743D4C4F 28297063 54534F48  [OTOCOL=tcp)(HOST]
7F7843EAD4B0 2E30313D 2E303831 30332E31 4F502829  [=10.180.1.30)(PO]
7F7843EAD4C0 343D5452 37333539 2A202929 74736520  [RT=49537)) * est]
7F7843EAD4D0 696C6261 2A206873 63617220 20626432  [ablish * rac2db ]
7F7843EAD4E0 3231202A 0A343135 2D534E54 31353231  [* 12514.TNS-1251]
7F7843EAD4F0 54203A34 6C3A534E 65747369 2072656E  [4: TNS:listener ]
7F7843EAD500 73656F64 746F6E20 72756320 746E6572  [does not current]
7F7843EAD510 6B20796C 20776F6E 7320666F 69767265  [ly know of servi]
7F7843EAD520 72206563 65757165 64657473 206E6920  [ce requested in ]
7F7843EAD530 6E6E6F63 20746365 63736564 74706972  [connect descript]
………………
7F7843EAE300 6F636944 7265536D 4D726576 69445C52  [DicomServerMR\Di]
7F7843EAE310 536D6F63 65767265 2E524D72 29657865  [comServerMR.exe)]
7F7843EAE320 534F4828 49573D54 4F302D4E 314B304A  [(HOST=WIN-0OJ0K1]
7F7843EAE330 4955304E 55282954 3D524553 696D6441  [N0UIT)(USER=Admi]
7F7843EAE340 7473696E 6F746172 29292972 28202A20  [nistrator))) * (]
7F7843EAE350 52444441 3D535345 4F525028 4F434F54  [ADDRESS=(PROTOCO]
7F7843EAE360 63743D4C 48282970 3D54534F 312E3031  [L=tcp)(HOST=10.1]
7F7843EAE370 312E3038 2930332E 524F5028 35353D54  [80.1.30)(PORT=55]
7F7843EAE380 29383632 202A2029 61747365 73696C62  [268)) * establis]
7F7843EAE390 202A2068 32636172 2A206264 35323120  [h * rac2db * 125]
7F7843EAE3A0 540A3431 312D534E 34313532 4E54203A  [14.TNS-12514: TN]
7F7843EAE3B0 696C3A53 6E657473 64207265 2073656F  [S:listener does ]
7F7843EAE3C0 20746F6E 72727563 6C746E65 6E6B2079  [not currently kn]
7F7843EAE3D0 6F20776F 65732066 63697672 65722065  [ow of service re]
7F7843EAE3E0 73657571 20646574 63206E69 656E6E6F  [quested in conne]
7F7843EAE3F0 64207463 72637365 6F747069 34320A72  [ct descriptor.24]
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][32]
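The hex columns in the dump above are printed as 32-bit words, so on a little-endian host the bytes of each word appear reversed relative to the text in the right-hand column. A small sketch of the decoding, assuming a little-endian dump (the function name is hypothetical, not part of kfed):

```python
def decode_kfed_words(hex_words):
    """Decode space-separated 32-bit hex words from a kfed dump into text.

    Each word is byte-swapped (little-endian assumption); bytes outside the
    printable ASCII range are shown as '.', as kfed itself does.
    """
    raw = b"".join(bytes.fromhex(word)[::-1] for word in hex_words.split())
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # First line of the dump above
    print(decode_kfed_words("2A203733 4F432820 43454E4E 41445F54"))
    # kfbh.check was reported as 0x41445f54: the same four text bytes
    print(decode_kfed_words("41445F54"))
```

This also explains the nonsensical kfbh.* header values: kfed is interpreting the first bytes of ordinary text as block-header fields, which is why it reports an unknown block type.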

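The kfed output above shows that the location of AU 10000 now contains listener.log content written after the database failed (this database is installed under /home), which further confirms the overwrite. The following shows that sdg is asmdisk1: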

[root@xff1 dev]# ls -l sdg
brw-rw---- 1 root disk 8,  96 Apr 25 00:05 sdg
[root@xff1 dev]# ls -l asmdisk1
brw-rw---- 1 grid asmadmin 8,  96 Apr 25 00:05 asmdisk1
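Both entries show the same major and minor device numbers (8, 96), which is what identifies them as the same underlying disk. The same comparison can be scripted; the sketch below is an illustration using standard `os` and `stat` calls, with hypothetical helper names.

```python
import os
import stat


def block_device_id(path):
    """Return (major, minor) for a block device node, or raise ValueError."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def same_block_device(path_a, path_b):
    """True if both device nodes refer to the same major/minor pair."""
    return block_device_id(path_a) == block_device_id(path_b)


if __name__ == "__main__":
    # Paths from the case above; running this needs access to those nodes.
    print(same_block_device("/dev/sdg", "/dev/asmdisk1"))
```

Running a check like this before any pvcreate or mkfs on a database server is a cheap way to catch a device that is already in use under another name.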

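As things stand, the DATA disk group consists of three 200G disks. The first disk was accidentally added to the VG and more than 30G of data was written to it, so the disk group cannot simply be repaired with kfed at the ASM level and then mounted. The only option is Oracle ASM disk block reassembly (see "Recovery from a completely destroyed ASM disk header") to recover the data that was not overwritten.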


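This customer was fairly lucky: all the data files were found in the only remaining backups, a few days of unsuccessful backups from December 2019 (no archive logs), and the database was then force-opened successfully. By combining the latest data-file contents obtained through fragment recovery with the December 2019 backup, the vast majority of the business data was recovered, minimizing the customer's loss. Disk operations on Oracle RAC database servers need to be performed with caution.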
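If you are unlucky enough to have an Oracle ASM disk damaged in a similar way (formatted, partially overwritten with dd, turned into an LV, and so on) and need recovery support, you can contact us for a professional recovery assessment, to rescue the data as completely and as quickly as possible and reduce the loss.
Phone: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com
Some cases of formatted ASM disks that we have recovered:
Another case of recovering ASM formatted as a file system
A perfect recovery of an ASM disk formatted as NTFS
Oracle ASM disk format recovery: formatted as an ext4 file system
Oracle ASM disk format recovery: formatted as an NTFS file system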