xfs删除数据文件恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:xfs删除数据文件恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

在linux7的环境(xfs文件系统格式)中,由于误操作执行了rm -rf /操作,导致系统大部分文件被删除(oracle数据库相关文件).我们对其现场进行分析,确认相关数据在磁盘底层依旧存在
20191219042943


20191219043052


运气不错通过底层恢复,实现数据库完美open
20191219221348

oracle dmp被加密恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:oracle dmp被加密恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

接到朋友恢复请求,oracle dmp文件被加密
20191124230422


通过分析文件发现该加密主要是对头尾部分block按照每16byte中8个byte置空和部分加密
20191124230702
20191124231600


通过进行恢复,对损坏部分进行跳过,剩余数据直接导入数据库,通过show=y测试数据可以正常入库,实现了dmp文件表数据的完美恢复
20191124231917
20191124232238


再一起asm disk被格式化成ext3文件系统故障恢复

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:再一起asm disk被格式化成ext3文件系统故障恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

国庆节前夕接到朋友求救电话asm disk被格式化成ext3格式了,具体操作如下
20191006205129


20191006205203

并且把这个分区直接挂载到/目录
20191006205239

由于/被挂载新格式化的控盘,导致asm磁盘组访问其他盘报错

Sun Sep 29 18:15:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
WARNING: cache failed reading from group=1(DATA) fn=9 blk=0 count=1 from 
disk= 5 (DATA_0005) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-15080: synchronous I/O operation to a disk failed
WARNING: cache succeeded reading from group=1(DATA) fn=9 blk=0 count=1 from 
disk= 7 (DATA_0007) kfkist=0x20 status=0x01 osderr=0x0 file=kfc.c line=11637
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_b000_8094.trc:
ORA-15025: could not open disk "/dev/asmdisk/sdh"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
WARNING: PST-initiated drop of 1 disk(s) in group 1(.2380027701))

重启系统之后,重试mount 磁盘组

GMON dismounting group 1 at 18 for pid 31, osid 44279
NOTE: Disk DATA_0000 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0001 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0002 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0003 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0004 in mode 0x1 marked for de-assignment
NOTE: Disk DATA_0005 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0006 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0009 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0010 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0011 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0012 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0013 in mode 0x7f marked for de-assignment
NOTE: Disk DATA_0014 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATA was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "8" is missing from group number "1" 
ERROR: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:2587:2} */

由于sdb(asm disk 8)被格式化,导致data磁盘组无法正常mount.这个客户运气比较好data 磁盘组是normal模式,但是由于mount到/,导致disk 4被强制drop,因此无法mount成功,但是通过一系列处理数据实现完美恢复,0数据丢失
20191006211707


如果磁盘组是外部冗余,请参考:
又一例asm格式化文件系统恢复
一次完美的asm disk被格式化ntfs恢复
oracle asm disk格式化恢复—格式化为ext4文件系统
oracle asm disk格式化恢复—格式化为ntfs文件系统

ORA-600 ktfbhget-4

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 ktfbhget-4

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户和我们反馈,他们数据库现在使用异常,查询dba_data_files报ORA-600 ktfbhget-4

*** 2019-09-21 13:56:42.944
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ktfbhget-4], [384], [16], [], [], [], [], []
Current SQL statement for this session:
select tablespace_name from dba_data_files
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedmp+0148          bl       ksedst               1028F23AC ?
ksfdmp+0018          bl       01FD7F4C             
kgerinv+00e8         bl       _ptrgl               
kgesinv+0020         bl       kgerinv              000000000 ? 000000000 ?
                                                   FFFFFFFFFFF8920 ? 000400002 ?
                                                   000000000 ?
ksesin+005c          bl       kgesinv              7000001637868E8 ? 110359FD8 ?
                                                   000000100 ? 000007FFF ?
                                                   000007FFF ?
ktfbhget+047c        bl       ksesin               1029B1D18 ? 200000002 ?
                                                   000000000 ? 000000180 ?
                                                   000000000 ? 000000010 ?
                                                   000000008 ? 000000000 ?
ktfbhcf+03c0         bl       ktfbhget             FFFFFFFFFFF8DA0 ? 1103589D8 ?
                                                   201035A57C ? 1C610005A40 ?
qerfxFetch+0c94      bl       01FD7AC4             
qerjoFetch+037c      bl       _ptrgl               
rwsfcd+0060          bl       _ptrgl               
qersoFetch+0108      bl       _ptrgl               
qerjoFetch+037c      bl       _ptrgl               
qerjoFetch+037c      bl       _ptrgl               
rwsfcd+0060          bl       _ptrgl               
qeruaFetch+0140      bl       01FD7AC4             
qervwFetch+008c      bl       01FD7AC4             
opifch2+0c68         bl       01FD7AC4             
opiall0+1244         bl       opifch2              7000001626B2F28 ? 100000001 ?
                                                   FFFFFFFFFFFA3D8 ?
kpoal8+0a68          bl       opiall0              5E1000C818 ? 2200000014 ?
                                                   FFFFFFFFFFFAAD8 ? 000000000 ?
                                                   FFFFFFFFFFFAA28 ? 11028C2C8 ?
                                                   080000000 ? 4000000007FFF ?
opiodr+08e8          bl       _ptrgl               
ttcpip+0c54          bl       _ptrgl               
opitsk+0c28          bl       ttcpip               11000C818 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
opiino+0798          bl       opitsk               000000000 ? 000000000 ?
opiodr+08e8          bl       _ptrgl               
opidrv+032c          bl       opiodr               3C00000018 ? 4101F5E80 ?
                                                   FFFFFFFFFFFF4C0 ? 0A000FD98 ?
sou2o+0028           bl       opidrv               3C0C000000 ? 400000000 ?
                                                   FFFFFFFFFFFF4C0 ?
main+0138            bl       01FD758C             
__start+0098         bl       main                 000000000 ? 000000000 ?

通过询问知道由于file#=17最初由于裸设备权限异常,导致无法访问,执行了delete from file$ where file#=17,然后遭遇异常又人工插入了一条file#=17的记录到file$中,但是由于不知道具体值,可能是由于部分值插入错误引起现在问题.
20190927225858


通过分析file$表定义

create table file$                                             /* file table */
( file#         number not null,                   /* file identifier number */
  status$       number not null,                      /* status (see KTS.H): */
                                               /* 1 = INVALID, 2 = AVAILABLE */
  blocks        number not null,                   /* size of file in blocks */
                                           /* zero for bitmapped tablespaces */
  ts#           number,                         /* tablespace that owns file */
  relfile#      number,                              /* relative file number */
  maxextend     number,                                 /* maximum file size */
  inc           number,                                  /* increment amount */
  crscnwrp      number,                                 /* creation SCN wrap */
  crscnbas      number,                                 /* creation SCN base */
  ownerinstance varchar("M_IDEN"),                    /* Owner instance name */
  spare1        number,      /* tablespace-relative DBA of space file header */
                                   /* NULL for dictionary-mapped tablespaces */
  spare2        number,
  spare3        varchar2(1000),
  spare4        date
)

通过dump文件头(文件创建大小/SCN等),结合一些计算和经验值,获取到相关值重新插入正确值,一切恢复正常

delete from file$ where file#=17 and ts#=384;
insert into sys.file$ (FILE#, STATUS$, BLOCKS, TS#, RELFILE#, MAXEXTEND, INC,
 CRSCNWRP, CRSCNBAS, OWNERINSTANCE, SPARE1, SPARE2, SPARE3, SPARE4)
values (17,2,256000,384,17,0,0,3087,1631091037,null,71303170, null, null, null);
commit;

再次提醒,file$中记录请勿人工修改,以前写过相关casefile$ 删除记录恢复(delete file$ recovery)

记录一次200T的数据库恢复经历

联系:手机/微信(+86 13429648788) QQ(107644445)QQ咨询惜分飞

标题:记录一次200T的数据库恢复经历

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有一个客户恢复请求,6个节点11.2.0.3 RAC,非归档模式,数据量近200T
df_size


由于存储掉电导致数据库6个节点全部宕机,恢复硬件之后,数据库无法正常启动,报错如下:

SQL> recover database;
ORA-00279: change 318472018583 generated at 05/04/2019 17:58:05 needed for
thread 4
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0/db_1/dbs/arch4_322810_870181839.dbf
ORA-00280: change 318472018583 for thread 4 is in sequence #322810

Wed Aug 28 11:19:55 2019
ALTER DATABASE RECOVER  DATABAE 
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 14 Seq 552 Reading mem 0
  Mem# 0: +REDO/xff/log2.ora
Recovery of Online Redo Log: Thread 2 Group 15 Seq 126 Reading mem 0
  Mem# 0: +REDO/xff/log3.ora
Recovery of Online Redo Log: Thread 3 Group 18 Seq 122 Reading mem 0
  Mem# 0: +REDO/xff/log6.ora
ORA-279 signalled during: ALTER DATABASE RECOVER  database  ...
Wed Aug 28 11:21:31 2019
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 

数据库恢复需要thread 4 sequence #322810,查询redo信息
redo


redo已经被覆盖,数据库无法通过正常途径恢复实现数据库open,尝试屏蔽一致性强制拉库操作后

Wed Aug 28 12:40:15 2019
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc  (incident=244209):
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244209/xff1_smon_51338_i244209.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Wed Aug 28 12:40:16 2019
ORACLE Instance xff1 (pid = 26) - Error 600 encountered while recovering transaction (44, 47).
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_smon_51338.trc:
ORA-00600: internal error code, arguments: [4137], [44.47.613406], [0], [0], [], [], [], [], [], [], [], []
Wed Aug 28 12:40:20 2019
Exception[type: SIGSEGV,Address not mapped to object][ADDR:0x5122000000C8][PC:0xE1B4D3,ktugru()+87][flags:0x0,count:1]
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p086_54066.trc  (incident=245017):
ORA-07445:exception encountered:core dump [ktugru()+87][SIGSEGV][ADDR:0x5122000000C8][Address not mapped to object]
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_245017/xff1_p086_54066_i245017.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Aug 28 12:40:20 2019
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_p000_53873.trc  (incident=244305):
ORA-00600: internal error code, arguments: [4198], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_244305/xff1_p000_53873_i244305.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

提示undo异常,屏蔽回滚段之后,数据库正常打开没有任何报错信息

Wed Aug 28 12:57:15 2019
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
[57676] Successfully onlined Undo Tablespace 22.
Undo initialization finished serial:0 start:2386111306 end:2386112316 diff:1010 (10 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
Wed Aug 28 12:57:17 2019
minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:57624 status:0x7
minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000
No Resource Manager plan active
Starting background process GTX0
Wed Aug 28 12:57:18 2019
GTX0 started with pid=45, OS id=57777 
Starting background process RCBG
Wed Aug 28 12:57:18 2019
RCBG started with pid=46, OS id=57779 
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Aug 28 12:57:19 2019
QMNC started with pid=47, OS id=57788 
Completed: ALTER DATABASE OPEN

后续涉及创建新undo,删除老undo并处理一些类似,基本上恢复正常
OPEN