删除redo导致ORA-00313 ORA-00312故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:删除redo导致ORA-00313 ORA-00312故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于误操作直接rm 删除了redo文件,导致数据库启动报ORA-00313 ORA-00312错

2025-03-07T14:49:16.325723+08:00
ALTER DATABASE OPEN
2025-03-07T14:50:00.124620+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
2025-03-07T14:50:00.198907+08:00
Crash Recovery excluding pdb 2 which was cleanly closed.
2025-03-07T14:50:00.238450+08:00
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
 Thread 1: Recovery starting at checkpoint rba (logseq 2966 block 74686), scn 0
2025-03-07T14:50:00.325246+08:00
Started redo scan
2025-03-07T14:50:00.341193+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2681.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
2025-03-07T14:50:00.372632+08:00
Slave encountered ORA-10388 exception during crash recovery
…………
2025-03-07T14:50:00.385698+08:00
Slave encountered ORA-10388 exception during crash recovery
2025-03-07T14:50:00.388594+08:00
Aborting crash recovery due to error 313
2025-03-07T14:50:00.388739+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2681.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
2025-03-07T14:50:00.389243+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2681.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
ORA-313 signalled during: ALTER DATABASE OPEN...

然后客户把历史的redo文件拷贝过来,尝试恢复数据库,报ORA-00314 ORA-00312错误

2025-03-07T15:07:30.784759+08:00
ALTER DATABASE OPEN
Ping without log force is disabled:
  instance mounted in exclusive mode.
2025-03-07T15:07:30.808497+08:00
Crash Recovery excluding pdb 2 which was cleanly closed.
2025-03-07T15:07:30.838664+08:00
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
 Thread 1: Recovery starting at checkpoint rba (logseq 2966 block 74686), scn 0
2025-03-07T15:07:30.897547+08:00
Started redo scan
2025-03-07T15:07:30.898222+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4106.trc:
ORA-00314: log 2 of thread 1, expected sequence# 2966 doesn't match 1646
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'
2025-03-07T15:07:30.930089+08:00
Slave encountered ORA-10388 exception during crash recovery
…………
2025-03-07T15:07:30.940051+08:00
Slave encountered ORA-10388 exception during crash recovery
2025-03-07T15:07:30.942274+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_mz00_4138.trc:
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/orcl/redo01.log'
2025-03-07T15:07:30.945509+08:00
Slave encountered ORA-10388 exception during crash recovery
2025-03-07T15:07:30.945512+08:00
Slave encountered ORA-10388 exception during crash recovery
2025-03-07T15:07:30.948369+08:00
Aborting crash recovery due to error 314
2025-03-07T15:07:30.948488+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4106.trc:
ORA-00314: log 2 of thread 1, expected sequence# 2966 doesn't match 1646
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'
2025-03-07T15:07:30.949390+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_4106.trc:
ORA-00314: log 2 of thread 1, expected sequence# 2966 doesn't match 1646
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'
ORA-314 signalled during: ALTER DATABASE OPEN...

使用Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本收集信息之后数据文件头状态和所需要redo信息
df_header


数据库需要sequence#为2966的redo日志,但是当前已经被删除,基于当前情况,只能进行强制非一致性恢复,尝试强制打开库

SQL> recover database;                 
ORA-00283: recovery session canceled due to errors
ORA-00314: log 2 of thread 1, expected sequence# 2966 doesn't match 1646
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/orcl/redo02.log'

QL> select group#,status,sequence# from v$log;

	  GROUP# STATUS 		 SEQUENCE#
---------------- ---------------- ----------------
	       1 UNUSED 			 0
	       3 CURRENT		      2967
	       2 ACTIVE 		      2966

SQL> 
SQL> 
SQL> recover database until cancel;
ORA-00279: change 163033183 generated at 03/07/2025 14:04:20 needed for thread 1
ORA-00289: suggestion : /u01/app/oracle/recovery_area/orcl/archivelog/2025_03_08/o1_mf_1_2966_%u_.arc
ORA-00280: change 163033183 for thread 1 is in sequence #2966


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/orcl/system01.dbf'


ORA-01112: media recovery not started


SQL> alter database open resetlogs;

Database altered.

运气不错,直接打开数据库成功,然后逻辑导出数据,完成此处恢复.这个让我想起来了一些类似案例:
Oracle 23ai rm redo*.log恢复
清空redo,导致ORA-27048: skgfifi: file header information is invalid
由于默认情况下oracle的redo文件扩展名是.log,然后被当做是不重要文件从而被清理导致数据库故障,在oracle服务器上清理数据之前建议查询v$datafile,v$logfile,v$tempfile,v$controlfile来确认是否是数据库文件

aix磁盘损坏oracle数据库恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:aix磁盘损坏oracle数据库恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户aix环境硬盘异常导致系统无法启动,初步判断是数据文件存放在本地磁盘的空间中(本地两个盘都异常,系统无法启动),通过硬件恢复厂商镜像出来,但是通过aix文件系统直接挂载提示需要fsck,但是做fsck之后,提示大量文件丢失(最关键的数据文件和备份文件都被自动删除)
dmp-remove
fsck-remove


基于这种情况,采用镜像主机挂载的方式肯定不行,考虑直接采用软件直接解析,能够看到软件,可惜由于大量的文件系统元数据损坏,解析出来的数据文件和dmp也不可用(大量损坏和空块)
QQ20250307-122939

基于上述情况,只能采用碎片级别恢复出来数据文件
QQ20250307-123123

然后使用dul工具把数据恢复到表中,实现最大限度抢救客户数据
QQ20250307-123653

对于数据库级别恢复,这个是理论上的终极恢复方法

linux rm -rf 删除数据文件恢复

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:linux rm -rf 删除数据文件恢复

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户由于误操作删除了oracle的部分数据文件(rm -rf 方式删除),然后自己尝试进行恢复操作,对部分文件执行了offline,导致比较麻烦的后果
jb
offline-file


接手故障之后,第一时间对其进行了镜像(因为有部分文件句柄已经释放,为了方式覆盖进一步破坏),对于没有释放的句柄可以通过类似方法进行恢复,参考以前类似恢复:
Oracle误删除数据文件恢复
Solaris rm datafile recovery—利用句柄误删除数据文件恢复

!cp  269  /u01/app/oracle/oradata/orcl/XXXXXX_DATA01.dbf
alter database datafile 12 offline;
recover datafile 12;
alter database datafile 12 online;

对于删除文件,而且句柄已经释放的文件,通过文件系统层面反删除进行恢复,参考以前类似恢复:
rm -rf误删Oracle数据库恢复
记录一次rm -rf 删除数据文件异常恢复
rm -rf 删除数据文件恢复方法—文件系统反删除+oracle碎片重组
rm


在这个恢复过程中,由于客户linux是物理机,而且本地空间不足,无法对其进行镜像,采用dd命令直接写镜像到其他的linux机器上(通过nfs方式),然后在win机器上直接挂载该nfs,记录下win上挂载nfs操作
nfs
mount-nfs

CDM备份缺少归档打开数据库报ORA-600 kcbzib_kcrsds_1故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:CDM备份缺少归档打开数据库报ORA-600 kcbzib_kcrsds_1故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

有客户联系我们,说一个19c的库,由于产生归档较多在备份过程中把部分归档删除而导致没有备份成功,从而使得他们那边一个误操作需要恢复使用备份做不完全恢复,备份厂商进行恢复并尝试强制打开库报ORA-600 kcbzib_kcrsds_1错误
ORA-600-kcbzibz_kcrsds_1


由于客户那边使用的是CDM方式备份,可以较快的准备出来一个新环境,观察客户在应用日志过程中,文件头的scn一直不变,怀疑文件头由于begin backup冻结,对其进行dump,发现确实做了hot backup操作,而且没有end backup
scn
begin-backup

由于缺少归档,不完全恢复没有成功,begin backup 也无法正常结束,对于这种情况,先尝试调整到文件头最大的scn值,尝试打开库

SQL> alter database open resetlogs ;
alter database open resetlogs 
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [],
[], [], [], [], [], [], []
Process ID: 3949615
Session ID: 5111 Serial number: 30040


--重启到mount状态
SQL> set numw 16
col CHECKPOINT_TIME for a40
set lines 150
set pages 1000
SELECT status,
to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
count(*) ROW_NUM
FROM v$datafile_header
GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
ORDER BY status, checkpoint_change#, checkpoint_time;SQL> SQL> SQL> SQL>   2    3    4    5    6  

STATUS  CHECKPOINT_TIME                          FUZ CHECKPOINT_CHANGE#          ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
ONLINE  2025-02-08 22:43:01                      YES     15626238353558               56

打开库失败,只能找出来数据库最大的block中最大scn,然后调整文件头scn的值,实现数据库open

SQL> oradebug setmypid
Statement processed.
SQL> oradebug DUMPvar SGA kcsgscn_
SQL> kscn8 kcsgscn_ [060017E98, 060017EA0) = 00000000 00000000
SQL> oradebug DUMPvar SGA kcsgscn_
kscn8 kcsgscn_ [060017E98, 060017EA0) = 8CD9C896 00000E4D
SQL> 
SQL> 
SQL> 
SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/cdmbak/db/xifenfei/ob_data_D-XIFENFEI-SYSTEM_FNO-1_t13930nd'

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> 

使用exp导出客户需要的表,完成本次恢复任务
对于ORA-600 kcbzib_kcrsds_1恢复的情况,以前有过大量恢复案例,修改数据库scn即可
kcbzib_kcrsds_1报错汇总
12C数据库报ORA-600 kcbzib_kcrsds_1故障处理
存储故障,强制拉库报ORA-600 kcbzib_kcrsds_1处理
Patch SCN工具一键恢复ORA-600 kcbzib_kcrsds_1
此类故障处理太多,不一一列举,解决这个错误之后,数据库open成功,然后安排逻辑迁移即可

ORA-07445: exception encountered: core dump [expgod()+43] [IN_PAGE_ERROR]

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-07445: exception encountered: core dump [expgod()+43] [IN_PAGE_ERROR]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

数据库在运行过程中报O/S-Error: (OS 23) 数据错误(循环冗余检查)错误

Thu Jan 30 22:00:02 2025
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Thu Jan 30 22:00:04 2025
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_12576.trc:
ORA-12012: error on auto execute of job 155962
ORA-01115: IO error reading block from file  (block # )
ORA-01110: data file 1: 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
ORA-06512: at "SYS.DBMS_STATS", line 25836
ORA-06512: at "SYS.DBMS_STATS", line 26171
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Fri Jan 31 02:00:00 2025
Clearing Resource Manager plan via parameter
Fri Jan 31 08:15:46 2025
Thread 1 advanced to log sequence 4420 (LGWR switch)
  Current log# 1 seq# 4420 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Fri Jan 31 10:53:57 2025
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_cjq0_1140.trc:
Fri Jan 31 10:53:57 2025
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_7916.trc:
ORA-27102: out of memory
OSD-00043: 附加错误信息
O/S-Error: (OS 1455) 页面文件太小,无法完成操作。
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process 
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_cjq0_1140.trc:
Fri Jan 31 10:54:03 2025
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x18] [PC:0xB778B02, clsdcxini()+90]
ERROR: Unable to normalize symbol name for the following short stack (at offset 199):
dbgexProcessError()+193<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+1726<-dbkePostKGE_kgsf()+75
<-kgeade()+560<-kgerev()+125<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1869<-sss_xcpt_EvalFilter()+174
<-.1.4_5+59<-00007FFD0245F306<-00007FFD024735AF<-00007FFD023D4AAF<-00007FFD0247231E<-clsdcxini()+90
<-clsdinit()+124<-ksdnfy()+225<-kscnfy()+778<-opirip()+86<-opidrv()+909<-sou2o()+98<-opimai_real()+299
<-opimai()+191<-BackgroundThreadStart()+693<-00007FFD020E7E94<-00007FFD02437AD1
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j000_9920.trc  (incident=39621):
ORA-07445: exception encountered: core dump [clsdcxini()+90][ACCESS_VIOLATION][ADDR:0x18][PC:0xB778B02][UNABLE_TO_READ]
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_39621\orcl_j000_9920_i39621.trc

然后无法正常启动,报Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]错误

Wed Feb 05 09:43:51 2025
Sweep [inc][39621]: completed
Successful mount of redo thread 1, with mount id 1720066005
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 140 KB redo, 62 data blocks need recovery
Started redo application at
 Thread 1: logseq 4420, block 42375
Wed Feb 05 09:44:00 2025
Recovery of Online Redo Log: Thread 1 Group 1 Seq 4420 Reading mem 0
  Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO01.LOG
Completed redo application of 0.09MB
Completed crash recovery at
 Thread 1: logseq 4420, block 42656, scn 94456019
 62 data blocks read, 62 data blocks written, 140 redo k-bytes read
Wed Feb 05 09:44:01 2025
Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]
ERROR: Unable to normalize symbol name for the following short stack (at offset 199):
dbgexProcessError()+193<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+1726
<-dbkePostKGE_kgsf()+75<-kgeade()+560<-kgerev()+125<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1869
<-sss_xcpt_EvalFilter()+174<-.1.4_5+59<-00007FFD0245F306<-00007FFD024735AF<-00007FFD023D4AAF
<-00007FFD0247231E<-expgod()+43<-xtyopncb()+241<-qctcopn()+613<-qctcopn()+392<-qctcpqb()+290
<-qctcpqbl()+52<-xtydrv()+148<-opitca()+1091<-kksLoadChild()+9008<-kxsGetRuntimeLock()+2320
<-kksfbc()+15225<-kkspsc0()+2117<-kksParseCursor()+181<-opiosq0()+2538<-opiosq()+23<-opiodr()+1662
<-rpidrus()+862<-rpidru()+154<-rpiswu2()+2757<-rpidrv()+6105<-rpisplu()+1607<-kqldFixedTableLoadCols()+345
<-kqldcor()+2534<-kglslod()+352<-kqlslod()+52<-PGOSF455_kqlsublod()+125<-kqllod()+7284<-kglobld()+1354
<-kglobpn()+1900<-kglpim()+336<-qcdlgtd()+260<-qcsfplob()+166<-qcsprfro()+903<-qcsprfro_tree()
+292<-qcsprfro_tree()+373<-qcspafq()+96
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_mmon_15468.trc  (incident=39749):
ORA-07445: exception encountered: core dump [expgod()+43] [IN_PAGE_ERROR] [] [PC:0x2C9C015] [] []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_39749\orcl_mmon_15468_i39749.trc
Wed Feb 05 09:44:02 2025
Thread 1 advanced to log sequence 4421 (thread open)
Thread 1 opened at log sequence 4421
  Current log# 2 seq# 4421 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\ORCL\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Feb 05 09:44:02 2025
SMON: enabling cache recovery
Wed Feb 05 09:44:11 2025
Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_9308.trc  (incident=39781):
ORA-07445: ??????: ???? [expgod()+43] [IN_PAGE_ERROR] [] [PC:0x2C9C015] [] []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_39781\orcl_ora_9308_i39781.trc
Wed Feb 05 09:44:19 2025
PMON (ospid: 12376): terminating the instance due to error 397
Instance terminated by PMON, pid = 12376

基于上述的Exception [type: IN_PAGE_ERROR, ] [] [PC:0x2C9C015, expgod()+43]错误,第一反应就是可能由于底层损坏导致数据块损坏,dbv检查文件是否报错
dbv-system


检查系统日志确认异常
20250208203059

尝试拷贝文件也报错
QQ20250208-203152

已经比较明确由于底层问题,解决给问题之前,需要先对文件系统进行处理,然后再对恢复出来的数据文件恢复数据