InnoDB: Cannot open table db/tab from the internal data dictionary of InnoDB though the .frm file for the table exists

Posted on 2022-07-09 by 惜分飞

A customer contacted me because MySQL had become unusable and was reporting the following error (db_name.table_name doesn't exist):

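The customer confirmed that both the .frm and .ibd files for the table exist. Checking the MySQL log revealed a large number of similar errors.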

The situation is the same in every case: the .ibd and .frm files both exist, yet the log reports:
2022-07-08T08:37:57.935514Z 1423 [Warning] InnoDB: Cannot open table db_name/table_name from the internal data dictionary of InnoDB though the .frm file for the table exists. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html for how to resolve the issue.
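From experience with MySQL, this kind of problem most likely means that something is wrong with the ibdata file. Analyzing the log turned up the following records: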

2022-07-08T04:11:27.413455Z 0 [Note] /www/server/mysql/bin/mysqld (mysqld 5.7.34-log) starting as process 2144 ...
2022-07-08T04:11:27.495536Z 0 [Note] InnoDB: PUNCH HOLE support available
2022-07-08T04:11:27.495559Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-07-08T04:11:27.495562Z 0 [Note] InnoDB: Uses event mutexes
2022-07-08T04:11:27.495565Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2022-07-08T04:11:27.495568Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-07-08T04:11:27.495571Z 0 [Note] InnoDB: Using Linux native AIO
2022-07-08T04:11:27.496130Z 0 [Note] InnoDB: Number of pools: 1
2022-07-08T04:11:27.496227Z 0 [Note] InnoDB: Using CPU crc32 instructions
2022-07-08T04:11:27.510618Z 0 [Note] InnoDB: Initializing buffer pool, total size=256M,instances=1,chunk size=128M
2022-07-08T04:11:27.520144Z 0 [Note] InnoDB: Completed initialization of buffer pool
2022-07-08T04:11:27.522095Z 0 [Note] InnoDB: If the mysqld execution user is authorized, 
        page cleaner thread priority can be changed.  See the man page of setpriority().
2022-07-08T04:11:27.532135Z 0 [Note] InnoDB: The first innodb_system data file 'ibdata1' did not exist. 
       A new tablespace will be created!
2022-07-08T04:11:27.532259Z 0 [Note] InnoDB: Setting file '/www/server/data/ibdata1' size to 10 MB. 
        Physically writing the file full; Please wait ...
2022-07-08T04:11:27.760116Z 0 [Note] InnoDB: File '/www/server/data/ibdata1' size is now 10 MB.
2022-07-08T04:11:27.760338Z 0 [Note] InnoDB: Setting log file /www/server/data/ib_logfile101 size to 128MB
2022-07-08T04:11:27.760414Z 0 [Note] InnoDB: Progress in MB:
 100
2022-07-08T04:11:28.940355Z 0 [Note] InnoDB: Setting log file /www/server/data/ib_logfile1 size to 128 MB
2022-07-08T04:11:28.940442Z 0 [Note] InnoDB: Progress in MB:
 100
2022-07-08T04:11:30.517357Z 0 [Note] InnoDB: Renaming log file /www/server/data/ib_logfile101 
      to /www/server/data/ib_logfile0
2022-07-08T04:11:30.517394Z 0 [Warning] InnoDB: New log files created, LSN=45790
2022-07-08T04:11:30.517401Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-07-08T04:11:30.517425Z 0 [Note] InnoDB: Setting file '/www/server/data/ibtmp1' size to 12 MB. 
       Physically writing the file full; Please wait ...
2022-07-08T04:11:30.609146Z 0 [Note] InnoDB: File '/www/server/data/ibtmp1' size is now 12 MB.
2022-07-08T04:11:30.609236Z 0 [Note] InnoDB: Doublewrite buffer not found: creating new
2022-07-08T04:11:30.631133Z 0 [Note] InnoDB: Doublewrite buffer created
2022-07-08T04:11:31.160847Z 0 [Note] InnoDB: 96 redo rollback segment(s) found.96 redo rollback segment(s) are active.
2022-07-08T04:11:31.160860Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2022-07-08T04:11:31.160970Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2022-07-08T04:11:31.194147Z 0 [Note] InnoDB: Foreign key constraint system tables created
2022-07-08T04:11:31.194175Z 0 [Note] InnoDB: Creating tablespace and datafile system tables.
2022-07-08T04:11:31.195079Z 0 [Note] InnoDB: Tablespace and datafile system tables created.
2022-07-08T04:11:31.195098Z 0 [Note] InnoDB: Creating sys_virtual system tables.
2022-07-08T04:11:31.195974Z 0 [Note] InnoDB: sys_virtual table created
2022-07-08T04:11:31.196099Z 0 [Note] InnoDB: Waiting for purge to start
2022-07-08T04:11:31.246167Z 0 [Note] InnoDB: 5.7.34 started; log sequence number 0
2022-07-08T04:11:31.246379Z 0 [Note] Plugin 'FEDERATED' is disabled.
2022-07-08T04:11:31.248996Z 0 [Warning] InnoDB: Cannot open table mysql/plugin from the internal data dictionary 
        of InnoDB  though the .frm file for the table exists. 
Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html for how to resolve the issue.
mysqld: Table 'mysql.plugin' doesn't exist

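The log above confirms that the database was re-initialized: InnoDB did not find ibdata1 and created a new system tablespace, so the .ibd and .frm files of the old databases can no longer be accessed normally. For this kind of situation, see an earlier, similar recovery case: Database recovery from frm and ibd files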
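
The tell-tale lines in a startup log like the one above can also be searched for mechanically. The following is a minimal Python sketch, not part of the original case. The function name `find_reinit_markers` and the marker list are assumptions made for illustration, based only on the messages quoted in the log above.

```python
# Marker substrings taken from the MySQL 5.7 startup log quoted above.
# The list is illustrative, not exhaustive.
REINIT_MARKERS = (
    "did not exist",                     # system tablespace file was missing
    "A new tablespace will be created",  # InnoDB is creating a fresh ibdata1
    "New log files created",             # redo logs were created from scratch
    "Doublewrite buffer not found",      # doublewrite buffer had to be created
)


def find_reinit_markers(log_lines):
    """Return (line_number, marker) pairs for lines hinting at re-initialization."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for marker in REINIT_MARKERS:
            if marker in line:
                hits.append((number, marker))
    return hits


# Two lines copied from the log above, plus one harmless line.
sample_log = [
    "2022-07-08T04:11:27.520144Z 0 [Note] InnoDB: Completed initialization of buffer pool",
    "2022-07-08T04:11:27.532135Z 0 [Note] InnoDB: The first innodb_system data file 'ibdata1' did not exist.",
    "2022-07-08T04:11:30.609236Z 0 [Note] InnoDB: Doublewrite buffer not found: creating new",
]

for number, marker in find_reinit_markers(sample_log):
    print(f"line {number}: {marker}")
```

Any hit on an instance that is supposed to hold existing data is a reason to stop and preserve the old files before doing anything else.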

Handling an Oracle startup failure after a storage restart

Posted on 2022-07-07 by 惜分飞

Contact: phone/WeChat (+86 17813235971), QQ (107644445)

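Because of a planned power outage in the data center, a customer shut down the databases on both nodes cleanly. The alert logs of both nodes contain records like the following, which prove that the databases really were stopped with a normal shutdown immediate.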

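The storage was then powered off. After it was powered back on, the database could not start normally (the SCNs of the data were inconsistent). The relevant information follows.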

At first the mount failed with ORA-214:

that ORACLE_BASE be set in the environment
Wed Jul 06 00:50:02 2022
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))' SCOPE=MEMORY SID='xffdb2';
ALTER DATABASE MOUNT /* db agent *//* {1:42392:203} */
This instance was first to mount
NOTE: Loaded library: System 
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup DATA2 was mounted
ORA-214 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:42392:203} */...
NOTE: dependency between database xffdb and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database xffdb and diskgroup resource ora.DATA2.dg is established

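ORA-214 indicates that the control file copies are inconsistent with each other. After that was dealt with, the mount failed with ORA-600 [2131]: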

Wed Jul 06 01:55:45 2022
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=10.10.10.10)(PORT=1521))' SCOPE=MEMORY SID='xffdb2';
ALTER DATABASE MOUNT /* db agent *//* {1:42392:663} */
This instance was first to mount
NOTE: Loaded library: System 
SUCCESS: diskgroup DATA1 was mounted
SUCCESS: diskgroup DATA2 was mounted
NOTE: dependency between database xffdb and diskgroup resource ora.DATA1.dg is established
NOTE: dependency between database xffdb and diskgroup resource ora.DATA2.dg is established
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb2/trace/xffdb2_ora_47746.trc  (incident=576488):
ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xffdb/xffdb2/incident/incdir_576488/xffdb2_ora_47746_i576488.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {1:42392:663} */...

After the control file was re-created, recovery reported errors:

Parallel Media Recovery started with 127 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using BACKUP CONTROLFILE  ...
Wed Jul 06 02:41:04 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215'  
Media Recovery Log +DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215
Wed Jul 06 02:41:04 2022
Errors with log +DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_96503.trc:
ORA-00325: archived log for thread 1, wrong thread # 2 in header
ORA-00334: archived log: '+DATA3/xffdb/archivelog/2022_07_05/thread_2_seq_40889.18030.1109269215'
ORA-325 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA3/thread_2_seq_40889.18030.1109269215'  ...
ALTER DATABASE RECOVER CANCEL 
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL 
…………
Wed Jul 06 02:22:25 2022
ALTER DATABASE RECOVER  DATABASE  
Media Recovery Start
 started logmerger process
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
Wed Jul 06 02:22:28 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_77044.trc:
ORA-00313: open failed for members of log group 7 of thread 1
Media Recovery failed with error 313
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr00_77044.trc:
ORA-00283: recovery session canceled due to errors
ORA-00313: open failed for members of log group 7 of thread 1
Wed Jul 06 02:22:28 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00322: log 4 of thread 2 is not current copy
ORA-00312: online log 4 thread 2: '+DATA3/xffdb/onlinelog/group_4.16148.1107795635'
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00322: log 7 of thread 1 is not current copy
ORA-00312: online log 7 thread 1: '+DATA3/xffdb/onlinelog/group_7.18959.1107796013'
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_m000_77318.trc:
ORA-00314: log 9 of thread 1, expected sequence# 133495 doesn't match 133490
ORA-00312: online log 9 thread 1: '+DATA3/xffdb/onlinelog/group_9.3142.1107796071'
Checker run found 208 new persistent data failures
ORA-10877 signalled during: ALTER DATABASE RECOVER  DATABASE  ...
…………
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile  ...
Wed Jul 06 06:15:26 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_4.16442.1107795653'  
Media Recovery Log +DATA3/xffdb/onlinelog/group_4.16442.1107795653
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_4.16442.1107795653'  ...
Wed Jul 06 06:15:43 2022
ALTER DATABASE RECOVER    LOGFILE '+DATA3/xffdb/onlinelog/group_7.18959.1107796013'  
Media Recovery Log +DATA3/xffdb/onlinelog/group_7.18959.1107796013
Wed Jul 06 06:15:50 2022
Errors with log +DATA3/xffdb/onlinelog/group_7.18959.1107796013
Wed Jul 06 06:15:50 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr29_306479.trc  (incident=961030):
ORA-00600: internal error code, arguments: [6102], [13], [17], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jul 06 06:15:54 2022
Sweep [inc][961030]: completed
Sweep [inc2][961030]: completed
Slave exiting with ORA-10562 exception
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_pr29_306479.trc:
ORA-10562: Error occurred while applying redo to data block (file# 159, block# 3591756)
ORA-10564: tablespace LIS
ORA-01110: data file 159: '+DATA1/xffdb/datafile/lis.379.1080445903'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 138875
ORA-00600: internal error code, arguments: [6102], [13], [17], [], [], [], [], [], [], [], [], []
Wed Jul 06 06:15:59 2022
Recovery Slave PR29 previously exited with exception 10562

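Based on the above, it is very likely that when the storage was restarted, cached data had not yet been written to the data files and redo logs. As a result, after the restart the redo logs were not the latest copies (the ORA-00322 errors show this), the data files needed recovery (which should never be the case after a clean shutdown), and the redo logs and data files were inconsistent with each other (the ORA-00600 [6102] shows this). In a situation like this, the only option is to try to force the database open, which reported ORA-600 [2663]: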

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578],
[1393], [4160374753], [], [], [], [], [], [], []
Process ID: 357910
Session ID: 1585 Serial number: 7

Wed Jul 06 06:57:25 2022
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc  (incident=1056360):
ORA-00600: internal error code, arguments: [2663], [1393],[4159455578],[1393],[4160374753],[], [], [], []
Redo thread 2 internally disabled at seq 1 (CKPT)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_arc3_360348.trc:
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR],[600], 
  [ORA-00600: internal error code, arguments: [kffbAddBlk04]
Unable to create archive log file '+DATA3'
ARC3: Error 19504 Creating archive log file to '+DATA3'
ARCH: Archival error occurred on a closed thread. Archiver continuing
ORACLE Instance xffdb1 - Archival Error. Archiver continuing.
ARCH: Archival error occurred on a closed thread. Archiver continuing
ORACLE Instance xffdb1 - Archival Error. Archiver continuing.
Wed Jul 06 06:57:34 2022
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/incident/incdir_1056360/xffdb1_ora_357910_i1056360.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '+DATA1/xffdb/onlinelog/group_4.424.1109314453'
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [],
Wed Jul 06 06:57:34 2022
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc:
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/xffdb/xffdb1/trace/xffdb1_ora_357910.trc:
ORA-00600: internal error code, arguments: [2663], [1393], [4159455578], [1393], [4160374753], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 357910): terminating the instance due to error 600
Instance terminated by USER, pid = 357910
ORA-1092 signalled during: alter database open resetlogs...
opiodr aborting process unknown ospid (357910) as a result of ORA-1092
Wed Jul 06 06:57:35 2022
ORA-1092 : opitsk aborting process
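
The four numeric arguments of ORA-600 [2663] are commonly read as two SCNs in wrap/base form: the current SCN, followed by a dependent SCN found in a block that is ahead of it. The sketch below assumes that reading and the usual composition SCN = wrap * 2^32 + base, and shows how far apart the two values in the log above are. The helper names are hypothetical.

```python
def scn_from_wrap_base(wrap, base):
    """Combine an SCN wrap and base into a single integer SCN."""
    return (wrap << 32) + base


def scn_gap_2663(args):
    """Return dependent SCN minus current SCN for ORA-600 [2663] arguments.

    args is (current_wrap, current_base, dependent_wrap, dependent_base),
    which is the assumed meaning of the four numeric arguments.
    """
    cur_wrap, cur_base, dep_wrap, dep_base = args
    return scn_from_wrap_base(dep_wrap, dep_base) - scn_from_wrap_base(cur_wrap, cur_base)


# Arguments copied from the error above:
# ORA-00600: [2663], [1393], [4159455578], [1393], [4160374753]
gap = scn_gap_2663((1393, 4159455578, 1393, 4160374753))
print(gap)  # 919175
```

Under these assumptions the database SCN is 919175 behind the SCN recorded in the block, which is the gap that has to be closed before the open can succeed.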

This error is fairly common; see: ORA-600 2663. It can also be resolved quickly with my Patch_SCN tool. After that, the database reported ORA-03113:

SQL> alter database open ;
alter database open 
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 369324
Session ID: 1585 Serial number: 1

Checking the alert log confirmed that the actual failure was in kgegpa:

Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jul 06 07:17:08 2022
SMON: enabling cache recovery
ARC1: Archiving disabled thread 2 sequence 1
Archived Log entry 1 added for thread 1 sequence 1 ID 0x36317f52 dest 1:
Archived Log entry 2 added for thread 1 sequence 2 ID 0x36317f52 dest 1:
Archived Log entry 3 added for thread 2 sequence 1 ID 0x0 dest 1:
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983CDD6,kgegpa()+40][flags: 0x0,count:1]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983B84A, kgebse()+776][flags: 0x2,count:2]
Exception [type:SIGSEGV, Address not mapped to object][ADDR:0x4D562123][PC:0x983B84A, kgebse()+776][flags: 0x2,count:2]
Wed Jul 06 07:17:11 2022
PMON (ospid: 377647): terminating the instance due to error 397

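There have been similar cases before, and after handling the problem the database opened successfully:
ORA-07445 kgegpa error encountered during database recovery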
 Exception [type: SIGSEGV, Address not mapped to object] [] [ kgegpa()+36]

Recovering an Oracle database after a partition was deleted by mistake

Posted on 2022-07-06 by 惜分飞

Good luck and bad luck often go hand in hand. Yet another user ran into something similar: a database failure.

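Their company worked on it internally, managed to open the database, and exported a dmp file to the D: drive. They then reinstalled the operating system, and that is when disaster struck: while expanding the C: drive, they deleted the D: partition and allocated part of its former space to C:. All data on D: disappeared, both the original database files and the freshly exported dmp file. After the user described this series of operations to me, I advised them to avoid any further operations on the disk (now a 200 GB C: partition plus 800 GB of unpartitioned space) and then analyzed it with a recovery tool.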

Luckily, the relevant files had not been overwritten and were recovered successfully.

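The recovery was completed and the data lost to the mistaken operation was restored. A reminder once again: operate with care, and do not let a moment of carelessness turn into a serious mistake.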

ORA-00333 ORA-01595 recovery

Posted on 2022-07-05 by 惜分飞

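A customer reported a database problem. The logs showed that both ASM and the database had hung (because of the environment, some of the logs could not be copied out). Given the situation, direct recovery was not possible, so tools were used to copy the data files out of the ASM disk group onto a file system. A check found no corrupt blocks.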

After adjusting the relevant paths, I tried to recover the database:

Tue Jul 05 15:05:54 2022
ALTER DATABASE RECOVER  datafile 1  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 2 Group 4 Seq 29973 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_4.266.822672441
Recovery of Online Redo Log: Thread 1 Group 2 Seq 38422 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_2.262.822672137
Incomplete read from log member 'E:\ORADATA\GROUP_2.262.822672137'. Trying next member.
Media Recovery failed with error 333
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 1  ...

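The error is fairly clear: while reading redo for recovery, Oracle hit "ORA-00333: redo log read error block 11557 count 731" and could not continue. This time luck was on our side. I analyzed the relationship between v$datafile and v$datafile_header.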

After some operations to bypass redo block 11557, the recovery succeeded and the database was opened:

ALTER DATABASE RECOVER  database  
Media Recovery Start
 started logmerger process
Tue Jul 05 15:17:46 2022
Parallel Media Recovery started with 32 slaves
Tue Jul 05 15:17:46 2022
Recovery of Online Redo Log: Thread 2 Group 4 Seq 29973 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_4.266.822672441
Recovery of Online Redo Log: Thread 1 Group 2 Seq 38422 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_2.262.822672137
Completed: ALTER DATABASE RECOVER  database

Analysis of the alert log revealed an ORA-600 [4194] error:

QMNC started with pid=58, OS id=15980 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Tue Jul 05 15:18:24 2022
Tue Jul 05 15:18:24 2022
Block recovery from logseq 38423, block 152 to scn 16218380250500
Recovery of Online Redo Log: Thread 1 Group 1 Seq 38423 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_1.261.822672135
Block recovery stopped at EOT rba 38423.154.16
Block recovery completed at rba 38423.154.16, scn 3776.583740804
Block recovery from logseq 38423, block 152 to scn 16218380250497
Recovery of Online Redo Log: Thread 1 Group 1 Seq 38423 Reading mem 0
  Mem# 0: E:\ORADATA\GROUP_1.261.822672135
Block recovery completed at rba 38423.154.16, scn 3776.583740804
Errors in file F:\APP\ADMINISTRATOR\diag\rdbms\xff\xff1\trace\xff1_smon_5660.trc:
ORA-01595: error freeing extent (2) of rollback segment (8)
ORA-00600: internal error code, arguments: [4194], [], [
                                      
Completed: alter database open

This one is relatively simple: deal with the problematic undo segment, then use hcheck to verify data dictionary consistency:

SQL> @e:/oradata/txt/11.txt
HCheck Version 07MAY18 on 05-JUL-2022 16:30:18
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: xff

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj                 ... 1102000300 <=  *All Rel* 07/05 16:30:18 PASS
.- MissingOIDOnObjCol          ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- SourceNotInObj              ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- OversizedFiles              ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- PoorDefaultStorage          ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- PoorStorage                 ... 1102000300 <=  *All Rel* 07/05 16:30:19 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- OrphanedTabComPart          ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- MissingSum$                 ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- MissingDir$                 ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- DuplicateDataobj            ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- ObjSynMissing               ... 1102000300 <=  *All Rel* 07/05 16:30:20 PASS
.- ObjSeqMissing               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedUndo                ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndex               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTable               ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- MissingPartCol              ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedSeg$                ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- OrphanedIndPartObj#         ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- DuplicateBlockUse           ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- FetUet                      ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- Uet0Check                   ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- SeglessUET                  ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- BadInd$                     ... 1102000300 <=  *All Rel* 07/05 16:30:21 PASS
.- BadTab$                     ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadIcolDepCnt               ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- ObjIndDobj                  ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- TrgAfterUpgrade             ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- ObjType0                    ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadOwner                    ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- StmtAuditOnCommit           ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadPublicObjects            ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadSegFreelist              ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- BadDepends                  ... 1102000300 <=  *All Rel* 07/05 16:30:22 PASS
.- CheckDual                   ... 1102000300 <=  *All Rel* 07/05 16:30:23 PASS
.- ObjectNames                 ... 1102000300 <=  *All Rel* 07/05 16:30:23 WARN

HCKW-0018: OBJECT name clashes with SCHEMA name (Doc ID 2363142.1)
Schema=BSHRP INDEX=XFF.XFF

.- BadCboHiLo                  ... 1102000300 <=  *All Rel* 07/05 16:30:23 PASS
.- ChkIotTs                    ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- NoSegmentIndex              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- BadNextObject               ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- DroppedROTS                 ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- FilBlkZero                  ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- DbmsSchemaCopy              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- OrphanedObjError            ... 1102000300 >  1102000000 07/05 16:30:24 PASS
.- ObjNotLob                   ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- MaxControlfSeq              ... 1102000300 <=  *All Rel* 07/05 16:30:24 PASS
.- SegNotInDeferredStg         ... 1102000300 >  1102000000 07/05 16:30:25 PASS
.- SystemNotRfile1             ... 1102000300 >   902000000 07/05 16:30:25 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
.- OrphanTrigger               ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
.- ObjNotTrigger               ... 1102000300 <=  *All Rel* 07/05 16:30:25 PASS
---------------------------------------
05-JUL-2022 16:30:25  Elapsed: 7 secs
---------------------------------------
Found 0 potential problem(s) and 1 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

PL/SQL procedure successfully completed.

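One schema has the same name as an object. This is harmless and normal: the customer created a user named XFF and then created an object that is also named XFF. At this point the database recovery is essentially complete. The business can run on it directly, with no logical migration needed.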

Handling an Oracle startup failure after restoring a cloud host snapshot

Posted on 2022-07-04 by 惜分飞

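A customer runs a database on cloud provider X and needed to expand the database disk. Before the expansion they took a snapshot of that disk, and then, unexpectedly, things went wrong.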

[root@xifenfei ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G   64G   31G  68% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  720K   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vdb        2.0T  1.2T  910G  56% /www/xifenfei
tmpfs           3.2G     0  3.2G   0% /run/user/1004
tmpfs           3.2G     0  3.2G   0% /run/user/0
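
A snapshot of one volume only protects the files that live under that volume's mount point. As a rough illustration, the mount points from the df output above can be used to check which file system a given path belongs to. This is only a sketch: the function name `mount_point_for` and the two file names are hypothetical.

```python
import posixpath

# Mount points copied from the df output above.
MOUNTS = [
    "/", "/dev", "/dev/shm", "/run", "/sys/fs/cgroup",
    "/www/xifenfei", "/run/user/1004", "/run/user/0",
]


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains path, or None."""
    path = posixpath.normpath(path)
    best = None
    for mount in mount_points:
        mount = posixpath.normpath(mount)
        inside = mount == "/" or path == mount or path.startswith(mount + "/")
        if inside and (best is None or len(mount) > len(best)):
            best = mount
    return best


# Hypothetical file names, used only to show the lookup.
for name in ("/www/xifenfei/oradata/system01.dbf", "/data/redo01.log"):
    print(name, "->", mount_point_for(name, MOUNTS))
```

A path under /www/xifenfei resolves to the /dev/vdb file system, while a path under /data falls back to the root file system on /dev/vda1 and is therefore outside a snapshot taken of vdb alone.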

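As the df output shows, the customer's data files are all on /dev/vdb. Unfortunately, the redo files are in /data (that is, on /dev/vda1, mounted at /), which was not snapshotted. After the customer restored the vdb snapshot, the database showed the following: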

SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
  2  checkpoint_change#,
  3  checkpoint_time,last_change#,
  4  count(*) ROW_NUM
  5  FROM v$datafile
  6  GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
  7  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_CHANGE# CHECKPOINT_T     LAST_CHANGE#          ROW_NUM
-------------- ------------------ ------------ ---------------- ----------------
ONLINE                69632585947 04-JUL-22                                   38
SYSTEM                69632585947 04-JUL-22                                    2

SQL> set numw 16
SQL> col CHECKPOINT_TIME for a40
SQL> set lines 150
SQL> set pages 1000
SQL> SELECT status,
  2  to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
  3  count(*) ROW_NUM
  4  FROM v$datafile_header
  5  GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
  6  ORDER BY status, checkpoint_change#, checkpoint_time;

STATUS         CHECKPOINT_TIME                          FUZZY  CHECKPOINT_CHANGE#          ROW_NUM
-------------- ---------------------------------------- ------ ------------------ ----------------
ONLINE         2022-07-04 09:03:24                      YES           69631105424               40
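
The two result sets above can be compared directly: v$datafile reports the checkpoint SCN recorded in the control file, while v$datafile_header reports what is stored in each data file header. A small sketch of that comparison, using the values shown above (the function name is hypothetical):

```python
def needs_recovery(controlfile_scn, header_scn, fuzzy):
    """A file needs recovery if its header is fuzzy or behind the control file."""
    return fuzzy or header_scn < controlfile_scn


CONTROLFILE_CHECKPOINT = 69632585947  # checkpoint_change# from v$datafile
HEADER_CHECKPOINT = 69631105424       # checkpoint_change# from v$datafile_header
HEADER_FUZZY = True                   # FUZZY = YES for all 40 files

scn_gap = CONTROLFILE_CHECKPOINT - HEADER_CHECKPOINT
print(scn_gap)  # 1480523
print(needs_recovery(CONTROLFILE_CHECKPOINT, HEADER_CHECKPOINT, HEADER_FUZZY))  # True
```

The file headers are 1480523 SCNs behind the control file, and the redo needed to cover that gap is what has to be available for a normal recovery.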

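The analysis above shows that the data files lag behind the redo files by a period of changes, and that the database runs in NOARCHIVELOG mode. In this situation the database can only be forced open. During the open it hit ORA-600 [ktpridestroy2]: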

SMON: enabling tx recovery
Database Characterset is AL32UTF8
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
SMON: Restarting fast_start parallel rollback
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc  (incident=41257):
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_41257/orcl_smon_7332_i41257.trc
Starting background process QMNC
Mon Jul 04 16:31:44 2022
QMNC started with pid=36, OS id=7454 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Fatal internal error happened while SMON was doing active transaction recovery.
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc:
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []
SMON (ospid: 7332): terminating the instance due to error 474
Instance terminated by SMON, pid = 7332

The corresponding trace file:

Dump continued from file: /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7332.trc
ORA-00600: internal error code, arguments: [ktpridestroy2], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 41257 (ORA 600 [ktpridestroy2]) ========

*** 2022-07-04 16:31:44.261
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   7FFCD123FE98 ? 000000000 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkedDefDump()+2736  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFCD123B998 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgexProcessError()  call     dbgexPhaseII()       7F3C5D15C6F0 ? 7F3C5A851598 ?
+2279                                              7FFCD1247C88 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  7F3C5D15C6F0 ? 7F3C5A851598 ?
()+83                                              000000001 ? 000000000 ?
                                                   7FFC00000000 ? 000000000 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  7F3C5D15C6F0 ? 7F3C5A851598 ?
1615                          ()                   000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7F3C5A6C1228 ?
63                                                 000000258 ? 7F3C5A851598 ?
                                                   000000000 ? 000000000 ?
kgeadse()+383        call     dbkePostKGE_kgsf()   00A984C60 ? 7F3C5A6C1228 ?
                                                   000000258 ? 7F3C5A851598 ?
                                                   000000000 ? 000000000 ?
kgerinv_internal()+  call     kgeadse()            00A984C60 ? 7F3C5A6C1228 ?
45                                                 000000258 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kgerinv()+33         call     kgerinv_internal()   00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000258 ? 000000000 ?
                                                   000000000 ?
kgeasnmierr()+143    call     kgerinv()            00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ?
ktpridestroy()+912   call     kgeasnmierr()        00A984C60 ? 7F3C5A6C1228 ?
                                                   D124022000000000 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
ktprw1s()+527        call     ktpridestroy()       D124022000000000 ?
                                                   000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
ktprsched()+197      call     ktprw1s()            D124022000000000 ?
                                                   000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
kturRecoverUndoSegm  call     ktprsched()          D124022000000000 ?
ent()+1057                                         000000000 ? 1E7A1C2B0 ?
                                                   000000000 ? 1E0F02D40 ?
                                                   1EC6DA410 ?
kturRecoverActiveTx  call     kturRecoverUndoSegm  000000000 ? 000000000 ?
ns()+710                      ent()                000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktprbeg()+2506       call     kturRecoverActiveTx  000000004 ? 000000000 ?
                              ns()                 000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktmmon()+13588       call     ktprbeg()            000000000 ? 000000000 ?
                                                   000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ktmSmonMain()+201    call     ktmmon()             06002DEC0 ? 000000000 ?
                                                   000000027 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
ksbrdp()+923         call     ktmSmonMain()        06002DEC0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opirip()+618         call     ksbrdp()             06002DEC0 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opidrv()+598         call     opirip()             000000032 ? 000000004 ?
                                                   7FFCD124B658 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
sou2o()+98           call     opidrv()             000000032 ? 000000004 ?
                                                   7FFCD124B658 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
opimai_real()+261    call     sou2o()              7FFCD124B630 ? 000000032 ?
                                                   000000004 ? 7FFCD124B658 ?
                                                   0D124FFFF ? 6200000005 ?
ssthrdmain()+209     call     opimai_real()        000000000 ? 7FFCD124B820 ?
                                                   000000004 ? 7FFCD124B658 ?
                                                   0D124FFFF ? 6200000005 ?
main()+196           call     ssthrdmain()         000000003 ? 7FFCD124B820 ?
                                                   000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
__libc_start_main()  call     main()               000000003 ? 7FFCD124B9C0 ?
+245                                               000000001 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
_start()+36          call     __libc_start_main()  0009C12F0 ? 000000001 ?
                                                   7FFCD124B9B8 ? 000000000 ?
                                                   0D124FFFF ? 6200000005 ?
--------------------- Binary Stack Dump ---------------------

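Analysis confirmed that this error is related to parallel recovery. After bypassing it, the next attempt to start the database failed with ORA-600 [4137]: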

Mon Jul 04 16:33:41 2022
SMON: enabling cache recovery
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is AL32UTF8
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc  (incident=42457):
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Incident details in: /data/oracle/diag/rdbms/orcl/orcl/incident/incdir_42457/orcl_smon_7554_i42457.trc
Stopping background process MMNL
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/data/oracle/oradata/orcl/redo03.log'
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []
ORACLE Instance orcl (pid = 13) - Error 600 encountered while recovering transaction (6, 11).
Errors in file /data/oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_7554.trc:
ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], [0], [], [], [], [], [], [], [], []

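This error is fairly common and is usually caused by an abnormal transaction in undo. After the abnormal transaction was dealt with, the database opened successfully and the data was imported into a new database without problems, which completed this recovery.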
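
To find out which transaction is the abnormal one, it helps to read the second argument of ORA-600 [4137]. In the alert log above it is [6.11.21484016], and the line "Error 600 encountered while recovering transaction (6, 11)" suggests that the first two numbers are the undo segment number and the slot. Below is a minimal sketch in Python that pulls these values out of an alert log line. The function name parse_xid and the field names usn/slot/sqn are illustrative assumptions (chosen after the XIDUSN, XIDSLOT and XIDSQN columns of v$transaction) and are not part of any Oracle tool; reading the third number as the sequence is likewise an assumption.

```python
import re

# Matches lines such as:
#   ORA-00600: internal error code, arguments: [4137], [6.11.21484016], [0], ...
# The second argument is assumed to be the transaction ID in usn.slot.sqn form.
ORA_600_4137 = re.compile(r"ORA-00600:.*?\[4137\],\s*\[(\d+)\.(\d+)\.(\d+)\]")


def parse_xid(line):
    """Return the transaction ID parts from an ORA-600 [4137] line, or None."""
    match = ORA_600_4137.search(line)
    if match is None:
        return None
    usn, slot, sqn = (int(part) for part in match.groups())
    return {"usn": usn, "slot": slot, "sqn": sqn}


# Sample line copied from the alert log above.
sample = (
    "ORA-00600: internal error code, arguments: [4137], [6.11.21484016], "
    "[0], [0], [], [], [], [], [], [], [], []"
)
print(parse_xid(sample))
```

For the sample line the undo segment number is 6 and the slot is 11, which agrees with the "recovering transaction (6, 11)" message, so undo segment 6 is the one to look at when handling the abnormal transaction.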