Recovering a 110 TB Oracle database after a failure

Contact: mobile/WeChat (+86 17813235971), QQ (107644445)

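Author: 惜分飞 © All rights reserved [Reproduction in any form without the author's consent is prohibited; the author reserves the right to pursue legal action.]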

A customer's 110 TB database could not start normally after a storage controller failure. Startup reported the following errors:

Wed Feb  2 23:16:13 2022
Recovery of Online Redo Log: Thread 1 Group 7 Seq 1647469 Reading mem 0
  Mem# 0: /dev/vgredo6/rredo7b
  Mem# 1: /dev/vgredo4/rredo7a
Wed Feb  2 23:16:14 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:15 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:16 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:16 2022
Errors in file /opt/oracle/admin/xifenfei/udump/xifenfei_ora_26754.trc:
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [kcbzdh()+560] [SIGSEGV] [Address not mapped to object] [] [] []
ORA-07445: exception encountered: core dump [_memcpy()+7040] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:26 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_26722.trc:
ORA-07445: exception encountered: core dump [kcbzre1()+6593] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:27 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_26722.trc:
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV][] []
ORA-07445: exception encountered: core dump [kcbzre1()+6593] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:16:27 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_pmon_26722.trc:
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV][] []
ORA-07445: exception encountered: core dump [kcbzre1()+6593] [SIGSEGV] [Address not mapped to object] [] [] []
Wed Feb  2 23:17:11 2022
PSP0: terminating instance due to error 472
Instance terminated by PSP0, pid = 26724

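The cause is that the redo information does not match the block contents in the datafiles, so the redo cannot be applied and the process fails. Later recover attempts hit the following errors as well: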

Fri Feb 18 16:09:59 2022
ALTER DATABASE RECOVER  datafile 609,610,611,612,613,614,615,602,603,604,605,606,607,608  
Fri Feb 18 16:09:59 2022
Media Recovery Start
 parallel recovery started with 16 processes
Fri Feb 18 16:10:00 2022
Recovery of Online Redo Log: Thread 1 Group 7 Seq 1647469 Reading mem 0
  Mem# 0: /dev/vgredo6/rredo7b
  Mem# 1: /dev/vgredo4/rredo7a
Fri Feb 18 16:12:17 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_p000_22509.trc:
ORA-00600: internal error code, arguments: [6101], [0], [42], [96], [], [], [], []
Fri Feb 18 16:18:51 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_p000_22509.trc:
ORA-10562: Error occurred while applying redo to data block (file# 602, block# 1693691)
ORA-10564: tablespace DBS_DCDL_PT
ORA-01110: data file 602: '/dev/vgora12/rdbs_dcdl_pt0155'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 33682645
ORA-00600: internal error code, arguments: [6101], [0], [42], [96], [], [], [], []
Fri Feb 18 16:18:55 2022
Media Recovery failed with error 12801
Fri Feb 18 18:23:59 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw1_22483.trc:
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV][] []
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error
Fri Feb 18 18:24:04 2022
DBW3: terminating instance due to error 472
Fri Feb 18 18:24:04 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw3_22487.trc:
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error
Fri Feb 18 18:24:04 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_dbw3_22487.trc:
ORA-07445: exception encountered: core dump [kcbs_dump_adv_state()+1200] [SIGSEGV] [] []
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error
Fri Feb 18 18:24:09 2022
LGWR: terminating instance due to error 472
Fri Feb 18 18:24:09 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_lgwr_22489.trc:
ORA-07445: exception encountered: core dump [ksuitm()+2400] [SIGSEGV] [] [] [] []
ORA-00472: PMON  process terminated with error

Similar errors showed up in sqlplus:

SQL>  recover datafile 601;
ORA-03113: end-of-file on communication channel

SQL> recover datafile 1066;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P015
ORA-00600: internal error code, arguments: [2037], [207064103], [207064103],
[162], [6], [1], [1833009883], [1130705717]


SQL> recover datafile 1065;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P004
ORA-00600: internal error code, arguments: [kcbzpb_1], [142189139], [3], [0],
[], [], [], []


SQL> recover datafile 2042;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P014
ORA-00600: internal error code, arguments: [3020], [627], [3234156],
[2633062764], [], [], [], []
ORA-10567: Redo is inconsistent with data block
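
With this many datafiles failing in different ways, it can help to tabulate which internal error each recover attempt hit. The following is only a minimal sketch, not part of the original recovery procedure: the function name `summarize_recover_errors` is hypothetical, and the sample text is copied from the sqlplus output above.

```python
import re

# Sample copied from the sqlplus session above.
SAMPLE = """\
SQL>  recover datafile 601;
ORA-03113: end-of-file on communication channel

SQL> recover datafile 1066;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P015
ORA-00600: internal error code, arguments: [2037], [207064103], [207064103],
[162], [6], [1], [1833009883], [1130705717]

SQL> recover datafile 1065;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P004
ORA-00600: internal error code, arguments: [kcbzpb_1], [142189139], [3], [0],
[], [], [], []

SQL> recover datafile 2042;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P014
ORA-00600: internal error code, arguments: [3020], [627], [3234156],
[2633062764], [], [], [], []
ORA-10567: Redo is inconsistent with data block
"""

RECOVER_RE = re.compile(r"recover\s+datafile\s+(\d+)", re.IGNORECASE)
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")


def summarize_recover_errors(text):
    """Return (datafile number, first ORA-00600 argument) pairs.

    Hypothetical helper: it only looks at the first argument, which is
    always on the same line as the ORA-00600 code in this output.
    """
    results = []
    current_file = None
    for line in text.splitlines():
        match = RECOVER_RE.search(line)
        if match:
            current_file = int(match.group(1))
            continue
        match = ORA600_RE.search(line)
        if match:
            results.append((current_file, match.group(1)))
    return results


for datafile, first_arg in summarize_recover_errors(SAMPLE):
    print(f"datafile {datafile}: ORA-00600 [{first_arg}]")
```

Datafile 601 does not appear in the result because its session ended with ORA-03113 rather than an ORA-00600.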

After bypassing the consistency checks, the database was forced open successfully:

Sun Feb 20 21:20:06 2022
SMON: enabling tx recovery
Sun Feb 20 21:20:06 2022
Database Characterset is ZHS16GBK
Sun Feb 20 21:20:07 2022
ORACLE Instance xifenfei (pid = 38) - Error 376 encountered while recovering transaction (74, 17) on object 34131051.
Sun Feb 20 21:20:07 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-00376: file 1416 cannot be read at this time
ORA-01110: data file 1416: '/dev/vgora14/rdbs_icdl_pt116'
Sun Feb 20 21:20:08 2022
Stopping background process MMNL
Sun Feb 20 21:20:09 2022
ORACLE Instance xifenfei (pid = 38) - Error 376 encountered while recovering transaction (88, 36) on object 33514955.
Sun Feb 20 21:20:09 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-00376: file 1264 cannot be read at this time
ORA-01110: data file 1264: '/dev/vgora14/rdbs_icdl_pt102'
Sun Feb 20 21:20:09 2022
Stopping background process MMON
Starting background process MMON
Starting background process MMNL
MMON started with pid=46, OS id=1482
Sun Feb 20 21:20:10 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767844)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Sun Feb 20 21:20:10 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Feb 20 21:20:11 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Sun Feb 20 21:20:11 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Sun Feb 20 21:20:11 2022
LOGSTDBY: Validating controlfile with logical metadata
Sun Feb 20 21:20:11 2022
LOGSTDBY: Validation complete
Sun Feb 20 21:20:11 2022
Errors in file /opt/oracle/admin/xifenfei/bdump/xifenfei_smon_25140.trc:
ORA-01578: ORACLE data block corrupted (file # 652, block # 3767661)
ORA-01110: data file 652: '/dev/vgora13/rdbs_dcdl_pt0205'
Completed: alter database open

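After the problematic undo was dealt with, the database opened normally.
Because the customer cannot migrate the data in the short term, some of the corrupt blocks were repaired first so that the database can keep running for now; the migration will follow when a maintenance window is available.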

ora-600 kcratr_scan_lastbwr

A customer's database failed to start after a power outage, reporting ORA-600 kcratr_scan_lastbwr:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 15 processes
Started redo scan
Hex dump of (file 4, block 3952129) in trace file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_4500.trc
Reading datafile 'D:\APP\ADMINISTRATOR\ORADATA\ORCL\USERS01.DBF' for corruption at rdba:0x013c4e01(file 4,block 3952129)
Reread (file 4, block 3952129) found same corrupt data (logically corrupt)
Write verification failed for File 4 Block 3952129 (rdba 0x13c4e01)
Fri Feb 18 10:16:34 2022
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_4500.trc  (incident=388961):
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Incident details in:D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_388961\orcl_ora_4500_i388961.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Aborting crash recovery due to error 600
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_4500.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_4500.trc:
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: alter database open...
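
The alert log above prints the corrupt block both as an rdba (0x013c4e01) and as a file/block pair (file 4, block 3952129). As a small worked example of how the two relate, here is a sketch that assumes the usual smallfile tablespace layout, where the top 10 bits of the 32-bit rdba hold the relative file number and the low 22 bits hold the block number. The function names are hypothetical.

```python
from typing import Tuple

BLOCK_BITS = 22                      # low 22 bits: block number (smallfile layout assumed)
BLOCK_MASK = (1 << BLOCK_BITS) - 1   # 0x3FFFFF


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit rdba into (relative file number, block number)."""
    return rdba >> BLOCK_BITS, rdba & BLOCK_MASK


def encode_rdba(file_no: int, block_no: int) -> int:
    """Inverse of decode_rdba, under the same layout assumption."""
    return (file_no << BLOCK_BITS) | block_no


file_no, block_no = decode_rdba(0x013C4E01)
print(file_no, block_no)                 # 4 3952129, matching the alert log
print(hex(encode_rdba(4, 3952129)))      # 0x13c4e01
```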

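According to MOS, this problem mainly occurs in versions earlier than 11.2.0.2, but this case occurred on an 11.2.0.3 database.

Following the description in ORA-600 [kcratr_scan_lastbwr] (Doc ID 1267231.1), a recover operation was performed, after which the database opened directly with zero data loss.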

Handling disks dropping out of the OCR disk group

Due to some fault, disk OCR_0001 of the CRS disk group went offline, and the number of voting disks dropped from 3 to 2:

WARNING: Write Failed. group:3 disk:1 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 1.3915948466 in group 3 failed. [4]
Mon Jun 14 15:31:11 2021
NOTE: process _b000_+asm1 (21889) initiating offline of disk 1.3915948466 (OCR_0001) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 14 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 1047812201 to 1 disk(s) in group 3
WARNING: Disk OCR_0001 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 15 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 1/0xe968a1b2, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 16 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: OCR_0001
NOTE: PST update grp = 3 completed successfully 
Mon Jun 14 15:31:13 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jun 14 15:34:08 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
Mon Jun 14 15:34:10 2021
GMON updating for reconfiguration, group 3 at 17 for pid 28, osid 21889
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) OCR_0001
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: group 3 PST updated.
Mon Jun 14 15:34:10 2021
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Successful voting file relocation on diskgroup OCR
GMON querying group 3 at 18 for pid 18, osid 8900
NOTE: group OCR: updated PST location: disk 0000 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 1)
NOTE: cache closing disk 1 of grp 3: (not open) _DROPPED_0001_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
SUCCESS: alter diskgroup OCR drop disk OCR_0001 force /* ASM SERVER */

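After the rebalance that followed the first disk loss had completed, a second disk dropped out. The OCR disk group itself stayed healthy, but because only one disk was left in it, the voting file could not be relocated to another disk in the OCR disk group: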

Tue Jun 15 04:41:42 2021
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
Tue Jun 15 04:41:42 2021
NOTE: process _b000_+asm1 (58548) initiating offline of disk 0.3915948465 (OCR_0000) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 23 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: checking PST for grp 3 done.
NOTE: sending set offline flag message 3615961191 to 1 disk(s) in group 3
WARNING: Disk OCR_0000 in mode 0x7f is now being offlined
INFO: Instance #2 could not find disk 1 in group 3
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 24 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: PST update grp = 3 completed successfully 
NOTE: initiating PST update: grp = 3, dsk = 0/0xe968a1b1, mask = 0x7e, op = clear
GMON updating disk modes for group 3 at 25 for pid 28, osid 58548
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: cache closing disk 0 of grp 3: OCR_0000
NOTE: PST update grp = 3 completed successfully 
Tue Jun 15 04:41:44 2021
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 18 secs for write IO to PST disk 0 in group 3.
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:21 2021
WARNING: PST-initiated drop of 1 disk(s) in group 3(.1918390620))
SQL> alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */ 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
Tue Jun 15 04:44:24 2021
GMON updating for reconfiguration, group 3 at 26 for pid 28, osid 58548
NOTE: cache closing disk 0 of grp 3: (not open) OCR_0000
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group 3 PST updated.
NOTE: membership refresh pending for group 3/0x7258515c (OCR)
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 27 for pid 18, osid 8900
NOTE: cache closing disk 0 of grp 3: (not open) _DROPPED_0000_OCR
SUCCESS: refreshed membership for 3/0x7258515c (OCR)
NOTE: starting rebalance of group 3/0x7258515c (OCR) at power 1
SUCCESS: alter diskgroup OCR drop disk OCR_0000 force /* ASM SERVER */

Querying the OCR disk group at this point shows clearly that only 1 disk is left in the group. Next, query the voting disk information:

node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
 2. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
Located 2 voting disk(s).
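
The votedisk listing above can be cross-checked against the disk group membership to spot voting files that sit on disks the group no longer contains. The following is a minimal sketch with hypothetical helper names. The `crsctl` output is copied from above, while the set of remaining disk group members (`/dev/emcpowere` only) is an assumption inferred from the narration, not captured output.

```python
import re

# Copied from the crsctl output above.
CRSCTL_OUTPUT = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   3619aee7c3b04fc1bfa5c4ce659acbf7 (/dev/emcpowerc) [OCR]
 2. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
Located 2 voting disk(s).
"""

# Matches data rows such as " 1. ONLINE   <32 hex chars> (/dev/...) [OCR]".
VOTE_LINE = re.compile(
    r"^\s*\d+\.\s+(\S+)\s+([0-9a-f]{32})\s+\((\S+)\)\s+\[(\w+)\]"
)


def voting_files(output):
    """Parse the data rows of `crsctl query css votedisk` into dicts."""
    files = []
    for line in output.splitlines():
        match = VOTE_LINE.match(line)
        if match:
            state, fuid, path, group = match.groups()
            files.append(
                {"state": state, "fuid": fuid, "path": path, "group": group}
            )
    return files


def stale_voting_paths(output, diskgroup_paths):
    """Voting-file paths that are not current members of the disk group."""
    return [f["path"] for f in voting_files(output)
            if f["path"] not in diskgroup_paths]


# Assumption: only /dev/emcpowere is still a member of the OCR disk group.
print(stale_voting_paths(CRSCTL_OUTPUT, {"/dev/emcpowere"}))
```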

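Of the two voting disks, one belongs to the OCR disk group and the other is a disk that the OCR disk group has already dropped. Try adding the previously offlined disk back to the OCR disk group: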

SQL> alter diskgroup OCR add  disk '/dev/emcpowerc';
alter diskgroup OCR add  disk '/dev/emcpowerc'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk '/dev/emcpowerc' belongs to diskgroup "OCR"


SQL> alter diskgroup OCR add  disk '/dev/emcpowerc' force  
  2  ;
alter diskgroup OCR add  disk '/dev/emcpowerc' force
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15191
Session ID: 1613 Serial number: 7

Check the alert log:

SQL> alter diskgroup OCR add  disk '/dev/emcpowerc' force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,4) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: initializing header on grp 3 disk OCR_0004
WARNING: ignoring disk /dev/emcpowerd in deep discovery
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725d2390 (OCR) on local instance.
NOTE: Attempting voting file relocation on diskgroup OCR
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc  (incident=311185):
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_311185/+ASM1_rbal_12207_i311185.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: ORA-600 thrown in RBAL for group number 3
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_12207.trc:
ORA-00600: internal error code, arguments: [kfdvfGetCurrent_baddsk], [], [], [], [], [], [], [], [], [], [], []
RBAL (ospid: 12207): terminating the instance due to error 488

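The add-disk operation failed with ORA-600 kfdvfGetCurrent_baddsk. The votedisk query above shows that although emcpowerc is offline in the OCR disk group, it is still a voting disk, so it cannot be added to the disk group. As a workaround, add a different disk first: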

SQL> alter diskgroup OCR add failgroup OCR_0001 disk '/dev/emcpowerd' force;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
Located 2 voting disk(s).

After emcpowerd was added successfully, emcpowerc was no longer a voting disk; emcpowerd had taken its place. Now add emcpowerc again:

SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
node1-> crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   00bc3e79f7404ff2bf60925a7b8a5a6d (/dev/emcpowere) [OCR]
 2. ONLINE   0eef8152df5d4f41bf973ad5dc5a6cb1 (/dev/emcpowerd) [OCR]
 3. ONLINE   4f6201f808dc4ff3bf928b14eae0d4a6 (/dev/emcpowerc) [OCR]
Located 3 voting disk(s).

ASMCMD> lsdsk -G ocr
Path
/dev/emcpowerc
/dev/emcpowerd
/dev/emcpowere

SQL> alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force 
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (3,0) to disk (/dev/emcpowerc)
NOTE: requesting all-instance membership refresh for group=3
NOTE: initializing header on grp 3 disk OCR_0000
NOTE: requesting all-instance disk validation for group=3
Mon Jan 24 17:47:42 2022
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
NOTE: requesting all-instance disk validation for group=3
NOTE: skipping rediscovery for group 3/0x725dccb9 (OCR) on local instance.
Mon Jan 24 17:47:48 2022
GMON updating for reconfiguration, group 3 at 20 for pid 30, osid 16978
NOTE: group 3 PST updated.
NOTE: initiating PST update: grp = 3
GMON updating group 3 at 21 for pid 30, osid 16978
NOTE: group OCR: updated PST location: disk 0002 (PST copy 0)
NOTE: group OCR: updated PST location: disk 0005 (PST copy 1)
NOTE: group OCR: updated PST location: disk 0000 (PST copy 2)
NOTE: PST update grp = 3 completed successfully 
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 22 for pid 18, osid 15952
NOTE: cache opening disk 0 of grp 3: OCR_0000 path:/dev/emcpowerc
Mon Jan 24 17:47:53 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: Failed voting file relocation on diskgroup OCR
GMON querying group 3 at 23 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:47:53 2022
SUCCESS: alter diskgroup OCR add failgroup OCR_0000 disk '/dev/emcpowerc' force
NOTE: starting rebalance of group 3/0x725dccb9 (OCR) at power 1
Starting background process ARB0
Mon Jan 24 17:47:53 2022
ARB0 started with pid=31, OS id=17092 
NOTE: assigning ARB0 to group 3/0x725dccb9 (OCR) with 1 parallel I/O
cellip.ora not found.
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 0:2 for diskgroup 3 (OCR)
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 3/0x725dccb9 (OCR)
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 2 voting file(s).
NOTE: Voting file relocation is required in diskgroup OCR
NOTE: Attempting voting file relocation on diskgroup OCR
NOTE: voting file allocation on grp 3 disk OCR_0000
NOTE: Successful voting file relocation on diskgroup OCR
Mon Jan 24 17:47:57 2022
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=3
NOTE: membership refresh pending for group 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:03 2022
GMON querying group 3 at 24 for pid 18, osid 15952
SUCCESS: refreshed membership for 3/0x725dccb9 (OCR)
Mon Jan 24 17:48:06 2022
NOTE: Attempting voting file refresh on diskgroup OCR
NOTE: Refresh completed on diskgroup OCR
. Found 3 voting file(s).

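The number of voting disks went from 2 back to 3, and the OCR disk group is back to its normal 3 disks. This completes the handling of the OCR disk-drop fault.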
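
As background on why getting back to 3 voting disks matters: Oracle Clusterware generally requires a node to be able to access more than half of the configured voting files. The arithmetic is sketched below. The function names are hypothetical, and this only illustrates the majority rule, not how CSS implements it.

```python
def required_votes(total_voting_files: int) -> int:
    """Smallest number of voting files that is strictly more than half."""
    return total_voting_files // 2 + 1


def tolerated_failures(total_voting_files: int) -> int:
    """How many voting files can be lost while a majority stays reachable."""
    return total_voting_files - required_votes(total_voting_files)


for total in (3, 2, 1):
    print(total, required_votes(total), tolerated_failures(total))
```

With 3 voting files one can be lost, whereas with only 2 the loss of either one breaks the majority, which is why restoring the third voting disk is the real end of the repair.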

ORA-600 3600 recovery: abnormal resetlogs SCN

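Due to a customer mistake, a resetlogs was performed while some datafiles were offline, leaving those files with an incorrect resetlogs SCN.

Attempts to take the affected file offline all failed with ORA-600 3600: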

---plain offline
Wed Jan 26 11:08:15 2022
ALTER DATABASE RECOVER  database until cancel  
Media Recovery Start
 started logmerger process
Wed Jan 26 11:08:17 2022
Datafile 8 (ckpscn 731239901) is orphaned on incarnation#=2
Media Recovery failed with error 19909
Slave exiting with ORA-283 exception
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_pr00_133504.trc:
ORA-00283: recovery session canceled due to errors
ORA-19909: datafile 8 belongs to an orphan incarnation
ORA-01110: data file 8: 'D:\APP\ADMINISTRATOR\PRODUCT\11.2.0\DBHOME_1\DATABASE\XIFENFEI.DBF'
Recovery Slave PR00 previously exited with exception 283
ORA-283 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
Wed Jan 26 11:08:31 2022
alter database datafile 8 offline
Wed Jan 26 11:08:31 2022
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_133948.trc  (incident=134637):
ORA-00600: internal error code, arguments: [3600], [8], [14], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_134637\orcl_dbw0_133948_i134637.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_133948.trc:
ORA-00600: internal error code, arguments: [3600], [8], [14], [], [], [], [], [], [], [], [], []
DBW0 (ospid: 133948): terminating the instance due to error 471
Instance terminated by DBW0, pid = 133948

---offline drop
Wed Jan 26 11:09:20 2022
alter database datafile 8 offline drop
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_133932.trc  (incident=135837):
ORA-00600: internal error code, arguments: [3600], [8], [14], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_135837\orcl_dbw0_133932_i135837.trc
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbw0_133932.trc:
ORA-00600: internal error code, arguments: [3600], [8], [14], [], [], [], [], [], [], [], [], []
DBW0 (ospid: 133932): terminating the instance due to error 471
Wed Jan 26 11:09:22 2022
Instance terminated by DBW0, pid = 133932

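Because the resetlogs SCN is wrong, the control file cannot be recreated normally either. For a case like this, Oracle Recovery Tools can be used to repair the resetlogs SCN, after which the database can be opened directly: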

Wed Jan 26 11:15:12 2022
SMON: enabling cache recovery
Dictionary check beginning
Archived Log entry 3 added for thread 1 sequence 381 ID 0x60b930a1 dest 1:
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Starting background process QMNC
Wed Jan 26 11:15:15 2022
QMNC started with pid=25, OS id=131784 
LOGSTDBY: Validating controlfile with logical metadata
LOGSTDBY: Validation complete
Completed: alter database open

Software download: OraRecovery download
Instructions: usage instructions

Fixing ORA-600 kcbzib_kcrsds_1 after a forced open with damaged redo

On node 2, an ASM dismount caused redo write errors (ORA-00340, ORA-00345). Analysis of the ASM and system logs confirmed that a multipath fault caused the I/O errors:

2022-01-24T23:44:39.966602+08:00
WARNING: group 4 is being dismounted.
WARNING: ASMB force dismounting group 4 (REDO) due to ASM server dismount
SUCCESS: diskgroup REDO was dismounted
2022-01-24T23:44:41.103783+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF2/trace/XFF2_lgwr_228507.trc:
ORA-00345: redo log write error block 11961764 count 6
ORA-00312: online log 10 thread 2: '+REDO/XFF/ONLINELOG/group_10.261.1074690685'
2022-01-24T23:44:41.156809+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF2/trace/XFF2_lgwr_228507.trc:
ORA-00340: IO error processing online log 10 of thread 2
ORA-00345: redo log write error block 11961764 count 6
ORA-00312: online log 10 thread 2: '+REDO/XFF/ONLINELOG/group_10.261.1074690685'
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF2/trace/XFF2_lgwr_228507.trc  (incident=1341402):
ORA-340 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/oracle/diag/rdbms/xff/XFF2/incident/incdir_1341402/XFF2_lgwr_228507_i1341402.trc
2022-01-24T23:44:41.505251+08:00
USER (ospid: 133928): terminating the instance due to error 340

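Because node 2 crashed abruptly, node 1 attempted instance recovery, which failed: node 2's redo had suffered a lost write, so node 1 crashed as well, and as a result every instance of this database in the cluster went down: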

2022-01-24T23:46:08.440519+08:00
Slave encountered ORA-10388 exception during crash recovery
2022-01-24T23:46:08.442854+08:00
Slave encountered ORA-10388 exception during crash recovery
Abort recovery for domain 0, flags 4
2022-01-24T23:46:08.444531+08:00
Aborting crash recovery due to error 742
2022-01-24T23:46:08.444695+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_450481.trc:
ORA-00742: Log read detects lost write in thread 2 sequence 63507 block 11941482
ORA-00312: online log 10 thread 2: '+REDO/XFF/ONLINELOG/group_10.261.1074690685'
Abort recovery for domain 0, flags 4
2022-01-24T23:46:08.771108+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_450481.trc:
ORA-00742: Log read detects lost write in thread 2 sequence 63507 block 11941482
ORA-00312: online log 10 thread 2: '+REDO/XFF/ONLINELOG/group_10.261.1074690685'
ORA-742 signalled during: ALTER DATABASE OPEN /* db agent *//* {0:17:165} */...
2022-01-24T23:46:10.143155+08:00
License high water mark = 33
2022-01-24T23:46:10.143752+08:00
USER (ospid: 451049): terminating the instance
2022-01-24T23:46:11.167337+08:00
Instance terminated by USER, pid = 451049

After a third party forced the database open, it reported ORA-600 kcbzib_kcrsds_1:

2022-01-25T10:13:37.922332+08:00
Completed crash recovery at
 Thread 2: RBA 5.3.16, nab 3, scn 0x00000a348a032122
 0 data blocks read, 0 data blocks written, 0 redo k-bytes read
2022-01-25T10:13:38.071326+08:00
Thread 2 advanced to log sequence 6 (thread recovery)
validate pdb 0, flags x4, valid 0, pdb flags x204 
* validated domain 0, flags = 0x200
CRASH recovery complete: pdb 0 valid 1 (flags x4, pdb flags x200) 
Picked broadcast on commit scheme to generate SCNs
Endian type of dictionary set to little
2022-01-25T10:13:38.389741+08:00
TT00: Gap Manager starting (PID:220505)
2022-01-25T10:13:38.646484+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: +DATA/XFF/ONLINELOG/group_1.536.1094875107
Successful open of redo thread 1
2022-01-25T10:13:38.647243+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_216590.trc  (incident=1879556):
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/XFF1/incident/incdir_1879556/XFF1_ora_216590_i1879556.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************
2022-01-25T10:13:39.734366+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_216590.trc:
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2022-01-25T10:13:39.734424+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_216590.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2022-01-25T10:13:39.734499+08:00
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_216590.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
2022-01-25T10:13:39.734536+08:00
Error 704 happened during db open, shutting down database
Errors in file /u01/app/oracle/diag/rdbms/xff/XFF1/trace/XFF1_ora_216590.trc  (incident=1879557):
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xff/XFF1/incident/incdir_1879557/XFF1_ora_216590_i1879557.trc
2022-01-25T10:13:39.894464+08:00
2022-01-25T10:13:40.446888+08:00
opiodr aborting process unknown ospid (216590) as a result of ORA-603
2022-01-25T10:13:40.470643+08:00
ORA-603 : opitsk aborting process
License high water mark = 36
2022-01-25T10:13:40.471453+08:00
USER (ospid: 216590): terminating the instance due to error 704
2022-01-25T10:13:41.436133+08:00
opiodr aborting process unknown ospid (189796) as a result of ORA-1092
2022-01-25T10:13:41.439011+08:00
ORA-1092 : opitsk aborting process
2022-01-25T10:13:41.472060+08:00
PMON (ospid: 189585): terminating the instance due to error 704

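This error only exists in 12c and later and is caused by inconsistent datafiles. Drawing on experience from earlier cases, once this problem was taken over the datafile header information was quickly adjusted and the database opened successfully.
Related earlier blog posts:
Oracle 12c redo loss recovery
Simulating recovery from damaged redo on a 19c database
ORA-600 kcbzib_kcrsds_1 error
Handling an ORA-600 kcbzib_kcrsds_1 failure on a 12c database
ORA-00603 ORA-01092 ORA-600 kcbzib_kcrsds_1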

[oracle@xifenfei02 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Wed Jan 26 00:31:50 2022

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount pfile='/tmp/pfile'
ORACLE instance started.

Total System Global Area 3.2320E+11 bytes
Fixed Size                 29879248 bytes
Variable Size            4.5634E+10 bytes
Database Buffers         1.9059E+11 bytes
Redo Buffers             1043861504 bytes
In-Memory Area           8.5899E+10 bytes
Database mounted.
SQL> alter database open ;

Database altered.