udev_start causes VIP failover (as seen when adding disks online on a RAC)

Contact: Phone/WeChat (+86 17813235971) QQ (107644445) (QQ consultation: 惜分飞)


Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]

While expanding ASM storage for a customer, running the udev_start command caused every VIP to fail over and all business services to go down.


The first priority was to restore service by relocating every VIP back to its home node:

[grid@rac3 ~]$  srvctl relocate vip -i rac1 -n rac1 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac2 -n rac2 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac3 -n rac3 -f -v
VIP was relocated successfully.
[grid@rac3 ~]$  srvctl relocate vip -i rac4 -n rac4 -f -v
VIP was relocated successfully.

With the VIPs back in place, the business services returned to normal.
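To double-check the placement afterwards, the VIP resources can be queried from any node. A minimal check, reusing the node names above (the exact VIP resource names depend on the installation):

[grid@rac3 ~]$ srvctl status vip -n rac1
[grid@rac3 ~]$ srvctl status vip -n rac2
[grid@rac3 ~]$ srvctl status vip -n rac3
[grid@rac3 ~]$ srvctl status vip -n rac4
[grid@rac3 ~]$ crsctl stat res -t       # full resource listing, including the *.vip resources and their hosting nodes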


The root cause is that udev_start briefly takes the network interfaces down, which is enough to make the VIPs fail over.
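If the OS logs are still available, the brief interface bounce can usually be confirmed there as well. A generic check (the exact message text varies by NIC driver and interface name):

# look for link down/up events around the time udev_start was executed
grep -iE "(eth|bond|em)[0-9]" /var/log/messages | grep -iE "link (is )?(down|up)"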

Check the relevant ifcfg configuration files.

Because udev operates on the network interfaces, the recommended fix is to add HOTPLUG="no" to the corresponding ifcfg files (for the public, private, and any other networks that matter).
Reference: Network interface going down when dynamically adding disks to storage using udev in RHEL 6 (Doc ID 1569028.1)
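A minimal sketch of the change, assuming the public interface is eth0 (repeat for the private interconnect and any other interface that must not be bounced; the interface name here is a placeholder):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
# ... existing IPADDR/NETMASK/GATEWAY entries stay unchanged ...
HOTPLUG="no"        # keep udev_start/hotplug events from taking this interface down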

Another ORA-600 kcbzpbuf_1 recovery

Contact: Phone/WeChat (+86 17813235971) QQ (107644445) (QQ consultation: 惜分飞)


Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]

The database suddenly reported ORA-600 [kdddgb1] and ORA-600 [kcl_snd_cur_2] errors, and the instance crashed:

Tue May 09 22:29:40 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_338012.trc  (incident=962050):
ORA-00600: internal error code, arguments: [kdddgb1], [0], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_962050/orcl1_ora_338012_i962050.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 09 22:29:43 2023
Hex dump of (file 75, block 1154926) in trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc
Corrupt block relative dba: 0x12d19f6e (file 75, block 1154926)
Bad header found during preparing block for transfer
Data in bad block:
 type: 0 format: 2 rdba: 0x1affe051
 last change scn: 0x0009.a2266e65 seq: 0x2 flg: 0x10
 spare1: 0x83 spare2: 0x36 spare3: 0x3700
 consistency value in tail: 0x6e650002
 check value in block header: 0x0
 block checksum disabled
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc  (incident=960186):
ORA-00600: internal error code, arguments: [kcl_snd_cur_2], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_960186/orcl1_lms3_217928_i960186.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 09 22:29:43 2023
Sweep [inc][962050]: completed
Sweep [inc][960186]: completed
Sweep [inc2][962050]: completed
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lms3_217928.trc:
ORA-00600: internal error code, arguments: [kcl_snd_cur_2], [], [], [], [], [], [], [], [], [], [], []
LMS3 (ospid: 217928): terminating the instance due to error 484
System state dump requested by (instance=1, osid=217928 (LMS3)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_217897_20230509222949.trc
Tue May 09 22:29:52 2023
ORA-1092 : opitsk aborting process
Tue May 09 22:29:53 2023
ORA-1092 : opitsk aborting process
Tue May 09 22:29:54 2023
Instance terminated by LMS3, pid = 217928

The other running instance then performed instance recovery, after which it reported ORA-600 [kcbzpbuf_1] and crashed as well; every subsequent startup attempt hit the same error and the instance could not be brought up.

Wed May 10 08:17:07 2023
Hex dump of (file 75, block 1154926) in trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc
Corrupt block relative dba: 0x12d19f6e (file 75, block 1154926)
Bad header found during preparing block for write
Data in bad block:
 type: 0 format: 2 rdba: 0x1affe051
 last change scn: 0x0009.a2266e65 seq: 0x2 flg: 0x34
 spare1: 0x83 spare2: 0x36 spare3: 0x3700
 consistency value in tail: 0x6e650002
 check value in block header: 0xf894
 computed block checksum: 0x0
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc  (incident=2240402):
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2240402/orcl1_dbw9_134621_i2240402.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_dbw9_134621.trc:
ORA-00600: internal error code, arguments: [kcbzpbuf_1], [4], [1], [], [], [], [], [], [], [], [], []
DBW9 (ospid: 134621): terminating the instance due to error 471
Wed May 10 08:17:08 2023
System state dump requested by (instance=1, osid=134621 (DBW9)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_134555_20230510081708.trc
Instance terminated by DBW9, pid = 134621

A direct attempt to recover datafile 75 failed with ORA-03113:

SQL> recover datafile 75;
ORA-03113: end-of-file on communication channel
Process ID: 281304
Session ID: 14161 Serial number: 1503

A dbv check of file 75 found 15 logically corrupt blocks:

[oracle@oradb21 ~]$ dbv userid=xxx/xxx file=+datadg/orcl/datafile/xifenfei01.377.1130539753

DBVERIFY: Release 11.2.0.4.0 - Production on Wed May 10 08:29:44 2023

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = +datadg/orcl/datafile/xifenfei01.377.1130539753
Block Checking: DBA = 314866909, Block Type = KTB-managed data block
data header at 0x7f852b573064
kdbchk: row locked by non-existent transaction
        table=0   slot=13
        lockid=101   ktbbhitc=2
Page 294109 failed with check code 6101
Block Checking: DBA = 314866928, Block Type = KTB-managed data block
data header at 0x7f852b599064
kdbchk: row locked by non-existent transaction
        table=0   slot=18
        lockid=101   ktbbhitc=2
Page 294128 failed with check code 6101
Block Checking: DBA = 315415269, Block Type = KTB-managed data block
data header at 0x7f852b583064
kdbchk: the amount of space used is not equal to block size
        used=7470 fsc=0 avsp=625 dtl=8088
Page 842469 failed with check code 6110
Block Checking: DBA = 315415302, Block Type = KTB-managed data block
data header at 0x7f852b3c3064
kdbchk: row locked by non-existent transaction
        table=0   slot=13
        lockid=101   ktbbhitc=2
Page 842502 failed with check code 6101
Block Checking: DBA = 315415350, Block Type = KTB-managed data block
data header at 0x7f852b423064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842550 failed with check code 6101
Block Checking: DBA = 315415351, Block Type = KTB-managed data block
data header at 0x7f852b425064
kdbchk: row locked by non-existent transaction
        table=0   slot=10
        lockid=101   ktbbhitc=2
Page 842551 failed with check code 6101
Block Checking: DBA = 315415397, Block Type = KTB-managed data block
data header at 0x7f852b481064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842597 failed with check code 6101
Block Checking: DBA = 315415414, Block Type = KTB-managed data block
data header at 0x7f852b4a3064
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=2
Page 842614 failed with check code 6101
Block Checking: DBA = 315665300, Block Type = KTB-managed data block
data header at 0x7f852b2dd0ac
kdbchk: the amount of space used is not equal to block size
        used=7191 fsc=0 avsp=832 dtl=8016
Page 1092500 failed with check code 6110
Block Checking: DBA = 315665302, Block Type = KTB-managed data block
data header at 0x7f852b2e10ac
kdbchk: row locked by non-existent transaction
        table=0   slot=14
        lockid=101   ktbbhitc=5
Page 1092502 failed with check code 6101
Block Checking: DBA = 315665316, Block Type = KTB-managed data block
data header at 0x7f852b2fd0ac
kdbchk: the amount of space used is not equal to block size
        used=7140 fsc=0 avsp=883 dtl=8016
Page 1092516 failed with check code 6110
Block Checking: DBA = 315665491, Block Type = KTB-managed data block
data header at 0x7f852f4170c4
kdbchk: row locked by non-existent transaction
        table=0   slot=3
        lockid=101   ktbbhitc=6
Page 1092691 failed with check code 6101
Block Checking: DBA = 315727518, Block Type = KTB-managed data block
data header at 0x7f852b4f50c4
kdbchk: row locked by non-existent transaction
        table=0   slot=8
        lockid=101   ktbbhitc=6
Page 1154718 failed with check code 6101
Block Checking: DBA = 315727614, Block Type = KTB-managed data block
data header at 0x7f852b5b50ac
kdbchk: row locked by non-existent transaction
        table=0   slot=15
        lockid=101   ktbbhitc=5
Page 1154814 failed with check code 6101
Block Checking: DBA = 315727646, Block Type = KTB-managed data block
data header at 0x7f852b3f30ac
kdbchk: row locked by non-existent transaction
        table=0   slot=3
        lockid=101   ktbbhitc=5
Page 1154846 failed with check code 6101


DBVERIFY - Verification complete

Total Pages Examined         : 1835008
Total Pages Processed (Data) : 250749
Total Pages Failing   (Data) : 15
Total Pages Processed (Index): 74532
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1244181
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 265546
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2720428335 (9.2720428335)

After some handling of the corrupt blocks the database opened successfully; a similar ORA-600 kcbzpbuf_1 fault was recovered in an earlier case.

SQL> alter database open;

Database altered.

The alert log then reported problems recovering transactions:

ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (697, 6) on object 170692.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc:
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Archived Log entry 9299 added for thread 1 sequence 4781 ID 0x5f4a1865 dest 1:
Wed May 10 08:24:03 2023
NOTE: dependency between database orcl and diskgroup resource ora.ARCHDG.dg is established
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Wed May 10 08:24:04 2023
Starting background process EMNC
Wed May 10 08:24:04 2023
EMNC started with pid=49, OS id=305303 
Archived Log entry 9300 added for thread 2 sequence 4530 ID 0x5f4a1865 dest 1:
ARC2: Archiving disabled thread 2 sequence 4531
Archived Log entry 9301 added for thread 2 sequence 4531 ID 0x5f4a1865 dest 1:
Wed May 10 08:24:13 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_p000_305307.trc  (incident=2560578):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560578/orcl1_p000_305307_i2560578.trc
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_p000_305307.trc  (incident=2560579):
ORA-01578: ORACLE data block corrupted (file # , block # )
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560579/orcl1_p000_305307_i2560579.trc
Wed May 10 08:24:15 2023
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc  (incident=2560427):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
Incident details in: /oracle/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_2560427/orcl1_smon_301450_i2560427.trc
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc  (incident=2560432):
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'
ORACLE Instance orcl1 (pid = 34) - Error 1578 encountered while recovering transaction (717, 20) on object 170692.
Errors in file /oracle/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_smon_301450.trc:
ORA-01578: ORACLE data block corrupted (file # 75, block # 1154926)
ORA-01110: data file 75: '+DATADG/orcl/datafile/xifenfei01.377.1130539753'

The problematic transactions were dealt with, and the affected object was identified:

SQL> select owner,object_name,object_type from dba_objects where object_id=170692;

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
---------------------------------------------------------
XFF
T_XIFENFEI
TABLE
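The post does not show how the failing transaction recovery itself was handled. One common workaround in comparable situations (an assumption here, not something confirmed above) is to temporarily stop SMON from rolling back dead transactions while the damaged object is dealt with, for example via event 10513:

SQL> -- temporarily disable SMON dead-transaction recovery (revert once the object is fixed)
SQL> alter system set events '10513 trace name context forever, level 2';

SQL> -- re-enable afterwards
SQL> alter system set events '10513 trace name context off';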

An RMAN check showed that the logically corrupt blocks also belong to this table (all 15 bad blocks are in it); the table's data was rebuilt, discarding the damaged rows, which completed the recovery.
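A rough sketch of how such an RMAN check and mapping typically looks (generic commands, not the literal ones used in this case; the table name follows the query output above):

RMAN> backup validate check logical datafile 75;    # records any corrupt blocks in v$database_block_corruption

SQL> -- map the reported blocks back to their owning segments
SQL> select e.owner, e.segment_name, e.segment_type, c.block#
       from v$database_block_corruption c, dba_extents e
      where e.file_id = c.file#
        and c.block# between e.block_id and e.block_id + e.blocks - 1;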

ORA-01172 ORA-01151 fault recovery

Contact: Phone/WeChat (+86 17813235971) QQ (107644445) (QQ consultation: 惜分飞)


Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]

Node 2 reported "Error: Controlfile sequence number in file header is different from the one in memory", and the instance went down:

Tue May 09 23:03:24 2023
Thread 2 cannot allocate new log, sequence 16728
Checkpoint not complete
  Current log# 3 seq# 16727 mem# 0: +DATA/xff/onlinelog/group_3.265.941900045
  Current log# 3 seq# 16727 mem# 1: +FRA/xff/onlinelog/group_3.259.941900045
Thread 2 advanced to log sequence 16728 (LGWR switch)
  Current log# 4 seq# 16728 mem# 0: +DATA/xff/onlinelog/group_4.266.941900045
  Current log# 4 seq# 16728 mem# 1: +FRA/xff/onlinelog/group_4.260.941900045
Tue May 09 23:03:31 2023
LNS: Standby redo logfile selected for thread 2 sequence 16728 for destination LOG_ARCHIVE_DEST_2
Tue May 09 23:03:32 2023
Archived Log entry 431615 added for thread 2 sequence 16727 ID 0x5ffc99b5 dest 1:
Tue May 09 23:05:30 2023
Error: Controlfile sequence number in file header is different from the one in memory
       Please check that the correct mount options are used if controlfile is located on NFS
USER (ospid: 30162): terminating the instance
Tue May 09 23:05:30 2023
System state dump requested by (instance=2, osid=30162), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff2/trace/xff2_diag_6650.trc
Instance terminated by USER, pid = 30162

After the cluster reconfiguration on node 1, the node 1 instance also failed:

Tue May 09 23:04:54 2023
Thread 1 cannot allocate new log, sequence 2060
Checkpoint not complete
  Current log# 1 seq# 2059 mem# 0: +DATA/xff/onlinelog/group_1.261.941899887
  Current log# 1 seq# 2059 mem# 1: +FRA/xff/onlinelog/group_1.257.941899887
Thread 1 advanced to log sequence 2060 (LGWR switch)
  Current log# 2 seq# 2060 mem# 0: +DATA/xff/onlinelog/group_2.262.941899889
  Current log# 2 seq# 2060 mem# 1: +FRA/xff/onlinelog/group_2.258.941899889
Tue May 09 23:04:58 2023
********************* ATTENTION: ******************** 
 The controlfile header block returned by the OS
 has a sequence number that is too old. 
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE 
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE 
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely, 
 please do the following:
 (1) Save all copies of the controlfile for later 
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue: 
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance. 
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database. 
*****************************************************
Tue May 09 23:05:31 2023
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 (myinst: 1) 
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE 
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Tue May 09 23:05:31 2023
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue May 09 23:05:31 2023
 LMS 0: 3 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Tue May 09 23:05:32 2023
Instance recovery: looking for dead threads
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Tue May 09 23:06:00 2023
ARC1 (ospid: 26512): terminating the instance
Tue May 09 23:06:00 2023
System state dump requested by (instance=1, osid=26512 (ARC1)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_diag_26311.trc
Tue May 09 23:06:01 2023
ORA-1092 : opitsk aborting process
Instance terminated by ARC1, pid = 26512

Restarting the instance failed with the following errors:

Recovery of Online Redo Log: Thread 1 Group 1 Seq 2059 Reading mem 0
  Mem# 0: +DATA/dbm/onlinelog/group_1.261.941899887
  Mem# 1: +FRA/dbm/onlinelog/group_1.257.941899887
Recovery of Online Redo Log: Thread 2 Group 3 Seq 16727 Reading mem 0
  Mem# 0: +DATA/dbm/onlinelog/group_3.265.941900045
  Mem# 1: +FRA/dbm/onlinelog/group_3.259.941900045
Recovery of Online Redo Log: Thread 2 Group 4 Seq 16728 Reading mem 0
  Mem# 0: +DATA/dbm/onlinelog/group_4.266.941900045
  Mem# 1: +FRA/dbm/onlinelog/group_4.260.941900045
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc
Reading datafile '+DATA/dbm/datafile/system.256.941899799' for corruption at rdba: 0x00419179 (file 1, block 102777)
Reread (file 1, block 102777) found different corrupt data (logically corrupt)
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc
RECOVERY OF THREAD 2 STUCK AT BLOCK 102777 OF FILE 1
Abort recovery for domain 0
Aborting crash recovery due to error 1172
Errors in file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc:
ORA-01172: recovery of thread 2 stuck at block 102777 of file 1
ORA-01151: use media recovery to recover block, restore backup if needed
Abort recovery for domain 0
Errors in file /u01/app/oracle/diag/rdbms/dbm/dbm2/trace/dbm2_ora_30749.trc:
ORA-01172: recovery of thread 2 stuck at block 102777 of file 1
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN /* db agent *//* {0:890:17} */...

A manual recover attempt failed with ORA-600 [3020]:

SQL> recover datafile 1;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [1], [102777], [4297081],[], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 102777, file
offset is 841949184 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '+DATA/dbm/datafile/system.256.941899799'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 469884

---alert log
Tue May 09 23:28:44 2023
ALTER DATABASE RECOVER  datafile 1  
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 2 Group 3 Seq 16727 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_3.265.941900045
  Mem# 1: +FRA/xff/onlinelog/group_3.259.941900045
ORA-279 signalled during: ALTER DATABASE RECOVER  datafile 1  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2055.20899.1136415701
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2056.20837.1136415753
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2057.20911.1136415803
ORA-279 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER    CONTINUE DEFAULT  
Media Recovery Log +FRA/xff/archivelog/2023_05_09/thread_1_seq_2058.21898.1136415853
Recovery of Online Redo Log: Thread 2 Group 4 Seq 16728 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_4.266.941900045
  Mem# 1: +FRA/xff/onlinelog/group_4.260.941900045
Recovery of Online Redo Log: Thread 1 Group 1 Seq 2059 Reading mem 0
  Mem# 0: +DATA/xff/onlinelog/group_1.261.941899887
  Mem# 1: +FRA/xff/onlinelog/group_1.257.941899887
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_16246.trc
Reading datafile '+DATA/xff/datafile/system.256.941899799' for corruption at rdba: 0x00419179 (file 1, block 102777)
Reread (file 1, block 102777) found different corrupt data (logically corrupt)
Hex dump of (file 1, block 102777) in trace file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_16246.trc
Tue May 09 23:28:59 2023
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_16246.trc  (incident=6868615):
ORA-00600: internal error code, arguments: [3020], [1], [102777], [4297081], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 102777, file offset is 841949184 bytes)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '+DATA/xff/datafile/system.256.941899799'
ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 469884
Incident details in: /u01/app/oracle/diag/rdbms/xff/xff1/incident/incdir_6868615/xff1_ora_16246_i6868615.trc
Tue May 09 23:29:00 2023
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
ALTER DATABASE RECOVER CANCEL 
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...

The errors above show that the failing block belongs to an index, and not to a core system object, so it can be recovered with ALLOW 1 CORRUPTION; the database then opened successfully:

SQL> recover  datafile 1 allow 1 corruption;
Media recovery complete.
SQL> alter database open;

Database altered.

SQL> select owner,object_name,object_type from dba_objects where object_id=469884;

OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
---------------------------------------------------------
SYSTEM
PK_XFF_SERVERS
INDEX

SQL> alter index system.PK_XFF_SERVERS rebuild online;

Index altered.

The database was recovered cleanly with zero data loss, and the business could resume immediately.
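As a sanity check after an allow-corruption recovery like this, it is worth validating the affected datafile and confirming that nothing else is flagged (a generic check, not output from this case):

RMAN> backup validate check logical datafile 1;

SQL> select * from v$database_block_corruption;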

Recovering a database broken by active-active storage synchronization

Contact: Phone/WeChat (+86 17813235971) QQ (107644445) (QQ consultation: 惜分飞)


Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]

After the customer's active-active storage failed, everything ran on a single array; once the failed array was repaired and the active-active pair was resynchronized, several systems went wrong. One of them is covered in the companion post "Control file mount id mismatch! fault handling"; this one is a Windows RAC that could not start because the OCR disk group was damaged (ORA-600 [kfrValAcd30] prevented it from mounting):

C:\Users\Administrator>crsctl start cluster -all
CRS-2672: 尝试启动 'ora.crf' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.crf' (在 'xff1' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff1' 上)
CRS-2676: 成功启动 'ora.crf' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.crf' (在 'xff1' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff2\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff2' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff2' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff1\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff1' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff1' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff2' 上)
CRS-2673: 尝试停止 'ora.crf' (在 'xff2' 上)
CRS-2677: 成功停止 'ora.crf' (在 'xff2' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff1' 上)
CRS-2673: 尝试停止 'ora.crf' (在 'xff1' 上)
CRS-2677: 成功停止 'ora.crf' (在 'xff1' 上)
CRS-4705: 无法在节点 xff1 上启动集群件。
CRS-4705: 无法在节点 xff2 上启动集群件。
CRS-4000: 命令 Start 失败, 或已完成但出现错误。

Because only the OCR disk group is involved, the handling is fairly simple: recreate that disk group and restore the OCR (and the voting disk):

C:\Users\Administrator>asmtool -list
NTFS                             \Device\Harddisk0\Partition1              300M
NTFS                             \Device\Harddisk0\Partition4           599472M
NTFS                             \Device\Harddisk0\Partition5          1000000M
ORCLDISKDATA0                    \Device\Harddisk1\Partition1          1048587M
ORCLDISKDATA1                    \Device\Harddisk2\Partition1          1048587M
ORCLDISKDATA2                    \Device\Harddisk3\Partition1          1048587M
ORCLDISKDATA3                    \Device\Harddisk4\Partition1          1048587M
ORCLDISKDATA4                    \Device\Harddisk6\Partition1           460797M

C:\Users\Administrator>crsctl start crs -excl -nocrs
CRS-4123: Oracle 高可用性服务已启动。
CRS-2672: 尝试启动 'ora.evmd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.mdnsd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.mdnsd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.evmd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.gpnpd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.gpnpd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.cssdmonitor' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.gipcd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.cssdmonitor' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.gipcd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.cssd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.cssd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.ctssd' (在 'xff2' 上)
CRS-2676: 成功启动 'ora.ctssd' (在 'xff2' 上)
CRS-2672: 尝试启动 'ora.asm' (在 'xff2' 上)
CRS-5017: 资源操作 "ora.asm start" 遇到以下错误:
ORA-00600: internal error code, arguments: [kfrValAcd30], [OCR_VOTE], [1], [14], [7556], [15], [7584], [], [], [], [], []
。有关详细信息, 请参阅 "(:CLSN00107:)" (位于 "F:\app\grid\Administrator\diag\crs\xff2\crs\trace\ohasd_oraagent_system.trc" 中)。
CRS-2674: 未能启动 'ora.asm' (在 'xff2' 上)
CRS-2679: 尝试清除 'ora.asm' (在 'xff2' 上)
CRS-2681: 成功清除 'ora.asm' (在 'xff2' 上)
CRS-2673: 尝试停止 'ora.ctssd' (在 'xff2' 上)
CRS-2677: 成功停止 'ora.ctssd' (在 'xff2' 上)
CRS-4000: 命令 Start 失败, 或已完成但出现错误。

C:\Users\Administrator>sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on 星期四 5月 4 13:52:07 2023

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup nomount pfile='f:/pfile_asm.txt';
ASM 实例已启动

Total System Global Area 1140850688 bytes
Fixed Size                  3054680 bytes
Variable Size            1112630184 bytes
ASM Cache                  25165824 bytes

SQL>  create diskgroup OCR_VOTE  external redundancy disk '\\.\ORCLDISKDATA4' force  attribute 'COMPATIBLE.ASM' = '12.1.0';

Diskgroup created.

F:\>ocrconfig -restore backup00.ocr

F:\>crsctl replace votedisk +OCR_VOTE
已成功添加表决磁盘 e2b8fdbd05ae4f9fbf3531630853dbbc。
已成功将表决磁盘组替换为 +OCR_VOTE。
CRS-4266: 已成功替换表决文件

F:\>crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   e2b8fdbd05ae4f9fbf3531630853dbbc (\\.\ORCLDISKDATA4) [OCR_VOTE]
找到了 1 个表决磁盘。

F:\>ocrcheck
Oracle 集群注册表的状态如下:
         版本                  :          4
         总空间 (KB)     :     409568
         已用空间 (KB)      :       1348
         可用空间 (KB):     408220
         ID                       :  820087446
         设备/文件名         :  +OCR_VOTE
                                    设备/文件完整性检查成功

                                    设备/文件尚未配置

                                    设备/文件尚未配置

                                    设备/文件尚未配置

                                    设备/文件尚未配置

         集群注册表完整性检查成功

         逻辑损坏检查成功

The other disk groups then mounted successfully:

SQL> alter diskgroup arch mount;

Diskgroup altered.

SQL>


SQL> alter diskgroup data mount;

Diskgroup altered.

Attempts to recover the database failed:

C:\Users\Administrator>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on 星期四 5月 4 14:09:39 2023

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

已连接到空闲例程。

SQL> startup mount;
ORACLE 例程已经启动。

Total System Global Area 2.0992E+11 bytes
Fixed Size                  7797816 bytes
Variable Size            1.3798E+11 bytes
Database Buffers         7.1672E+10 bytes
Redo Buffers              260636672 bytes
数据库装载完毕。

SQL> recover database;
ORA-10562: Error occurred while applying redo to data block (file# 13, block#1033775)
ORA-10564: tablespace USERS
ORA-01110: 数据文件 13: '+DATA/XFF/users07.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 40396
ORA-00600: 内部错误代码, 参数: [kdolkr-2], [2], [1], [44], [], [], [], [], [],[], [], []


SQL> recover datafile 2;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 60656 块 1150508 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: '+DATA/XFF/redo03.log'


SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 60656 块 1150508 中检测到写入丢失情况
ORA-00312: 联机日志 3 线程 1: '+DATA/XFF/redo03.log'

SQL> recover datafile 10;
ORA-00283: 恢复会话因错误而取消
ORA-10562: Error occurred while applying redo to data block (file# 10, block#
2899468)
ORA-10564: tablespace USERS
ORA-01110: 数据文件 10: '+DATA/XFF/users04.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 40396
ORA-00600: 内部错误代码, 参数: [ktbair2: illegal  inheritance], [], [], [], [], [], [],[], [], [], [], []

Besides ORA-00742 there were several other redo-apply errors, such as ORA-600 [ktbair2: illegal inheritance] and ORA-600 [kdolkr-2], so the redo could not be applied normally; an attempt to force the database open then failed with ORA-600 [2662]:

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2662], [8], [678024613], [8],
[678508930], [12583040], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [2662], [8], [678024612], [8],
[678508930], [12583040], [], [], [], [], [], []
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00600: internal error code, arguments: [2662], [8], [678024610], [8],
[678508930], [12583040], [], [], [], [], [], []
进程 ID: 4628
会话 ID: 996 序列号: 48547

The problem was quickly resolved with our in-house Patch_SCN tool.


The database opened successfully, rescuing the customer's data to the maximum extent possible.
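After a forced open of this kind, the usual follow-up (general practice, not something detailed in this post; the schema name APPUSER and the path F:\rescue are placeholders) is to export the application data logically as soon as possible and rebuild a clean database from the dump:

SQL> create or replace directory rescue_dir as 'F:\rescue';

C:\Users\Administrator>expdp system/xxx directory=rescue_dir schemas=APPUSER dumpfile=rescue_%U.dmp logfile=rescue.log parallel=4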

Control file mount id mismatch! fault handling

Contact: Phone/WeChat (+86 17813235971) QQ (107644445) (QQ consultation: 惜分飞)


Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]

After talking it through with the customer, it was confirmed that the active-active storage had failed and the workload had been running on the primary array; when the other array was repaired and active-active synchronization was restarted, the database hit "Control file mount id mismatch!" during the resync and crashed:

2023-05-03T20:21:07.446873+08:00
Archived Log entry 491897 added for T-1.S-246903 ID 0x97d92f0b LAD:1
2023-05-03T20:47:53.902701+08:00
Error: 2141
Control file mount id mismatch!
fhmid: 2592441863, SGA mid: 2624617448
Requesting DIAG on each RAC instance to dump the control file header block
2023-05-03T20:47:55.906490+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_rms0_20989.trc:
2023-05-03T20:47:56.521500+08:00
RMS0 (ospid: 20989): terminating the instance
2023-05-03T20:47:56.610656+08:00
System state dump requested by (instance=1, osid=20989 (RMS0)), summary=[abnormal instance termination].
System State dumped to trace file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_diag_20912_20230503204756.trc
2023-05-03T20:47:58.480397+08:00
License high water mark = 395
2023-05-03T20:48:02.600203+08:00
Instance terminated by RMS0, pid = 20989
2023-05-03T20:48:02.601563+08:00
Warning: 2 processes are still attach to shmid 393226:
 (size: 28672 bytes, creator pid: 19941, last attach/detach pid: 20912)
2023-05-03T20:48:03.481726+08:00
USER (ospid: 967): terminating the instance
2023-05-03T20:48:03.483351+08:00
Instance terminated by USER, pid = 967

When the node restarted automatically, it reported ORA-600 [kccsbck_first]:

2023-05-03T20:48:34.870435+08:00
NOTE: ASMB mounting group 2 (FRA)
NOTE: ASM background process initiating disk discovery for grp 2 (reqid:0)
NOTE: Assigning number (2,1) to disk (/dev/asm_data0g)
NOTE: Assigning number (2,0) to disk (/dev/asm_data0f)
SUCCESS: mounted group 2 (FRA)
NOTE: grp 2 disk 1: FRA_0001 path:/dev/asm_data0g
NOTE: grp 2 disk 0: FRA_0000 path:/dev/asm_data0f
2023-05-03T20:48:34.919965+08:00
NOTE: dependency between database xff and diskgroup resource ora.FRA.dg is established
2023-05-03T20:48:38.983416+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_2436.trc  (incident=1333249):
ORA-00600: 内部错误代码, 参数: [kccsbck_first], [1], [2624617448], [], [], [], [], [], [], [], [], []
Incident details in: /opt/rac/oracle/diag/rdbms/xff/xff1/incident/incdir_1333249/xff1_ora_2436_i1333249.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:8:116} */...

Another restart of the database then reported ORA-00742 and ORA-00312:

2023-05-04T08:18:59.635790+08:00
Aborting crash recovery due to error 742
2023-05-04T08:18:59.635897+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_80855.trc:
ORA-00742: 日志读取在线程 2 序列 244996 块 8262 中检测到写入丢失情况
ORA-00312: 联机日志 7 线程 2: '+FRA/xff/ONLINELOG/group_7.446.1059323695'
ORA-00312: 联机日志 7 线程 2: '+DATA/xff/ONLINELOG/group_7.272.1059323695'
Abort recovery for domain 0, flags 4
2023-05-04T08:18:59.647994+08:00
Errors in file /opt/rac/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_80855.trc:
ORA-00742: 日志读取在线程 2 序列 244996 块 8262 中检测到写入丢失情况
ORA-00312: 联机日志 7 线程 2: '+FRA/xff/ONLINELOG/group_7.446.1059323695'
ORA-00312: 联机日志 7 线程 2: '+DATA/xff/ONLINELOG/group_7.272.1059323695'
ORA-742 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:37368:2} */...
2023-05-04T08:19:00.820708+08:00
License high water mark = 33
2023-05-04T08:19:00.820936+08:00
USER (ospid: 82788): terminating the instance
2023-05-04T08:19:01.827132+08:00
Instance terminated by USER, pid = 82788

Clearly the instance recovery performed at startup detected a lost redo write, which is why the database cannot open normally. This type of fault has been handled many times before; see:
ORA-00742 ORA-00312 fault recovery (1)
ORA-00742 ORA-00312 fault recovery (2)
ORA-00742: Log read detects lost write in thread %s sequence %s block %s
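For reference, the usual shape of the fix in such lost-write cases is an incomplete, cancel-based recovery that stops just before the damaged redo, followed by a resetlogs open (a generic sketch only, the detailed handling is covered in the posts linked above; a logical export and rebuild is normally recommended afterwards):

SQL> recover database until cancel;
-- apply archived logs up to, but not including, the sequence flagged by ORA-00742, then answer CANCEL
SQL> alter database open resetlogs;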