AWR snapshot creation waits on library cache: mutex X

Author: 惜分飞 © All rights reserved. [No reproduction in any form without the author's consent; the author reserves the right to pursue legal liability.]

A customer's 11.2.0.4 database had no snap_id available when we went to collect an AWR report.
[screenshot: awr]
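Whether MMON has been producing snapshots at all can be confirmed against DBA_HIST_SNAPSHOT; a query sketch (add DBID/instance filters as needed):

select snap_id, begin_interval_time, end_interval_time
  from dba_hist_snapshot
 order by snap_id desc;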


Creating a snapshot manually simply hung.
[screenshot: awr_snap]
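For reference, a manual snapshot is normally requested through the standard DBMS_WORKLOAD_REPOSITORY API, and a call of this form is what hung here (assuming the default invocation was used):

exec dbms_workload_repository.create_snapshot;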

Querying the session showed its wait event was library cache: mutex X. Below is a sketch of that check, followed by the trace of the earlier MMON slave process (m000/1):
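A query along these lines shows the slave's current wait and its direct blocker (a sketch; adjust the filter as needed on 11.2):

select sid, serial#, program, event, blocking_session, seconds_in_wait
  from v$session
 where event = 'library cache: mutex X'
    or program like '%(M0%';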

Trace file /u01/app/oracle/diag/rdbms/xff/xff/trace/xff_m000_6241.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      HIS
Release:        5.4.17-2102.201.3.el7uek.x86_64
Version:        #2 SMP Fri Apr 23 09:05:55 PDT 2021
Machine:        x86_64
VM name:        VMWare Version: 6
Instance name: xff
Redo thread mounted by this instance: 1
Oracle process number: 5714
Unix process pid: 6241, image: oracle@HIS (M000)


*** 2024-06-19 11:44:39.483
*** SESSION ID:(8709.38013) 2024-06-19 11:44:39.483
*** CLIENT ID:() 2024-06-19 11:44:39.483
*** SERVICE NAME:(SYS$BACKGROUND) 2024-06-19 11:44:39.483
*** MODULE NAME:(MMON_SLAVE) 2024-06-19 11:44:39.483
*** ACTION NAME:(Auto-Flush Slave Action) 2024-06-19 11:44:39.483

DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 900 seconds
Time limit violation detected at:
ksedsts()+465<-kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()
+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-
dbkePostKGE_kgsf()+71<-kgeselv()+276<-kgesecl0()+139<-kgxWait()+1412<-kgxExclusive()+447<-
kglGetMutex()+140<-kglGetHandleReference()+69<-kglic0()+319<-kksIterCursorStat()+330<-kewrrtsq_rank_topsql()
+240<-kewrbtsq_build_topsql()+128<-kewrftsq_flush_topsql()+679<-kewrft_flush_table()+397<-
kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+1406<-kewrfos_flush_onesnap()+170
<-kewrfsc_flush_snapshot_c()+623<-kewrafs_auto_flush_slave()+769<-kebm_slave_main()+586<-ksvrdp()+1766
<-opirip()+674<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201
<-__libc_start_main()+245
Current Wait Stack:
 0: waiting for 'library cache: mutex X'
    idn=0x644e2de0, value=0xf3a00000000, where=0x7c
    wait_id=1189 seq_num=1190 snap_id=1
    wait times: snap=3 min 0 sec, exc=3 min 0 sec, total=3 min 0 sec
    wait times: max=infinite, heur=15 min 3 sec
    wait counts: calls=16376 os=16376
    in_wait=1 iflags=0x15b2
There is at least one session blocking this session.
  Dumping 1 direct blocker(s):
    inst: 1, sid: 3898, ser: 47299
  Dumping final blocker:
    inst: 1, sid: 3898, ser: 47299
Wait State:
  fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
    elapsed time of 0.000016 sec since current wait
 0: waited for 'library cache: mutex X'
    idn=0x644e2de0, value=0xf3a00000000, where=0x7c
    wait_id=1188 seq_num=1189 snap_id=1
    wait times: snap=12 min 2 sec, exc=12 min 2 sec, total=12 min 2 sec
    wait times: max=infinite
    wait counts: calls=65535 os=65535
    occurred after 0.327543 sec of elapsed time
 1: waited for 'db file sequential read'
    file#=0x2, block#=0x1a5b, blocks=0x1
    wait_id=1187 seq_num=1188 snap_id=1
    wait times: snap=0.000420 sec, exc=0.000420 sec, total=0.000420 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000251 sec of elapsed time
 2: waited for 'db file sequential read'
    file#=0x1, block#=0x82e6, blocks=0x1
    wait_id=1186 seq_num=1187 snap_id=1
    wait times: snap=0.000429 sec, exc=0.000429 sec, total=0.000429 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.001085 sec of elapsed time
 3: waited for 'db file sequential read'
    file#=0x2, block#=0x11344, blocks=0x1
    wait_id=1185 seq_num=1186 snap_id=1
    wait times: snap=0.000356 sec, exc=0.000356 sec, total=0.000356 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000008 sec of elapsed time
 4: waited for 'db file sequential read'
    file#=0x2, block#=0x19eb, blocks=0x1
    wait_id=1184 seq_num=1185 snap_id=1
    wait times: snap=0.000397 sec, exc=0.000397 sec, total=0.000397 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000044 sec of elapsed time
 5: waited for 'db file sequential read'
    file#=0x2, block#=0xb1659, blocks=0x1
    wait_id=1183 seq_num=1184 snap_id=1
    wait times: snap=0.000003 sec, exc=0.000003 sec, total=0.000003 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000010 sec of elapsed time
 6: waited for 'db file sequential read'
    file#=0x2, block#=0xb1658, blocks=0x1
    wait_id=1182 seq_num=1183 snap_id=1
    wait times: snap=0.000453 sec, exc=0.000453 sec, total=0.000453 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000009 sec of elapsed time
 7: waited for 'db file sequential read'
    file#=0x2, block#=0x19e1, blocks=0x1
    wait_id=1181 seq_num=1182 snap_id=1
    wait times: snap=0.000388 sec, exc=0.000388 sec, total=0.000388 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000017 sec of elapsed time
 8: waited for 'db file sequential read'
    file#=0x2, block#=0x19e2, blocks=0x1
    wait_id=1180 seq_num=1181 snap_id=1
    wait times: snap=0.000415 sec, exc=0.000415 sec, total=0.000415 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.004826 sec of elapsed time
 9: waited for 'db file sequential read'
    file#=0x2, block#=0x2ffc0c, blocks=0x1
    wait_id=1179 seq_num=1180 snap_id=1
    wait times: snap=0.000404 sec, exc=0.000404 sec, total=0.000404 sec
    wait times: max=infinite
    wait counts: calls=0 os=0
    occurred after 0.000007 sec of elapsed time
Sampled Session History of session 8709 serial 38013
---------------------------------------------------
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples  in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).

The history is displayed in reverse chronological order.

sample interval: 1 sec, max history 120 sec
---------------------------------------------------
  [118 samples,                                            11:42:39 - 11:44:39]
    waited for 'library cache: mutex X', seq_num: 1190
      p1: 'idn'=0x644e2de0
      p2: 'value'=0xf3a00000000
      p3: 'where'=0x7c
      time_waited: >= 120 sec (still in wait)
  [3 samples,                                              11:42:39 - 11:42:38]
    idle wait at each sample
---------------------------------------------------
Sampled Session History Summary:
  longest_non_idle_wait: 'library cache: mutex X'
  [118 samples, 11:42:39 - 11:44:39]
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 8 csec) -----
----- END DDE Actions Dump (total 8 csec) -----
KGX cleanup...
KGX Atomic Operation Log 0x1de44da670
 Mutex 0x1d113cf7c8(8709, 0) idn 2de0 oper EXCL(6)
 Library Cache uid 8709 efd 7 whr 49 slp 0
 oper=0 pt1=(nil) pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0
KGX cleanup...
KGX Atomic Operation Log 0x1de44da6c8
 Mutex 0x1de9468550(3898, 0) idn 644e2de0 oper GET_EXCL(5)
 Library Cache uid 8709 efd 7 whr 124 slp 16376
 oper=0 pt1=0x1de9468410 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0
*** KEWRAFM1: Error=12751 encountered by kewrfteh
*** KEWRAFS: Error=12751 encountered by Auto Flush Slave.
KEBM: MMON slave action policy violation. kewrmafsa_; viol=1; err=12751

Automatic maintenance tasks getting blocked like this are usually the result of a bug. The matching MOS note is "library cache: mutex x waits during AWR Flush High Cursor Scan (Doc ID 2382741.1)", which identifies the cause as Bug 19294556 "AWR Flush Waiting For Cursor Scan, Library Cache : Mutex X". There is currently no good workaround, and no corresponding patch is available on top of the 11.2.0.4 base release.
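If you want to see which library cache object the contended mutex protects, the idn from the trace (0x644e2de0 here) is normally the hash value of that object and can be looked up directly; a sketch:

select owner, name, namespace, type, locks, pins
  from v$db_object_cache
 where hash_value = to_number('644e2de0', 'xxxxxxxx');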

Some operations for replacing an Exadata flash card

The customer has an Oracle Exadata X3-2 quarter rack (high-capacity disks). They reported severe performance degradation caused by a faulty flash card; temporarily shutting down the machine (cell) hosting the bad card restored normal service.
[screenshot: xd_flash_error]


Relevant version information:

[root@oa0cel03 ~]# imageinfo

Kernel version: 2.6.32-400.11.1.el5uek #1 SMP Thu Nov 22 03:29:09 PST 2012 x86_64
Cell version: OSS_11.2.3.2.1_LINUX.X64_130109
Cell rpm version: cell-11.2.3.2.1_LINUX.X64_130109-1

Active image version: 11.2.3.2.1.130109
Active image activated: 2013-06-27 02:24:19 -0700
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.3.2.1.130109

Inactive image version: 11.2.3.2.0.120713
Inactive image activated: 2012-10-14 06:46:16 -0700
Inactive image status: success
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7

[root@oa0cel03 ~]# cellcli
CellCLI: Release 11.2.3.2.1 - Production on Thu Jun 20 18:28:37 CST 2024

Copyright (c) 2007, 2012, Oracle.  All rights reserved.
Cell Efficiency Ratio: 3,617

CellCLI> list cell detail
	 name:              	 oa0cel03
	 bbuTempThreshold:  	 60
	 bbuChargeThreshold:	 800
	 bmcType:           	 IPMI
	 cellVersion:       	 OSS_11.2.3.2.1_LINUX.X64_130109
	 cpuCount:          	 24
	 diagHistoryDays:   	 7
	 fanCount:          	 8/8
	 fanStatus:         	 normal
	 flashCacheMode:    	 WriteBack
	 id:                	 1238FM507A
	 interconnectCount: 	 3
	 interconnect1:     	 bondib0
	 iormBoost:         	 0.0
	 ipaddress1:        	 192.168.10.5/22
	 kernelVersion:     	 2.6.32-400.11.1.el5uek
	 locatorLEDStatus:  	 off
	 makeModel:         	 Oracle Corporation SUN FIRE X4270 M3 SAS
	 metricHistoryDays: 	 7
	 notificationMethod:	 snmp
	 notificationPolicy:	 critical,warning,clear
	 offloadEfficiency: 	 3,616.5
	 powerCount:        	 2/2
	 powerStatus:       	 normal
	 releaseVersion:    	 11.2.3.2.1
	 releaseTrackingBug:	 14522699
	 snmpSubscriber:    	 host=oa0db02.qhsrmyy.com,port=3872,community=cell
	                    	 host=oa0db01.qhsrmyy.com,port=3872,community=cell
	 status:            	 online
	 temperatureReading:	 28.0
	 temperatureStatus: 	 normal
	 upTime:            	 0 days, 3:49
	 cellsrvStatus:     	 running
	 msStatus:          	 running
	 rsStatus:          	 running

After the customer's first card swap the performance problem persisted, so the flash cache was flushed and the grid disks were then set inactive (a polling sketch for confirming the disks are fully offline follows the status listing below):

[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 38,820 MB
[root@oa0cel03 ~]# cellcli -e "alter flashcache all flush"
Flash cache on FD_00_oa0cel03 successfully altered
Flash cache on FD_01_oa0cel03 successfully altered
Flash cache on FD_02_oa0cel03 successfully altered
Flash cache on FD_03_oa0cel03 successfully altered
Flash cache on FD_04_oa0cel03 successfully altered
Flash cache on FD_05_oa0cel03 successfully altered
Flash cache on FD_06_oa0cel03 successfully altered
Flash cache on FD_07_oa0cel03 successfully altered
Flash cache on FD_09_exastlx01 successfully altered
Flash cache on FD_10_exastlx01 successfully altered
Flash cache on FD_11_oa0cel03 skipped because FD_11_oa0cel03 is degraded
Flash cache on FD_12_oa0cel03 successfully altered
Flash cache on FD_13_oa0cel03 successfully altered
Flash cache on FD_14_oa0cel03 successfully altered
Flash cache on FD_15_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 0.000 MB
[root@oa0cel03 ~]# cellcli -e "alter griddisk all inactive"
GridDisk DATA_oa0_CD_00_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_01_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_02_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_03_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_04_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_05_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_06_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_07_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_08_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_09_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_10_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_11_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_02_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_03_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_04_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_05_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_06_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_07_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_08_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_09_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_10_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_11_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_00_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_01_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_02_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_03_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_04_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_05_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_06_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_07_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_08_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_09_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_10_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_11_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list griddisk
	 DATA_oa0_CD_00_oa0cel03	 inactive
	 DATA_oa0_CD_01_oa0cel03	 inactive
	 DATA_oa0_CD_02_oa0cel03	 inactive
	 DATA_oa0_CD_03_oa0cel03	 inactive
	 DATA_oa0_CD_04_oa0cel03	 inactive
	 DATA_oa0_CD_05_oa0cel03	 inactive
	 DATA_oa0_CD_06_oa0cel03	 inactive
	 DATA_oa0_CD_07_oa0cel03	 inactive
	 DATA_oa0_CD_08_oa0cel03	 inactive
	 DATA_oa0_CD_09_oa0cel03	 inactive
	 DATA_oa0_CD_10_oa0cel03	 inactive
	 DATA_oa0_CD_11_oa0cel03	 inactive
	 DBFS_DG_CD_02_oa0cel03  	 inactive
	 DBFS_DG_CD_03_oa0cel03  	 inactive
	 DBFS_DG_CD_04_oa0cel03  	 inactive
	 DBFS_DG_CD_05_oa0cel03  	 inactive
	 DBFS_DG_CD_06_oa0cel03  	 inactive
	 DBFS_DG_CD_07_oa0cel03  	 inactive
	 DBFS_DG_CD_08_oa0cel03  	 inactive
	 DBFS_DG_CD_09_oa0cel03  	 inactive
	 DBFS_DG_CD_10_oa0cel03  	 inactive
	 DBFS_DG_CD_11_oa0cel03  	 inactive
	 RECO_oa0_CD_00_oa0cel03	 inactive
	 RECO_oa0_CD_01_oa0cel03	 inactive
	 RECO_oa0_CD_02_oa0cel03	 inactive
	 RECO_oa0_CD_03_oa0cel03	 inactive
	 RECO_oa0_CD_04_oa0cel03	 inactive
	 RECO_oa0_CD_05_oa0cel03	 inactive
	 RECO_oa0_CD_06_oa0cel03	 inactive
	 RECO_oa0_CD_07_oa0cel03	 inactive
	 RECO_oa0_CD_08_oa0cel03	 inactive
	 RECO_oa0_CD_09_oa0cel03	 inactive
	 RECO_oa0_CD_10_oa0cel03	 inactive
	 RECO_oa0_CD_11_oa0cel03	 inactive
[root@oa0cel03 ~]#  cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome 
	 DATA_oa0_CD_00_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_01_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_02_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_03_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_04_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_05_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_06_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_07_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_08_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_09_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_10_oa0cel03	 OFFLINE	 Yes
	 DATA_oa0_CD_11_oa0cel03	 OFFLINE	 Yes
	 DBFS_DG_CD_02_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_03_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_04_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_05_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_06_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_07_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_08_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_09_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_10_oa0cel03  	 OFFLINE	 Yes
	 DBFS_DG_CD_11_oa0cel03  	 OFFLINE	 Yes
	 RECO_oa0_CD_00_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_01_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_02_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_03_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_04_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_05_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_06_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_07_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_08_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_09_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_10_oa0cel03	 OFFLINE	 Yes
	 RECO_oa0_CD_11_oa0cel03	 OFFLINE	 Yes
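Before the cell is taken down for the card replacement, it is usual to wait until every grid disk reports asmmodestatus OFFLINE (or UNUSED) and asmdeactivationoutcome Yes, as in the listing above; a small polling sketch:

# poll until no grid disk is left in a state other than OFFLINE/UNUSED (sketch)
while cellcli -e "list griddisk attributes name,asmmodestatus" | grep -Ev 'OFFLINE|UNUSED'; do
    sleep 30
done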

The customer continued swapping cards and eventually confirmed that card slot 4 itself was damaged, so that slot was abandoned and the flash cache was rebuilt without it.

[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_00_oa0cel03	 not present
	 FD_01_oa0cel03	 not present
	 FD_02_oa0cel03	 not present
	 FD_03_oa0cel03	 not present
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_10_exastlx01 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

The name FD_10_exastlx01 is left over from the old card it came from and looks out of place, so drop it and recreate the cell disk:

[root@oa0cel03 ~]# cellcli -e drop celldisk FD_10_exastlx01
CellDisk FD_10_exastlx01 successfully dropped
[root@oa0cel03 ~]# cellcli -e create celldisk all flashdisk
CellDisk FD_11_oa0cel03 successfully created
[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_00_oa0cel03	 not present
	 FD_01_oa0cel03	 not present
	 FD_02_oa0cel03	 not present
	 FD_03_oa0cel03	 not present
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_11_oa0cel03	 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

Drop the flash log and flash cache:

[root@oa0cel03 ~]# cellcli -e drop flashlog
Flash log oa0cel03_FLASHLOG successfully dropped
[root@oa0cel03 ~]# cellcli -e drop flashcache
Flash cache oa0cel03_FLASHCACHE successfully dropped

Try to recreate the flash log and flash cache:

[root@oa0cel03 ~]# cellcli -e create flashlog all size=512M
Flash log oa0cel03_FLASHLOG successfully created, but the following cell disks were degraded because their 
statuses are not normal: FD_00_oa0cel03, FD_01_oa0cel03, FD_02_oa0cel03, FD_03_oa0cel03

Because some of these cell disks no longer have a physical flash disk behind them, the creation does not fully succeed, so the corresponding cell disks have to be dropped:

[root@oa0cel03 ~]# cellcli -e drop celldisk FD_00_oa0cel03

CELL-04519: Cannot complete the drop of cell disk: FD_00_oa0cel03. Received error: 
CELL-04516: LUN Object cannot be obtained for cell disk: FD_00_oa0cel03 
Cell disks not dropped: FD_00_oa0cel03 

--force the drop
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_00_oa0cel03  force

CellDisk FD_00_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_01_oa0cel03  force
CellDisk FD_01_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_02_oa0cel03  force
CellDisk FD_02_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e drop celldisk FD_03_oa0cel03  force
CellDisk FD_03_oa0cel03 successfully dropped
[root@oa0cel03 ~]# cellcli -e list celldisk
	 CD_00_oa0cel03	 normal
	 CD_01_oa0cel03	 normal
	 CD_02_oa0cel03	 normal
	 CD_03_oa0cel03	 normal
	 CD_04_oa0cel03	 normal
	 CD_05_oa0cel03	 normal
	 CD_06_oa0cel03	 normal
	 CD_07_oa0cel03	 normal
	 CD_08_oa0cel03	 normal
	 CD_09_oa0cel03	 normal
	 CD_10_oa0cel03	 normal
	 CD_11_oa0cel03	 normal
	 FD_04_oa0cel03	 normal
	 FD_05_oa0cel03	 normal
	 FD_06_oa0cel03	 normal
	 FD_07_oa0cel03	 normal
	 FD_08_oa0cel03	 normal
	 FD_09_oa0cel03	 normal
	 FD_10_oa0cel03	 normal
	 FD_11_oa0cel03	 normal
	 FD_12_oa0cel03	 normal
	 FD_13_oa0cel03	 normal
	 FD_14_oa0cel03	 normal
	 FD_15_oa0cel03	 normal

Create the flash log and flash cache:

[root@oa0cel03 ~]# cellcli -e create flashlog all size=512M
Flash log oa0cel03_FLASHLOG successfully created
[root@oa0cel03 ~]# cellcli -e list flashlog detail
	 name:              	 oa0cel03_FLASHLOG
	 cellDisk:               …………
	 creationTime:      	 2024-06-21T18:20:51+08:00
	 degradedCelldisks: 	 
	 effectiveSize:     	 384M
	 efficiency:        	 100.0
	 id:                	 f3ab3882-fa03-4f49-b0ca-879ef3f2ac05
	 size:              	 384M
	 status:            	 normal
[root@oa0cel03 ~]# cellcli -e create flashcache all
Flash cache oa0cel03_FLASHCACHE successfully created
[root@oa0cel03 ~]# cellcli -e list flashcache detail
	 name:              	 oa0cel03_FLASHCACHE
	 cellDisk:          	 …………
	 creationTime:      	 2024-06-21T18:21:24+08:00
	 degradedCelldisks: 	 
	 effectiveCacheSize:	 1116.5625G
	 id:                	 2195ac46-3021-461f-a6d5-5f64ff1da546
	 size:              	 1116.5625G
	 status:            	 normal
[root@oa0cel03 ~]#  cellcli -e list cell detail | grep flashCacheMode
	 flashCacheMode:    	 WriteBack

Activate the grid disks so that this cell's grid disks rejoin the ASM disk groups (they show SYNCING until ASM finishes resynchronizing; see the polling sketch at the end of this post):

[root@oa0cel03 ~]# cellcli -e "alter griddisk all active"
GridDisk DATA_oa0_CD_00_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_01_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_02_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_03_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_04_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_05_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_06_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_07_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_08_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_09_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_10_oa0cel03 successfully altered
GridDisk DATA_oa0_CD_11_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_02_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_03_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_04_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_05_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_06_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_07_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_08_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_09_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_10_oa0cel03 successfully altered
GridDisk DBFS_DG_CD_11_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_00_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_01_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_02_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_03_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_04_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_05_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_06_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_07_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_08_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_09_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_10_oa0cel03 successfully altered
GridDisk RECO_oa0_CD_11_oa0cel03 successfully altered
[root@oa0cel03 ~]# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome,status
	 DATA_oa0_CD_00_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_01_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_02_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_03_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_04_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_05_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_06_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_07_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_08_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_09_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_10_oa0cel03	 SYNCING	 Yes	 active
	 DATA_oa0_CD_11_oa0cel03	 SYNCING	 Yes	 active
	 DBFS_DG_CD_02_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_03_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_04_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_05_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_06_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_07_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_08_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_09_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_10_oa0cel03  	 ONLINE 	 Yes	 active
	 DBFS_DG_CD_11_oa0cel03  	 ONLINE 	 Yes	 active
	 RECO_oa0_CD_00_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_01_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_02_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_03_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_04_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_05_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_06_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_07_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_08_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_09_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_10_oa0cel03	 SYNCING	 Yes	 active
	 RECO_oa0_CD_11_oa0cel03	 SYNCING	 Yes	 active
[root@oa0cel03 ~]# cellcli -e list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\'
	 FC_BY_DIRTY	 585 MB
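ASM resynchronizes the stale extents in the background, and the cell is fully redundant again only once every grid disk reports asmmodestatus ONLINE; a polling sketch:

# poll until every grid disk reports asmmodestatus=ONLINE (sketch)
while cellcli -e "list griddisk attributes name,asmmodestatus" | grep -v ONLINE; do
    sleep 60
done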

Resolving RMAN-06207 and RMAN-06208 errors

While running RMAN backups, executing delete noprompt expired backup reported RMAN-06207 and RMAN-06208:

RMAN>delete noprompt expired backup;
RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 1 days
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1364 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3102 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=4534 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=5054 instance=hisdb1 device type=DISK
Deleting the following obsolete backups and copies:
Type                 Key    Completion Time    Filename/Handle
-------------------- ------ ------------------ --------------------
Datafile Copy        3      15-JUN-18          +DATA/hisdb/datafile/xifenfei.282.978867569
Datafile Copy        4      15-JUN-18          +DATA/hisdb/datafile/xifenfei.283.978867589
Backup Set           54704  14-JUN-24
  Backup Piece       54704  14-JUN-24          /oraback/rmanfile/ctl_20240614_m22ta31a_1_1.rman

RMAN-06207: WARNING: 3 objects could not be deleted for DISK channel(s) due
RMAN-06208:          to mismatched status.  Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212:   Object Type   Filename/Handle
RMAN-06213: --------------- ---------------------------------------------------
RMAN-06214: Datafile Copy   +DATA/hisdb/datafile/xifenfei.282.978867569
RMAN-06214: Datafile Copy   +DATA/hisdb/datafile/xifenfei.283.978867589
RMAN-06214: Backup Piece    /oraback/rmanfile/ctl_20240614_m22ta31a_1_1.rman

The fix is to run crosscheck copy:

RMAN> crosscheck copy;

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1364 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=3102 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=4534 instance=hisdb1 device type=DISK
allocated channel: ORA_DISK_4
channel ORA_DISK_4: SID=5070 instance=hisdb1 device type=DISK
specification does not match any control file copy in the repository
validation succeeded for archived log
archived log file name=+FRA/hisdb/archivelog/2024_06_16/thread_2_seq_149520.1675.1171833177 
     RECID=589464 STAMP=1171833177
Crosschecked 1 objects

validation failed for datafile copy
datafile copy file name=+DATA/hisdb/datafile/xifenfei.282.978867569 RECID=3 STAMP=978867571
validation failed for datafile copy
datafile copy file name=+DATA/hisdb/datafile/xifenfei.283.978867589 RECID=4 STAMP=978867590
Crosschecked 2 objects
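Note that crosscheck copy only revalidates image copies (the two datafile copies above). The mismatched backup piece from the original warning would be covered by crosschecking backups as well, for example (a sketch):

crosscheck backup;
delete noprompt expired backup;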

Running the delete again no longer reports the error:

RMAN>   DELETE noprompt OBSOLETE;

RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 1 days
using channel ORA_DISK_1
using channel ORA_DISK_2
using channel ORA_DISK_3
using channel ORA_DISK_4
no obsolete backups found

RAC hosts more than 10 minutes apart prevent CRS from starting

A customer reported that in a two-node 19c RAC, after a power outage the database on one node would not start. crsctl showed that the CRS stack had not come up completely:

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------

The CRS alert log shows that the clock difference across the cluster exceeds 600 seconds, so the CTSS (octssd) process cannot start:

2024-06-11 17:33:09.953 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr5; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:09.956 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr1; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.024 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr2; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.031 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr4; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:10.040 [OCSSD(5020)]CRS-1605: CSSD voting file is online: /dev/asm_ocr3; 
 details in /u01/app/grid/diag/crs/xifenf1/crs/trace/ocssd.trc.
2024-06-11 17:33:11.900 [OCSSD(5020)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xifenf1 xifenf2 .
2024-06-11 17:33:13.344 [OCSSD(5020)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-06-11 17:33:13.809 [OCTSSD(5488)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 5488
2024-06-11 17:33:16.017 [OCTSSD(5488)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xifenf2.
2024-06-11 17:33:16.018 [OCTSSD(5488)]CRS-2401: The Cluster Time Synchronization Service started on host xifenf1.
2024-06-11 17:33:16.105 [OCTSSD(5488)]CRS-2419: The clock on host xifenf1 differs from mean cluster time by 1031504618 microseconds. 
  The Cluster Time Synchronization Service will not perform time synchronization 
  because the time difference is beyond the permissible offset of 600 seconds. 
  Details in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.
2024-06-11 17:33:16.579 [OCTSSD(5488)]CRS-2402: The Cluster Time Synchronization Service aborted on host xifenf1. 
  Details at (:ctsselect_mstm4:) in /u01/app/grid/diag/crs/xifenf1/crs/trace/octssd.trc.

Check the host times:

[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 17:54:09 CST 2024
Tue Jun 11 18:04:34 CST 2024

Adjust the host time:

[root@xifenf1 ~]# date -s "20240611 18:06:00"
Tue Jun 11 18:06:00 CST 2024
[root@xifenf1 ~]# su - grid
Last login: Tue Jun 11 17:37:53 CST 2024 on pts/0
[grid@xifenf1 ~]$ date ;ssh xifenf2 date
Tue Jun 11 18:06:09 CST 2024
Tue Jun 11 18:05:34 CST 2024
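To keep the node clocks from drifting apart again (and to let CTSS run in observer mode instead of correcting time itself), make sure an OS time service is running and synchronized on both nodes, for example with chrony (a sketch; package and service names vary by platform):

systemctl enable --now chronyd
chronyc tracking
chronyc sources -v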

Restart CRS:

[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'xifenf1'
CRS-2673: Attempting to stop 'ora.storage' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.crf' on 'xifenf1'
CRS-2677: Stop of 'ora.storage' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'xifenf1'
CRS-2677: Stop of 'ora.mdnsd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.crf' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'xifenf1'
CRS-2677: Stop of 'ora.cssd' on 'xifenf1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'xifenf1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'xifenf1'
CRS-2677: Stop of 'ora.gpnpd' on 'xifenf1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'xifenf1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'xifenf1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@xifenf1 ~]# /u01/app/19.0/grid/bin/crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crf
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.crsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       xifenf1                  ACTIVE:35600,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xifenf1                  STABLE
ora.storage
      1        ONLINE  ONLINE       xifenf1                  STABLE
--------------------------------------------------------------------------------
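The CTSS state can also be checked directly after the restart, for example:

/u01/app/19.0/grid/bin/crsctl check ctss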

O/S-Error: (OS 23) Data error (cyclic redundancy check) - troubleshooting

A customer hit ORA-27070, OSD-04016, and O/S-Error errors when accessing a datafile, caused by bad sectors on the underlying disk.
[screenshot: OSD-04016]


RMAN attempts to read datafile 88:

RMAN> backup datafile 88 format 'e:\rman\%d_%T_%I.%s%p';

Starting backup at 05-JUN-24
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2246 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00088 name=E:\APP\ADMINISTRATOR\ORADATA\XFF\XIFENFEI.ORA
channel ORA_DISK_1: starting piece 1 at 05-JUN-24
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 06/05/2024 18:33:43
ORA-19501: read error on file "E:\APP\ADMINISTRATOR\ORADATA\XFF\XIFENFEI.ORA", block number 322944 (block size=8192)
ORA-27070: async read/write failed
OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 23) Data error (cyclic redundancy check).

Checking the Windows system event log shows: The device, \Device\Harddisk0\DR0, has a bad block.
[screenshot: disk-error]


At this point it is fairly safe to conclude that the problem is caused by bad sectors at the file-system or disk level (a physical disk bad sector being the more likely). A utility was used to force-copy the damaged datafile; it reported that a small number of sectors could not be copied.
[screenshot: force-copy]
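The forced copy here was done with a Windows disk utility (tool not shown). For reference, on Linux an equivalent forced copy that keeps going past unreadable sectors could be done with dd (a sketch; paths are illustrative):

# copy the damaged datafile, padding unreadable sectors with zeros instead of aborting (sketch)
dd if=/path/to/XIFENFEI.ORA of=/backup/XIFENFEI.ORA bs=8192 conv=noerror,sync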

Check the copied-out file with dbv:

E:\check_db>DBV FILE=D:/XIFENFEI.ORA

DBVERIFY: Release 11.2.0.3.0 - Production on Sun Jun 9 22:17:28 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = D:\XIFENFEI.ORA
Page 323055 is marked corrupt
Corrupt block relative dba: 0x1604edef (file 88, block 323055)
Bad header found during dbv:
Data in bad block:
 type: 229 format: 5 rdba: 0xe5e5e5e5
 last change scn: 0xe5e5.e5e5e5e5 seq: 0xe5 flg: 0xe5
 spare1: 0xe5 spare2: 0xe5 spare3: 0xe5e5
 consistency value in tail: 0x4c390601
 check value in block header: 0xe5e5
 computed block checksum: 0x5003



DBVERIFY - Verification complete

Total Pages Examined         : 524288
Total Pages Processed (Data) : 204510
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 127485
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3030
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 189262
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 184063522 (3470.184063522)

Fortunately only a single database block is corrupt. Querying dba_extents for the object that owns the bad block shows, less fortunately, that it is table data.

SQL> SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3   WHERE FILE_ID = &FILE_ID
  4     AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
Enter value for file_id: 88
old   3:  WHERE FILE_ID = &FILE_ID
new   3:  WHERE FILE_ID = 88
Enter value for block_id: 323055
old   4:    AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:    AND 323055 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER
------------------------------
SEGMENT_NAME
--------------------------------------------------------------------------------
SEGMENT_TYPE       TABLESPACE_NAME                PARTITION_NAME
------------------ ------------------------------ ------------------------------
XFFUSER
XFFTABLE
TABLE              XFFTBS_TAB2

The data was then exported with corrupt-block skipping enabled, the original table was renamed, and the data was imported back, completing the recovery (a sketch of the skip-corrupt step follows the links below). Similar recoveries handled in the past:

A lucky recovery from an OSD-04016 O/S-Error incident
O/S-Error: (OS 23) Data error (cyclic redundancy check) database recovery
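For reference, the skip-corrupt-blocks step can be done by marking the table with DBMS_REPAIR so that full scans skip blocks flagged as corrupt; a sketch using the object names from this case (the backup table name is illustrative, and depending on the corruption type event 10231 or DBMS_REPAIR.FIX_CORRUPT_BLOCKS may also be needed):

-- let full scans of the table skip blocks marked corrupt (sketch)
exec dbms_repair.skip_corrupt_blocks('XFFUSER', 'XFFTABLE');

-- then pull the surviving rows out with a full scan, e.g.
create table XFFUSER.XFFTABLE_BAK as select * from XFFUSER.XFFTABLE;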