ORA-04020导致adg异常

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-04020导致adg异常

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

今日早上有客户反馈adg停止同步了,通过检查alert日志发现

Tue Dec 24 18:17:41 2019
Media Recovery Waiting for thread 1 sequence 56655 (in transit)
Recovery of Online Redo Log: Thread 1 Group 11 Seq 56655 Reading mem 0
  Mem# 0: Y:\ORACLE\ORADATA\ORACLE11\STD_REDO11.LOG
Archived Log entry 56248 added for thread 1 sequence 56654 ID 0x5b6bcf9b dest 1:
Tue Dec 24 18:18:11 2019
Errors in file C:\APP\ADMINISTRATOR\diag\rdbms\oracle11dg\oracle11\trace\oracle11_lgwr_3252.trc:
ORA-04020: deadlock detected while trying to lock object SYS.orcl
LGWR (ospid: 3252): terminating the instance due to error 4020
Tue Dec 24 18:18:11 2019
System state dump requested by (instance=1, osid=3252 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file C:\APP\ADMINISTRATOR\diag\rdbms\oracle11dg\oracle11\trace\oracle11_diag_3236_20191224181811.trc
Dumping diagnostic data in directory=[cdmp_20191224181811], requested by (instance=1, osid=3252 (LGWR)), summary=[abnormal instance termination].
Instance terminated by LGWR, pid = 3252

由于lgwr进程遭遇ORA-04020,从而使得lgwr进程异常,进而整个数据库crash.

分析trace文件

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1 
CPU                 : 8 - type 8664, 8 Physical Cores
Process Affinity    : 0x0x0000000000000000
Memory (Avail/Total): Ph:8395M/32733M, Ph+PgF:41002M/65464M 
Instance name: oracle11
Redo thread mounted by this instance: 1
Oracle process number: 10
Windows thread id: 3252, image: ORACLE.EXE (LGWR)


*** 2019-12-24 18:18:11.072
*** SESSION ID:(384.1) 2019-12-24 18:18:11.072
*** CLIENT ID:() 2019-12-24 18:18:11.072
*** SERVICE NAME:(SYS$BACKGROUND) 2019-12-24 18:18:11.072
*** MODULE NAME:() 2019-12-24 18:18:11.072
*** ACTION NAME:() 2019-12-24 18:18:11.072
 
A deadlock among DDL and parse locks is detected.
This deadlock is usually due to user errors in
the design of an application or from issuing a set
of concurrent statements which can cause a deadlock.
This should not be reported to Oracle Support.
The following information may aid in finding
the errors which cause the deadlock:
ORA-04020: deadlock detected while trying to lock object SYS.orcl
--------------------------------------------------------
  object   waiting  waiting       blocking blocking
  handle   session     lock mode   session     lock mode
--------  -------- -------- ----  -------- -------- ----
0000000676C20F08  000000066D22BE10 00000006738AB970    X  000000066D22BE10 00000006738A04B0    S
0000000677DF2E80  00000006792E2880 0000000673B13AE8    X  000000066D22BE10 00000006738A19B8    S
--------------------------------------------------------
---------- DUMP OF WAITING AND BLOCKING LOCKS ----------
--------------------------------------------------------
------------- WAITING LOCK -------------
----------------------------------------
SO: 0x00000006738AB970, type: 78, owner: 0x000000065D440498, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x000000066CDE7AD0, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0

LibraryObjectLock:  Address=00000006738AB970 Handle=0000000676C20F08 RequestMode=X CanBeBrokenCount=2 Incarnation=3 ExecutionCount=0  
………………
SO: 0x00000006738A19B8, type: 78, owner: 0x000000065A38D6C0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x000000066CDE7AD0, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0

LibraryObjectLock:  Address=00000006738A19B8 Handle=0000000677DF2E80 Mode=S CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0   
  
  User=000000066D22BE10 Session=000000066D22BE10 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=1b749 
  LibraryHandle:  Address=0000000677DF2E80 Hash=675351da LockMode=S PinMode=0 LoadLockMode=0 Status=0 
  ObjectName:  Name=SYS.orcl   
    FullHashValue=285b654fe3f440652c403c98675351da Namespace=DBINSTANCE(74) Type=CURSOR(00) Identifier=1 OwnerIdn=0 
  Statistics:  InvalidationCount=0 ExecutionCount=0 LoadCount=0 ActiveLocks=1 TotalLockCount=74719 TotalPinCount=0 
  Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=2 HandleInUse=2 HandleReferenceCount=0 
  Concurrency:  DependencyMutex=0000000677DF2F30(0, 0, 0, 0) Mutex=0000000677DF2FC0(0, 149471, 1, 0) 
  Flags=RON/PIN/KEP/BSO/[00810003] 
  WaitersLists:  
    Lock=0000000677DF2F10[0000000673B13B58,000000067382E2F0] 
    Pin=0000000677DF2EF0[0000000677DF2EF0,0000000677DF2EF0] 
    LoadLock=0000000677DF2F68[0000000677DF2F68,0000000677DF2F68] 
  Timestamp:  
  HandleReference:  Address=0000000677DF3030 Handle=0000000000000000 Flags=[00] ---------------------------------
This lock request was aborted.
error 4020 detected in background process
ORA-04020: deadlock detected while trying to lock object SYS.orcl
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+585<-kjzdssdmp()+329<-kjzduptcctx()+288<-kjzdicrshnfy()+99<-ksuitm()+1525<-ksbrdp()+4578<-opirip()
+853<-opidrv()+906<-sou2o()+98<-opimai_real()+280<-opimai()+191<-BackgroundThreadStart()+646<-0000000076CF59CD<-0000000076E2A561 
----- End of Abridged Call Stack Trace -----

*** 2019-12-24 18:18:11.165
LGWR (ospid: 3252): terminating the instance due to error 4020

*** 2019-12-24 18:18:17.483
ksuitm: waiting up to [5] seconds before killing DIAG(3236)

 

日志显示由于lgwr进程等待LIBRARY OBJECT LOCK超时,从而引起异常,根据经验此类问题一般是由于bug导致,查询mos发现匹配bug信息Bug 18515268 ORA-4020 in ADG Standby Database causing instance crash by LGWR
20191225112622


可以根据需要打上相关Patch 18515268: ACTIVE DATAGUARD STANDBY CRASHES DUE TO AN ORA-4020 ENCOUNTERED BY LGWR

ORA-00600: internal error code, arguments: [2252]

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-00600: internal error code, arguments: [2252]

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

客户数据库版本10.2.0.4,启动成功之后立马crash,让我们协助解决
version


Thu Jul  4 13:03:10 2019
Completed: ALTER DATABASE OPEN
Thu Jul  4 13:03:10 2019
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:01 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
Errors in file /oracle/app/oracle/admin/tpfworcl/bdump/tpfworcl_reco_22268.trc:
ORA-00600: internal error code, arguments: [2252], [3987], [3375047096], [], [], [], [], []
Thu Jul  4 13:04:02 2019
RECO: terminating instance due to error 476
Instance terminated by RECO, pid = 22268

通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)检查scn相关信息
scn


从ORA-600 2252错误信息看,由于scn可能超过该数据库的天花板理论上而导致该问题,而reco进程主要是由于分布式事务引起,通过和客户确认,该库有通过dblink去访问11204版本oracle,而从2019年6月23日之后scn的算法发生了一些改变(SCN Compatibility问题汇总-2019年6月23日),导致数据库可以支持更大的scn,从而当低版本需要进行分布式事务操作之时,可能导致数据库异常.

处理方案:通过临时屏蔽分布式事务,让数据库临时正常工作;长期解决方案需要把数据库版本升级,避免scn引起相关问题

ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches – AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.3 to 12.2.0.1 [Release 11.2 to 12.2]
IBM AIX on POWER Systems (64-bit)
A problem has been discovered in the latest SP patches for IBM AIX 6.1 and 7.1 (SP 08 and SP 03) where 11.2.0.3, 11.2.0.4, or 12.1 or 12.2 are running. ORA-600 errors and possible database corruption.

upgrade from AIX 6.1.9.7 to SP08
upgrade from AIX 7.1.4.2 to SP03
or running on one of the oslevels listed below in this note.

This is only known to impact Oracle 11.2.0.3.x, 11.2.0.4.x, 12.1.0.2, or 12.2.0.1 on AIX platforms. It has been observed on various Oracle PSU versions.
The symptoms observed so far are ORA-600 memory related failures with examples below.
Additionally, Redo log corruption has been observed in at least two cases.

DESCRIPTION

Database Corruption and/or ORA-600 ORA-7445 errors after applying IBM AIX SP patches – After update from AIX 6.1.9.7 to SP08 or AIX 7.1.4.2 to SP03 (note the earlier service packs (SP 07 or SP 02 are not impacted)

OCCURRENCE

The only changes were upgrades to the latest IBM SP patches.

upgraded from AIX 6.1.9.7 to SP08  –> SP08 has the problem.
upgraded from AIX 7.1.4.2 to SP03  –> SP03 has the problem.

To check for AIX patch levels that are exposed to this risk, run the following command and look for any of the following:

# oslevel -s

If any of the following are listed, exposure to this problem exists:

6100-09-08
7100-03-08
7100-04-03
7200-00-03
7200-01-00
7200-01-01

SYMPTOMS

The following ORA-600 errors have been observed. Note that not all errors are needed, and not all customers have seen all these errors.

=========================================================================================
ORA-00600: internal error code, arguments: [kkoipt:invalid aptyp], [0], [0], [], [], [], [], [], [], [], [], []
Optimizer – Maps the structures from memory
=========================================================================================
ORA-00600: internal error code, arguments: [kghssgai2], [1], [32], [], [], [], [], [], [], [], [], []
–looks to be pga related allocations
Generic memory Heap manager -we can’t have both a heap and an allocation function passed in to us
=========================================================================================
ORA-00600: internal error code, arguments: [qkkAssignKey:1], [], [], [], [], [], [], [], [], [], [], []
qkkAssignKey – copy keys from source to destination key
=========================================================================================
ORA-00600: internal error code, arguments: [kclgclks_3], [454], [2431642561], [], [], [], [], [], [], [], [], []
kclgclks – CR Server request
=========================================================================================
ORA-00600: internal error code, arguments: [kkqvmRmViewFromLst1], [], [], [], [], [], [], [], [], [], [], []
View Merging – list management
=========================================================================================
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_1], [0x082024000], [rpi role space], [], [], [], [], [], [], [], [], []
shared heap manager Stack segment underflow, failure to follow stack discipline.
assert no previous chunk in this segment
=========================================================================================
ORA-00600: internal error code, arguments: [qerghFetch.y], [], [], [], [], [], [], [], [], [], [], []
Implements hash aggregation for query source
=========================================================================================
ORA-00600: internal error code, arguments: [qeshQBNextLoad.1], [], [], [], [], [], [], [], [], [], [], []
Hash Table Infrastructure -get Next buffer during Load
=========================================================================================
ORA-00600: internal error code, arguments: [qkshtQBGet:1], [], [], [], [], [], [], [], [], [], [], []
gets memory pointer for a query block.
Make sure the query block pointer is not NULL
=========================================================================================
ORA-00600: internal error code, arguments: [qeshIHBuildOnPartition block missed], [], [], [], [], [], [], [], [], [], [], []
Hash Table Infrastructure
update the partition at the end.
=========================================================================================
ORA-00600: internal error code, arguments: [kghssgfr2], [1]
=========================================================================================
ORA-07445: exception encountered: core dump [PC:0x0] [SIGILL] [ADDR:0x0] [PC:0x0] [Illegal opcode]
=========================================================================================
ORA-00600 [kkogbro: no kkoaptyp]
=========================================================================================
ORA-00600: internal error code, arguments: [kewrose_1], [600]
========================================================================================
ORA-00600: internal error code, arguments: [1868], [0x000000000], [], [], [], [], [], [], [], [], [], []
Core dumps are also possible.

—————

Redo log corruption with checksum error has also been observed.

Two known examples below:

example 1:

Alert.log messages:

ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 73804 change 8112409541614 time 12/07/2016 07:12:25
ORA-00334: archived log: ‘/dev/rredo13’
ORA-07445: exception encountered: core dump [pkrdi()+780] [SIGSEGV] [ADDR:0x0] [PC:0x10367B26C] [Invalid permissions for mapped

—————

There have been also transient database block corruptions or control file block corruption with checksum errors in the database where a reread finds valid data.

example 2 (transient database block corruption with checksum error):

Corrupt block relative dba: 0x5a066b2f (file 360, block 420655)
Bad check value found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x5a066b2f
last change scn: 0x00cc.6a826294 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x62940601
check value in block header: 0x9e7d
computed block checksum: 0x0           —> 0x0 means that checksum is good when printing the error message (transient problem)
Reading datafile ‘Datafile name’ for corruption at rdba: 0x5a066b2f (file 360, block 420655)
Reread (file 360, block 420655) found valid data
Hex dump of (file 360, block 420655) in trace file ….
Repaired corruption at (file 360, block 420655)

example 3 (transient control file corruption with checksum error):

Hex dump of (file 0, block 1) in trace file …
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Bad check value found during control file header read
Data in bad block:
type: 21 format: 2 rdba: 0x00000001
last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x00001501
check value in block header: 0xca35
computed block checksum: 0x0                 —> 0x0 means that checksum is good when printing the error message (transient problem)
Errors in file ..:
ORA-00202: control file: ‘/oracle/dbs/control_01.ctl’
Errors in file …
ORA-00227: corrupt block detected in control file: (block 1, # blocks 1)
ORA-00202: control file: ‘/oracle/dbs/control_01.ctl’

WORKAROUND

There is no workaround to avoid the problem, but if log corruption is encountered, one possible workaround is to clear the unarchived redo log.  The fix is to rollback the IBM SP or apply the updated fixes.

Syntax to clear logfile:

alter database clear <unarchived> logfile group <integer>;
alter database clear <unarchived> logfile ‘<filename>’;

PATCHES

The fix is now ready from IBM

It can be downloaded for the above releases via:

ftp://aix.software.ibm.com/aix/ifixes/

Affected AIX Levels     Fixed In           iFix / APAR (ftp://aix.software.ibm.com/aix/ifixes/)
6100-09-08               6100-09-09      IV93840
7100-03-08               7100-03-09      IV93884
7100-04-03               7100-04-04      IV93845
7200-00-03               7200-00-04      IV93883
7200-01-01               7200-01-02      IV93885

The fix is included in the next to be released AIX Service Packs.

IBM HIPER APAR
Abstract: PROBLEMS CAN OCCUR WITH THREAD_CPUTIME AND THREAD_CPUTIME_FAST

This APAR corrects an issue with system call thread_cputime_self with floating point registers which is exposed by Oracle Database 11gR2.

PROBLEM SUMMARY:
The thread_cputime or thread_cputime_fast interfaces can
cause invalid data in the FP/VMX/VSX registers if the thread
page faults in this function

For more information see the following from IBM:

http://www-01.ibm.com/support/docview.wss?uid=isg1SSRVPOAIX71HIPER170303-1247

参考:ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches – AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01 (Doc ID 2237498.1)

ORA-600 [kcrfr_update_nab_2]恢复支持

APPLIES TO:

Oracle Database – Enterprise Edition – Version 10.2.0.2 to 10.2.0.4 [Release 10.2]
Information in this document applies to any platform.
Oracle Server Enterprise Edition – Version: 10.2.0.2 to 10.2.0.4

SYMPTOMS

After Database Crashing or Shutdown abort we are unable to open database, in the alert.log we find the error   ORA-600[kcrfr_update_nab_2]

==> In the alert.log we can see that database crashes while performing instance recovery:

Tue Oct 07 13:30:28 2008
Starting ORACLE instance (normal)
..
ALTER DATABASE OPEN
Tue Oct 07 13:30:39 2008
Beginning crash recovery of 1 threads
Tue Oct 07 13:30:39 2008
Started redo scan
Tue Oct 07 13:30:41 2008
Errors in file ….ORCL_ora_3148.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x3C2C5CD0], [2], [], [], []
Tue Oct 07 13:30:46 2008
Aborting crash recovery due to error 600

==> In the trace file we can see the following error stack

start recovery at logseq 18989, block 1312, scn 0

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x3C2C5CD0], [2], [], [], [],
Current SQL statement for this session:
alter database open
—– Call Stack Trace:
ksedst <- ksedmp <- ksfdmp <- kgerinv <- kgeasnmierr
<- kcrfr_update_nab <- kcrfr_read <- kcrfr_read_buffer <- kcrfrgv <- kcratr1
<- kcratr <- kctrec <- kcvcrv <- kcfopd <- adbdrv
<- opiexe <- opiosq0 <- kpooprx <- kpoal8 <- opiodr
<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv
<- sou2o <- opimai_real <- opimai <- OracleThreadStart@

CAUSE

This issue has been reported in following bugs:

Bug 5692594
Hdr: 5692594 10.2.0.1 RDBMS 10.2.0.1 RECOVERY PRODID-5 PORTID-226 ORA-600
Abstract: AFTER DATABASE CRASHED DOESN’T OPEN ORA-600 [KCRFR_UPDATE_NAB_2]
Status: 95,Closed, Vendor OS Problem
 

Bug 6655116
Hdr: 6655116 10.2.0.3 RDBMS 10.2.0.3 RECOVERY PRODID-5 PORTID-23
Abstract: INSTANCES CRASH WITH ORA-600 [KCRFR_UPDATE_NAB_2] AFTER DISK FAILURE

Status: 95,Closed, Vendor OS Problem

EXPLANATION
The assert ORA-600: [kcrfr_update_nab_2] is a direct result of a lost write   in the current on line log that we are attempting to resolve.   So, this confirms the theory that this is a OS/hardware   lost write issue not an internal oracle bug. In fact the assert  ORA-600: [kcrfr_update_nab_2] is how we detect a lost log write.

SOLUTION

There are some bugs that match with this issue and all have been closed as Vendor OS/problem.
The error is caused by a corruption in the on line redo-log, probably a lost write in the file.

The best solution in this case is to restore database from backup and recover it until the sequence before the current on line redo-log

 

这类问题,由于写丢失无法直接open成功,如果需要,可以联系我们,提供专业ORACLE数据库恢复技术支持
Phone:17813235971    Q Q:107644445QQ咨询惜分飞    E-Mail:dba@xifenfei.com

Alter database open ORA-7445 [kkcnrli0] signalled

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.3 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.

SYMPTOMS

Alter database open signaled ORA-7445 kkcnrli0

at :

Wed Jan 17 12:09:17 2018
CJQ0 started with pid=174, OS id=71144
Completed: ALTER DATABASE OPEN /* db agent *//* {1:39551:2} */
Wed Jan 17 12:09:18 2018
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x0] [PC:0x198F36F, kkcnrli0()+639] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/zdlra/zdlra1/trace/zdlra1_q003_71365.trc (incident=247784):
ORA-07445: exception encountered: core dump [kkcnrli0()+639] [SIGSEGV] [ADDR:0x0] [PC:0x198F36F] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/zdlra/zdlra1/incident/incdir_247784/zdlra1_q003_71365_i247784.trc

 

Trace files

shows :

*** SERVICE NAME:(SYS$BACKGROUND) 2018-01-17 12:09:18.108
*** MODULE NAME:(Streams) 2018-01-17 12:09:18.108 <—————-
*** CLIENT DRIVER:() 2018-01-17 12:09:18.108
*** ACTION NAME:(QMON Slave) 2018-01-17 12:09:18.108 <————–

========= Dump for incident 247784 (ORA 7445 [kkcnrli0]) ========


….

*** 2018-01-17 12:09:18.110
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
[TOC00004]
—– SQL Statement (None) —–
Current SQL information unavailable – no cursor.
[TOC00004-END]

[TOC00005]
—– Call Stack Trace —–
… kkcnrli0 kkcnrli kponPurgeUnreachLoc kwqmnslv kwsbsmspm
kwsbgcbkms ksvrdp opirip opidrv sou2o
opimai_real ssthrdmain main

 

CAUSE

Bug 17722075 – ORA-7445 [kkcnrli] in Qnnn process during ALTER DATABASE OPEN (Doc ID 17722075.8)

SOLUTION

User  may either upgrade to the releases that were fixed

12.2.0.1 (Base Release)
12.1.0.2.160719 (Jul 2016) Database Patch Set Update (DB PSU)
12.1.0.2.160719 (Jul 2016) Database Proactive Bundle Patch
11.2.0.4.160719 Exadata Database Bundle Patch (Jul 2016)
12.1.0.2.160719 (Jul 2016) Bundle Patch for Windows Platforms

or

check if there were one-off patch according to your RDBMS oraInventory version

or

simply workaorund by restart database.