ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches – AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01

APPLIES TO:

Oracle Database – Enterprise Edition – Version 11.2.0.3 to 12.2.0.1 [Release 11.2 to 12.2]
IBM AIX on POWER Systems (64-bit)
A problem has been discovered in the latest SP patches for IBM AIX 6.1 and 7.1 (SP 08 and SP 03) where 11.2.0.3, 11.2.0.4, or 12.1 or 12.2 are running. ORA-600 errors and possible database corruption.

upgrade from AIX 6.1.9.7 to SP08
upgrade from AIX 7.1.4.2 to SP03
or running on one of the oslevels listed below in this note.

This is only known to impact Oracle 11.2.0.3.x, 11.2.0.4.x, 12.1.0.2, or 12.2.0.1 on AIX platforms. It has been observed on various Oracle PSU versions.
The symptoms observed so far are ORA-600 memory related failures with examples below.
Additionally, Redo log corruption has been observed in at least two cases.

DESCRIPTION

Database Corruption and/or ORA-600 ORA-7445 errors after applying IBM AIX SP patches – After update from AIX 6.1.9.7 to SP08 or AIX 7.1.4.2 to SP03 (note the earlier service packs (SP 07 or SP 02 are not impacted)

OCCURRENCE

The only changes were upgrades to the latest IBM SP patches.

upgraded from AIX 6.1.9.7 to SP08  –> SP08 has the problem.
upgraded from AIX 7.1.4.2 to SP03  –> SP03 has the problem.

To check for AIX patch levels that are exposed to this risk, run the following command and look for any of the following:

# oslevel -s

If any of the following are listed, exposure to this problem exists:

6100-09-08
7100-03-08
7100-04-03
7200-00-03
7200-01-00
7200-01-01

SYMPTOMS

The following ORA-600 errors have been observed. Note that not all errors are needed, and not all customers have seen all these errors.

=========================================================================================
ORA-00600: internal error code, arguments: [kkoipt:invalid aptyp], [0], [0], [], [], [], [], [], [], [], [], []
Optimizer – Maps the structures from memory
=========================================================================================
ORA-00600: internal error code, arguments: [kghssgai2], [1], [32], [], [], [], [], [], [], [], [], []
–looks to be pga related allocations
Generic memory Heap manager -we can’t have both a heap and an allocation function passed in to us
=========================================================================================
ORA-00600: internal error code, arguments: [qkkAssignKey:1], [], [], [], [], [], [], [], [], [], [], []
qkkAssignKey – copy keys from source to destination key
=========================================================================================
ORA-00600: internal error code, arguments: [kclgclks_3], [454], [2431642561], [], [], [], [], [], [], [], [], []
kclgclks – CR Server request
=========================================================================================
ORA-00600: internal error code, arguments: [kkqvmRmViewFromLst1], [], [], [], [], [], [], [], [], [], [], []
View Merging – list management
=========================================================================================
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_1], [0x082024000], [rpi role space], [], [], [], [], [], [], [], [], []
shared heap manager Stack segment underflow, failure to follow stack discipline.
assert no previous chunk in this segment
=========================================================================================
ORA-00600: internal error code, arguments: [qerghFetch.y], [], [], [], [], [], [], [], [], [], [], []
Implements hash aggregation for query source
=========================================================================================
ORA-00600: internal error code, arguments: [qeshQBNextLoad.1], [], [], [], [], [], [], [], [], [], [], []
Hash Table Infrastructure -get Next buffer during Load
=========================================================================================
ORA-00600: internal error code, arguments: [qkshtQBGet:1], [], [], [], [], [], [], [], [], [], [], []
gets memory pointer for a query block.
Make sure the query block pointer is not NULL
=========================================================================================
ORA-00600: internal error code, arguments: [qeshIHBuildOnPartition block missed], [], [], [], [], [], [], [], [], [], [], []
Hash Table Infrastructure
update the partition at the end.
=========================================================================================
ORA-00600: internal error code, arguments: [kghssgfr2], [1]
=========================================================================================
ORA-07445: exception encountered: core dump [PC:0x0] [SIGILL] [ADDR:0x0] [PC:0x0] [Illegal opcode]
=========================================================================================
ORA-00600 [kkogbro: no kkoaptyp]
=========================================================================================
ORA-00600: internal error code, arguments: [kewrose_1], [600]
========================================================================================
ORA-00600: internal error code, arguments: [1868], [0x000000000], [], [], [], [], [], [], [], [], [], []
Core dumps are also possible.

—————

Redo log corruption with checksum error has also been observed.

Two known examples below:

example 1:

Alert.log messages:

ORA-00368: checksum error in redo log block
ORA-00353: log corruption near block 73804 change 8112409541614 time 12/07/2016 07:12:25
ORA-00334: archived log: ‘/dev/rredo13’
ORA-07445: exception encountered: core dump [pkrdi()+780] [SIGSEGV] [ADDR:0x0] [PC:0x10367B26C] [Invalid permissions for mapped

—————

There have been also transient database block corruptions or control file block corruption with checksum errors in the database where a reread finds valid data.

example 2 (transient database block corruption with checksum error):

Corrupt block relative dba: 0x5a066b2f (file 360, block 420655)
Bad check value found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x5a066b2f
last change scn: 0x00cc.6a826294 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x62940601
check value in block header: 0x9e7d
computed block checksum: 0x0           —> 0x0 means that checksum is good when printing the error message (transient problem)
Reading datafile ‘Datafile name’ for corruption at rdba: 0x5a066b2f (file 360, block 420655)
Reread (file 360, block 420655) found valid data
Hex dump of (file 360, block 420655) in trace file ….
Repaired corruption at (file 360, block 420655)

example 3 (transient control file corruption with checksum error):

Hex dump of (file 0, block 1) in trace file …
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Bad check value found during control file header read
Data in bad block:
type: 21 format: 2 rdba: 0x00000001
last change scn: 0x0000.00000000 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x00001501
check value in block header: 0xca35
computed block checksum: 0x0                 —> 0x0 means that checksum is good when printing the error message (transient problem)
Errors in file ..:
ORA-00202: control file: ‘/oracle/dbs/control_01.ctl’
Errors in file …
ORA-00227: corrupt block detected in control file: (block 1, # blocks 1)
ORA-00202: control file: ‘/oracle/dbs/control_01.ctl’

WORKAROUND

There is no workaround to avoid the problem, but if log corruption is encountered, one possible workaround is to clear the unarchived redo log.  The fix is to rollback the IBM SP or apply the updated fixes.

Syntax to clear logfile:

alter database clear <unarchived> logfile group <integer>;
alter database clear <unarchived> logfile ‘<filename>’;

PATCHES

The fix is now ready from IBM

It can be downloaded for the above releases via:

ftp://aix.software.ibm.com/aix/ifixes/

Affected AIX Levels     Fixed In           iFix / APAR (ftp://aix.software.ibm.com/aix/ifixes/)
6100-09-08               6100-09-09      IV93840
7100-03-08               7100-03-09      IV93884
7100-04-03               7100-04-04      IV93845
7200-00-03               7200-00-04      IV93883
7200-01-01               7200-01-02      IV93885

The fix is included in the next to be released AIX Service Packs.

IBM HIPER APAR
Abstract: PROBLEMS CAN OCCUR WITH THREAD_CPUTIME AND THREAD_CPUTIME_FAST

This APAR corrects an issue with system call thread_cputime_self with floating point registers which is exposed by Oracle Database 11gR2.

PROBLEM SUMMARY:
The thread_cputime or thread_cputime_fast interfaces can
cause invalid data in the FP/VMX/VSX registers if the thread
page faults in this function

For more information see the following from IBM:

http://www-01.ibm.com/support/docview.wss?uid=isg1SSRVPOAIX71HIPER170303-1247

参考:ALERT: Database Corruption ORA-600 ORA-7445 errors after applying AIX SP patches – AIX 6.1.9.8 or AIX 7.1.3.8 or AIX 7.1.4.3 or AIX 7.2.0.3 or AIX 7.2.1.0, 01 (Doc ID 2237498.1)