amdu参数详解

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:amdu参数详解

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

最近发现amdu命令比以前认知中的强大,记录下相关参数

[oracle@xifenfei ~]$ amdu help=y
a/usize         AU size for corrupt disks
-ausize <bytes>: This option must be set when -baddisks is set. It
    must be a power of 2. This size is required to scan a disk looking
    for metadata, and it is normally read from the disk header. The
    value applies to all disks that do not have a valid header. The
    value from the disk header will be used if a valid header is
    found.

ba/ddisks               Include disks with bad headers
-baddisks <diskgroup>:  Normally disks with bad disk headers, or that
    look like they were never part of a disk group, will not be
    scanned. This option forces them to be scanned anyway and to be
    considered part of the given diskgroup. This is most useful when
    a disk header has been damaged. The disk will still need to have
    a valid allocation table to drive the scan unless -fullscan is
    used. In any case at least one block in the first two AUs must be
    valid so that the disk number can be determined. The options
    -ausize and -blksize are required since these values are normally
    fetched from the disk header. If the diskgroup uses external
    redundancy then -external should be specified. These values will
    be compared against any valid disks found in the diskgroup and
    they must be the same.

bl/ksize                ASM block size for corrupt disks
-blksize <bytes>: This option must be set when -baddisks is set. It
    must be a power of 2. This size is required to scan a disk looking
    for metadata, and it is normally read from the disk header. The
    value applies to all disks that do not have a valid header. The
    value from the disk header will be used if a valid header is
    found.

c/ompare                Compare file mirrors
-compare: This option only applies to file extraction from a normal or
    high redundancy disk group. Every extent that is mirrored on more
    than one discovered disk will have all sides of its mirror
    compared. If they are not identical a message will be reported
    on standard error and the report file. The message will indicate
    which copy was extracted. A count of the blocks that are not
    identical will be in the report file.

dir/ectory              Directory from previous dump
-directory <string>: This option completely eliminates the discovery
    phase of operation. It specifies the name of a dump directory from
    a previous run of AMDU. The report file and map files are read
    instead of doing a discovery and scan. The parsing of these ASCII
    files is very dependent on them being exactly as written by AMDU.
    AMDU is unlikely to work properly if they have been modified by
    a text editor, or if some of the files are missing or truncated.
    Note that the directory may be a copy FTPed from another
    machine. The other machine may even be a different platform
    with a different endianess.

dis/kstring             Diskstring for discovery
 -diskstring <string>: By default the null string is used for
    discovery. The null string should discover all disks the user has
    access to. Many installations specify an asm_diskstring parameter
    for their ASM instance. If so that parameter value should be given
    here. Multiple discovery strings can be specified by multiple
    occurrences of -diskstring <string>. Beware of shell syntax
    conflicts with discovery strings. Diskstrings are usually the same
    syntax the shell uses for expanding path names on command lines so
    they will most likely need to be enclosed in single quotes.

du/mp           Diskgroups to dump
-dump <diskgroup>: This option specifies the name of a diskgroup to
    have its metadata dumped. This option may be specified multiple
    times to dump multiple diskgroups. If the diskgroup name is ALL
    then all diskgroups encountered will be dumped. The diskgroup name
    is not case sensitive, but will be converted to uppercase for all
    reports. If this option is not specified then no map or image
    files will be created, but -extract and -print may still work.

exc/lude                Disks to exclude
-exclude <string>: Multiple exclude options may be specified. These
    strings are used for discovery just like the values for diskstring.
    Only shallow discovery is done on these diskstrings. Any disks
    found in the exclude discovery will not be accessed. If they are
    also discovered using the -diskstring strings, then the report will
    include the information from shallow discovery along with a message
    indicating the disk was excluded.

exte/rnal               Assume external redundancy
-external: Normally AMDU determines the diskgroup redundancy from the
    disk headers. However this is not possible with the -baddisks
    option. It is assumed that the redundancy of the -baddisks
    diskgroup is normal or high unless this option is given to specify
    external redundancy.

extr/act                Files to extract
-extract <diskgroup>.<file_number>: This extracts the numbered file
    from the named diskgroup, case insensitive. This option may be
    specified multiple times to extract multiple files. The extracted
    file is placed in the dump directory under the name
    <diskgroup>_<number>.f  where <diskgroup> is the diskgroup name
    in uppercase, and <number> is the file number. The -output option
    may be used to write the file to any location. The extracted file
    will appear to have the same contents it would have if accessed
    through the database. If some portion of the file is unavailable
    then that portion of the output file will be filled with
    0xBADFDA7A, and a message will appear on stderr.

fi/ledump               Dump files rather than extract
-filedump: This option causes the file objects in the command line to
    have their blocks dumped to the image files rather than extracted.
    This can be combined with the -novirtual option to selectively
    dump only some of the metadata files. It may also be used to dump
    user files (number >= 256) so that all mirrored copies can be
    examined.

fo/rmer         Include dropped disks
-former: Normally disks marked as former are not scanned, but this
    option will scan them and include their contents in the output.
    This is useful when it is necessary to look at the contents of a
    disk that was dropped. Note that dropped normal disks will not have
    any entries in their allocation tables and thus only the physically
    addressed extents will be dumped. Force dropped disks will not have
    status former in their disk headers and are not affected by this
    option. However if DROP DISKGROUP is used, the disks will have the
    contents as of the time of the drop, and will be in status former.
    Thus this option is useful for extracting files from a dropped
    diskgroup.

fu/llscan               Scan entire disk
-fullscan: This option reads every AU on the disk and looks at the
    contents of the AU rather than limiting the AU's read based on the
    allocation table. This is useful when the allocation table is
    corrupt or needs recovery. An AU will be written to the image file
    if it starts with a block that contains a valid ASM block header.
    The file and extent information for the map will be extracted from
    the block header. Physically addressed metadata will be dumped
    regardless of its contents. This option is incompatible with
    extracting a file. It is an error to specify -extract with this
    option. Note that this option is likely to find old garbage
    metadata in unallocated AU's since there is no means of
    determining what is allocated. Thus there may be many different
    copies of the same block, possibly of different versions.

h/ex            Always print block contents in hex
-hex: This prints the block contents in hex without attempting to print
    them as ASM metadata. This is useful when the block is known to not
    be ASM metadata. It avoids the ASM block header dump and ensures
    the block is not accidentally interpreted as ASM metadata. This
    option requires at least one -print option.

noa/cd          Do not dump ACD
-noacd: This option limits the dumping of the Active Change Directory
    to just the control blocks that contain the checkpoint. There is
    126 MB of ACD per ASM instance (42 MB for external redundancy). It
    is normally of no interest if there has been a clean shutdown or
    no updates for a while. This option avoids dumping a lot of
    unimportant data. The blocks will still be read and checked for
    corruption. The map file will still contain entries for the ACD
    extents, but the block counts will be zero.

nod/ir          Do not create a dump directory
-nodir: No dump directory is created, and no files are created in it.
    The directory name is not written to standard out. The report file
    is written to standard out before any block printouts from any
    -print options.  This option conflicts with -filedump. It is an
    error to specify this and extract a file to the dump directory.

noe/xtract              Do not create extracted file
-noextract: This prevents files from being extracted to an output
    file, but the file will be read and any errors in selecting the
    correct output will be reported. This is most useful in
    combination with the -compare option.

noh/eart                Do not check for heartbeat
-noheart: Normally the heartbeat block will be saved at discovery time
    and checked when the disk is scanned. A sleep is added between
    discovery and scanning to ensure there is time for the heartbeat
    to be written. If the heartbeat block changes then it is most
    likely that the diskgroup containing this disk is mounted by an
    active ASM instance. An error and warning is generated but
    operation proceeds normally. This option suppresses this check
    and avoids the sleep.

noi/mage                Do not create image files
-noimage: No image files will be created n the dump directory. All
    the reads specified by the read options will still be done. The
    map files may be used to find blocks on the disks themselves. In
    the map file, the count of blocks dumped, the image file sequence
    number, and the byte offset in the image file will all always be
    zero (C00000 S0000 B0000000000)

nom/ap          Do not create map or image files
-nomap: No map file is created and no image file is created. The only
    output is the report file. The -noimage option is assumed if this
    is set since an image file without a map is useless. The options
    -noscan and -noread also result in no map or image files, but
    -nomap still reads the metadata to check for I/O errors and corrupt
    blocks.

nop/rint                Do no print block contents
-noprint: This suppresses the printout of the block contents for
    blocks printed with the -print option. It is useful for getting
    just the block reports without a lot of data. This option requires
    at least one -print option.

norea/d         Shallow discovery only
-noread: This eliminates any reading of any disks at all. Only shallow
    discovery will be done. The report will end after the discovery
    section. It is an error to specify this option and specify a file
    to extract or blocks to print. It is an error to specify this
    and -fullscan.

norep/ort               Do not generate a report
-noreport: This suppresses the generation of the report file. It is
    most useful in combination with -nodir and -print to get block
    printouts without a lot of clutter. It is unnecessary to include
    this with -directory since no report is generated then anyway.

nosc/an         Deep discovery only
-noscan: This eliminates any reading of any disks after deep
    discovery. This results in just doing a deep discovery using the
    disksting parameter. The report will end after the discovery
    section. It is an error to specify this option and specify a file
    to extract. It is an error to specify this and -fullscan.

nosu/bdir               Do not create a dump directory
-nosubdir: No dump directory is created, but files are still created.
    The directory name is not written to standard out. The report file
    and any other dump or extract  files are written to the current
    directory or to the directory indicated by -parentdir. This means
    that if multiple AMDU dumps are requested using this option, the
    report file will always correspond to the last dump requested.

nov/irtual              Do not dump virtual metadata
-novirtual: This option eliminates reading of any virtual metadata.
    Only the physically addressed metadata will be read. This
    implicitly eliminates the ACD and extent maps so -noacd and
    -noxmap will be assumed.

nox/map         Do not dump extent maps
-noxmap: This option eliminates reading of the indirect extents
    containing the file extent maps. This is the bulk of the metadata
    in most diskgroups. Even the entries in the map file will be
    eliminated.

o/utput         Files to create for extract
-output <file_name>: This option specifies a different file for
    writing an extracted file. The file will be overwritten if it
    already exists. This option requires that exactly one file is
    extracted via the -extract option.

pa/rent         Parent for dump directory
-parent <path_name>: By default the dump directory is created in the
    current directory, but another directory can be specified using
    this option. The parent directory for the dump directory must
    already exist.

pr/int          Block to print
-print <block_spec>: This option prints one or more blocks to standard
    out. This option may be specified multiple times to print multiple
    <block_spec>s. The printout contains information about how each
    block was read as well as a formatted printout. Multiple blocks
    matching the same <block_spec> may be found when scanning the
    disks. For example there may be multiple disks that have headers
    for the same diskgroup and disk number. If the block is from a
    mirrored file then multiple copies should exist on different disks.
    If multiple copies of the same block have identical contents then
    only one formatted printout of the contents will be generated, but
    a header will be printed for each copy. A <block_spec> may include
    a count of sequential blocks to print. A <block_spec> may specify
    a block either by disk or file.
   <block_spec> ::= <single_block> | <single_block>.C<count>
   <single_block> ::= <report_disk_block> | <group_disk_block> |
        <extent_file_block> | <virtual_file_block> | <xmap_file_block>
   <report_disk_block> ::=
        <group_name>.N<report_number>.A<au_number>.B<block_number>
   <group_disk_block> ::=
         <group_name>.D<disk_number>.A<au_number>.B<block_number>
   <extent_file_block> ::=
         <group_name>.F<file_number>.X<physical_extent>.B<block_number>
   <virtual_file_block> ::= 
         <group_name>.F<file_number>.V<virtual_block_number>
   <xmap_file_block> ::=
         <group_name>.F<file_number>.M<extent_map_block_number>

r/egistry               Dump registry files
-registry: The ASM registries will be read and dumped to the image
    file. There will be no block consistency checks since these files
    do not have ASM cache headers. To dump one specific registry
    specify -filedump and include the file object for the registry
    (e.g. DATA.255).

s/pfile         Extract usable spfile
-spfile: This causes extract to render the resulting file in a form   
    that is directly usable by startup. Without this option, AMDU   
    will extract the file as a regular ASM file including all ASM   
    specific headers and such

ORA-600 kghstack_underflow_internal_2

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:ORA-600 kghstack_underflow_internal_2

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

aix平台运行11.2.0.4 rac,突然一个节点crash,lms2进程报ORA-600 kghstack_underflow_internal_2错误

Thu Aug 03 18:43:16 2023
Errors in file /u01/oracle/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms2_2884404.trc  (incident=761244):
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x11074D658], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/app/oracle/diag/rdbms/xff/xff2/incident/incdir_761244/xff2_lms2_2884404_i761244.trc
Errors in file /u01/oracle/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms2_2884404.trc  (incident=761245):
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x11AB5BBF0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x11074D658], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/oracle/app/oracle/diag/rdbms/xff/xff2/incident/incdir_761245/xff2_lms2_2884404_i761245.trc
Thu Aug 03 18:43:19 2023
Dumping diagnostic data in directory=[cdmp_20230803184319], requested by (instance=2, osid=2884404 (LMS2)), summary=[incident=761245].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Aug 03 18:43:23 2023
Sweep [inc][761245]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/oracle/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms2_2884404.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x11074D658], [], [], [], [], [], [], [], [], [], []
Sweep [inc][761244]: completed
Sweep [inc2][761245]: completed
Sweep [inc2][761244]: completed
Thu Aug 03 18:43:29 2023
Errors in file /u01/oracle/app/oracle/diag/rdbms/xff/xff2/trace/xff2_lms2_2884404.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x11074D658], [], [], [], [], [], [], [], [], [], []
LMS2 (ospid: 2884404): terminating the instance due to error 484

分析trace文件中的Call Stack Trace信息

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
skdstdst()+40        bl       0000000109B3EE38     000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
ksedst1()+112        call     skdstdst()           1777D9901C4FD34D ?
                                                   4840284100000000 ?
                                                   FFFFFFFFFFECE20 ?
                                                   2A501377F67A7 ? 10A742204 ?
                                                   000000000 ? 1107486C0 ?
                                                   2050033FFFECE28 ?
ksedst()+40          call     ksedst1()            FFFFFFFFFFFE0002 ?
                                                   0000060F1 ? 000000001 ?
                                                   10A46AD18 ? 000000000 ?
                                                   000000000 ? 000002004 ?
                                                   000000001 ?
dbkedDefDump()+1516  call     ksedst()             000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 300000003 ?
ksedmp()+72          call     dbkedDefDump()       3107486C0 ? 110000A28 ?
                                                   FFFFFFFFFFED630 ? 1106ABC70 ?
                                                   100125778 ? FFFFFFFFFFED5B0 ?
                                                   FFFFFFFFFFEDA30 ? 1106ABC70 ?
ksfdmp()+100         call     ksedmp()             000000002 ? 000000000 ?
                                                   000000002 ? 10AF71A68 ?
                                                   10A0720F8 ? 000000000 ?
                                                   1108EC608 ? 1107486C0 ?
dbgexPhaseII()+1904  call     ksfdmp()             FFFFFFFFFFFE0002 ?
                                                   0000060F1 ? 000000002 ?
                                                   000000000 ? 000000002 ?
                                                   10A0720F0 ? 000000000 ?
                                                   001050005 ?
dbgexProcessError()  call     dbgexPhaseII()       1107486C0 ? 1108EFB28 ?
+1556                                              0000B9D9D ? 200000000 ?
                                                   FFFFFFFFFFEE548 ? 000000104 ?
                                                   FFFFFFFFFFEDBB0 ?
                                                   FB400000000 ?
dbgeExecuteForError  call     dbgexProcessError()  1107486C0 ? 1108EC608 ?
()+72                                              100000000 ? 000000000 ?
                                                   FFFFFFFFFFF29E0 ?
                                                   2840288000000012 ?
                                                   10013DA4C ? 1108EE350 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  000000002 ? 000000128 ?
2044                          ()                   FFFFFFFFFFFE0002 ?
                                                   215265335E5162 ?
                                                   3726000000000001 ?
                                                   10A46AD18 ? 10A46CB00 ?
                                                   FFFFFFFFFFF1D30 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000001 ? 10A46AD18 ?
68                                                 25800000000 ? 109E7A740 ?
                                                   000000000 ? 000000038 ?
                                                   FFFFFFFFFFF2800 ? 11AB1AC50 ?
kgeadse()+380        call     dbkePostKGE_kgsf()   900000000512C74 ?
                                                   9001000A008DAD0 ? 000000000 ?
                                                   9001000A008DAD0 ?
                                                   8000000FFFF2C40 ?
                                                   7000147E8F28C98 ? 400000008 ?
                                                   1100054A0 ?
kgerinv_internal()+  call     kgeadse()            7FFFFFFFFFFFFFFF ?
48                                                 FFFFFFFFFFFEF8FF ?
                                                   000000019 ? 110476528 ?
                                                   000000001 ? 000000017 ?
                                                   00000000B ? 000000000 ?
kgerinv()+48         call     kgerinv_internal()   FFFFFFFFFFFEF8FF ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   FFFFFFFFFFFFFFFF ?
                                                   7FFFFFFFFFFFFFFF ?
                                                   1001648E0 ? FFFFFFFFFFF25E0 ?
                                                   1106ABC70 ? 11073B3C0 ?
kgeasnmierr()+72     call     kgerinv()            000000000 ? 215265335E5162 ?
                                                   372600383A0F5000 ?
                                                   000000004 ? 10A328F7C ?
                                                   FFFFFFFFFFF2898 ? 000000002 ?
                                                   0FFFFFFFF ?
kghstack_underflow_  call     kgeasnmierr()        11AB967A0 ? 000000000 ?
internal()+280                                     FFFFFFFFFFF2860 ? 100000001 ?
                                                   000000002 ? 11AB5BBF0 ?
                                                   000000000 ? 11AB96778 ?
kghstack_free()+716  call     kghstack_underflow_  10A328F7C ? 110A2FEC0 ?
                              internal()           000000004 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000080 ? 80000000000000 ?
ktudda()+912         call     kghstack_free()      11AB5BBF0 ? 7215265335E5162 ?
                                                   3726000000000008 ?
                                                   000000102 ? 109E747E0 ?
                                                   FFFFFFFFFFF2A90 ? 000000048 ?
                                                   28408880FFFFFFFF ?
kcbtdu()+1636        call     ktudda()             70001383A0F4014 ? 000000000 ?
                                                   1FE800000000 ? 07F7F7F7F ?
                                                   FFFFFFFF80808080 ?
                                                   000000000 ? 000000030 ?
                                                   FFFFFFFFFFF2B30 ?
kcbzdh()+3200        call     kcbtdu()             35900000359 ? 100000001 ?
                                                   000000001 ? 200000001 ?
                                                   000000001 ? 00000005D ?
                                                   200066665D20 ? 000000000 ?
kcbzpnd()+504        call     kcbzdh()             70001383F6D64B8 ? 000002004 ?
                                                   2107486C0 ? 10A74269E ?
                                                   1107486C0 ? FFFFFFFFFFF3B30 ?
                                                   FFFFFFFFFFF38E0 ? 000000000 ?
kcbdnb()+724         call     kcbzpnd()            10A74267C ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 0001CE860 ?
                                                   000000000 ? 000000000 ?
dbkedDefDump()+5528  call     kcbdnb()             200000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   1100224D0 ? 000000018 ?
                                                   110001366 ? 000000000 ?
ksedmp()+72          call     dbkedDefDump()       3107486C0 ? 110000A28 ?
                                                   FFFFFFFFFFF3FC0 ? 1106ABC70 ?
                                                   100125778 ? 000000000 ?
                                                   FFFFFFFFFFF3FB0 ? 1106ABC70 ?
ksfdmp()+100         call     ksedmp()             000000002 ? 000000000 ?
                                                   000000002 ? 10AF71A68 ?
                                                   10A0720F8 ? 000000000 ?
                                                   1109DE650 ? 1107486C0 ?
dbgexPhaseII()+1904  call     ksfdmp()             11074B65C ? 000000001 ?
                                                   000000002 ? 000000000 ?
                                                   000000002 ? 10A0720F0 ?
                                                   000000000 ? 001050005 ?
dbgexProcessError()  call     dbgexPhaseII()       1107486C0 ? 1109DC860 ?
+1556                                              0000B9D9C ? 200000000 ?
                                                   FFFFFFFFFFF4ED8 ? 000000082 ?
                                                   FFFFFFFFFFF4560 ?
                                                   88A4422A00000000 ?
dbgeExecuteForError  call     dbgexProcessError()  1107486C0 ? 1109DE650 ?
()+72                                              100000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   0DFFFFFFF ? 1109E0398 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  00000000A ? 000000000 ?
2044                          ()                   000000001 ? 000000001 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFFB4E0 ? 000000000 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? FFFFFFFFFFF96B0 ?
68                                                 2580000000A ? 109E7A740 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFF9190 ? 11AB1AC50 ?
kgeadse()+380        call     dbkePostKGE_kgsf()   000000001 ? 000000008 ?
                                                   000000000 ? 10A30EA38 ?
                                                   110000C20 ? 700014771160D68 ?
                                                   700014772ADB3A8 ? 000000001 ?
kgerinv_internal()+  call     kgeadse()            000000003 ? 000000000 ?
48                                                 11074B65C ? 000000001 ?
                                                   000000000 ? FFFFFFFFFFF96B0 ?
                                                   00000000A ? 000000001 ?
kgerinv()+48         call     kgerinv_internal()   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kgeasnmierr()+72     call     kgerinv()            000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFF92B0 ?
                                                   48102840FFFFA5B0 ?
                                                   11AB5BBB8 ? 11074D658 ?
kghstack_underflow_  call     kgeasnmierr()        022028200 ? 022202820 ?
internal()+280                                     11AB5BBB8 ? 100000001 ?
                                                   000000002 ? 11074D658 ?
                                                   0442C2394 ? 000002000 ?
kghstack_free()+716  call     kghstack_underflow_  FFFFFFFFFFF92B0 ?
                              internal()           FFFFFFFFFFF95B8 ?
                                                   FFFFFFFFFFF92B0 ? 000000001 ?
                                                   FFFFFFFFFFF92B0 ?
                                                   FFFFFFFFFFF95E8 ?
                                                   FFFFFFFFFFF95B8 ? 11074B650 ?
ktundo()+924         call     kghstack_free()      0DEADBEEF ? 11074D668 ?
                                                   11074B654 ? 300000000 ?
                                                   1FFFFB4E0 ? FFFFFFFFFFFB4E0 ?
                                                   FFFFFFFFFFF94C0 ?
                                                   FFFFFFFFFFF9470 ?
kturCRBackoutOneChg  call     ktundo()             19FFFFB5E0 ?
()+848                                             494CEDB3FFFF9E50 ?
                                                   FFFFFFFFFFF9E48 ? 000000000 ?
                                                   000000000 ? FFFFFFFFFFFA5B0 ?
                                                   100000000 ? FFFFFFFFFFFB4E0 ?
ktrgcm()+5816        call     kturCRBackoutOneChg  FFFFFFFFFFFA5B0 ?
                              ()                   19FFFFA440 ?
                                                   FFFFFFFFFFFA5B8 ? 000000000 ?
                                                   1FFFFA478 ? FFFFFFFFFFFB4E0 ?
                                                   000000000 ? 000000000 ?
ktrget3()+832        call     ktrgcm()             FFFFFFFFFFFAC80 ? 000000000 ?
                                                   000000000 ? 000000003 ?
                                                   058F7501F ? 000000001 ?
                                                   000000004 ? 000000003 ?
ktrget2()+104        call     ktrget3()            000000002 ? 700000000014488 ?
                                                   7000147E9C41A50 ? 000000022 ?
                                                   110A123A0 ? 000000000 ?
                                                   FFFFFFFFFFFB080 ? 110A123B8 ?
kclgeneratecr()+654  call     ktrget2()            FFFFFFFFFFFB4D0 ? 110AA1610 ?
0                                                  14F11E4E00 ? 0F11E4E00 ?
                                                   357FED028 ? 000030000 ?
                                                   7000147E9C41A50 ?
                                                   700000000014488 ?
kclgcr()+812         call     kclgeneratecr()      11A209508 ? FFFFFFFFFFFBFC0 ?
                                                   FFFFFFFFFFFBC18 ? 000000000 ?
                                                   0FFFFBB10 ? 01A275AC8 ?
                                                   1761D7F302ED25AC ?
                                                   20000011A275AC8 ?
kclcrrf()+536        call     kclgcr()             FFFFFFFFFFFBC20 ?
                                                   FFFFFFFFFFFBD00 ? 101F5080C ?
                                                   000000000 ? 0000003E8 ?
                                                   000000028 ? 0000000C8 ?
                                                   FFFFFFFFFFFBF88 ?
kjblcrcbk()+896      call     kclcrrf()            000000001 ? 000000000 ?
                                                   7000147EB0F07B8 ?
                                                   7000147576C4471 ?
                                                   401472C30C7F0 ?
                                                   7000147576C4408 ?
                                                   7000147576C3190 ?
                                                   7000147576C7170 ?
kjblpcr()+304        call     kjblcrcbk()          FFFFFFFFFFFBDA8 ? 000000038 ?
                                                   7000147FABBDB48 ? 600000006 ?
                                                   000000016 ? 11A209468 ?
                                                   000000013 ? 0001C2153 ?
kjbmpbast()+1792     call     kjblpcr()            000000012 ? 000000168 ?
                                                   000000002 ? 70001109FDB8148 ?
                                                   357000000000357 ?
                                                   7000144F31F7750 ?
                                                   895000000000895 ? 000000000 ?
kjmxmpm()+760        call     kjbmpbast()          1000000000000 ? 80000001E ?
                                                   000000000 ? 11A2951C8 ?
                                                   C000000000 ? 000000000 ?
                                                   1000000000000 ? 000000000 ?
kjmpbmsg()+3508      call     kjmxmpm()            000000000 ? 11A3769E0 ?
                                                   FFFFFFFFFFFC380 ? 06DBFBAEF ?
                                                   101E13820 ? 11A3769E0 ?
                                                   7000147E339AE08 ?
                                                   FFFFFFFFFFFC210 ?
kjmsm()+13416        call     kjmpbmsg()           11A209448 ? 7000147E339AE08 ?
                                                   100000019 ? 100000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 7000000000168FD ?
ksbrdp()+2216        call     kjmsm()              7000000000168E0 ?
                                                   7000000000168FC ? 048244028 ?
                                                   000000E00 ? 1108B69F0 ?
                                                   100637768 ? 000000001 ?
                                                   700000007 ?
opirip()+1620        call     ksbrdp()             FFFFFFFFFFFFE22 ? 10AFA5FC8 ?
                                                   FFFFFFFFFFFDC10 ? 000000000 ?
                                                   000000001 ? 000000000 ?
                                                   01380038F ? 000000001 ?
opidrv()+608         call     opirip()             10AFA23B0 ? 410134118 ?
                                                   FFFFFFFFFFFED80 ?
                                                   2F7530312F ? 108A7E8C4 ?
                                                   1106ABC70 ?
                                                   652F70726F647563 ?
                                                   1106ABC70 ?
sou2o()+136          call     opidrv()             3208A885B0 ? 400000000 ?
                                                   FFFFFFFFFFFED80 ?
                                                   23001801CD0000 ? 000000010 ?
                                                   1106ABC70 ? 000000000 ?
                                                   000000000 ?
opimai_real()+188    call     sou2o()              FFFFFFFFFFFEDF0 ?
                                                   4424444B00000001 ?
                                                   9000000000D73CC ?
                                                   BADC0FFEE0DDF00D ?
                                                   000000003 ? 9001000A008DAD0 ?
                                                   A0000000A000000 ? 10B6A8F30 ?
ssthrdmain()+276     call     opimai_real()        9001000A0011A60 ?
                                                   FFFFFFFFFFFF148 ?
                                                   FFFFFFFFFFFEEF0 ? 10B6E9280 ?
                                                   90000000008582C ?
                                                   9001000A008DAD0 ?
                                                   FFFFFFFFFFFEED0 ?
                                                   9001000A008DAD0 ?
main()+204           call     ssthrdmain()         3F0003660 ? FFFFFFFFFFFF238 ?
                                                   FFFFFFFFFFFF2A0 ?
                                                   9FFFFFFF000D658 ?
                                                   9FFFFFFF00009A0 ? 000000000 ?
                                                   000000000 ? 9FFFFFFF000D658 ?
__start()+112        call     main()               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
 

--------------------- Binary Stack Dump ---------------------

查询mos对比相关信息,参考: LMON or LMS Process Crashes Instance With ORA-600 [kghstack_underflow_internal_2] (Doc ID 2003278.1)信息

The LMON or LMS process crash the instance with an error like:

ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x110A10838], [], [], [], [], [], [], [], [], [], []

ORA-1092 : opitsk aborting process
Instance terminated by LMS1, pid = 14024818
 

Review of the generated tracefiles reveals a call stack similar to:

... kghstack_underflow_internal kghstack_free kccgrd kjxgrf_rr_read kjxgrDD_rr_read kjxgrimember kjxggpoll kjfmact kjfdact kjfcln ksbrdp ...
  
- OR -
  
... kghstack_underflow_internal kghstack_free ktundo kturcrbackoutonechg ktrgcm ktrget3 ktrget2 kclgcr ...

确认为Bug 18687067 – ORA-600 [KGHSTACK_UNDERFLOW_INTERNAL_2] closed as duplicate of Bug 20675347 – ORA-07445 [KGHSTACK_OVERFLOW_INTERNAL()+644](The bug is caused by an AIX compiler issue causing volatile variables in the Oracle kernel not to be handled properly.),解决方案升级数据库到12.1及其以上版本或者打上patch 20675347

WRH$_LATCH, WRH$_SYSSTAT, WRH$_PARAMETER对象较大

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:WRH$_LATCH, WRH$_SYSSTAT, WRH$_PARAMETER对象较大

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

通过awrinfo查看发现sysaux中以下对象大小属于top N

**********************************
(3b) Space usage within AWR Components (> 500K)
**********************************

COMPONENT        MB SEGMENT_NAME - % SPACE_USED                                           SEGMENT_TYPE
--------- --------- --------------------------------------------------------------------- ---------------
FIXED         136.0 WRH$_PARAMETER_PK.WRH$_PARAME_1600597976_0                    -  68%  INDEX PARTITION
FIXED         128.0 WRH$_LATCH.WRH$_LATCH_1600597976_0                            -  98%  TABLE PARTITION
FIXED         104.0 WRH$_PARAMETER.WRH$_PARAME_1600597976_0                       -  97%  TABLE PARTITION
FIXED          88.0 WRH$_SYSSTAT_PK.WRH$_SYSSTA_1600597976_0                      -  99%  INDEX PARTITION
FIXED          88.0 WRH$_SYSSTAT.WRH$_SYSSTA_1600597976_0                         -  90%  TABLE PARTITION
FIXED          80.0 WRH$_LATCH_PK.WRH$_LATCH_1600597976_0                         -  99%  INDEX PARTITION

查新mos发现类似文档:WRH$_LATCH, WRH$_SYSSTAT, and WRH$_PARAMETER Consume the Majority of Space within SYSAUX (Doc ID 2099998.1)
对应的bug为:Bug 14084247 – ORA-1555 or ORA-12571 Failed AWR purge can lead to continued SYSAUX space use (Doc ID 14084247.8)
处理操作

SQL> SELECT COUNT(1) HOW_MANY
  2  FROM sys.WRH$_PARAMETER a
  3  WHERE NOT EXISTS
  4  (SELECT 1
  5  FROM sys.wrm$_snapshot
  6  WHERE snap_id = a.snap_id
  7  AND dbid = a.dbid
  8  AND instance_number = a.instance_number
  9  );

  HOW_MANY
----------
   2406788

SQL> DELETE FROM sys.WRH$_LATCH a
  2  WHERE NOT EXISTS
  3  (SELECT 1
  4  FROM sys.wrm$_snapshot b
  5  WHERE b.snap_id = a.snap_id
  6  AND dbid=(SELECT dbid FROM v$database)
  7  AND b.dbid = a.dbid
  8  AND b.instance_number = a.instance_number);

已删除2411808行。

SQL>
SQL> DELETE FROM sys.WRH$_SYSSTAT a
  2  WHERE NOT EXISTS
  3  (SELECT 1
  4  FROM sys.wrm$_snapshot b
  5  WHERE b.snap_id = a.snap_id
  6  AND dbid=(SELECT dbid FROM v$database)
  7  AND b.dbid = a.dbid
  8  AND b.instance_number = a.instance_number);

已删除2747472行。

SQL>
SQL> DELETE FROM sys.WRH$_PARAMETER a
  2  WHERE NOT EXISTS
  3  (SELECT 1
  4  FROM sys.wrm$_snapshot b
  5  WHERE b.snap_id = a.snap_id
  6  AND dbid=(SELECT dbid FROM v$database)
  7  AND b.dbid = a.dbid
  8  AND b.instance_number = a.instance_number);

已删除2406788行。

SQL>
SQL> COMMIT;

提交完成。

SQL> ALTER TABLE WRH$_LATCH ENABLE ROW MOVEMENT;

表已更改。

SQL> ALTER TABLE WRH$_LATCH SHRINK SPACE COMPACT;

表已更改。

SQL> ALTER TABLE WRH$_LATCH SHRINK SPACE;

表已更改。

SQL> ALTER TABLE WRH$_LATCH SHRINK SPACE CASCADE;

表已更改。

SQL>
SQL> ALTER TABLE WRH$_PARAMETER ENABLE ROW MOVEMENT;

表已更改。

SQL> ALTER TABLE WRH$_PARAMETER SHRINK SPACE COMPACT;

表已更改。

SQL> ALTER TABLE WRH$_PARAMETER SHRINK SPACE;

表已更改。

SQL> ALTER TABLE WRH$_PARAMETER SHRINK SPACE CASCADE;

表已更改。

SQL> ALTER TABLE WRH$_SYSSTAT ENABLE ROW MOVEMENT;

表已更改。

SQL> ALTER TABLE WRH$_SYSSTAT SHRINK SPACE COMPACT;

表已更改。

SQL> ALTER TABLE WRH$_SYSSTAT SHRINK SPACE;

表已更改。

SQL> ALTER TABLE WRH$_SYSSTAT SHRINK SPACE CASCADE;

表已更改。

SQL> ALTER TABLE WRH$_SYSSTAT disable ROW MOVEMENT;

表已更改。

SQL> ALTER TABLE WRH$_PARAMETER disable ROW MOVEMENT;

表已更改。

SQL> ALTER TABLE WRH$_LATCH disable ROW MOVEMENT;

表已更改。

再次查看这些TOP对象消失

**********************************
(3b) Space usage within AWR Components (> 500K)
**********************************

COMPONENT        MB SEGMENT_NAME - % SPACE_USED                                           SEGMENT_TYPE
--------- --------- --------------------------------------------------------------------- ---------------
FIXED          56.0 WRH$_SERVICE_STAT_PK.WRH$_SERVIC_1600597976_0                 -  64%  INDEX PARTITION
FIXED          29.0 WRH$_SERVICE_STAT.WRH$_SERVIC_1600597976_0                    -  95%  TABLE PARTITION
FIXED          26.0 WRH$_ROWCACHE_SUMMARY.WRH$_ROWCAC_1600597976_0                -  96%  TABLE PARTITION
FIXED          21.0 WRH$_MVPARAMETER.WRH$_MVPARA_1600597976_0                     -  95%  TABLE PARTITION
FIXED          17.0 WRH$_ROWCACHE_SUMMARY_PK.WRH$_ROWCAC_1600597976_0             -  98%  INDEX PARTITION
FIXED          17.0 WRH$_MVPARAMETER_PK.WRH$_MVPARA_1600597976_0                  -  97%  INDEX PARTITION
FIXED          12.0 WRH$_SYSMETRIC_HISTORY                                        -  45%  TABLE

Oracle 19C 报ORA-704 ORA-01555故障处理

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Oracle 19C 报ORA-704 ORA-01555故障处理

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

异常断电导致数据库无法启动,尝试对数据文件进行recover操作,报ORA-00283 ORA-00742 ORA-00312错误,由于redo写丢失无法正常应用

D:\check_db>sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on 星期日 7月 30 07:49:19 2023
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


连接到:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL> recover datafile 1;
ORA-00283: 恢复会话因错误而取消
ORA-00742: 日志读取在线程 1 序列 9274 块 18057 中检测到写入丢失情况
ORA-00312: 联机日志 1 线程 1: 'D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG'

屏蔽数据一致性,尝试强制打开库,报ORA-00604,ORA-00704,ORA-01555错误

SQL> alter database open resetlogs;
alter database open resetlogs
*
第 1 行出现错误:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 9 with name
"_SYSSMU9_4165470211$" too small
进程 ID: 4036
会话 ID: 2277 序列号: 40707

alert日志对应错误

2023-07-30T06:54:43.457383+08:00
.... (PID:5836): Clearing online redo logfile 1 complete
.... (PID:5836): Clearing online redo logfile 2 complete
.... (PID:5836): Clearing online redo logfile 3 complete
Resetting resetlogs activation ID 3572089731 (0xd4e9c383)
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG: Thread 1 Group 1 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO02.LOG: Thread 1 Group 2 was previously cleared
Online log D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO03.LOG: Thread 1 Group 3 was previously cleared
2023-07-30T06:54:43.863676+08:00
Setting recovery target incarnation to 2
2023-07-30T06:54:44.816771+08:00
Ping without log force is disabled:
  instance mounted in exclusive mode.
Endian type of dictionary set to little
2023-07-30T06:54:44.957395+08:00
Assigning activation ID 3664275149 (0xda6866cd)
2023-07-30T06:54:44.957395+08:00
TT00 (PID:4640): Gap Manager starting
2023-07-30T06:54:45.004305+08:00
Redo log for group 1, sequence 1 is not located on DAX storage
2023-07-30T06:54:46.176153+08:00
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ADMINISTRATOR\ORADATA\XFF\REDO01.LOG
Successful open of redo thread 1
2023-07-30T06:54:46.191771+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2023-07-30T06:54:46.223036+08:00
TT03 (PID:1816): Sleep 5 seconds and then try to clear SRLs in 2 time(s)
2023-07-30T06:54:46.332398+08:00
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 
0x0000000017b852a7
):
2023-07-30T06:54:46.332398+08:00
select ctime, mtime, stime from obj$ where obj# = :1
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.332398+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
2023-07-30T06:54:46.348028+08:00
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc:
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Error 704 happened during db open, shutting down database
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\trace\xff_ora_5836.trc  (incident=474502):
ORA-00603: ORACLE 服务器会话因致命错误而终止
ORA-01092: ORACLE 实例终止。强制断开连接
ORA-00704: 引导程序进程失败
ORA-00704: 引导程序进程失败
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01555: 快照过旧: 回退段号 9 (名称为 "_SYSSMU9_4165470211$") 过小
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\xff\xff\incident\incdir_474502\xff_ora_5836_i474502.trc
2023-07-30T06:54:47.785549+08:00
opiodr aborting process unknown ospid (5836) as a result of ORA-603
2023-07-30T06:54:47.816792+08:00
ORA-603 : opitsk aborting process
License high water mark = 6
USER (ospid: (prelim)): terminating the instance due to ORA error 

这类错误比较常见,参考以前类似恢复:
在数据库open过程中常遇到ORA-01555汇总
数据库open过程遭遇ORA-1555对应sql语句补充
Oracle Recovery Tools恢复—ORA-00704 ORA-01555故障
使用_allow_resetlogs_corruption导致ORA-00704/ORA-01555故障
对于本次故障,通过Oracle Recovery Tools工具快速处理
patch


open数据库成功

SQL> alter database open;

数据库已更改。

SQL>
SQL>
SQL> select status,count(1) from v$datafile group by status;

STATUS           COUNT(1)
-------------- ----------
SYSTEM                  1
ONLINE                 61

Exadata磁盘损坏导致磁盘组无法mount恢复(oracle一体机磁盘组异常恢复)

联系:手机/微信(+86 17813235971) QQ(107644445)QQ咨询惜分飞

标题:Exadata磁盘损坏导致磁盘组无法mount恢复(oracle一体机磁盘组异常恢复)

作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]

Oracle Exadata客户,在换盘过程中,cell节点又一块磁盘损坏,导致datac1磁盘组(该磁盘组是normal方式冗余)无法mount

Thu Jul 20 22:01:21 2023
SQL> alter diskgroup datac1 mount force 
NOTE: cache registered group DATAC1 number=1 incarn=0x0728ad12
NOTE: cache began mount (first) of group DATAC1 number=1 incarn=0x0728ad12
NOTE: Assigning number (1,35) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_11_dm01celadm03)
NOTE: Assigning number (1,31) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_07_dm01celadm03)
NOTE: Assigning number (1,24) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_00_dm01celadm03)
NOTE: Assigning number (1,25) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_01_dm01celadm03)
NOTE: Assigning number (1,27) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_03_dm01celadm03)
NOTE: Assigning number (1,33) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_09_dm01celadm03)
NOTE: Assigning number (1,30) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_06_dm01celadm03)
NOTE: Assigning number (1,28) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_04_dm01celadm03)
NOTE: Assigning number (1,26) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_02_dm01celadm03)
NOTE: Assigning number (1,1) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_08_dm01celadm03)
NOTE: Assigning number (1,34) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_10_dm01celadm03)
NOTE: Assigning number (1,29) to disk (o/192.168.10.9;192.168.10.10/DATAC1_CD_05_dm01celadm03)
NOTE: Assigning number (1,3) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_07_dm01celadm02)
NOTE: Assigning number (1,4) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_06_dm01celadm02)
NOTE: Assigning number (1,5) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_00_dm01celadm02)
NOTE: Assigning number (1,6) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_10_dm01celadm02)
NOTE: Assigning number (1,7) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_08_dm01celadm02)
NOTE: Assigning number (1,8) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_03_dm01celadm02)
NOTE: Assigning number (1,9) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_11_dm01celadm02)
NOTE: Assigning number (1,10) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_01_dm01celadm02)
NOTE: Assigning number (1,11) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_04_dm01celadm02)
NOTE: Assigning number (1,21) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_05_dm01celadm02)
NOTE: Assigning number (1,43) to disk (o/192.168.10.7;192.168.10.8/DATAC1_CD_02_dm01celadm02)
NOTE: Assigning number (1,36) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_07_dm01celadm01)
NOTE: Assigning number (1,37) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_09_dm01celadm01)
NOTE: Assigning number (1,38) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_11_dm01celadm01)
NOTE: Assigning number (1,0) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_08_dm01celadm01)
NOTE: Assigning number (1,40) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_00_dm01celadm01)
NOTE: Assigning number (1,41) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_03_dm01celadm01)
NOTE: Assigning number (1,42) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_06_dm01celadm01)
NOTE: Assigning number (1,44) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_05_dm01celadm01)
NOTE: Assigning number (1,45) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_01_dm01celadm01)
NOTE: Assigning number (1,46) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_02_dm01celadm01)
NOTE: Assigning number (1,47) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_10_dm01celadm01)
NOTE: Assigning number (1,2) to disk (o/192.168.10.5;192.168.10.6/DATAC1_CD_04_dm01celadm01)
Thu Jul 20 22:01:28 2023
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 450 for pid 30, osid 171838
NOTE: Assigning number (1,32) to disk ()
NOTE: Assigning number (1,39) to disk ()
GMON querying group 1 at 451 for pid 30, osid 171838
NOTE: cache closing disk 32 of grp 1: (not open) 
NOTE: process _user171838_+asm1 (171838) 
     initiating offline of disk 39.3915945266 () with mask 0x7e[0x7f] in group 1
NOTE: initiating PST update: grp = 1, dsk = 39/0xe9689532, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 452 for pid 30, osid 171838
NOTE: cache closing disk 32 of grp 1: (not open) 
ERROR: Disk 39 cannot be offlined, since all the disks [39, 32] with mirrored data would be offline.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline for disk  in mode 0x7f failed.
NOTE: cache dismounting (not clean) group 1/0x0728AD12 (DATAC1) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 171838, image: oracle@dm01dbadm01.gyzq.cn (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x0728AD12 (DATAC1) 
NOTE: cache ending mount (fail) of group DATAC1 number=1 incarn=0x0728ad12
NOTE: cache deleting context for group DATAC1 1/0x0728ad12
NOTE: cache closing disk 32 of grp 1: (not open) 
GMON dismounting group 1 at 453 for pid 30, osid 171838
NOTE: Disk DATAC1_CD_08_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_08_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_08_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_04_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x1 marked for de-assignment
NOTE: Disk DATAC1_CD_09_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM03 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_07_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_09_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_11_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk  in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_00_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_03_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_06_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM02 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_05_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_01_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_02_DM01CELADM01 in mode 0x7f marked for de-assignment
NOTE: Disk DATAC1_CD_10_DM01CELADM01 in mode 0x7f marked for de-assignment
ERROR: diskgroup DATAC1 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "39" in group "DATAC1" may result in a data loss
ORA-15042: ASM disk "39" is missing from group number "1" 
ORA-15042: ASM disk "32" is missing from group number "1" 
ERROR: alter diskgroup datac1 mount force

故障原因是由于asm disk 32还已经损坏在换盘过程中(数据没有reblance完成),又损坏了asm disk 39,而这两份磁盘中有数据互为镜像,因此磁盘组无法正常mount起来.

检查cell节点celldisk和griddisk情况,确认底层磁盘损坏
cellcli


对于这种情况,因为normal冗余的两份数据都有部分丢失,无法直接恢复数据,通过底层磁盘级别恢复(参考以前一次的Oracle exadata故障恢复:Oracle Exadata坏盘导致磁盘组无法mount恢复),然后比较顺利恢复数据,实现业务数据0丢失

SQL> alter datac1 mount;

Diskgroup altered.

SQL> alter diskgroup datac1 check all;

Diskgroup altered.

多套库顺利open成功
20230728124241


在实际恢复过程中由于客户进行了各种尝试,直接新镜像盘然后插入新盘,强制拉磁盘组drop异常disk操作等,导致第一现场发生一些破坏,增加了恢复难道,但是最终通过各种方法弥补,实现了预期的恢复效果(业务数据0丢失)